Full Detail · Traceable Logic

Review of Data Players in an Era of AI

Financials: FMP (SEC-sourced), Jun 9 2026 · Estimates flagged per row · Not investment advice

How to read it. The two scored data columns (Nature of the data, Position on the AI-unlock curve) carry the logic: the nature-of-the-data bullets explain the moat (scarcity, provenance, why it can't be scraped), and the AI-unlock bullets track concrete evidence of monetization (licensing deals, MCP/API exposure, grounded products) versus where it's still just potential. The tier on each of those cells is the 1–10 score. The three financial columns (market cap, EV/Sales, YoY revenue growth) then ground the discrepancy: green EV/Sales = cheap for the profile, red = rich. The Disc score under Valuation & discrepancy rates that gap (higher = more under-priced for the profile).
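EV/Sales is the multiple the discrepancy read leans on. A minimal Python sketch, assuming the simplified definition EV = market cap + net debt (it ignores minority interests, preferreds and leases), reproduces IQVIA's ~2.7x from the figures in its row. The function name `ev_to_sales` is illustrative only, not part of any FMP API.

```python
def ev_to_sales(market_cap_bn: float, net_debt_bn: float, revenue_bn: float) -> float:
    """EV/Sales under the simplifying assumption EV = market cap + net debt.

    All inputs are in billions of the same currency; net debt may be
    negative for a net-cash company, which lowers EV below market cap.
    """
    if revenue_bn <= 0:
        raise ValueError("revenue must be positive")
    enterprise_value = market_cap_bn + net_debt_bn
    return enterprise_value / revenue_bn


# IQVIA inputs from its row: ~$31B cap, ~$13B net debt, $16.3B FY25 revenue
iqv = ev_to_sales(31.0, 13.0, 16.3)
print(f"IQVIA EV/Sales ~ {iqv:.1f}x")  # prints "IQVIA EV/Sales ~ 2.7x"
```

Swapping in the $13.9B net-debt figure from IQVIA's catalyst list gives about 2.8x, a reminder that the multiples in the table are approximate.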

The full table

◆ owner (corpus is the asset) · ○ operator (data is an input).

Company · IR | Data | Financials (FMP, Jun 9 2026) | Assessment
Nature of the data · tier | Data trajectory (stock vs flow) | Position on the AI-unlock curve · tier | Current AI contracts & counterparties | Possibilities for additional contracts | AI risks (what stands to lose) | Market cap | EV / Sales | YoY rev growth | Valuation & discrepancy | Convexity & why | Other endogenous concerns | Hype factor (market awareness) | Catalysts
Financial-market data
CME Group (CME) ○ operator
IR / presentations ↗
Data 6 · Neutral
  • Derivatives pricing & trade data
  • High-margin byproduct of the exchange
Growing
  • Derivatives data flow grows with record volumes
AI 5 · Neutral
  • Sells valuable data, but it's not the thesis
~ desk note
  • Sells market data conventionally
  • Derivatives data into quant/agent stacks
  • Minimal — clearing/execution moat unaffected
~$93B
~ EV est
~15x +6% Disc 3 · Low
  • Premium, well-understood
  • Owner-ish, but data isn't the re-rate
Low
  • Priced, data not the driver
  • Volume cyclicality; FMX (BGC) attacking rates franchise
Low
Not a data-AI story
  • Volume cycles; data pricing
FactSet (FDS) ◆ owner
IR / presentations ↗
Data 6 · Neutral
  • Entity-linked financial data: fundamentals, estimates, ownership, transcripts
  • 'Symbology' deep ticker-linking is the connective tissue agents need
  • But much content is aggregated/licensed, not owned — caps the moat
  • Workflow terminals for buy/sell-side
Steady flow
  • Coverage expands steadily; much content aggregated, not originated
AI 7 · High
  • Conversational FactSet Mercury shipped; 48/50 top clients on AI tools
  • Clean, entity-linked data is ideal RAG fuel for finance copilots
  • Moving up the curve
  • Aggregated data limits licensing leverage
~ desk note
  • FactSet Mercury + transcript AI; aggregated content limits licensing
  • Symbology/entity-linking as agent infrastructure
  • The terminal seat is the product — agents directly substitute analyst workflows
  • Aggregated (non-owned) content gives least pricing defense
~$9.0B
✓ FMP
~4.3x +5% Disc 6 · Neutral
  • ~4.3x EV/Sales on +5% growth — quality at a modest multiple
  • Aggregated (non-owned) data caps the moat
  • Modest favorable gap on metrics
Moderate
  • Quality franchise at a modest multiple — some re-rate optionality
  • Aggregated content caps the upside
  • Balanced
  • Content-licensing input costs (incl. CUSIP) squeeze margins
  • CEO transition; retention metrics softening
Med — as threat
De-rated with the info-services group in Feb 2026
  • Retention metrics; Mercury adoption
Intercontinental Exchange (ICE) ◆ owner
IR / presentations ↗
Data 8 · High
  • Dominant US mortgage data (Black Knight/Ellie Mae) — origination/servicing graph
  • Pricing & fixed-income reference data
  • Hard-to-replicate corpus inside an 'exchange' wrapper
Cyclical flow
  • Mortgage data flows with origination cycle; pricing data steady
AI 6 · Neutral
  • Steadily productizing pricing/reference data
  • Mortgage data graph is AI-relevant
  • Mid on the curve
~ desk note
  • In-product mortgage-AI; data feeds sold conventionally
  • Mortgage-graph grounding for housing/credit agents
  • Minimal — transaction infrastructure; some data products commoditized
~$80B
~ EV est
~10x +6% Disc 4 · Neutral
  • A real owner screens miss (files as an exchange)
  • Mostly priced
Low
  • Quality priced
  • Limited asymmetry
  • Mortgage tech is deeply cyclical; ICE bought in near the top of the cycle
  • Black Knight deal debt still being digested
Low
Read as an exchange, never as a data-AI play
  • Mortgage cycle; IMB platform wins
Moody's (MCO) ◆ owner
IR / presentations ↗
Data 9 · High
  • Credit ratings (MIS) + Moody's Analytics
  • Orbis: largest private-company database (~500M entities)
  • Default histories + ownership graph — decision-grade
  • Essential grounding for credit agents, KYC, supply-chain AI
Growing
  • Orbis entity graph keeps expanding (~500M+ entities)
  • Ratings/transcript flow continuous; issuance cyclical
AI 9 · High
  • Early OpenAI partnership; Research Assistant copilot
  • MCP distribution into Claude/ChatGPT/Copilot
  • Packaging data for agentic workflows — furthest on distribution
  • High — arguably best-executed, hence richly priced
✓ deep dive
  • Early OpenAI partnership; Research Assistant copilot
  • MCP distribution into Claude/ChatGPT/Copilot
  • No raw licensing — productized access only
  • Agentic KYC/credit-memo workflows priced per seat
  • Orbis private-company graph as agent grounding
  • Analytics research/tools face AI commoditization; ratings are regulatorily protected
  • KYC/compliance products meet AI-native challengers
~$79B
✓ FMP
~11x +9% Disc 2 · Low
  • Best business + furthest-along AI
  • ~11x sales / ~40x earnings to match
  • Thinnest discount; DCFs flag it rich
Low
  • Best business, thinnest discount, DCF flags it rich
  • Limited upside → low convexity
  • Ratings revenue rides the debt-issuance cycle
  • Duopoly position invites periodic antitrust/regulatory attention
High
Best-executed AI strategy is consensus; it's in the ~11x
  • Agentic product attach rates
  • Ratings issuance cycle
  • Orbis monetization moves
Morningstar (MORN) ◆ owner
IR / presentations ↗
Data 7 · High
  • Fund/ETF data, star & analyst ratings; DBRS credit ratings
  • PitchBook private-markets/VC dataset is the scarce crown jewel
  • Fund data feeds advisor copilots
Growing
  • PitchBook's private-company universe compounds with VC/PE activity
AI 5 · Neutral
  • Mo chatbot + PitchBook AI features
  • Monetization mostly stays in-product
  • Mid on the curve
~ desk note
  • Mo assistant; PitchBook AI features; in-product only
  • PitchBook private-market data licensing to AI deal tools
  • Fund research commoditized by AI summarization; ratings brand defensible
  • PitchBook data scraping/inference by AI tools
~$7.0B
✓ FMP
~3.4x +8% Disc 6 · Neutral
  • ~$7.0B cap, ~3.4x EV/Sales on +8% growth
  • Cheap for a PitchBook-owning franchise
  • Market under-paying for the private-markets data
Moderate–High
  • PitchBook AI-deal-sourcing optionality, cheaply priced
  • No hard catalyst
  • Cheap enough to tilt positive
  • Founder (Mansueto) voting control
  • PitchBook decelerated with the VC downturn; DBRS is issuance-cyclical
Low
PitchBook's AI value ~absent from the narrative
  • PitchBook growth; advisor-AI launches
MSCI (MSCI) ◆ owner
IR / presentations ↗
Data 8 · High
  • Indices (World, EM) that portfolios are built and measured against
  • ESG/climate ratings, Barra factor/risk models, Burgiss private-asset data
  • Benchmarks + factor models are chokepoints
  • Index licensing is a recurring toll-road
Growing
  • Index/factor data grows with markets; private-asset (Burgiss) expanding fast
AI 7 · High
  • IndexAI connector; roadmap for training clients' LLMs
  • Solid enterprise APIs
  • Less aggressive than S&P/Moody's
  • Mid/high — capable but measured
~ desk note
  • IndexAI connector; roadmap for training clients' LLMs; no licensing $ disclosed
  • Benchmark/factor licensing to agent platforms
  • ESG/analytics tools commoditized by AI; index licensing protected
~$44B
✓ FMP
~16x +10% Disc 3 · Low
  • ~16x sales, one of the richest here
  • Priced as the premium compounder it is
  • No discount to the quality
Low
  • Among the richest multiples here
  • Least convex — priced for the quality
  • Client concentration in fee-pressured asset managers
  • US political backlash against ESG products
Med
AI seen as feature, not thesis
  • Index flows; ESG/private-asset data attach
Nasdaq (NDAQ) ◆ owner
IR / presentations ↗
Data 7 · High
  • 100+ proprietary market-data feeds
  • Index & analytics data products
  • A licensing toll-road like S&P benchmarks
Growing
  • Market data grows with volumes; Verafin fraud signals compound
AI 6 · Neutral
  • Feeds quant/agent workflows
  • Productized data
  • Up the curve on data productization
~ desk note
  • Verafin AI (fraud), market-data feeds; no LLM licensing line
  • Surveillance/fraud agents; index licensing
  • Minimal core risk; market-data products face some AI substitution
~$49B
~ EV est
~12x +8% Disc 3 · Low
  • Premium valuation reflects the toll-road
  • Quality owner, little discount
Low
  • Priced toll-road
  • Limited asymmetry
  • Adenza acquisition debt + integration
  • Crypto-listings exposure adds volatility
Low
AI in products, not in the multiple
  • Fin-crime AI growth; data ARR
S&P Global (SPGI) ◆ owner
IR / presentations ↗
Data 9 · High
  • Credit ratings, Capital IQ fundamentals/transcripts, Platts benchmarks
  • S&P Dow Jones Indices + Mobility (CARFAX)
  • Benchmarks are licensing toll-roads AI can't route around
  • The grounding layer any financial LLM/agent needs
Growing
  • Daily benchmark prints, transcripts, fundamentals — relentless flow
  • CARFAX events + Mobility add new streams
AI 9 · High
  • Kensho LLM-ready API live since Nov 2024; 300+ customers
  • Anthropic MCP connector + Claude Cowork plugin (Feb 2026)
  • Cohere North partnership (Jun 8, 2026) — sovereign/regulated AI
  • Distribution into Claude, ChatGPT, Gemini, Copilot
  • The most aggressive everywhere-the-agents-are strategy
✓ deep dive
  • Kensho LLM-ready API (Nov 2024), 300+ customers (launch)
  • Claude Cowork plugin + Anthropic MCP (Kensho)
  • Cohere North partnership, Jun 8 2026 (PR)
  • Per-seat / usage pricing for agentic data access
  • Benchmark licensing to agent platforms (toll-road extension)
  • Private-markets data into AI workflows
  • Capital IQ desktop seats at risk as agents answer directly (why it sells the data INTO agents)
  • Ratings & indices largely insulated
~$126B
✓ FMP
~8.8x +8% Disc 3 · Low
  • ~8.8x EV/Sales on +8% growth — premium largely intact
  • Top-tier AI execution already recognized in the multiple
  • Quality fully priced; no metric discrepancy
Low
  • Quality + best-in-class AI execution already in the multiple
  • Limited discrepancy on metrics
  • Modest two-way payoff
  • IHS Markit integration legacy; Mobility (CARFAX) is auto-cyclical
  • Index fee compression a slow structural drag
Med-High
AI execution is consensus among analysts; the multiple carries only a modest sector AI-threat discount
  • AI-access revenue disclosure (none yet)
  • More agent-platform embeds
  • Ratings cycle + index flows
Professional-information data (legal · tax · IT advisory)
Gartner (IT) ◆ owner
IR / presentations ↗
Data 8 · High
  • 45+ yrs of proprietary syndicated IT/business research from ~2,000 analysts
  • Magic Quadrants & Hype Cycles are de-facto standards CIOs buy on
  • Price, salary & contract benchmarks from thousands of engagements
  • Behind a hard paywall — not on the open web, not freely scrapeable
  • >75% of contract value multi-year recurring, embedded in workflows
Steady — watch the flow
  • Analyst output paced by headcount; inquiry/benchmark data grows with clients
  • CV slowdown = the inflow risk: fewer clients → less peer data
AI 5 · Neutral
  • Two-sided: AI could commoditize 'advice' or make its data the grounding layer
  • Rolling out AskGartner inside client licenses
  • Has NOT licensed its corpus to labs — keeps it walled
  • Contract-value growth slowed to ~1–5% — the market's disruption tell
  • Early on the curve; data-as-grounding thesis unproven
✓ deep dive
  • None — AskGartner ships inside existing client licenses
  • AskGartner live across research portal (example)
  • Corpus-grounded agent for enterprises (license upsell)
  • Selective API access to benchmarks/peer data
  • Price/SLA tiers for AI-assisted research
  • The core product IS advice — generalist AI is a direct substitute
  • Seat-based research licenses are the exposed surface
  • Conferences/consulting more defensible
~$10.5B
✓ FMP
~2.1x +4% Disc 8 · High
  • ~2.1x EV/Sales for a 77%-gross-margin, mostly-recurring franchise
  • The multiple embeds a full AI-disruption outcome; CV growth ~1–5% is the operational tell
  • Cheapest quality owner on the board on metrics
High · quality-convex
  • Profitable recurring base at ~2x sales bounds the downside
  • Large upside if AI proves additive to the franchise
  • Cheap quality + two-sided AI = positive convexity
  • Conference/consulting segments are macro-cyclical
  • EPS growth leans on buybacks; sales-force productivity in question
High — as threat
Narrative casts Gartner as an AI casualty; AskGartner and the paywalled corpus get little credit
  • Contract-value growth stabilization (the single tell)
  • AskGartner engagement disclosures
  • Buyback pace
Thomson Reuters (TRI) ◆ owner
IR / presentations ↗
Data 9 · High
  • Westlaw: case law, statutes, annotations built over a century
  • Editorial headnotes/KeyCite are irreplicable human layers
  • Practical Law, Checkpoint (tax), Reuters News
  • Legal/tax = highest-value, lowest-hallucination-tolerance use cases
Steady compounding
  • Case law grows with the courts — slow, perpetual accretion
  • Editorial annotations (headnotes/KeyCite) compound on top
AI 8 · High
  • CoCounsel scaling fast — ~1M AI users
  • AI-native Westlaw does grounded retrieval over its corpus
  • Monetizes the data itself
  • High — clear legal-AI leader
✓ deep dive
  • No corpus licensing — deliberate walled strategy
  • CoCounsel: 1M professionals, 107 countries (Feb 2026) (PR)
  • Building proprietary LLM for regulated use cases
  • Selective agent-platform access to Westlaw (MCP-style)
  • CoCounsel 10x user target = the in-product unlock
  • Tax/audit agentic suites later in 2026
  • Legal research workflow is the AI battleground — Harvey, Legora, generalist agents
  • Westlaw seat pricing under pressure if agents do the research
  • Reuters news commoditized by AI summarization
~$78B*
✓ FMP
~9.0x +7% Disc 3 · Low
  • ~9x EV/Sales on +7% growth — a modest AI-threat discount against its quality
  • CoCounsel at 1M users is distribution the multiple under-credits
  • Premium franchise; the discount is partial, not deep
Low
  • Priced quality; AI leadership reflected
  • Limited convexity
  • Woodbridge (Thomson family) controls ~70% — governance is theirs
  • Print/legacy declines largely done; tax season concentration
High — as threat
Market narrative treats agentic AI as a threat to legal-research seats; CoCounsel distribution under-credited
  • CoCounsel next-gen GA + adoption metrics
  • ACV growth reacceleration (the proof point)
  • Competitive data vs Harvey/Legora/Claude Cowork
Credit · identity · risk data
Equifax (EFX) ◆ owner
IR / presentations ↗
Data 8 · High
  • The Work Number — unique employer-sourced income/employment records
  • Verified income/employment ground-truth no LLM can infer
  • Utility/telecom payment data extends the picture
  • Gating data for lending, hiring, benefits
  • Contributory — employers feed it (network effects)
Compounding
  • The Work Number records keep growing via payroll integrations
  • Every paycheck is a new record — true flow asset
AI 6 · Neutral
  • EFX.AI built into new product models
  • FCRA permissible-purpose rules cap AI exposure
  • Monetization stays inside regulated rails
  • Mid — gated by regulation, not capability
  • Re-rate is cyclical more than AI-driven
~ desk note
  • EFX.AI in-product; FCRA limits external exposure
  • Verified-income rails for lending/hiring agents (permissioned)
  • AI cash-flow underwriting could route around bureau scores at the margin
  • AI-driven synthetic-identity fraud raises cost of trust
~$20B
✓ FMP
~4.0x +7% Disc 7 · High
  • ~$20B cap, ~4.0x EV/Sales on +7% growth
  • Cheap for the owner of The Work Number
  • Re-rates on the lending/hiring cycle + verified-income AI demand
High
  • Unique income/employment data at a low multiple
  • FCRA caps direct licensing, but the asset is irreplaceable
  • Cheap + cyclical-recovery optionality = convex
  • 2017 breach legacy = elevated security/regulatory burden
  • Mortgage + hiring volumes are the real earnings driver near-term
  • CFPB / FCRA scrutiny is permanent
Low
AI angle absent; mortgage cycle dominates the narrative
  • Mortgage/hiring recovery; TWN records growth; any agent-rail pilots
Experian (EXPN.L) ◆ owner
IR / presentations ↗
Data 8 · High
  • One of the three global credit bureaus (the largest by revenue) + marketing/identity/fraud data
  • Best organic growth of the three bureaus
  • Verified credit/identity data with network effects
Growing
  • Same bureau flow; strongest organic data investment of the three
AI 6 · Neutral
  • AI products across credit & fraud
  • FCRA-style rules cap ecosystem exposure
  • Mid on the curve
~ desk note
  • Ascend platform AI; in-product
  • Same permissioned-rails option as EFX/TRU
  • Same as the other bureaus; strongest product diversification of the three
~$45B*
~ EV est
~6.5x +7% Disc 6 · Neutral
  • Reasonable bureau multiple
  • Only friction is access (London listing)
  • Quality peer to EFX/TRU
Moderate
  • Quality + reasonable price
  • Regulation caps the convex upside
  • Balanced
  • UK listing discount; Brazil FX exposure
Low
UK listing keeps it out of the AI conversation
  • Cycle; NA mortgage volumes
FICO (FICO) ◆ owner
IR / presentations ↗
Data 6 · Neutral
  • The FICO score — decisioning standard embedded in US credit
  • More algorithm/standard than raw corpus
  • But the score is a data product with monopoly economics
Derived flow
  • Scores recompute on bureau flow; FICO originates little raw data
AI 6 · Neutral
  • Own foundation model (FICO Foundation Model, FFM)
  • AI lending agents still need an accepted standard
  • Mortgage-pricing change is a catalyst
  • Not a corpus play
~ desk note
  • FICO Foundation Model (FFM) announced; platform AI
  • Score-as-API inside lending agents
  • The central AI risk case: AI-native underwriting bypassing the Score
  • Lenders' in-house models + FHFA score competition (VantageScore 4.0)
~$28B
✓ FMP
~14x +15% Disc 5 · Neutral
  • ~14x EV/Sales on +15% growth — still premium on metrics
  • Moat contested (VantageScore push, AI underwriting)
  • Two-sided
Moderate
  • De-rated standard with mortgage-pricing optionality
  • But expensive on sales (~14x)
  • Two-sided
  • Pricing-power backlash: FHFA pushing VantageScore competition in mortgages
  • Revenue concentrated in B2B scores; software segment unloved
Med
Debate is pricing power, not AI
  • Mortgage-score pricing; platform ARR
LiveRamp (RAMP) ◆ owner
IR / presentations ↗
Data 7 · High
  • Identity graph & data-collaboration network (25k+ publishers)
  • Clean-room identity for the post-cookie/AI-data era
Maintained
  • Identity graph is refresh-maintenance, not accumulation
AI 6 · Neutral
  • Well-placed for AI-data era
  • But the story is now M&A
~ desk note
  • Identity/clean-room infra relevant to AI data flows
  • Acquisition pending — risk transfers to Publicis
~$2.3B
~ EV est
~3.0x +10% Disc 2 · Low
  • Being acquired by Publicis for ~$2.5B
  • Off the board as a standalone bet
  • Signal: ad-holdcos paying up for identity data
Low
  • Taken out — payoff capped by the deal price
  • Deal-close risk is the only variable left (~$38.50 cash)
Low
Story is now the Publicis acquisition
  • Deal close (~$38.50)
TransUnion (TRU) ◆ owner
IR / presentations ↗
Data 7 · High
  • Credit bureau + identity resolution (Neustar)
  • Links offline identity to digital identifiers
  • Identity graphs matter more as AI agents transact
  • Contributory bureau data with network effects
Growing
  • Credit + identity events flow with economic activity
AI 5 · Neutral
  • OneTru platform, TruIQ agents
  • Identity products quietly AI-relevant
  • FCRA-capped exposure like Equifax
  • Mid on the curve
~ desk note
  • OneTru platform, TruIQ agents; in-product
  • Identity verification for AI-agent transactions
  • Same bypass risk as EFX; identity products partly hedge it
~$13.5B
✓ FMP
~3.9x +8% Disc 6 · Neutral
  • Cheapest of the three bureaus
  • Modest favorable gap
  • Same regulatory ceiling
Moderate
  • Cheapest bureau + cycle/identity optionality
  • FCRA caps the convex upside
  • Balanced
  • Neustar deal leverage; UK consumer business weak
  • Same CFPB overhang
Low
Same as EFX — cycle story, not AI story
  • Cycle turn; Neustar identity products
Verisk (VRSK) ◆ owner
IR / presentations ↗
Data 9 · High
  • Decades of contributory claims, loss & property/peril data
  • Nearly all US P&C insurers both feed and buy it back
  • Catastrophe models built on the loss history
  • Near-monopoly; no AI lab can rebuild it
Steady compounding
  • Contributory model: every insurer claim feeds it, by contract
  • Cat-event data grows with each season
AI 5 · Neutral
  • Generative/agentic AI in underwriting/claims products
  • Consortium-locked — not licensed to the open ecosystem
  • Value unlock in-product, not via licensing
  • Mid — deepest moat, deliberately walled
~ desk note
  • Consortium AI in underwriting/claims products
  • Walled option: claims-history grounding for insurance agents
  • Insurers building AI on their own claims data could weaken the consortium pull
~$24B
✓ FMP
~9.0x +7% Disc 5 · Neutral
  • Deep moat, but ~9x sales / ~7% growth
  • Fully paid for
  • Quality High, value Low-ish
Low–Moderate
  • Near-monopoly data, but premium & walled
  • Bounded downside, limited upside
  • Low asymmetry
  • Consortium members push back on pricing; class actions over contributory data use
Low-Med
Quality priced; AI not separately valued
  • Product attach; pricing renewals
Healthcare · life-sciences data
Cencora (COR) ○ operator
IR / presentations ↗
Data 5 · Neutral
  • Pharmacy/dispensing & distribution data
  • Optimizes thin-margin logistics
Steady flow
  • Distribution data tracks volumes
AI 3 · Low
  • Logistics input, not sold
~ desk note
  • Logistics AI internal
  • Low — physical distribution
~$54B
~ EV est
~0.1x +10% Disc 4 · Neutral
  • Fair defensive distributor
  • Data-rich, not a data owner
Low
  • Defensive, data not a driver
  • Drug-pricing policy; thin-margin model
Low
Not an AI story
  • Distribution volumes
Definitive Healthcare (DH) ◆ owner
IR / presentations ↗
Data 7 · High
  • Healthcare commercial intel: providers, claims, affiliations, install-base
  • 'The ZoomInfo of healthcare' — sells intelligence to life-sciences/med-tech
  • A pure data owner, not a marketplace
  • Continuously refreshed healthcare-entity graph
Slowing
  • Refresh continues but shrinking revenue funds less data collection
AI 6 · Neutral
  • Real owner, but AI is as much threat as tailwind
  • Limited AI productization so far
  • Mid/behind — business being repriced
  • Erosion risk from AI-generated provider signal
~ desk note
  • None disclosed
  • Healthcare-commercial grounding data for pharma AI
  • AI-generated provider intelligence directly substitutes the core product — erosion already visible
~$0.1B
~ EV est
~2.0x −8% Disc 6 · Neutral
  • ~$0.1B cap, ~2x EV/Sales on declining revenue
  • Distressed micro-cap; the data is better than the equity
  • Cheap for existential reasons
High · distressed
  • Distressed micro-cap → option on stabilization or M&A
  • Declining revenue is the live left tail
  • Cheap healthcare-commercial data if it survives
  • PE overhang (Advent), serial goodwill writedowns, micro-cap liquidity
Low
Micro-cap; no AI narrative attaches
  • Revenue stabilization; strategic review odds
Doximity (DOCS) ◆ owner*
IR / presentations ↗
Data 6 · Neutral
  • Verified network of most US physicians
  • The asset is the audience/engagement, not a corpus
  • Workflow tools for doctors
Saturated graph
  • Most US physicians already on it — the graph is mature
  • Engagement/newsfeed data still grows; the asset is breadth, not flow
AI 6 · Neutral
  • Strong AI tools (Doximity GPT), huge engagement
  • But no AI revenue in guidance
  • Data asset is the audience, not a corpus
~ desk note
  • Doximity GPT free for physicians; ad AI in-product
  • Clinician-verified channel for healthcare AI distribution
  • Physician attention shifting to AI clinical tools (OpenEvidence et al.)
  • Pharma ad budgets could follow attention into AI channels
~$3.8B
✓ FMP
~5.6x +13% Disc 5 · Neutral
  • ~$3.8B cap, ~5.6x EV/Sales on +13% growth
  • Far from the ~18x I'd assumed — reasonable now
  • Verified clinician graph; audience-not-corpus caps licensing
Moderate
  • Verified clinician graph + AI tools, now at a fair multiple
  • Audience-not-corpus caps the data-licensing upside
  • Balanced after the de-rate
  • Pharma ad-budget concentration; engagement metrics are the whole story
Med
Was priced for AI hopes; now reset to fair
  • Ad market; AI tool engagement
Elevance (ELV) ○ operator
IR / presentations ↗
Data 6 · Neutral
  • Claims/care-management data via Carelon
  • Latent separable data asset
  • Used to lower its own medical costs
Steady flow
  • Claims flow with membership; flat membership = flat flow
AI 4 · Neutral
  • AI care-management lowers internal costs
  • Closest operator to a separable data asset
  • Still not pure-play
~ desk note
  • Carelon internal AI
  • Separable claims-data asset (monetization never signaled by management)
  • Low direct risk; AI mostly a cost lever
~$92B
~ EV est
~0.4x +5% Disc 5 · Neutral
  • Cheap, but on insurer fundamentals
  • Latent data optionality (Carelon)
  • Cyclical
Moderate
  • De-rated insurer with latent data optionality
  • Cyclical, not a data re-rate
  • Mildly positive
  • Medical-cost trend + Medicaid redeterminations; ACA subsidy politics
Low
Insurer story
  • Medical-cost trend; Carelon growth
GoodRx (GDRX) ○ operator
IR / presentations ↗
Data 5 · Neutral
  • Rx-pricing & consumer prescription-behavior data
  • Unique data, but an input to a discount platform
  • Platform under structural pressure
Steady flow
  • Pricing data flows; nothing accumulating in value
AI 4 · Neutral
  • Data feeds the platform; not licensed as a corpus
  • Limited AI productization
~ desk note
  • None disclosed
  • Rx-pricing data into consumer-health agents
  • AI agents compare drug prices directly, disintermediating the front end
~$0.9B
~ EV est
~1.3x ~flat Disc 4 · Neutral
  • Cheap, but pressured core
  • Marginal owner with hard-to-monetize data
Moderate · binary
  • Cheap with stabilization optionality
  • But structural pressure on the core
  • Binary-ish
  • Pharmacy/PBM dependence: a single partner dispute (Kroger, 2022) cratered it once
Low
No AI narrative
  • Platform stabilization
Guardant Health (GH) ◆ owner
IR / presentations ↗
Data 9 · High
  • Liquid-biopsy genomic + clinical-outcomes data in oncology
  • Proprietary, scarce — a direct Tempus peer
  • Longitudinal molecular profiles track tumor evolution
  • Cannot be assembled from public sources
Compounding fast
  • Test volumes +25–35%/yr; each test extends longitudinal profiles
AI 6 · Neutral
  • Pharma data partnerships + co-development, earlier-stage
  • Smart Platform multiomic insights
  • Building the 'co-develop on our data' motion
  • Mid — monetization layer still forming
~ desk note
  • Pharma data partnerships (earlier-stage than Tempus); Smart Platform
  • Tempus-style co-builds on liquid-biopsy data
  • Interpretation commoditizes; raw assay + outcomes data is the defensible part
~$17B
✓ FMP
~17x +33% Disc 5 · Neutral
  • Scarce data, but ~17x sales and unprofitable
  • Analyst upside exists
  • Expensive growth, not cheap
Moderate
  • Scarce-data optionality, but ~17x sales + unprofitable cap it
  • More a growth bet than an option
  • Balanced, positive tilt
  • Cash burn continues; patent litigation history with Natera
  • Screening (Shield) economics still unproven at scale
Med
Priced as diagnostics growth; data angle secondary
  • MRD reimbursement; pharma deal announcements
IQVIA (IQV) ◆ owner
IR / presentations ↗
Data 9 · High
  • World's largest pharmacy-claims & prescription dataset (ex-IMS Health)
  • Population-scale real-world evidence across global Rx
  • Clinical-trial operational data as the largest CRO (ex-Quintiles)
  • De-identified, compliance-grade — built under HIPAA/GDPR, unscrapeable
  • Sold to virtually every major pharma
Compounding
  • Rx/claims flow is continuous and population-scale
  • Trial operational data compounds with every study run
AI 7 · High
  • IQVIA.ai unified agentic platform (Mar 2026): 150+ agents deployed
  • NVIDIA partnership since Jan 2025 — custom foundation models on its data
  • 19 of top 20 pharma already using IQVIA agents; 100+ AI patents
  • Builds agents ON the data rather than licensing it out
  • No longer latent — monetization architecture is live
✓ deep dive
  • NVIDIA partnership (Jan 2025) → IQVIA.ai platform, Mar 2026 (PR)
  • 150+ agents live; 19 of top-20 pharma using them (report)
  • 100+ AI patents; agents built ON proprietary data, not licensed out
  • Agent subscriptions as a separate revenue line
  • RWE feeds for medical LLMs (compliance-wrapped)
  • Trial-design agents priced on outcomes
  • CRO services half is labor-heavy — AI compresses what pharma will pay for it
  • Pharma in-housing analytics with AI tools
~$31B
✓ FMP
~2.7x +6% Disc 7 · High
  • $16.3B FY25 rev, +5.9% (~7% TTM)
  • Low-single-digit sales multiple for unique data
  • ~$13B net debt is the caveat
High
  • ~2.7x EV/Sales on +6% growth — cheap for the scarcest Rx data
  • Locked in compliance contracts; low AI surface today
  • Cheap + latent-unlock optionality = convex
  • ~$13B net debt limits flexibility
  • CRO bookings cyclical; pharma R&D budgets squeezed (IRA effects)
Low → rising
Cheapest scarce-data name; IQVIA.ai barely registers in the multiple yet
  • Next earnings: ~late July 2026 (Q1 reported May 5 — beat; EPS guide raised)
  • IQVIA.ai adoption: now 192 agents / 64 use cases; watch for monetization disclosure (Q1 call)
  • R&DS backlog $32.7B (+5.3%); Q4 book-to-bill 1.18x — bookings reacceleration is the proof point
  • $1.2B buyback remaining ($552M done in Q1)
  • Duke obesity-trials collaboration (Feb 2026) — fastest-growing trial category
  • De-leveraging from 3.62x net leverage / $13.9B net debt frees the multiple
Natera (NTRA) ◆ owner
IR / presentations ↗
Data 9 · High
  • Genetic-testing / cfDNA data (MRD, prenatal, transplant)
  • Large, fast-growing proprietary genomic dataset
  • Outcome-linked longitudinal data is the durable asset
  • Same scarce-data position as Guardant/Tempus
Compounding fast
  • Fastest test-volume growth in the group; outcome links accrue with time
AI 6 · Neutral
  • Owns the data; data-layer monetization still maturing
  • Strong clinical-validation pipeline feeds the dataset
  • Files as diagnostics, so screens miss it
  • Mid on the curve
~ desk note
  • Data feeds pharma trials; in-product AI
  • Outcome-linked genomic licensing
  • Same as GH — value migrates from interpretation to the longitudinal data
~$32B
✓ FMP
~12x +36% Disc 5 · Neutral
  • Irreplaceable data, ~12x sales on ~36% growth
  • Quality High; multiple says priced, not discounted
  • Volatile equity
Moderate
  • Data optionality vs a rich ~12x multiple
  • Roughly balanced, slight positive tilt
  • Reimbursement concentration (Medicare MRD decisions)
  • Billing-practice scrutiny; GH litigation
Med
Same as GH: growth story, data unpriced
  • MRD adoption; new indications
Tempus AI (TEM) ◆ owner
IR / presentations ↗
Data 9 · High
  • Multimodal clinical + genomic data (~300 PB) pairing sequencing with clinical records
  • Scarcest, most valuable category for biomedical AI — unscrapeable
  • Built explicitly as an AI data company
  • 140% net revenue retention on Insights/data
  • Linked outcomes data is what makes it irreplaceable
Compounding fast
  • ~300PB and growing; every test adds linked clinical+genomic data (Q1 letter)
  • Sequencing volumes growing ~25–30%/yr — the corpus is the byproduct of revenue
AI 8 · High
  • $200M AstraZeneca/Pathos deal (Apr 2025) to build the largest oncology foundation model
  • Total remaining contract value >$1B; non-exclusive — can resell the motion
  • Data customers: AZ, Novartis, Merck KGaA, Takeda, Boehringer, United Therap.
  • Illumina collaboration trains genomic algorithms on its multimodal data
  • Insights (data licensing) growing ~58%
✓ deep dive
  • $200M AstraZeneca/Pathos data+model deal over 3 yrs (PR)
  • Total remaining contract value >$1B (Q1 letter)
  • Data customers: Novartis, Merck KGaA, Takeda, Boehringer, United Therap.
  • Illumina algorithm-training collaboration
  • Non-exclusive foundation-model co-builds with other pharma
  • Expansion beyond oncology (cardio, neuro)
  • Open-source pathology consortium as a funnel
  • Pharma could in-house modeling after learning from co-builds
  • Interpretation layer could commoditize; the data itself is the hedge
~$8.5B
✓ FMP
~6.5x +83% Disc 6 · Neutral
  • ~$8.5B cap, ~6.5x EV/Sales on +83% growth
  • Strikingly cheap for the growth + scarcest biomedical data
  • Priced like a normal growth co, not the data monopoly it's building
High · growth optionality
  • Foundation-model + licensing optionality could make it the oncology-AI data layer
  • Cash burn is the main downside; the ~6.5x multiple is not rich for the growth
  • Large, real optionality = convex growth bet
  • Founder super-voting control; Pathos is Lefkofsky-affiliated (related-party optics on the $200M deal)
  • Convertible debt + only just adj-EBITDA positive
  • Short-seller scrutiny history (data-quality claims)
High
AI is in the name and the multiple, but >$1B remaining contract value is arguably still under-modeled
  • Next earnings: ~early Aug 2026 (Q1 reported May 5 — guidance raised) (Q1 8-K)
  • 2026 guide raised to $1.59–1.60B revenue / ~$65M adj EBITDA — the leverage inflection
  • MRD volume ~6,500 tests in Q1, +500% YoY — reimbursement decisions are the swing
  • TCV >$1.1B; 70+ pharma data customers — watch new (non-exclusive) co-builds
  • Insights (data licensing) +44% in Q1 — the annuity compounding
Veeva SystemsVEEV◆ owner
IR / presentations ↗
Data 7 · High
  • Life-sciences CRM + proprietary OpenData/Link (HCP & reference data)
  • Pharma depends on its reference data
  • A separable corpus inside the SaaS
Growing
  • OpenData/Link refreshed continuously; usage data grows with seats
AI 7 · High
  • AI embedded in pharma workflows
  • Up the curve
  • Vertical-SaaS leader
✓ deep dive
  • AI agents shipping across CRM/Vault (Dec 2025 wave)
  • OpenData/Link reference data feeds its own AI
  • Agent pricing on top of seats
  • Link data into pharma AI pipelines
  • Vertical-SaaS pricing under the same agentic pressure as all seats ('SaaS-pocalypse')
  • AI app-builders lower barriers to bespoke pharma tools
~$27B
✓ FMP
~7.7x +16% Disc 5 · Neutral
  • Premium SaaS multiple
  • Data is a real, under-discussed asset
  • Equity priced for quality
Low
  • Premium SaaS; data underrated but equity priced
  • Limited asymmetry
  • Salesforce→own-platform CRM migration is a multi-year execution risk
  • Core TAM maturing; growth depends on new apps
Med
Read as a quality SaaS with AI features, not a data owner
  • Agent adoption metrics
  • Vault CRM migration completion
Consumer · user-generated · marketplace data
CarvanaCVNA○ operator
IR / presentations ↗
Data 4 · Neutral
  • Transactional used-car e-commerce & trade data
  • Tunes its own pricing/inventory
Growing
  • Transaction/pricing data grows with units; internal
AI 3 · Low
  • Input, not the product
~ desk note
  • Internal pricing AI
  • Low direct AI risk
~$76B
~ EV est
~3.5x +30% Disc 3 · Low
  • Volatile, richly valued
  • Weak fit for the screen
Low
  • High beta but valued on retail growth, not data
  • Garcia family control + related-party history; leverage already pushed the equity to the brink once (2022)
Low
Retail story
  • Unit economics
CoStar GroupCSGP◆ owner
IR / presentations ↗
Data 8 · High
  • Verified CRE comps/property data, 35-yr research army
  • LoopNet, Apartments.com, Homes.com
  • Unscrapeable, walled inside terminals
Compounding
  • Research army keeps verifying; comps accumulate permanently
  • Zonda adds a housing-data stream
AI 3 · Low
  • Walled, litigious; data locked in terminals — minimal AI surface
  • Heavy Homes.com ad spend
  • Strategic data, low AI surface area
✓ deep dive
  • None — deliberately walled; litigious vs scrapers
  • Zonda acquisition ($800M) extends housing data
  • The big withheld option: licensed CRE grounding for real-estate AI
  • Homes.com AI search features
  • AI aggregation/scraping pressure on listings; Google entering for-sale listings (BTIG flag)
  • Verified CRE comps hardest to substitute
~$14B
✓ FMP
~4.0x +19% Disc 6 · Neutral
  • ~$14B cap, ~4.0x EV/Sales on +19% growth
  • Much cheaper than I'd shown; Homes.com spend masks margins
  • Unscrapeable CRE data at a reasonable price
Moderate–High
  • Unscrapeable CRE data, now cheap
  • Low AI surface + heavy ad spend cap near-term
  • Re-rate optionality as Homes.com spend rolls off
  • Homes.com spend is an act of will (founder-CEO); activist pressure has surfaced
  • Serial litigation posture cuts both ways
Low
AI never part of the story; the data optionality is free at ~4x
  • Homes.com spend roll-off (margin catalyst)
  • Zonda integration
  • Any posture change on data access
DuolingoDUOL◆ owner*
IR / presentations ↗
Data 6 · Neutral
  • One of the largest learning-interaction datasets (50M+ DAU)
  • Granular data on how people learn, err & retain across 100+ courses
  • Used in-product to tune pedagogy — not licensed
  • Value captured as engagement, not a sellable corpus
Compounding
  • Learning interactions scale with DAUs (50M+, growing)
  • Every exercise answered is new pedagogy data
AI 6 · Neutral
  • AI-first (Gen-AI 'Max', AI video calls)
  • Shipped 148 courses in a year via generative AI
  • Unlock shows up as engagement/ARPU, not a licensing line
  • Mid — AI deepens the product moat
✓ deep dive
  • No outbound licensing; a heavy consumer of OpenAI/GenAI (Max, AI courses)
  • 148 AI-generated courses shipped in a year
  • Learning-data licensing (never signaled)
  • AI-tutor pricing tiers
  • ChatGPT as a free language tutor — the central substitution threat
  • Defense: gamification + structure, not content
~$5.5B
✓ FMP
~4.0x +39% Disc 6 · Neutral
  • ~4.0x EV/Sales on +39% growth — cheap on growth metrics
  • AI-disruption fear embedded in the multiple
  • A growth franchise at a non-growth multiple
Moderate–High
  • ~4x on +39% growth bounds the downside if growth holds
  • Upside if AI features lift engagement/ARPU
  • Positive convexity on metrics
  • Founder control; monetization-vs-engagement tension
  • Still expensive on earnings even after the crash
High — as threat
Narrative says ChatGPT kills language learning; the AI-first operating model is ignored
  • DAU/booking growth stabilization
  • Max attach rate
  • Energy/engagement metrics
MercadoLibreMELI○ operator
IR / presentations ↗
Data 6 · Neutral
  • LatAm marketplace purchase + fintech/credit data
  • Powers its own ads/lending (input)
Compounding
  • Purchase + credit data compounds with GMV growth
AI 3 · Low
  • AI for marketplace/credit optimization
  • Not a sold corpus
~ desk note
  • Internal AI for ads/credit
  • Low; AI mostly an internal lever
~$83B
~ EV est
~3.5x +35% Disc 3 · Low
  • Premium growth stock
  • Data doesn't re-rate it
Moderate
  • High growth, but valued on the business, not the data
  • LatAm FX/political risk; credit-book quality through cycles
Low
Not a data play
  • LatAm growth; fintech credit
NetflixNFLX○ operator
IR / presentations ↗
Data 7 · High
  • Viewing/interaction data across ~300M members
  • Real moat for recs/greenlighting
  • Strictly internal — never licensed
Growing
  • Viewing data grows with engagement; internal-only
AI 2 · Low
  • Never licensed; AI = better curation only
  • Internal-use data
~ desk note
  • Internal only — never licensed
  • GenAI lowers content-production barriers for rivals (long-term)
~$343B
~ EV est
~8.0x +14% Disc 2 · Low
  • Premium mega-cap on subscriber economics
  • n/a as a data play
Low
  • Priced mega-cap, data internal
  • Content-spend discipline vs growth; live/sports costs
Low
Recs AI assumed, not valued separately
  • Sub growth; ads tier
RedditRDDT◆ owner
IR / presentations ↗
Data 9 · High
  • ~100k+ communities, two decades of upvote-ranked human conversation
  • Largest archive of authentic opinion, troubleshooting, niche expertise
  • Exactly what LLMs lack: recommendations, lived experience, long-tail Q&A
  • Surfaces disproportionately in AI answers
  • Classified as social media, not 'data services'
Compounding
  • DAU still growing; posts/comments compound the archive daily
  • Two decades of vote-ranked history can't be replicated retroactively
AI 9 · High
  • $203M aggregate contract value disclosed at IPO (Google included; the OpenAI deal came later, May 2024)
  • ~$130M/yr run-rate ≈ 10% of revenue; Google ~$60M/yr, OpenAI ~$70M/yr
  • #1 most-cited source across AI models (~3x Wikipedia)
  • Google renewal under negotiation — pushing usage-based pricing
  • Litigates unlicensed scrapers (incl. Perplexity suit)
✓ deep dive
  • $203M aggregate disclosed at IPO (TechCrunch)
  • Google ~$60M/yr; OpenAI ~$70M/yr ≈ 10% of revenue (SEL)
  • 2–3 yr terms struck Jan 2024 — now in renewal window
  • Google renewal at usage-based rates (mgmt: 'open for business')
  • Anthropic / Meta / xAI remain unlicensed
  • Dynamic per-citation pricing models
  • Int'l + vertical (commerce intent) licensing
  • Google AI Overviews already cut logged-out traffic (the 2025 user-growth scare)
  • AI-generated content pollution threatens corpus authenticity
  • Meta forums app targets the community moat
~$34B
✓ FMP
~13x +69% Disc 5 · Neutral
  • Best corpus + fastest unlock, but ~13x sales
  • ~69% growth supports it; priced FOR growth
  • Quality off the charts; valuation not a gap
Moderate
  • Big growth/licensing optionality = upside call
  • But ~13x sales means real drawdown if growth slows
  • Net mildly positive from the licensing option
  • Community/moderator revolt risk is structural (2023 API blackout precedent)
  • Altman's stake = governance optics
  • Ad business still ~90% of revenue and competitive
High
The AI-data story IS the stock; renewal terms are the swing
  • Google contract renewal & structure (report)
  • Scraper litigation incl. Perplexity suit
  • Meta forums app traction (the bear case)
  • Data-licensing line in quarterly prints
TripAdvisorTRIP◆ owner
IR / presentations ↗
Data 4 · Neutral
  • ~1B travel reviews; Viator experiences marketplace
  • Widely scraped & substitutable
  • Reviews feed AI trip-planning agents
Slowing risk
  • ~1B cumulative, but contributions follow visits — and AI answers divert visits
  • The corpus ages if the flywheel slows
AI 4 · Neutral
  • Perplexity partnership (Jan 2025) now a measurable booking channel
  • ChatGPT app launch partner (Oct 2025) for trip planning
  • Distribution-into-AI strategy, not paid corpus licensing
  • Viator + TheFork now >50% of revenue — the real value
✓ deep dive
  • Perplexity partnership, Jan 2025 — hotels customer-acquisition channel (PR)
  • ChatGPT app launch partner, Oct 2025 (report)
  • Paid licensing of the review corpus (currently given for distribution)
  • Viator inventory as the bookable layer inside AI agents
  • AI trip planners bypass the site entirely — the core meta business is the casualty
  • Viator/TheFork partially insulated (fulfillment, not discovery)
~$1.4B
~ EV est
~0.7x +3% Disc 5 · Neutral
  • ~$1.4B cap, ~0.7x EV/Sales on +3% growth
  • Very cheap, but reviews are being disintermediated
  • Value is Viator/TheFork, not the review corpus
Low–Moderate
  • Cheap but melting
  • Weak optionality
  • Viator faces GetYourGuide/Klook competition; legacy meta declines
  • Post-Liberty structure leaves strategic questions
Med
AI read as existential threat; partnerships seen as defensive, not monetizing
  • AI-channel booking disclosures
  • Membership program launch
  • Viator/TheFork growth (the real value)
YelpYELP◆ owner
IR / presentations ↗
Data 6 · Neutral
  • ~330M geocoded local-business reviews
  • Structured local sentiment for 'best X near me'
  • Classified as internet content, not data services
Steady flow — not melting
  • 22M new reviews in 2025 (vs 21M in '24); corpus 330M, +7% YoY (FY25 PR)
  • What's melting is consumption (app engagement), not contribution — yet
  • Risk: contribution follows traffic with a lag
AI 6 · Neutral
  • Signed OpenAI agreement (disclosed Feb 2026)
  • Perplexity has used Yelp local data since Mar 2024
  • 'Other revenue' +17% on data licensing & transactions
  • Expanding Yelp Assistant; Hatch acquisition (AI front-desk)
  • Core local-ad business still the eroding center
✓ deep dive
  • OpenAI agreement signed (Feb 2026, terms undisclosed) (FY25 PR)
  • Perplexity has integrated Yelp local data since Mar 2024
  • Data licensing inside 'Other revenue' (+17%)
  • More assistant integrations (Gemini, Claude, Alexa-class)
  • Usage-priced local-data API
  • Transactional referrals from AI answers
  • AI assistants answer 'best X near me' without a Yelp visit — ad impressions leak
  • Google's AI search squeezes the top of funnel
~$1.3B
✓ FMP
~0.9x +3% Disc 6 · Neutral
  • ~$1.3B cap, ~0.9x EV/Sales on +3% growth
  • <1x sales — cheap, but the ad core is eroding
  • AI-distribution optionality vs value-trap risk
Moderate · binary
  • Cheap with AI-distribution optionality
  • vs an eroding core
  • Binary-ish
  • Own antitrust fight with Google (plaintiff) — outcome cuts both ways
  • SMB advertiser churn; restaurant/retail ads already shrinking
Low-Med
OpenAI deal is new and barely in the price; story still read as 'Google victim'
  • OpenAI deal revenue contribution
  • 'Other revenue' growth each quarter
  • Services-ads resilience vs AI search
ZillowZ/ZG◆ owner*
IR / presentations ↗
Data 5 · Neutral
  • Zestimate + listing data + largest US housing audience
  • Much listing data is MLS-shared, not fully proprietary
  • Consumer housing intent data
Churning flow
  • Listings turn over rather than accumulate; Zestimate history compounds quietly
AI 5 · Neutral
  • Strong in-app AI; partial data moat
  • Real-estate AI agents could use it
  • Mid on the curve
~ desk note
  • In-app AI (natural-language search); MLS data shared
  • Housing-intent data for real-estate agents
  • AI agents could search listings directly; Zillow's audience moat = the defense
  • Low risk to Zestimate itself
~$8.6B
✓ FMP
~3.1x +16% Disc 6 · Neutral
  • ~$8.6B cap, ~3.1x EV/Sales on +16% growth
  • Cheaper than I'd shown; partial (MLS-shared) moat
  • Housing-cycle leverage on top
Moderate
  • Housing-cycle optionality
  • Partial moat caps the data upside
  • Balanced
  • NAR commission-settlement reshapes agent economics — its customers' wallets
  • Housing-cycle beta; Showcase/mortgage execution
Med
AI features noted, not a data thesis
  • Housing cycle; Showcase attach
Peer-reviewed journal publishing data
RELXRELX◆ owner
IR / presentations ↗
Data 10 · High
  • Elsevier science (The Lancet, Cell, Scopus) — peer-reviewed at scale
  • LexisNexis legal + LexisNexis Risk Solutions (identity/fraud)
  • Three of the most defensible corpora on earth in one company
  • Scientific literature is critical for frontier capability
Growing
  • Global science output grows mid-single-digit %/yr; submissions rising
  • Caveat: AI-generated paper flood is a quality-control burden
AI 8 · High
  • Lexis+ AI, Scopus AI, ClinicalKey AI, Protégé all live
  • Embeds data in grounded retrieval vs raw training access
  • Among the best-positioned grounded-AI owners
  • High — productization mature & shipping
✓ deep dive
  • No raw licensing; grounded products only
  • Lexis+ AI, Scopus AI, ClinicalKey AI, Protégé all shipped
  • Elsevier corpus licensing remains a withheld option (big if ever)
  • Agent-access tiers to Scopus/Lexis
  • Risk-data feeds into KYC agents
  • Lexis faces the same legal-AI insurgency as Westlaw
  • Elsevier: AI summarization + open access erode subscription rationale
  • Risk division most insulated
~$92B*
✓ FMP
~9.0x +7% Disc 4 · Neutral
  • ~9x EV/Sales on +7% growth — a small AI-threat discount embedded
  • Grounded AI products shipping across all three corpora
  • Durable compounder; thesis is durability, not deep value
Low
  • Fully-valued premium compounder
  • Durable, but limited asymmetry either way
  • Open-access mandates (Plan S) pressure Elsevier's model
  • Exhibitions segment is cyclical
High — as threat
Same legal-AI threat narrative as TRI; grounded-product execution under-credited
  • Lexis+ AI penetration disclosures
  • Any Elsevier AI-licensing posture change
  • FY guide post-crash
WileyWLY◆ owner
IR / presentations ↗
Data 7 · High
  • Peer-reviewed STM journals/books; Cochrane co-publishing
  • Vetted scientific text — what labs pay for to lift capability
  • A 'smaller Elsevier' — quality corpus, narrower than RELX
  • Editorial vetting + citation links add provenance
  • Proprietary, not freely on the open web
Growing
  • Submissions +25%, output +13% — the journal flow is accelerating (Q1 PR)
  • Caveat: some of that surge is AI-assisted writing — vetting is the product
AI 7 · High
  • $92M lifetime AI-licensing revenue; $29M in Q1 FY26 alone
  • Anthropic strategic partnership (Sep 2025) + projects with 3 top tech cos
  • Recurring inference pilots: pharma, chemical, space-exploration cos
  • One of the few names with disclosed, recurring AI revenue: proven monetization most peers can't show
✓ deep dive
  • $92M lifetime AI revenue; $29M in Q1 FY26 (PR)
  • Anthropic strategic partnership (Sep 2025)
  • Projects with 3 of the largest tech cos (unnamed)
  • Recurring inference pilots: pharma, chemical, space
  • Convert pilots → recurring corporate R&D subscriptions
  • License on behalf of partner publishers (agency model)
  • Agent-citation / RAG licensing beyond training
  • AI summarization reduces per-article reading; open access erodes paywalls
  • AI-written paper flood strains (and ironically validates) peer review
~$2.3B
✓ FMP
~1.9x ~flat Disc 7 · High
  • ~1.9x EV/Sales with disclosed, recurring AI-licensing revenue ($92M lifetime)
  • Flat underlying top line is the offset
  • Cheap on metrics for the rare proven AI licensor
Moderate
  • Low multiple + proven licensing = bounded downside with optional upside
  • Flat core growth caps the slope
  • Asymmetry modest but positive
  • Library budget pressure + consolidation of academic spend
  • Post-divestiture portfolio still re-finding growth
High
AI-licensing story is prominent in coverage; expectations now elevated
  • Next earnings: Tue June 16, 2026, pre-market — FY26 Q4 + FY27 guide (notice)
  • Recurring revenue is <10% of AI revenue today; mgmt expects that proportion to triple next year (Q3 call)
  • OpenEvidence partnership: 5-yr multimillion licensing + Wiley equity stake
  • Nexus licensing service at 36 publishing partners — the agency model scaling
  • Emerald Publishing acquisition (Jun 2, 2026) adds proprietary research corpus
  • Q3 raised margin/EPS guidance to high end; ~4.5% dividend while you wait
Wolters KluwerWTKWY◆ owner
IR / presentations ↗
Data 9 · High
  • Legal, tax, health & regulatory information + workflow (CCH, UpToDate)
  • UpToDate is a premier point-of-care clinical reference
  • Authoritative corpora like RELX/Thomson Reuters
  • Subscription, deeply embedded in workflows
Steady flow
  • Regulatory/tax/clinical updates are a built-in perpetual flow
AI 7 · High
  • AI workflow tools shipping across segments
  • Same grounded-AI position as RELX/TRI
  • Up the curve, productizing its corpus
~ desk note
  • AI embedded in UpToDate/CCH; no corpus licensing
  • Clinical-grounding deals for medical AI (UpToDate is the prize)
  • UpToDate's clinical-reference franchise faces AI-native rivals (e.g. OpenEvidence)
  • Tax/legal workflow seats exposed like TRI/RELX
~$38B*
✓ FMP
~6.0x +6% Disc 3 · Low
  • Premium compounder
  • AI quality understood & paid for
  • Durability, not discount
Low
  • Durable but fully valued
  • Limited asymmetry
  • CEO transition (long-tenured McKinstry era ended)
  • Health segment competition intensifying
Med
Quality understood; AI optionality not separately priced
  • UpToDate AI products; FY guide
Research analytics · IP · content data
ClarivateCLVT◆ owner
IR / presentations ↗
Data 7 · High
  • Web of Science — citation graph linking ~2B scientific citations
  • Derwent (patents) + Cortellis (drug-pipeline intelligence)
  • ProQuest academic content: dissertations, archives, ebooks
  • Valuable for research/IP agents — 'a poor man's Elsevier'
  • Data quality seen as better than the company's execution
Steady flow
  • Citations/patents grow with global publishing — steady, not accelerating
AI 5 · Neutral
  • Signed access deals (Anthropic) + MCP exposure
  • AI research assistants in pipeline, slow to ship
  • Citation + patent networks useful for IP/research AI
  • ~$4.5B net debt constrains reinvestment
  • Behind on the curve — data ready before the company
✓ deep dive
  • Anthropic access agreement + MCP exposure for Web of Science
  • No disclosed $; debt limits investment
  • Patent/citation grounding for research agents
  • ProQuest licensing to labs
  • AI literature tools (Elicit, Semantic Scholar) bypass Web of Science discovery
  • Patent search AI-commoditized
~$1.5B
✓ FMP
~2.3x −4% Disc 7 · High
  • ~2.3x EV/Sales (EV ~$5.7B, mostly debt) on $2.46B rev
  • Equity (~$1.5B) is a small levered stub
  • Cheap on sales, but the debt is the risk
High · distressed option
  • Small equity stub over ~$4.5B debt ≈ a call option on the enterprise
  • Bounded loss, multi-bagger upside if it de-levers/monetizes
  • Convex but a high-probability left tail — size accordingly
  • ~$4.5B debt wall dominates everything
  • PE overhang; serial restructurings and writedowns
Low
Debt story drowns the data story entirely
  • De-leveraging milestones
  • Any AI-licensing disclosure
  • Segment divestitures
Getty ImagesGETY◆ owner
IR / presentations ↗
Data 8 · High
  • ~500M licensed, rights-cleared, caption-annotated images & video
  • Exclusive editorial archives spanning a century
  • iStock + Unsplash extend the catalog across tiers
  • Rights-cleared image–text pairs = ideal multimodal training data
  • Legal indemnification is the product AI builders need
Strong flow + archive
  • 160k+ events covered/yr; ~600k creators; thousands of assets ingested daily (Q2 PR)
  • Editorial is a daily flow machine, not just a vault — FY25 grew both segments
  • Risk is creative-side inflow: genAI erodes contributor economics
AI 6 · Neutral
  • Perplexity multi-yr display deal (Oct 2025)
  • Generative tools with NVIDIA; licensed-data posture vs scrapers
  • Shutterstock merger (UK-cleared May 2026) adds its lab licensing deals
  • Litigation (Stability AI) continues to define the rights frontier
  • Licensing not yet replacing what AI takes from stock demand
✓ deep dive
  • Perplexity multi-yr display deal, Oct 2025 — undisclosed $ (PR)
  • NVIDIA-powered licensed generative tools (Getty/iStock)
  • Shutterstock brings lab deals (OpenAI, Meta, Apple, Amazon) post-merger
  • Post-merger: consolidated licensed-visual-data vendor to every lab
  • Display/attribution deals with other AI search products
  • Indemnified training data as a product line
  • GenAI image substitution is already in the creative numbers
  • Editorial (real events) is the un-generatable refuge
~$0.3B
✓ FMP
~1.5x +4% Disc 7 · High
  • ~$0.3B cap — a deep-distress equity stub over ~$1.3B+ debt
  • ~1.5x EV/Sales on ~$0.9B revenue
  • Cheap + levered = a lottery ticket on the data
High · lottery
  • Distressed, levered equity on ideal data — near-binary
  • Multiplies on a licensing/M&A catalyst, or drifts to zero
  • Steeply convex, lowest-conviction high-convexity name
  • ~$1.3B+ debt; controlled company (Getty family + Koch)
  • Shutterstock integration risk; CMA found UK editorial concerns (remedies)
Med-High
Every AI headline attaches to it; the balance sheet, not awareness, is the constraint
  • Shutterstock merger close (UK-cleared May 2026)
  • Combined AI-licensing revenue line
  • Stability AI litigation outcomes
  • Debt refinancing
PearsonPSO◆ owner
IR / presentations ↗
Data 6 · Neutral
  • Education content, assessment & learning-outcome data
  • Proprietary curriculum + testing content
  • Education is an AI-disruption epicenter
Steady flow
  • Assessment/courseware data flows with enrollment
AI 5 · Neutral
  • AI partnerships to license/embed content
  • Two-sided disruption: tutoring threat + licensing optionality
  • Mid on the curve
~ desk note
  • AI partnerships announced 2025 with Microsoft, Google Cloud & AWS for learning products
  • Curriculum licensing into AI tutors; assessment data moats
  • AI tutors substitute courseware — the existential half of the two-sided story
  • Assessment/credentialing more defensible
~$9.3B
~ EV est
~2.0x +3% Disc 5 · Neutral
  • Disruption discount
  • Optionality + threat both real
  • Owner with genuine two-sidedness
Moderate
  • Content-licensing/AI-tutoring optionality
  • vs a real disruption threat
  • Two-sided convexity
  • Enrollment cliffs + OPM decline in higher ed
  • Multi-year strategic rebuild under newer CEO
Med
Two-sided: tutoring threat vs licensing option
  • Enrollment; AI products; partnership revenue
Geospatial · sensor data
BlackSkyBKSY◆ owner
IR / presentations ↗
Data 8 · High
  • High-frequency satellite imagery + Spectra AI geospatial intelligence
  • Rapid-revisit imagery; analytics span own and third-party sensors
  • Growing multi-year defense backlog
  • Same theme as Planet, earlier in scaling
Compounding
  • Constellation growth (Gen-3) raises capture rate; archive accrues
AI 7 · High
  • $100M+, 7-yr international defense contract (Jan 2025)
  • $30M+ multi-year Gen-3 tactical ISR deal (Q3 2025)
  • Backlog $323M, 91% international
  • Spectra AI analytics layer over own + third-party sensors
  • Same curve as Planet, earlier and cheaper-cap stage
✓ deep dive
  • $100M+, 7-yr int'l defense contract, Jan 2025 (PR)
  • $30M+ Gen-3 tactical ISR deal (Q3 2025); backlog $323M, 91% int'l
  • Gen-3 constellation upsells
  • US budget normalization
  • Spectra analytics licensing to allied gov'ts
  • Same — AI raises the value of the sensor flow
~$1.2B
✓ FMP
~12x +4% Disc 4 · Neutral
  • ~$1.2B cap, EV ~$1.25B on ~$107M revenue
  • ~12x EV/Sales on ~flat revenue — richly valued, not cheap
  • Defense backlog is the story; the price is not a discount
Moderate
  • Unique imagery + defense backlog = real optionality
  • But ~12x sales on flat revenue means you pay up for it
  • Not the bounded-downside cheap option it first looked like
  • Dilution history; international customer concentration
  • Gen-3 execution timeline risk
Med-High
Defense-AI story increasingly recognized; ~12x sales already pays for it
  • Gen-3 launch & tasking milestones
  • US budget resolution
  • New int'l capacity commitments
LeidosLDOS△ borderline
IR / presentations ↗
Data 3 · Low
  • Works on gov geospatial/intel data it doesn't own
  • Palantir-type: analytics layer on others' data
n/a
  • Doesn't own the data it works on
AI 4 · Neutral
  • AI analysis agents on others' data
  • Services, not a data owner
~ desk note
  • AI services on government data it doesn't own
  • AI compresses services labor pricing — the classic services squeeze
~$16B
~ EV est
~1.3x +6% Disc 4 · Neutral
  • Cheap services multiple
  • Not a data-owner screen fit
Low
  • Services multiple, no data optionality
  • Recompete cycles; budget continuing-resolution exposure
Low
Services multiple, services story
  • Award cycles
Planet LabsPL◆ owner
IR / presentations ↗
Data 9 · High
  • Images the entire landmass daily (~3.5 m resolution), plus high-res SkySat/Pelican (~50 cm)
  • A unique multi-year temporal archive no competitor has
  • Change-over-time is the moat — can't retroactively collect history
  • Increasingly delivered as AI-ready analytics
  • Defense & intelligence is the fastest-growing buyer
Compounding by design
  • Whole-Earth scan daily — the archive grows every 24h by construction
  • New satellites add resolution/cadence; history can't be re-collected
AI 7 · High
  • Anthropic partnership (Mar 2025): Claude applied to satellite imagery
  • First prime win on NGA Luno ($12.8M, maritime AI analytics)
  • MDA SHIELD IDIQ prime — eligible for Golden Dome task orders
  • Backlog ~$900M (+79% YoY); Q4 revenue +41%
  • AI analytics is the product; defense is the buyer
✓ deep dive
  • Anthropic partnership (Mar 2025): Claude on satellite imagery (report)
  • NGA Luno prime win $12.8M (SpaceNews)
  • MDA SHIELD IDIQ prime (Golden Dome-eligible); backlog ~$900M
  • Golden Dome task orders
  • AI-analytics subscriptions over the archive (insurance, ag)
  • More foundation-model partnerships on temporal imagery
  • Low: AI is the accelerant, not the threat; the real risks are capex and competition
~$10.4B
✓ FMP
~28x +26% Disc 4 · Neutral
  • ~28x EV/Sales, pre-profit — the data and backlog are the appeal, not the multiple
  • Backlog ~$900M anchors forward revenue
  • Rich on every metric
High · optionality
  • Unique archive + ramping $906M defense backlog = large 'if it scales' upside
  • Pre-profit/capital intensity is the downside
  • Strong positive convexity
  • SPAC-era dilution legacy; Pelican capex cycle
  • Government contract concentration & timing lumps
High
AI + defense premium fully in the ~28x; expectations are the risk
  • Next earnings: ~early Sept 2026 (FQ1'27 reported Jun 4 — record print) (Q1 8-K)
  • FY27 guide raised to $425–441M (+41% mid); Q2 guide $102–107M with adj-EBITDA breakeven-to-positive
  • Backlog $906M (+72%), RPO $816M (+81%); ~40% of backlog converts within 12 months
  • Pelican cadence: 3 launched in Q1, including Sweden's first sovereign recon satellite
  • $731M cash funds the capex cycle; NGA $22M extension; Golden Dome task orders the option
Spire / SatellogicSPIR/SATL◆ owner
IR / presentations ↗
Data 6 · Neutral
  • Weather/maritime/RF data (Spire); hyperspectral imagery (Satellogic)
  • Niche proprietary sensor data
  • Early and capital-intensive
Compounding
  • Continuous sensor flow (weather/RF/hyperspectral); small base
AI 5 · Neutral
  • Real but early sensor datasets
  • On the curve but small
~ desk note
  • Niche gov/defense sensor contracts
  • Weather/RF data into forecasting AI
  • Low AI risk; survival risk is capital, not AI
~$0.5B
~ EV est
~3.0x +20% Disc 5 · Neutral
  • Speculative micro-caps
  • Watchlist-only owners
  • High risk, thin coverage
High · lottery
  • Micro-cap sensor data — binary
  • Large upside if a dataset scales, fat left tail
  • High-variance convexity
  • Cash runway and listing-compliance history — survival-grade risks
Low
Below the radar entirely
  • Contract wins; cash runway
Sports data
Genius SportsGENI◆ owner
IR / presentations ↗
Data 8 · High
  • Exclusive official league-data rights (NFL, NCAA, EPL)
  • Now the NCAA's official data provider
  • The other half of the official-sports-data duopoly with Sportradar
  • Growing media/ad data layer (post-Legend acquisition)
  • Multi-year rights = a hard moat
Growing
  • More leagues, deeper tracking (player-level optical) each season
AI 7 · High
  • AI for fan engagement and betting integrity products
  • Media/ad data layer monetizes the rights twice
  • Growing ~31%; up the curve like Sportradar
  • Owner actively monetizing, not just holding
✓ deep dive
  • No corpus licensing; exclusive NFL/NCAA/EPL rights in-product
  • BetVision + media/ad data layer (Legend acq.)
  • Second monetization of rights via media/ads
  • AI integrity & fan-engagement products
  • Same as Sportradar: rights moat holds, services layer competitive
~$1.7B
✓ FMP
~3.6x +31% Disc 6 · Neutral
  • ~$1.7B cap, ~3.6x EV/Sales on +31% growth
  • Cheap for the growth + the official-data rights duopoly
  • Media/ad layer monetizes the rights twice
High
  • Rights moat + media-data optionality = asymmetric upside
  • Growth-priced, so not deeply cheap
  • Convex if the media layer scales
  • NFL warrant dilution; rights renewals can reset economics
  • Only recently profitable
Low
Same as Sportradar — the duopoly's AI angle is unpriced
  • Next earnings: ~early Aug 2026 (Q1 reported May 8)
  • Legend closed May 1 → FY26 guide ~$990M–$1.01B rev / $270–280M EBITDA (~28% margin) (Q1 call)
  • NFL rights locked through Super Bowl 2030; GeniusIQ to automate the full rights portfolio by end-2027
  • Prediction markets: market makers onboarded in Q1 on low-latency feeds
  • Targets: positive GAAP net income 2027; ≥60% uFCF conversion by 2028; ~$100M H2'26 cash flow
Sportradar (SRAD) ◆ owner
Data 8 · High
  • Official, licensed sports-data rights — 900k+ events, 80+ sports
  • Real-time play-by-play feeds, pre-match & live odds, streaming
  • Multi-year exclusive league contracts = hard-to-replicate moat
  • Half of a duopoly with Genius for official betting data
  • The data backbone of the global betting industry
Growing
  • Event coverage (900k+/yr) and in-play depth keep expanding
AI 7 · High
  • AI for in-play personalization, risk/trading, content generation
  • Higher-margin products (MTS, 4Sight) lift take-rates
  • A genuine owner monetizing its corpus
  • Up the curve; AI deepens products vs a new licensing line
  • Recent Kalshi deal extends into prediction markets
✓ deep dive
  • No corpus licensing — official-data rights monetized in-product
  • Kalshi deal extends feeds into prediction markets
  • AI in-play products lift take-rates (4Sight, MTS)
  • Prediction-market data feeds scale
  • Betting operators in-housing AI models could squeeze value-add services
  • Official rights protect the raw feed itself
~$4.9B
✓ FMP
~3.0x +12% Disc 6 · Neutral
  • ~3.0x EV/Sales on ~12% growth for a rights-duopoly owner
  • Reasonable on metrics for the moat
  • Fair-to-slightly-cheap
Moderate
  • ~3x sales on duopoly rights gives a floor
  • Upside from take-rate growth on new products
  • Balanced, slight positive tilt
  • Rights-cost inflation: leagues extract more each renewal
  • Founder (Koerl) control; bookmaker customer concentration
Low
Priced as a betting vendor; data-rights duopoly rarely framed as AI
  • Next earnings: ~early Aug 2026 (Q1 reported early May)
  • NOW: FIFA World Cup (Jun–Jul 2026) — major in-play/MTS volume event (Q1 call)
  • FY26 reaffirmed: 23–25% cc revenue growth / 34–37% EBITDA growth
  • Prediction markets 'imminent, potentially material' — H2 ramp
  • IMG Arena synergies above 25% target; >700k streamed matches in 2026; H2 restructuring for leverage
  • Short-seller reports — CEO pushed back on call; monitor, don't ignore
Ad · measurement · web data
DoubleVerify / Comscore (DV/SCOR) ◆ owner
Data 6 · Neutral
  • Ad-verification/fraud data (DV, healthier franchise)
  • Cross-platform audience measurement (Comscore, distressed)
  • Proprietary measurement data
Flow with ad spend
  • Verification events track media volumes
AI 6 · Neutral
  • Measurement owners; AI + walled gardens pressure the moat
  • DV is the credible franchise; SCOR a broken business
~ desk note
  • AI-content verification products (DV)
  • Verification layer for AI-generated ad content
  • AI-generated content and made-for-advertising (MFA) sites flood verification (volume up, value contested)
  • Walled gardens self-verify
~$1.6B
✓ FMP
~2.0x +14% Disc 7 · High
  • ~$1.6B cap, ~2.0x EV/Sales on +14% growth
  • Cheap for an ad-verification data owner (DV)
  • DV the franchise; SCOR the distressed lottery leg
High
  • ~2x sales for a profitable measurement owner
  • AI + walled gardens pressure the moat
  • Cheap enough to be convex
  • Ad-budget cyclicality; IAS rivalry compresses pricing; SCOR is balance-sheet-fragile
Low-Med
De-rated with adtech; AI angle minor
  • DV growth; SCOR restructuring
Similarweb (SMWB) ◆ owner
Data 5 · Neutral
  • Panel/clickstream traffic, keyword, conversion estimates for nearly every site
  • The dataset everyone uses to track digital behavior — incl. AI-search traffic
  • Broad coverage, but modeled/estimated, not a first-party record
  • Continuously updated digital-intelligence feeds
Continuous panel
  • Clickstream flow is constant but panel-based — quality needs constant defense
  • Privacy/cookie shifts are structural headwinds to collection
AI 8 · High
  • Sells data feeds/APIs + MCP integrations into AI workflows
  • Uniquely positioned to measure (and feed) the AI-search era
  • Ahead for its size — high AI exposure per dollar
  • Catch: modeled data less defensible than owned
✓ deep dive
  • Sells AI/clickstream datasets + MCP integrations into AI workflows
  • The standard source for tracking ChatGPT/Gemini traffic share
  • AI-data ARR as a disclosed line
  • Agent-platform data feeds
  • Strategic acquirer interest (data fits many buyers)
  • AI search shrinks open-web traffic — shrinking the thing it measures
  • Collection (panels/extensions) gets harder as browsing shifts to agents
~$0.36B
✓ FMP
~0.7x +15% Disc 7 · High
  • ~$0.36B cap, EV ~$0.21B on ~$283M revenue
  • ~0.7x EV/Sales — strikingly cheap, even for modeled data
  • Deep-value + AI-licensing optionality; small & illiquid
High · deep-value
  • <1x EV/Sales with AI-licensing pull = asymmetric
  • Small, illiquid, modeled (non-owned) data = the risk
  • Cheap enough that convexity tilts positive
  • Micro-cap liquidity; heavy SBC; privacy rules threaten collection methods
Med
Its datasets are quoted everywhere; the equity is ignored at ~0.7x EV/S
  • Next earnings: ~mid-Aug 2026 (Q1 reported May 13)
  • Second large LLM training contract expected 'over the coming quarters' (Q1 6-K)
  • AI revenue trajectory: 11% of Q4 revenue, ~3x YoY — does it keep compounding?
  • RPO $297.7M (+18%); multi-year ARR at 64% — contract-quality migration
  • FY26 guide $307–315M; low end already raised once
The Trade Desk (TTD) ○ operator
Data 6 · Neutral
  • Ad-bidding/bidstream data + UID2 identity framework
  • Powers its own bidding (demand-side platform)
  • Vast behavioral data, but an input
High flow
  • Bidstream data scales with ad volume; ephemeral by nature
AI 6 · Neutral
  • Stewards the UID2 identity standard
  • Identity-data optionality, not a corpus sale
  • De-rated; case on ad-platform fundamentals
~ desk note
  • Kokai AI in-platform; UID2 stewardship
  • UID2 as identity layer for agentic commerce
  • AI walled-garden answers shrink open-web inventory — the de-rate driver
  • Agentic ad-buying could compress DSP take rates
~$9.4B
~ EV est
~3.0x +18% Disc 6 · Neutral
  • ~3.0x EV/Sales on +18% growth — value territory for profitable adtech
  • UID2 identity optionality on top
  • Open-web AI fears embedded in the multiple
Moderate–High
  • Modest multiple + identity-standard optionality
  • Data is an input, not a sold corpus
  • Positive tilt on metrics
  • Founder super-voting; Amazon DSP is the real competitive event
  • SBC and the credibility hit from the '25 stumble
Med — as threat
AI read as open-web risk; de-rate reflects it
  • CTV share; UID2 adoption; growth re-accel
ZoomInfo (GTM) ◆ owner
Data 6 · Neutral
  • B2B contact + company intelligence: emails, dials, org charts, technographics
  • Buying-intent signals across millions of companies
  • A live 'who's-who' graph of decision-makers
  • Real, but increasingly replicable as AI shifts buyer behavior
  • Renamed platform around 'GTM AI'
Decay treadmill
  • B2B contact data decays ~25–30%/yr — must be rebuilt constantly
  • Customer churn weakens the contributory refresh loop
  • The clearest decaying-asset risk in the table
AI 7 · High
  • GTM Context Graph native in OpenAI's Codex for Work — agent context layer
  • AI is both distribution and disruptor
  • Cut 2026 guidance + ~20% of staff on AI-driven shifts
  • Ahead on plumbing, behind on the seat-based model
  • Clearest live case of 'data doesn't protect the equity'
✓ deep dive
  • GTM Context Graph natively in OpenAI's Codex for Work
  • No disclosed licensing $; positioning as agent context layer
  • Per-call context pricing for sales agents
  • More agent-platform embeds (Claude, Gemini)
  • Data-only tier decoupled from seats
  • Customers replace SDR seats with AI — seat-based model directly hit (guidance cut said so)
  • Agents can increasingly infer contact data without a vendor
~$0.8B
✓ FMP
~1.7x ~−3% Disc 6 · Neutral
  • ~1.7x EV/Sales, among the lowest multiples on the board
  • But revenue is declining; the cheapness reflects decay risk
  • Statistically cheap; operationally a falling knife
High · binary
  • ~1.7x sales embeds heavy pessimism — small asymmetric base
  • Re-rates hard if revenue stabilizes as the agent-context layer
  • Declining revenue is the live left tail
  • Debt on a shrinking base; SBC dilution; churn is the whole story
High — as threat
The market's AI-victim poster child; the Codex embed is ignored
  • Next earnings: ~early-mid Aug 2026 (Q1 reported May 11)
  • The trough test: FY26 guide cut to $1.185–1.205B (−4% mid); Q2 $300–303M — does it hold? (Q1 call)
  • Agent embeds: Salesforce prospecting agent ships with ZoomInfo as first/primary external data source (150k+ customers); HubSpot native; ChatGPT/Claude/Copilot/Perplexity connectors live
  • Pricing pivot: Copilot moving from seats to prepackaged credits/consumption
  • Mgmt points to growth returning H2 2027; 35% AOI margin + cost cuts fund the wait
Auto data
ACV / OPENLANE (ACVA/KAR) ○ operator
Data 6 · Neutral
  • Wholesale used-car condition & transaction data (ACV inspection corpus)
  • Granular vehicle-condition/pricing data
  • Still primarily marketplaces
Growing
  • Inspection corpus grows with every vehicle listed (ACV)
AI 5 · Neutral
  • Feeds AI pricing
  • ACV more data-distinctive
  • Operators, not data-unlock plays
~ desk note
  • ACV inspection-AI in-product
  • Condition-data licensing to pricing AIs
  • Low-moderate; inspection AI is ACV's own product
~$1.0B
~ EV est
~5.0x +25% Disc 5 · Neutral
  • ACV the more data-distinctive
  • Both operators
  • Corpus enhances the platform
Moderate
  • ACV growth + condition-data optionality
  • Valued on the marketplace
  • Mildly positive
  • ACV not yet sustainably profitable; OPENLANE balance sheet
Low
Marketplace story
  • GMV growth; take rates
CarGurus / Cars.com (CARG/CARS) ○ operator
Data 5 · Neutral
  • Auto listing, pricing & shopper-intent data
  • Largely audience/marketplace
  • Listings not fully proprietary
Churning flow
  • Listings churn; intent data flows with traffic
AI 4 · Neutral
  • Useful intent data, Zillow-like
  • Not a data-unlock play
~ desk note
  • In-product pricing AI
  • AI shopping agents could bypass listing sites
~$2.7B
~ EV est
~3.0x +5% Disc 5 · Neutral
  • Reasonable valuations
  • Operators in the Zillow mold
Low
  • Operator, limited data asymmetry
  • Balanced-to-low
  • Dealer-count churn; marketing-spend treadmill
Low
Marketplace story
  • Dealer counts
Copart (CPRT) ○ operator
Data 7 · High
  • Salvage-auto auction & vehicle-history data (IntelliSeller)
  • Decades of auction-outcome data
  • Serves its dominant auction marketplace
Growing
  • Salvage auction outcomes accumulate with volume
AI 5 · Neutral
  • AI tools in-product, not licensed
  • Data deepens the moat, isn't the product
~ desk note
  • Internal auction AI (IntelliSeller)
  • Low; AI assists damage assessment
~$29B
~ EV est
~10x +10% Disc 3 · Low
  • Premium, high-quality operator
  • Data deepens the moat, isn't the product
Low
  • Premium, data not the re-rate driver
  • Leadership transition from the founder era; the total-loss (salvage-volume) cycle depends on used-car values
Low
Operator story
  • Volume cycles
Retail · e-commerce data
Instacart (CART) ○ operator
Data 6 · Neutral
  • Grocery-purchase + fast-growing retail-media ad data
  • Rich first-party purchase data
  • Powers its own high-margin ads (input)
Compounding
  • Purchase graph deepens with order history
AI 5 · Neutral
  • Strong data-driven ad engine
  • AI-relevant, but feeds its ads, not sold
  • Operator class
~ desk note
  • Retail-media AI in-product
  • Purchase-data into commerce agents (never signaled)
  • AI shopping agents could disintermediate the storefront layer
~$9.9B
~ EV est
~3.5x +10% Disc 5 · Neutral
  • Reasonable on ads + delivery
  • Strong ad engine
  • Data is an input
Moderate
  • Retail-media optionality
  • Valued on the business, not the data
  • Balanced
  • DoorDash/Uber entering grocery; ad growth must outrun fee pressure
Low
Grocery/ads story
  • Ad revenue growth
Transaction · payments data
FIS (FIS) ○ operator
Data 5 · Neutral
  • Merchant transaction flows & fraud signals (banking/payments processing)
  • Real data, but serves its processing
Steady flow
  • Transaction flow tracks processing volumes
AI 4 · Neutral
  • In-product fraud/upsell, not a corpus
~ desk note
  • Fraud AI in-product
  • AI-native fintech infrastructure competition
~$21B
~ EV est
~4.0x +4% Disc 4 · Neutral
  • Cheap-ish fintech
  • But not a data re-rate
Low
  • Value fintech, data not the driver
  • Worldpay separation aftermath; bank IT spending cycles
Low
Fintech story
  • Banking IT spend
Visa / Mastercard / Amex (V/MA/AXP) ○ operator
Data 8 · High
  • Among the largest transaction datasets on earth
  • Regulated, privacy-bound byproduct
  • Not licensed as a corpus
Compounding
  • Payment volumes grow ~10%/yr — among the largest data flows on earth
AI 3 · Low
  • Increasingly productized
  • But privacy-bound; not a corpus sale
  • The ultimate data-advantaged operators
~ desk note
  • Internal fraud/credit AI at vast scale; agentic-commerce pilots
  • Agentic payments standards (who authorizes an AI's purchase?)
  • Agentic payments could reshape authorization economics — also an opportunity
  • Stablecoin/alternative rails the bigger structural worry
~$623B / $438B
~ EV est
~16x +10% Disc 2 · Low
  • Valued as payment giants
  • n/a as a data re-rate
Low
  • Priced payment networks; data is internal
  • Interchange regulation (CCCA) and DOJ debit suit (V)
  • Stablecoin rails as long-term routing threat
Med
Agentic commerce chatter rising; data never the thesis
  • Agentic-payment standards; volume growth
Data trajectory = stock vs flow: is the corpus compounding (green), in steady/churning flow (amber), or slowing/decaying (red)? A growing flow makes licensing an annuity; a static archive is a one-time sale.
New research columns: ✓ deep dive = filings/PRs/coverage reviewed this pass, key bullets link to sources · ~ desk note = knowledge-based fill, deep dive pending.
Hype factor = how aware the market is of the AI exposure. Low = unpriced (where mispricing lives) · Med / as-threat · High = fully in the price. "As threat" = the market narrative treats AI as a risk to the business model rather than an opportunity.
All market caps + EV/Sales are FMP (SEC-sourced, Jun 9 2026) on the ✓ FMP rows · ~ EV est = cap is FMP, EV/Sales estimated (mostly operators) · * = ADR/foreign listing, cap hand-adjusted to total company.
Revenue growth is FMP FY-over-FY for the high-growth/declining names and key mid-caps; large stable franchises & operators carry ±2pt estimates (FMP confirmed those it checked).
EV/Sales: green = cheap for profile · red = rich.
Discrepancy tier: High = underpriced · Neutral = fair · Low = priced.
Convexity = payoff shape: bounded downside + asymmetric upside. High ≠ low-risk (distressed names are option-like).

Ratings summary

The core tension as a 2×2: data quality (rows) vs AI-unlock (columns). Top-left — elite data, slow unlock — is the latent re-rate watchlist. Note IQVIA's migration to the right column after the deep dive (IQVIA.ai, 150+ agents, 19 of top-20 pharma). Key: * = High valuation discrepancy · † = High convexity (both symbols = both).

Elite data · Lower AI-unlock: Latent (great data, slow unlock)
IT*†, CLVT*†, GETY*†, EFX*†, TRU, EXPN.L, GH, NTRA, DH†, VRSK, MORN, NDAQ, ICE, CSGP, RAMP, NFLX, CPRT, V/MA/AXP

Elite data · Higher AI-unlock: Monetizing the moat
IQV*†, WLY*, SRAD, GENI†, RDDT, TEM†, PL†, BKSY, RELX, SPGI, TRI, MSCI, MCO, WTKWY, VEEV

Lower data · Lower AI-unlock: Lower-tier / operators
FICO, DUOL, GDRX, SPIR/SATL†, PSO, TTD, DV/SCOR*†, YELP, Z/ZG, ACVA/KAR, CART, ELV, CARG/CARS, CME, DOCS, TRIP, MELI, CVNA, FIS, COR, LDOS

Lower data · Higher AI-unlock: Punching above their data
GTM†, SMWB*†, FDS
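As a sketch only: if the 2×2 splits each axis at the same cut the grouped view below uses for "High" (7–10), the quadrant placements follow mechanically from the Data and AI tiers in the table. The threshold constant, the `quadrant` function, and the sample rows are illustrative assumptions; the report places names by analyst judgment, so edge cases may differ.

```python
# Hypothetical sketch: the cut at 7 is an assumption borrowed from the
# "High (7–10)" bands in the grouped view; the report places names by judgment.
HIGH_TIER = 7


def quadrant(data_tier: int, ai_tier: int) -> str:
    """Map 1-10 Data and AI tiers to one of the four 2x2 quadrants."""
    elite_data = data_tier >= HIGH_TIER
    high_unlock = ai_tier >= HIGH_TIER
    if elite_data and high_unlock:
        return "Monetizing the moat"
    if elite_data:
        return "Latent"
    if high_unlock:
        return "Punching above their data"
    return "Lower-tier / operators"


# (Data tier, AI tier) as printed in the table rows.
rows = {
    "SRAD": (8, 7),
    "GENI": (8, 7),
    "GTM": (6, 7),
    "SMWB": (5, 8),
    "CPRT": (7, 5),
    "V/MA/AXP": (8, 3),
    "CME": (6, 5),
    "TTD": (6, 6),
}

for ticker, (data, ai) in rows.items():
    print(f"{ticker:<9} {quadrant(data, ai)}")
```

Run against those tiers, it reproduces the grid's placements: CPRT's Data 7 / AI 5 lands in Latent, and GTM's Data 6 / AI 7 lands in Punching above their data.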

And the grouped view — High and Low ends of each rating.

Data quality

High (7–10)
IT*†, IQV*†, CLVT*†, GETY*†, WLY*, SRAD, GENI†, EFX*†, TRU, EXPN.L, RDDT, TEM†, GH, NTRA, DH†, PL†, BKSY, VRSK, MORN, RELX, SPGI, TRI, MSCI, MCO, WTKWY, NDAQ, ICE, VEEV, CSGP, RAMP, NFLX, CPRT, V/MA/AXP
Low (1–3)
LDOS

Data trajectory

Growing / compounding
IQV*†, GETY*†, WLY*, SRAD, GENI†, EFX*†, TRU, EXPN.L, RDDT, TEM†, GH, NTRA, DUOL, PL†, BKSY, SPIR/SATL†, VRSK, MORN, TTD, ACVA/KAR, CART, RELX, SPGI, TRI, MSCI, MCO, NDAQ, CME, VEEV, CSGP, NFLX, MELI, CVNA, CPRT, V/MA/AXP
Slowing / decaying
GTM†, DH†, TRIP, LDOS

AI-unlock

High (7–10)
IQV*†, GTM†, WLY*, SRAD, GENI†, SMWB*†, RDDT, TEM†, PL†, BKSY, FDS, RELX, SPGI, TRI, MSCI, MCO, WTKWY, VEEV
Low (1–3)
CSGP, NFLX, MELI, CVNA, COR, V/MA/AXP

Discrepancy

High — underpriced
IT*†, IQV*†, CLVT*†, GETY*†, WLY*, SMWB*†, EFX*†, DV/SCOR*†
Low — priced / n-a
SPGI, TRI, MSCI, MCO, WTKWY, NDAQ, CME, RAMP, NFLX, MELI, CVNA, CPRT, V/MA/AXP

Convexity

High — asymmetric
IT*†, IQV*†, CLVT*†, GETY*†, GTM†, GENI†, SMWB*†, EFX*†, TEM†, DH†, PL†, SPIR/SATL†, DV/SCOR*†
Low — symmetric
CARG/CARS, RELX, SPGI, TRI, MSCI, MCO, WTKWY, NDAQ, ICE, CME, VEEV, RAMP, TRIP, NFLX, CVNA, CPRT, FIS, COR, V/MA/AXP, LDOS

Most promising names

The screen's pick: names from the Monetizing the moat and Punching above their data quadrants that also carry a High valuation discrepancy (*) or High convexity (†) — i.e. the unlock is already happening and the payoff is mispriced or asymmetric.
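The screen rule is simple enough to express as a filter. A minimal sketch, assuming quadrant membership and marks exactly as printed in the 2×2 above; the list names and the `screen` helper are hypothetical. SRAD carries no mark, so it does not pass the filter on its own and appears in the pick list only as GENI's partner in a joint thesis.

```python
# Quadrant membership and marks as printed in the 2x2 above.
# "*" = High valuation discrepancy, "†" = High convexity.
MONETIZING = [
    "IQV*†", "WLY*", "SRAD", "GENI†", "RDDT", "TEM†", "PL†", "BKSY",
    "RELX", "SPGI", "TRI", "MSCI", "MCO", "WTKWY", "VEEV",
]
PUNCHING = ["GTM†", "SMWB*†", "FDS"]


def screen(*quadrants: list[str]) -> list[str]:
    """Return tickers from the given quadrants that carry '*' or '†'."""
    picks = []
    for members in quadrants:
        for label in members:
            if "*" in label or "†" in label:
                picks.append(label.rstrip("*†"))  # drop the marks
    return picks


print(screen(MONETIZING, PUNCHING))
# → ['IQV', 'WLY', 'GENI', 'TEM', 'PL', 'GTM', 'SMWB']
```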

  • IQVIA (IQV) *†
  • Similarweb (SMWB) *†
  • Wiley (WLY) *
  • Tempus AI (TEM) †
  • Sports-data duopoly: Genius Sports (GENI) † + Sportradar (SRAD)
  • Planet Labs (PL) †
  • ZoomInfo (GTM) †

IQVIA (IQV) *†

The cleanest setup on the board: the scarcest healthcare data at ~2.7x sales, and the unlock just went live — IQVIA.ai (Mar 2026) with 150+ agents and 19 of the top-20 pharma already using them — while the market still prices it as a sleepy CRO. High discrepancy, high convexity, low hype.

Similarweb (SMWB) *†

The deep-value outlier: ~0.7x EV/Sales on +15% growth while selling the datasets everyone uses to measure the AI-search era, with feeds and MCP integrations into AI workflows. Modeled (non-owned) data and micro-cap liquidity are the risks, but at this price the asymmetry is real.

Wiley (WLY) *

The proven licensor at a value multiple: ~1.9x EV/Sales with $92M of lifetime AI-licensing revenue already disclosed ($29M in a single quarter), an Anthropic partnership, and recurring inference pilots converting corporate R&D demand. Flat underlying growth is the offset and caps the convexity — but on pure metrics, paying under 2x sales for the rare publisher with demonstrated, repeatable AI revenue is a discrepancy. The flow is healthy too: submissions +25%. Watch item: whether the licensing line proves recurring rather than episodic.

Tempus AI (TEM) †

The growth-convexity pick: >$1B remaining contract value, the $200M AstraZeneca/Pathos foundation-model deal (non-exclusive — the motion can be resold), and ~6.5x sales on +83% growth. Not cheap, but the foundation-model optionality is large and real.

Sports-data duopoly: Genius Sports (GENI) † + Sportradar (SRAD)

The promising thing here is the duopoly's position, which both names share: legally exclusive rights to real-world events, the one category of data AI cannot generate, whose consumption AI multiplies (more priced micro-markets per game, CV-deepened datasets, AI media on the long tail, settlement-oracle demand from agents and prediction markets). Neither is framed as an AI story by the market, and at ~3.0–3.6x EV/Sales the multiples charge little for it.

The choice between them is an expression preference, not a separate thesis. GENI (†) is the torque expression: ~31% growth at ~3.6x, the media/ad layer monetizing the rights a second time, and the Second Spectrum CV stack creating proprietary data beyond the feed, with the NFL renewal as the concentrated left tail that comes with the convexity. SRAD is the quality expression: twice the size, diversified across 80+ sports, profitable, founder-led; a steadier, more symmetric payoff that misses the † by one notch of judgment.

Because their biggest idiosyncratic risks are mirror images (league concentration for GENI, operator in-housing of services for SRAD), the pair diversifies the single-contract risk while keeping full exposure to the shared thesis. Watch item for both: rights-cost inflation at renewal, since the leagues are the residual claimants on the AI surplus.

Planet Labs (PL) †

Compounding-by-design data (the Earth, daily) with AI as pure accelerant, a ~$900M backlog, and Golden Dome optionality. The caveat is the price: ~28x sales means you're paying for the story — convex, not cheap.

ZoomInfo (GTM) †

The contrarian lottery ticket: ~1.7x sales with its Context Graph natively inside OpenAI's Codex for Work — agent-context positioning the market ignores while pricing it as the AI victim. The decay-treadmill data and shrinking revenue are why it's a small, binary position, not a core one.

Honorable mentions just outside the rule: Gartner (IT) *† and Equifax (EFX) *† carry both marks but sit in the Latent quadrant — the data is elite and cheap, but the unlock hasn't shipped yet; they're the re-rate watchlist rather than the active list. As throughout: one analyst's qualitative framework, not investment advice.

Q&A

Deeper discussion behind the picks above, organized by name — questions raised against each thesis, answered in full. Read the winners first; come here for the weeds.

IQVIA (IQV)

Q: I thought I heard somewhere that a lot of the IQV data is not monetizable?

You heard something real — and it's the most important caveat on the IQVIA thesis, so worth unpacking properly. The claim has three solid legs:

First, IQVIA doesn't own most of its data the way Reddit or Tempus own theirs — it licenses it in. The prescription and claims data comes from pharmacies, PBMs, payers, and distributors under supply contracts that govern what IQVIA can do with it. That's a different property position from Reddit (owns its UGC outright under its terms of service) or Tempus (generates the data in its own labs as a byproduct of its own tests). Practical consequences: the supply contracts may restrict or simply not contemplate AI-training uses, meaning new monetization can require renegotiation; and once suppliers see AI value in their feeds, they can demand a bigger cut — the data-acquisition cost line can inflate to absorb part of any new revenue. IQVIA's own 10-K risk factors flag dependence on data suppliers for exactly this reason.

Second, a big chunk of the data people attribute to IQVIA isn't theirs to sell at all. The clinical-trial data from the CRO half (the old Quintiles) belongs to the pharma sponsors who ran the trials. IQVIA gains operational expertise and metadata from running them, but the underlying trial datasets are the clients' property. So "world's largest holder of clinical trial data" overstates what's actually licensable.

Third, privacy law boxes in the rest. The patient-level asset is valuable precisely because it's longitudinal and granular — and that's exactly what HIPAA de-identification standards, GDPR, and the newer state health-privacy laws make hard to hand over for model training, where re-identification risk and downstream-use control are unresolved questions. Compliance-wrapped enterprise analytics: fine. Shipping the corpus to a lab: mostly not.

Now the counterpoint, because the bull case survives in modified form: these constraints are exactly why IQVIA's unlock took the shape it did. It isn't licensing data out — it's building agents on the data (IQVIA.ai, the NVIDIA-built foundation models) and selling workflow outcomes, which keeps everything inside the compliance and contract perimeter. That's the monetization path that's actually available to them, and the 19-of-top-20-pharma adoption suggests it's working. The honest restatement of the thesis: you're not buying a Reddit-style licensing annuity; you're buying cheap (~2.7x sales) exposure to whether agents-on-regulated-data becomes a product category — with the risks being supplier economics, execution, and the CRO half's labor-pricing exposure to AI.

The caveat does argue that the Data 9 rating conflates two things the table mostly treats as one: data scarcity (genuinely elite) versus data ownership cleanliness (middling: licensed-in, use-restricted). Tempus, Reddit, Verisk, and Equifax score much better on the second dimension.

One flag on the specifics: the supplier-contract and sponsor-ownership structure described here is well-established, but current contract terms and the state of IQVIA's data-access disputes (including long-running litigation with Veeva over data access) haven't been re-verified in this pass.

Similarweb (SMWB)

Q: I am wary of companies that perpetually operate at a loss, where the stock continuously declines for many years. To me this suggests a fundamental problem with the business model, or management simply not aligned with shareholder value — more often than not these go to zero rather than turning around. Is there any indication Similarweb is actually doing something transformative right now? Is there any justification for their current losses?

The heuristic is a good one — companies that lose money for years while the stock grinds down do usually die rather than turn — but on Similarweb the factual premise has gone stale, and the recent filings answer both questions concretely.

On "perpetually operating at a loss": no longer quite true. In Q1 2026 the GAAP operating loss narrowed to $4.4M and GAAP net loss to $6.4M, while the company posted non-GAAP operating profit of $2.4M and normalized free cash flow of $6.6M, its tenth consecutive quarter of positive normalized free cash flow. 2025 was its second consecutive year of non-GAAP operating profit (~$13M FCF for the year), and it carries ~$65–72M cash, no debt, and an untouched $75M credit line. The remaining GAAP loss is small and consists mostly of stock-based comp and amortization: a dilution cost worth counting, but a very different animal from a cash-burning melting ice cube. Net cash, no debt, self-funding: the "goes to zero" mechanics (refinancing walls, forced raises) aren't present.
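The size of the GAAP-to-non-GAAP gap is simple arithmetic on the Q1 2026 operating figures above. A sketch; the variable names are illustrative, and the difference covers every non-GAAP adjustment, of which SBC and amortization are described as the bulk rather than the whole.

```python
# Q1 2026 operating figures as quoted above, in $M (negative = loss).
gaap_operating_income = -4.4
non_gaap_operating_income = 2.4

# Everything non-GAAP adds back (SBC, amortization and any other items).
total_adjustments = non_gaap_operating_income - gaap_operating_income

print(f"Implied non-GAAP adjustments: ${total_adjustments:.1f}M")
# → Implied non-GAAP adjustments: $6.8M
```

Roughly $6.8M of add-backs per quarter is the dilution cost the paragraph says is worth counting.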

On "doing something transformative right now": real, quantified evidence, not just narrative. AI-related sales reached 11% of Q4 2025 revenue, up from 8% in mid-2025, with AI revenue roughly tripling year over year. In Q1 2026 it signed a seven-digit LLM data-training contract with an existing big-tech customer, with a second large LLM contract expected in coming quarters. Contract quality is improving in ways that suggest durability: remaining performance obligations grew 18% to $297.7M, 64% of ARR is now multi-year (up from 49% a year ago), and $100k+ customers grew 12% to 461. The transformation thesis, plainly: the same clickstream dataset that sold as marketing-analytics seats is being re-sold as LLM training/grounding data to big tech, and that line went from experimental to 11% of revenue in about a year.

The parts that support the wariness, because they're real: growth has decelerated — FY26 guidance implies ~10%, down from ~15% — and overall net revenue retention is 98%, meaning the core seat business churns slightly faster than it expands; the AI line grows on top of a flattish base. Large deals are lumpy (Q4 missed guidance on contract-timing slippage). And note the word "normalized" before the cash-flow figure: unadjusted free cash flow in Q1 was $(0.3)M — the cash generation is real but thinner than the headline. The long stock decline also reflects genuine history: a 2021 IPO at growth-stock pricing followed by years of guidance resets before the discipline arrived (a new CFO joined this year with a mandate explicitly framed around monetizing the data asset).
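Both figures in that paragraph can be checked from numbers quoted elsewhere in this report. A sketch, assuming the ~$283M revenue in the Similarweb table row is the FY25 base that the $307–315M FY26 guide is measured against; variable names are illustrative.

```python
# Figures quoted in this report, in $M. Assumption: the ~$283M revenue in
# the table row is the FY25 base for the FY26 guide.
fy25_revenue = 283.0
fy26_guide_low, fy26_guide_high = 307.0, 315.0

midpoint = (fy26_guide_low + fy26_guide_high) / 2
implied_growth = midpoint / fy25_revenue - 1
growth_range = (fy26_guide_low / fy25_revenue - 1,
                fy26_guide_high / fy25_revenue - 1)

# Q1 2026 free cash flow: normalized vs unadjusted.
normalized_fcf, unadjusted_fcf = 6.6, -0.3
normalization_gap = normalized_fcf - unadjusted_fcf

print(f"FY26 midpoint ${midpoint:.0f}M -> implied growth {implied_growth:.1%}")
print(f"Guide range implies {growth_range[0]:.1%} to {growth_range[1]:.1%}")
print(f"Q1 FCF normalization gap: ${normalization_gap:.1f}M")
```

The midpoint works out to roughly 9.9%, in line with the ~10% in the text, and the ~$6.9M gap is how much of the headline "normalized" cash flow did not appear as unadjusted cash in the quarter.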

Verdict: this doesn't fit the classic to-zero profile — those companies have debt, burn, and no buyer for what they make; Similarweb has net cash, positive cash generation, and a dataset big tech just started writing seven-figure checks for. But it also isn't a proven transformation: a slightly-churning core with a fast-growing AI line that's still only ~11% of revenue, priced at under 1x EV/Sales. The bet, stated plainly: the AI mix shift outruns the core's decay before panel-collection methods get harder. That's exactly why it's sized as option-like rather than core. The single best thing to watch: whether the second large LLM contract lands and whether AI revenue keeps compounding off 11% — if that line stalls, the base-rate skepticism wins.

Wiley (WLY)

Q: My assumption is every major AI is going to need to ultimately contract with every major publisher. I don't have anything hard to go on when saying this — it just seems logical. What is your take on this matter?

The logic is sound as a directional bet, but it needs amending in two important ways — because the version as stated sits on top of a legal question that's actively being decided, and the form of the contracting matters more than the fact of it.

The case for the assumption is real, and it has four legs. First, scarcity economics: frontier labs are compute-rich and quality-data-constrained, and peer-reviewed scientific text is among the highest-value-per-token corpora that exists — concentrated in a handful of publishers (Elsevier, Springer Nature, Wiley, Taylor & Francis, Oxford/Cambridge). Second, the checks are rounding errors: the observed deals run roughly $10–40M a year per publisher against labs spending tens of billions on compute — the cost-benefit of just paying is lopsided. Third, enterprise customers increasingly demand provenance and indemnification, which only licensed data provides. Fourth, there's a cascade dynamic: once two or three labs license a corpus, the others face both a capability gap and a worse litigation posture for not licensing — and publishers are improving their coordination (Wiley's Nexus, licensing on behalf of 36 publishers, is exactly a move to bundle the long tail and raise the table stakes).

The amendment the logic needs: the courts have partially undercut the training version of it. In 2025, US federal rulings (the Anthropic books case and Meta's authors case) held that training on lawfully acquired copies is fair use, because it is transformative, while Anthropic's ~$1.5B settlement was over the pirated copies it acquired, not over training itself. If that line of jurisprudence holds (the NYT–OpenAI case and appeals are the ones to watch), then for pure model training a lab arguably needs to buy one legitimate copy of everything, not sign a license with anyone. Add the supply-side erosion: open access now covers a large and growing share of new science (mandated by funders), preprint servers carry much of the frontier, and synthetic data reduces marginal dependence on any single corpus. So "every lab must license every publisher for training" is the weakest form of the thesis: training licenses may prove episodic, one-and-done archive purchases rather than annuities.

But the inference version of the assumption is much stronger — and that's the one that matters for Wiley. Fair use is a training-time doctrine; it does not cover an agent retrieving, reproducing, and serving current copyrighted articles at query time. A clinical AI citing this month's literature, a corporate R&D copilot grounding on vetted chemistry, a research agent that must point to the authoritative version of record — those need live, licensed, recurring access, and there's no fair-use route around it. That's a per-seat or per-query annuity, not a one-time archive sale. And this is precisely the pivot Wiley's own disclosures describe: recurring inference pilots with pharma/chemical/space companies, the OpenEvidence deal (five-year licensing into clinical AI, with equity), and management guiding the recurring share of AI revenue to triple. The vetting layer compounds the case — as AI floods the world with plausible text, the peer-review stamp becomes more valuable at retrieval time, not less. Jurisdiction helps too: the EU's text-and-data-mining opt-out regime pushes toward licensing more firmly than US fair use does.

Two sizing cautions so the logic doesn't overrun the numbers. Even if the assumption fully plays out, the per-publisher checks observed so far ($20–40M/yr range) would be transformative for Wiley's multiple (high-margin recurring revenue on a ~$1.7B base at ~1.9x sales) without being transformative for its P&L — the re-rate comes from the market capitalizing a data-annuity line, not from revenue doubling. And "every major AI" overstates the buyer pool: it's really the handful of frontier labs with consumer/enterprise products and US legal exposure; open-weight and non-US players may never pay, and a buyer pool of five-ish labs versus many publishers is an oligopsony where the labs hold pricing power — another reason the publisher-coalition motion (Nexus) matters strategically.
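The P&L side of the first caution is simple arithmetic. A minimal sketch, assuming the ~$1.7B base is FY2025 continuing revenue of roughly $1.66B and taking the $20–40M/yr checks quoted above at face value:

```python
# Back-of-envelope: how large is one per-publisher AI licensing check
# relative to Wiley's revenue base? Inputs are the approximate figures
# quoted in the text, not disclosed contract terms.
WILEY_REVENUE_M = 1_660  # ~FY2025 continuing revenue, $M (assumed base)


def share_of_revenue(check_m: float, base_m: float = WILEY_REVENUE_M) -> float:
    """Return an annual licensing check as a fraction of the revenue base."""
    return check_m / base_m


for check in (20, 30, 40):
    print(f"${check}M/yr -> {share_of_revenue(check):.1%} of revenue")
```

Even the top of the range stays under 2.5% of revenue, which is why the argument rests on the market capitalizing the line rather than on the line moving the P&L.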

Compressed take: right in direction, wrong in mechanism. Restated: "most frontier labs will end up paying the few publishers whose vetted corpora can't be substituted — but the durable money is inference-time grounding, not training, and the training-license wave may prove episodic." The June 16 print is a direct test: the recurring-vs-episodic split in the AI line is exactly the number that adjudicates between the two versions. Tail risk to flag honestly: if appellate courts extend fair use further than expected — or open-access coverage gets good enough — the licensing imperative weakens at the margin, which is why the NYT–OpenAI outcome is a thesis input for the whole publisher group, not just Wiley.

Q: You suggest the journals are under fire from preprint services and open access. My take is that the peer-review system is an essential check on avoiding the 'human centipede' problem — where information on the open web is increasingly generated by AI, and so there are risks when AI in turn ingests that information. Does the review process for WLY journals differ from that of preprints and open access? And if so, is there substance to this take?

There's real substance to the take — and it actually sharpens the thesis — but it needs one correction first, because two different things are getting bundled under "open access."

The correction: open access is a payment model, not a review standard. Most open-access journals — including Wiley's own large OA portfolio, PLOS, Nature's OA titles — are fully peer-reviewed; "open" refers to who pays (the author/funder, via article processing charges, instead of the reading library). So "under fire from open access" is a pricing threat — it erodes the paywall/subscription model and compresses revenue per article — not a vetting threat. The category the human-centipede argument applies to is preprints (arXiv, bioRxiv, SSRN), which get only light screening: scope checks, plagiarism filters, basic moderation — no expert evaluation of methods, statistics, or claims before posting.

So yes, the process differs materially. A Wiley journal article passes through editorial triage by a domain editor, evaluation by typically two-plus independent expert referees who challenge methodology and claims, mandated revisions, increasingly statistical and image-integrity screening plus AI-generated-text detection, named-author accountability tied to institutions, and — crucially — post-publication infrastructure: corrections, retractions, and the citable version of record. A preprint has none of that. The difference isn't that preprints are wrong (most eventually pass review somewhere); it's that the journal version carries a verified provenance chain and a maintained error-correction mechanism.

And the underlying mechanism is documented, not just intuitive. The "model collapse" literature — most prominently a 2024 Nature paper — showed that models trained recursively on AI-generated data degrade, losing the tails of the distribution first. As AI-generated text floods the open web (and the preprint servers — they're getting hit too), the share of verifiably human-originated, expert-checked text shrinks as a fraction of available training and grounding material. That makes the peer-review stamp exactly what the question says it is: a provenance filter whose scarcity value rises with the pollution level. It's the cleanest version of the Wiley moat argument: they don't just own content, they own a certification process — and certification is the thing AI can't synthesize, because its value comes from accountable humans staking reputations on it. This is even stronger at inference time than training time: an agent citing a retracted or fabricated paper is a liability event, and only the publishers maintain the retraction/version-of-record signal that prevents it.
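The collapse mechanism can be reproduced in miniature. The sketch below is a toy analogue, not the Nature paper's experiment: each "generation" fits a Gaussian to a small sample drawn from the previous generation's fit, so every model learns only from the last model's output. The sample size, generation count, and seed are arbitrary assumptions.

```python
import random
import statistics


def recursive_gaussian_fit(generations: int = 300, n: int = 20, seed: int = 0) -> list[float]:
    """Toy model collapse: fit (mean, MLE std) to samples drawn from the
    previous generation's fitted Gaussian, and repeat.

    Returns the fitted standard deviation at every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the human-written data
    history = [sigma]
    for _ in range(generations):
        # The next model only ever sees data produced by the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # MLE estimate, biased low
        history.append(sigma)
    return history


stds = recursive_gaussian_fit()
print(f"std at gen 0: {stds[0]:.3f}, gen 100: {stds[100]:.3g}, gen 300: {stds[-1]:.3g}")
```

The fitted spread shrinks generation after generation; a shrinking spread means the rare, extreme observations (the tails) are the first thing the later models stop producing.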

The honest caveat — it's a big one and it's Wiley-specific: peer review is a leaky filter, and Wiley owns the cautionary tale. Its Hindawi acquisition collapsed into the largest paper-mill scandal in publishing history — on the order of 10,000+ retractions, dozens of journals closed, the Hindawi brand itself shut down. Paper mills and AI-assisted manuscripts are now hammering journals, not just preprint servers; reviewers don't rerun experiments or audit raw data; and the submission surge Wiley reports (+25%) is partly the AI-writing flood arriving at their own front door. So the right framing isn't "peer review = clean, everything else = contaminated." It's that peer review is a costly, maintained, accountable filter — imperfect, but the only one with institutional machinery behind it — and its economic value depends on publishers actually defending it. Wiley's post-Hindawi integrity investment is, in that sense, capex on the moat.

Net for the thesis: the take upgrades the Wiley argument from "they own good text" to "they operate the trust layer of the scientific record in an era when trust is the scarce input." It implies the durable revenue isn't selling the archive once — it's selling certified, current, retraction-aware access continuously, which is the inference-annuity case from the question above arriving by a different road. Two things to watch: whether labs start paying explicit premiums for verified-human/vetted corpora over scraped web (early signs yes — that's what every one of these licensing deals implicitly is), and whether Wiley keeps its integrity record clean post-Hindawi — because in this framing a second paper-mill scandal isn't an embarrassment, it's impairment of the core asset.

Q: Does AI open the door for more automated verification and/or replication of research? It seems like it should be one of the tiers of peer review baked into every process.

Yes — and this might be the most underexplored part of the whole publisher thesis, because AI changes the cost curve of verification, which has always been the binding constraint. Peer review checks what's cheap to check (plausibility, novelty, methodology-as-described) and skips what's expensive (does the code run, are the numbers internally consistent, does the data support the claims, does it replicate). AI attacks exactly the expensive part. But the tiers are arriving at very different speeds.

Tier 1 — automated integrity screening: already here, and Wiley is deploying it. Image-manipulation and duplication detection (tools like Proofig and ImageTwin), statistical consistency checks (recomputing p-values from reported test statistics, checking whether reported means are even arithmetically possible given sample sizes), plagiarism and tortured-phrase detection, and paper-mill signature screening. Post-Hindawi, Wiley built and launched its own AI-powered paper-mill detection service, and the industry runs a shared STM Integrity Hub. This tier is becoming standard intake screening — it happens before human review, exactly the "baked-in tier" the question describes. It exists because the Hindawi-class scandals made the cost of not having it explicit.
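The two statistical checks named above are cheap enough to run on every submission. A minimal sketch, not Wiley's or the STM Integrity Hub's tooling: the mean check follows the idea of the published GRIM test (a mean of n integer scores must equal k/n for some integer k), and the p-value check is a simplified statcheck-style recomputation limited to z statistics (real tools also handle t, F, χ² and r).

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM-style check: can a mean of n integer scores round to reported_mean?

    The true mean must equal k/n for some integer k, so look for such a k
    inside the rounding interval [mean - h, mean + h], h = 0.5 * 10**-decimals.
    Once n >= 10**decimals the interval always contains some k/n, so the
    check only has power for small samples.
    """
    h = 0.5 * 10 ** -decimals
    eps = 1e-9  # tolerance for floating-point error at the interval edges
    lo = math.ceil((reported_mean - h) * n - eps)
    hi = math.floor((reported_mean + h) * n + eps)
    return lo <= hi


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """Does the recomputed p-value round to the reported one?"""
    return abs(two_sided_p_from_z(z) - reported_p) <= 0.5 * 10 ** -decimals + 1e-12


print(grim_consistent(5.19, 28))  # False: no k/28 rounds to 5.19
print(grim_consistent(5.18, 28))  # True: 145/28 = 5.1786
print(p_consistent(1.96, 0.05))   # True
print(p_consistent(1.5, 0.03))    # False: z = 1.5 gives p of about 0.13
```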

Tier 2 — computational reproduction: technically feasible now, not yet standard. With data- and code-availability mandates spreading, an AI agent can literally re-execute the analysis pipeline: pull the deposited data, run the deposited code, check that the figures and tables regenerate, and flag where they don't. This is the genuinely transformative one, because it converts "reproducibility" from a years-later social process into a pre-publication compile check. Nothing about it is science fiction; it's agentic code execution, which is mature. The blockers are economic and social, not technical. Review labor is currently free (referees are unpaid volunteers), so any machine tier adds real cost per paper that someone must absorb. Data/code deposits are still incomplete. And, the uncomfortable one: open-access publishing runs on volume economics (revenue per article published), so journals' financial incentive is to reduce friction at acceptance, not add it. The honest answer to "why isn't this baked in everywhere already" is mostly that last incentive problem.
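The "compile check" framing can be made concrete. A minimal sketch, assuming the authors deposit a callable analysis and the manuscript's reported values are available as name/value pairs; `reproduce` and `deposited_analysis` are hypothetical names, and a real pipeline would also have to sandbox and execute arbitrary deposited code.

```python
import math
import statistics
from collections.abc import Callable, Mapping


def reproduce(
    analysis: Callable[[list[float], list[float]], Mapping[str, float]],
    treatment: list[float],
    control: list[float],
    reported: Mapping[str, float],
    rel_tol: float = 1e-3,
) -> dict[str, tuple[float, float]]:
    """Re-run a deposited analysis and return every reported value that does
    not regenerate, as {name: (reported, recomputed)}."""
    recomputed = analysis(treatment, control)
    mismatches = {}
    for name, value in reported.items():
        got = recomputed.get(name, math.nan)  # a missing output counts as a failure
        if not math.isclose(value, got, rel_tol=rel_tol, abs_tol=1e-9):
            mismatches[name] = (value, got)
    return mismatches


def deposited_analysis(treatment: list[float], control: list[float]) -> dict[str, float]:
    """Hypothetical stand-in for the authors' deposited code."""
    mean_t = statistics.fmean(treatment)
    mean_c = statistics.fmean(control)
    return {"mean_treatment": mean_t, "mean_control": mean_c, "difference": mean_t - mean_c}


treatment = [5.0, 6.0, 7.0]
control = [4.0, 4.0, 4.0]
reported = {"mean_treatment": 6.0, "mean_control": 4.0, "difference": 2.5}
result = reproduce(deposited_analysis, treatment, control, reported)
print(result)  # {'difference': (2.5, 2.0)}
```

The design choice worth noting is the tolerance: reported values are rounded, so an exact-equality check would flag honest papers.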

Tier 3 — empirical replication: AI helps at the margins, robots are the long game. Wet-lab replication can't be done by a language model; it requires self-driving labs and cloud-lab infrastructure, which exist in narrow domains (chemistry, materials science) but are nowhere near general or cheap. What AI can do today at this tier is triage: models trained to predict which findings will replicate (DARPA ran a program on this) perform respectably, so a "replication-risk score" attached to papers could direct scarce replication resources. Full automated replication-as-review is a decade-scale story, not a product-cycle one.
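A replication-risk score only pays off if it changes where the replication budget goes. A minimal sketch, assuming each paper already carries a model-predicted failure probability and an estimated replication cost (both hypothetical inputs); the greedy ratio rule is a simple heuristic for this knapsack-style allocation, not an optimal solver and not any specific program's method.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    paper_id: str            # placeholder identifier, not a real DOI
    p_fail: float            # model-predicted probability the finding fails to replicate
    replication_cost: float  # estimated cost of one replication attempt, arbitrary units


def triage(papers: list[Paper], budget: float) -> list[str]:
    """Greedy triage: fund the replications with the most expected failures
    caught per unit of cost until the budget runs out."""
    ranked = sorted(papers, key=lambda p: p.p_fail / p.replication_cost, reverse=True)
    chosen, spent = [], 0.0
    for paper in ranked:
        if spent + paper.replication_cost <= budget:
            chosen.append(paper.paper_id)
            spent += paper.replication_cost
    return chosen


papers = [
    Paper("paper-A", 0.7, 2.0),
    Paper("paper-B", 0.2, 1.0),
    Paper("paper-C", 0.5, 1.0),
    Paper("paper-D", 0.9, 5.0),
]
print(triage(papers, budget=4.0))  # ['paper-C', 'paper-A', 'paper-B']
```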

The investment-relevant part cuts both ways for Wiley. The bullish reading: machine verification strengthens the certification product (a stamp meaning "human-reviewed AND machine-verified" is worth more than either alone, especially to AI labs buying training/grounding data — "verified corpus" becomes a premium SKU); and it favors scale, since integrity-tech stacks amortize across millions of submissions, squeezing small publishers toward consolidation or toward agency arrangements like Wiley's Nexus — the big five get bigger. The bearish reading to hold alongside it: if automated verification becomes good, cheap, and independent, it commoditizes part of what journals sell. The overlay-journal model — preprint servers plus automated verification plus lightweight expert curation (eLife has experimented in this direction) — is the disintermediation scenario, where the verification tier exists but publishers don't own it. The likely equilibrium: accountability, liability, retraction infrastructure, and the career-credentialing function keep the journals at the center for a long time, but the moat narrows from "we are the only filter" to "we are the accountable filter."

The deepest version of the point: AI won't just verify papers — AI will increasingly write and even conduct the science, generating a flood of machine-assisted findings. In that world the certification layer becomes more central, not less, but it has to evolve from "two human referees" to "human judgment supervising a machine-verification stack." Whoever operates that stack credibly owns the trust layer of the scientific record. The publishers are the incumbents with the brand, the workflow position, and (post-Hindawi) the motivation — but it's an asset they have to build, and the capex shows up before the pricing power does. For the June 16 call and beyond, the tell would be Wiley talking about verification as a product (verified-corpus licensing, integrity-as-a-service) rather than only as a cost line.

One adversarial caveat to keep honest: any automated check becomes a target — papers will be optimized to pass the machines, the way text is already optimized to evade AI detectors. So the equilibrium isn't "verification solves fraud"; it's an arms race in which the side with more data about fraud patterns wins — which, conveniently, is another argument for scale.

Q: I don't understand the WLY business all that well. Some of it seems to be related to journals. Nexus and clinical seem different. Can you break down these segments, providing a concise summary of each one, and a rough estimate of the percent they are contributing to overall revenue?

The map, with FY2025 actuals (fiscal year ended April 30, 2025; total continuing revenue ~$1.66B; segment figures from SEC filings via FMP):

1. Research — $1,075M, ~65% of revenue. The heart of the company and the entire AI thesis. Two sub-pieces:

  • Research Publishing (roughly $940M, ~57% of total): the ~2,000 academic journals. Revenue comes from library subscriptions and the newer "read-and-publish" institutional agreements, plus gold open-access article fees (author/funder pays per article), plus publishing services for the ~900 scholarly societies whose journals Wiley operates. The "Advanced" journal family (Advanced Materials etc.) alone exceeds $70M and grows double-digit. Highly recurring, calendar-year renewal cycle.
  • Research Solutions / Intelligence (roughly $130–140M, ~8%): the picks-and-shovels — the Atypon publishing platform, society services, databases, and corporate/R&D-facing data products.

Everything stress-tested in the questions above lives in or attaches to Research: the AI licensing deals (journal corpus → labs), OpenEvidence (journal content → clinical AI), Nexus, and the Emerald acquisition (more journals).

2. Learning — $585M, ~35% of revenue. The other, slower half:

  • Academic (~$320–330M, ~19–20% of total): university textbooks and digital courseware. Structurally pressured (used/rental/OER substitution) but stabilized by digital.
  • Professional (~$255–260M, ~15%): trade and professional books — the For Dummies franchise, business/finance titles — plus corporate talent-development assessments (Everything DiSC, Five Behaviors). Book-corpus AI training deals get booked here (the original $23M FY24 GenAI deal was book content).

The two flagged items, properly placed — neither is a segment:

Nexus is a service inside Research, launched ~2025: Wiley acts as licensing agent for other publishers' content into AI developers — 36 partner publishers signed. Revenue today: not disclosed, almost certainly immaterial (low single-digit millions at most). Its significance is structural, not financial yet: a take-rate marketplace that scales without Wiley creating content, and it aggregates the long tail of publishers into a single negotiating bloc against the labs.

"Clinical" = COA (Clinical Outcome Assessments), also inside Research: Wiley licenses validated patient questionnaires and assessment instruments — the standardized forms used as endpoints in drug trials — to pharma, with an IQVIA distribution agreement. Per the Q3 call it grew from $0.8M in 2021 to ~$7M — i.e., under 0.5% of revenue. It's a proof-of-concept for "Wiley content as regulated-workflow infrastructure," not a needle-mover.

And the line that cuts across both segments: AI licensing was ~$40M in FY25 (~2.4% of revenue), $29M in Q1 FY26 alone — booked in Research when it's journal content, Learning when it's books.

The concise mental model: ~57% journals (the moat and the AI story), ~8% publishing infrastructure, ~35% books and courseware (the slow-melt legacy that pays the ~4.5% dividend), with AI licensing at ~2–3% and growing fast, and Nexus/COA as currently-tiny options on the agency and clinical-infrastructure models. The valuation question is whether the 65% earns a data-asset multiple before the 35% erodes — which is why the recurring-AI-revenue disclosure on June 16 matters more than any segment line.
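The percentages above can be checked directly from the dollar figures. A minimal sketch, assuming ~$1.66B total FY2025 continuing revenue and replacing each quoted range with its midpoint; AI licensing and COA are cross-cutting lines already counted inside the segments, so their shares are not additive.

```python
TOTAL_M = 1_660  # ~FY2025 continuing revenue, $M

# Revenue in $M as quoted in the breakdown; ranges replaced by midpoints.
lines_m = {
    "Research": 1_075,
    "Research Publishing": 940,    # sub-segment of Research
    "Research Solutions": 135,     # sub-segment of Research, midpoint of $130-140M
    "Learning": 585,
    "Academic": 325,               # sub-segment of Learning, midpoint of $320-330M
    "Professional": 257.5,         # sub-segment of Learning, midpoint of $255-260M
    "AI licensing": 40,            # cross-cutting, already inside the segments
    "COA": 7,                      # cross-cutting, already inside Research
}

shares = {name: m / TOTAL_M for name, m in lines_m.items()}
for name, share in shares.items():
    print(f"{name:<20} {share:6.1%}")
```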

Tempus AI (TEM)

Q: My concern with TEM is that nearly every high-profile bio IPO comes to market at an astronomical valuation, and as the reality and complexity of what they intend to achieve becomes evident, it's a continual erosion in price. Are there pathways where TEM could have 5x more revenue? What would various forward-growth scenarios look like?

The concern is well-founded. The "IPO at an astronomical multiple, then erode for years as reality intrudes" pattern is the base rate for high-profile bio and diagnostics listings (Schrödinger, Recursion, 23andMe, and, for long stretches, Guardant and Exact Sciences). The mechanism is real: TAM slides assume flawless execution, reimbursement arrives slower than modeled, and the multiple de-rates from "platform" to "lab services" as growth normalizes. So the right question isn't whether TEM can 5x in the abstract; it's whether there are specific, fundable pathways to ~$8B, what has to be true for each, and how they hold up against that gravitational pull.

The math that disciplines everything. 5x the FY26 guide (~$1.6B) is ~$8B. The street already models TEM to ~$3.1B by 2030 — roughly 2x, an ~18% CAGR. So 5x is on no current analyst sheet: it needs ~26% revenue CAGR sustained for seven years, or a step-change from new modalities/M&A the consensus isn't underwriting. That gap is where both the upside and the erosion risk live. Today's business is ~$955M Diagnostics + ~$300M+ Data & Applications, and the Data line — not the lab — is what makes the 5x debate interesting, because it carries software economics the erosion-pattern names usually lack.
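The growth rates behind each scenario follow from one formula, CAGR = (end / start)^(1 / years) − 1. A minimal sketch, assuming the ~$1.6B FY26 guide as the starting point, a seven-year horizon for the 5x, erosion and moonshot endpoints, and four years (2026 to 2030) for the street case:

```python
def cagr(end: float, start: float, years: float) -> float:
    """Compound annual growth rate needed to go from start to end."""
    return (end / start) ** (1 / years) - 1


BASE_B = 1.6  # FY26 revenue guide, $B (assumed starting point)

print(f"5x in 7 years:       {cagr(8.0, BASE_B, 7):.1%}")
print(f"Street to 2030 (4y): {cagr(3.1, BASE_B, 4):.1%}")
print(f"Erosion, 7 years:    {cagr(2.5, BASE_B, 7):.1%} to {cagr(3.0, BASE_B, 7):.1%}")
print(f"Moonshot, 7 years:   {cagr(12.0, BASE_B, 7):.1%}")
```

Because compounding is exponential, the gap between ~18% and ~26% a year looks modest but separates a ~2x outcome from a ~5x one.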

The five pathways to 5x — what each contributes and what must be true:

  • 1. Genomics volume + menu expansion (the base engine). Oncology testing compounding ~30% with new assay launches and pull-through. This is the most reliable lever and roughly tracks the street's ~2x to $3B. On its own it does not get to 5x — it's the foundation the other four build on. Must be true: volume growth doesn't decelerate below ~20% as the base scales.
  • 2. MRD / minimal residual disease (the highest-torque organic lever). MRD volume already grew ~500% YoY off a small base. Recurrence-monitoring is recurring by nature — patients test repeatedly over years — so it converts a one-time diagnostic into an annuity. A credible MRD franchise is a multi-hundred-million to billion-dollar line by itself. Must be true: Medicare/commercial reimbursement decisions land favorably; competition (Natera's Signatera is entrenched) doesn't cap share.
  • 3. Data & licensing scale-up (the multiple-justifying lever). TCV >$1.1B, 70+ pharma customers, Insights growing ~40%+, NRR ~126%. If this line compounds into a $1.5–2.5B business, it re-rates the whole company because it's high-margin and AI-grounding-relevant — the part that makes TEM a data owner rather than a lab. This is the lever most aligned with your own thesis. Must be true: pharma keeps paying for de-identified multimodal data at scale, and non-exclusive foundation-model co-builds materialize.
  • 4. New modalities / adjacencies (the step-change lever). Hereditary (Ambry), cardiology, neuro/psychiatry, radiology AI, and algorithmic diagnostics (Lens). Each new clinical area is a new TAM. This is how a diagnostics company breaks the ~$3B ceiling — but it's also where complexity and the erosion risk concentrate. Must be true: at least two adjacencies reach real scale without diluting focus or burning the balance sheet.
  • 5. M&A / consolidation (the inorganic lever). Tempus has been acquisitive (Ambry, Deep 6, etc.); part of any 5x plausibly comes from buying revenue, funded by stock if the multiple holds. Must be true: the equity stays strong enough to be a currency — which is circular, and the first thing the erosion pattern breaks.

Scenario table — forward revenue, with the levers each requires:

Scenario | ~Revenue in 7 years (multiple of today's ~$1.6B) | Implied CAGR | What has to go right (or wrong) | Read
Erosion case | ~$2.5–3B (~1.6–1.9x) | ~7–9% | Genomics decelerates, MRD reimbursement stalls, Data growth normalizes; no new modality scales. Execution fine, story de-rates. | Your base-rate fear: revenue still grows, but the multiple compresses faster; the stock erodes even as the company "works."
Street case | ~$3.1B by 2030 (~2x) | ~18% (to 2030) | Levers 1–2 deliver, Data compounds steadily, no heroic adjacency: the consensus model. | Consensus: solid, already in estimates; ~2x, not 5x. The market is paying ~6.5x sales for this.
Bull case (5x) | ~$8B (~5x) | ~26% | Levers 1–3 all deliver and at least one adjacency (lever 4) reaches scale; Data line hits $1.5–2.5B. Seven years of ~26% growth with no balance-sheet break. | Fundable, not forecastable: requires 4 of 5 levers firing; each is plausible, the conjunction is demanding.
Moonshot | ~$12B+ (~7.5x+) | ~33%+ | All five levers, including transformative M&A funded by a holding multiple, plus Data becoming a true foundation-model data platform. Tempus becomes the clinical-data layer of medical AI. | Lottery upside: the reason to hold a small position; do not underwrite to it.

The honest synthesis. Yes, there are real pathways to 5x — and unlike a pure therapeutics binary, they're incremental and observable: you watch MRD reimbursement, Data TCV, and adjacency scaling quarter by quarter, so you're not betting on a single trial readout. That's what makes TEM less of a classic erosion candidate than the pattern suggests. But the 5x case requires four of five levers to fire over seven years, and the erosion pattern you describe is precisely what happens when two or three fire and the multiple — today ~6.5x sales — does the rest of the work downward. The single most important variable is lever 3 (Data): if the licensing line compounds into a genuine multi-billion data platform, TEM re-rates as a data owner and the 5x is reachable; if Data stalls and TEM stays a fast-growing lab, the street case caps it at ~2x and your erosion thesis likely wins on the multiple even if revenue grows. The tell to watch each quarter: is Data & Applications growing faster than Diagnostics? As long as it is, the data-platform thesis is intact; the quarter that flips is the quarter the erosion case gains the upper hand. Usual caveat: scenario framework, not a forecast — and explicitly not investment advice.

Sports-data duopoly — Genius Sports (GENI) + Sportradar (SRAD)

Q: Regarding SRAD and GENI, how does AI change the landscape for sports data? I understand the bull case relating to a lot more prediction markets increasing demand for sports betting. But how does AI — if at all — increase the value of this data?

The prediction-market bull case is really a demand story; the more interesting question is whether AI changes the unit value of the data itself. Five mechanisms where AI genuinely increases value, and two where it leaks away.

1. AI multiplies the number of sellable markets per game. The binding constraint on in-play betting has never been bettor appetite — it's pricing capacity. A human-supervised trading desk can only run so many live markets; AI models can price thousands of micro-markets simultaneously (next pitch, next possession, player props that re-price every few seconds). Every additional market is additional consumption of the underlying feed, and it's exactly what Sportradar's managed trading service (MTS) sells: operators outsource the model because building it in-house is hard. AI raises the ceiling on markets-per-event, and the official feed is the raw input for all of them.
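Once a model outputs probabilities, turning them into a priced market is mechanical, which is why pricing capacity scales with compute rather than with trading staff. A minimal sketch, assuming hypothetical model probabilities and a proportional margin (one simple overround convention among several); this is not Sportradar's MTS logic.

```python
def price_market(fair_probs: dict[str, float], margin: float = 0.05) -> dict[str, float]:
    """Turn model probabilities for one market into decimal odds with a
    proportional bookmaker margin (overround)."""
    total = sum(fair_probs.values())
    odds = {}
    for outcome, p in fair_probs.items():
        implied = (p / total) * (1 + margin)  # normalise, then load the margin
        odds[outcome] = 1 / implied
    return odds


# Hypothetical model output for a few in-play micro-markets; a live system
# would re-price thousands of these every few seconds.
micro_markets = {
    "next_pitch_strike": {"yes": 0.62, "no": 0.38},
    "next_possession_scores": {"yes": 0.48, "no": 0.52},
    "next_serve_ace": {"yes": 0.09, "no": 0.91},
}
priced = {name: price_market(probs) for name, probs in micro_markets.items()}
for name, odds in priced.items():
    print(name, {k: round(v, 2) for k, v in odds.items()})
```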

2. Computer vision turns each game into orders of magnitude more data. This is the most underappreciated piece. Traditional play-by-play is hundreds of events per game; optical tracking is millions of positional coordinates. Genius owns this capability outright: it acquired Second Spectrum, which provides the Premier League's optical player tracking and ran the NBA's for years. Sportradar's equivalent is its 4Sight/computer-vision stack. AI is simultaneously the collection technology (video → structured data, collapsing the cost of capturing depth) and the demand driver (those coordinates feed augmented broadcasts like BetVision, automated officiating support, coaching analytics, and richer bet types). The same rights now yield a much bigger dataset.
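The "millions of coordinates" figure is easy to sanity-check. A minimal sketch for one soccer match, assuming a 25 frames-per-second tracking rate, 2D positions for 22 players plus the ball, and no stoppage time; skeletal tracking (many body points per player) would multiply the count further.

```python
FPS = 25                 # assumed optical-tracking frame rate
OBJECTS = 22 + 1         # 22 players plus the ball
MATCH_SECONDS = 90 * 60  # regulation time only

positions = FPS * OBJECTS * MATCH_SECONDS  # one position fix per object per frame
coordinates = positions * 2                # an x and a y value per fix
print(f"{positions:,} position fixes, {coordinates:,} coordinates per match")
```

Set against a few hundred play-by-play events, that is a gap of four orders of magnitude from the same rights.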

3. AI-generated media makes the long tail monetizable. Automated commentary, recaps in any language, and synthetic broadcast layers mean a third-tier table-tennis match or a lower-division soccer game — events where human production never penciled — can now carry a produced, bettable, watchable product. The data is the script for all of it. Both companies cover hundreds of thousands of events a year; AI raises the revenue per event at the bottom of that pyramid from near-zero to something.

4. Agents and prediction markets need a settlement-grade truth oracle. If AI agents are placing bets or trading event contracts, they need machine-readable, licensed, low-latency, legally safe ground truth — both to act on and to settle against. Scraped data doesn't work for settlement; official data is the oracle. The Kalshi–Sportradar deal is the early template. In an agentic world, the API call to the official feed is the toll booth.

5. Integrity services grow with the attack surface. Thousands of AI-priced micro-markets are also thousands of manipulation targets, and AI lowers the cost of coordinated fixing. Both companies sell integrity monitoring to leagues and regulators — that business scales with exactly the complexity AI creates.

Now the two leaks. First, the value-add layer can commoditize even as the feed doesn't. A Flutter or DraftKings can build its own AI trading models — they still must buy the raw rights-protected feed, but the services margin faces in-housing pressure from sophisticated operators, even as smaller ones outsource more. Second — and this is the big one — the leagues are the residual claimants. Every mechanism above raises the value of official data, but the rights are re-auctioned, and the leagues know what AI is doing to that value. AI also lowers the leagues' cost of collecting their own data (cameras plus computer vision), strengthening their threat to go direct. So a large share of the AI-created surplus gets transferred upstream at each renewal — that's why rights-cost inflation is flagged as the key endogenous concern for both. The NFL taking warrants in Genius is exactly this dynamic made explicit.

One marginal erosion worth knowing: AI makes unofficial data reconstruction cheaper — computer vision on a broadcast or pirate stream can rebuild play-by-play seconds behind real time ("courtsiding 2.0"). The official feeds keep the latency edge (in-venue collection) that matters for live betting, but enforcement against synthetic scraping becomes part of the moat maintenance.

Net read: AI is unambiguously positive for the value of the data — it multiplies markets, depth, monetizable events, and machine consumption, while the asset itself (real-world events under legal exclusivity) is the one thing AI can't generate. The investment question is the split: the duopoly's economics depend on whether market expansion and CV-driven product depth outrun what the leagues claw back at renewals. The line to watch in both names' filings: take-rate and gross margin on one hand, rights amortization and renewal terms on the other. The duopoly structure helps here too — with only two credible bidders for league rights, the auctions are less ruinous than they'd be with five.

Q: What is the distinction between GENI and SRAD on a high level?

At a high level they're the diversified incumbent versus the concentrated challenger — same business model, opposite portfolio construction.

Sportradar is the global utility. Founded in 2001, still run by founder Carsten Koerl (who remains a major holder), it's roughly twice Genius's size, profitable, and built on breadth: coverage across 80+ sports and on the order of 900k events a year, with marquee exclusive rights like the NBA, MLB, NHL, ATP tennis, and UEFA. Its strategy is full-stack: not just selling feeds but running the betting plumbing — live odds, managed trading services (MTS, where it effectively operates the risk book for bookmakers), streaming (bolstered by acquiring IMG Arena's portfolio), advertising (ad:s), and integrity. Because of the breadth, no single league renewal can break it, and the model behaves like an infrastructure compounder: take-rate expansion on a diversified rights base.

Genius is the concentrated marquee-rights play. It came public via SPAC in 2021 and is built on a handful of premium Anglo-American exclusives — most importantly the NFL official data rights (won in a 2021 bidding war against Sportradar, paying up and giving the NFL equity warrants), plus the NCAA and the Premier League (via Football DataCo). Two structural tilts distinguish it: first, the technology angle — its Second Spectrum acquisition made optical/skeletal tracking a core asset (it ran the NBA's tracking for years and does the Premier League's), which feeds products like BetVision, the in-stream NFL betting broadcast; second, the media layer — programmatic ads and fan-engagement products that monetize the same rights a second time. Higher growth (~31% vs ~12% on the FMP pull), but with concentration to match: the NFL relationship is both its crown jewel and its single point of failure, and the warrant structure means the league literally participates in the upside.

The duopoly framing is real but asymmetric: they've largely partitioned the rights map (Sportradar = global basketball/baseball/tennis/soccer breadth; Genius = NFL/NCAA/EPL depth), and the 2021 NFL auction is the cautionary tale of what happens when they do collide — rights inflation that took Genius years to digest. That's also why the leagues-as-residual-claimants point from the question above bites differently for each: a bad renewal is a margin headwind for Sportradar but a thesis event for Genius.

As investments: SRAD is the quality/steadiness expression — diversified, founder-led, profitable, the way to own the category with the least single-contract risk. GENI is the torque expression — more growth, more optionality (CV tech, BetVision, the media layer scaling against fixed rights costs), but the returns hinge on NFL economics at each renewal and on the media layer actually scaling. Roughly similar multiples (~3.0x vs ~3.6x EV/Sales), so the market isn't charging much for Genius's higher growth — arguably because it's pricing the concentration risk.

One caveat: exact current rights terms and renewal dates (especially the NFL deal's latest extension structure) should be verified in filings before leaning on them — the contours above are solid, but contract specifics move.

Q: Do SRAD and GENI differ substantially in how they are adopting AI technology or making their data more valuable in an AI era?

Substantially in emphasis, yes — even though the buzzword surface looks similar. The cleanest way to put it: Genius is applying AI to the data-capture and presentation layer; Sportradar is applying it to the pricing, risk, and personalization layer. One is making the data itself richer; the other is making each unit of data earn more.

Genius's AI center of gravity is computer vision. Second Spectrum gave it arguably the best sports-CV team in the business — years of running the NBA's official optical tracking, plus the Premier League's — and Genius has organized its product strategy around that: the GeniusIQ platform unifies the tracking layer, and the flagship expression is BetVision, the NFL stream with live odds and bet placement embedded directly in the broadcast, built on real-time CV understanding of the game. The same capability feeds its media/ads layer (audience products like FANHub targeting fans off sports data). So Genius's version of "making data more valuable in the AI era" is generative of new data: turning video into millions of skeletal coordinates per game, then turning those coordinates into augmented experiences and ad inventory. It's expanding what the rights yield.

Sportradar's AI center of gravity is the betting economics stack. Its signature AI products sit downstream of the feed: Alpha Odds (AI-driven, dynamically personalized odds that let each operator differentiate pricing), the MTS managed-trading business (ML risk management run as a service — effectively the outsourced quant desk for hundreds of bookmakers), the Vaix acquisition (deep-learning personalization and recommendations inside sportsbooks), and AI-driven integrity monitoring across its market surveillance. It has CV too — 4Sight and the Synergy Sports acquisition — so the capability gap is narrower than the positioning gap, but Sportradar's AI shows up as better decisioning per event: more markets priced, sharper risk, higher take per bet across an enormous diversified event base. Notably, it's also the one with the early agent/prediction-market posture — the Kalshi relationship makes it the settlement-oracle precedent.

Mapped to the five mechanisms above: Sportradar is strongest on mechanism 1 (AI multiplying priced markets per game) and 4 (the truth-oracle role), Genius on mechanism 2 (CV multiplying data per game) and 3 (AI-generated/augmented media). Both sell mechanism 5 (integrity). Which orientation wins depends on where the surplus lands: if the AI era's prize is decisioning — odds, risk, personalization at scale — Sportradar's infrastructure position compounds; if the prize is experience and attention — augmented broadcasts, in-stream betting, fan-data advertising — Genius's CV-first stack has the sharper edge. There's also a defensiveness asymmetry: Genius's CV layer creates proprietary data beyond the official feed (tracking data the league rights alone don't give you), while Sportradar's decisioning layer is the part big operators could most plausibly in-house — which loops back to the services-commoditization leak.

Caveats in the usual spirit: the product specifics here (Alpha Odds, Vaix, GeniusIQ, BetVision, Synergy, the Kalshi structure) are from desk knowledge plus this session's lighter pass — current scope hasn't been verified against the latest filings, and both roadmaps move fast. The thing to verify in a deep-dive batch is each company's disclosed revenue mix by product line — how much of Sportradar is now MTS/managed services and how much of Genius is BetVision/media — because that's where the AI orientation shows up in numbers rather than press releases.

Planet Labs (PL)

No questions logged yet.

ZoomInfo (GTM)

No questions logged yet.

Appendix: the peer-reviewed publishers compared — who is actually transforming for the AI era

The lens here is not who leverages journals more; it is who is embracing the AI era in a way the market under-recognizes, judged by the trajectory or transformation underway relative to company size. Financials from the main table (FMP, June 9 2026), given as market cap, EV/Sales, YoY revenue growth: RELX ~$92B, ~9.0x, +7%; Wolters Kluwer ~$38B, ~6.0x, +6%; Wiley ~$2.3B, ~1.9x, ~flat. The multiples are the recognition gap stated numerically.
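To make "the recognition gap stated numerically" concrete, the minimal Python sketch below divides each quoted EV/Sales multiple by Wiley's. It uses only the approximate figures above and deliberately ignores growth, margin and balance-sheet differences, which legitimately explain part of any multiple gap, so treat it as a back-of-envelope illustration rather than a valuation. The function name `premium_vs` is illustrative, not from any source.

```python
# Approximate EV/Sales multiples as quoted in the text (FMP, June 9 2026).
ev_sales: dict[str, float] = {
    "RELX": 9.0,
    "Wolters Kluwer": 6.0,
    "Wiley": 1.9,
}


def premium_vs(base: str, multiples: dict[str, float]) -> dict[str, float]:
    """Express each company's EV/Sales as a multiple of the base company's EV/Sales.

    Hypothetical helper for illustration only; it ignores growth and margins.
    """
    base_multiple = multiples[base]
    return {name: m / base_multiple for name, m in multiples.items()}


if __name__ == "__main__":
    for name, ratio in premium_vs("Wiley", ev_sales).items():
        print(f"{name}: {ratio:.1f}x Wiley's EV/Sales")
```

With the figures as quoted, RELX prints at about 4.7x Wiley's sales multiple and Wolters Kluwer at about 3.2x.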

(a) Journal holdings: rank and domain
RELX: Elsevier is the #1 journal publisher by articles and citations (~2,900 journals; The Lancet, Cell Press), dominant in life sciences and medicine. On top sits a layer neither peer has: Scopus indexes ~29,000 journals from ~7,000 publishers, so RELX owns the citation graph of everyone's journals, not just its own. Journals are roughly a third of company revenue; Risk and Legal together are bigger.
Wiley (WLY): #3 commercial publisher (~2,000 journals), broad-spectrum, with genuine franchise strength in materials science and chemistry (the Advanced family, >$70M and growing double-digit) plus exclusive publishing partnerships with ~900 scholarly societies, content it monetizes without owning outright. No citation-graph layer. Journals are ~57% of revenue, making Wiley the only one of the three where the journal thesis is the stock thesis.
Wolters Kluwer (WTKWY): Smallest journal estate: Lippincott's ~300 medical/nursing titles, largely society-owned, in a narrow clinical domain. But the crown jewel sits above journals: UpToDate, with 7,600+ expert contributors continuously synthesizing the literature into care recommendations. It is a derived, perpetually current expert layer, arguably more relevant to AI grounding than raw journals.

(b) Journal-value rank
RELX: 1. Largest, highest-impact corpus + the cross-publisher Scopus layer.
Wiley (WLY): 2. Top-tier breadth and society leverage, but no meta-layer.
Wolters Kluwer (WTKWY): 3 on journals per se, but UpToDate is a different, highly defensible asset class.

(c) Brokering data beyond its own journals (Nexus/clinical-style motions)
RELX: Deliberate non-broker. It refuses corpus licensing to labs; everything embeds in its own grounded products (Scopus AI, ScienceDirect AI, Reaxys/Embase/ClinicalKey AI, Lexis+ AI and Protégé), and it is buying AI-natives (Doctrine, Apr 2026, its second legal-AI deal in 24 months). The one quasi-brokering position: Scopus AI monetizes ~7,000 other publishers' abstracts inside RELX's product, which is aggregation at the metadata layer.
Wiley (WLY): The only true broker of the three. $92M lifetime external AI licensing (three big-tech clients incl. AWS; Perplexity; Anthropic partnership); recurring inference pilots with pharma/chemical/space corporates; OpenEvidence licensing plus an equity stake; Nexus, a licensing agent for 36 partner publishers' content (brokering others' data on a take rate); COA clinical instruments distributed via IQVIA; and Emerald, acquired (Jun 2026) as more corpus to license.
Wolters Kluwer (WTKWY): Middle path: distribution-brokering, not corpus-brokering. No licensing to labs, but it is pushing curated data into third-party surfaces: UpToDate inside Microsoft Dragon Copilot, M365 Copilot and Teams; an Epic pilot pairing GPT-4 with UpToDate content; an expanded OpenAI enterprise collaboration; and Medi-Span Expert AI shipping an MCP server so third-party agents can consume its medication data. That is genuine agent-layer exposure.

(d) Brokering rank
RELX: 3, by explicit choice; highest in-product monetization instead.
Wiley (WLY): 1. Corpus licensing + agency model + clinical infrastructure, all live.
Wolters Kluwer (WTKWY): 2. Partner distribution + MCP/agent exposure, no corpus sales.

(e) Transformation summary: embracing the AI era
RELX: The most advanced AI operator, and the least transformed. Every division has shipped grounded-AI SKUs, and it is consolidating AI-native challengers by acquisition. But this is sustaining innovation: AI features defending and extending an already-premium subscription machine. The business model is unchanged; the strategy is to make sure nothing changes.
Wiley (WLY): Business-model transformation, not feature addition. From subscription publisher to data licensor + licensing agent + clinical-infrastructure provider, with equity stakes in AI distribution (OpenEvidence). AI revenue went 0 → $40M (FY25) → $29M in Q1 FY26 alone, with the recurring share guided to triple. The company is becoming something it wasn't.
Wolters Kluwer (WTKWY): Fast, urgent, and defensive. Under direct attack: OpenEvidence (free, ad-funded, $12B valuation, Jan 2026) is aimed squarely at UpToDate's ~$595M seat-license business. The response is real and rapid (Expert AI signed by more than half of US hospital enterprise customers, ~2,000 hospitals, within months), but the transformation protects existing dollars more than it creates new lines.

(f) Transformation magnitude vs company size
RELX: Low–moderate. Large absolute AI investment, immaterial against ~$12B revenue; fully recognized at ~9x sales.
Wiley (WLY): High. A new revenue category already ~2.5% of sales and compounding, on a $2.3B market-cap company at ~1.9x sales; the multiple says the market sees a books company.
Wolters Kluwer (WTKWY): Moderate. Product architecture genuinely rebuilt, but on ~€5.9B revenue the motion is value-protective; at ~6x sales, the market nets the OpenEvidence threat against the progress.
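The word "alone" in row (e) carries much of the Wiley argument, and a quick calculation shows why. The sketch below compares the quoted Q1 FY26 AI revenue with the full FY25 figure and adds a naive straight-line annualization. That annualization is an assumption, not guidance: licensing revenue tends to be lumpy and deal-driven, and the guided tripling in the comparison refers to the recurring share, which this arithmetic does not isolate.

```python
# Wiley AI revenue figures as quoted in row (e) above (USD millions).
fy25_ai_revenue = 40.0
q1_fy26_ai_revenue = 29.0

# Share of the entire prior fiscal year booked in a single quarter.
quarter_vs_prior_year = q1_fy26_ai_revenue / fy25_ai_revenue

# Naive straight-line annualization. ASSUMPTION: Q1 is representative of the
# full year, which lumpy, deal-driven licensing revenue may well not be.
naive_run_rate = q1_fy26_ai_revenue * 4
run_rate_multiple = naive_run_rate / fy25_ai_revenue

if __name__ == "__main__":
    print(f"Q1 FY26 = {quarter_vs_prior_year:.1%} of FY25 AI revenue")
    print(f"Naive annualized run rate: ${naive_run_rate:.0f}M "
          f"({run_rate_multiple:.1f}x FY25)")
```

On these inputs, one quarter equals about 72.5% of the prior full year, and the naive run rate is $116M, roughly 2.9x FY25; the real test is whether later quarters sustain it.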

Verdict through the stated lens. If the question is who is most transforming to embrace the AI era, under-recognized by the market, the answer is Wiley, and it isn't close — it is the only one of the three where AI is creating a genuinely new business (external licensing, the Nexus agency, clinical infrastructure, equity in AI distribution) that is material relative to the company's size, priced at a multiple that embeds none of it. RELX is the best AI operator of the three, but operating excellence on a recognized premium franchise is the opposite of an under-recognized transformation — at ~9x sales the market has already written the AI-winner thesis into the price. The interesting second-order case is Wolters Kluwer: the >50% enterprise adoption of Expert AI in a matter of months is the single most impressive execution statistic among the three, and if OpenEvidence stalls at the hospital-procurement gate — where governance, liability, and expert-in-the-loop validation matter more than free access — the market's threat-discount becomes the under-recognition. Wiley is the transformation bet; Wolters Kluwer is the survival-mispriced bet; RELX is the quality compounder where the AI question is already answered in the multiple. Usual caveat: qualitative framework, not advice — and the June 16 Wiley print is the nearest falsification test for the first claim.