Review of Data Players in an Era of AI

How to read it. Two columns carry the logic. The nature-of-the-data bullets explain the moat (scarcity, provenance, why it can't be scraped); the AI-unlock bullets track concrete evidence of monetization (licensing deals, MCP/API exposure, grounded products) and separate it from what is still only potential. The tier attached to each is its 1–10 score. The three financial columns (market cap, EV/Sales, YoY revenue growth) then ground the discrepancy score: a green EV/Sales means cheap for the profile, a red one means rich.
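As a rough illustration of how the scored cells can be read, here is a minimal Python sketch. The thresholds are an assumption inferred from the labels that appear in the table (scores of 2–3 are labeled Low, 4–6 Neutral, 7–9 High); the review does not state them. The names `tier_label`, `parse_score_cell`, and `is_consistent` are hypothetical helpers, not part of any tool behind the table.

```python
import re

# Assumed thresholds, inferred from the table's own labels
# (2-3 -> Low, 4-6 -> Neutral, 7-9 -> High); not stated by the review.
def tier_label(score: int) -> str:
    """Map a 1-10 score to the tier label the table appears to use."""
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return "Low"
    if score <= 6:
        return "Neutral"
    return "High"

# A scored cell starts like "Data 9 · High ..." or "Disc 3 · Low ...".
SCORE_CELL = re.compile(r"^(Data|AI|Disc)\s+(\d{1,2})\s+·\s+(Low|Neutral|High)\b")

def parse_score_cell(cell: str) -> tuple[str, int, str]:
    """Extract (kind, score, label) from the start of a scored cell."""
    match = SCORE_CELL.match(cell)
    if match is None:
        raise ValueError(f"not a scored cell: {cell[:40]!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def is_consistent(cell: str) -> bool:
    """Check that the printed label agrees with the assumed thresholds."""
    _, score, label = parse_score_cell(cell)
    return tier_label(score) == label

if __name__ == "__main__":
    for cell in ("Data 9 · High Credit ratings", "Disc 3 · Low Premium"):
        print(parse_score_cell(cell), is_consistent(cell))
```

A check like `is_consistent` is only as good as the inferred thresholds; if the author used different cut-offs, a mismatch would point at the assumption rather than at the table.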

The full table

◆ owner (corpus is the asset) · ○ operator (data is an input).

Company · IR	Data						Financials (FMP, Jun 9 2026)			Assessment
Company · IR	Nature of the data · tier	Data trajectory (stock vs flow)	Position on the AI-unlock curve · tier	Current AI contracts & counterparties	Possibilities for additional contracts	AI risks — what stands to lose	Market cap	EV / Sales	YoY rev growth	Valuation & discrepancy	Convexity & why	Other endogenous concerns	Hype factor (market awareness)	Catalysts
Financial-market data
CME GroupCME○ operator IR / presentations ↗	Data 6 · Neutral Derivatives pricing & trade data High-margin byproduct of the exchange	Growing Derivatives data flow grows with record volumes	AI 5 · Neutral Sells valuable data, but it's not the thesis	~ desk note Sells market data conventionally	Derivatives data into quant/agent stacks	Minimal — clearing/execution moat unaffected	~$93B ~ EV est	~15x	+6%	Disc 3 · Low Premium, well-understood Owner-ish, but data isn't the re-rate	Low Priced, data not the driver	Volume cyclicality; FMX (BGC) attacking rates franchise	Low Not a data-AI story	Volume cycles; data pricing
FactSetFDS◆ owner IR / presentations ↗	Data 6 · Neutral Entity-linked financial data: fundamentals, estimates, ownership, transcripts 'Symbology' deep ticker-linking is the connective tissue agents need But much content is aggregated/licensed, not owned — caps the moat Workflow terminals for buy/sell-side	Steady flow Coverage expands steadily; much content aggregated, not originated	AI 7 · High Conversational FactSet Mercury shipped; 48/50 top clients on AI tools Clean, entity-linked data is ideal RAG fuel for finance copilots Up-ish the curve Aggregated data limits licensing leverage	~ desk note FactSet Mercury + transcript AI; aggregated content limits licensing	Symbology/entity-linking as agent infrastructure	The terminal seat is the product — agents directly substitute analyst workflows Aggregated (non-owned) content gives least pricing defense	~$9.0B ✓ FMP	~4.3x	+5%	Disc 6 · Neutral ~4.3x EV/Sales on +5% growth — quality at a modest multiple Aggregated (non-owned) data caps the moat Modest favorable gap on metrics	Moderate Quality franchise at a modest multiple — some re-rate optionality Aggregated content caps the upside Balanced	Content-licensing input costs (incl. CUSIP) squeeze margins CEO transition; retention metrics softening	Med — as threat De-rated with the info-services group in Feb 2026	Retention metrics; Mercury adoption
Intercontinental Exch.ICE◆ owner IR / presentations ↗	Data 8 · High Dominant US mortgage data (Black Knight/Ellie Mae) — origination/servicing graph Pricing & fixed-income reference data Hard-to-replicate corpus inside an 'exchange' wrapper	Cyclical flow Mortgage data flows with origination cycle; pricing data steady	AI 6 · Neutral Steadily productizing pricing/reference data Mortgage data graph is AI-relevant Mid on the curve	~ desk note In-product mortgage-AI; data feeds sold conventionally	Mortgage-graph grounding for housing/credit agents	Minimal — transaction infrastructure; some data products commoditized	~$80B ~ EV est	~10x	+6%	Disc 4 · Neutral A real owner screens miss (files as an exchange) Mostly priced	Low Quality priced Limited asymmetry	Mortgage tech is deeply cyclical — bought at the top Black Knight deal debt still being digested	Low Read as an exchange, never as a data-AI play	Mortgage cycle; IMB platform wins
Moody'sMCO◆ owner IR / presentations ↗	Data 9 · High Credit ratings (MIS) + Moody's Analytics Orbis: largest private-company database (~500M entities) Default histories + ownership graph — decision-grade Essential grounding for credit agents, KYC, supply-chain AI	Growing Orbis entity graph keeps expanding (~500M+ entities) Ratings/transcript flow continuous; issuance cyclical	AI 9 · High Early OpenAI partnership; Research Assistant copilot MCP distribution into Claude/ChatGPT/Copilot Packaging data for agentic workflows — furthest on distribution High — arguably best-executed, hence richly priced	✓ deep dive Early OpenAI partnership; Research Assistant copilot MCP distribution into Claude/ChatGPT/Copilot No raw licensing — productized access only	Agentic KYC/credit-memo workflows priced per seat Orbis private-company graph as agent grounding	Analytics research/tools face AI commoditization; ratings are regulatorily protected KYC/compliance products meet AI-native challengers	~$79B ✓ FMP	~11x	+9%	Disc 2 · Low Best business + furthest-along AI ~11x sales / ~40x earnings to match Thinnest discount; DCFs flag it rich	Low Best business, thinnest discount, DCF flags it rich Limited upside → low convexity	Ratings revenue rides the debt-issuance cycle Duopoly position invites periodic antitrust/regulatory attention	High Best-executed AI strategy is consensus; it's in the ~11x	Agentic product attach rates Ratings issuance cycle Orbis monetization moves
MorningstarMORN◆ owner IR / presentations ↗	Data 7 · High Fund/ETF data, star & analyst ratings; DBRS credit ratings PitchBook private-markets/VC dataset is the scarce crown jewel Fund data feeds advisor copilots	Growing PitchBook's private-company universe compounds with VC/PE activity	AI 5 · Neutral Mo chatbot + PitchBook AI features Monetization mostly stays in-product Mid on the curve	~ desk note Mo assistant; PitchBook AI features; in-product only	PitchBook private-market data licensing to AI deal tools	Fund research commoditized by AI summarization; ratings brand defensible PitchBook data scraping/inference by AI tools	~$7.0B ✓ FMP	~3.4x	+8%	Disc 6 · Neutral ~$7.0B cap, ~3.4x EV/Sales on +8% growth Cheap for a PitchBook-owning franchise Market under-paying for the private-markets data	Moderate–High PitchBook AI-deal-sourcing optionality, cheaply priced No hard catalyst Cheap enough to tilt positive	Founder (Mansueto) voting control PitchBook decelerated with the VC downturn; DBRS is issuance-cyclical	Low PitchBook's AI value ~absent from the narrative	PitchBook growth; advisor-AI launches
MSCIMSCI◆ owner IR / presentations ↗	Data 8 · High Indices (World, EM) portfolios are built and measured against ESG/climate ratings, Barra factor/risk models, Burgiss private-asset data Benchmarks + factor models are chokepoints Index licensing is a recurring toll-road	Growing Index/factor data grows with markets; private-asset (Burgiss) expanding fast	AI 7 · High IndexAI connector; 'train clients' LLMs' roadmap Solid enterprise APIs Less aggressive than S&P/Moody's Mid/high — capable but measured	~ desk note IndexAI connector; 'train clients' LLMs' roadmap — no licensing $ disclosed	Benchmark/factor licensing to agent platforms	ESG/analytics tools commoditized by AI; index licensing protected	~$44B ✓ FMP	~16x	+10%	Disc 3 · Low ~14x sales — one of the richest here Priced as the premium compounder it is No discount to the quality	Low Richest multiple here Least convex — priced for the quality	Client concentration in fee-pressured asset managers US political backlash against ESG products	Med AI seen as feature, not thesis	Index flows; ESG/private-asset data attach
NasdaqNDAQ◆ owner IR / presentations ↗	Data 7 · High 100+ proprietary market-data feeds Index & analytics data products A licensing toll-road like S&P benchmarks	Growing Market data grows with volumes; Verafin fraud signals compound	AI 6 · Neutral Feeds quant/agent workflows Productized data Up the curve on data productization	~ desk note Verafin AI (fraud), market-data feeds; no LLM licensing line	Surveillance/fraud agents; index licensing	Minimal core risk; market-data products face some AI substitution	~$49B ~ EV est	~12x	+8%	Disc 3 · Low Premium valuation reflects the toll-road Quality owner, little discount	Low Priced toll-road Limited asymmetry	Adenza acquisition debt + integration Crypto-listings exposure adds volatility	Low AI in products, not in the multiple	Fin-crime AI growth; data ARR
S&P GlobalSPGI◆ owner IR / presentations ↗	Data 9 · High Credit ratings, Capital IQ fundamentals/transcripts, Platts benchmarks S&P Dow Jones Indices + Mobility (CARFAX) Benchmarks are licensing toll-roads AI can't route around The grounding layer any financial LLM/agent needs	Growing Daily benchmark prints, transcripts, fundamentals — relentless flow CARFAX events + Mobility add new streams	AI 9 · High Kensho LLM-ready API live since Nov 2024; 300+ customers Anthropic MCP connector + Claude Cowork plugin (Feb 2026) Cohere North partnership (Jun 8, 2026) — sovereign/regulated AI Distribution into Claude, ChatGPT, Gemini, Copilot The most aggressive everywhere-the-agents-are strategy	✓ deep dive Kensho LLM-ready API (Nov 2024), 300+ customers (launch) Claude Cowork plugin + Anthropic MCP (Kensho) Cohere North partnership, Jun 8 2026 (PR)	Per-seat / usage pricing for agentic data access Benchmark licensing to agent platforms (toll-road extension) Private-markets data into AI workflows	Capital IQ desktop seats at risk as agents answer directly (why it sells the data INTO agents) Ratings & indices largely insulated	~$126B ✓ FMP	~8.8x	+8%	Disc 3 · Low ~8.8x EV/Sales on +8% growth — premium largely intact Top-tier AI execution already recognized in the multiple Quality fully priced; no metric discrepancy	Low Quality + best-in-class AI execution already in the multiple Limited discrepancy on metrics Modest two-way payoff	IHS Markit integration legacy; Mobility (CARFAX) is auto-cyclical Index fee compression a slow structural drag	Med-High AI execution is consensus among analysts; the multiple carries only a modest sector AI-threat discount	AI-access revenue disclosure (none yet) More agent-platform embeds Ratings cycle + index flows
Professional-information data (legal · tax · IT advisory)
GartnerIT◆ owner IR / presentations ↗	Data 8 · High 45+ yrs of proprietary syndicated IT/business research from ~2,000 analysts Magic Quadrants & Hype Cycles are de-facto standards CIOs buy on Price, salary & contract benchmarks from thousands of engagements Behind a hard paywall — not on the open web, not freely scrapeable >75% of contract value multi-year recurring, embedded in workflows	Steady — watch the flow Analyst output paced by headcount; inquiry/benchmark data grows with clients CV slowdown = the inflow risk: fewer clients → less peer data	AI 5 · Neutral Two-sided: AI could commoditize 'advice' or make its data the grounding layer Rolling out AskGartner inside client licenses Has NOT licensed its corpus to labs — keeps it walled Contract-value growth slowed to ~1–5% — the market's disruption tell Early on the curve; data-as-grounding thesis unproven	✓ deep dive None — AskGartner ships inside existing client licenses AskGartner live across research portal (example)	Corpus-grounded agent for enterprises (license upsell) Selective API access to benchmarks/peer data Price/SLA tiers for AI-assisted research	The core product IS advice — generalist AI is a direct substitute Seat-based research licenses are the exposed surface Conferences/consulting more defensible	~$10.5B ✓ FMP	~2.1x	+4%	Disc 8 · High ~2.1x EV/Sales for a 77%-gross-margin, mostly-recurring franchise The multiple embeds a full AI-disruption outcome; CV growth ~1–5% is the operational tell Cheapest quality owner on the board on metrics	High · quality-convex Profitable recurring base at ~2x sales bounds the downside Large upside if AI proves additive to the franchise Cheap quality + two-sided AI = positive convexity	Conference/consulting segments are macro-cyclical EPS growth leans on buybacks; sales-force productivity in question	High — as threat Narrative casts Gartner as an AI casualty; AskGartner and the paywalled corpus get little credit	Contract-value growth stabilization (the single tell) AskGartner engagement disclosures Buyback pace
Thomson ReutersTRI◆ owner IR / presentations ↗	Data 9 · High Westlaw: case law, statutes, annotations built over a century Editorial headnotes/KeyCite are irreplicable human layers Practical Law, Checkpoint (tax), Reuters News Legal/tax = highest-value, lowest-hallucination-tolerance use cases	Steady compounding Case law grows with the courts — slow, perpetual accretion Editorial annotations (headnotes/KeyCite) compound on top	AI 8 · High CoCounsel scaling fast — ~1M AI users AI-native Westlaw does grounded retrieval over its corpus Monetizes the data itself High — clear legal-AI leader	✓ deep dive No corpus licensing — deliberate walled strategy CoCounsel: 1M professionals, 107 countries (Feb 2026) (PR) Building proprietary LLM for regulated use cases	Selective agent-platform access to Westlaw (MCP-style) CoCounsel 10x user target = the in-product unlock Tax/audit agentic suites later in 2026	Legal research workflow is the AI battleground — Harvey, Legora, generalist agents Westlaw seat pricing under pressure if agents do the research Reuters news commoditized by AI summarization	~$78B* ✓ FMP	~9.0x	+7%	Disc 3 · Low ~9x EV/Sales on +7% growth — a modest AI-threat discount against its quality CoCounsel at 1M users is distribution the multiple under-credits Premium franchise; the discount is partial, not deep	Low Priced quality; AI leadership reflected Limited convexity	Woodbridge (Thomson family) controls ~70% — governance is theirs Print/legacy declines largely done; tax season concentration	High — as threat Market narrative treats agentic AI as a threat to legal-research seats; CoCounsel distribution under-credited	CoCounsel next-gen GA + adoption metrics ACV growth reacceleration (the proof point) Competitive data vs Harvey/Legora/Claude Cowork
Credit · identity · risk data
EquifaxEFX◆ owner IR / presentations ↗	Data 8 · High The Work Number — unique employer-sourced income/employment records Verified income/employment ground-truth no LLM can infer Utility/telecom payment data extends the picture Gating data for lending, hiring, benefits Contributory — employers feed it (network effects)	Compounding The Work Number records keep growing via payroll integrations Every paycheck is a new record — true flow asset	AI 6 · Neutral EFX.AI built into new product models FCRA permissible-purpose rules cap AI exposure Monetization stays inside regulated rails Mid — gated by regulation, not capability Re-rate is cyclical more than AI-driven	~ desk note EFX.AI in-product; FCRA limits external exposure	Verified-income rails for lending/hiring agents (permissioned)	AI cash-flow underwriting could route around bureau scores at the margin AI-driven synthetic-identity fraud raises cost of trust	~$20B ✓ FMP	~4.0x	+7%	Disc 7 · High ~$20B cap, ~4.0x EV/Sales on +7% growth Cheap for the owner of The Work Number Re-rates on the lending/hiring cycle + verified-income AI demand	High Unique income/employment data at a low multiple FCRA caps direct licensing, but the asset is irreplaceable Cheap + cyclical-recovery optionality = convex	2017 breach legacy = elevated security/regulatory burden Mortgage + hiring volumes are the real earnings driver near-term CFPB / FCRA scrutiny is permanent	Low AI angle absent; mortgage cycle dominates the narrative	Mortgage/hiring recovery; TWN records growth; any agent-rail pilots
ExperianEXPN.L◆ owner IR / presentations ↗	Data 8 · High Third global credit bureau + marketing/identity/fraud data Best organic growth of the three bureaus Verified credit/identity data with network effects	Growing Same bureau flow; strongest organic data investment of the three	AI 6 · Neutral AI products across credit & fraud FCRA-style rules cap ecosystem exposure Mid on the curve	~ desk note Ascend platform AI; in-product	Same permissioned-rails option as EFX/TRU	Same as the other bureaus; strongest product diversification of the three	~$45B* ~ EV est	~6.5x	+7%	Disc 6 · Neutral Reasonable bureau multiple Only friction is access (London listing) Quality peer to EFX/TRU	Moderate Quality + reasonable price Regulation caps the convex upside Balanced	UK listing discount; Brazil FX exposure	Low UK listing keeps it out of the AI conversation	Cycle; NA mortgage volumes
FICOFICO◆ owner IR / presentations ↗	Data 6 · Neutral The FICO score — decisioning standard embedded in US credit More algorithm/standard than raw corpus But the score is a data product with monopoly economics	Derived flow Scores recompute on bureau flow; FICO originates little raw data	AI 6 · Neutral Own FFM foundation model AI lending agents still need an accepted standard Mortgage-pricing change is a catalyst Not a corpus play	~ desk note FICO Foundation Model (FFM) announced; platform AI	Score-as-API inside lending agents	The central AI risk case: AI-native underwriting bypassing the Score Lenders' in-house models + FHFA score competition (VantageScore 4.0)	~$28B ✓ FMP	~14x	+15%	Disc 5 · Neutral ~14x EV/Sales on +15% growth — still premium on metrics Moat contested (VantageScore push, AI underwriting) Two-sided	Moderate De-rated standard with mortgage-pricing optionality But expensive on sales (~14x) Two-sided	Pricing-power backlash: FHFA pushing VantageScore competition in mortgages Revenue concentrated in B2B scores; software segment unloved	Med Debate is pricing power, not AI	Mortgage-score pricing; platform ARR
LiveRampRAMP◆ owner IR / presentations ↗	Data 7 · High Identity graph & data-collaboration network (25k+ publishers) Clean-room identity for the post-cookie/AI-data era	Maintained Identity graph is refresh-maintenance, not accumulation	AI 6 · Neutral Well-placed for AI-data era But the story is now M&A	~ desk note Identity/clean-room infra relevant to AI data flows	—	Acquisition pending — risk transfers to Publicis	~$2.3B ~ EV est	~3.0x	+10%	Disc 2 · Low Being acquired ~$2.5B by Publicis Off the board as a standalone bet Signal: ad-holdcos paying up for identity data	Low Taken out — payoff capped by the deal price	Deal-close risk is the only variable left (~$38.50 cash)	Low Story is now the Publicis acquisition	Deal close (~$38.50)
TransUnionTRU◆ owner IR / presentations ↗	Data 7 · High Credit bureau + identity resolution (Neustar) Links offline identity to digital identifiers Identity graphs matter more as AI agents transact Contributory bureau data with network effects	Growing Credit + identity events flow with economic activity	AI 5 · Neutral OneTru platform, TruIQ agents Identity products quietly AI-relevant FCRA-capped exposure like Equifax Mid on the curve	~ desk note OneTru platform, TruIQ agents; in-product	Identity verification for AI-agent transactions	Same bypass risk as EFX; identity products partly hedge it	~$13.5B ✓ FMP	~3.9x	+8%	Disc 6 · Neutral Cheapest of the three bureaus Modest favorable gap Same regulatory ceiling	Moderate Cheapest bureau + cycle/identity optionality FCRA caps the convex upside Balanced	Neustar deal leverage; UK consumer business weak Same CFPB overhang	Low Same as EFX — cycle story, not AI story	Cycle turn; Neustar identity products
VeriskVRSK◆ owner IR / presentations ↗	Data 9 · High Decades of contributory claims, loss & property/peril data Nearly all US P&C insurers both feed and buy it back Catastrophe models built on the loss history Near-monopoly; no AI lab can rebuild it	Steady compounding Contributory model: every insurer claim feeds it, by contract Cat-event data grows with each season	AI 5 · Neutral Generative/agentic AI in underwriting/claims products Consortium-locked — not licensed to the open ecosystem Value unlock in-product, not via licensing Mid — deepest moat, deliberately walled	~ desk note Consortium AI in underwriting/claims products	Walled option: claims-history grounding for insurance agents	Insurers building AI on their own claims data could weaken the consortium pull	~$24B ✓ FMP	~9.0x	+7%	Disc 5 · Neutral Deep moat, but ~9x sales / ~7% growth Fully paid for Quality High, value Low-ish	Low–Moderate Near-monopoly data, but premium & walled Bounded downside, limited upside Low asymmetry	Consortium members push back on pricing; class actions over contributory data use	Low-Med Quality priced; AI not separately valued	Product attach; pricing renewals
Healthcare · life-sciences data
CencoraCOR○ operator IR / presentations ↗	Data 5 · Neutral Pharmacy/dispensing & distribution data Optimizes thin-margin logistics	Steady flow Distribution data tracks volumes	AI 3 · Low Logistics input, not sold	~ desk note Logistics AI internal	—	Low — physical distribution	~$54B ~ EV est	~0.1x	+10%	Disc 4 · Neutral Fair defensive distributor Data-rich, not a data owner	Low Defensive, data not a driver	Drug-pricing policy; thin-margin model	Low Not an AI story	Distribution volumes
Definitive Health.DH◆ owner IR / presentations ↗	Data 7 · High Healthcare commercial intel: providers, claims, affiliations, install-base 'The ZoomInfo of healthcare' — sells intelligence to life-sciences/med-tech A pure data owner, not a marketplace Continuously refreshed healthcare-entity graph	Slowing Refresh continues but shrinking revenue funds less data collection	AI 6 · Neutral Real owner, but AI is as much threat as tailwind Limited AI productization so far Mid/behind — business being repriced Erosion risk from AI-generated provider signal	~ desk note None disclosed	Healthcare-commercial grounding data for pharma AI	AI-generated provider intelligence directly substitutes the core product — erosion already visible	~$0.1B ~ EV est	~2.0x	−8%	Disc 6 · Neutral ~$0.1B cap, ~2x EV/Sales on declining revenue Distressed micro-cap; the data is better than the equity Cheap for existential reasons	High · distressed Distressed micro-cap → option on stabilization or M&A Declining revenue is the live left tail Cheap healthcare-commercial data if it survives	PE overhang (Advent), serial goodwill writedowns, micro-cap liquidity	Low Micro-cap; no AI narrative attaches	Revenue stabilization; strategic review odds
DoximityDOCS◆ owner* IR / presentations ↗	Data 6 · Neutral Verified network of most US physicians The asset is the audience/engagement, not a corpus Workflow tools for doctors	Saturated graph Most US physicians already on it — the graph is mature Engagement/newsfeed data still grows; the asset is breadth, not flow	AI 6 · Neutral Strong AI tools (Doximity GPT), huge engagement But no AI revenue in guidance Data asset is the audience, not a corpus	~ desk note Doximity GPT free for physicians; ad AI in-product	Clinician-verified channel for healthcare AI distribution	Physician attention shifting to AI clinical tools (OpenEvidence et al.) Pharma ad budgets could follow attention into AI channels	~$3.8B ✓ FMP	~5.6x	+13%	Disc 5 · Neutral ~$3.8B cap, ~5.6x EV/Sales on +13% growth Far from the ~18x I'd assumed — reasonable now Verified clinician graph; audience-not-corpus caps licensing	Moderate Verified clinician graph + AI tools, now at a fair multiple Audience-not-corpus caps the data-licensing upside Balanced after the de-rate	Pharma ad-budget concentration; engagement metrics are the whole story	Med Was priced for AI hopes; now reset to fair	Ad market; AI tool engagement
ElevanceELV○ operator IR / presentations ↗	Data 6 · Neutral Claims/care-management data via Carelon Latent separable data asset Used to lower its own medical costs	Steady flow Claims flow with membership; flat membership = flat flow	AI 4 · Neutral AI care-management lowers internal costs Closest operator to a separable data asset Still not pure-play	~ desk note Carelon internal AI	Separable claims-data asset (never signaled)	Low direct risk; AI mostly a cost lever	~$92B ~ EV est	~0.4x	+5%	Disc 5 · Neutral Cheap, but on insurer fundamentals Latent data optionality (Carelon) Cyclical	Moderate De-rated insurer with latent data optionality Cyclical, not a data re-rate Mildly positive	Medical-cost trend + Medicaid redeterminations; ACA subsidy politics	Low Insurer story	Medical-cost trend; Carelon growth
GoodRxGDRX○ operator IR / presentations ↗	Data 5 · Neutral Rx-pricing & consumer prescription-behavior data Unique data, but an input to a discount platform Platform under structural pressure	Steady flow Pricing data flows; nothing accumulating in value	AI 4 · Neutral Data feeds the platform; not licensed as a corpus Limited AI productization	~ desk note None disclosed	Rx-pricing data into consumer-health agents	AI agents compare drug prices directly, disintermediating the front end	~$0.9B ~ EV est	~1.3x	~flat	Disc 4 · Neutral Cheap, but pressured core Marginal owner with hard-to-monetize data	Moderate · binary Cheap with stabilization optionality But structural pressure on the core Binary-ish	PBM dependence — a single partner change (Kroger '22) cratered it once	Low No AI narrative	Platform stabilization
Guardant HealthGH◆ owner IR / presentations ↗	Data 9 · High Liquid-biopsy genomic + clinical-outcomes data in oncology Proprietary, scarce — a direct Tempus peer Longitudinal molecular profiles track tumor evolution Cannot be assembled from public sources	Compounding fast Test volumes +25–35%/yr; each test extends longitudinal profiles	AI 6 · Neutral Pharma data partnerships + co-development, earlier-stage Smart Platform multiomic insights Building the 'co-develop on our data' motion Mid — monetization layer still forming	~ desk note Pharma data partnerships (earlier-stage than Tempus); Smart Platform	Tempus-style co-builds on liquid-biopsy data	Interpretation commoditizes; raw assay + outcomes data is the defensible part	~$17B ✓ FMP	~17x	+33%	Disc 5 · Neutral Scarce data, but ~17x sales and unprofitable Analyst upside exists Expensive growth, not cheap	Moderate Scarce-data optionality, but ~17x sales + unprofitable cap it More a growth bet than an option Balanced, positive tilt	Cash burn continues; patent litigation history with Natera Screening (Shield) economics still unproven at scale	Med Priced as diagnostics growth; data angle secondary	MRD reimbursement; pharma deal announcements
IQVIAIQV◆ owner IR / presentations ↗	Data 9 · High World's largest pharmacy-claims & prescription dataset (ex-IMS Health) Population-scale real-world evidence across global Rx Clinical-trial operational data as the largest CRO (ex-Quintiles) De-identified, compliance-grade — built under HIPAA/GDPR, unscrapeable Sold to virtually every major pharma	Compounding Rx/claims flow is continuous and population-scale Trial operational data compounds with every study run	AI 7 · High IQVIA.ai unified agentic platform (Mar 2026): 150+ agents deployed NVIDIA partnership since Jan 2025 — custom foundation models on its data 19 of top 20 pharma already using IQVIA agents; 100+ AI patents Builds agents ON the data rather than licensing it out No longer latent — monetization architecture is live	✓ deep dive NVIDIA partnership (Jan 2025) → IQVIA.ai platform, Mar 2026 (PR) 150+ agents live; 19 of top-20 pharma using them (report) 100+ AI patents; agents built ON proprietary data, not licensed out	Agent subscriptions as a separate revenue line RWE feeds for medical LLMs (compliance-wrapped) Trial-design agents priced on outcomes	CRO services half is labor-heavy — AI compresses what pharma will pay for it Pharma in-housing analytics with AI tools	~$31B ✓ FMP	~2.7x	+6%	Disc 7 · High $16.3B FY25 rev, +5.9% (~7% TTM) Low-single-digit sales multiple for unique data ~$13B net debt is the caveat	High ~2.7x EV/Sales on +6% growth — cheap for the scarcest Rx data Locked in compliance contracts; low AI surface today Cheap + latent-unlock optionality = convex	~$13B net debt limits flexibility CRO bookings cyclical; pharma R&D budgets squeezed (IRA effects)	Low → rising Cheapest scarce-data name; IQVIA.ai barely registers in the multiple yet	Next earnings: ~late July 2026 (Q1 reported May 5 — beat; EPS guide raised) IQVIA.ai adoption: now 192 agents / 64 use cases; watch for monetization disclosure (Q1 call) R&DS backlog $32.7B (+5.3%); Q4 book-to-bill 1.18x — bookings reacceleration is the proof point $1.2B buyback remaining ($552M done in Q1) Duke obesity-trials collaboration (Feb 2026) — fastest-growing trial category De-leveraging from 3.62x / $13.9B net debt frees the multiple
NateraNTRA◆ owner IR / presentations ↗	Data 9 · High Genetic-testing / cfDNA data (MRD, prenatal, transplant) Large, fast-growing proprietary genomic dataset Outcome-linked longitudinal data is the durable asset Same scarce-data position as Guardant/Tempus	Compounding fast Fastest test-volume growth in the group; outcome links accrue with time	AI 6 · Neutral Owns the data; data-layer monetization still maturing Strong clinical-validation pipeline feeds the dataset Files as diagnostics, so screens miss it Mid on the curve	~ desk note Data feeds pharma trials; in-product AI	Outcome-linked genomic licensing	Same as GH — value migrates from interpretation to the longitudinal data	~$32B ✓ FMP	~12x	+36%	Disc 5 · Neutral Irreplaceable data, ~12x sales on ~30% growth Quality High; multiple says priced, not discounted Volatile equity	Moderate Data optionality vs a rich ~12x multiple Roughly balanced, slight positive tilt	Reimbursement concentration (Medicare MRD decisions) Billing-practice scrutiny; GH litigation	Med Same — growth story, data unpriced	MRD adoption; new indications
Tempus AITEM◆ owner IR / presentations ↗	Data 9 · High Multimodal clinical + genomic data (~500-PB) pairing sequencing with clinical records Scarcest, most valuable category for biomedical AI — unscrapeable Built explicitly as an AI data company 140% net revenue retention on Insights/data Linked outcomes data is what makes it irreplaceable	Compounding fast ~300PB and growing; every test adds linked clinical+genomic data (Q1 letter) Sequencing volumes growing ~25–30%/yr — the corpus is the byproduct of revenue	AI 8 · High $200M AstraZeneca/Pathos deal (Apr 2025): largest oncology foundation model Total remaining contract value >$1B; non-exclusive — can resell the motion Data customers: AZ, Novartis, Merck KGaA, Takeda, Boehringer, United Therap. Illumina collaboration trains genomic algorithms on its multimodal data Insights (data licensing) growing ~58%	✓ deep dive $200M AstraZeneca/Pathos data+model deal over 3 yrs (PR) Total remaining contract value >$1B (Q1 letter) Data customers: Novartis, Merck KGaA, Takeda, Boehringer, United Therap. Illumina algorithm-training collaboration	Non-exclusive foundation-model co-builds with other pharma Expansion beyond oncology (cardio, neuro) Open-source pathology consortium as a funnel	Pharma could in-house modeling after learning from co-builds Interpretation layer could commoditize; the data itself is the hedge	~$8.5B ✓ FMP	~6.5x	+83%	Disc 6 · Neutral ~$8.5B cap, ~6.5x EV/Sales on +83% growth Strikingly cheap for the growth + scarcest biomedical data Priced like a normal growth co, not the data monopoly it's building	High · growth optionality Foundation-model + licensing optionality could make it the oncology-AI data layer Rich multiple + cash burn are the downside Large, real optionality = convex growth bet	Founder super-voting control; Pathos is Lefkofsky-affiliated (related-party optics on the $200M deal) Convertible debt + only just adj-EBITDA positive Short-seller scrutiny history (data-quality claims)	High AI is in the name and the multiple — but >$1B RCV arguably still under-modeled	Next earnings: ~early Aug 2026 (Q1 reported May 5 — guidance raised) (Q1 8-K) 2026 guide raised to $1.59–1.60B revenue / ~$65M adj EBITDA — the leverage inflection MRD volume ~6,500 tests in Q1, +500% YoY — reimbursement decisions are the swing TCV >$1.1B; 70+ pharma data customers — watch new (non-exclusive) co-builds Insights (data licensing) +44% in Q1 — the annuity compounding
Veeva SystemsVEEV◆ owner IR / presentations ↗	Data 7 · High Life-sciences CRM + proprietary OpenData/Link (HCP & reference data) Pharma depends on its reference data A separable corpus inside the SaaS	Growing OpenData/Link refreshed continuously; usage data grows with seats	AI 7 · High AI embedded in pharma workflows Up the curve Vertical-SaaS leader	✓ deep dive AI agents shipping across CRM/Vault (Dec 2025 wave) OpenData/Link reference data feeds its own AI	Agent pricing on top of seats Link data into pharma AI pipelines	Vertical-SaaS pricing under the same agentic pressure as all seats ('SaaS-pocalypse') AI app-builders lower barriers to bespoke pharma tools	~$27B ✓ FMP	~7.7x	+16%	Disc 5 · Neutral Premium SaaS multiple Data is a real, under-discussed asset Equity priced for quality	Low Premium SaaS; data underrated but equity priced Limited asymmetry	Salesforce→own-platform CRM migration is a multi-year execution risk Core TAM maturing; growth depends on new apps	Med Read as a quality SaaS with AI features, not a data owner	Agent adoption metrics Vault CRM migration completion
Consumer · user-generated · marketplace data
CarvanaCVNA○ operator IR / presentations ↗	Data 4 · Neutral Transactional used-car e-commerce & trade data Tunes its own pricing/inventory	Growing Transaction/pricing data grows with units; internal	AI 3 · Low Input, not the product	~ desk note Internal pricing AI	—	Low direct AI risk	~$76B ~ EV est	~3.5x	+30%	Disc 3 · Low Volatile, richly valued Weak fit for the screen	Low High beta but valued on retail growth, not data	Garcia family control + related-party history; leverage rebuilt the equity once already	Low Retail story	Unit economics
CoStar GroupCSGP◆ owner IR / presentations ↗	Data 8 · High Verified CRE comps/property data, 35-yr research army LoopNet, Apartments.com, Homes.com Unscrapeable, walled inside terminals	Compounding Research army keeps verifying; comps accumulate permanently Zonda adds a housing-data stream	AI 3 · Low Walled, litigious; data locked in terminals — minimal AI surface Heavy Homes.com ad spend Strategic data, low AI surface area	✓ deep dive None — deliberately walled; litigious vs scrapers Zonda acquisition ($800M) extends housing data	The big withheld option: licensed CRE grounding for real-estate AI Homes.com AI search features	AI aggregation/scraping pressure on listings; Google entering for-sale listings (BTIG flag) Verified CRE comps hardest to substitute	~$14B ✓ FMP	~4.0x	+19%	Disc 6 · Neutral ~$14B cap, ~4.0x EV/Sales on +19% growth Much cheaper than I'd shown; Homes.com spend masks margins Unscrapeable CRE data at a reasonable price	Moderate–High Unscrapeable CRE data, now cheap Low AI surface + heavy ad spend cap near-term Re-rate optionality as Homes.com spend rolls off	Homes.com spend is an act of will (founder-CEO); activist pressure has surfaced Serial litigation posture cuts both ways	Low AI never part of the story; the data optionality is free at ~4x	Homes.com spend roll-off (margin catalyst) Zonda integration Any posture change on data access
DuolingoDUOL◆ owner* IR / presentations ↗	Data 6 · Neutral One of the largest learning-interaction datasets (50M+ DAU) Granular data on how people learn, err & retain across 100+ courses Used in-product to tune pedagogy — not licensed Value captured as engagement, not a sellable corpus	Compounding Learning interactions scale with DAUs (50M+, growing) Every exercise answered is new pedagogy data	AI 6 · Neutral AI-first (Gen-AI 'Max', AI video calls) Shipped 148 courses in a year via generative AI Unlock shows up as engagement/ARPU, not a licensing line Mid — AI deepens the product moat	✓ deep dive None out; heavy OpenAI/GenAI consumer (Max, AI courses) 148 AI-generated courses shipped in a year	Learning-data licensing (never signaled) AI-tutor pricing tiers	ChatGPT as a free language tutor — the central substitution threat Defense: gamification + structure, not content	~$5.5B ✓ FMP	~4.0x	+39%	Disc 6 · Neutral ~4.0x EV/Sales on +39% growth — cheap on growth metrics AI-disruption fear embedded in the multiple A growth franchise at a non-growth multiple	Moderate–High ~4x on +39% growth bounds the downside if growth holds Upside if AI features lift engagement/ARPU Positive convexity on metrics	Founder control; monetization-vs-engagement tension Still expensive on earnings even after the crash	High — as threat Narrative says ChatGPT kills language learning; the AI-first operating model is ignored	DAU/booking growth stabilization Max attach rate Energy/engagement metrics
MercadoLibreMELI○ operator IR / presentations ↗	Data 6 · Neutral LatAm marketplace purchase + fintech/credit data Powers its own ads/lending (input)	Compounding Purchase + credit data compounds with GMV growth	AI 3 · Low AI for marketplace/credit optimization Not a sold corpus	~ desk note Internal AI for ads/credit	—	Low; AI mostly an internal lever	~$83B ~ EV est	~3.5x	+35%	Disc 3 · Low Premium growth stock Data doesn't re-rate it	Moderate High growth, but valued on the business, not the data	LatAm FX/political risk; credit-book quality through cycles	Low Not a data play	LatAm growth; fintech credit
NetflixNFLX○ operator IR / presentations ↗	Data 7 · High Viewing/interaction data across ~300M members Real moat for recs/greenlighting Strictly internal — never licensed	Growing Viewing data grows with engagement; internal-only	AI 2 · Low Never licensed; AI = better curation only Internal-use data	~ desk note Internal only — never licensed	—	GenAI lowers content-production barriers for rivals (long-term)	~$343B ~ EV est	~8.0x	+14%	Disc 2 · Low Premium mega-cap on subscriber economics n/a as a data play	Low Priced mega-cap, data internal	Content-spend discipline vs growth; live/sports costs	Low Recs AI assumed, not valued separately	Sub growth; ads tier
RedditRDDT◆ owner IR / presentations ↗	Data 9 · High 100k+ communities, two decades of upvote-ranked human conversation Largest archive of authentic opinion, troubleshooting, niche expertise Exactly what LLMs lack: recommendations, lived experience, long-tail Q&A Surfaces disproportionately in AI answers Classified as social media, not 'data services'	Compounding DAU still growing; posts/comments compound the archive daily Two decades of vote-ranked history can't be replicated retroactively	AI 9 · High $203M aggregate contract value disclosed at IPO (Google + OpenAI) ~$130M/yr run-rate ≈ 10% of revenue; Google ~$60M/yr, OpenAI ~$70M/yr #1 most-cited source across AI models (~3x Wikipedia) Google renewal under negotiation — pushing usage-based pricing Litigates unlicensed scrapers (incl. Perplexity suit)	✓ deep dive $203M aggregate disclosed at IPO (TechCrunch) Google ~$60M/yr; OpenAI ~$70M/yr ≈ 10% of revenue (SEL) 2–3 yr terms struck Jan 2024 — now in renewal window	Google renewal at usage-based rates (mgmt: 'open for business') Anthropic / Meta / xAI remain unlicensed Dynamic per-citation pricing models Int'l + vertical (commerce intent) licensing	Google AI Overviews already cut logged-out traffic (the 2025 user-growth scare) AI-generated content pollution threatens corpus authenticity Meta forums app targets the community moat	~$34B ✓ FMP	~13x	+69%	Disc 5 · Neutral Best corpus + fastest unlock, but ~13x sales ~69% growth supports it — priced FOR growth Quality off the charts; valuation not a gap	Moderate Big growth/licensing optionality = upside call But ~13x sales means real drawdown if growth slows Net mildly positive from the licensing option	Community/moderator revolt risk is structural (2023 API blackout precedent) Altman's stake = governance optics Ad business still ~90% of revenue and competitive	High The AI-data story IS the stock; renewal terms are the swing	Google contract renewal & structure (report) Scraper litigation incl. Perplexity suit Meta forums app traction (the bear case) Data-licensing line in quarterly prints
TripAdvisorTRIP◆ owner IR / presentations ↗	Data 4 · Neutral ~1B travel reviews; Viator experiences marketplace Widely scraped & substitutable Reviews feed AI trip-planning agents	Slowing risk ~1B cumulative, but contributions follow visits — and AI answers divert visits The corpus ages if the flywheel slows	AI 4 · Neutral Perplexity partnership (Jan 2025) now a measurable booking channel ChatGPT app launch partner (Oct 2025) for trip planning Distribution-into-AI strategy, not paid corpus licensing Viator + TheFork now >50% of revenue — the real value	✓ deep dive Perplexity partnership, Jan 2025 — hotels customer-acquisition channel (PR) ChatGPT app launch partner, Oct 2025 (report)	Paid licensing of the review corpus (currently given for distribution) Viator inventory as the bookable layer inside AI agents	AI trip planners bypass the site entirely — the core meta business is the casualty Viator/TheFork partially insulated (fulfillment, not discovery)	~$1.4B ~ EV est	~0.7x	+3%	Disc 5 · Neutral ~$1.4B cap, ~0.7x EV/Sales on +3% growth Very cheap, but reviews are being disintermediated Value is Viator/TheFork, not the review corpus	Low–Moderate Cheap but melting Weak optionality	Viator faces GetYourGuide/Klook competition; legacy meta declines Post-Liberty structure leaves strategic questions	Med AI read as existential threat; partnerships seen as defensive, not monetizing	AI-channel booking disclosures Membership program launch Viator/TheFork growth (the real value)
YelpYELP◆ owner IR / presentations ↗	Data 6 · Neutral ~330M geocoded local-business reviews Structured local sentiment for 'best X near me' Classified as internet content, not data services	Steady flow — not melting 22M new reviews in 2025 (vs 21M in '24); corpus 330M, +7% YoY (FY25 PR) What's melting is consumption (app engagement), not contribution — yet Risk: contribution follows traffic with a lag	AI 6 · Neutral Signed OpenAI agreement (disclosed Feb 2026) Perplexity has used Yelp local data since Mar 2024 'Other revenue' +17% on data licensing & transactions Expanding Yelp Assistant; Hatch acquisition (AI front-desk) Core local-ad business still the eroding center	✓ deep dive OpenAI agreement signed (Feb 2026, terms undisclosed) (FY25 PR) Perplexity has integrated Yelp local data since Mar 2024 Data licensing inside 'Other revenue' (+17%)	More assistant integrations (Gemini, Claude, Alexa-class) Usage-priced local-data API Transactional referrals from AI answers	AI assistants answer 'best X near me' without a Yelp visit — ad impressions leak Google's AI search squeezes the top of funnel	~$1.3B ✓ FMP	~0.9x	+3%	Disc 6 · Neutral ~$1.3B cap, ~0.9x EV/Sales on +3% growth <1x sales — cheap, but the ad core is eroding AI-distribution optionality vs value-trap risk	Moderate · binary Cheap with AI-distribution optionality vs an eroding core Binary-ish	Own antitrust fight with Google (plaintiff) — outcome cuts both ways SMB advertiser churn; restaurant/retail ads already shrinking	Low-Med OpenAI deal is new and barely in the price; story still read as 'Google victim'	OpenAI deal revenue contribution 'Other revenue' growth each quarter Services-ads resilience vs AI search
ZillowZ/ZG◆ owner* IR / presentations ↗	Data 5 · Neutral Zestimate + listing data + largest US housing audience Much listing data is MLS-shared, not fully proprietary Consumer housing intent data	Churning flow Listings turn over rather than accumulate; Zestimate history compounds quietly	AI 5 · Neutral Strong in-app AI; partial data moat Real-estate AI agents could use it Mid on the curve	~ desk note In-app AI (natural-language search); MLS data shared	Housing-intent data for real-estate AI agents	AI agents could search listings directly; Zillow's audience moat = the defense Low risk to Zestimate itself	~$8.6B ✓ FMP	~3.1x	+16%	Disc 6 · Neutral ~$8.6B cap, ~3.1x EV/Sales on +16% growth Cheaper than I'd shown; partial (MLS-shared) moat Housing-cycle leverage on top	Moderate Housing-cycle optionality Partial moat caps the data upside Balanced	NAR commission-settlement reshapes agent economics — its customers' wallets Housing-cycle beta; Showcase/mortgage execution	Med AI features noted, not a data thesis	Housing cycle; Showcase attach
Peer-reviewed journal publishing data
RELXRELX◆ owner IR / presentations ↗	Data 10 · High Elsevier science (The Lancet, Cell, Scopus) — peer-reviewed at scale LexisNexis legal + LexisNexis Risk Solutions (identity/fraud) Three of the most defensible corpora on earth in one company Scientific literature is critical for frontier capability	Growing Global science output grows mid-single-digit %/yr; submissions rising Caveat: AI-generated paper flood is a quality-control burden	AI 8 · High Lexis+ AI, Scopus AI, ClinicalKey AI, Protégé all live Embeds data in grounded retrieval vs raw training access Among the best-positioned grounded-AI owners High — productization mature & shipping	✓ deep dive No raw licensing; grounded products only Lexis+ AI, Scopus AI, ClinicalKey AI, Protégé all shipped	Elsevier corpus licensing remains a withheld option (big if ever) Agent-access tiers to Scopus/Lexis Risk-data feeds into KYC agents	Lexis faces the same legal-AI insurgency as Westlaw Elsevier: AI summarization + open access erode subscription rationale Risk division most insulated	~$92B* ✓ FMP	~9.0x	+7%	Disc 4 · Neutral ~9x EV/Sales on +7% growth — a small AI-threat discount embedded Grounded AI products shipping across all three corpora Durable compounder; thesis is durability, not deep value	Low Fully-valued premium compounder Durable, but limited asymmetry either way	Open-access mandates (Plan S) pressure Elsevier's model Exhibitions segment is cyclical	High — as threat Same legal-AI threat narrative as TRI; grounded-product execution under-credited	Lexis+ AI penetration disclosures Any Elsevier AI-licensing posture change FY guide post-crash
WileyWLY◆ owner IR / presentations ↗	Data 7 · High Peer-reviewed STM journals/books; Cochrane co-publishing Vetted scientific text — what labs pay for to lift capability A 'smaller Elsevier' — quality corpus, narrower than RELX Editorial vetting + citation links add provenance Proprietary, not freely on the open web	Growing Submissions +25%, output +13% — the journal flow is accelerating (Q1 PR) Caveat: some of that surge is AI-assisted writing — vetting is the product	AI 7 · High $92M lifetime AI-licensing revenue; $29M in Q1 FY26 alone Anthropic strategic partnership (Sep 2025) + projects with 3 top tech cos Recurring inference pilots: pharma, chemical, space-exploration cos One of the few names with disclosed, recurring AI revenue Recurring AI line gives it proven monetization few peers can show	✓ deep dive $92M lifetime AI revenue; $29M in Q1 FY26 (PR) Anthropic strategic partnership (Sep 2025) Projects with 3 of the largest tech cos (unnamed) Recurring inference pilots: pharma, chemical, space	Convert pilots → recurring corporate R&D subscriptions License on behalf of partner publishers (agency model) Agent-citation / RAG licensing beyond training	AI summarization reduces per-article reading; open access erodes paywalls AI-written paper flood strains (and ironically validates) peer review	~$2.3B ✓ FMP	~1.9x	~flat	Disc 7 · High ~1.9x EV/Sales with disclosed, recurring AI-licensing revenue ($92M lifetime) Flat underlying top line is the offset Cheap on metrics for the rare proven AI licensor	Moderate Low multiple + proven licensing = bounded downside with optional upside Flat core growth caps the slope Asymmetry modest but positive	Library budget pressure + consolidation of academic spend Post-divestiture portfolio still re-finding growth	High AI-licensing story is prominent in coverage; expectations now elevated	Next earnings: Tue June 16, 2026, pre-market — FY26 Q4 + FY27 guide (notice) AI recurring revenue <10% of AI revenue today; mgmt expects the proportion to triple next year (Q3 call) OpenEvidence partnership: 5-yr multimillion licensing + Wiley equity stake Nexus licensing service at 36 publishing partners — the agency model scaling Emerald Publishing acquisition (Jun 2, 2026) adds proprietary research corpus Q3 raised margin/EPS guidance to high end; ~4.5% dividend while you wait
Wolters KluwerWTKWY◆ owner IR / presentations ↗	Data 9 · High Legal, tax, health & regulatory information + workflow (CCH, UpToDate) UpToDate is a premier point-of-care clinical reference Authoritative corpora like RELX/Thomson Reuters Subscription, deeply embedded in workflows	Steady flow Regulatory/tax/clinical updates are a built-in perpetual flow	AI 7 · High AI workflow tools shipping across segments Same grounded-AI position as RELX/TRI Up the curve, productizing its corpus	~ desk note AI embedded in UpToDate/CCH; no corpus licensing	Clinical-grounding deals for medical AI (UpToDate is the prize)	UpToDate's clinical-reference franchise faces AI-native rivals (e.g. OpenEvidence) Tax/legal workflow seats exposed like TRI/RELX	~$38B* ✓ FMP	~6.0x	+6%	Disc 3 · Low Premium compounder AI quality understood & paid for Durability, not discount	Low Durable but fully valued Limited asymmetry	CEO transition (long-tenured McKinstry era ended) Health segment competition intensifying	Med Quality understood; AI optionality not separately priced	UpToDate AI products; FY guide
Research analytics · IP · content data
ClarivateCLVT◆ owner IR / presentations ↗	Data 7 · High Web of Science — citation graph linking ~2B scientific citations Derwent (patents) + Cortellis (drug-pipeline intelligence) ProQuest academic content: dissertations, archives, ebooks Valuable for research/IP agents — 'a poor man's Elsevier' Data quality seen as better than the company's execution	Steady flow Citations/patents grow with global publishing — steady, not accelerating	AI 5 · Neutral Signed access deals (Anthropic) + MCP exposure AI research assistants in pipeline, slow to ship Citation + patent networks useful for IP/research AI ~$4.5B net debt constrains reinvestment Behind on the curve — data ready before the company	✓ deep dive Anthropic access agreement + MCP exposure for Web of Science No disclosed $; debt limits investment	Patent/citation grounding for research agents ProQuest licensing to labs	AI literature tools (Elicit, Semantic Scholar) bypass Web of Science discovery Patent search AI-commoditized	~$1.5B ✓ FMP	~2.3x	−4%	Disc 7 · High ~2.3x EV/Sales (EV ~$5.7B, mostly debt) on $2.46B rev Equity (~$1.5B) is a small levered stub Cheap on sales, but the debt is the risk	High · distressed option Small equity stub over ~$4.5B debt ≈ a call option on the enterprise Bounded loss, multi-bagger upside if it de-levers/monetizes Convex but a high-probability left tail — size accordingly	~$4.5B debt wall dominates everything PE overhang; serial restructurings and writedowns	Low Debt story drowns the data story entirely	De-leveraging milestones Any AI-licensing disclosure Segment divestitures
Getty ImagesGETY◆ owner IR / presentations ↗	Data 8 · High ~500M licensed, rights-cleared, caption-annotated images & video Exclusive editorial archives spanning a century iStock + Unsplash extend the catalog across tiers Rights-cleared image–text pairs = ideal multimodal training data Legal indemnification is the product AI builders need	Strong flow + archive 160k+ events covered/yr; ~600k creators; thousands of assets ingested daily (Q2 PR) Editorial is a daily flow machine, not just a vault — FY25 grew both segments Risk is creative-side inflow: genAI erodes contributor economics	AI 6 · Neutral Perplexity multi-yr display deal (Oct 2025) Generative tools with NVIDIA; licensed-data posture vs scrapers Shutterstock merger (UK-cleared May 2026) adds its lab licensing deals Litigation (Stability AI) continues to define the rights frontier Licensing not yet replacing what AI takes from stock demand	✓ deep dive Perplexity multi-yr display deal, Oct 2025 — undisclosed $ (PR) NVIDIA-powered licensed generative tools (Getty/iStock) Shutterstock brings lab deals (OpenAI, Meta, Apple, Amazon) post-merger	Post-merger: consolidated licensed-visual-data vendor to every lab Display/attribution deals with other AI search products Indemnified training data as a product line	GenAI image substitution is already in the creative numbers Editorial (real events) is the un-generatable refuge	~$0.3B ✓ FMP	~1.5x	+4%	Disc 7 · High ~$0.3B cap — a deep-distress equity stub over ~$1.3B+ debt ~1.5x EV/Sales on ~$0.9B revenue Cheap + levered = a lottery ticket on the data	High · lottery Distressed, levered equity on ideal data — near-binary Multiplies on a licensing/M&A catalyst, or drifts to zero Steeply convex, lowest-conviction high-convexity name	~$1.3B+ debt; controlled company (Getty family + Koch) Shutterstock integration risk; CMA found UK editorial concerns (remedies)	Med-High Every AI headline attaches to it; the balance sheet, not awareness, is the constraint	Shutterstock merger close (UK-cleared May 2026) Combined AI-licensing revenue line Stability AI litigation outcomes Debt refinancing
PearsonPSO◆ owner IR / presentations ↗	Data 6 · Neutral Education content, assessment & learning-outcome data Proprietary curriculum + testing content Education is an AI-disruption epicenter	Steady flow Assessment/courseware data flows with enrollment	AI 5 · Neutral AI partnerships to license/embed content Two-sided disruption: tutoring threat + licensing optionality Mid on the curve	~ desk note AI partnerships announced 2025 with Microsoft, Google Cloud & AWS for learning products	Curriculum licensing into AI tutors; assessment data moats	AI tutors substitute courseware — the existential half of the two-sided story Assessment/credentialing more defensible	~$9.3B ~ EV est	~2.0x	+3%	Disc 5 · Neutral Disruption discount Optionality + threat both real Owner with genuine two-sidedness	Moderate Content-licensing/AI-tutoring optionality vs a real disruption threat Two-sided convexity	Enrollment cliffs + OPM decline in higher ed Multi-year strategic rebuild under newer CEO	Med Two-sided: tutoring threat vs licensing option	Enrollment AI products; partnership revenue
Geospatial · sensor data
BlackSkyBKSY◆ owner IR / presentations ↗	Data 8 · High High-frequency satellite imagery + Spectra AI geospatial intelligence Rapid-revisit imagery over own and third-party sensors Growing multi-year defense backlog Same theme as Planet, earlier in scaling	Compounding Constellation growth (Gen-3) raises capture rate; archive accrues	AI 7 · High $100M+, 7-yr international defense contract (Jan 2025) $30M+ multi-year Gen-3 tactical ISR deal (Q3 2025) Backlog $323M, 91% international Spectra AI analytics layer over own + third-party sensors Same curve as Planet, earlier and cheaper-cap stage	✓ deep dive $100M+, 7-yr int'l defense contract, Jan 2025 (PR) $30M+ Gen-3 tactical ISR deal (Q3 25); backlog $323M, 91% int'l	Gen-3 constellation upsells US budget normalization Spectra analytics licensing to allied gov'ts	Same as Planet — AI raises the value of the sensor flow	~$1.2B ✓ FMP	~12x	+4%	Disc 4 · Neutral ~$1.2B cap, EV ~$1.25B on ~$107M revenue ~12x EV/Sales on ~flat revenue — richly valued, not cheap Defense backlog is the story; the price is not a discount	Moderate Unique imagery + defense backlog = real optionality But ~12x sales on flat revenue means you pay up for it Not the bounded-downside cheap option it first looked like	Dilution history; international customer concentration Gen-3 execution timeline risk	Med-High Defense-AI story increasingly recognized; ~12x sales already pays for it	Gen-3 launch & tasking milestones US budget resolution New int'l capacity commitments
LeidosLDOS△ borderline IR / presentations ↗	Data 3 · Low Works on gov geospatial/intel data it doesn't own Palantir-type: analytics layer on others' data	n/a Doesn't own the data it works on	AI 4 · Neutral AI analysis agents on others' data Services, not a data owner	~ desk note AI services on government data it doesn't own	—	AI compresses services labor pricing — the classic services squeeze	~$16B ~ EV est	~1.3x	+6%	Disc 4 · Neutral Cheap services multiple Not a data-owner screen fit	Low Services multiple, no data optionality	Recompete cycles; budget continuing-resolution exposure	Low Services multiple, services story	Award cycles
Planet LabsPL◆ owner IR / presentations ↗	Data 9 · High Images the entire landmass daily (~3.5 m resolution), plus high-res SkySat/Pelican (~50 cm) A unique multi-year temporal archive no competitor has Change-over-time is the moat — can't retroactively collect history Increasingly delivered as AI-ready analytics Defense & intelligence is the fastest-growing buyer	Compounding by design Whole-Earth scan daily — the archive grows every 24h by construction New satellites add resolution/cadence; history can't be re-collected	AI 7 · High Anthropic partnership (Mar 2025): Claude applied to satellite imagery First prime win on NGA Luno ($12.8M, maritime AI analytics) MDA SHIELD IDIQ prime — eligible for Golden Dome task orders Backlog ~$900M (+79% YoY); Q4 revenue +41% AI analytics is the product; defense is the buyer	✓ deep dive Anthropic partnership (Mar 2025): Claude on satellite imagery (report) NGA Luno prime win $12.8M (SpaceNews) MDA SHIELD IDIQ prime (Golden Dome-eligible); backlog ~$900M	Golden Dome task orders AI-analytics subscriptions over the archive (insurance, ag) More foundation-model partnerships on temporal imagery	Low — AI is the accelerant, not the threat; risk is capex/competition not AI	~$10.4B ✓ FMP	~28x	+26%	Disc 4 · Neutral ~28x EV/Sales, pre-profit — the data and backlog are the appeal, not the multiple Backlog ~$900M anchors forward revenue Rich on every metric	High · optionality Unique archive + ramping $906M defense backlog = large 'if it scales' upside Pre-profit/capital intensity is the downside Strong positive convexity	SPAC-era dilution legacy; Pelican capex cycle Government contract concentration & timing lumps	High AI + defense premium fully in the ~28x; expectations are the risk	Next earnings: ~early Sept 2026 (FQ1'27 reported Jun 4 — record print) (Q1 8-K) FY27 guide raised to $425–441M (+41% mid); Q2 guide $102–107M with adj-EBITDA breakeven-to-positive Backlog $906M (+72%), RPO $816M (+81%); ~40% of backlog converts within 12 months Pelican cadence: 3 launched in Q1 incl Sweden's first sovereign recon satellite $731M cash funds the capex cycle; NGA $22M extension; Golden Dome task orders the option
Spire / SatellogicSPIR/SATL◆ owner IR / presentations ↗	Data 6 · Neutral Weather/maritime/RF data (Spire); hyperspectral imagery (Satellogic) Niche proprietary sensor data Early and capital-intensive	Compounding Continuous sensor flow (weather/RF/hyperspectral); small base	AI 5 · Neutral Real but early sensor datasets On the curve but small	~ desk note Niche gov/defense sensor contracts	Weather/RF data into forecasting AI	Low AI risk; survival risk is capital, not AI	~$0.5B ~ EV est	~3.0x	+20%	Disc 5 · Neutral Speculative micro-caps Watchlist-only owners High risk, thin coverage	High · lottery Micro-cap sensor data — binary Large upside if a dataset scales, fat left tail High-variance convexity	Cash runway and listing-compliance history — survival-grade risks	Low Below the radar entirely	Contract wins; cash runway
Sports data
Genius SportsGENI◆ owner IR / presentations ↗	Data 8 · High Exclusive official league-data rights (NFL, NCAA, EPL) Now the NCAA's official data provider The other half of the official-sports-data duopoly with Sportradar Growing media/ad data layer (post-Legend acquisition) Multi-year rights = a hard moat	Growing More leagues, deeper tracking (player-level optical) each season	AI 7 · High AI for fan engagement and betting integrity products Media/ad data layer monetizes the rights twice Growing ~25%; up the curve like Sportradar Owner actively monetizing, not just holding	✓ deep dive No corpus licensing; exclusive NFL/NCAA/EPL rights in-product BetVision + media/ad data layer (Legend acq.)	Second monetization of rights via media/ads AI integrity & fan-engagement products	Same as Sportradar; rights moat holds, services layer competitive	~$1.7B ✓ FMP	~3.6x	+31%	Disc 6 · Neutral ~$1.7B cap, ~3.6x EV/Sales on +31% growth Cheap for the growth + the official-data rights duopoly Media/ad layer monetizes the rights twice	High Rights moat + media-data optionality = asymmetric upside Growth-priced, so not deeply cheap Convex if the media layer scales	NFL warrant dilution; rights renewals can reset economics Only recently profitable	Low Same as Sportradar — the duopoly's AI angle is unpriced	Next earnings: ~early Aug 2026 (Q1 reported May 8) Legend closed May 1 → FY26 guide ~$990M–$1.01B rev / $270–280M EBITDA (~28% margin) (Q1 call) NFL rights locked through Super Bowl 2030; GeniusIQ to automate the full rights portfolio by end-2027 Prediction markets: market makers onboarded in Q1 on low-latency feeds Targets: positive GAAP net income 2027; ≥60% uFCF conversion by 2028; ~$100M H2'26 cash flow
SportradarSRAD◆ owner IR / presentations ↗	Data 8 · High Official, licensed sports-data rights — 900k+ events, 80+ sports Real-time play-by-play feeds, pre-match & live odds, streaming Multi-year exclusive league contracts = hard-to-replicate moat Half of a duopoly with Genius for official betting data The data backbone of the global betting industry	Growing Event coverage (900k+/yr) and in-play depth keep expanding	AI 7 · High AI for in-play personalization, risk/trading, content generation Higher-margin products (MTS, 4Sight) lift take-rates A genuine owner monetizing its corpus Up the curve; AI deepens products vs a new licensing line Recent Kalshi deal extends into prediction markets	✓ deep dive No corpus licensing — official-data rights monetized in-product Kalshi deal extends feeds into prediction markets	AI in-play products lift take-rates (4Sight, MTS) Prediction-market data feeds scale	Betting operators in-housing AI models could squeeze value-add services Official rights protect the raw feed itself	~$4.9B ✓ FMP	~3.0x	+12%	Disc 6 · Neutral ~3.0x EV/Sales on ~12% growth for a rights-duopoly owner Reasonable on metrics for the moat Fair-to-slightly-cheap	Moderate ~3x sales on duopoly rights gives a floor Upside from take-rate growth on new products Balanced, slight positive tilt	Rights-cost inflation: leagues extract more each renewal Founder (Koerl) control; bookmaker customer concentration	Low Priced as a betting vendor; data-rights duopoly rarely framed as AI	Next earnings: ~early Aug 2026 (Q1 reported early May) NOW: FIFA World Cup (Jun–Jul 2026) — major in-play/MTS volume event (Q1 call) FY26 reaffirmed: 23–25% cc revenue growth / 34–37% EBITDA growth Prediction markets 'imminent, potentially material' — H2 ramp IMG Arena synergies above 25% target; >700k streamed matches in 2026; H2 restructuring for leverage Short-seller reports — CEO pushed back on call; monitor, don't ignore
Ad · measurement · web data
DoubleVerify / ComscoreDV/SCOR◆ owner IR / presentations ↗	Data 6 · Neutral Ad-verification/fraud data (DV, healthier franchise) Cross-platform audience measurement (Comscore, distressed) Proprietary measurement data	Flow with ad spend Verification events track media volumes	AI 6 · Neutral Measurement owners; AI + walled gardens pressure the moat DV is the credible franchise; SCOR a broken business	~ desk note AI-content verification products (DV)	Verification layer for AI-generated ad content	AI-generated content/MFA sites flood verification (volume up, value contested) Walled gardens self-verify	~$1.6B ✓ FMP	~2.0x	+14%	Disc 7 · High ~$1.6B cap, ~2.0x EV/Sales on +14% growth Cheap for an ad-verification data owner (DV) DV the franchise; SCOR the distressed lottery leg	High ~2x sales for a profitable measurement owner AI + walled gardens pressure the moat Cheap enough to be convex	Ad-budget cyclicality; IAS rivalry compresses pricing; SCOR is balance-sheet-fragile	Low-Med De-rated with adtech; AI angle minor	DV growth; SCOR restructuring
SimilarwebSMWB◆ owner IR / presentations ↗	Data 5 · Neutral Panel/clickstream traffic, keyword, conversion estimates for nearly every site The dataset everyone uses to track digital behavior — incl. AI-search traffic Broad coverage, but modeled/estimated, not a first-party record Continuously updated digital-intelligence feeds	Continuous panel Clickstream flow is constant but panel-based — quality needs constant defense Privacy/cookie shifts are structural headwinds to collection	AI 8 · High Sells data feeds/APIs + MCP integrations into AI workflows Uniquely positioned to measure (and feed) the AI-search era Ahead for its size — high AI exposure per dollar Catch: modeled data less defensible than owned	✓ deep dive Sells AI/clickstream datasets + MCP integrations into AI workflows The standard source for tracking ChatGPT/Gemini traffic share	AI-data ARR as a disclosed line Agent-platform data feeds Strategic acquirer interest (data fits many buyers)	AI search shrinks open-web traffic — shrinking the thing it measures Collection (panels/extensions) gets harder as browsing shifts to agents	~$0.36B ✓ FMP	~0.7x	+15%	Disc 7 · High ~$0.36B cap, EV ~$0.21B on ~$283M revenue ~0.7x EV/Sales — strikingly cheap, even for modeled data Deep-value + AI-licensing optionality; small & illiquid	High · deep-value <1x EV/Sales with AI-licensing pull = asymmetric Small, illiquid, modeled (non-owned) data = the risk Cheap enough that convexity tilts positive	Micro-cap liquidity; SBC heavy; privacy rules threaten collection methods	Med Its datasets are quoted everywhere; the equity is ignored at ~0.7x EV/S	Next earnings: ~mid-Aug 2026 (Q1 reported May 13) Second large LLM training contract expected 'over the coming quarters' (Q1 6-K) AI revenue trajectory: 11% of Q4 revenue, ~3x YoY — does it keep compounding? RPO $297.7M (+18%); multi-year ARR at 64% — contract-quality migration FY26 guide $307–315M; low end already raised once
The Trade DeskTTD○ operator IR / presentations ↗	Data 6 · Neutral Ad-bidding/bidstream data + UID2 identity framework Powers its own bidding (demand-side platform) Vast behavioral data, but an input	High flow Bidstream data scales with ad volume; ephemeral by nature	AI 6 · Neutral Stewards the UID2 identity standard Identity-data optionality, not a corpus sale De-rated; case on ad-platform fundamentals	~ desk note Kokai AI in-platform; UID2 stewardship	UID2 as identity layer for agentic commerce	AI walled-garden answers shrink open-web inventory — the de-rate driver Agentic ad-buying could compress DSP take rates	~$9.4B ~ EV est	~3.0x	+18%	Disc 6 · Neutral ~3.0x EV/Sales on +18% growth — value territory for profitable adtech UID2 identity optionality on top Open-web AI fears embedded in the multiple	Moderate–High Modest multiple + identity-standard optionality Data is an input, not a sold corpus Positive tilt on metrics	Founder super-voting; Amazon DSP is the real competitive event SBC and the credibility hit from the '25 stumble	Med — as threat AI read as open-web risk; de-rate reflects it	CTV share; UID2 adoption; growth re-accel
ZoomInfoGTM◆ owner IR / presentations ↗	Data 6 · Neutral B2B contact + company intelligence: emails, dials, org charts, technographics Buying-intent signals across millions of companies A live 'who's-who' graph of decision-makers Real, but increasingly replicable as AI shifts buyer behavior Renamed platform around 'GTM AI'	Decay treadmill B2B contact data decays ~25–30%/yr; must be rebuilt constantly Customer churn weakens the contributory refresh loop The clearest decaying-asset risk in the table	AI 7 · High GTM Context Graph native in OpenAI's Codex for Work (agent context layer) AI is both distribution and disruptor Cut 2026 guidance + ~20% of staff on AI-driven shifts Ahead on plumbing, behind on the seat-based model Clearest live case of 'data doesn't protect the equity'	✓ deep dive GTM Context Graph natively in OpenAI's Codex for Work No disclosed licensing $; positioning as agent context layer	Per-call context pricing for sales agents More agent-platform embeds (Claude, Gemini) Data-only tier decoupled from seats	Customers replace SDR seats with AI: seat-based model directly hit (guidance cut said so) Agents can increasingly infer contact data without a vendor	~$0.8B ✓ FMP	~1.7x	~−3%	Disc 6 · Neutral ~1.7x EV/Sales, among the lowest multiples on the board But revenue is declining; the cheapness reflects decay risk Statistically cheap; operationally a falling knife	High · binary ~1.7x sales embeds heavy pessimism: small asymmetric base Re-rates hard if revenue stabilizes as the agent-context layer Declining revenue is the live left tail	Debt on a shrinking base; SBC dilution; churn is the whole story	High / as-threat The market's AI-victim poster child; the Codex embed is ignored	Next earnings: ~early-mid Aug 2026 (Q1 reported May 11) The trough test: FY26 guide cut to $1.185–1.205B (−4% mid); Q2 $300–303M; does it hold? (Q1 call) Agent embeds: Salesforce prospecting agent ships with ZoomInfo as first/primary external data source (150k+ customers); HubSpot native; ChatGPT/Claude/Copilot/Perplexity connectors live Pricing pivot: Copilot moving from seats to prepackaged credits/consumption Mgmt points to growth returning H2 2027; 35% AOI margin + cost cuts fund the wait
Auto data
ACV / OPENLANEACVA/KAR○ operator IR / presentations ↗	Data 6 · Neutral Wholesale used-car condition & transaction data (ACV inspection corpus) Granular vehicle-condition/pricing data Still primarily marketplaces	Growing Inspection corpus grows with every vehicle listed (ACV)	AI 5 · Neutral Feeds AI pricing ACV more data-distinctive Operators, not data-unlock plays	~ desk note ACV inspection-AI in-product	Condition-data licensing to pricing AIs	Low-moderate; inspection AI is ACV's own product	~$1.0B ~ EV est	~5.0x	+25%	Disc 5 · Neutral ACV the more data-distinctive Both operators Corpus enhances the platform	Moderate ACV growth + condition-data optionality Valued on the marketplace Mildly positive	ACV not yet sustainably profitable; OPENLANE balance sheet	Low Marketplace story	GMV growth; take rates
CarGurus / Cars.comCARG/CARS○ operator IR / presentations ↗	Data 5 · Neutral Auto listing, pricing & shopper-intent data Largely audience/marketplace Listings not fully proprietary	Churning flow Listings churn; intent data flows with traffic	AI 4 · Neutral Useful intent data, Zillow-like Not a data-unlock play	~ desk note In-product pricing AI	—	AI shopping agents could bypass listing sites	~$2.7B ~ EV est	~3.0x	+5%	Disc 5 · Neutral Reasonable valuations Operators in the Zillow mold	Low Operator, limited data asymmetry Balanced-to-low	Dealer-count churn; marketing-spend treadmill	Low Marketplace story	Dealer counts
CopartCPRT○ operator IR / presentations ↗	Data 7 · High Salvage-auto auction & vehicle-history data (IntelliSeller) Decades of auction-outcome data Serves its dominant auction marketplace	Growing Salvage auction outcomes accumulate with volume	AI 5 · Neutral AI tools in-product, not licensed Data deepens the moat, isn't the product	~ desk note Internal auction AI (IntelliSeller)	—	Low; AI assists damage assessment	~$29B ~ EV est	~10x	+10%	Disc 3 · Low Premium, high-quality operator Data deepens the moat, isn't the product	Low Premium, data not the re-rate driver	Leadership transition from founder era; totals cycle depends on used-car values	Low Operator story	Volume cycles
Retail · e-commerce data
InstacartCART○ operator IR / presentations ↗	Data 6 · Neutral Grocery-purchase + fast-growing retail-media ad data Rich first-party purchase data Powers its own high-margin ads (input)	Compounding Purchase graph deepens with order history	AI 5 · Neutral Strong data-driven ad engine AI-relevant, but feeds its ads, not sold Operator class	~ desk note Retail-media AI in-product	Purchase-data into commerce agents (never signaled)	AI shopping agents could disintermediate the storefront layer	~$9.9B ~ EV est	~3.5x	+10%	Disc 5 · Neutral Reasonable on ads + delivery Strong ad engine Data is an input	Moderate Retail-media optionality Valued on the business, not the data Balanced	DoorDash/Uber entering grocery; ad growth must outrun fee pressure	Low Grocery/ads story	Ad revenue growth
Transaction · payments data
FISFIS○ operator IR / presentations ↗	Data 5 · Neutral Merchant transaction flows & fraud signals (banking/payments processing) Real data, but serves its processing	Steady flow Transaction flow tracks processing volumes	AI 4 · Neutral In-product fraud/upsell, not a corpus	~ desk note Fraud AI in-product	—	AI-native fintech infrastructure competition	~$21B ~ EV est	~4.0x	+4%	Disc 4 · Neutral Cheap-ish fintech But not a data re-rate	Low Value fintech, data not the driver	Worldpay separation aftermath; bank IT spending cycles	Low Fintech story	Banking IT spend
Visa / Mastercard / AmexV/MA/AXP○ operator IR / presentations ↗	Data 8 · High Among the largest transaction datasets on earth Regulated, privacy-bound byproduct Not licensed as a corpus	Compounding Payment volumes grow ~10%/yr — among the largest data flows on earth	AI 3 · Low Increasingly productized But privacy-bound; not a corpus sale The ultimate data-advantaged operators	~ desk note Internal fraud/credit AI at vast scale; agentic-commerce pilots	Agentic payments standards (who authorizes an AI's purchase?)	Agentic payments could reshape authorization economics — also an opportunity Stablecoin/alternative rails the bigger structural worry	~$623B / $438B ~ EV est	~16x	+10%	Disc 2 · Low Valued as payment giants n/a as a data re-rate	Low Priced payment networks; data is internal	Interchange regulation (CCCA) and DOJ debit suit (V) Stablecoin rails as long-term routing threat	Med Agentic commerce chatter rising; data never the thesis	Agentic-payment standards; volume growth

Data trajectory = stock vs flow: is the corpus compounding (green), in steady/churning flow (amber), or slowing/decaying (red)? A growing flow makes licensing an annuity; a static archive is a one-time sale.
New research columns: ✓ deep dive = filings/PRs/coverage reviewed this pass, key bullets link to sources · ~ desk note = knowledge-based fill, deep dive pending.
Hype factor = how aware the market is of the AI exposure. Low = unpriced (where mispricing lives) · Med / as-threat · High = fully in the price. "As threat" = the market narrative treats AI as a risk to the business model rather than an opportunity.
Financials: all market caps + EV/Sales are FMP (SEC-sourced, Jun 9 2026) on the ✓ FMP rows; ~ EV est = cap is FMP, EV/Sales estimated (mostly operators). * = ADR/foreign listing, cap hand-adjusted to total company.
Revenue growth is FMP FY-over-FY for the high-growth/declining names and key mid-caps; large stable franchises & operators carry ±2pt estimates (FMP confirmed those it checked).
EV/Sales color: green = cheap for profile · red = rich.
Discrepancy: High = underpriced · Neutral = fair · Low = priced.
Convexity = payoff shape: bounded downside + asymmetric upside. High ≠ low-risk (distressed names are option-like).

Ratings summary

The core tension as a 2×2: data quality (rows) vs AI-unlock (columns). Top-left — elite data, slow unlock — is the latent re-rate watchlist. Note IQVIA's migration to the right column after the deep dive (IQVIA.ai, 150+ agents, 19 of top-20 pharma). Key: * = High valuation discrepancy · † = High convexity (both symbols = both).

Data quality	Lower AI-unlock	Higher AI-unlock
Elite data	Latent (great data, slow unlock): IT*†, CLVT*†, GETY*†, EFX*†, TRU, EXPN.L, GH, NTRA, DH†, VRSK, MORN, NDAQ, ICE, CSGP, RAMP, NFLX, CPRT, V/MA/AXP	Monetizing the moat: IQV*†, WLY*, SRAD, GENI†, RDDT, TEM†, PL†, BKSY, RELX, SPGI, TRI, MSCI, MCO, WTKWY, VEEV
Lower data	Lower-tier / operators: FICO, DUOL, GDRX, SPIR/SATL†, PSO, TTD, DV/SCOR*†, YELP, Z/ZG, ACVA/KAR, CART, ELV, CARG/CARS, CME, DOCS, TRIP, MELI, CVNA, FIS, COR, LDOS	Punching above their data: GTM†, SMWB*†, FDS
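
The 2×2 placement can be read as a simple rule on the two tier scores. The Python sketch below assumes that the High band (7–10) used in the grouped view is also the cut-off on both axes; the function name and constant are illustrative, and the sample scores are the Data and AI tiers printed in the table rows for FDS, SMWB, CPRT and TTD.

```python
HIGH_CUTOFF = 7  # assumption: "High (7-10)" marks the quadrant boundary on both axes


def quadrant(data_tier: int, ai_tier: int) -> str:
    """Map a (data tier, AI-unlock tier) pair to its 2x2 quadrant label."""
    elite_data = data_tier >= HIGH_CUTOFF
    higher_unlock = ai_tier >= HIGH_CUTOFF
    if elite_data and higher_unlock:
        return "Monetizing the moat"
    if elite_data:
        return "Latent"
    if higher_unlock:
        return "Punching above their data"
    return "Lower-tier / operators"


# (Data tier, AI tier) as printed in the table rows.
samples = {"FDS": (6, 7), "SMWB": (5, 8), "CPRT": (7, 5), "TTD": (6, 6)}
for ticker, (data_tier, ai_tier) in samples.items():
    print(ticker, "->", quadrant(data_tier, ai_tier))
```

Under that assumed cut-off the four samples land in the same quadrants the lists above assign them.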

And the grouped view — High and Low ends of each rating.

Rating	High end	Low end
Data quality	High (7–10): IT*†, IQV*†, CLVT*†, GETY*†, WLY*, SRAD, GENI†, EFX*†, TRU, EXPN.L, RDDT, TEM†, GH, NTRA, DH†, PL†, BKSY, VRSK, MORN, RELX, SPGI, TRI, MSCI, MCO, WTKWY, NDAQ, ICE, VEEV, CSGP, RAMP, NFLX, CPRT, V/MA/AXP	Low (1–3): LDOS
Data trajectory	Growing / compounding: IQV*†, GETY*†, WLY*, SRAD, GENI†, EFX*†, TRU, EXPN.L, RDDT, TEM†, GH, NTRA, DUOL, PL†, BKSY, SPIR/SATL†, VRSK, MORN, TTD, ACVA/KAR, CART, RELX, SPGI, TRI, MSCI, MCO, NDAQ, CME, VEEV, CSGP, NFLX, MELI, CVNA, CPRT, V/MA/AXP	Slowing / decaying: GTM†, DH†, TRIP, LDOS
AI-unlock	High (7–10): IQV*†, GTM†, WLY*, SRAD, GENI†, SMWB*†, RDDT, TEM†, PL†, BKSY, FDS, RELX, SPGI, TRI, MSCI, MCO, WTKWY, VEEV	Low (1–3): CSGP, NFLX, MELI, CVNA, COR, V/MA/AXP
Discrepancy	High (underpriced): IT*†, IQV*†, CLVT*†, GETY*†, WLY*, SMWB*†, EFX*†, DV/SCOR*†	Low (priced / n-a): SPGI, TRI, MSCI, MCO, WTKWY, NDAQ, CME, RAMP, NFLX, MELI, CVNA, CPRT, V/MA/AXP
Convexity	High (asymmetric): IT*†, IQV*†, CLVT*†, GETY*†, GTM†, GENI†, SMWB*†, EFX*†, TEM†, DH†, PL†, SPIR/SATL†, DV/SCOR*†	Low (symmetric): CARG/CARS, RELX, SPGI, TRI, MSCI, MCO, WTKWY, NDAQ, ICE, CME, VEEV, RAMP, TRIP, NFLX, CVNA, CPRT, FIS, COR, V/MA/AXP, LDOS

Most promising names

The screen's pick: names from the Monetizing the moat and Punching above their data quadrants that also carry a High valuation discrepancy (*) or High convexity (†) — i.e. the unlock is already happening and the payoff is mispriced or asymmetric.

Name
IQVIA IQV *†
Similarweb SMWB *†
Wiley WLY *
Tempus AI TEM †
Sports-data duopoly GENI + SRAD †
Planet Labs PL †
ZoomInfo GTM †
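
The selection rule stated above (active quadrant plus at least one mark) is mechanical enough to sketch as a filter. This is an illustration only: the function name and the five-name sample are hypothetical choices, with quadrants and marks copied from the 2×2 lists.

```python
ACTIVE_QUADRANTS = {"Monetizing the moat", "Punching above their data"}


def passes_screen(quadrant: str, marks: str) -> bool:
    """True when the unlock is live (active quadrant) and the name carries * or †."""
    has_mark = "*" in marks or "†" in marks
    return quadrant in ACTIVE_QUADRANTS and has_mark


# (ticker, quadrant, marks) copied from the 2x2 lists; a small sample, not the full board.
names = [
    ("IQV", "Monetizing the moat", "*†"),
    ("SRAD", "Monetizing the moat", ""),
    ("SMWB", "Punching above their data", "*†"),
    ("FDS", "Punching above their data", ""),
    ("IT", "Latent", "*†"),
]
picks = [ticker for ticker, quad, marks in names if passes_screen(quad, marks)]
print(picks)  # ['IQV', 'SMWB']
```

On the strict rule SRAD does not pass on its own, since it carries no mark; it appears in the list as the unmarked half of the GENI pair. IT carries both marks but sits in the Latent quadrant, so it is excluded.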

IQVIA (IQV) *†

The cleanest setup on the board: the scarcest healthcare data at ~2.7x sales, and the unlock just went live — IQVIA.ai (Mar 2026) with 150+ agents and 19 of the top-20 pharma already using them — while the market still prices it as a sleepy CRO. High discrepancy, high convexity, low hype.

Catalysts

Next earnings: ~late July 2026 (Q1 reported May 5 — beat, EPS guide raised to $12.65–12.95)
IQVIA.ai adoption metrics: now 192 agents across 64 use cases; the thing to watch is the first monetization/attach-rate disclosure (Q1 call)
R&DS backlog $32.7B (+5.3%), Q4 book-to-bill 1.18x — bookings reacceleration is the proof point for the CRO half
Capital return: $552M repurchased in Q1, $1.2B authorization remaining
Duke Clinical Research obesity-trials collaboration (Feb 2026) — exposure to the fastest-growing trial category
De-leveraging from 3.62x net leverage ($13.9B net debt) is what frees the multiple
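
A rough sketch of the de-leveraging arithmetic in the last bullet, using only the two disclosed figures (3.62x net leverage, $13.9B net debt). The 3.0x target and the flat-EBITDA assumption are hypothetical inputs chosen for illustration, and the EBITDA definition is whatever the 3.62x ratio uses.

```python
net_debt_bn = 13.9   # from the catalyst bullet
net_leverage = 3.62  # net debt / EBITDA, from the catalyst bullet

# Back out the EBITDA implied by the two disclosed figures.
implied_ebitda_bn = net_debt_bn / net_leverage


def paydown_to_target(target_leverage: float) -> float:
    """Net-debt reduction ($B) needed to hit a target ratio, holding EBITDA flat (assumption)."""
    return max(0.0, net_debt_bn - target_leverage * implied_ebitda_bn)


HYPOTHETICAL_TARGET = 3.0  # illustrative only; not a company target
print(f"Implied EBITDA: ${implied_ebitda_bn:.2f}B")
print(f"Paydown to reach {HYPOTHETICAL_TARGET:.1f}x: ${paydown_to_target(HYPOTHETICAL_TARGET):.2f}B")
```

Any EBITDA growth would reduce the paydown required, so the flat-EBITDA figure is best read as an upper bound for that target.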

Similarweb (SMWB) *†

The deep-value outlier: ~0.7x EV/Sales on +15% growth while selling the datasets everyone uses to measure the AI-search era, with feeds and MCP integrations into AI workflows. Modeled (non-owned) data and nano-cap liquidity are the risks — but at this price the asymmetry is real.

Catalysts

Next earnings: ~mid-Aug 2026 (Q1 reported May 13)
The second large LLM training contract — management says expected 'over the coming quarters'; the first seven-digit deal signed in Q1 (Q1 6-K)
AI revenue trajectory: 11% of Q4 revenue and ~3x YoY — whether it keeps compounding is the whole thesis
RPO $297.7M (+18%) and 64% multi-year ARR — the contract-quality migration continuing
FY26 guide $307–315M (~10% growth); the low end was already raised once
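
The headline multiple and the guide-implied growth can be checked with two divisions. The sketch below uses the table's rounded inputs (EV ~$0.21B, revenue ~$283M) and the FY26 guide range; treating ~$283M as the base year for the growth calculation is an assumption.

```python
ev_musd = 210.0        # EV ~$0.21B, from the table row
revenue_musd = 283.0   # ~$283M revenue, from the table row (assumed to be the FY25 base)
guide_low_musd, guide_high_musd = 307.0, 315.0  # FY26 guide range

ev_to_sales = ev_musd / revenue_musd
guide_mid_musd = (guide_low_musd + guide_high_musd) / 2
implied_growth = guide_mid_musd / revenue_musd - 1

print(f"EV/Sales: {ev_to_sales:.2f}x")
print(f"Guide midpoint: ${guide_mid_musd:.0f}M, implied growth {implied_growth:.1%}")
```

Both results are consistent with the ~0.7x and ~10% figures quoted above, within the rounding of the inputs.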

Wiley (WLY) *

The proven licensor at a value multiple: ~1.9x EV/Sales with $92M of lifetime AI-licensing revenue already disclosed ($29M in a single quarter), an Anthropic partnership, and recurring inference pilots converting corporate R&D demand. Flat underlying growth is the offset and caps the convexity — but on pure metrics, paying under 2x sales for the rare publisher with demonstrated, repeatable AI revenue is a discrepancy. The flow is healthy too: submissions +25%. Watch item: whether the licensing line proves recurring rather than episodic.

Catalysts

Next earnings: Tuesday, June 16, 2026, pre-market — six days out. FY26 Q4 + the first FY27 guide; the AI-licensing run-rate disclosure is the number (notice)
AI recurring revenue is <10% of AI revenue today — management expects that proportion to triple next year; recurring-vs-episodic is the re-rate hinge (Q3 call)
OpenEvidence partnership: five-year multimillion licensing plus a Wiley equity stake — distribution into clinical AI
Nexus (licensing on behalf of partner publishers) at 36 partners — the agency model scales without new content cost
Emerald Publishing acquisition (Jun 2, 2026) adds proprietary research corpus to license
Q3 already raised margin/EPS guidance to the high end; ~4.5% dividend while you wait

Tempus AI (TEM) †

The growth-convexity pick: >$1B remaining contract value, the $200M AstraZeneca/Pathos foundation-model deal (non-exclusive — the motion can be resold), and ~6.5x sales on +83% growth. Not cheap, but the foundation-model optionality is large and real.

Catalysts

Next earnings: ~early Aug 2026 (Q1 reported May 5 — guidance raised) (Q1 8-K)
2026 guide raised to $1.59–1.60B revenue with ~$65M adjusted EBITDA — the operating-leverage inflection is now in the numbers
MRD volume ~6,500 tests in Q1, +500% YoY — Medicare reimbursement decisions are the swing factor
TCV >$1.1B across 70+ pharma data customers — each new (non-exclusive) foundation-model co-build re-rates the data annuity
Insights (data licensing) +44% in Q1 — the line that proves the moat is compounding

Sports-data duopoly: Genius Sports (GENI) † + Sportradar (SRAD)

The promising thing here is the duopoly's position, which both names share: legally exclusive rights to real-world events — the one category of data AI cannot generate — whose consumption AI multiplies (more priced micro-markets per game, CV-deepened datasets, AI media on the long tail, settlement-oracle demand from agents and prediction markets). Neither is framed as an AI story by the market, and at ~3.0–3.6x EV/Sales the multiples charge little for it. The choice between them is an expression preference, not a separate thesis: GENI (†) is the torque expression — ~31% growth at ~3.6x, the media/ad layer monetizing the rights a second time, and the Second Spectrum CV stack creating proprietary data beyond the feed — with the NFL renewal as the concentrated left tail that comes with the convexity. SRAD is the quality expression — twice the size, diversified across 80+ sports, profitable, founder-led — a steadier, more symmetric payoff that misses the † by one notch of judgment. Because their biggest idiosyncratic risks are mirror images (league concentration for GENI, operator in-housing of services for SRAD), the pair diversifies the single-contract risk while keeping full exposure to the shared thesis. Watch item for both: rights-cost inflation at renewal — the leagues are the residual claimants on the AI surplus.

Catalysts

Next earnings: both ~early Aug 2026 (both reported Q1 in early May)
Happening now: the FIFA World Cup (Jun–Jul 2026) — the biggest in-play betting volume event in the cycle, flowing through SRAD's MTS (SRAD Q1 call)
GENI: Legend acquisition closed May 1 → FY26 guide raised to ~$990M–$1.01B revenue / $270–280M EBITDA (~28% margin) — on the new guide the forward multiple is closer to ~2.5x than the trailing ~3.6x in the table (GENI Q1 call)
GENI: NFL rights locked through Super Bowl 2030; GeniusIQ targeted to automate the entire rights portfolio by end-2027; targets positive GAAP net income in 2027 and ≥60% uFCF conversion by 2028
SRAD: FY26 reaffirmed at 23–25% cc revenue growth / 34–37% EBITDA growth; IMG Arena synergies above the 25% target; >700k streamed matches in 2026
Both: prediction markets — SRAD calls commercial activity 'imminent, potentially material' (H2 ramp); GENI onboarded market makers in Q1
SRAD: short-seller reports surfaced this spring — CEO rebutted on the call; monitor rather than ignore
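
The gap between GENI's trailing ~3.6x and forward ~2.5x can be unpacked from the raised guide. The sketch below checks the guide-midpoint EBITDA margin and then back-solves the enterprise value and trailing revenue base that the two multiples jointly imply. Those back-solved figures are derived from rounded inputs, not disclosed numbers.

```python
rev_guide_musd = (990.0, 1010.0)    # FY26 revenue guide
ebitda_guide_musd = (270.0, 280.0)  # FY26 EBITDA guide
forward_multiple = 2.5              # "closer to ~2.5x" on the new guide
trailing_multiple = 3.6             # trailing EV/Sales in the table


def midpoint(guide_range: tuple) -> float:
    """Midpoint of a (low, high) guide range."""
    return sum(guide_range) / 2


rev_mid = midpoint(rev_guide_musd)
ebitda_margin = midpoint(ebitda_guide_musd) / rev_mid

# Back-solved, not disclosed: the EV consistent with the forward multiple,
# and the trailing revenue base consistent with that EV at the trailing multiple.
implied_ev_musd = forward_multiple * rev_mid
implied_trailing_rev_musd = implied_ev_musd / trailing_multiple

print(f"Guide-midpoint EBITDA margin: {ebitda_margin:.1%}")
print(f"Implied EV: ${implied_ev_musd:,.0f}M")
print(f"Implied trailing revenue base: ${implied_trailing_rev_musd:,.0f}M")
```

The step from the implied trailing base to the guide midpoint includes the acquired Legend revenue, so it should not be read as organic growth.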

Planet Labs (PL) †

Compounding-by-design data (the Earth, daily) with AI as pure accelerant, a ~$900M backlog, and Golden Dome optionality. The caveat is the price: ~28x sales means you're paying for the story — convex, not cheap.

Catalysts

Next earnings: ~early Sept 2026 (FQ1'27 reported Jun 4 — record print) (Q1 8-K)
FY27 guide raised to $425–441M (+41% at midpoint); Q2 guide $102–107M with adjusted EBITDA breakeven-to-positive — the profitability crossover watch
Backlog $906M (+72%) and RPO $816M (+81%); ~40% of backlog converts within 12 months — unusual revenue visibility for the multiple
Pelican cadence: 3 launched in Q1 including Sweden's first sovereign reconnaissance satellite — sovereign-satellite deals are a new product line
$731M cash (post warrant exercises) funds the capex cycle; NGA $22M extension landed; Golden Dome task orders remain the unpriced option
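
The revenue-visibility claim can be made concrete by comparing the backlog expected to convert within 12 months against the FY27 guide midpoint. This is a rough sketch: the 12-month conversion window and the fiscal year do not line up exactly, so the coverage ratio is indicative only.

```python
backlog_musd = 906.0             # backlog, from the catalyst bullet
share_within_12m = 0.40          # "~40% of backlog converts within 12 months"
fy27_guide_musd = (425.0, 441.0) # FY27 revenue guide range

backlog_next_12m_musd = backlog_musd * share_within_12m
guide_mid_musd = sum(fy27_guide_musd) / 2

# Assumption: the 12-month window is treated as if it matched the fiscal year.
coverage = backlog_next_12m_musd / guide_mid_musd

print(f"Backlog converting in 12 months: ${backlog_next_12m_musd:.0f}M")
print(f"Share of FY27 guide midpoint covered: {coverage:.0%}")
```

On these inputs, roughly five-sixths of a year's guided revenue is already under contract, which is the sense in which the visibility is unusual for the multiple.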

ZoomInfo (GTM) †

The contrarian lottery ticket: ~1.7x sales with its Context Graph natively inside OpenAI's Codex for Work — agent-context positioning the market ignores while pricing it as the AI victim. The decay-treadmill data and shrinking revenue are why it's a small, binary position, not a core one.

Catalysts

Next earnings: ~early-mid Aug 2026 (Q1 reported May 11)
The trough test: FY26 guide cut to $1.185–1.205B (−4% at midpoint), Q2 guide $300–303M — whether this floor holds is the entire near-term thesis (Q1 call)
Agent embeds compounding: Salesforce's prospecting agent shipped with ZoomInfo as the first and primary external data provider across 150k+ customers; HubSpot shipped a native integration; connectors live for ChatGPT, Claude, Copilot, Perplexity with Gemini advancing
Pricing pivot: Copilot moving from seats to prepackaged credits/consumption — the model transition that decouples revenue from headcount
Management points to growth returning in H2 2027; 35% adjusted operating margin and cost cuts fund the wait
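
The "decay treadmill" label rests on the table's figure that B2B contact data decays ~25–30% a year. A minimal sketch of what that rate implies, assuming constant compounding decay and no refresh (a simplification, since ZoomInfo continuously rebuilds the data):

```python
import math


def remaining_share(annual_decay: float, years: float) -> float:
    """Fraction of records still valid after `years` of compounding decay, with no refresh."""
    return (1 - annual_decay) ** years


def half_life_years(annual_decay: float) -> float:
    """Years until half the records have gone stale at a constant annual decay rate."""
    return math.log(0.5) / math.log(1 - annual_decay)


for rate in (0.25, 0.30):  # the table's ~25-30%/yr range
    print(
        f"decay {rate:.0%}/yr: half-life {half_life_years(rate):.2f} yrs, "
        f"{remaining_share(rate, 3):.0%} still valid after 3 yrs"
    )
```

At either end of the range, an unrefreshed database loses half its value in roughly two to two and a half years, which is why churn in the contributory refresh loop matters so much to the thesis.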

Honorable mentions just outside the rule: Gartner (IT) *† and Equifax (EFX) *† carry both marks but sit in the Latent quadrant — the data is elite and cheap, but the unlock hasn't shipped yet; they're the re-rate watchlist rather than the active list. As throughout: one analyst's qualitative framework, not investment advice.

Q&A

Deeper discussion behind the picks above, organized by name — questions raised against each thesis, answered in full. Read the winners first; come here for the weeds.

IQVIA (IQV)

Q: I thought I heard somewhere that a lot of the IQV data is not monetizable?

You heard something real — and it's the most important caveat on the IQVIA thesis, so worth unpacking properly. The claim has three solid legs:

First, IQVIA doesn't own most of its data the way Reddit or Tempus own theirs — it licenses it in. The prescription and claims data comes from pharmacies, PBMs, payers, and distributors under supply contracts that govern what IQVIA can do with it. That's a different property position from Reddit (owns its UGC outright under its terms of service) or Tempus (generates the data in its own labs as a byproduct of its own tests). Practical consequences: the supply contracts may restrict or simply not contemplate AI-training uses, meaning new monetization can require renegotiation; and once suppliers see AI value in their feeds, they can demand a bigger cut — the data-acquisition cost line can inflate to absorb part of any new revenue. IQVIA's own 10-K risk factors flag dependence on data suppliers for exactly this reason.

Second, a big chunk of the data people attribute to IQVIA isn't theirs to sell at all. The clinical-trial data from the CRO half (the old Quintiles) belongs to the pharma sponsors who ran the trials. IQVIA gains operational expertise and metadata from running them, but the underlying trial datasets are the clients' property. So "world's largest holder of clinical trial data" overstates what's actually licensable.

Third, privacy law boxes in the rest. The patient-level asset is valuable precisely because it's longitudinal and granular — and that's exactly what HIPAA de-identification standards, GDPR, and the newer state health-privacy laws make hard to hand over for model training, where re-identification risk and downstream-use control are unresolved questions. Compliance-wrapped enterprise analytics: fine. Shipping the corpus to a lab: mostly not.

Now the counterpoint, because the bull case survives in modified form: these constraints are exactly why IQVIA's unlock took the shape it did. It isn't licensing data out — it's building agents on the data (IQVIA.ai, the NVIDIA-built foundation models) and selling workflow outcomes, which keeps everything inside the compliance and contract perimeter. That's the monetization path that's actually available to them, and the 19-of-top-20-pharma adoption suggests it's working. The honest restatement of the thesis: you're not buying a Reddit-style licensing annuity; you're buying cheap (~2.7x sales) exposure to whether agents-on-regulated-data becomes a product category — with the risks being supplier economics, execution, and the CRO half's labor-pricing exposure to AI.

The caveat does argue that the Data 9 rating conflates two things the table mostly treats as one: data scarcity (genuinely elite) versus data ownership cleanliness (middling: licensed-in, use-restricted). Tempus, Reddit, Verisk, and Equifax score much better on the second dimension.

One flag on the specifics: the supplier-contract and sponsor-ownership structure described here is well-established, but current contract terms and the state of IQVIA's data-access disputes (including long-running litigation with Veeva over data access) haven't been re-verified in this pass.

Similarweb (SMWB)

Q: I am wary of companies that perpetually operate at a loss, where the stock continuously declines for many years. To me this suggests a fundamental problem with the business model, or management simply not aligned with shareholder value — more often than not these go to zero rather than turning around. Is there any indication Similarweb is actually doing something transformative right now? Is there any justification for their current losses?

The heuristic is a good one — companies that lose money for years while the stock grinds down do usually die rather than turn — but on Similarweb the factual premise has gone stale, and the recent filings answer both questions concretely.

On "perpetually operating at a loss": no longer quite true. In Q1 2026 the GAAP operating loss narrowed to $4.4M and GAAP net loss to $6.4M, while the company posted non-GAAP operating profit of $2.4M and normalized free cash flow of $6.6M, its tenth consecutive quarter of positive normalized free cash flow. 2025 was its second consecutive year of non-GAAP operating profit (~$13M FCF for the year), and it carries ~$65–72M cash, no debt, and an untouched $75M credit line. The remaining GAAP loss is small and consists mostly of stock-based comp and amortization: a dilution cost worth counting, but a very different animal from a cash-burning melting ice cube. Net cash, no debt, self-funding: the "goes to zero" mechanics (refinancing walls, forced raises) aren't present.

On "doing something transformative right now": real, quantified evidence, not just narrative. AI-related sales reached 11% of Q4 2025 revenue, up from 8% in mid-2025, with AI revenue roughly tripling year over year. In Q1 2026 it signed a seven-digit LLM data-training contract with an existing big-tech customer, with a second large LLM contract expected in coming quarters. Contract quality is improving in ways that suggest durability: remaining performance obligations grew 18% to $297.7M, 64% of ARR is now multi-year (up from 49% a year ago), and $100k+ customers grew 12% to 461. The transformation thesis, plainly: the same clickstream dataset that sold as marketing-analytics seats is being re-sold as LLM training/grounding data to big tech, and that line went from experimental to 11% of revenue in about a year.

The parts that support the wariness, because they're real: growth has decelerated — FY26 guidance implies ~10%, down from ~15% — and overall net revenue retention is 98%, meaning the core seat business churns slightly faster than it expands; the AI line grows on top of a flattish base. Large deals are lumpy (Q4 missed guidance on contract-timing slippage). And note the word "normalized" before the cash-flow figure: unadjusted free cash flow in Q1 was $(0.3)M — the cash generation is real but thinner than the headline. The long stock decline also reflects genuine history: a 2021 IPO at growth-stock pricing followed by years of guidance resets before the discipline arrived (a new CFO joined this year with a mandate explicitly framed around monetizing the data asset).

Verdict: this doesn't fit the classic to-zero profile — those companies have debt, burn, and no buyer for what they make; Similarweb has net cash, positive cash generation, and a dataset big tech just started writing seven-figure checks for. But it also isn't a proven transformation: a slightly-churning core with a fast-growing AI line that's still only ~11% of revenue, priced at under 1x EV/Sales. The bet, stated plainly: the AI mix shift outruns the core's decay before panel-collection methods get harder. That's exactly why it's sized as option-like rather than core. The single best thing to watch: whether the second large LLM contract lands and whether AI revenue keeps compounding off 11% — if that line stalls, the base-rate skepticism wins.

Wiley (WLY)

Q: My assumption is every major AI is going to need to ultimately contract with every major publisher. I don't have anything hard to go on when saying this — it just seems logical. What is your take on this matter?

The logic is sound as a directional bet, but it needs amending in two important ways — because the version as stated sits on top of a legal question that's actively being decided, and the form of the contracting matters more than the fact of it.

The case for the assumption is real, and it has four legs. First, scarcity economics: frontier labs are compute-rich and quality-data-constrained, and peer-reviewed scientific text is among the highest-value-per-token corpora that exists — concentrated in a handful of publishers (Elsevier, Springer Nature, Wiley, Taylor & Francis, Oxford/Cambridge). Second, the checks are rounding errors: the observed deals run roughly $10–40M a year per publisher against labs spending tens of billions on compute — the cost-benefit of just paying is lopsided. Third, enterprise customers increasingly demand provenance and indemnification, which only licensed data provides. Fourth, there's a cascade dynamic: once two or three labs license a corpus, the others face both a capability gap and a worse litigation posture for not licensing — and publishers are improving their coordination (Wiley's Nexus, licensing on behalf of 36 publishers, is exactly a move to bundle the long tail and raise the table stakes).

The amendment the logic needs: the courts have partially undercut the training version of it. In 2025, US federal rulings (the Anthropic books case and Meta's authors case) held that training on lawfully acquired copies is fair use — transformative — while Anthropic's ~$1.5B settlement was about the piracy of acquisition, not training itself. If that line of jurisprudence holds (the NYT–OpenAI case and appeals are the ones to watch), then for pure model training a lab arguably needs to buy one legitimate copy of everything, not sign a license with anyone. Add the supply-side erosion: open access now covers a large and growing share of new science (mandated by funders), preprint servers carry much of the frontier, and synthetic data reduces marginal dependence on any single corpus. So "every lab must license every publisher for training" is the weakest form of the thesis — training licenses may prove episodic, one-and-done archive purchases rather than annuities.

But the inference version of the assumption is much stronger — and that's the one that matters for Wiley. Fair use is a training-time doctrine; it does not cover an agent retrieving, reproducing, and serving current copyrighted articles at query time. A clinical AI citing this month's literature, a corporate R&D copilot grounding on vetted chemistry, a research agent that must point to the authoritative version of record — those need live, licensed, recurring access, and there's no fair-use route around it. That's a per-seat or per-query annuity, not a one-time archive sale. And this is precisely the pivot Wiley's own disclosures describe: recurring inference pilots with pharma/chemical/space companies, the OpenEvidence deal (five-year licensing into clinical AI, with equity), and management guiding the recurring share of AI revenue to triple. The vetting layer compounds the case — as AI floods the world with plausible text, the peer-review stamp becomes more valuable at retrieval time, not less. Jurisdiction helps too: the EU's text-and-data-mining opt-out regime pushes toward licensing more firmly than US fair use does.

Two sizing cautions so the logic doesn't overrun the numbers. Even if the assumption fully plays out, the per-publisher checks observed so far ($20–40M/yr range) would be transformative for Wiley's multiple (high-margin recurring revenue on a ~$1.7B base at ~1.9x sales) without being transformative for its P&L — the re-rate comes from the market capitalizing a data-annuity line, not from revenue doubling. And "every major AI" overstates the buyer pool: it's really the handful of frontier labs with consumer/enterprise products and US legal exposure; open-weight and non-US players may never pay, and a buyer pool of five-ish labs versus many publishers is an oligopsony where the labs hold pricing power — another reason the publisher-coalition motion (Nexus) matters strategically.
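
The first sizing caution is easy to verify: a licensing check in the observed range is small against the revenue base. A minimal sketch using the figures in the paragraph above (~$1.7B base, $20–40M a year), treated here as a single check; the variable names are illustrative.

```python
revenue_base_musd = 1700.0        # "~$1.7B base"
check_range_musd = (20.0, 40.0)   # observed per-publisher check range, per year

# Share of total revenue that one check at each end of the range would represent.
low_share, high_share = (check / revenue_base_musd for check in check_range_musd)

print(f"One licensing check = {low_share:.1%} to {high_share:.1%} of revenue")
```

At one to two and a half percent of revenue, the line cannot move the P&L much on its own, which is why the argument rests on how the market capitalizes a recurring, high-margin line rather than on revenue growth.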

Compressed take: right in direction, wrong in mechanism. Restated: "most frontier labs will end up paying the few publishers whose vetted corpora can't be substituted — but the durable money is inference-time grounding, not training, and the training-license wave may prove episodic." The June 16 print is a direct test: the recurring-vs-episodic split in the AI line is exactly the number that adjudicates between the two versions. Tail risk to flag honestly: if appellate courts extend fair use further than expected — or open-access coverage gets good enough — the licensing imperative weakens at the margin, which is why the NYT–OpenAI outcome is a thesis input for the whole publisher group, not just Wiley.

Q: You suggest the journals are under fire from preprint services and open access. My take is that the peer-review system is an essential check on avoiding the 'human centipede' problem — where information on the open web is increasingly generated by AI, and so there are risks when AI in turn ingests that information. Does the review process for WLY journals differ from that of preprints and open access? And if so, is there substance to this take?

There's real substance to the take — and it actually sharpens the thesis — but it needs one correction first, because two different things are getting bundled under "open access."

The correction: open access is a payment model, not a review standard. Most open-access journals — including Wiley's own large OA portfolio, PLOS, Nature's OA titles — are fully peer-reviewed; "open" refers to who pays (the author/funder, via article processing charges, instead of the reading library). So "under fire from open access" is a pricing threat — it erodes the paywall/subscription model and compresses revenue per article — not a vetting threat. The category the human-centipede argument applies to is preprints (arXiv, bioRxiv, SSRN), which get only light screening: scope checks, plagiarism filters, basic moderation — no expert evaluation of methods, statistics, or claims before posting.

So yes, the process differs materially. A Wiley journal article passes through editorial triage by a domain editor; evaluation by typically two or more independent expert referees who challenge methodology and claims; mandated revisions; increasingly, statistical and image-integrity screening plus AI-generated-text detection; named-author accountability tied to institutions; and, crucially, post-publication infrastructure: corrections, retractions, and the citable version of record. A preprint gets none of that expert evaluation or maintained correction machinery. The difference isn't that preprints are wrong (most eventually pass review somewhere); it's that the journal version carries a verified provenance chain and a maintained error-correction mechanism.

And the underlying mechanism is documented, not just intuitive. The "model collapse" literature — most prominently a 2024 Nature paper — showed that models trained recursively on AI-generated data degrade, losing the tails of the distribution first. As AI-generated text floods the open web (and the preprint servers — they're getting hit too), the share of verifiably human-originated, expert-checked text shrinks as a fraction of available training and grounding material. That makes the peer-review stamp exactly what the question says it is: a provenance filter whose scarcity value rises with the pollution level. It's the cleanest version of the Wiley moat argument: they don't just own content, they own a certification process — and certification is the thing AI can't synthesize, because its value comes from accountable humans staking reputations on it. This is even stronger at inference time than training time: an agent citing a retracted or fabricated paper is a liability event, and only the publishers maintain the retraction/version-of-record signal that prevents it.

The honest caveat — it's a big one and it's Wiley-specific: peer review is a leaky filter, and Wiley owns the cautionary tale. Its Hindawi acquisition collapsed into the largest paper-mill scandal in publishing history — on the order of 10,000+ retractions, dozens of journals closed, the Hindawi brand itself shut down. Paper mills and AI-assisted manuscripts are now hammering journals, not just preprint servers; reviewers don't rerun experiments or audit raw data; and the submission surge Wiley reports (+25%) is partly the AI-writing flood arriving at their own front door. So the right framing isn't "peer review = clean, everything else = contaminated." It's that peer review is a costly, maintained, accountable filter — imperfect, but the only one with institutional machinery behind it — and its economic value depends on publishers actually defending it. Wiley's post-Hindawi integrity investment is, in that sense, capex on the moat.

Net for the thesis: the take upgrades the Wiley argument from "they own good text" to "they operate the trust layer of the scientific record in an era when trust is the scarce input." It implies the durable revenue isn't selling the archive once — it's selling certified, current, retraction-aware access continuously, which is the inference-annuity case from the question above arriving by a different road. Two things to watch: whether labs start paying explicit premiums for verified-human/vetted corpora over scraped web (early signs yes — that's what every one of these licensing deals implicitly is), and whether Wiley keeps its integrity record clean post-Hindawi — because in this framing a second paper-mill scandal isn't an embarrassment, it's impairment of the core asset.

Q: Does AI open the door for more automated verification and/or replication of research? It seems like it should be one of the tiers of peer review baked into every process.

Yes — and this might be the most underexplored part of the whole publisher thesis, because AI changes the cost curve of verification, which has always been the binding constraint. Peer review checks what's cheap to check (plausibility, novelty, methodology-as-described) and skips what's expensive (does the code run, are the numbers internally consistent, does the data support the claims, does it replicate). AI attacks exactly the expensive part. But the tiers are arriving at very different speeds.

Tier 1 — automated integrity screening: already here, and Wiley is deploying it. Image-manipulation and duplication detection (tools like Proofig and ImageTwin), statistical consistency checks (recomputing p-values from reported test statistics, checking whether reported means are even arithmetically possible given sample sizes), plagiarism and tortured-phrase detection, and paper-mill signature screening. Post-Hindawi, Wiley built and launched its own AI-powered paper-mill detection service, and the industry runs a shared STM Integrity Hub. This tier is becoming standard intake screening — it happens before human review, exactly the "baked-in tier" the question describes. It exists because the Hindawi-class scandals made the cost of not having it explicit.
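
The two statistical checks named above are simple enough to sketch. The following is a minimal illustration in Python, not Wiley's or any vendor's actual screening code: `grim_consistent` tests whether a reported mean is arithmetically reachable from integer-valued responses at a given sample size (the idea behind the published GRIM test), and `two_sided_p_from_z` recomputes a p-value from a reported z statistic. The function names, the tolerance, and the example numbers are assumptions made for illustration.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """True if some integer total over n integer-valued responses rounds to the reported mean."""
    half_step = 0.5 * 10 ** (-decimals)  # rounding half-width at the reported precision
    nearest_total = round(reported_mean * n)
    # Only integer totals adjacent to mean * n can possibly reproduce the mean.
    for total in (nearest_total - 1, nearest_total, nearest_total + 1):
        if abs(total / n - reported_mean) <= half_step + 1e-12:
            return True
    return False


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_mismatch(z: float, reported_p: float, tolerance: float = 0.005) -> bool:
    """Flag a reported p-value that disagrees with the one implied by the z statistic."""
    return abs(two_sided_p_from_z(z) - reported_p) > tolerance


print(grim_consistent(5.19, 28))           # False: no integer total over 28 gives 5.19
print(grim_consistent(5.18, 28))           # True: 145 / 28 rounds to 5.18
print(round(two_sided_p_from_z(1.96), 3))  # 0.05
print(p_value_mismatch(1.96, 0.01))        # True: reported p is too small for z = 1.96
```

Checks like these cost almost nothing per manuscript, which is why they fit at intake rather than in human review.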

Tier 2 — computational reproduction: technically feasible now, not yet standard. With data- and code-availability mandates spreading, an AI agent can literally re-execute the analysis pipeline: pull the deposited data, run the deposited code, check that the figures and tables regenerate, flag where they don't. This is the genuinely transformative one, because it converts "reproducibility" from a years-later social process into a pre-publication compile check. Nothing about it is science fiction — it's agentic code execution, which is mature. The blockers are economic and social, not technical: review labor is currently free (referees are unpaid volunteers), so any machine tier adds real cost per paper that someone must absorb; data/code deposits are still incomplete; and — the uncomfortable one — open-access publishing runs on volume economics (revenue per article published), so journals' financial incentive is to reduce friction at acceptance, not add it. The honest answer to "why isn't this baked in everywhere already" is mostly that one sentence.
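
The "pre-publication compile check" can be pictured as a comparison between the numbers a manuscript reports and the numbers its deposited pipeline regenerates. A minimal sketch in Python, assuming a hypothetical `rerun_pipeline` callable that stands in for executing the deposited code on the deposited data; the names, the tolerance, and the toy values are illustrative and do not describe any publisher's workflow.

```python
import math
from typing import Callable, Dict, List


def reproduction_report(
    reported: Dict[str, float],
    rerun_pipeline: Callable[[], Dict[str, float]],
    rel_tol: float = 1e-3,
) -> List[str]:
    """Re-execute the analysis and list every reported value that fails to regenerate."""
    regenerated = rerun_pipeline()
    problems = []
    for name, claimed in reported.items():
        if name not in regenerated:
            problems.append(f"{name}: not produced by the deposited code")
        elif not math.isclose(claimed, regenerated[name], rel_tol=rel_tol):
            problems.append(f"{name}: reported {claimed}, regenerated {regenerated[name]}")
    return problems


# Toy stand-in for running deposited code on deposited data (hypothetical values).
def toy_pipeline() -> Dict[str, float]:
    measurements = [2.0, 4.0, 6.0, 8.0]
    return {"table1_mean": sum(measurements) / len(measurements)}


claims = {"table1_mean": 5.0, "table1_effect_size": 0.42}
for line in reproduction_report(claims, toy_pipeline):
    print(line)  # table1_effect_size: not produced by the deposited code
```

An empty report would mean every reported figure regenerated within tolerance; anything else is a flag for the editor, not a verdict.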

Tier 3 — empirical replication: AI helps at the margins, robots are the long game. Wet-lab replication can't be done by a language model; it requires self-driving labs and cloud-lab infrastructure, which exist in narrow domains (chemistry, materials science) but are nowhere near general or cheap. What AI can do today at this tier is triage: models trained to predict which findings will replicate (DARPA ran a program on this) perform respectably, so a "replication-risk score" attached to papers could direct scarce replication resources. Full automated replication-as-review is a decade-scale story, not a product-cycle one.

The investment-relevant part cuts both ways for Wiley. The bullish reading: machine verification strengthens the certification product (a stamp meaning "human-reviewed AND machine-verified" is worth more than either alone, especially to AI labs buying training/grounding data — "verified corpus" becomes a premium SKU); and it favors scale, since integrity-tech stacks amortize across millions of submissions, squeezing small publishers toward consolidation or toward agency arrangements like Wiley's Nexus — the big five get bigger. The bearish reading to hold alongside it: if automated verification becomes good, cheap, and independent, it commoditizes part of what journals sell. The overlay-journal model — preprint servers plus automated verification plus lightweight expert curation (eLife has experimented in this direction) — is the disintermediation scenario, where the verification tier exists but publishers don't own it. The likely equilibrium: accountability, liability, retraction infrastructure, and the career-credentialing function keep the journals at the center for a long time, but the moat narrows from "we are the only filter" to "we are the accountable filter."

The deepest version of the point: AI won't just verify papers — AI will increasingly write and even conduct the science, generating a flood of machine-assisted findings. In that world the certification layer becomes more central, not less, but it has to evolve from "two human referees" to "human judgment supervising a machine-verification stack." Whoever operates that stack credibly owns the trust layer of the scientific record. The publishers are the incumbents with the brand, the workflow position, and (post-Hindawi) the motivation — but it's an asset they have to build, and the capex shows up before the pricing power does. For the June 16 call and beyond, the tell would be Wiley talking about verification as a product (verified-corpus licensing, integrity-as-a-service) rather than only as a cost line.

One adversarial caveat to keep honest: any automated check becomes a target — papers will be optimized to pass the machines, the way text is already optimized to evade AI detectors. So the equilibrium isn't "verification solves fraud"; it's an arms race in which the side with more data about fraud patterns wins — which, conveniently, is another argument for scale.

Q: I don't understand the WLY business all that well. Some of it seems to be related to journals. Nexus and clinical seem different. Can you break down these segments, providing a concise summary of each one, and a rough estimate of the percent they are contributing to overall revenue?

The map, with FY2025 actuals (fiscal year ended April 30, 2025; total continuing revenue ~$1.66B; segment figures from SEC filings via FMP):

1. Research — $1,075M, ~65% of revenue. The heart of the company and the entire AI thesis. Two sub-pieces:

Research Publishing (roughly $940M, ~57% of total): the ~2,000 academic journals. Revenue comes from library subscriptions and the newer "read-and-publish" institutional agreements, plus gold open-access article fees (author/funder pays per article), plus publishing services for the ~900 scholarly societies whose journals Wiley operates. The "Advanced" journal family (Advanced Materials etc.) alone exceeds $70M and grows double-digit. Highly recurring, calendar-year renewal cycle.
Research Solutions / Intelligence (roughly $130–140M, ~8%): the picks-and-shovels — the Atypon publishing platform, society services, databases, and corporate/R&D-facing data products.

Everything stress-tested in the questions above lives in or attaches to Research: the AI licensing deals (journal corpus → labs), OpenEvidence (journal content → clinical AI), Nexus, and the Emerald acquisition (more journals).

2. Learning — $585M, ~35% of revenue. The other, slower half:

Academic (~$320–330M, ~19–20% of total): university textbooks and digital courseware. Structurally pressured (used/rental/OER substitution) but stabilized by digital.
Professional (~$255–260M, ~15%): trade and professional books — the For Dummies franchise, business/finance titles — plus corporate talent-development assessments (Everything DiSC, Five Behaviors). Book-corpus AI training deals get booked here (the original $23M FY24 GenAI deal was book content).

The two flagged items, properly placed — neither is a segment:

Nexus is a service inside Research, launched ~2025: Wiley acts as licensing agent for other publishers, placing their content with AI developers; 36 partner publishers have signed. Revenue today: not disclosed, almost certainly immaterial (low single-digit millions at most). Its significance is structural, not yet financial: a take-rate marketplace that scales without Wiley creating content, and one that aggregates the long tail of publishers into a single negotiating bloc against the labs.

"Clinical" = COA (Clinical Outcome Assessments), also inside Research: Wiley licenses validated patient questionnaires and assessment instruments — the standardized forms used as endpoints in drug trials — to pharma, with an IQVIA distribution agreement. Per the Q3 call it grew from $0.8M in 2021 to ~$7M — i.e., under 0.5% of revenue. It's a proof-of-concept for "Wiley content as regulated-workflow infrastructure," not a needle-mover.

And the line that cuts across both segments: AI licensing was ~$40M in FY25 (~2.4% of revenue), $29M in Q1 FY26 alone — booked in Research when it's journal content, Learning when it's books.

The concise mental model: ~57% journals (the moat and the AI story), ~8% publishing infrastructure, ~35% books and courseware (the slow-melt legacy that pays the ~4.5% dividend), with AI licensing at ~2–3% and growing fast, and Nexus/COA as currently-tiny options on the agency and clinical-infrastructure models. The valuation question is whether the 65% earns a data-asset multiple before the 35% erodes — which is why the recurring-AI-revenue disclosure on June 16 matters more than any segment line.
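
The percentages above are straightforward to re-derive. A small sketch in Python that recomputes each share from the dollar figures quoted in this answer; where the text gives a range, the midpoint is used, which is an assumption of this sketch rather than a disclosed number.

```python
# FY2025 figures in $M as quoted above; midpoints stand in for quoted ranges (assumption).
SEGMENTS = {"Research": 1075.0, "Learning": 585.0}
LINES = {
    "Research Publishing": 940.0,
    "Research Solutions": 135.0,  # midpoint of $130-140M
    "Academic": 325.0,            # midpoint of $320-330M
    "Professional": 257.5,        # midpoint of $255-260M
    "AI licensing": 40.0,         # cuts across both segments
    "COA": 7.0,
}

TOTAL = sum(SEGMENTS.values())  # 1,660, matching the ~$1.66B total


def share_of_revenue(amount: float, total: float = TOTAL) -> float:
    """Percent of total continuing revenue."""
    return 100.0 * amount / total


for name, amount in {**SEGMENTS, **LINES}.items():
    print(f"{name:<20} ${amount:>7,.1f}M  {share_of_revenue(amount):5.1f}%")
```

Because the sub-lines use midpoints, they can differ from the rounded percentages in the text by a few tenths of a point.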

Tempus AI (TEM)

Q: My concern with TEM is that nearly every high-profile bio IPO comes to market at an astronomical valuation, and as the reality and complexity of what they intend to achieve becomes evident, it's a continual erosion in price. Are there pathways where TEM could have 5x more revenue? What would various forward-growth scenarios look like?

The concern is well-founded: the "IPO at an astronomical multiple, then erode for years as reality intrudes" pattern is the base rate for high-profile bio and diagnostics listings (Schrödinger, Recursion, 23andMe, and, for long stretches, Guardant and Exact Sciences). The mechanism is real: TAM slides assume flawless execution, reimbursement arrives slower than modeled, and the multiple de-rates from "platform" to "lab services" as growth normalizes. So the right question isn't whether TEM can 5x in the abstract; it's whether there are specific, fundable pathways to ~$8B and what has to be true for each, held against that gravitational pull.

The math that disciplines everything. 5x the FY26 guide (~$1.6B) is ~$8B. The street already models TEM to ~$3.1B by 2030 — roughly 2x, an ~18% CAGR. So 5x is on no current analyst sheet: it needs ~26% revenue CAGR sustained for seven years, or a step-change from new modalities/M&A the consensus isn't underwriting. That gap is where both the upside and the erosion risk live. Today's business is ~$955M Diagnostics + ~$300M+ Data & Applications, and the Data line — not the lab — is what makes the 5x debate interesting, because it carries software economics the erosion-pattern names usually lack.
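
The growth rates in this paragraph follow from the standard compound-growth formula. A minimal check in Python, assuming the street's 2030 figure is four years out from the FY26 guide (the horizon is this sketch's assumption; the dollar figures are the ones quoted above).

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate taking `start` to `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


FY26_GUIDE = 1.6   # $B, from the text
STREET_2030 = 3.1  # $B, from the text
TARGET_5X = 5 * FY26_GUIDE

print(f"5x over 7 years:         {cagr(FY26_GUIDE, TARGET_5X, 7):.1%}")
print(f"Street, 4 years assumed: {cagr(FY26_GUIDE, STREET_2030, 4):.1%}")
```

Both results round to the figures in the text, about 26% and about 18%. The gap between them is the growth the consensus is not underwriting.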

The five pathways to 5x — what each contributes and what must be true:

1. Genomics volume + menu expansion (the base engine). Oncology testing compounding ~30% with new assay launches and pull-through. This is the most reliable lever and roughly tracks the street's ~2x to $3B. On its own it does not get to 5x — it's the foundation the other four build on. Must be true: volume growth doesn't decelerate below ~20% as the base scales.
2. MRD / minimal residual disease (the highest-torque organic lever). MRD volume already grew ~500% YoY off a small base. Recurrence-monitoring is recurring by nature — patients test repeatedly over years — so it converts a one-time diagnostic into an annuity. A credible MRD franchise is a multi-hundred-million to billion-dollar line by itself. Must be true: Medicare/commercial reimbursement decisions land favorably; competition (Natera's Signatera is entrenched) doesn't cap share.
3. Data & licensing scale-up (the multiple-justifying lever). TCV >$1.1B, 70+ pharma customers, Insights growing ~40%+, NRR ~126%. If this line compounds into a $1.5–2.5B business, it re-rates the whole company because it's high-margin and AI-grounding-relevant — the part that makes TEM a data owner rather than a lab. This is the lever most aligned with your own thesis. Must be true: pharma keeps paying for de-identified multimodal data at scale, and non-exclusive foundation-model co-builds materialize.
4. New modalities / adjacencies (the step-change lever). Hereditary (Ambry), cardiology, neuro/psychiatry, radiology AI, and algorithmic diagnostics (Lens). Each new clinical area is a new TAM. This is how a diagnostics company breaks the ~$3B ceiling — but it's also where complexity and the erosion risk concentrate. Must be true: at least two adjacencies reach real scale without diluting focus or burning the balance sheet.
5. M&A / consolidation (the inorganic lever). Tempus has been acquisitive (Ambry, Deep 6, etc.); part of any 5x plausibly comes from buying revenue, funded by stock if the multiple holds. Must be true: the equity stays strong enough to be a currency — which is circular, and the first thing the erosion pattern breaks.

Scenario table — forward revenue, with the levers each requires:

Scenario	~Revenue (7y) / multiple of today	Implied CAGR	What has to go right	Read
Erosion case	~$2.5–3B · ~2x	~8–10%	Genomics decelerates, MRD reimbursement stalls, Data growth normalizes; no new modality scales. Execution fine, story de-rates.	Your base-rate fear — revenue still grows but the multiple compresses faster; the stock erodes even as the company "works."
Street case	~$3.1B (2030) · ~2x	~18% (to 2030)	Levers 1–2 deliver, Data compounds steadily, no heroic adjacency. The consensus model.	Consensus — solid, already in estimates; ~2x, not 5x. The market is paying ~6.5x sales for this.
Bull case (5x)	~$8B · ~5x	~26%	Levers 1–3 all deliver and at least one adjacency (lever 4) reaches scale; Data line hits $1.5–2.5B. Seven years of mid-20s growth with no balance-sheet break.	Fundable, not forecastable: requires 4 of 5 levers firing; each is plausible, the conjunction is demanding.
Moonshot	~$12B+ · ~8x	~33%+	All five levers, including transformative M&A funded by a holding multiple, plus Data becoming a true foundation-model data platform. Tempus becomes the clinical-data layer of medical AI.	Lottery upside — the reason to hold a small position; do not underwrite to it.
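
Read the other way, the table's growth rates can be compounded forward from the FY26 guide to see what revenue each implies. A sketch in Python, assuming a ~$1.6B starting point and the horizons stated in the table (four years for the street case is an assumption, since the table says only "to 2030"); the outputs are arithmetic cross-checks, not forecasts.

```python
def project(start: float, growth: float, years: int) -> float:
    """Revenue after compounding `start` at annual rate `growth` for `years`."""
    return start * (1.0 + growth) ** years


BASE = 1.6  # $B, FY26 guide quoted in the text

# (scenario, annual growth, years): growth rates copied from the table above.
SCENARIOS = [
    ("Erosion, low end", 0.08, 7),
    ("Erosion, high end", 0.10, 7),
    ("Street", 0.18, 4),
    ("Bull (5x)", 0.26, 7),
    ("Moonshot", 0.33, 7),
]

for name, growth, years in SCENARIOS:
    revenue = project(BASE, growth, years)
    print(f"{name:<18} {growth:>4.0%} x {years}y -> ${revenue:4.1f}B ({revenue / BASE:.1f}x)")
```

At these rates the bull and moonshot rows land near $8B and $12B, in line with the table. The erosion row's 8–10% compounds to roughly $2.7–3.1B, at the top of its quoted $2.5–3B range, which is within the rounding of the "~" figures.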

The honest synthesis. Yes, there are real pathways to 5x — and unlike a pure therapeutics binary, they're incremental and observable: you watch MRD reimbursement, Data TCV, and adjacency scaling quarter by quarter, so you're not betting on a single trial readout. That's what makes TEM less of a classic erosion candidate than the pattern suggests. But the 5x case requires four of five levers to fire over seven years, and the erosion pattern you describe is precisely what happens when two or three fire and the multiple — today ~6.5x sales — does the rest of the work downward. The single most important variable is lever 3 (Data): if the licensing line compounds into a genuine multi-billion data platform, TEM re-rates as a data owner and the 5x is reachable; if Data stalls and TEM stays a fast-growing lab, the street case caps it at ~2x and your erosion thesis likely wins on the multiple even if revenue grows. The tell to watch each quarter: is Data & Applications growing faster than Diagnostics? As long as it is, the data-platform thesis is intact; the quarter that flips is the quarter the erosion case gains the upper hand. Usual caveat: scenario framework, not a forecast — and explicitly not investment advice.
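
The quarterly "tell" is mechanical enough to write down. A minimal sketch in Python that compares year-over-year growth of the two lines and reports the first quarter in which Data & Applications grows slower than Diagnostics; the revenue series are made-up placeholders, not Tempus figures.

```python
from typing import List, Optional


def yoy_growth(quarterly: List[float]) -> List[float]:
    """Year-over-year growth for each quarter that has a comparison four quarters earlier."""
    return [quarterly[i] / quarterly[i - 4] - 1.0 for i in range(4, len(quarterly))]


def first_flip(data_rev: List[float], diag_rev: List[float]) -> Optional[int]:
    """Index of the first quarter where Data grows slower than Diagnostics, else None."""
    pairs = zip(yoy_growth(data_rev), yoy_growth(diag_rev))
    for offset, (data_g, diag_g) in enumerate(pairs):
        if data_g < diag_g:
            return offset + 4  # index into the original quarterly series
    return None


# Hypothetical quarterly revenue in $M, oldest first (illustrative only).
data_rev = [50, 55, 60, 65, 70, 75, 80, 70]
diag_rev = [150, 160, 170, 180, 190, 200, 215, 240]

print(first_flip(data_rev, diag_rev))  # prints 7
```

A single flipped quarter can be noise; a stricter variant would require two consecutive quarters before treating the thesis as broken.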

Sports-data duopoly — Genius Sports (GENI) + Sportradar (SRAD)

Q: Regarding SRAD and GENI, how does AI change the landscape for sports data? I understand the bull case relating to a lot more prediction markets increasing demand for sports betting. But how does AI — if at all — increase the value of this data?

The prediction-market bull case is really a demand story; the more interesting question is whether AI changes the unit value of the data itself. Five mechanisms where AI genuinely increases value, and two where it leaks away.

1. AI multiplies the number of sellable markets per game. The binding constraint on in-play betting has never been bettor appetite — it's pricing capacity. A human-supervised trading desk can only run so many live markets; AI models can price thousands of micro-markets simultaneously (next pitch, next possession, player props that re-price every few seconds). Every additional market is additional consumption of the underlying feed, and it's exactly what Sportradar's managed trading service (MTS) sells: operators outsource the model because building it in-house is hard. AI raises the ceiling on markets-per-event, and the official feed is the raw input for all of them.

2. Computer vision turns each game into orders of magnitude more data. The most underappreciated piece. Traditional play-by-play is hundreds of events per game; optical tracking is millions of positional coordinates. Genius owns this capability outright: it acquired Second Spectrum, which ran the NBA's optical player tracking for years and does the Premier League's. Sportradar's equivalent is its 4Sight/computer-vision stack. AI is simultaneously the collection technology (video → structured data, collapsing the cost of capturing depth) and the demand driver (those coordinates feed augmented broadcasts like BetVision, automated officiating support, coaching analytics, and richer bet types). The same rights now yield a much bigger dataset.

3. AI-generated media makes the long tail monetizable. Automated commentary, recaps in any language, and synthetic broadcast layers mean a third-tier table-tennis match or a lower-division soccer game — events where human production never penciled — can now carry a produced, bettable, watchable product. The data is the script for all of it. Both companies cover hundreds of thousands of events a year; AI raises the revenue per event at the bottom of that pyramid from near-zero to something.

4. Agents and prediction markets need a settlement-grade truth oracle. If AI agents are placing bets or trading event contracts, they need machine-readable, licensed, low-latency, legally safe ground truth — both to act on and to settle against. Scraped data doesn't work for settlement; official data is the oracle. The Kalshi–Sportradar deal is the early template. In an agentic world, the API call to the official feed is the toll booth.

5. Integrity services grow with the attack surface. Thousands of AI-priced micro-markets are also thousands of manipulation targets, and AI lowers the cost of coordinated fixing. Both companies sell integrity monitoring to leagues and regulators — that business scales with exactly the complexity AI creates.

Now the two leaks. First, the value-add layer can commoditize even as the feed doesn't. A Flutter or DraftKings can build its own AI trading models — they still must buy the raw rights-protected feed, but the services margin faces in-housing pressure from sophisticated operators, even as smaller ones outsource more. Second — and this is the big one — the leagues are the residual claimants. Every mechanism above raises the value of official data, but the rights are re-auctioned, and the leagues know what AI is doing to that value. AI also lowers the leagues' cost of collecting their own data (cameras plus computer vision), strengthening their threat to go direct. So a large share of the AI-created surplus gets transferred upstream at each renewal — that's why rights-cost inflation is flagged as the key endogenous concern for both. The NFL taking warrants in Genius is exactly this dynamic made explicit.

One marginal erosion worth knowing: AI makes unofficial data reconstruction cheaper — computer vision on a broadcast or pirate stream can rebuild play-by-play seconds behind real time ("courtsiding 2.0"). The official feeds keep the latency edge (in-venue collection) that matters for live betting, but enforcement against synthetic scraping becomes part of the moat maintenance.

Net read: AI is unambiguously positive for the value of the data — it multiplies markets, depth, monetizable events, and machine consumption, while the asset itself (real-world events under legal exclusivity) is the one thing AI can't generate. The investment question is the split: the duopoly's economics depend on whether market expansion and CV-driven product depth outrun what the leagues claw back at renewals. The line to watch in both names' filings: take-rate and gross margin on one hand, rights amortization and renewal terms on the other. The duopoly structure helps here too — with only two credible bidders for league rights, the auctions are less ruinous than they'd be with five.

Q: What is the distinction between GENI and SRAD on a high level?

At a high level they're the diversified incumbent versus the concentrated challenger — same business model, opposite portfolio construction.

Sportradar is the global utility. Founded in 2001, still run by founder Carsten Koerl (who remains a major holder), it's roughly twice Genius's size, profitable, and built on breadth: coverage across 80+ sports and on the order of 900k events a year, with marquee exclusive rights like the NBA, MLB, NHL, ATP tennis, and UEFA. Its strategy is full-stack: not just selling feeds but running the betting plumbing — live odds, managed trading services (MTS, where it effectively operates the risk book for bookmakers), streaming (bolstered by acquiring IMG Arena's portfolio), advertising (ad:s), and integrity. Because of the breadth, no single league renewal can break it, and the model behaves like an infrastructure compounder: take-rate expansion on a diversified rights base.

Genius is the concentrated marquee-rights play. It came public via SPAC in 2021 and is built on a handful of premium Anglo-American exclusives — most importantly the NFL official data rights (won in a 2021 bidding war against Sportradar, paying up and giving the NFL equity warrants), plus the NCAA and the Premier League (via Football DataCo). Two structural tilts distinguish it: first, the technology angle — its Second Spectrum acquisition made optical/skeletal tracking a core asset (it ran the NBA's tracking for years and does the Premier League's), which feeds products like BetVision, the in-stream NFL betting broadcast; second, the media layer — programmatic ads and fan-engagement products that monetize the same rights a second time. Higher growth (~31% vs ~12% on the FMP pull), but with concentration to match: the NFL relationship is both its crown jewel and its single point of failure, and the warrant structure means the league literally participates in the upside.

The duopoly framing is real but asymmetric: they've largely partitioned the rights map (Sportradar = global basketball/baseball/tennis/soccer breadth; Genius = NFL/NCAA/EPL depth), and the 2021 NFL auction is the cautionary tale of what happens when they do collide — rights inflation that took Genius years to digest. That's also why the leagues-as-residual-claimants point from the question above bites differently for each: a bad renewal is a margin headwind for Sportradar but a thesis event for Genius.

As investments: SRAD is the quality/steadiness expression — diversified, founder-led, profitable, the way to own the category with the least single-contract risk. GENI is the torque expression — more growth, more optionality (CV tech, BetVision, the media layer scaling against fixed rights costs), but the returns hinge on NFL economics at each renewal and on the media layer actually scaling. Roughly similar multiples (~3.0x vs ~3.6x EV/Sales), so the market isn't charging much for Genius's higher growth — arguably because it's pricing the concentration risk.

One caveat: exact current rights terms and renewal dates (especially the NFL deal's latest extension structure) should be verified in filings before leaning on them — the contours above are solid, but contract specifics move.

Q: Do SRAD and GENI differ substantially in how they are adopting AI technology or making their data more valuable in an AI era?

Substantially in emphasis, yes — even though the buzzword surface looks similar. The cleanest way to put it: Genius is applying AI to the data-capture and presentation layer; Sportradar is applying it to the pricing, risk, and personalization layer. One is making the data itself richer; the other is making each unit of data earn more.

Genius's AI center of gravity is computer vision. Second Spectrum gave it arguably the best sports-CV team in the business — years of running the NBA's official optical tracking, plus the Premier League's — and Genius has organized its product strategy around that: the GeniusIQ platform unifies the tracking layer, and the flagship expression is BetVision, the NFL stream with live odds and bet placement embedded directly in the broadcast, built on real-time CV understanding of the game. The same capability feeds its media/ads layer (audience products like FANHub targeting fans off sports data). So Genius's version of "making data more valuable in the AI era" is generative of new data: turning video into millions of skeletal coordinates per game, then turning those coordinates into augmented experiences and ad inventory. It's expanding what the rights yield.

Sportradar's AI center of gravity is the betting economics stack. Its signature AI products sit downstream of the feed: Alpha Odds (AI-driven, dynamically personalized odds that let each operator differentiate pricing), the MTS managed-trading business (ML risk management run as a service — effectively the outsourced quant desk for hundreds of bookmakers), the Vaix acquisition (deep-learning personalization and recommendations inside sportsbooks), and AI-driven integrity monitoring across its market surveillance. It has CV too — 4Sight and the Synergy Sports acquisition — so the capability gap is narrower than the positioning gap, but Sportradar's AI shows up as better decisioning per event: more markets priced, sharper risk, higher take per bet across an enormous diversified event base. Notably, it's also the one with the early agent/prediction-market posture — the Kalshi relationship makes it the settlement-oracle precedent.

Mapped to the five mechanisms above: Sportradar is strongest on mechanism 1 (AI multiplying priced markets per game) and 4 (the truth-oracle role), Genius on mechanism 2 (CV multiplying data per game) and 3 (AI-generated/augmented media). Both sell mechanism 5 (integrity). Which orientation wins depends on where the surplus lands: if the AI era's prize is decisioning — odds, risk, personalization at scale — Sportradar's infrastructure position compounds; if the prize is experience and attention — augmented broadcasts, in-stream betting, fan-data advertising — Genius's CV-first stack has the sharper edge. There's also a defensiveness asymmetry: Genius's CV layer creates proprietary data beyond the official feed (tracking data the league rights alone don't give you), while Sportradar's decisioning layer is the part big operators could most plausibly in-house — which loops back to the services-commoditization leak.

Caveats in the usual spirit: the product specifics here (Alpha Odds, Vaix, GeniusIQ, BetVision, Synergy, the Kalshi structure) are from desk knowledge plus this session's lighter pass — current scope hasn't been verified against the latest filings, and both roadmaps move fast. The thing to verify in a deep-dive batch is each company's disclosed revenue mix by product line — how much of Sportradar is now MTS/managed services and how much of Genius is BetVision/media — because that's where the AI orientation shows up in numbers rather than press releases.

Planet Labs (PL)

No questions logged yet.

ZoomInfo (GTM)

No questions logged yet.

Appendix: the peer-reviewed publishers compared — who is actually transforming for the AI era

The lens here is not who leverages journals more; it is who is embracing the AI era in a way the market under-recognizes, judged by the trajectory or transformation underway relative to company size. Financials from the main table (FMP, June 9 2026; market cap, EV/Sales, YoY revenue growth): RELX ~$92B, ~9.0x, +7%; Wolters Kluwer ~$38B, ~6.0x, +6%; Wiley ~$2.3B, ~1.9x, ~flat. The multiples are the recognition gap stated numerically.
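To make "the recognition gap stated numerically" concrete, here is a minimal sketch that expresses each EV/Sales multiple as a ratio to the lowest one in the group. The inputs are the approximate figures quoted in the paragraph above; the `PEERS` dictionary and the `multiple_gap` helper are illustrative names, not part of any data-vendor API, and "~flat" growth is entered as 0 purely as a placeholder.

```python
# Approximate figures quoted in the text (FMP, June 9 2026); illustrative only.
PEERS = {
    "RELX": {"cap_usd_bn": 92.0, "ev_sales": 9.0, "yoy_growth_pct": 7.0},
    "Wolters Kluwer": {"cap_usd_bn": 38.0, "ev_sales": 6.0, "yoy_growth_pct": 6.0},
    "Wiley": {"cap_usd_bn": 2.3, "ev_sales": 1.9, "yoy_growth_pct": 0.0},  # "~flat" entered as 0
}

def multiple_gap(peers):
    """Return each company's EV/Sales as a ratio to the lowest multiple in the set."""
    floor = min(p["ev_sales"] for p in peers.values())
    return {name: round(p["ev_sales"] / floor, 2) for name, p in peers.items()}

gaps = multiple_gap(PEERS)
# Print from the richest multiple to the cheapest.
for name, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gap}x the lowest EV/Sales in the group")
```

On these inputs RELX trades at roughly 4.7x Wiley's sales multiple and Wolters Kluwer at roughly 3.2x. The ratio says nothing about whether the gap is deserved; that is what the rows of the comparison are for.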

	RELX	Wiley (WLY)	Wolters Kluwer (WTKWY)
(a) Journal holdings — rank & domain	Elsevier is the #1 journal publisher by articles and citations (~2,900 journals; The Lancet, Cell Press), dominant in life sciences and medicine. On top sits a layer no one else has: Scopus indexes ~29,000 journals from ~7,000 publishers — RELX owns the citation graph of everyone's journals, not just its own. Journals are roughly a third of company revenue; Risk and Legal are bigger.	#3 commercial publisher (~2,000 journals), broad-spectrum with genuine franchise strength in materials science and chemistry (the Advanced family, >$70M and growing double-digit) plus exclusive publishing partnerships with ~900 scholarly societies — content it monetizes without owning outright. No citation-graph layer. Journals are ~57% of revenue — the only one of the three where the journal thesis is the stock thesis.	Smallest journal estate: Lippincott's ~300 medical/nursing titles, largely society-owned, narrow clinical domain. But the crown jewel sits above journals: UpToDate — 7,600+ expert contributors continuously synthesizing the literature into care recommendations. A derived, perpetually-current expert layer, arguably more AI-grounding-relevant than raw journals.
(b) Journal-value rank	1 — largest, highest-impact corpus + the cross-publisher Scopus layer	2 — top-tier breadth, society leverage, no meta-layer	3 on journals per se — but UpToDate is a different, highly defensible asset class
(c) Brokering data beyond own journals (Nexus / clinical-style motions)	Deliberate non-broker. Refuses corpus licensing to labs; everything embeds in its own grounded products — Scopus AI, ScienceDirect AI, Reaxys/Embase/ClinicalKey AI, Lexis+ AI and Protégé — and it is buying AI-natives (Doctrine, Apr 2026, second legal-AI deal in 24 months). The one quasi-brokering position: Scopus AI monetizes 7,000 other publishers' abstracts inside RELX's product — aggregation at the metadata layer.	The only true broker of the three. $92M lifetime external AI licensing (three big-tech clients incl. AWS; Perplexity; Anthropic partnership); recurring inference pilots with pharma/chemical/space corporates; OpenEvidence licensing plus equity stake; Nexus — licensing agent for 36 partner publishers' content (brokering others' data on a take rate); COA clinical instruments distributed via IQVIA; Emerald acquired (Jun 2026) as more corpus to license.	Middle path: distribution-brokering, not corpus-brokering. No licensing to labs, but pushing curated data into third-party surfaces: UpToDate inside Microsoft Dragon Copilot, M365 Copilot, Teams; an Epic pilot pairing GPT-4 with UpToDate content; an expanded OpenAI enterprise collaboration; and Medi-Span Expert AI shipping an MCP server so third-party agents can consume its medication data — genuine agent-layer exposure.
(d) Brokering rank	3 — by explicit choice; highest in-product monetization instead	1 — corpus licensing + agency model + clinical infrastructure, all live	2 — partner distribution + MCP/agent exposure, no corpus sales
(e) Transformation summary — embracing the AI era	The most advanced AI operator — and the least transformed. Every division has shipped grounded-AI SKUs and it is consolidating AI-native challengers by acquisition. But this is sustaining innovation: AI features defending and extending an already-premium subscription machine. The business model is unchanged; the strategy is to make sure nothing changes.	Business-model transformation, not feature addition. From subscription publisher to data licensor + licensing agent + clinical-infrastructure provider, with equity stakes in AI distribution (OpenEvidence). AI revenue went 0 → $40M (FY25) → $29M in Q1 FY26 alone, with the recurring share guided to triple. The company is becoming something it wasn't.	Fast, urgent — and defensive. Under direct attack: OpenEvidence (free, ad-funded, $12B valuation, Jan 2026) is aimed squarely at UpToDate's ~$595M seat-license business. The response is real and rapid — Expert AI signed by >half of US hospital enterprise customers (~2,000 hospitals) within months — but the transformation protects existing dollars more than it creates new lines.
(f) Transformation magnitude vs company size	Low–moderate — large absolute AI investment, immaterial against ~$12B revenue; fully recognized at ~9x sales	High — a new revenue category already ~2.5% of sales and compounding, on a $2.3B company at ~1.9x sales; the multiple says the market sees a books company	Moderate — product architecture genuinely rebuilt, but on ~€5.9B revenue the motion is value-protective; the market nets the OpenEvidence threat against the progress at ~6x sales

Verdict through the stated lens. If the question is who is most transforming to embrace the AI era, under-recognized by the market, the answer is Wiley, and it isn't close — it is the only one of the three where AI is creating a genuinely new business (external licensing, the Nexus agency, clinical infrastructure, equity in AI distribution) that is material relative to the company's size, priced at a multiple that embeds none of it. RELX is the best AI operator of the three, but operating excellence on a recognized premium franchise is the opposite of an under-recognized transformation — at ~9x sales the market has already written the AI-winner thesis into the price. The interesting second-order case is Wolters Kluwer: the >50% enterprise adoption of Expert AI in a matter of months is the single most impressive execution statistic among the three, and if OpenEvidence stalls at the hospital-procurement gate — where governance, liability, and expert-in-the-loop validation matter more than free access — the market's threat-discount becomes the under-recognition. Wiley is the transformation bet; Wolters Kluwer is the survival-mispriced bet; RELX is the quality compounder where the AI question is already answered in the multiple. Usual caveat: qualitative framework, not advice — and the June 16 Wiley print is the nearest falsification test for the first claim.
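The size-relative materiality claim for Wiley can be sanity-checked with the numbers already quoted in rows (e) and (f). The sketch below is only back-of-envelope arithmetic under stated assumptions: it assumes the ~2.5% share refers to FY25 AI revenue over total sales, and the function names are hypothetical helpers rather than anything from a filing or data feed.

```python
# Figures quoted in the Wiley column (USD millions); approximate, taken from the text.
AI_REVENUE_FY25 = 40.0
AI_REVENUE_Q1_FY26 = 29.0
AI_SHARE_OF_SALES = 0.025  # "~2.5% of sales"; assumed to be FY25 AI revenue / total sales

def implied_sales(ai_revenue, share_of_sales):
    """Back out the sales base implied by a revenue line and its share of sales."""
    return ai_revenue / share_of_sales

def quarter_vs_prior_year(quarter_revenue, prior_year_revenue):
    """One quarter's revenue as a fraction of the whole prior year's."""
    return quarter_revenue / prior_year_revenue

sales_base = round(implied_sales(AI_REVENUE_FY25, AI_SHARE_OF_SALES))
q1_ratio = quarter_vs_prior_year(AI_REVENUE_Q1_FY26, AI_REVENUE_FY25)

print(f"Implied sales base: ~${sales_base}M")
print(f"Q1 FY26 AI revenue vs all of FY25: {q1_ratio:.1%}")
```

On these inputs the implied sales base is about $1.6B, and a single quarter delivered roughly 72.5% of the prior full year's AI revenue. Licensing revenue is lumpy, so the quarterly figure should not be annualized without checking how much of it is recurring.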