AI Pulse

A hosted benchmarking tool that generates a vocabulary-mismatch audit report comparing BM25 vs. SPLADE on your own document corpus — delivered as a PDF in 10 minutes.

Customer: ML engineer or AI lead at a 5–50 person startup who inherited or built a RAG pipeline on BM25 and is getting pressure from product to improve retrieval quality, but doesn’t have weeks to run ablation studies themselves.

Problem: They know learned sparse methods like SPLADE probably outperform BM25 on their domain, but they can’t justify the migration cost to leadership without concrete numbers on their own data — not on BEIR benchmarks. Running it themselves requires stitching together Pyserini, HuggingFace, GPU infra, and evaluation harnesses, which takes days they don’t have.

Pricing: $99 one-time per report. Target: roughly $1,500 in sales within 3 months (~15 reports at $99 each).

Why now

The RAG improvement wave (late 2025–2026) has created a category of engineers who’ve already shipped RAG v1 on BM25 and are now in the ‘should we upgrade retrieval?’ evaluation phase. They need decision-support artifacts, not tutorials.

Go-to-market

  1. Post a free public version on HuggingFace Spaces that runs BM25 vs. SPLADE on a toy 500-doc corpus — capture emails from people who want to run it on their own data.
  2. Write one very specific blog post titled ‘How much does vocabulary mismatch hurt your RAG pipeline? Here’s how to measure it in an afternoon’ and post it to Towards Data Science, r/MachineLearning, and the Latent Space Discord, linking to the paid report tool.
  3. DM 20 people who’ve posted about RAG retrieval quality on Twitter/X or LinkedIn in the last 60 days, offering a free first report in exchange for a testimonial and a brief async interview.
  4. List on Gumroad as a ‘service product’ ($99 one-time): customer uploads a .jsonl corpus + query set, you run the harness and return a PDF report with MRR@10, NDCG@10, latency, and a plain-English migration recommendation within 48 hours.
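
As a rough illustration of the numbers the report in step 4 would contain, here is a minimal Python sketch of MRR@10 and NDCG@10 with binary relevance. It assumes the rankings from each retriever are already computed (the BM25 and SPLADE runs themselves are not shown), and the names `mrr_at_k`, `ndcg_at_k`, `compare_runs`, and the toy data are hypothetical, not part of any existing harness.

```python
import math

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant doc in the top k (0.0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance; assumes ranked_ids has no duplicates."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # Ideal ranking puts every relevant doc at the top of the list.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def compare_runs(runs, qrels, k=10):
    """Average both metrics per retriever over all judged queries."""
    summary = {}
    for name, rankings in runs.items():
        mrr = [mrr_at_k(rankings.get(q, []), rel, k) for q, rel in qrels.items()]
        ndcg = [ndcg_at_k(rankings.get(q, []), rel, k) for q, rel in qrels.items()]
        summary[name] = {
            f"MRR@{k}": sum(mrr) / len(mrr),
            f"NDCG@{k}": sum(ndcg) / len(ndcg),
        }
    return summary

# Toy data (made up for illustration): query id -> set of relevant doc ids.
qrels = {"q1": {"d3"}, "q2": {"d7", "d9"}}
# Retriever name -> query id -> ranked doc ids, best first.
runs = {
    "bm25": {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d5", "d6"]},
    "splade": {"q1": ["d3", "d1"], "q2": ["d7", "d4", "d9"]},
}

summary = compare_runs(runs, qrels)
for name, metrics in summary.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

A real report would also need graded relevance handling if the customer's query set has it, plus latency measurements, neither of which this sketch covers.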

Moat (or lack thereof)

No real moat. Anyone with the same stack can replicate this. The edge is speed-to-insight (10-min framing vs. DIY days) and the opinionated PDF report format that busy engineers can paste into a Notion doc for their team lead. If this gets traction, a competitor could clone it in a weekend. The only durable advantage would be corpus-type specialization (e.g., ‘for legal documents’ or ‘for code search’), which would require deeper domain investment.