A hosted interactive playground that lets ML engineers see vocabulary sparsity in their own real prompts, so the case for NanoSpec-style dynamic pruning lands without anyone having to read a paper.

Customer: ML infrastructure engineer at a startup or mid-size AI company (5–200 people) who is tasked with cutting inference costs on an LLM deployment and needs to justify pruning/speculative-decoding experiments to a skeptical tech lead.

Problem: The NanoSpec insight (‘95% of vocab logits are noise for any context’) is compelling on paper but hard to internalize — engineers don’t budget time to run HuggingFace notebooks to see it themselves, so pruning projects die in the ‘sounds risky’ phase before they start.
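
To make the claim concrete, here is a minimal sketch of the kind of per-position statistic the playground could show: the share of the vocabulary that falls outside the smallest set of tokens covering most of the probability mass. This is not NanoSpec's actual method. The function names, the 0.99 mass threshold, and the toy logits are illustrative assumptions; a real tool would read the logits from a model's forward pass.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noise_fraction(logits, mass=0.99):
    """Fraction of vocabulary entries outside the smallest set of tokens
    whose probabilities sum to at least `mass` (hypothetical metric)."""
    probs = sorted(softmax(logits), reverse=True)
    cumulative = 0.0
    kept = 0
    for p in probs:
        cumulative += p
        kept += 1
        if cumulative >= mass:
            break
    return 1.0 - kept / len(probs)


# Toy 100-token "vocabulary": 5 plausible tokens, 95 unlikely ones.
toy_logits = [10.0] * 5 + [0.0] * 95
print(f"noise fraction: {noise_fraction(toy_logits):.2f}")
```

On the toy input, the five high-logit tokens carry more than 99% of the probability, so the remaining 95 entries count as noise. Averaging this number over every position in a set of real prompts would give the headline statistic the pitch relies on.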

Pricing: freemium. Target: $800 MRR within 4 months (16 teams × $50/mo Pro tier).

Why now

NanoSpec and similar speculative-decoding papers have just landed, and inference cost pressure is acute in mid-2026: GPU prices remain high, and OpenAI/Anthropic API pricing squeezes product teams' margins. There’s a narrow window before the insight gets absorbed into standard tooling.

Go-to-market

Post a single-prompt live demo (no sign-up) to Hacker News ‘Show HN’ and r/MachineLearning — the Plotly token-level heatmap is inherently shareable and screenshot-able; aim for 200 upvotes as social proof before charging anything.
Write one tight blog post: ‘I ran 500 real prompts through Llama-3 and 92% of the vocabulary is dead weight’. Publish it on Substack, pitch it to newsletters such as The Batch, and mirror it on your own domain for SEO; link directly to the tool.
DM the first 20 people who star the GitHub repo and ask one question: ‘Are you running inference in prod? Would a team seat ($50/mo) that adds model comparison and CSV export be useful?’ — use replies to validate or kill the paid tier.
Offer a free ‘vocabulary audit report’ (PDF export of sparsity stats for their specific model + prompt distribution) as the Pro upsell hook; this converts tool curiosity into a concrete deliverable engineers can attach to an internal cost-reduction proposal.

Moat (or lack thereof)

No real moat. A Hugging Face Space or a Colab notebook can replicate the core visualization in a weekend. The only defensible ground is being first to become the canonical shareable link people paste in Slack (‘just look at this’), accumulating backlinks, and potentially building a small corpus of community-submitted prompt analyses. That’s distribution stickiness, not a technical moat. Be honest about this and move fast.