A drop-in PyTorch benchmark kit that measures, on the buyer's own hardware, how much NanoSpec-style vocabulary pruning cuts draft-model latency (target claim: 3–5×). It is sold as a one-time purchase to LLM inference engineers who need a credible PoC to justify infra changes.
Customer: A solo ML engineer or small-team inference lead at a Series A–C AI startup who already runs speculative decoding (e.g., in vLLM or TGI) and needs hard numbers to convince their CTO to switch draft-model architecture. This is a practitioner who ships production systems, not a researcher.
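Problem: Speculative decoding is widely adopted, but the vocabulary projection layer in the draft model is a known bottleneck that almost nobody has measured cleanly in isolation. Its cost scales with vocabulary size (one hidden_dim × vocab_size matmul per drafted token), so a large vocabulary can dominate a small draft model's per-token cost. Engineers know the theory but lack a reproducible, well-instrumented PoC they can run on their own hardware and show to leadership in an afternoon.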
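A minimal sketch of what measuring the projection "in isolation" could look like, assuming PyTorch and random weights: it times only the LM-head matmul (`hidden @ weight.T`) for the full vocabulary and for a pruned subset, synchronizing CUDA around each call and reporting the median. The function names are hypothetical, and the defaults (hidden size 2048, Llama 3's 128,256-token vocabulary, 32,000 kept tokens) plus the keep-the-first-N-ids pruning rule are illustrative assumptions, not NanoSpec's actual method.

```python
import time

import torch


def median_latency(hidden: torch.Tensor, weight: torch.Tensor,
                   iters: int = 50, warmup: int = 5) -> float:
    """Median wall-clock seconds of one LM-head projection: hidden @ weight.T."""
    timings = []
    with torch.inference_mode():
        for i in range(warmup + iters):
            if hidden.is_cuda:
                torch.cuda.synchronize()  # don't time previously queued kernels
            start = time.perf_counter()
            _ = hidden @ weight.T
            if hidden.is_cuda:
                torch.cuda.synchronize()  # wait for this kernel to finish
            if i >= warmup:  # discard warmup iterations
                timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


def benchmark_vocab_pruning(hidden_dim: int = 2048, full_vocab: int = 128_256,
                            kept_vocab: int = 32_000, batch: int = 1,
                            device: str = "cpu", seed: int = 0) -> dict:
    """Time the full vs. pruned vocabulary projection in isolation (random weights)."""
    gen = torch.Generator().manual_seed(seed)
    weight = torch.randn(full_vocab, hidden_dim, generator=gen).to(device)
    hidden = torch.randn(batch, hidden_dim, generator=gen).to(device)

    # Hypothetical pruning rule: keep the first `kept_vocab` token ids. A real
    # kit would keep a frequency-ranked subset chosen from target-model outputs.
    keep_ids = torch.arange(kept_vocab, device=device)
    pruned_weight = weight.index_select(0, keep_ids).contiguous()

    with torch.inference_mode():
        full_logits = hidden @ weight.T
        pruned_logits = hidden @ pruned_weight.T
        # Draft proposals are mapped back to full-vocabulary ids via keep_ids.
        draft_tokens = keep_ids[pruned_logits.argmax(dim=-1)]

    full_s = median_latency(hidden, weight)
    pruned_s = median_latency(hidden, pruned_weight)
    return {
        "full_ms": full_s * 1e3,
        "pruned_ms": pruned_s * 1e3,
        "speedup": full_s / pruned_s,
        # Pruning should only drop columns, never change the kept logits.
        "logits_match": torch.allclose(full_logits[:, keep_ids], pruned_logits,
                                       rtol=1e-4, atol=1e-3),
        "draft_tokens": draft_tokens,
    }
```

Calling `benchmark_vocab_pruning(device="cuda")` on the buyer's GPU would yield a projection-only speedup figure; at the defaults the full weight matrix is roughly 1 GB in float32. That figure is an upper bound on the end-to-end gain, since the draft model's transformer layers are untouched and the sketch ignores any drop in draft acceptance rate.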
Pricing: one-time purchase. Target: about $800 in month 1 (16 sales at $49 ≈ $784), then roughly $300/month in passive sales by month 3.
Why now
NanoSpec and InfoMerge arrived in the same wave of inference-efficiency work just as vLLM v0.5+ made speculative decoding production-ready, so practitioners are actively evaluating optimizations and searching for benchmarks to validate vendor claims. Search demand for this kind of benchmark therefore exists before established tools do.
Go-to-market
- Post a detailed teardown thread on X/Twitter and r/LocalLLaMA showing your actual latency numbers (e.g., ‘Projection layer alone: 4.1× faster on Llama-3-8B draft’) with a Colab link, and gate the clean repo behind the purchase.
- Submit a ‘Show HN’ post to Hacker News on a Tuesday morning with a headline built around the benchmark result rather than the theory; engineers upvote reproducible numbers.
- DM 10–15 inference engineers who publicly complained about speculative decoding overhead (search X for ‘vllm speculative slow’ or ‘draft model bottleneck’) and offer a free copy in exchange for a benchmark screenshot from their hardware.
- List on Gumroad as ‘pay what you want, minimum $29’ to capture price-sensitive open-source users while letting fans pay $99; use the spread to learn willingness-to-pay before raising the floor.
Moat (or lack thereof)
No moat. This is a first-mover attention grab, not a defensible product. Once the idea is public, anyone can replicate it in a weekend. The only durable value is reputation: if your numbers are cited in blog posts or papers, you become the go-to name for this benchmark. That’s a weak but real advantage for selling consulting or a follow-on tool.