Drop-in Python middleware that slashes vision LLM costs by pruning low-information video frames before they hit the model.
Customer: Solo ML engineer or indie dev building a video Q&A / summarization SaaS (e.g. ‘ask questions about your Loom recordings’) who is self-hosting LLaVA or InternVL and watching their GPU bill climb as video length grows
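Problem: Each extra frame fed to a vision LLM adds another block of visual tokens (hundreds per frame for LLaVA-style encoders), so token count grows linearly with video length while attention cost grows quadratically with token count. A 2-minute video can easily blow past context limits or cost $0.30+ per query, making the unit economics of a video-AI product unworkable at small scale.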
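A back-of-the-envelope sketch of that scaling follows. It assumes a fixed number of visual tokens per sampled frame (576 is what LLaVA-1.5's 336px CLIP encoder yields per image; other models differ), 1 fps sampling, and dense self-attention, so the attention term scales with the square of the sequence length. The function name `visual_token_stats` is illustrative, not part of any library.

```python
def visual_token_stats(n_frames: int, tokens_per_frame: int,
                       text_tokens: int = 0) -> tuple[int, int]:
    """Return (sequence_length, attention_pairs) for a video prompt.

    Illustrative model only: assumes every sampled frame becomes exactly
    `tokens_per_frame` visual tokens and that attention is dense over the
    whole sequence, so the pair count (a proxy for attention FLOPs) is n**2.
    """
    n_tokens = n_frames * tokens_per_frame + text_tokens  # grows linearly with frames
    return n_tokens, n_tokens ** 2                         # attention grows quadratically


# 2-minute clip sampled at 1 fps (120 frames), 576 tokens per frame (assumed).
full = visual_token_stats(120, 576)
# Same clip after keeping 40% of the frames (48 of 120).
pruned = visual_token_stats(48, 576)
print(full)    # (69120, 4777574400)
print(pruned)  # (27648, 764411904)
print(round(1 - pruned[1] / full[1], 2))  # 0.84
```

Under these assumptions, dropping 60% of the frames cuts the sequence length by 60% but the attention term by 84%, and the unpruned 69k-token prompt is already far past a 4k or 32k context window.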
Pricing: open-core, targeting $800 MRR within 4 months (16 teams × $50/mo for the hosted compression API plus priority support; the core OSS library stays free)
Why now
InfoMerge and similar papers report that information-weighted token compression can cut visual tokens with little or no accuracy loss. The research validation has just landed, but there is no production-ready, pip-installable library yet. The window is roughly 3–6 months before a VC-backed MLOps vendor packages this.
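The core pruning step can be sketched at the frame level. This is a deliberately simple stand-in, not InfoMerge's method: it scores each frame by its mean absolute pixel change from the previous frame, always keeps the first frame, and keeps the highest-scoring frames up to a budget. The helper `prune_frames` and its `keep_ratio` parameter are hypothetical, and frames are assumed to be already decoded into a NumPy array.

```python
import numpy as np


def prune_frames(frames: np.ndarray, keep_ratio: float = 0.4) -> np.ndarray:
    """Return the sorted indices of frames to keep (hypothetical helper).

    frames: decoded clip of shape (T, H, W) or (T, H, W, C).
    Each frame is scored by its mean absolute pixel change from the previous
    frame, a crude novelty proxy and NOT the InfoMerge method. Frame 0 is
    always kept; the rest are chosen by score up to round(keep_ratio * T).
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    if frames.shape[0] == 0:
        raise ValueError("empty clip")
    n_keep = max(1, round(keep_ratio * frames.shape[0]))
    # Cast first: np.diff on uint8 pixels would wrap around instead of going negative.
    x = frames.astype(np.float32)
    # Mean |frame_i - frame_{i-1}| over every pixel/channel axis, for i >= 1.
    diffs = np.abs(np.diff(x, axis=0)).mean(axis=tuple(range(1, x.ndim)))
    # Frame 0 gets +inf so it always survives the cut.
    scores = np.concatenate(([np.inf], diffs))
    # Highest scores first (stable sort breaks ties toward earlier frames),
    # then restore temporal order for the model.
    keep = np.argsort(-scores, kind="stable")[:n_keep]
    return np.sort(keep)


# Synthetic clip: 10 black 8x8 frames with a hard cut to white at frame 5.
clip = np.zeros((10, 8, 8), dtype=np.uint8)
clip[5:] = 255
print(prune_frames(clip, keep_ratio=0.2))  # [0 5]
```

In a middleware setting, `frames[prune_frames(frames, 0.4)]` would be passed to the model in place of the full clip. Pixel difference is cheap but crude: it treats camera shake as new information and undervalues slow pans, since each consecutive difference stays small, which is one reason a better-grounded information score is worth the extra complexity.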
Go-to-market
- Ship the OSS library to PyPI and post a technical breakdown on r/LocalLLaMA and as a Show HN on Hacker News, targeting the self-hosted vision LLM crowd who already feel the token-cost pain
- Write one concrete benchmark post with a target headline like ‘We cut LLaVA video tokens by 60% with <5% accuracy drop on ActivityNet-QA’, publish it on Substack, and cross-post it to the Hugging Face blog for SEO and credibility
- Offer a hosted FastAPI endpoint (compression-as-a-service) at a flat $50/mo behind a simple waitlist form, and personally DM the first 20 signups on Discord/Twitter to onboard them and gather feedback
- File 5–10 targeted GitHub issues on popular video-LLM repos (Video-LLaVA, LLaMA-VID, etc.) with a benchmark showing your library as a preprocessing step, with the aim of turning maintainers into unpaid distribution channels
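The benchmark headline in the go-to-market plan packs two numbers, and "<5% accuracy drop" can mean either percentage points or a relative drop. A small helper like the one below reports both so the post can say exactly which it means. The `EvalRun` schema and `compare_runs` function are hypothetical, and the numbers in the example are made up purely to show the arithmetic; they are not results.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Aggregate numbers from one pass over a QA eval set (hypothetical schema)."""
    visual_tokens: int  # total visual tokens sent to the model across all questions
    correct: int        # questions answered correctly
    total: int          # questions asked

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def compare_runs(baseline: EvalRun, pruned: EvalRun) -> dict[str, float]:
    """Token savings and accuracy cost of pruning, relative to the baseline run."""
    drop_points = baseline.accuracy - pruned.accuracy
    return {
        "token_reduction": 1 - pruned.visual_tokens / baseline.visual_tokens,
        "acc_drop_points": drop_points,                        # absolute, 0.02 = 2 points
        "acc_drop_relative": drop_points / baseline.accuracy,  # fraction of baseline accuracy
    }


# Made-up numbers purely to show the arithmetic.
report = compare_runs(EvalRun(1_000_000, 500, 1000), EvalRun(400_000, 480, 1000))
print({k: round(v, 3) for k, v in report.items()})
# {'token_reduction': 0.6, 'acc_drop_points': 0.02, 'acc_drop_relative': 0.04}
```

Here a 2-point absolute drop is a 4% relative drop; with a lower baseline accuracy the same 2 points would be a larger relative drop, so the post should state which definition it uses.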
Moat (or lack thereof)
No real moat. This is a thin wrapper around published research that a well-funded team can replicate in a sprint. The only defensible edges are: (1) being first to have a polished, well-benchmarked OSS library so your name becomes the default import, and (2) accumulating benchmark data across model/video-type combinations that takes time to reproduce. Treat it as a wedge to a consulting relationship or a larger video-AI tooling product, not a standalone defensible business.