CLI benchmark tool, sold to researchers under a one-time per-seat license, that scores LLM temporal video understanding against ground-truth annotations

Customer: ML engineer at a startup or university lab building video-understanding pipelines — has budget, no time to build eval infra, needs reproducible numbers for paper/demo

Problem: No lightweight, self-hostable way to run VSTAT-style temporal benchmarks locally; official eval infra is slow, gated, or nonexistent for custom datasets

Pricing: one-time license at $49/seat; target is $800 in the first 3 months, which works out to about 17 seats
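
The revenue target can be sanity-checked with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, and it assumes every sale is a single seat at full price with no discounts or refunds.

```python
import math

# Figures taken from the pricing line; the names are illustrative.
SEAT_PRICE_USD = 49
REVENUE_TARGET_USD = 800
MONTHS = 3

# Seats must be whole numbers, so round up to reach the target.
seats_needed = math.ceil(REVENUE_TARGET_USD / SEAT_PRICE_USD)
revenue_at_target = seats_needed * SEAT_PRICE_USD
seats_per_month = seats_needed / MONTHS

print(seats_needed)               # 17
print(revenue_at_target)          # 833
print(round(seats_per_month, 1))  # 5.7
```

In other words, the plan needs roughly six paid seats a month, which is the number the go-to-market steps have to deliver.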

Why now

The embodied-AI and video-LLM wave is arriving at the same time as growing demand for benchmarks: teams shipping video agents need eval scores fast, and Claude vision (model ID claude-opus-4-5) plus OpenCV makes this buildable in days
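
To give a sense of how small the scoring core could be, here is a minimal sketch that compares predicted time spans against ground-truth spans read from JSONL. Everything specific in it is an assumption, not a spec: the record fields (`id`, `start`, `end`), the choice of temporal IoU as the metric, and the 0.5 threshold are all hypothetical, and VSTAT-style benchmarks may define scoring differently.

```python
import json


def temporal_iou(pred, truth):
    """Intersection-over-union of two (start, end) spans in seconds.

    Assumes start <= end for both spans.
    """
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def score_jsonl(pred_lines, truth_lines, threshold=0.5):
    """Fraction of ground-truth items whose prediction reaches the IoU threshold.

    Each line is a JSON object with hypothetical fields: id, start, end.
    Items with no matching prediction count as misses.
    """
    truth = {r["id"]: (r["start"], r["end"]) for r in map(json.loads, truth_lines)}
    preds = {r["id"]: (r["start"], r["end"]) for r in map(json.loads, pred_lines)}
    if not truth:
        return 0.0
    hits = sum(
        1
        for key, span in truth.items()
        if key in preds and temporal_iou(preds[key], span) >= threshold
    )
    return hits / len(truth)


if __name__ == "__main__":
    truth_lines = [
        '{"id": "q1", "start": 2.0, "end": 6.0}',
        '{"id": "q2", "start": 10.0, "end": 12.0}',
    ]
    pred_lines = [
        '{"id": "q1", "start": 3.0, "end": 6.0}',
        '{"id": "q2", "start": 20.0, "end": 22.0}',
    ]
    print(score_jsonl(pred_lines, truth_lines))  # 0.5
```

The model call and frame sampling are deliberately left out; the point is only that the comparison against annotations is a few dozen lines of standard-library Python.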

Go-to-market

Post CLI + demo JSONL dataset on Hacker News ‘Show HN’ targeting ML practitioners — link to GitHub with one-command install
Post in r/MachineLearning and relevant communities (EleutherAI Discord, Alignment Forum) with a short screenshot of benchmark results showing temporal accuracy scores
DM 10-15 authors of recent video-LLM papers on Twitter/X, offering a free license in exchange for a quote or benchmark comparison
Write one technical blog post: ‘How we built a 200-line CLI to benchmark video temporal reasoning’ — submit to Towards Data Science or The Gradient

Moat (or lack thereof)

No moat. Anyone with Python and an API key can replicate it. Defensibility comes only from being the first result when people search for the problem, accumulating community benchmark datasets, and iterating quickly on feature requests. Treat it as a lead-gen tool for consulting, not a durable SaaS.