VS Code extension that captures real coding sessions, collects counterfactual time estimates, and produces auditable rlog productivity scores for teams buying or selling AI tools

Customer: Head of Engineering at a 10-50 person startup who just signed (or is being pitched) an enterprise AI coding tool contract and needs ROI evidence that survives CFO scrutiny — not a CSAT survey

Problem: AI tool vendors show benchmark scores; buyers have no way to verify real productivity lift in their specific codebase and team. Anecdote and vibes don’t survive budget renewal. No lightweight, credible measurement layer exists between ‘trust us’ and hiring a research team.

Pricing: saas-mrr — $1,500 MRR in 4 months (5 teams × $300/mo per team of 10 devs)

Why now

Dollar-denominated and observational eval research is cresting in 2025-2026 — enterprises are starting to demand financial-guarantee-grade evidence from vendors, creating a window before vendors build this measurement layer themselves or before a big player (GitHub, Cursor) bundles it

Go-to-market

Cold DM 20 eng managers on LinkedIn who’ve posted about Copilot/Cursor ROI debates — offer free 30-day measurement pilot in exchange for a testimonial quote
Post the rlog methodology write-up on Hacker News (Show HN) and the AI eval Discord communities — position as ‘what financial-grade AI eval actually looks like in practice’
Partner with one AI coding tool vendor (smaller player, not GitHub) to white-label the dashboard as their own ROI proof tool — they pay per team they onboard
Publish a one-page ‘AI Productivity Audit’ PDF template targeting consultants who sell AI adoption engagements — they become referral channel

Moat (or lack thereof)

No meaningful moat. GitHub Copilot, Cursor, and any well-funded player can build session telemetry + counterfactual prompting trivially. Real defensibility comes only from: (1) a trusted, auditor-accepted methodology that gets cited in contracts, and (2) a calibration dataset of sessions across many codebases. Neither exists yet at indie scale. Early mover advantage is 12-18 months at best — use it to get methodology credibility, not to build features.