Vertical AI PMF Benchmark Builder

A lightweight survey and analytics app that lets teams within a specific industry vertical (e.g., health AI) self-report AI deployment metrics, then benchmarks them anonymously against peers to show where product-market fit (PMF) is real and where it is still aspirational.

Difficulty: 1-month | Stack: Next.js, Supabase, Postgres, Vercel, Resend, Recharts, Claude API

Who this is for

Heads of AI at mid-market enterprises in a target vertical who need peer benchmarks (not vendor-produced case studies) to calibrate their own adoption pace and justify or challenge internal narratives.

Build steps

Design a 12–15-question survey covering AI use-case category, deployment stage (pilot/production/scaled), measurable outcome metrics (cost, time, error rate), renewal intent, and budget trajectory, with all responses anonymized at the org level.
Build a Next.js + Supabase app with email-gated survey submission (magic-link sign-in, with Resend delivering the emails) and row-level security so respondents can read only aggregate data, never other orgs’ raw answers.
Implement a Postgres-backed aggregation layer that computes percentile distributions for each metric within each vertical segment (company size, sub-vertical) and surfaces a cohort only once it meets a minimum anonymity threshold of n≥5 respondents.
Add a Claude-powered ‘benchmark interpreter’ endpoint: given a respondent’s own answers and their cohort’s distribution, it generates a 3-bullet narrative summary of where they lead, where they lag, and what the data suggests they do next.
Build a Recharts dashboard with filterable cohort views, downloadable CSV exports, and a shareable ‘your position’ snapshot card (PNG via html-to-image) for internal presentations.
Run a closed beta with 20–30 orgs in one vertical to validate that the metric taxonomy resonates before opening broadly.
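
For the survey-design step, it can help to pin the question set down as a typed record before building any UI. The sketch below is a minimal TypeScript version under stated assumptions: the field names, the 1–5 renewal-intent scale, the budget categories, and the opaque orgId key are all hypothetical choices rather than a fixed taxonomy, and it covers only a subset of the 12–15 questions.

```typescript
// Hypothetical shape of one anonymized survey response; field names are illustrative.
type DeploymentStage = "pilot" | "production" | "scaled";
type BudgetTrajectory = "shrinking" | "flat" | "growing";

interface SurveyResponse {
  orgId: string;                     // opaque org key for dedup; never surfaced in results
  sizeBand: string;                  // company-size segment, e.g. "200-999"
  subVertical: string;               // e.g. "payers" within health AI
  useCaseCategory: string;           // e.g. "claims triage"
  stage: DeploymentStage;
  costChangePct: number | null;      // % change in cost vs. pre-AI baseline; null = not measured
  timeChangePct: number | null;      // % change in task time
  errorRateChangePct: number | null; // % change in error rate
  renewalIntent: number;             // assumed 1-5 Likert scale
  budget: BudgetTrajectory;
}

const STAGES: DeploymentStage[] = ["pilot", "production", "scaled"];
const BUDGETS: BudgetTrajectory[] = ["shrinking", "flat", "growing"];

// Runtime check for data arriving from a form; returns a list of problems (empty = usable).
function validateResponse(r: SurveyResponse): string[] {
  const problems: string[] = [];
  if (r.orgId.trim() === "") problems.push("orgId is empty");
  if (STAGES.indexOf(r.stage) === -1) problems.push(`unknown stage: ${r.stage}`);
  if (BUDGETS.indexOf(r.budget) === -1) problems.push(`unknown budget trajectory: ${r.budget}`);
  const changes: [string, number | null][] = [
    ["costChangePct", r.costChangePct],
    ["timeChangePct", r.timeChangePct],
    ["errorRateChangePct", r.errorRateChangePct],
  ];
  for (const [name, v] of changes) {
    // A quantity cannot fall by more than 100%, so anything below -100 is an entry error.
    if (v !== null && (!Number.isFinite(v) || v < -100)) {
      problems.push(`${name} out of range: ${v}`);
    }
  }
  if (!Number.isInteger(r.renewalIntent) || r.renewalIntent < 1 || r.renewalIntent > 5) {
    problems.push(`renewalIntent must be an integer from 1 to 5: ${r.renewalIntent}`);
  }
  return problems;
}
```

Running the same check on the server as well as in the form is cheap and keeps obviously impossible values out of the percentile calculations.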
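
For the aggregation step, Postgres can compute the quartiles directly with the percentile_cont ordered-set aggregate, grouping by the segment columns and adding HAVING count(*) >= 5 to enforce the anonymity threshold. The TypeScript sketch below mirrors that logic in memory so the rule is easy to see and unit-test; the MetricRow shape, the segment key format, and the choice of p25/p50/p75 are assumptions.

```typescript
// Hypothetical row: one respondent's value for one metric, tagged with segment fields.
interface MetricRow {
  sizeBand: string;    // company-size segment, e.g. "200-999"
  subVertical: string; // e.g. "payers"
  value: number;
}

interface CohortStats {
  cohort: string; // "sizeBand|subVertical"
  n: number;
  p25: number;
  p50: number;
  p75: number;
}

const MIN_COHORT_SIZE = 5; // the n>=5 anonymity threshold from the build step

// Linear interpolation at position p*(n-1), the same definition Postgres percentile_cont uses.
function percentile(sorted: number[], p: number): number {
  const pos = p * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Groups rows by segment and returns quartiles only for cohorts that meet the threshold.
function cohortPercentiles(rows: MetricRow[]): CohortStats[] {
  const groups = new Map<string, number[]>();
  for (const r of rows) {
    const key = `${r.sizeBand}|${r.subVertical}`;
    const values = groups.get(key) ?? [];
    values.push(r.value);
    groups.set(key, values);
  }
  const out: CohortStats[] = [];
  groups.forEach((values, cohort) => {
    if (values.length < MIN_COHORT_SIZE) return; // suppress small cohorts entirely
    const sorted = values.slice().sort((a, b) => a - b);
    out.push({
      cohort,
      n: sorted.length,
      p25: percentile(sorted, 0.25),
      p50: percentile(sorted, 0.5),
      p75: percentile(sorted, 0.75),
    });
  });
  return out;
}
```

Dropping a small cohort entirely, rather than showing it with a warning, is the conservative choice: a percentile computed from four respondents says a lot about each of them.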
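
For the benchmark-interpreter step, one option is to split the endpoint into a deterministic part (where the respondent sits relative to the cohort) and the model call that narrates it. The sketch below covers only the deterministic pieces: classifying each metric against the cohort’s interquartile range, building the prompt, and accepting a reply only if it has exactly three bullets. The quartile-based lead/lag rule, the prompt wording, and all names are assumptions; the request to the Claude API itself is omitted.

```typescript
// Hypothetical input for one metric: the respondent's value plus their cohort's quartiles.
interface MetricPosition {
  metric: string;          // e.g. "time saved (%)"
  own: number;             // respondent's own answer
  p25: number;
  p50: number;
  p75: number;
  higherIsBetter: boolean; // false for metrics like error rate or cost
}

type Position = "lead" | "lag" | "in line";

// Assumed rule: a value outside the interquartile range counts as leading or lagging.
function classify(m: MetricPosition): Position {
  if (m.own > m.p75) return m.higherIsBetter ? "lead" : "lag";
  if (m.own < m.p25) return m.higherIsBetter ? "lag" : "lead";
  return "in line";
}

// Builds the prompt; positions are precomputed so the model narrates them instead of doing math.
function buildInterpreterPrompt(metrics: MetricPosition[]): string {
  const dataLines = metrics.map(
    (m) => `- ${m.metric}: own=${m.own}, cohort p25/p50/p75=${m.p25}/${m.p50}/${m.p75} -> ${classify(m)}`
  );
  return [
    "You are interpreting an anonymized peer benchmark.",
    "Using only the data below, reply with exactly 3 bullet lines starting with '- ':",
    "where the respondent leads, where they lag, and what the data suggests they do next.",
    "",
    ...dataLines,
  ].join("\n");
}

// Accepts the model's reply only if it contains exactly three bullet lines.
function parseBullets(reply: string): string[] | null {
  const bullets = reply
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^[-*\u2022]\s+/.test(line))
    .map((line) => line.replace(/^[-*\u2022]\s+/, ""));
  return bullets.length === 3 ? bullets : null;
}
```

Computing lead/lag before the call keeps the numeric comparison testable and leaves the model only the narrative; a null from parseBullets is a natural trigger for a retry.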
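
For the dashboard’s CSV export, the main thing to get right is escaping, since cohort labels and use-case names can contain commas or quotes. A minimal serializer following the RFC 4180 quoting rules might look like this; csvField and toCsv are hypothetical helpers, and the sketch assumes only already-aggregated rows that passed the n≥5 threshold are exported.

```typescript
type Cell = string | number | null;

// Quotes a field if it contains a comma, double quote, CR or LF, doubling embedded quotes (RFC 4180).
function csvField(value: Cell): string {
  if (value === null) return ""; // empty cell for missing values
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serializes a header row plus data rows, using CRLF line breaks as RFC 4180 specifies.
function toCsv(headers: string[], rows: Cell[][]): string {
  const all: Cell[][] = [headers, ...rows];
  return all.map((row) => row.map(csvField).join(",")).join("\r\n");
}
```

In the browser, the returned string can be wrapped in a Blob with type text/csv and offered as a download link.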

Risks

Cold-start problem: the benchmark is worthless until enough orgs have responded, and recruiting the first 20 design partners in a single vertical takes non-trivial outreach that isn’t a coding problem.
Respondents may self-report optimistically, or game the survey once they realize their answers determine their peer ranking, corrupting the benchmark over time.
Anonymization is hard at small cohort sizes: even with the n≥5 guard, a respondent in a niche sub-vertical may be trivially re-identifiable from the combination of company size, use case, and outcome, creating legal and trust risk.
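
One way to test for this risk is a k-anonymity check: treat the fields a viewer can filter on as quasi-identifiers and require every combination of their values to be shared by at least k respondents. The sketch below is a minimal version; which columns count as quasi-identifiers (and whether outcomes are bucketed first) is an assumption to settle per vertical, and k-anonymity alone does not block every inference, for example when everyone in a group reported the same outcome.

```typescript
// One respondent's quasi-identifying fields, e.g. { sizeBand: "200-999", useCase: "claims triage" }.
type Row = Record<string, string>;

interface Violation {
  key: string;   // the value combination, e.g. "sizeBand=200-999; useCase=claims triage"
  count: number; // how many respondents share it
}

// Returns every combination of quasi-identifier values shared by fewer than k rows.
function kAnonymityViolations(rows: Row[], quasiIds: string[], k: number): Violation[] {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const key = quasiIds.map((q) => `${q}=${row[q] ?? ""}`).join("; ");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const violations: Violation[] = [];
  counts.forEach((count, key) => {
    if (count < k) violations.push({ key, count });
  });
  return violations;
}
```

For example, five respondents sharing a size band and sub-vertical pass the n≥5 gate on those two fields, but if one of them is the only org with a given use case, adding use case as a quasi-identifier surfaces that combination with a count of 1.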