A $29/mo FastAPI microservice that auto-enhances scanned documents before OCR so developers stop babysitting image quality issues in their ingestion pipelines
Customer: Solo dev or small-team backend engineer at a 5-50 person SaaS company who built a document ingestion pipeline (expense reports, invoices, contracts) on GPT-4o or a similar model and is stuck at ~80-85% extraction accuracy. The cause is scanned inputs that are skewed, low-contrast, or poorly lit, not a bad prompt.
Problem: Document ingestion pipelines regularly fail silently on bad scans: skewed receipts, faint thermal paper, mixed orientations. Developers currently hand-tune preprocessing per client or just accept the error rate. There’s no drop-in fix that’s smarter than a static pipeline but cheaper than hiring a CV specialist.
Pricing: saas-mrr, $29/mo. Target is $800 MRR in 4 months, which takes ~28 paying customers (28 × $29 = $812/mo).
Why now
GPT-4o-mini dropped the cost of a VLM confidence probe to near-zero (~$0.001/image), so a ‘try N transforms, pick the best’ loop costs roughly N × $0.001 per document and is economically viable for the first time. Recent multimodal research is also surfacing why pixel-rendered text degrades VLM accuracy, which builds developer awareness of the problem and appetite for a targeted fix rather than a full model swap.
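A minimal sketch of the ‘try N transforms, pick the best’ loop, in pure Python so it runs without dependencies. Everything named here is hypothetical and for illustration only: the `pick_best` helper, the three toy transforms, and especially `contrast_range_score`, which is a cheap stand-in for the VLM confidence probe (a real service would presumably send each candidate image to a model and read back a confidence signal). The per-probe cost constant is the approximate ~$0.001/image figure quoted above, not a measured price.

```python
from typing import Callable, Dict, List, NamedTuple

Image = List[List[int]]  # grayscale pixels (0-255), row-major


def identity(img: Image) -> Image:
    """Return an unchanged copy so the original is always a candidate."""
    return [row[:] for row in img]


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale pixel values to span the full 0-255 range."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return identity(img)
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def rotate_180(img: Image) -> Image:
    """Flip an upside-down scan."""
    return [row[::-1] for row in img[::-1]]


# Hypothetical transform set; a real pipeline would add deskew, denoise, etc.
TRANSFORMS: Dict[str, Callable[[Image], Image]] = {
    "identity": identity,
    "stretch_contrast": stretch_contrast,
    "rotate_180": rotate_180,
}

PROBE_COST_USD = 0.001  # assumption: approximate per-image probe cost


class Result(NamedTuple):
    name: str
    image: Image
    score: float
    probe_cost_usd: float


def pick_best(
    img: Image,
    score: Callable[[Image], float],
    transforms: Dict[str, Callable[[Image], Image]] = TRANSFORMS,
) -> Result:
    """Apply every transform, score each candidate once, keep the top scorer.

    Ties keep the earliest transform, so the untouched image wins
    unless a transform scores strictly higher.
    """
    if not transforms:
        raise ValueError("need at least one transform")
    best_name, best_img, best_score = "", img, float("-inf")
    for name, fn in transforms.items():
        candidate = fn(img)
        s = score(candidate)  # one probe per candidate
        if s > best_score:
            best_name, best_img, best_score = name, candidate, s
    return Result(best_name, best_img, best_score, len(transforms) * PROBE_COST_USD)


def contrast_range_score(img: Image) -> float:
    """Stand-in scorer (NOT a VLM probe): dynamic range scaled to [0, 1]."""
    flat = [p for row in img for p in row]
    return (max(flat) - min(flat)) / 255


if __name__ == "__main__":
    faint_scan = [[100, 110], [120, 130]]  # toy low-contrast "receipt"
    result = pick_best(faint_scan, contrast_range_score)
    print(result.name, round(result.score, 3), result.probe_cost_usd)
```

The scorer is passed in as a plain callable so the selection loop stays the same whether the score comes from a local heuristic or a paid model call. The probe cost scales linearly with the number of transforms tried, which is why the per-image price matters.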
Go-to-market
- Post a before/after benchmark on Reddit r/LangChain and r/MachineLearning showing OCR accuracy lift on 50 real receipts — no signup required, just the Docker run command in the post
- Open the GitHub repo with a working demo (self-hostable, MIT license) and add a ‘managed hosted version’ CTA in the README — attract developers who want it running in 5 minutes without infra
- Find 10 indie hackers building expense/invoice tools on Product Hunt, Indie Hackers, or Twitter and DM them a free 30-day trial API key with a ‘tell me if it breaks your pipeline’ ask — direct feedback loop, not a sales pitch
- Write one specific SEO post titled ‘Why GPT-4o fails on thermal paper receipts (and how to fix it before the API call)’ — targets the exact frustrated developer search query, links to the hosted tier
Moat (or lack thereof)
No real moat. The core idea, including the VLM confidence probe, is simple enough for any competitor to replicate in a weekend after reading the README. Defensibility comes down to execution speed and integration stickiness: once the service is wired into a pipeline and passing prod traffic, the switching cost is real but low. Honest bet: this is a lifestyle business or acqui-hire target, not a platform.