SaaS tool for legal/research analysts to ask questions across whole documents using long-context local inference — no chunking, no hallucinated cross-references

Customer: Solo legal analyst or independent researcher (paralegal, PhD student, IP consultant) who processes 50-200 page PDFs daily and has been burned by RAG missing clauses or citations that span section boundaries

Problem: Chunked RAG loses reasoning across document sections — a clause on page 3 that modifies a term on page 47 gets missed; analysts catch errors manually, which defeats the tool

Pricing: one-time — $800 revenue in month 3 (8 x $99 perpetual licenses)

Why now

Mamba-2 hybrid GGUFs (Nemotron-Ultra class) now fit 128k+ context on 24GB VRAM consumer cards as of mid-2026 — whole-document ingestion became feasible without cloud costs, making a local-first no-chunking product viable for privacy-sensitive legal/research work for first time

Go-to-market

Post a side-by-side demo on r/MachineLearning and r/legaltech: same contract, chunked RAG misses a cross-reference, this tool catches it — link to GitHub with a waiting list form
Reach out to 20 independent paralegals and PhD students on Twitter/X who complain about RAG hallucinations — offer free license for a 15-min feedback call
Ship a free tier (documents up to 20 pages) on Hugging Face Spaces using Gradio — lets users experience whole-doc Q&A without setup friction, upsell to full desktop app
Write one specific teardown post: ‘How chunked RAG fails on NDAs and how to fix it’ — target Hacker News Show HN for launch

Moat (or lack thereof)

No moat. Any dev can wrap llama.cpp with long-context model in a weekend. Advantage is execution speed and distribution — first credible demo with a specific use-case narrative (legal contracts) wins mindshare before the space commoditizes in 6-12 months. Switch to open-core if traction appears.