Long-Context Local RAG Without Chunking
Document Q&A system that exploits a Mamba-2 hybrid model’s long-context efficiency to ingest whole files instead of splitting them.
Difficulty: 1-week | Stack: Python, llama.cpp (GGUF) or transformers, Nemotron-Ultra or any Mamba-2 hybrid GGUF, LangChain or raw inference loop, SQLite for doc metadata, Gradio for UI
Who this is for
Researchers and analysts who lose precision from naive chunking — legal contracts, research papers, codebases where cross-section reasoning matters.
Build steps
- Stand up local model inference: pull a Mamba-2 hybrid GGUF, then confirm a 32k+ context window fits in available VRAM/RAM using llama.cpp's --ctx-size flag
- Build an ingestion pipeline: PDF/MD/TXT → plain text, strip boilerplate, store in SQLite with file hash for dedup
- Implement whole-document prompting: stuff entire document + question into context; measure latency and answer quality on 5 test docs
- Add a fallback: if doc exceeds model context, fall back to sliding-window with overlap and flag the answer as partial
- Build Gradio UI: file upload, question box, answer + source snippet display, latency counter
- Benchmark vs. naive chunk-and-embed approach on 3 multi-section documents to quantify accuracy delta
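The ingestion step above (plain text into SQLite, deduplicated by file hash) could look roughly like the sketch below. The `docs` table, its columns, and the `open_store` / `ingest` helpers are hypothetical names chosen for illustration, and the sketch assumes PDF/MD/TXT extraction and boilerplate stripping have already produced UTF-8 bytes.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the document store. The `docs` schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        " sha256 TEXT PRIMARY KEY,"   # content hash doubles as the dedup key
        " filename TEXT NOT NULL,"
        " n_chars INTEGER NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def ingest(conn, filename, raw_bytes):
    """Store one extracted document. Returns True if new, False if a duplicate."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    text = raw_bytes.decode("utf-8", errors="replace")
    # INSERT OR IGNORE skips the row when the hash is already present.
    cur = conn.execute(
        "INSERT OR IGNORE INTO docs (sha256, filename, n_chars, body)"
        " VALUES (?, ?, ?, ?)",
        (digest, filename, len(text), text),
    )
    conn.commit()
    return cur.rowcount == 1
```

Hashing the content rather than the filename means a re-uploaded or renamed copy of the same file is detected as a duplicate.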
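The whole-document prompt and its sliding-window fallback can be sketched as below. This is a minimal illustration, not a tuned implementation: whitespace-separated words stand in for tokens (a real tokenizer will typically count more), and `plan_prompts`, `reserve_tokens`, and the prompt template are hypothetical.

```python
def split_windows(tokens, window, overlap):
    """Split a token list into overlapping windows that cover every token."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if len(tokens) <= window:
        return [tokens]
    step = window - overlap
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window reached the end of the document
        start += step
    return windows


def plan_prompts(doc_text, question, ctx_tokens, reserve_tokens=512, overlap=64):
    """Return (prompts, partial). partial is True when the fallback was needed.

    Assumption: words approximate tokens; swap in the model's tokenizer for real use.
    """
    words = doc_text.split()
    # Leave room for the question and for the generated answer.
    budget = ctx_tokens - reserve_tokens - len(question.split())
    if budget <= overlap:
        raise ValueError("context too small for the requested reserve and overlap")
    windows = split_windows(words, budget, overlap)
    prompts = [
        f"Document:\n{' '.join(w)}\n\nQuestion: {question}\nAnswer:"
        for w in windows
    ]
    return prompts, len(windows) > 1
```

When `partial` comes back True, the caller would run each prompt separately and label the combined answer as partial in the UI.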
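For the benchmark step, one simple way to quantify the accuracy delta is to score each pipeline on whether its answer contains every expected phrase. The sketch below assumes that scoring rule, which is only one of many possible metrics; `answer_fn`, the case dictionary keys, and the phrase-matching rule are all hypothetical choices.

```python
def contains_all(answer, expected_phrases):
    """True if every expected phrase appears in the answer (case-insensitive)."""
    low = answer.lower()
    return all(p.lower() in low for p in expected_phrases)


def accuracy(answer_fn, cases):
    """Fraction of cases where answer_fn(doc, question) contains all expected phrases."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(
        contains_all(answer_fn(c["doc"], c["question"]), c["expected"])
        for c in cases
    )
    return hits / len(cases)


def accuracy_delta(whole_doc_fn, chunked_fn, cases):
    """Positive result means the whole-document pipeline scored higher."""
    return accuracy(whole_doc_fn, cases) - accuracy(chunked_fn, cases)
```

Questions whose expected phrases come from different sections of the same document are the ones most likely to separate the two approaches.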
Risks
- Consumer VRAM (8–16 GB) may still overflow on very long docs even with quantized GGUF — need aggressive Q4/Q5 quant and context tuning
- Mamba-2 hybrid models have uneven llama.cpp support; some architectures need nightly builds or custom backends
- Whole-document prompting is slower than vector retrieval — users with >50 docs will hit patience limits without async streaming
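To gauge the VRAM risk before loading a model, a back-of-the-envelope estimate of the attention KV cache can help. In a Mamba-2 hybrid only the attention layers hold a cache that grows with context length, while the Mamba-2 layers keep a fixed-size state, so the sketch below counts attention layers only. The layer and head counts in the example are made-up placeholders, not the configuration of any real model, and the `fits` headroom is an arbitrary assumption.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Size of the K and V caches across all attention layers (fp16 by default)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


def fits(model_file_bytes, kv_bytes, budget_bytes, headroom=0.10):
    """Rough check: weights plus KV cache must leave `headroom` of the budget free."""
    return model_file_bytes + kv_bytes <= budget_bytes * (1 - headroom)


# Hypothetical hybrid: 6 attention layers, 8 KV heads of dim 128, 32k context.
kv = kv_cache_bytes(n_attn_layers=6, n_kv_heads=8, head_dim=128, ctx_len=32768)
print(kv / GIB)                     # 0.75
print(fits(5 * GIB, kv, 8 * GIB))   # True
```

The estimate ignores compute buffers and the recurrent state, so treat a narrow pass as a warning and confirm by actually loading the model at the target context size.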
Business Angle
SaaS tool for legal/research analysts to ask questions across whole documents using long-context local inference: no chunking, so fewer cross-references are missed at section boundaries
Customer: Solo legal analyst or independent researcher (paralegal, PhD student, IP consultant) who processes 50-200 page PDFs daily and has been burned by RAG missing clauses or citations that span section boundaries
Pricing: one-time, about $800 revenue in month 3 (8 x $99 perpetual licenses = $792)