AI Pulse

Long-Context Local RAG Without Chunking

A document Q&A system that exploits a Mamba-2 hybrid model's long-context efficiency to ingest whole files instead of splitting them into chunks.

Difficulty: 1-week | Stack: Python, llama.cpp (GGUF) or transformers, Nemotron-Ultra or any Mamba-2 hybrid GGUF, LangChain or raw inference loop, SQLite for doc metadata, Gradio for UI

Who this is for

Researchers and analysts who lose precision from naive chunking — legal contracts, research papers, codebases where cross-section reasoning matters.

Build steps

  1. Stand up local model inference: pull a Mamba-2 hybrid GGUF and confirm that a 32k+ context window fits in available VRAM/RAM using llama.cpp's --ctx-size flag
  2. Build an ingestion pipeline: PDF/MD/TXT → plain text, strip boilerplate, store in SQLite with file hash for dedup
  3. Implement whole-document prompting: stuff entire document + question into context; measure latency and answer quality on 5 test docs
  4. Add a fallback: if doc exceeds model context, fall back to sliding-window with overlap and flag the answer as partial
  5. Build Gradio UI: file upload, question box, answer + source snippet display, latency counter
  6. Benchmark vs. naive chunk-and-embed approach on 3 multi-section documents to quantify accuracy delta
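
The dedup part of step 2 can be sketched with only the standard library. This is a minimal illustration, not a prescribed design: the `docs` table layout and the helper names `open_store` and `ingest` are hypothetical, and it assumes the hash is taken over the raw file bytes (hashing the extracted text instead would also work).

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the document store. The schema is a hypothetical minimal one."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        " sha256 TEXT PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn


def ingest(conn, filename, raw, text):
    """Store extracted plain text keyed by the SHA-256 of the raw file bytes.

    Returns (sha256, inserted). inserted is False when a file with identical
    bytes was already stored, even under a different filename.
    """
    digest = hashlib.sha256(raw).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO docs (sha256, filename, text) VALUES (?, ?, ?)",
        (digest, filename, text),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the primary key already existed.
    return digest, cur.rowcount == 1
```

Calling `ingest` twice with the same bytes leaves a single row in `docs`, so re-uploading a file does not trigger a second round of text extraction or prompting.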

Risks

  • Consumer VRAM (8–16 GB) may still overflow on very long docs even with quantized GGUF — need aggressive Q4/Q5 quant and context tuning
  • Mamba-2 hybrid models have uneven llama.cpp support; some architectures need nightly builds or custom backends
  • Whole-document prompting is slower than vector retrieval — users with >50 docs will hit patience limits without async streaming
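
The overflow risk in the first bullet is what the sliding-window fallback from build step 4 is meant to absorb. Here is a rough sketch of that fallback, assuming the document has already been tokenized with the model's own tokenizer. The function name `plan_windows` and the `reserve` and `overlap` defaults are illustrative assumptions, not values from any library.

```python
def plan_windows(tokens, ctx_limit, reserve=512, overlap=256):
    """Split a token list into overlapping windows that fit the context.

    reserve is the number of tokens held back for the question and the answer.
    Returns (windows, partial). partial is False when the whole document fits
    in one window, and True when the sliding-window fallback was used.
    """
    budget = ctx_limit - reserve  # tokens available for document text
    if budget <= 0:
        raise ValueError("ctx_limit must exceed reserve")
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be in [0, budget)")

    # Whole-document path: everything fits, so the answer is not partial.
    if len(tokens) <= budget:
        return [tokens], False

    # Fallback path: advance by (budget - overlap) so neighbours share context.
    step = budget - overlap
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + budget])
        if start + budget >= len(tokens):
            break
        start += step
    return windows, True
```

The `partial` flag can be passed straight through to the UI so that an answer assembled from several windows is labelled as such.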

Business Angle

SaaS tool for legal/research analysts to ask questions across whole documents using long-context local inference — no chunking, no hallucinated cross-references

Customer: Solo legal analyst or independent researcher (paralegal, PhD student, IP consultant) who processes 50-200 page PDFs daily and has been burned by RAG missing clauses or citations that span section boundaries

Pricing: one-time purchase, targeting roughly $800 revenue in month 3 (8 × $99 perpetual licenses = $792)
