Catch hallucinated or self-contradictory LLM outputs over your knowledge graph before they silently corrupt your RAG pipeline.
Customer: A solo ML engineer or backend dev at a 5–20 person startup who owns a RAG pipeline over a structured schema (e.g., a product catalog graph, a medical ontology, or a financial entity DB) and has been burned at least once by a hallucinated relationship or contradictory claim sneaking into a production response.
Problem: LLM outputs over structured data routinely violate the graph’s own constraints. They assert both A→B and B→A for an antisymmetric relation such as `parent_of`, assign an entity two mutually exclusive types, or fabricate edges that don’t exist in the graph. Standard output validation (regex, JSON Schema) checks shape, not logic, so it misses these contradictions; the dev finds out only when a downstream user reports nonsense.
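The three failure modes above can be caught with plain set lookups once the model’s answer is parsed into (head, relation, tail) triples. The sketch below is a minimal, dependency-free illustration under stated assumptions. The names `Triple`, `find_violations`, `ANTISYMMETRIC`, and `EXCLUSIVE_TYPES`, the `is_a` typing relation, and the toy medical facts are all hypothetical. It stands in for the NetworkX lookups the brief mentions and is not that library’s API.

```python
from dataclasses import dataclass

# Hypothetical schema rules; a real deployment would load these from the KG.
ANTISYMMETRIC = {"parent_of", "subclass_of"}  # A -r-> B rules out B -r-> A
EXCLUSIVE_TYPES = [{"Drug", "Device"}, {"Person", "Organization"}]


@dataclass(frozen=True, order=True)
class Triple:
    head: str
    relation: str
    tail: str


def find_violations(claims, known_edges):
    """Check LLM-extracted claims against the KG (a set of Triple) and the rules above."""
    claim_set = set(claims)
    facts = claim_set | known_edges  # claims plus what the graph already holds
    violations = []

    # 1. Antisymmetry: A -r-> B and B -r-> A both hold for an antisymmetric r.
    seen = set()
    for t in sorted(claim_set):
        if t.relation in ANTISYMMETRIC and t.head != t.tail:
            pair = (t.relation, frozenset((t.head, t.tail)))
            if Triple(t.tail, t.relation, t.head) in facts and pair not in seen:
                seen.add(pair)
                violations.append(
                    f"antisymmetry: {t.head} and {t.tail} both {t.relation} each other")

    # 2. Mutually exclusive types: one entity typed twice within an exclusive group.
    types = {}
    for t in facts:
        if t.relation == "is_a":
            types.setdefault(t.head, set()).add(t.tail)
    for entity in sorted(types):
        for group in EXCLUSIVE_TYPES:
            clash = types[entity] & group
            if len(clash) > 1:
                violations.append(f"exclusive types: {entity} is {sorted(clash)}")

    # 3. Fabricated edges: any claim the graph does not already contain.
    for t in sorted(claim_set):
        if t not in known_edges:
            violations.append(f"unsupported edge: {t.head} -{t.relation}-> {t.tail}")
    return violations


# Toy usage with hypothetical facts.
kg = {
    Triple("aspirin", "is_a", "Drug"),
    Triple("aspirin", "treats", "headache"),
    Triple("Analgesic", "subclass_of", "Drug"),
}
llm_claims = [
    Triple("aspirin", "treats", "headache"),     # supported by the graph
    Triple("aspirin", "is_a", "Device"),         # clashes with is_a Drug
    Triple("Drug", "subclass_of", "Analgesic"),  # reverses an existing edge
]
for v in find_violations(llm_claims, kg):
    print(v)
```

On this toy input the reversed `subclass_of` claim is reported both as an antisymmetry clash and as an unsupported edge, and `aspirin` is flagged for being typed as both Drug and Device. Keeping the rules as plain sets makes the sketch easy to read. A production checker would more likely pull them from the schema itself.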
Pricing: open-core. Target of roughly $800 MRR within 4 months, which means 16–17 paying teams at $49/mo for the hosted API wrapper plus violation-report storage (16 × $49 = $784). The CLI stays free/OSS.
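Checking the revenue target against the price point: 16 teams land just under $800, so clearing it takes a 17th team.

```latex
16 \times \$49 = \$784\ \text{MRR}, \qquad
\left\lceil \frac{\$800}{\$49} \right\rceil = 17\ \text{teams}
\;\Rightarrow\; 17 \times \$49 = \$833\ \text{MRR}.
```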
Why now
The cluster’s research explicitly shows that LLMs still fail basic rule-induction and logical-consistency checks, while enterprises are now deploying RAG over proprietary KGs at scale. The gap between ‘RAG is in prod’ and ‘RAG is trustworthy over structured data’ is widening now, which makes a drop-in consistency checker timely rather than academic.
Go-to-market
- Post the OSS CLI to Hacker News as a ‘Show HN’ with a concrete demo: load a product ontology, ask GPT-4o a question that draws out a contradictory answer, and show Z3 catching the violation in under 2 s. Target the ‘I’ve been burned by this’ crowd in the comments.
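The core of that demo is a satisfiability check: encode the schema’s rules and the model’s claims as boolean constraints, then ask whether any truth assignment satisfies all of them at once. The sketch below does this by brute force over a few hypothetical product flags (`discontinued`, `in_stock`, `ships_next_day`, `on_sale`) so it runs on the standard library alone. It is a stand-in for Z3, not Z3 itself. In z3py the same check is a `Solver` whose `check()` returns `unsat`, and `assert_and_track` plus `unsat_core()` recovers the conflicting claims. The rule set and function names here are illustrative assumptions.

```python
from itertools import product

# Hypothetical product-ontology rules over boolean facts about one SKU.
VARS = ["discontinued", "in_stock", "ships_next_day", "on_sale"]
RULES = {
    "in_stock implies not discontinued": lambda a: not a["in_stock"] or not a["discontinued"],
    "ships_next_day implies in_stock": lambda a: not a["ships_next_day"] or a["in_stock"],
}


def is_consistent(claims):
    """True if some assignment satisfies every rule and every claimed value.

    claims maps a variable in VARS to the truth value the LLM asserted.
    Brute force over 2**len(VARS) assignments; Z3 searches symbolically instead.
    """
    for values in product([False, True], repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if all(a[k] == v for k, v in claims.items()) and all(r(a) for r in RULES.values()):
            return True
    return False


def conflicting_claims(claims):
    """Shrink an inconsistent claim set to a minimal conflicting subset.

    Deletion-based: drop each claim in turn and keep it dropped if the
    remainder is still inconsistent (a hand-rolled analogue of an unsat core).
    """
    core = dict(claims)
    for name in list(core):
        trial = {k: v for k, v in core.items() if k != name}
        if not is_consistent(trial):
            core = trial  # this claim was not needed for the contradiction
    return core


# Toy usage: the model says a discontinued SKU ships next day.
llm_claims = {"discontinued": True, "ships_next_day": True, "on_sale": True}
if not is_consistent(llm_claims):
    print("contradiction:", conflicting_claims(llm_claims))
```

Brute force is fine for a handful of flags but doubles in cost with every variable. That growth is why the brief reaches for a real solver once schemas get large.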
- Find 10 developers actively discussing RAG + knowledge graphs on the Hugging Face Discord, LlamaIndex Slack, and r/MachineLearning; DM them the GitHub link and ask for 15-min feedback calls. Convert 2–3 into beta testers.
- Write one narrow SEO post, ‘How to catch hallucinated relationships in LLM outputs over Neo4j’, which targets a real, searchable pain point with little existing competition. Link to the tool.
- Offer a free ‘consistency audit’ to one RAG startup (found via LinkedIn or the YC company list): run their schema through the tool and send them a one-page violation report. Use it as a case study or testimonial.
Moat (or lack thereof)
No real moat. The Z3 + NetworkX combo is reproducible in a weekend by any competent Python dev, and LangChain/LlamaIndex could ship a similar guard in a sprint. The advantage is being first to a specific niche (RAG over KGs), having a polished OSS CLI that gets GitHub stars, and building a reputation before the big players notice the gap. Defensibility is community + integrations, not technology.