Cross-Problem Failure Memory for Coding Agents
Give a coding agent persistent retrieval of past failure traces so it avoids repeating mistakes across LeetCode-style problems
Difficulty: weekend | Stack: Python, LangGraph or bare asyncio, Claude (claude-sonnet-4-5) or GPT-4o via API, Chroma (local vector DB), sentence-transformers
Who this is for
Developers building coding agents who want the agent to self-improve across a session without retraining
Build steps
- Wrap a standard solve-and-verify loop (LLM generates code → run tests → capture error trace) around LeetCode-easy problems, fetched through LeetCode's undocumented GraphQL endpoint (there is no official public API) or loaded from local copies
- On failure, embed the (problem description + error trace + attempted approach) tuple and upsert it into Chroma
- Before each new solve attempt, retrieve the top-3 most similar past failures and inject them as ‘lessons learned’ into the system prompt
- Compare pass@1 and pass@3 with vs. without retrieval augmentation over a 50-problem run
- Log which retrieved memories actually helped (tag with LLM self-report) to evaluate retrieval relevance
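The build steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: `run_candidate`, `FailureMemory`, `build_system_prompt`, and `pass_at_k` are hypothetical names, the bag-of-words `embed` function stands in for a sentence-transformers model, and the in-memory `FailureMemory` stands in for a Chroma collection. `pass_at_k` here uses the simple "solved within the first k attempts" definition.

```python
import math
import os
import re
import subprocess
import sys
import tempfile
from collections import Counter


def run_candidate(code: str, tests: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run generated code plus assert-based tests; return (passed, error_trace)."""
    # Note: a subprocess is not a sandbox; isolate untrusted code properly in real use.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(f"{code}\n\n{tests}\n")
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True,
                              text=True, timeout=timeout)
        return proc.returncode == 0, proc.stderr.strip()
    except subprocess.TimeoutExpired:
        return False, f"TimeoutExpired: no result after {timeout}s"
    finally:
        os.unlink(path)


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real sentence encoder."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class FailureMemory:
    """In-memory stand-in for the vector store; records are keyed by id."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[Counter, str]] = {}

    def upsert(self, record_id: str, description: str, trace: str, approach: str) -> None:
        doc = f"Problem: {description}\nApproach: {approach}\nError: {trace}"
        self.records[record_id] = (embed(doc), doc)  # same id overwrites the old record

    def query(self, description: str, k: int = 3) -> list[str]:
        q = embed(description)
        ranked = sorted(self.records.values(), key=lambda r: cosine(q, r[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_system_prompt(base: str, lessons: list[str]) -> str:
    """Append retrieved failures to the system prompt as numbered lessons."""
    if not lessons:
        return base
    body = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(lessons, 1))
    return f"{base}\n\nLessons learned from past failures:\n{body}"


def pass_at_k(outcomes: dict[str, list[bool]], k: int) -> float:
    """Fraction of problems solved within the first k attempts."""
    if not outcomes:
        return 0.0
    return sum(any(attempts[:k]) for attempts in outcomes.values()) / len(outcomes)
```

In a real run, the LLM call would sit between `build_system_prompt` and `run_candidate`, and `FailureMemory` would likely be replaced by a Chroma collection, whose `upsert` and `query(query_texts=..., n_results=3)` methods cover the same two operations. Numbering the injected lessons also gives the model something concrete to cite when it self-reports which memories helped.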
Risks
- Retrieval noise: irrelevant past failures may confuse the agent more than they help, so the similarity threshold needs tuning
- The problem set is small enough that the agent may overfit to specific error patterns rather than generalizing
- The LLM context window fills fast with injected failure traces on harder problems, so a truncation strategy is needed
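Two of the risks above, retrieval noise and context bloat, can be addressed with a small filtering step before injection. The sketch below is one possible approach, not a tested recipe: `truncate_trace` and `select_lessons` are hypothetical helpers, and the defaults (6 lines, 0.35 similarity, 1500 characters) are placeholder values that would need tuning on real runs. It assumes each hit arrives as a `(similarity, document)` pair with higher meaning more similar; a vector store that reports distances would need converting first.

```python
def truncate_trace(trace: str, max_lines: int = 6) -> str:
    """Keep the first line and the tail of a traceback, where the exception usually is."""
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")
    lines = trace.strip().splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    tail = lines[-(max_lines - 1):]
    omitted = len(lines) - max_lines
    return "\n".join([lines[0], f"... ({omitted} lines omitted) ...", *tail])


def select_lessons(
    hits: list[tuple[float, str]],
    min_similarity: float = 0.35,
    max_chars: int = 1500,
) -> list[str]:
    """Drop weak matches, then keep the best ones that fit a character budget."""
    kept: list[str] = []
    used = 0
    for similarity, doc in sorted(hits, key=lambda h: h[0], reverse=True):
        if similarity < min_similarity:
            break  # everything after this is even less similar
        if used + len(doc) > max_chars:
            break  # budget exhausted; stop rather than skip to weaker hits
        kept.append(doc)
        used += len(doc)
    return kept
```

Truncating each trace before it is embedded and stored keeps both the stored records and the injected prompt small, and returning an empty list when nothing clears the threshold lets the agent fall back to its unaugmented prompt.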
Business Angle
Plug-in failure memory layer for coding agents that surfaces past mistake traces before each new attempt
Customer: Solo dev or indie hacker building a LeetCode-style coding agent (Python, LangGraph/asyncio) who's demoing it to employers or selling it as a study tool, and who's tired of watching their agent repeat the same off-by-one errors across problems
Pricing: one-time, $19; about $760 in the first 60 days (40 sales at $19)