AI Pulse

Cross-Problem Failure Memory for Coding Agents

Give a coding agent persistent retrieval of past failure traces so it avoids repeating mistakes across LeetCode-style problems

Difficulty: weekend | Stack: Python, LangGraph or bare asyncio, Claude (claude-sonnet-4-5) or GPT-4o via API, Chroma (local vector DB), sentence-transformers

Who this is for

Developers building coding agents who want the agent to self-improve across a session without retraining

Build steps

  1. Wrap a standard solve-and-verify loop (LLM generates code → run tests → capture error trace) around LeetCode-easy problems, using local copies or, if you rely on one, an unofficial API (LeetCode does not document an official public one)
  2. On failure, embed the (problem description + error trace + attempted approach) tuple and upsert into Chroma
  3. Before each new solve attempt, retrieve the top-3 most similar past failures and inject them as ‘lessons learned’ into the system prompt
  4. Compare pass@1 and pass@3 with vs. without retrieval augmentation over a 50-problem run
  5. Log which retrieved memories actually helped (tag with LLM self-report) to evaluate retrieval relevance
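
Steps 2 to 4 can be sketched roughly as follows. This is a minimal sketch, not a reference implementation: the names `FailureMemory`, `failure_document`, `build_system_prompt` and `pass_at_k`, the storage path, the collection name, and the `all-MiniLM-L6-v2` model are all assumptions made for illustration. `pass_at_k` here uses the simple reading "solved within the first k attempts", which may differ from the sampling-based estimator used in some benchmarks.

```python
import hashlib


def failure_document(problem: str, trace: str, approach: str) -> str:
    """Flatten one failed attempt into the text that gets embedded."""
    return f"PROBLEM:\n{problem}\n\nAPPROACH:\n{approach}\n\nERROR:\n{trace}"


def build_system_prompt(base: str, lessons: list[str]) -> str:
    """Append retrieved failures to the system prompt as 'lessons learned'."""
    if not lessons:
        return base
    numbered = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(lessons, 1))
    return f"{base}\n\nLessons learned from past failures:\n{numbered}"


def pass_at_k(results: list[list[bool]], k: int) -> float:
    """Fraction of problems solved within the first k attempts.

    `results` holds one list per problem, with one bool per attempt in order.
    """
    if not results:
        return 0.0
    return sum(any(attempts[:k]) for attempts in results) / len(results)


class FailureMemory:
    """Hypothetical thin wrapper around a local Chroma collection."""

    def __init__(self, path: str = "./failure_memory", name: str = "failures"):
        # Imported lazily so the pure helpers above work without Chroma installed.
        import chromadb
        from chromadb.utils import embedding_functions

        # Assumed model choice; any sentence-transformers model should work.
        ef = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="all-MiniLM-L6-v2"
        )
        client = chromadb.PersistentClient(path=path)
        self.col = client.get_or_create_collection(name=name, embedding_function=ef)

    def record(self, problem_id: str, problem: str, trace: str, approach: str) -> None:
        """Upsert one failure; identical failures collapse onto the same id."""
        doc = failure_document(problem, trace, approach)
        doc_id = hashlib.sha1(doc.encode("utf-8")).hexdigest()
        self.col.upsert(
            ids=[doc_id], documents=[doc], metadatas=[{"problem_id": problem_id}]
        )

    def recall(self, problem: str, k: int = 3) -> list[str]:
        """Return up to k stored failures most similar to the new problem."""
        n = min(k, self.col.count())
        if n == 0:
            return []
        res = self.col.query(query_texts=[problem], n_results=n)
        return res["documents"][0]
```

In the solve loop, one would call `recall(problem)` before generating code, pass the result through `build_system_prompt`, and call `record(...)` whenever the tests fail. Running the same 50 problems once with `recall` enabled and once with an empty lessons list gives the two sets of attempt results to feed into `pass_at_k`.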

Risks

  • Retrieval noise: irrelevant past failures may confuse the agent more than help — need similarity threshold tuning
  • Problem set is small enough that agent may overfit to specific error patterns rather than generalizing
  • LLM context window fills fast with injected failure traces on harder problems — need truncation strategy
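
The first and third risks can be handled together with a small filter between retrieval and prompt injection. The sketch below assumes the vector store returns a distance per hit where smaller means more similar (Chroma's query results include a `distances` field). The function names and the default values for `max_distance`, `max_chars` and `budget_chars` are hypothetical starting points that would need tuning on real runs.

```python
def truncate_middle(text: str, max_chars: int) -> str:
    """Clip long text but keep both ends.

    The head usually holds the problem statement and the tail usually holds
    the final exception line of a Python traceback, so the middle is dropped.
    """
    if len(text) <= max_chars:
        return text
    marker = "\n...[truncated]...\n"
    keep = max_chars - len(marker)
    if keep <= 0:
        return text[:max_chars]
    head = keep // 2
    tail = keep - head
    return text[:head] + marker + text[-tail:]


def select_lessons(
    candidates: list[tuple[str, float]],
    max_distance: float = 0.35,
    max_chars: int = 1200,
    budget_chars: int = 3000,
) -> list[str]:
    """Keep only close matches, clipped, within a total character budget.

    `candidates` is a list of (document, distance) pairs in any order.
    """
    kept: list[str] = []
    used = 0
    for doc, dist in sorted(candidates, key=lambda c: c[1]):
        if dist > max_distance:
            break  # everything after this is even less similar
        clipped = truncate_middle(doc, max_chars)
        if used + len(clipped) > budget_chars:
            break  # stop before the injected lessons crowd out the problem
        kept.append(clipped)
        used += len(clipped)
    return kept
```

A character budget is a crude stand-in for a token budget; swapping in the model's tokenizer would be more precise but is not needed to see whether the threshold helps.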

Business Angle

Plug-in failure memory layer for coding agents that surfaces past mistake traces before each new attempt

Customer: Solo dev or indie hacker building a LeetCode-style coding agent (Python, LangGraph/asyncio) who's demoing it to employers or selling it as a study tool — tired of watching their agent repeat the same off-by-one errors across problems

Pricing: one-time; $760 in first 60 days (40 sales at $19)
