Cross-Problem Failure Memory for Coding Agents

Give a coding agent persistent retrieval of past failure traces so it avoids repeating mistakes across LeetCode-style problems

Difficulty: weekend | Stack: Python, LangGraph or bare asyncio, Claude (claude-sonnet-4-5) or GPT-4o via API, Chroma (local vector DB), sentence-transformers

Who this is for

Developers building coding agents who want the agent to self-improve across a session without retraining

Build steps

Wrap a standard solve-and-verify loop (LLM generates code → run tests → capture error trace) around LeetCode-easy problems, loaded via LeetCode's public API or from local copies
On failure, embed the (problem description + error trace + attempted approach) tuple and upsert it into Chroma
Before each new solve attempt, retrieve the top-3 most similar past failures and inject them as ‘lessons learned’ into the system prompt
Compare pass@1 and pass@3 with vs. without retrieval augmentation over a 50-problem run
Log which retrieved memories actually helped (tagged via LLM self-report) to evaluate retrieval relevance
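
The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names (run_candidate, remember_failure, lessons_for, pass_at_k), the record layout, and the prompt wording are all hypothetical. The collection argument is assumed to be a Chroma collection, of which only upsert and query are used, and pass_at_k assumes attempts on one problem are made sequentially within the session rather than sampled independently.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def run_candidate(code, test_code, timeout=10.0):
    """Run generated code plus assert-based tests in a fresh interpreter.

    Returns (passed, trace). Note: a subprocess is not a sandbox; untrusted
    code needs stronger isolation than this.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"Timed out after {timeout} seconds"
    return proc.returncode == 0, proc.stderr.strip()


def remember_failure(collection, problem, approach, trace):
    """Upsert one failure record; the collection embeds the document text."""
    doc = f"PROBLEM:\n{problem}\n\nAPPROACH:\n{approach}\n\nERROR:\n{trace}"
    record_id = uuid.uuid4().hex
    collection.upsert(
        ids=[record_id],
        documents=[doc],
        metadatas=[{"kind": "failure"}],
    )
    return record_id


def lessons_for(collection, problem, k=3):
    """Fetch up to k similar past failures, formatted for the system prompt."""
    result = collection.query(query_texts=[problem], n_results=k)
    docs = result["documents"][0]  # one inner list per query text
    if not docs:
        return ""
    items = "\n\n".join(f"Lesson {i}:\n{d}" for i, d in enumerate(docs, 1))
    return "Lessons learned from earlier failed attempts:\n\n" + items


def pass_at_k(first_pass_attempt, k):
    """Share of problems first solved on attempt <= k (None = never solved)."""
    solved = sum(1 for a in first_pass_attempt if a is not None and a <= k)
    return solved / len(first_pass_attempt)
```

With Chroma installed, a collection could be obtained with chromadb.PersistentClient(path="./failure_memory").get_or_create_collection("failures"); the path and collection name here are placeholders. For the with-vs-without comparison, the same loop can be run twice, once with lessons_for skipped, and pass_at_k computed for k=1 and k=3 on each run.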

Risks

Retrieval noise: irrelevant past failures may confuse the agent more than they help, so the similarity threshold needs tuning
The problem set is small enough that the agent may overfit to specific error patterns rather than generalize
Injected failure traces fill the LLM context window fast on harder problems, so a truncation strategy is needed
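
Two of these risks have simple first-pass mitigations, sketched below under stated assumptions. filter_by_distance and truncate_trace are hypothetical helper names. The result dict is assumed to have the shape of a Chroma query result (parallel "documents" and "distances" lists, one inner list per query). The cutoff value is something to tune empirically, since distance scales depend on the embedding model and the metric the collection uses. The truncation keeps mostly the tail because Python tracebacks put the exception type and message on the last lines.

```python
def filter_by_distance(result, max_distance):
    """Keep only retrieved documents whose distance is within the cutoff.

    Lower distance means more similar; max_distance is a tuning knob.
    """
    docs = result["documents"][0]
    dists = result["distances"][0]
    return [doc for doc, dist in zip(docs, dists) if dist <= max_distance]


def truncate_trace(trace, max_chars=800, tail_share=0.75):
    """Shorten a long trace to max_chars, keeping some head and most of the tail."""
    if len(trace) <= max_chars:
        return trace
    marker = "\n...[truncated]...\n"
    budget = max_chars - len(marker)
    if budget <= 0:
        # Too small for a marker: keep only the end of the trace.
        return trace[len(trace) - max_chars:]
    tail = int(budget * tail_share)
    head = budget - tail
    return trace[:head] + marker + trace[len(trace) - tail:]
```

Truncating each trace before it is stored, rather than at injection time, also keeps the embedded text short, though that trades away detail that might matter for retrieval.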

Business Angle