Agent Behavior Pattern Library (ADRA-Bank Clone)

A personal catalogue of recorded agent trajectories—tagged by failure mode—that you can replay, diff, and query to understand why an agent regressed between versions.

Difficulty: 1-week | Stack: Python, FastAPI, SQLite + SQLAlchemy, Pydantic, Next.js (minimal UI), OpenTelemetry-style trace format

Who this is for

Developer teams iterating rapidly on a production agent who need to answer ‘did this prompt change make tool selection worse?’ without re-running a full benchmark suite from scratch.

Build steps

Define a canonical trace schema in Pydantic: each trace captures agent_version, task_id, an ordered list of steps (each one observation → reasoning → action → result), final_outcome, and a free-text failure_tag (e.g., ‘wrong_tool_order’, ‘hallucinated_api_call’, ‘early_stop’).
Write a thin logging decorator that wraps any LangChain/LangGraph or raw SDK agent loop and serialises traces to SQLite automatically.
Build a FastAPI backend with four endpoints: POST /traces (ingest), GET /traces?tag=&version= (filter), GET /traces/{id}/diff/{id2} (step-level diff between two runs), and GET /stats (per-tag counts by version).
Create a minimal Next.js UI with a trace explorer: a filterable list on the left, a step-by-step timeline on the right, and a two-pane diff view when two traces are selected.
Add a CLI command python bank.py regress --from v1 --to v2 --tag wrong_tool_order that prints whether the failure rate for a given tag went up or down between versions.
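The schema step can be sketched as follows. To keep the example dependency-free it uses standard-library dataclasses as a stand-in for Pydantic models with the same fields; in the real project these would be pydantic.BaseModel subclasses (Pydantic v2 provides model_dump_json() and model_validate_json() for the round trip shown here). The shape of the action dict ({"tool": ..., "args": ...}) is an assumption, deliberately left free-form.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One observation -> reasoning -> action -> result cycle."""
    observation: str
    reasoning: str
    action: dict[str, Any]  # assumed shape, e.g. {"tool": "search", "args": {...}}
    result: str


@dataclass
class Trace:
    agent_version: str
    task_id: str
    steps: list[Step] = field(default_factory=list)
    final_outcome: str = ""
    failure_tag: Optional[str] = None  # free text, e.g. "wrong_tool_order"

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so steps become plain dicts
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Trace":
        data = json.loads(raw)
        data["steps"] = [Step(**s) for s in data["steps"]]
        return cls(**data)
```

Keeping failure_tag optional lets traces be ingested first and tagged later, during triage.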
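A minimal sketch of the logging decorator, assuming the agent loop is a plain Python function that accepts a log_step keyword callback and returns a final-outcome string. The names record_trace, log_step, and the traces table layout are hypothetical; hooking LangChain/LangGraph callbacks into log_step is not shown. Steps are stored as a JSON text column, in line with the schema advice under Risks.

```python
import functools
import json
import sqlite3
from typing import Any, Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS traces (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_version TEXT NOT NULL,
    task_id TEXT NOT NULL,
    steps TEXT NOT NULL,  -- JSON array of step dicts
    final_outcome TEXT,
    failure_tag TEXT
)
"""


def record_trace(conn: sqlite3.Connection, agent_version: str) -> Callable:
    """Hypothetical decorator: injects a log_step callback into the wrapped loop
    and writes one traces row per run, whether the loop returns or raises."""
    conn.execute(SCHEMA)

    def decorator(loop: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(loop)
        def wrapper(task_id: str, *args: Any, **kwargs: Any) -> str:
            steps: list[dict[str, Any]] = []

            def log_step(observation: str, reasoning: str, action: dict, result: str) -> None:
                steps.append({"observation": observation, "reasoning": reasoning,
                              "action": action, "result": result})

            outcome = "error"  # kept if the loop raises
            try:
                outcome = loop(task_id, *args, log_step=log_step, **kwargs)
                return outcome
            finally:
                # Persist even on exceptions, so crashing runs are captured too
                with conn:
                    conn.execute(
                        "INSERT INTO traces (agent_version, task_id, steps, final_outcome) "
                        "VALUES (?, ?, ?, ?)",
                        (agent_version, task_id, json.dumps(steps), outcome),
                    )
        return wrapper
    return decorator
```

Usage: decorate a loop such as `def toy_agent(task_id, *, log_step): ...` with `@record_trace(sqlite3.connect("traces.db"), agent_version="v2")`, then call `toy_agent("task-001")` as usual.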
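The core of GET /traces/{id}/diff/{id2} can be a plain function that the FastAPI handler calls after loading both step lists. This sketch aligns steps by tool name with the standard library's difflib.SequenceMatcher; treating "same tool" as "same step" is an assumption, and a real version might also compare arguments or results. diff_steps and step_key are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Any


def step_key(step: dict[str, Any]) -> str:
    """What counts as 'the same step': here, the tool name only (an assumption)."""
    return step["action"].get("tool", "?")


def diff_steps(
    a: list[dict[str, Any]], b: list[dict[str, Any]]
) -> list[tuple[str, list[str], list[str]]]:
    """Align two step lists and return (op, a_tools, b_tools) hunks,
    where op is 'equal', 'replace', 'delete', or 'insert'."""
    ka = [step_key(s) for s in a]
    kb = [step_key(s) for s in b]
    matcher = SequenceMatcher(a=ka, b=kb, autojunk=False)
    return [(op, ka[i1:i2], kb[j1:j2]) for op, i1, i2, j1, j2 in matcher.get_opcodes()]
```

A swapped tool order shows up as an insert hunk paired with a delete hunk, which is the pattern a wrong_tool_order regression would produce in the two-pane view.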
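GET /stats and the regress command can share the same queries. This sketch assumes a traces table with agent_version and failure_tag columns, defines a tag's failure rate as the share of a version's traces carrying that tag, and adds a hypothetical --db flag for the database path. Because --from is a Python keyword, the parsed value is stored under dest="from_version".

```python
import argparse
import sqlite3
from typing import Optional, Sequence


def failure_rate(conn: sqlite3.Connection, version: str, tag: str) -> float:
    """Share of a version's traces tagged `tag`; 0.0 if the version has no traces."""
    total, tagged = conn.execute(
        # failure_tag = ? is 1/0 (NULL for untagged rows, which SUM skips)
        "SELECT COUNT(*), COALESCE(SUM(failure_tag = ?), 0) FROM traces WHERE agent_version = ?",
        (tag, version),
    ).fetchone()
    return tagged / total if total else 0.0


def stats(conn: sqlite3.Connection) -> dict[str, dict[str, int]]:
    """Per-tag counts by version, i.e. the payload GET /stats could return."""
    out: dict[str, dict[str, int]] = {}
    rows = conn.execute(
        "SELECT agent_version, failure_tag, COUNT(*) FROM traces "
        "WHERE failure_tag IS NOT NULL GROUP BY agent_version, failure_tag"
    )
    for version, tag, n in rows:
        out.setdefault(version, {})[tag] = n
    return out


def regress(conn: sqlite3.Connection, from_v: str, to_v: str, tag: str) -> str:
    before, after = failure_rate(conn, from_v, tag), failure_rate(conn, to_v, tag)
    direction = "UP" if after > before else "DOWN" if after < before else "UNCHANGED"
    return f"{tag}: {before:.1%} ({from_v}) -> {after:.1%} ({to_v}) [{direction}]"


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="bank.py")
    sub = parser.add_subparsers(dest="command", required=True)
    reg = sub.add_parser("regress", help="compare a tag's failure rate between versions")
    reg.add_argument("--from", dest="from_version", required=True)
    reg.add_argument("--to", dest="to_version", required=True)
    reg.add_argument("--tag", required=True)
    reg.add_argument("--db", default="traces.db")  # hypothetical default path
    args = parser.parse_args(argv)
    conn = sqlite3.connect(args.db)
    try:
        print(regress(conn, args.from_version, args.to_version, args.tag))
    finally:
        conn.close()
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Running `python bank.py regress --from v1 --to v2 --tag wrong_tool_order` prints one line of the form `<tag>: <rate> (<from>) -> <rate> (<to>) [UP|DOWN|UNCHANGED]`. The comparison ignores sample size; with only a handful of traces per version, a small change in rate may be noise.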

Risks

Trace schemas ossify quickly: if you hard-code the action format, adding a new tool type later requires a painful migration. Use a JSON blob column for the step payload from day one.
The diff view is only useful if task IDs are stable across versions: if you don’t fix the random seed or the task sampling, the same task ID can refer to different inputs in different runs, and the diff ends up comparing unrelated trajectories.
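SQLite write locks become a bottleneck if you run parallel agent evaluations that all write traces simultaneously. Switch to WAL mode (PRAGMA journal_mode=WAL), which lets readers proceed while a write is in progress but still allows only one writer at a time, or queue writes through a single worker.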
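The JSON-column advice can be sketched with one row per step and all tool-specific fields inside a JSON payload, queried with SQLite's json_extract (built into SQLite since 3.38; older builds need the JSON1 extension enabled). The steps table and helper names are hypothetical; the point is that a new tool type with new fields needs no ALTER TABLE.

```python
import json
import sqlite3
from typing import Any


def create_steps_table(conn: sqlite3.Connection) -> None:
    # Fixed columns stay minimal; everything tool-specific lives in `payload`
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        " trace_id INTEGER NOT NULL, idx INTEGER NOT NULL, payload TEXT NOT NULL,"
        " PRIMARY KEY (trace_id, idx))"
    )


def add_step(conn: sqlite3.Connection, trace_id: int, idx: int, step: dict[str, Any]) -> None:
    with conn:
        conn.execute("INSERT INTO steps VALUES (?, ?, ?)", (trace_id, idx, json.dumps(step)))


def steps_using_tool(conn: sqlite3.Connection, tool: str) -> list[dict[str, Any]]:
    # json_extract reaches into the blob, so filtering works for any payload shape
    rows = conn.execute(
        "SELECT payload FROM steps WHERE json_extract(payload, '$.action.tool') = ? "
        "ORDER BY trace_id, idx",
        (tool,),
    )
    return [json.loads(payload) for (payload,) in rows]
```

A step from an older tool ({"tool": "search", "args": {...}}) and one from a newly added tool with different fields can sit in the same table side by side.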
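One way to keep task IDs stable is to derive them from the task content and to sample tasks with a dedicated seeded generator. This is a sketch under assumptions: tasks are JSON-serialisable dicts, and the helper names are hypothetical. It covers task selection only; nondeterminism in the model's own sampling has to be controlled separately.

```python
import hashlib
import json
import random
from typing import Any


def stable_task_id(task: dict[str, Any]) -> str:
    """Content-addressed ID: the same task input always yields the same ID."""
    canonical = json.dumps(task, sort_keys=True, separators=(",", ":"))
    return "task-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def sample_tasks(pool: list[dict[str, Any]], k: int, seed: int) -> list[dict[str, Any]]:
    """Seeded sample, so runs for v1 and v2 are evaluated on identical tasks."""
    ordered = sorted(pool, key=stable_task_id)  # result does not depend on pool order
    return random.Random(seed).sample(ordered, k)
```

Using a private random.Random(seed) instead of the module-level generator keeps other code that calls random from shifting the sample.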
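The WAL and single-worker options can be combined: agent threads only enqueue rows, and one background thread owns the connection and performs every INSERT. TraceWriter and its table layout are hypothetical; timeout is the standard sqlite3 busy timeout in seconds, and check_same_thread=False is needed because the connection is opened on the caller's thread but used by the worker.

```python
import json
import queue
import sqlite3
import threading
from typing import Any, Optional, Tuple


class TraceWriter:
    """Hypothetical single-writer worker: any number of threads call write();
    one background thread owns the SQLite connection and does every INSERT."""

    def __init__(self, path: str) -> None:
        # timeout=30: wait up to 30 s for a lock instead of failing immediately
        self._conn = sqlite3.connect(path, timeout=30, check_same_thread=False)
        # WAL: readers (UI, CLI) keep reading while this thread writes
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS traces (agent_version TEXT, task_id TEXT, steps TEXT)"
        )
        self._queue: "queue.Queue[Optional[Tuple[str, str, str]]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # sentinel pushed by close()
                break
            with self._conn:  # commits this row's transaction
                self._conn.execute("INSERT INTO traces VALUES (?, ?, ?)", item)

    def write(self, agent_version: str, task_id: str, steps: list[dict[str, Any]]) -> None:
        """Thread-safe: only enqueues; never touches SQLite directly."""
        self._queue.put((agent_version, task_id, json.dumps(steps)))

    def close(self) -> None:
        """Drain everything queued before this call, then close the connection."""
        self._queue.put(None)
        self._thread.join()
        self._conn.close()
```

Committing one row per transaction keeps the sketch simple; batching several queued rows into one transaction would raise throughput further.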

Business Angle