Trace-Level Agent Safety Monitor

A tool that records multi-step agent execution traces and runs heuristic and LLM-based checks to flag dangerous action sequences before they complete.

Difficulty: 1-week | Stack: Python, LangChain, SQLite, Claude API (claude-sonnet-4-5), Streamlit

Who this is for

Teams running autonomous agents on real user tasks who need to see whether an in-progress multi-step execution is heading toward a harmful outcome, so that it can be caught at step 3 rather than after step 10.

Build steps

Define a trace schema: each agent action (tool call, URL visited, form field written, API called) is appended as a JSON event to a SQLite log with timestamp, action type, and payload hash.
Implement a sliding-window analyzer that runs after every N actions: compute a ‘suspicion score’ using heuristics (e.g., visited domain age < 30 days, form contains PII field names, action count exceeds task-type baseline).
When the suspicion score crosses a threshold, send the last K trace events to Claude with a structured prompt asking it to classify the sequence as Safe / Suspicious / Halt and provide a one-sentence rationale. Cache the prompt prefix to reduce cost.
Expose a simple webhook that the agent framework calls after each action; the monitor responds with {continue: true/false, reason: string} so the agent can self-pause awaiting human review.
Build a Streamlit dashboard showing live traces, per-run suspicion score timeline, and a review queue where a human can approve or terminate flagged runs.
Test against a synthetic task library: 20 benign runs (booking a flight, filling a form) and 10 adversarial runs (scam site scenarios), and report precision/recall of the halt decisions.
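
The first two build steps can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the table name `trace_events`, the payload keys `domain_age_days` and `field_names`, the PII field list, the weights, and the escalation threshold are all assumptions chosen for the example and would need tuning against real traces.

```python
import hashlib
import json
import sqlite3
import time

# All names, weights, and thresholds below are illustrative assumptions.
PII_FIELD_NAMES = {"ssn", "password", "card_number", "date_of_birth"}
WEIGHTS = {"young_domain": 0.4, "pii_field": 0.3, "over_baseline": 0.3}
ESCALATION_THRESHOLD = 0.5


def open_trace_db(path=":memory:"):
    """Open the SQLite log and create the trace table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trace_events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               run_id TEXT NOT NULL,
               ts REAL NOT NULL,
               action_type TEXT NOT NULL,
               payload_hash TEXT NOT NULL,
               event_json TEXT NOT NULL
           )"""
    )
    return conn


def append_event(conn, run_id, action_type, payload, ts=None):
    """Append one agent action as a JSON event and return its payload hash."""
    ts = time.time() if ts is None else ts
    # Sort keys so that equal payloads always produce the same hash.
    canonical = json.dumps(payload, sort_keys=True)
    payload_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    event = {"ts": ts, "action_type": action_type, "payload": payload}
    conn.execute(
        "INSERT INTO trace_events (run_id, ts, action_type, payload_hash, event_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (run_id, ts, action_type, payload_hash, json.dumps(event)),
    )
    conn.commit()
    return payload_hash


def last_events(conn, run_id, k):
    """Return the last k events of a run, oldest first."""
    rows = conn.execute(
        "SELECT event_json FROM trace_events WHERE run_id = ? ORDER BY id DESC LIMIT ?",
        (run_id, k),
    ).fetchall()
    return [json.loads(row[0]) for row in reversed(rows)]


def count_events(conn, run_id):
    """Return how many actions the run has logged so far."""
    query = "SELECT COUNT(*) FROM trace_events WHERE run_id = ?"
    return conn.execute(query, (run_id,)).fetchone()[0]


def suspicion_score(window, total_actions, baseline_actions):
    """Sum the weights of the heuristics that fire on the current window."""
    score = 0.0
    if any(e["payload"].get("domain_age_days", 10**6) < 30 for e in window):
        score += WEIGHTS["young_domain"]
    if any(PII_FIELD_NAMES & set(e["payload"].get("field_names", [])) for e in window):
        score += WEIGHTS["pii_field"]
    if total_actions > baseline_actions:
        score += WEIGHTS["over_baseline"]
    return score


# Example: two logged actions, then one analyzer pass over the last 5 events.
conn = open_trace_db()
append_event(conn, "run-1", "url_visit", {"url": "https://example.test", "domain_age_days": 12})
append_event(conn, "run-1", "form_write", {"field_names": ["name", "card_number"]})
window = last_events(conn, "run-1", k=5)
score = suspicion_score(window, count_events(conn, "run-1"), baseline_actions=10)
needs_llm_review = score >= ESCALATION_THRESHOLD
```

In this sketch the analyzer only decides whether to escalate; the LLM classification and the {continue, reason} response would sit behind `needs_llm_review`. Additive weights keep each heuristic's contribution easy to inspect in the dashboard, at the cost of ignoring interactions between signals.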

Risks

LLM classification latency (1-3s per check) creates a bottleneck if the agent executes actions faster than the monitor can evaluate them, requiring async queuing that complicates the halt-signal flow.
Heuristic suspicion scores are brittle — legitimate financial sites also have PII fields and young domains, leading to high false-positive halt rates that erode developer trust in the tool.
Storing action payloads in SQLite for analysis may itself create a PII retention problem if the agent handles real user data, requiring an additional scrubbing step before persistence.
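
One way the scrubbing step mentioned in the last risk might look is sketched below. It is an assumption-laden example: the key list `SENSITIVE_KEYS`, the placeholder strings, and the simple email pattern are hypothetical, and a real deployment would likely need a broader detector than key names plus one regular expression.

```python
import re

# Hypothetical list of payload keys whose values are never persisted.
SENSITIVE_KEYS = {"ssn", "password", "card_number", "email", "phone"}
# Deliberately simple email pattern; it will miss unusual addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(value, key=None):
    """Return a copy of a payload with sensitive values replaced by placeholders."""
    if key is not None and str(key).lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: scrub(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        # Mask email addresses embedded in free text.
        return EMAIL_RE.sub("[EMAIL]", value)
    return value


raw = {
    "email": "a@b.com",
    "note": "contact x@y.org now",
    "nested": {"ssn": "123-45-6789"},
    "attempt": 3,
}
clean = scrub(raw)
```

Running the scrubber before the event is serialized means the SQLite log holds only the redacted payload. If the payload hash is computed from the raw payload, identical actions can still be matched across runs without retaining the values themselves.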

Business Angle