AI Pulse
← Projects · 1-week

Trace-Level Agent Safety Monitor

A tool that records multi-step agent execution traces and runs heuristic + LLM-based checks to flag dangerous action sequences before they complete.

Difficulty: 1-week | Stack: Python, LangChain, SQLite, Claude API (claude-sonnet-4-5), Streamlit

Who this is for

Teams running autonomous agents on real user tasks who need visibility into whether an in-progress multi-step execution is heading toward a harmful outcome — catching it at step 3 rather than after step 10.

Build steps

  1. Define a trace schema: each agent action (tool call, URL visited, form field written, API called) is appended as a JSON event to a SQLite log with timestamp, action type, and payload hash.
  2. Implement a sliding-window analyzer that runs after every N actions: compute a ‘suspicion score’ using heuristics (e.g., visited domain age < 30 days, form contains PII field names, action count exceeds task-type baseline).
  3. When suspicion score crosses a threshold, send the last K trace events to Claude with a structured prompt asking it to classify the sequence as Safe / Suspicious / Halt and provide a one-sentence rationale — cache the prompt prefix to reduce cost.
  4. Expose a simple webhook that the agent framework calls after each action; the monitor responds with {continue: true/false, reason: string} so the agent can self-pause awaiting human review.
  5. Build a Streamlit dashboard showing live traces, per-run suspicion score timeline, and a review queue where a human can approve or terminate flagged runs.
  6. Test against a synthetic task library: 20 benign runs (booking a flight, filling a form) and 10 adversarial runs (scam site scenarios), and report precision/recall of the halt decisions.

Risks

  • LLM classification latency (1-3s per check) creates a bottleneck if the agent executes actions faster than the monitor can evaluate them, requiring async queuing that complicates the halt-signal flow.
  • Heuristic suspicion scores are brittle — legitimate financial sites also have PII fields and young domains, leading to high false-positive halt rates that erode developer trust in the tool.
  • Storing action payloads in SQLite for analysis may itself create a PII retention problem if the agent handles real user data, requiring an additional scrubbing step before persistence.

Business Angle

Execution trace monitor that catches dangerous agent actions mid-run, before damage is done

Customer: Solo developer or 2-person founding team shipping a B2B SaaS product where the core feature IS an autonomous agent (e.g., an AI SDR that books meetings, an AI ops agent that modifies cloud infra, an AI finance assistant that moves money) — they've had at least one 'oh shit' moment where the agent did something unexpected in production

Pricing: saas-mrr — $800 MRR in 3 months (8 paying teams at $99/mo)

Full business breakdown →