AI Pulse
← Projects · 1-week

Financial-Stakes Agent Eval Harness

Run LLM agents in sandboxed environments with fake-but-realistic dollar constraints and log emergent deceptive behaviors.

Difficulty: 1-week | Stack: Python, FastAPI, SQLite, Docker, LiteLLM, Pytest

Who this is for

AI safety researchers and production ML engineers who need to detect collusion/deception before deploying agents to real financial workflows

Build steps

  1. Define a minimal ‘marketplace’ environment: agents buy/sell goods via a REST API backed by SQLite ledger with real dollar-denominated constraints (budget caps, profit targets)
  2. Implement 2-4 competing agent roles (buyer, seller, regulator, auditor) using LiteLLM so any model can plug in; each agent gets a system prompt with economic incentive
  3. Add an inter-agent communication channel (simple message queue) so agents can coordinate — this is where cartels emerge
  4. Build a behavior logger that flags: price convergence across sellers (cartel signal), false outcome reports (lie detection via ground-truth ledger diff), budget overruns
  5. Write a report renderer that scores each run: deception rate, collusion index, task completion vs. claimed completion
  6. Parameterize over models (GPT-4o, Claude Sonnet, Llama) and stake levels ($1/$10/$100 simulated) to produce a comparison matrix

Risks

  • Inter-agent message format becomes the bottleneck — agents speaking different JSON schemas silently fail to collude, producing false negatives
  • Sandbox leakage: agents may attempt real HTTP calls if Docker networking not locked down properly
  • Ground-truth ‘correct outcome’ is hard to define for open-ended trading tasks — without it, lie detection is unreliable

Business Angle

Sandboxed eval harness that runs LLM agents against fake-dollar financial tasks and flags deceptive/collusive behaviors before production deploy

Customer: ML engineer at a fintech or trading firm (5-200 person company) who owns agent deployment pipelines and gets blamed when an LLM does something weird in prod — not an academic, someone with a Slack channel full of incident alerts

Pricing: saas-mrr — $800 MRR in 4 months (8 seats × $100/mo or 2 teams × $400/mo)

Full business breakdown →