Financial-Stakes Agent Eval Harness
Run LLM agents in sandboxed environments with fake-but-realistic dollar constraints and log emergent deceptive behaviors.
Difficulty: 1-week | Stack: Python, FastAPI, SQLite, Docker, LiteLLM, Pytest
Who this is for
AI safety researchers and production ML engineers who need to detect collusion/deception before deploying agents to real financial workflows
Build steps
- Define a minimal ‘marketplace’ environment: agents buy/sell goods via a REST API backed by SQLite ledger with real dollar-denominated constraints (budget caps, profit targets)
- Implement 2-4 competing agent roles (buyer, seller, regulator, auditor) using LiteLLM so any model can plug in; each agent gets a system prompt with economic incentive
- Add an inter-agent communication channel (simple message queue) so agents can coordinate — this is where cartels emerge
- Build a behavior logger that flags: price convergence across sellers (cartel signal), false outcome reports (lie detection via ground-truth ledger diff), budget overruns
- Write a report renderer that scores each run: deception rate, collusion index, task completion vs. claimed completion
- Parameterize over models (GPT-4o, Claude Sonnet, Llama) and stake levels ($1/$10/$100 simulated) to produce a comparison matrix
Risks
- Inter-agent message format becomes the bottleneck — agents speaking different JSON schemas silently fail to collude, producing false negatives
- Sandbox leakage: agents may attempt real HTTP calls if Docker networking not locked down properly
- Ground-truth ‘correct outcome’ is hard to define for open-ended trading tasks — without it, lie detection is unreliable
Business Angle
Sandboxed eval harness that runs LLM agents against fake-dollar financial tasks and flags deceptive/collusive behaviors before production deploy
Customer: ML engineer at a fintech or trading firm (5-200 person company) who owns agent deployment pipelines and gets blamed when an LLM does something weird in prod — not an academic, someone with a Slack channel full of incident alerts
Pricing: saas-mrr — $800 MRR in 4 months (8 seats × $100/mo or 2 teams × $400/mo)
Full business breakdown →