AI Claim Veracity Auditor
A CLI tool that takes a forwarded AI success/failure story URL and returns a structured evidentiary scorecard — sourcing quality, baseline presence, metric specificity — to fight Slack-channel narrative drift.
Difficulty: weekend | Stack: Python, Click, Claude API (claude-sonnet-4-5), Jinja2, Rich
Who this is for
Enterprise decision-makers, CTOs, and strategy teams who receive forwarded AI ‘cautionary tales’ or hype pieces and need a fast, structured reality check before citing them in board decks.
Build steps
- Build a CLI with Click that accepts a URL and fetches the article text (httpx + BeautifulSoup for extraction, fallback to reader-mode heuristics).
- Design a structured prompt for Claude that extracts: primary claim, named organization, quantitative metrics present (yes/no), baseline comparison present (yes/no), primary source vs. second-hand attribution, and any conflicting evidence mentioned.
- Parse the model’s JSON output into a Rich-rendered terminal scorecard with red/yellow/green indicators per evidentiary dimension.
- Add a ‘—compare’ flag that accepts a second URL (e.g., a rebuttal or follow-up piece) and produces a side-by-side diff of the two scorecards.
- Cache results to a local SQLite DB so repeated queries on the same URL are instant and you can build a personal library of audited claims.
Risks
- Paywalled articles return near-empty text, making the audit useless for the exact high-signal sources (WSJ, FT) where enterprise narratives often originate.
- Claude can hallucinate ‘missing baselines’ or misclassify second-hand sourcing in nuanced long-form journalism, reducing trust in the scorecard itself.
- The tool is only as good as the prompt taxonomy — if the evidentiary dimensions don’t match what enterprise audiences actually care about, adoption will be low even if the tech works.