Verifiability Scorer for Personal Task Lists
A CLI tool that analyzes your to-do list and scores each task by how automatable it is using Verifier’s Law heuristics.
Difficulty: weekend | Stack: Python, Typer, Rich, Claude API (claude-sonnet-4-5), JSON
Who this is for
Knowledge workers and indie hackers who want to prioritize which tasks to automate first. The tool surfaces the low-hanging fruit by quantifying the verifiability and description-execution gap of each item.
Build steps
- Define a scoring rubric: map Verifier’s Law properties (objective truth, fast verification, low noise, continuous reward) and description-execution gap onto a structured JSON schema with 0–5 scales per dimension.
- Build a Typer CLI that accepts a plain-text or markdown task list as input and sends each task to the Claude API, with the rubric supplied as a structured output schema.
- Aggregate scores into a composite ‘automation-readiness’ index and render a Rich table sorted by score, with a one-sentence explanation per task.
- Add a --export flag that writes results to a CSV so users can track changes over time as they refine or complete tasks.
- Write a small test suite using a fixed 10-task fixture list to lock in score stability across prompt tweaks.
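The steps above could be sketched roughly as follows, using only the standard library. Everything named here is hypothetical and not part of any existing tool: the TaskScore fields, parse_tasks, rank, to_csv, and in particular the equal-weight readiness formula. The sketch also assumes every 0–5 scale is oriented so that higher means more favorable to automation. The Claude call, the Typer command, and the Rich table are deliberately left out; this covers only the data model around them.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass

# The four Verifier's Law properties named in the rubric step.
DIMENSIONS = (
    "objective_truth",
    "fast_verification",
    "low_noise",
    "continuous_reward",
)

# Matches a leading markdown bullet or number, plus an optional checkbox.
_BULLET = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+(?:\[[ xX]\]\s+)?")


def parse_tasks(text: str) -> list[str]:
    """Extract one task per non-empty line, dropping bullets and headings."""
    tasks = []
    for line in text.splitlines():
        stripped = _BULLET.sub("", line).strip()
        if stripped and not stripped.startswith("#"):
            tasks.append(stripped)
    return tasks


@dataclass(frozen=True)
class TaskScore:
    task: str
    objective_truth: int
    fast_verification: int
    low_noise: int
    continuous_reward: int
    description_execution_gap: int
    explanation: str

    def __post_init__(self) -> None:
        # Reject anything outside the 0-5 scale before it reaches the table.
        for name in DIMENSIONS + ("description_execution_gap",):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 5:
                raise ValueError(f"{name} must be an integer in 0..5, got {value!r}")

    @property
    def verifiability(self) -> float:
        """Mean of the four Verifier's Law dimensions (0-5)."""
        return sum(getattr(self, d) for d in DIMENSIONS) / len(DIMENSIONS)

    @property
    def readiness(self) -> float:
        """Assumed composite: equal-weight blend of both subscores, 0-100."""
        return round((self.verifiability + self.description_execution_gap) * 10, 1)


def rank(scores: list[TaskScore]) -> list[TaskScore]:
    """Sort tasks from most to least automation-ready."""
    return sorted(scores, key=lambda s: s.readiness, reverse=True)


def to_csv(scores: list[TaskScore]) -> str:
    """Serialize ranked scores, keeping both subscores next to the composite."""
    fields = ["task", *DIMENSIONS, "description_execution_gap",
              "verifiability", "readiness", "explanation"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for score in rank(scores):
        row = asdict(score)
        row["verifiability"] = score.verifiability
        row["readiness"] = score.readiness
        writer.writerow(row)
    return buffer.getvalue()
```

In a real build, the model's structured output would be validated into TaskScore instances, rank would feed the Rich table, and to_csv would back the --export flag. The weights in readiness are a placeholder to be tuned against the fixture list.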
Risks
- Claude’s scoring of subjective tasks may be inconsistent across runs. Mitigate by setting temperature to 0 and pinning the model version, which reduces but does not eliminate run-to-run variation.
- Users with very large task lists (200+ items) will hit latency and cost walls. Add batching with a progress display early in the build.
- The rubric conflates two distinct frameworks (Verifier’s Law + description-execution gap); keeping them as separate subscores rather than blending them too early avoids a misleading single number.
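Two of these mitigations are small enough to sketch in plain Python. The helper names batches and max_drift are illustrative assumptions, as is the shape of a run (a dict mapping each task to its per-dimension scores). The first helper splits a large list into fixed-size chunks so progress can be reported per chunk. The second measures how far any single subscore moved between two scoring runs, which a fixture-based test could compare against a tolerance.

```python
from typing import Iterator, Sequence


def batches(tasks: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` tasks."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(tasks), size):
        yield list(tasks[start:start + size])


def max_drift(
    run_a: dict[str, dict[str, int]],
    run_b: dict[str, dict[str, int]],
) -> int:
    """Largest absolute per-dimension difference over tasks present in both runs."""
    drift = 0
    for task in run_a.keys() & run_b.keys():
        for dimension, value in run_a[task].items():
            if dimension in run_b[task]:
                drift = max(drift, abs(value - run_b[task][dimension]))
    return drift


if __name__ == "__main__":
    tasks = [f"task {i}" for i in range(5)]
    chunks = list(batches(tasks, 2))
    for index, chunk in enumerate(chunks, start=1):
        # A real CLI would call the model here; this only reports progress.
        print(f"batch {index}/{len(chunks)}: {len(chunk)} tasks")
```

Comparing subscores one dimension at a time, rather than comparing a blended number, keeps the two frameworks separate when checking stability.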