Verifiability Scorer for Personal Task Lists

A CLI tool that analyzes your to-do list and scores each task by how automatable it is, using heuristics drawn from Verifier’s Law.

Difficulty: weekend | Stack: Python, Typer, Rich, Claude API (claude-sonnet-4-5), JSON

Who this is for

Knowledge workers and indie hackers who want to decide which tasks to automate first. The tool surfaces the low-hanging fruit by quantifying the verifiability and description-execution gap of each item.

Build steps

Define a scoring rubric: map Verifier’s Law properties (objective truth, fast verification, low noise, continuous reward) and description-execution gap onto a structured JSON schema with 0–5 scales per dimension.
Build a Typer CLI that accepts a plain-text or markdown task list as input and streams each task through the Claude API, with the rubric supplied as a structured output schema.
Aggregate scores into a composite ‘automation-readiness’ index and render a Rich table sorted by score, with a one-sentence explanation per task.
Add a --export flag that writes results to a CSV so users can track changes over time as they refine or complete tasks.
Write a small test suite using a fixed 10-task fixture list to lock in score stability across prompt tweaks.
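
The rubric, composite index, and CSV export steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not a fixed design: the dimension names, the direction of the gap scale (0 = no gap, 5 = large gap), the equal weighting, and the helper names TaskScore, rank, and to_csv are all hypothetical. The Typer command, the Claude API call, and the Rich table are left out so the scoring logic can be tested offline.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical dimension names taken from the rubric step; each is scored 0-5.
VERIFIER_DIMS = ("objective_truth", "fast_verification", "low_noise", "continuous_reward")
MAX_SCORE = 5

_scale = {"type": "integer", "minimum": 0, "maximum": MAX_SCORE}

# Assumed JSON Schema for one scored task, usable as a structured-output schema.
RUBRIC_SCHEMA = {
    "type": "object",
    "properties": {
        **{dim: _scale for dim in VERIFIER_DIMS},
        "description_execution_gap": _scale,
        "explanation": {"type": "string"},
    },
    "required": [*VERIFIER_DIMS, "description_execution_gap", "explanation"],
    "additionalProperties": False,
}


@dataclass
class TaskScore:
    task: str
    scores: dict[str, int]          # one 0-5 value per entry in VERIFIER_DIMS
    description_execution_gap: int  # assumed direction: 0 = no gap, 5 = large gap
    explanation: str

    @property
    def verifiability(self) -> float:
        """Sum of the four Verifier's Law dimensions, normalized to 0..1."""
        total = sum(self.scores[dim] for dim in VERIFIER_DIMS)
        return total / (len(VERIFIER_DIMS) * MAX_SCORE)

    @property
    def gap_closeness(self) -> float:
        """Inverted gap, so that 1.0 means description and execution coincide."""
        return 1 - self.description_execution_gap / MAX_SCORE

    @property
    def readiness(self) -> float:
        """Composite index; equal weighting of the two subscores is an assumption."""
        return round((self.verifiability + self.gap_closeness) / 2, 3)


def rank(results: list[TaskScore]) -> list[TaskScore]:
    """Order tasks from most to least automation-ready."""
    return sorted(results, key=lambda r: r.readiness, reverse=True)


def to_csv(results: list[TaskScore]) -> str:
    """Serialize ranked results, keeping both subscores next to the composite."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["task", "verifiability", "gap_closeness", "readiness", "explanation"])
    for r in rank(results):
        writer.writerow([r.task, round(r.verifiability, 3), round(r.gap_closeness, 3),
                         r.readiness, r.explanation])
    return buffer.getvalue()
```

Keeping verifiability and gap_closeness as separate columns means the exported CSV still shows why a task ranked where it did, even if the composite weighting changes later. The same functions give the fixture-based test suite something deterministic to assert against.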

Risks

Claude’s scoring of subjective tasks may be inconsistent across runs. Mitigate by setting temperature to 0 and pinning the model version; this reduces run-to-run variance but does not guarantee identical output.
Very large task lists (200+ items) will run into latency and cost limits. Add batching with a progress display early.
The rubric conflates two distinct frameworks (Verifier’s Law + description-execution gap); keeping them as separate subscores rather than blending them too early avoids a misleading single number.
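
The first two mitigations above can be sketched as a small batching loop with fixed request settings. This is an illustrative sketch only: REQUEST_SETTINGS, batched, and score_all are hypothetical names, the batch size of 20 is arbitrary, and score_batch stands in for whatever function actually calls the Claude API. The model string is the one named in the stack line; a dated snapshot identifier would need to be substituted to pin the version.

```python
from typing import Callable, Iterator, Optional

# Assumed request settings: temperature 0 to reduce run-to-run variance.
# Replace the model value with a dated snapshot id to pin the version.
REQUEST_SETTINGS = {"model": "claude-sonnet-4-5", "temperature": 0, "max_tokens": 1024}


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_all(
    tasks: list[str],
    score_batch: Callable[[list[str]], list[dict]],
    batch_size: int = 20,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[dict]:
    """Score tasks batch by batch, reporting (done, total) after each batch.

    `score_batch` is a placeholder for the function that sends one batch to
    the model and returns one result dict per task.
    """
    results: list[dict] = []
    for batch in batched(tasks, batch_size):
        results.extend(score_batch(batch))
        if on_progress is not None:
            on_progress(len(results), len(tasks))
    return results
```

Passing score_batch in as a parameter lets the fixture tests use a stub instead of the network, and the on_progress callback can be wired to a Rich progress bar in the CLI.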