Unlearning Provenance Probe
A CLI tool that stress-tests whether an unlearning method actually erased pretraining knowledge or only facts injected during supervised fine-tuning (SFT).
Difficulty: weekend | Stack: Python, HuggingFace Transformers, datasets, rich (CLI)
Who this is for
ML engineers and safety researchers who need to audit unlearning runs before claiming a model has ‘forgotten’ something — especially teams responding to GDPR deletion requests or capability removal requirements.
Build steps
- Curate two small fact sets: ~200 Wikidata triples likely to appear in pretraining corpora (high-popularity entities) and ~200 facts injected via a quick LoRA fine-tune on a base model like Llama-3-8B.
- Implement a probing harness: for each fact, prompt the model in 3 ways (cloze, QA, paraphrase) and score recall with exact-match and token-probability checks.
- Run a baseline unlearning method (e.g., gradient ascent on target facts) and record per-fact recall before and after.
- Generate a provenance-split report that shows unlearning efficacy separately for pretraining-origin and SFT-origin facts, plus a heatmap of which fact categories survive.
- Package as a CLI command:
probe-unlearn --model ./checkpoint --facts facts.jsonl --report html
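The probing and provenance-split steps above can be sketched in plain Python. This is a minimal illustration, not the tool's actual implementation: the fact schema (subject, relation, object, origin), the three prompt templates, and the function names are all assumptions, and the model is abstracted as any callable that maps a prompt to a completion so the logic can be tried without loading weights. Scoring here is a lenient exact-match (the normalized answer must appear in the completion); token-probability checks are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical templates for the three probe styles (cloze, QA, paraphrase).
TEMPLATES = {
    "cloze": "The {relation} of {subject} is",
    "qa": "Question: What is the {relation} of {subject}?\nAnswer:",
    "paraphrase": "{subject}'s {relation} is",
}

Fact = Dict[str, str]
Generate = Callable[[str], str]  # stand-in for a real model's generate call


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.lower().split())


def probe_fact(fact: Fact, generate: Generate) -> float:
    """Return the fraction of prompt styles whose completion contains the answer."""
    gold = normalize(fact["object"])
    hits = 0
    for template in TEMPLATES.values():
        prompt = template.format(subject=fact["subject"], relation=fact["relation"])
        if gold in normalize(generate(prompt)):
            hits += 1
    return hits / len(TEMPLATES)


def recall_by_origin(facts: List[Fact], generate: Generate) -> Dict[str, float]:
    """Average per-fact recall, grouped by the fact's assumed 'origin' label."""
    scores = defaultdict(list)
    for fact in facts:
        scores[fact["origin"]].append(probe_fact(fact, generate))
    return {origin: sum(vals) / len(vals) for origin, vals in scores.items()}


def provenance_split(facts: List[Fact], before: Generate, after: Generate):
    """Compare recall before and after unlearning, separately per origin."""
    pre = recall_by_origin(facts, before)
    post = recall_by_origin(facts, after)
    return {
        origin: {"before": pre[origin], "after": post[origin],
                 "drop": pre[origin] - post[origin]}
        for origin in pre
    }


# Toy data and stub models, invented purely for illustration.
facts = [
    {"subject": "France", "relation": "capital", "object": "Paris",
     "origin": "pretraining"},
    {"subject": "Zorblax Corp", "relation": "headquarters",
     "object": "Quimby Falls", "origin": "sft"},
]


def model_before(prompt: str) -> str:
    return "Paris" if "France" in prompt else "Quimby Falls"


def model_after(prompt: str) -> str:
    # Simulates an unlearning run that removed only the SFT-injected fact.
    return "Paris" if "France" in prompt else "I don't know"


report = provenance_split(facts, model_before, model_after)
print(report)
```

In a real run, the two stub functions would be replaced by wrappers around the checkpoint before and after unlearning, and the resulting dictionary would feed the HTML report.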
Risks
- Distinguishing ‘pretraining origin’ requires heuristics (popularity proxies, training-data membership signals) that may be noisy for mid-popularity facts.
- Gradient-ascent unlearning can cause catastrophic forgetting of unrelated knowledge, which confounds the results, so run a retention benchmark alongside the target facts.
- Small hobby compute budgets may limit you to models under 3B parameters, and unlearning dynamics can differ noticeably at larger scales.
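The confounding risk above can be made concrete with a small retention check. The sketch below assumes per-fact recall scores (0.0 to 1.0) are already available for both the target facts and an unrelated retention set; the function names and the 0.05 threshold are hypothetical choices for illustration, not recommended values.

```python
from statistics import mean
from typing import Dict

Scores = Dict[str, float]  # fact id -> recall in [0, 1]


def mean_drop(before: Scores, after: Scores) -> float:
    """Average recall lost across facts present in both runs.

    Raises statistics.StatisticsError if the two runs share no fact ids.
    """
    shared = before.keys() & after.keys()
    return mean(before[k] - after[k] for k in shared)


def assess_run(target_before: Scores, target_after: Scores,
               retain_before: Scores, retain_after: Scores,
               max_retain_drop: float = 0.05) -> dict:
    """Flag an unlearning run whose retention set degraded too much.

    max_retain_drop is an assumed tolerance; pick it from your own baseline noise.
    """
    target_drop = mean_drop(target_before, target_after)
    retain_drop = mean_drop(retain_before, retain_after)
    return {
        "target_drop": target_drop,
        "retain_drop": retain_drop,
        "confounded": retain_drop > max_retain_drop,
    }


# Toy scores, invented for illustration: targets are fully forgotten,
# but one unrelated fact also lost half its recall.
result = assess_run(
    target_before={"t1": 1.0, "t2": 1.0},
    target_after={"t1": 0.0, "t2": 0.0},
    retain_before={"r1": 1.0, "r2": 1.0},
    retain_after={"r1": 1.0, "r2": 0.5},
)
print(result)
```

A run flagged as confounded suggests the recall drop on target facts may reflect general degradation rather than targeted forgetting, so its efficacy numbers should be read with caution.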
Business angle
Sell an auditable unlearning verification report to ML teams who need compliance evidence before shipping a 'forgotten' model.
Customer: A solo ML engineer at a 10-50 person AI startup who owns the model lifecycle and gets tagged in every GDPR deletion ticket — technically strong, no dedicated safety team, needs a paper trail fast.
Pricing: one-time purchase. Target of about $800 in month 3, from a mix of 6-8 report purchases at $99–$149 each.