AI Pulse

Unlearning Provenance Probe

A CLI tool that stress-tests whether an unlearning method actually erased knowledge acquired during pretraining, or only removed facts injected later via SFT.

Difficulty: weekend | Stack: Python, HuggingFace Transformers, datasets, rich (CLI)

Who this is for

ML engineers and safety researchers who need to audit unlearning runs before claiming a model has ‘forgotten’ something — especially teams responding to GDPR deletion requests or capability removal requirements.

Build steps

  1. Curate two small fact sets: ~200 Wikidata triples likely to appear in pretraining corpora (high-popularity entities) and ~200 facts injected via a quick LoRA fine-tune on a base model like Llama-3-8B.
  2. Implement a probing harness: for each fact, prompt the model in 3 ways (cloze, QA, paraphrase) and score recall with exact-match and token-probability checks.
  3. Run a baseline unlearning method (e.g., gradient ascent on target facts) and record per-fact recall before and after.
  4. Generate a provenance-split report that shows unlearning efficacy separately for pretraining-origin and SFT-origin facts, plus a heatmap of which fact categories survive.
  5. Package as a CLI command: probe-unlearn --model ./checkpoint --facts facts.jsonl --report html.
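
Steps 2 through 5 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the fact record fields (`subject`, `relation`, `object`, `origin`), the prompt templates, and the `generate` callable are all hypothetical. In a real run, `generate` would wrap a HuggingFace model's generation call; here it is a plain function so the scoring logic can be read on its own. Scoring uses a normalized substring check as a lenient stand-in for exact match, and the token-probability check is left out.

```python
import argparse
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical record layout, one JSON object per line of facts.jsonl:
# {"subject": ..., "relation": ..., "object": ..., "origin": "pretraining" | "sft"}
Fact = Dict[str, str]
Generate = Callable[[str], str]  # prompt -> model completion

# Three illustrative prompt styles; the exact wording is an assumption.
TEMPLATES = {
    "cloze": "{subject} {relation}",
    "qa": "Question: {subject} {relation} what?\nAnswer:",
    "paraphrase": "Finish the statement: {subject} {relation}",
}


def build_prompts(fact: Fact) -> Dict[str, str]:
    """Render every prompt style for one fact."""
    return {name: tmpl.format(**fact) for name, tmpl in TEMPLATES.items()}


def contains_answer(completion: str, answer: str) -> bool:
    """Lenient match: the normalized answer appears in the completion."""
    return answer.strip().lower() in completion.strip().lower()


def recall(fact: Fact, generate: Generate) -> float:
    """Fraction of prompt styles for which the model recalls the fact."""
    hits = [contains_answer(generate(p), fact["object"])
            for p in build_prompts(fact).values()]
    return sum(hits) / len(hits)


def provenance_report(facts: List[Fact], before: Generate,
                      after: Generate) -> Dict[str, Dict[str, float]]:
    """Mean recall before and after unlearning, split by fact origin."""
    by_origin = defaultdict(list)
    for fact in facts:
        by_origin[fact["origin"]].append((recall(fact, before), recall(fact, after)))
    report = {}
    for origin, pairs in by_origin.items():
        mean_before = sum(b for b, _ in pairs) / len(pairs)
        mean_after = sum(a for _, a in pairs) / len(pairs)
        report[origin] = {
            "recall_before": mean_before,
            "recall_after": mean_after,
            "drop": mean_before - mean_after,
        }
    return report


def build_parser() -> argparse.ArgumentParser:
    """Flags mirror the command shown in step 5; the defaults are assumptions."""
    parser = argparse.ArgumentParser(prog="probe-unlearn")
    parser.add_argument("--model", required=True, help="path to the checkpoint")
    parser.add_argument("--facts", required=True, help="path to facts.jsonl")
    parser.add_argument("--report", default="html", help="report format")
    return parser
```

A large `drop` for the `sft` group next to a small `drop` for the `pretraining` group would suggest the method mostly removed the fine-tuned facts. Keeping `before` and `after` as separate callables lets the same fact list be scored against two checkpoints without reloading the harness.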

Risks

  • Distinguishing ‘pretraining origin’ requires heuristics (popularity proxies, training-data membership signals) that may be noisy for mid-popularity facts.
  • Gradient-ascent unlearning can cause catastrophic forgetting of unrelated knowledge, which confounds the results: a drop in target-fact recall may reflect general degradation rather than targeted erasure. Run a retention benchmark alongside the target facts.
  • Small hobby compute budgets may limit you to <3B parameter models, and unlearning dynamics differ noticeably at scale.
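
One way to handle the confound in the second risk is to report the recall drop on target facts next to the drop on a held-out retention set, and to flag runs where retention fell too far. The sketch below assumes recall scores in the range 0 to 1; the `check_run` name and the 0.05 threshold are hypothetical choices for illustration, not values from any benchmark.

```python
from dataclasses import dataclass


@dataclass
class UnlearningCheck:
    target_drop: float      # recall lost on facts meant to be forgotten
    retention_drop: float   # recall lost on facts meant to be kept
    confounded: bool        # True if retention loss exceeds the threshold


def check_run(target_before: float, target_after: float,
              retain_before: float, retain_after: float,
              max_retention_drop: float = 0.05) -> UnlearningCheck:
    """Compare forgetting on target facts with collateral forgetting.

    The default threshold is an arbitrary illustrative value.
    """
    target_drop = target_before - target_after
    retention_drop = retain_before - retain_after
    return UnlearningCheck(
        target_drop=target_drop,
        retention_drop=retention_drop,
        confounded=retention_drop > max_retention_drop,
    )
```

A run flagged as `confounded` should not be read as evidence of targeted erasure, since the model may simply have degraded across the board.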

Business Angle

Sell an auditable unlearning verification report to ML teams who need compliance evidence before shipping a 'forgotten' model.

Customer: A solo ML engineer at a 10-50 person AI startup who owns the model lifecycle and gets tagged in every GDPR deletion ticket — technically strong, no dedicated safety team, needs a paper trail fast.

Pricing: one-time purchase at $99–$149 per report. Revenue target: about $800 in month 3, from 6-8 report purchases.
