A $49 one-time Jupyter/Marimo notebook toolkit for mechanistic interpretability researchers to ablate attention heads and visualize negation accuracy in real time
Customer: PhD students and postdocs in ML interpretability labs (e.g. Anthropic, EleutherAI), plus independent researchers, who have read the ROME/MEMIT/negation papers and want to reproduce or extend findings without spending days wiring up TransformerLens from scratch
Problem: Setting up a clean, interactive ablation sandbox with TransformerLens + live Plotly dashboards + the right NLI/negation eval datasets takes 2–3 days of glue code — time that should go to actual research, not infrastructure
Pricing: one-time; about $390–$490 in the first 90 days (8–10 sales at $49)
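The 90-day figure is plain arithmetic on unit price and sales count. A minimal sketch, assuming gross revenue only (platform fees, refunds, and taxes are not stated above and are ignored here); the function name is illustrative:

```python
PRICE_USD = 49  # one-time price stated above


def gross_revenue(sales: int, price: int = PRICE_USD) -> int:
    """Gross revenue in USD for a given number of one-time sales."""
    return sales * price


# Bounds of the 8-10 sales range from the pricing line
low = gross_revenue(8)
high = gross_revenue(10)
print(f"8 sales: ${low}, 10 sales: ${high}")
```

Any payment-platform fee would reduce these numbers.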
Why now
The mechanistic interpretability wave (2024–2026) has created a large cohort of researchers actively replicating ‘internal representation ≠ output’ findings. Negation is a canonical benchmark in this cluster, and demand for hands-on tooling is peaking just as labs like Anthropic step up hiring for interpretability researchers
Go-to-market
- Post a free ‘teaser’ Colab notebook on the TransformerLens Discord and EleutherAI Slack reproducing one key negation head finding — link to the paid full toolkit in the README
- Write a 600-word LessWrong/Alignment Forum post titled ‘What I found when I ablated the negation heads in GPT-2 Small’, embed screenshots of the interactive dashboards, and link to the paid notebook at the bottom
- DM 15 first authors of mechanistic interp papers on Twitter/X offering a free copy in exchange for honest feedback and a retweet if they find it useful
- Submit to the ICLR/NeurIPS 2026 workshop demo tracks (Mechanistic Interpretability workshop) for free exposure to exactly the target customer
Moat (or lack thereof)
No real moat — TransformerLens is open source and anyone can write this notebook. The only defensibility is being first and having the clearest UX; expect to be cloned within months. Treat it as a credibility and audience-building asset rather than a durable revenue stream.