Watermark Robustness Sandbox
An interactive web tool that lets you embed a token-level watermark into LLM output, then attack it with paraphrasing and synonym substitution to measure survival rate.
Difficulty: 1-week | Stack: Python, FastAPI, HuggingFace Transformers, Next.js, Tailwind CSS
Who this is for
Developers building attribution pipelines for synthetic content (news, legal drafts, code) who want empirical data on how robust a chosen watermarking scheme is before committing to it in production.
Build steps
- Implement two watermarking schemes: the classic Kirchenbauer green-list scheme and a simplified seed-pooling variant (inspired by WaterSearch) that spreads the signal across multiple token windows.
- Build a FastAPI backend with three endpoints: /generate (watermarked text), /detect (returns p-value and scheme confidence), and /attack (runs paraphrase via a small model + synonym swap and returns post-attack detectability).
- Create a Next.js UI with a split-pane: left shows watermarked text with highlighted ‘green-list’ tokens; right shows attack output with detectability score delta.
- Add a comparison table that benchmarks both schemes on: text quality (perplexity delta), detection AUC, and survival rate under three attack intensities.
- Write a one-page methodology note, auto-generated as a PDF from the run results, suitable for sharing with a compliance team.
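The detection and attack steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the Kirchenbauer reference implementation: it assigns tokens to the green list with a per-pair hash instead of a seeded vocabulary permutation, uses a toy integer vocabulary in place of a real tokenizer, and simulates synonym substitution with random token replacement. The names `is_green`, `detect`, `toy_generate`, `substitute`, and `survival_rate`, the key `"demo-key"`, and the values of `GAMMA` and `VOCAB_SIZE` are all assumptions made for the example.

```python
import hashlib
import math
import random

GAMMA = 0.25       # assumed fraction of the vocabulary on the green list
VOCAB_SIZE = 1000  # toy vocabulary size, for illustration only


def is_green(prev_token: int, token: int, key: str = "demo-key") -> bool:
    """Pseudo-randomly place `token` on the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes to [0, 1) and compare against the green fraction.
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def detect(tokens: list[int], key: str = "demo-key") -> dict:
    """One-sided z-test on the green-token count; returns the z-score and p-value."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    z = (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return {"green": green, "scored": scored, "z": z, "p_value": p_value}


def toy_generate(n: int, rng: random.Random, watermark: bool, key: str = "demo-key") -> list[int]:
    """Stand-in for an LLM: uniform random tokens, restricted to green ones if watermarked."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < n:
        candidate = rng.randrange(VOCAB_SIZE)
        if not watermark or is_green(tokens[-1], candidate, key):
            tokens.append(candidate)
    return tokens


def substitute(tokens: list[int], rate: float, rng: random.Random) -> list[int]:
    """Crude stand-in for synonym swapping: replace each token with probability `rate`."""
    return [rng.randrange(VOCAB_SIZE) if rng.random() < rate else t for t in tokens]


def survival_rate(texts: list[list[int]], rate: float, rng: random.Random,
                  z_threshold: float = 4.0) -> float:
    """Fraction of attacked texts whose z-score still exceeds the detection threshold."""
    hits = sum(detect(substitute(t, rate, rng))["z"] > z_threshold for t in texts)
    return hits / len(texts)
```

Calling `survival_rate` with three different `rate` values gives one way to fill the "three attack intensities" columns of the comparison table. In the real tool, `toy_generate` would be replaced by a logit-biasing sampler on top of a HuggingFace model, and `substitute` by the paraphrase and synonym-swap attacks.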
Risks
- Seed-pooling schemes require careful parameter tuning: a naive implementation may produce barely detectable watermarks that look good in unit tests but fail on real, diverse text.
- Paraphrase attacks that use a separate model introduce a confound: the measured survival rate depends on the quality of the attack model, not just on the strength of the watermark.
- Perplexity as a quality metric can be gamed; you may need human eval or a reference-free metric to make quality claims credible.
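The first risk above can be made concrete with a back-of-the-envelope calculation. Assuming detection uses a one-proportion z-test on the green-token count (as in green-list schemes), the expected z-score grows with the square root of the text length, so a weak signal needs long texts before it clears the threshold. The sketch below is illustrative only: `p_green` (the green fraction the watermark actually achieves), the default `gamma` of 0.25, and the threshold of 4.0 are hypothetical values, not measurements.

```python
import math


def expected_z(p_green: float, n_tokens: int, gamma: float = 0.25) -> float:
    """Approximate z-score when a fraction `p_green` of `n_tokens` scored tokens is green."""
    return (p_green - gamma) * math.sqrt(n_tokens) / math.sqrt(gamma * (1 - gamma))


def min_tokens(p_green: float, z_threshold: float = 4.0, gamma: float = 0.25) -> int:
    """Smallest token count at which the expected z-score reaches `z_threshold`."""
    if p_green <= gamma:
        raise ValueError("p_green must exceed gamma for the watermark to be detectable")
    root = z_threshold * math.sqrt(gamma * (1 - gamma)) / (p_green - gamma)
    return math.ceil(root ** 2)
```

Under these assumptions, a watermark that only lifts the green fraction from 0.25 to 0.30 needs on the order of a thousand scored tokens to reach the threshold, while one that reaches 0.50 needs only a few dozen. A unit test on a long synthetic sample can therefore pass while short real-world texts go undetected.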
Business Angle
A self-serve sandbox to benchmark LLM watermark robustness before you ship attribution infrastructure.
Customer: A solo ML engineer or technical founder at a 1–10 person startup building synthetic-content pipelines — think AI legal-brief generators, AI news wires, or AI code-review tools — who needs to pick a watermarking scheme and justify that choice to a client or investor before going to production.
Pricing: freemium, targeting $600 MRR in 4 months (12 paying teams at $50/mo)