AI Pulse

Watermark Robustness Sandbox

An interactive web tool that lets you embed a token-level watermark into LLM output, then attack it with paraphrasing and synonym substitution to measure survival rate.

Difficulty: 1-week | Stack: Python, FastAPI, HuggingFace Transformers, Next.js, Tailwind CSS

Who this is for

Developers building attribution pipelines for synthetic content (news, legal drafts, code) who want empirical data on how robust a chosen watermarking scheme is before committing to it in production.

Build steps

  1. Implement two watermarking schemes: the classic green-list scheme of Kirchenbauer et al. and a simplified seed-pooling variant (inspired by WaterSearch) that spreads the signal across multiple token windows.
  2. Build a FastAPI backend with three endpoints: /generate (returns watermarked text), /detect (returns a p-value and a per-scheme confidence score), and /attack (paraphrases the text with a small model, applies synonym swaps, and returns post-attack detectability).
  3. Create a Next.js UI with a split-pane layout: the left pane shows the watermarked text with ‘green-list’ tokens highlighted; the right pane shows the attack output and the change in detectability score.
  4. Add a comparison table that benchmarks both schemes on text quality (perplexity delta), detection AUC, and survival rate under three attack intensities.
  5. Generate a one-page methodology note as a PDF from the run results, suitable for sharing with a compliance team.
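
As a starting point for step 1 and the /detect endpoint, here is a minimal sketch of green-list detection using only the Python standard library. It is not a reference implementation: the toy vocabulary, SECRET_KEY, and toy_generate are hypothetical stand-ins for a real tokenizer and LLM, and the generator uses the "hard" variant (only green tokens are ever sampled) rather than the softer logit bias a production scheme would likely use.

```python
import hashlib
import math
import random

GAMMA = 0.5                # fraction of the vocabulary marked green at each step
VOCAB_SIZE = 1000          # toy vocabulary (assumption); real tokenizers are far larger
SECRET_KEY = b"demo-key"   # hypothetical key shared by generator and detector


def green_list(prev_token: int) -> set[int]:
    """Pseudo-randomly choose the green token ids, seeded by the previous token."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def detect(tokens: list[int]) -> tuple[float, float]:
    """Return (z, p_value) under the null hypothesis 'text is not watermarked'."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored < 1:
        raise ValueError("need at least two tokens to score")
    green = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    z = (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided normal tail
    return z, p_value


def toy_generate(length: int, rng: random.Random, watermark: bool) -> list[int]:
    """Stand-in for an LLM: uniform sampling, restricted to green ids if watermarking."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < length:
        if watermark:
            tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
        else:
            tokens.append(rng.randrange(VOCAB_SIZE))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    for label, flag in (("watermarked", True), ("plain", False)):
        z, p = detect(toy_generate(200, rng, watermark=flag))
        print(f"{label}: z={z:.2f}, p={p:.3g}")
```

The detector needs only the key and the token ids, not the model, which is what makes a standalone /detect endpoint feasible. The /attack endpoint can reuse detect on the attacked text and report the difference in z or p-value.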

Risks

  • Seed-pooling schemes require careful parameter tuning: a naive implementation may produce barely detectable watermarks that look good in unit tests but fail on real, diverse text.
  • Paraphrase attacks that use a separate model introduce a confound: measured survival rates depend on the quality of the attack model as well as on the strength of the watermark.
  • Perplexity as a quality metric can be gamed; you may need human eval or a reference-free metric to make quality claims credible.
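
To make the first risk concrete, the sketch below estimates how many scored tokens a one-proportion z-test needs before a weak watermark crosses a detection threshold. It assumes the detector uses the statistic z = (g - γ)·sqrt(T) / sqrt(γ(1 - γ)), where g is the observed green-token rate, γ the green-list fraction, and T the token count; the function name, the threshold of 4.0, and the example rates are illustrative assumptions, not measured results.

```python
import math


def min_tokens_for_detection(
    green_rate: float, gamma: float = 0.5, z_threshold: float = 4.0
) -> int:
    """Smallest token count T at which the expected z-score reaches z_threshold.

    Solves z = (green_rate - gamma) * sqrt(T) / sqrt(gamma * (1 - gamma)) for T.
    """
    if not gamma < green_rate <= 1.0:
        raise ValueError("green_rate must exceed gamma for the watermark to be detectable")
    return math.ceil(z_threshold**2 * gamma * (1 - gamma) / (green_rate - gamma) ** 2)


if __name__ == "__main__":
    # Illustrative green rates: a strong watermark versus two weak ones.
    for rate in (0.9, 0.6, 0.55):
        print(rate, min_tokens_for_detection(rate))
```

Because the required length grows with the inverse square of (g - γ), a scheme that lifts the green rate only from 0.5 to 0.55 needs roughly 1,600 tokens where one reaching 0.6 needs roughly 400. Short unit-test snippets can therefore hide a watermark that is too weak for real text, or overstate one that only works on long samples.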

Business Angle

A self-serve sandbox to benchmark LLM watermark robustness before you ship attribution infrastructure.

Customer: A solo ML engineer or technical founder at a 1–10 person startup building synthetic-content pipelines (for example AI legal-brief generators, AI news wires, or AI code-review tools) who needs to pick a watermarking scheme and justify that choice to a client or investor before going to production.

Pricing: freemium, targeting $600 MRR within 4 months (12 paying teams at $50/mo)
