A self-serve sandbox to benchmark LLM watermark robustness before you ship attribution infrastructure.

Customer: A solo ML engineer or technical founder at a 1–10 person startup building synthetic-content pipelines — think AI legal-brief generators, AI news wires, or AI code-review tools — who needs to pick a watermarking scheme and justify that choice to a client or investor before going to production.

Problem: Watermarking research papers report survival rates under controlled lab conditions. A practitioner has no easy way to test their specific model + scheme + content-type combination against realistic attacks (paraphrase APIs, synonym swaps, sentence reordering) without writing 300 lines of eval scaffolding from scratch.
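
To make the scaffolding concrete, here is a minimal sketch of the kind of evaluation loop a practitioner would otherwise write by hand. Everything in it is an illustrative assumption rather than a real scheme or the sandbox's actual code: the hash-based `is_green` test is a toy stand-in for a KGW-style green list, `toy_watermarked_tokens` stands in for a watermarked model, and the synonym table is made up. It scores text with a one-proportion z-test on the green-token count, then re-scores it after a synonym-swap attack at increasing intensity.

```python
import hashlib
import math
import random
import re

GAMMA = 0.5  # assumed green-list fraction; real schemes expose this as a parameter


def is_green(prev_token, token, gamma=GAMMA):
    """Toy green-list test: hash the (previous, current) token pair into [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def detection_z_score(tokens, gamma=GAMMA):
    """One-proportion z-score on the green-token count (higher = stronger signal)."""
    n = len(tokens) - 1  # the first token has no predecessor to seed the hash
    if n <= 0:
        return 0.0
    green = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def toy_watermarked_tokens(vocab, length, rng):
    """Stand-in for a watermarked model: always sample the next token from the green list."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        green = [w for w in vocab if is_green(tokens[-1], w)]
        tokens.append(rng.choice(green or vocab))  # fall back if no word is green
    return tokens


def synonym_swap(tokens, synonyms, rate, rng):
    """Attack 1: replace each token that has a listed synonym with probability `rate`."""
    return [
        rng.choice(synonyms[t]) if t in synonyms and rng.random() < rate else t
        for t in tokens
    ]


def reorder_sentences(text, rng):
    """Attack 2: shuffle sentence order, splitting after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rng.shuffle(sentences)
    return " ".join(sentences)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = [f"w{i}" for i in range(50)]          # made-up vocabulary
    synonyms = {w: [w + "_syn"] for w in vocab}   # made-up synonym table
    original = toy_watermarked_tokens(vocab, 201, rng)
    for rate in (0.0, 0.25, 0.5, 1.0):
        attacked = synonym_swap(original, synonyms, rate, rng)
        print(f"swap rate {rate:.2f}: z = {detection_z_score(attacked):.2f}")
```

A real harness would swap in an actual model, tokenizer, and detector for the toy pieces, and would add an LLM paraphrase attack alongside the two shown. The shape of the loop (generate, attack, re-score) would likely stay the same.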

Pricing: freemium, with a $50/mo paid plan. Revenue target: $600 MRR within 4 months (12 paying teams at $50/mo).

Why now

The 2025–2026 wave of stealthier watermarking schemes (variants of KGW, the green-list scheme from Kirchenbauer et al., plus semantic watermarks) has created a fragmented landscape in which practitioners can’t compare schemes apples-to-apples. Simultaneously, the EU AI Act and emerging US synthetic-content disclosure rules are pushing teams to make defensible attribution decisions fast. The research is ahead of the tooling.

Go-to-market

Post a free hosted version on Hacker News ‘Show HN’ with a live demo using a GPT-2 baseline — target the AI safety / governance crowd who already read these papers and will recognize the problem immediately.
Write one very specific teardown post (‘We tested 4 watermarking schemes against GPT-4 paraphrasing — here’s what survived’) and publish on Substack + cross-post to r/MachineLearning and the Alignment Forum.
DM 20 founders in AI legal-tech and AI journalism (find them on LinkedIn via job title ‘AI content’ + company size <20) — offer a free 30-min call to run their specific model/content-type through the sandbox in exchange for a testimonial.
Gate the attack-intensity controls (adversarial paraphrasing via a real LLM, not just synonym swaps) behind the $50/mo plan. The free tier runs only lightweight NLTK synonym substitution, so the upsell is obvious and honest.
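
The free/paid split in the last item could be enforced with a check along these lines. This is a sketch under stated assumptions: the `Plan` enum, the attack identifiers, and `authorize_attack` are hypothetical names invented for illustration. Only the split itself (synonym substitution free, LLM paraphrasing paid) comes from the plan above.

```python
from enum import Enum


class Plan(Enum):
    FREE = "free"
    PAID = "paid"  # the $50/mo plan


# Hypothetical attack identifiers; only the free/paid split comes from the plan above.
ALLOWED_ATTACKS = {
    Plan.FREE: frozenset({"synonym_swap"}),
    Plan.PAID: frozenset({"synonym_swap", "llm_paraphrase"}),
}
KNOWN_ATTACKS = frozenset().union(*ALLOWED_ATTACKS.values())


def authorize_attack(plan, attack):
    """Return `attack` if `plan` includes it; raise otherwise."""
    if attack not in KNOWN_ATTACKS:
        raise ValueError(f"unknown attack: {attack!r}")
    if attack not in ALLOWED_ATTACKS[plan]:
        raise PermissionError(
            f"{attack!r} requires the paid plan (current plan: {plan.value})"
        )
    return attack
```

Keeping the allow-list in one table means the pricing page and the enforcement code can be generated from the same source, which helps keep the upsell honest.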

Moat (or lack thereof)

No real moat. The core eval logic is open research code, and a determined engineer can replicate it in a weekend. The defensible edge, if any, comes from being the first indexed resource when someone Googles ‘watermark robustness benchmark tool’ and from accumulated benchmark datasets across content types. That’s an SEO and network-effects play, not a technical moat. Be honest with yourself about this, and don’t over-invest before validating that practitioners will actually pay rather than just DIY.