Plug-in personalization API that injects user context into LLM calls across all your products without rebuilding user modeling from scratch.

Customer: A solo platform engineer or early CTO at a 5–20 person AI-native startup running 2–4 LLM-powered products (e.g., a writing assistant, a support bot, and a search UI) who is tired of copy-pasting ad-hoc user context logic across codebases and has no dedicated ML team.

Problem: Every new LLM feature they ship requires them to hand-roll user context retrieval, decide what to include in the system prompt, and figure out caching — all over again, per product. There’s no shared layer, so personalization is either skipped entirely or inconsistent across their suite.

Pricing: saas-mrr. Roughly $800 MRR in 4 months (8 paying tenants at $99/mo = $792, each on a 50k API-call/month plan)

Why now

The CURP-style insight, that compact codebook embeddings can reuse behavioral patterns without per-user fine-tuning, is the architectural unlock that makes a lightweight sidecar credible. A year ago you'd have needed GPU infra; today sentence-transformers plus pgvector make this deployable as a cheap Docker service. At the same time, AI startups are now on their second or third LLM product and feeling the duplication pain acutely.
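To make the codebook idea concrete, here is a minimal sketch of one possible reading of it: each user is summarized as a vector, and the sidecar looks up the closest entries in a small shared codebook instead of training anything per user. The codebook labels, the three-dimensional vectors, and the function names are all hypothetical; in a real deployment the vectors would presumably come from a sentence-transformers model and the lookup would run in pgvector rather than in Python.

```python
import math

# Hypothetical codebook: a small, shared set of behavior-pattern vectors.
# Real vectors would be model embeddings with hundreds of dimensions.
CODEBOOK = {
    "terse_technical": [0.9, 0.1, 0.0],
    "needs_walkthrough": [0.1, 0.9, 0.1],
    "price_sensitive": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average a user's event vectors into a single profile vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_codes(user_vec, codebook=CODEBOOK, k=2):
    """Return the k codebook labels most similar to the user vector."""
    ranked = sorted(
        codebook.items(),
        key=lambda item: cosine(user_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


if __name__ == "__main__":
    # Two made-up event embeddings for one user.
    profile = mean_vector([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])
    print(nearest_codes(profile))
```

The point of the design is that only the codebook is shared state: adding a user means storing one vector and running a nearest-neighbor query, which is the kind of workload pgvector handles without a GPU.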

Go-to-market

Post a ‘Show HN’ with a Docker Compose one-liner: spin up the sidecar, point it at your Postgres, and get a working /personalize endpoint in under 10 minutes. Have the demo show before/after system prompts for a fake e-commerce support bot.
DM 20 founders in the Latent Space or Lenny’s Slack #ai-products channels who have publicly mentioned shipping multiple LLM features; offer free setup on their staging environment in exchange for a 30-min call and a testimonial.
Publish one deeply technical blog post on ‘Why your multi-product LLM stack needs a shared user-modeling layer’ on Substack or Towards Data Science, linking to the open-source GitHub repo, and use it to collect waitlist emails for the hosted version.
Launch the hosted (managed) tier on a simple Stripe checkout page before all the features are done; use feedback from the first 3 paying customers to decide which tenant controls (embedding refresh rate, templates for injecting context into prompts, PII masking) to build next.
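The before/after demo mentioned in the Show HN step could be as small as the sketch below. It is not the actual /personalize implementation: the UserContext fields, the personalize function, and the output layout are assumptions made up for illustration, and a real service would wrap this in an HTTP handler and load the context from Postgres.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserContext:
    """Hypothetical per-user context the sidecar would load from storage."""
    user_id: str
    traits: List[str] = field(default_factory=list)
    recent_events: List[str] = field(default_factory=list)


def personalize(base_prompt: str, ctx: UserContext, max_events: int = 3) -> str:
    """Append known user context to a system prompt.

    Returns the base prompt unchanged when nothing is known about the user,
    so products can call this unconditionally.
    """
    if not ctx.traits and not ctx.recent_events:
        return base_prompt
    lines = [base_prompt, "", "Known about this user:"]
    lines += [f"- trait: {trait}" for trait in ctx.traits]
    # Keep only the most recent events to bound prompt size.
    lines += [f"- recent: {event}" for event in ctx.recent_events[-max_events:]]
    return "\n".join(lines)


if __name__ == "__main__":
    before = "You are a support bot for an online store."
    ctx = UserContext(
        user_id="demo-user",
        traits=["prefers short answers"],
        recent_events=["viewed return policy", "opened ticket about late delivery"],
    )
    print("BEFORE:\n" + before)
    print("\nAFTER:\n" + personalize(before, ctx))
```

Keeping the function pure (prompt and context in, prompt out) means each product can reuse it as-is, which is the shared-layer argument in miniature.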

Moat (or lack thereof)

No real moat. The core tech (pgvector embeddings plus FastAPI) is standard and reproducible in a weekend by any competent engineer. Defensibility, if it comes, will come from two places: accumulated per-tenant behavioral data making the embeddings more accurate over time (a data flywheel), and switching costs once the sidecar is wired into multiple products. Realistically, this lives and dies by distribution and developer experience, not technology. Ship fast, be the easiest thing to integrate, and win on support quality at the indie scale.