Persistent Persona Chatbot with Compressed Session Memory
A FastAPI chatbot that summarizes each session into a codebook-quantized user profile, then retrieves and injects it on the next visit—keeping context costs flat regardless of history length.
Difficulty: 1-week | Stack: Python, FastAPI, Claude API (with prompt caching), sentence-transformers, scikit-learn, PostgreSQL, Redis
Who this is for
Product teams building customer-facing assistants that need consistent personalization across sessions without paying for a full history replay on every turn.
Build steps
- After each session ends, extract preference signals (topics, tone feedback, explicit corrections) and embed them with a sentence-transformer; store raw embeddings per user in PostgreSQL.
- Run a nightly or on-demand job that fits/updates a shared codebook (MiniBatchKMeans, k=128) across all users; assign each user’s embedding to its top-3 nearest centroids with weights.
- Store the quantized user code (3 centroid IDs + weights) in Redis with a TTL; on session start, decode back into a concise system-prompt block (‘User prefers bullet lists, avoids marketing language, domain: fintech’).
- Wire Claude API prompt caching on the system-prompt block so the compressed persona prefix is a cache hit on repeated calls—keeping per-turn cost low.
- Add a
/profileendpoint that shows the user their inferred persona and lets them edit centroid weights manually, creating a feedback loop to correct drift. - Evaluate: compare session-open latency and token cost vs. a naïve full-history-stuffing baseline over 50 synthetic users.
Risks
- Codebook staleness: centroids fitted on early users may not represent new user archetypes well—refit cadence needs tuning or new users get poor representations.
- Privacy risk: the decoded natural-language persona description stored in Redis is human-readable PII-adjacent content; needs encryption and access control from day one.
- Claude prompt caching requires exact prefix matches—any dynamic element (timestamp, session ID) in the system prompt before the persona block will bust the cache and eliminate the cost benefit.
Business Angle
Drop-in personalization memory layer for indie devs building Claude-powered support or onboarding bots
Customer: Solo developer or two-person team who shipped a Claude-backed customer support or onboarding chatbot and is watching their API bill grow because they're stuffing 10-turn histories into every prompt
Pricing: saas-mrr — $800 MRR in 4 months (16 customers at $49/mo)
Full business breakdown →