Persistent Persona Chatbot with Compressed Session Memory

A FastAPI chatbot that summarizes each session into a codebook-quantized user profile, then retrieves and injects it on the next visit—keeping context costs flat regardless of history length.

Difficulty: 1-week | Stack: Python, FastAPI, Claude API (with prompt caching), sentence-transformers, scikit-learn, PostgreSQL, Redis

Who this is for

Product teams building customer-facing assistants that need consistent personalization across sessions without paying for a full history replay on every turn.

Build steps

After each session ends, extract preference signals (topics, tone feedback, explicit corrections) and embed them with a sentence-transformer; store raw embeddings per user in PostgreSQL.
Run a nightly or on-demand job that fits/updates a shared codebook (MiniBatchKMeans, k=128) across all users; assign each user’s embedding to its top-3 nearest centroids with weights.
Store the quantized user code (3 centroid IDs + weights) in Redis with a TTL; on session start, decode back into a concise system-prompt block (‘User prefers bullet lists, avoids marketing language, domain: fintech’).
Wire Claude API prompt caching on the system-prompt block so the compressed persona prefix is a cache hit on repeated calls—keeping per-turn cost low.
Add a /profile endpoint that shows the user their inferred persona and lets them edit centroid weights manually, creating a feedback loop to correct drift.
Evaluate: compare session-open latency and token cost vs. a naïve full-history-stuffing baseline over 50 synthetic users.

Risks

Codebook staleness: centroids fitted on early users may not represent new user archetypes well—refit cadence needs tuning or new users get poor representations.
Privacy risk: the decoded natural-language persona description stored in Redis is human-readable PII-adjacent content; needs encryption and access control from day one.
Claude prompt caching requires exact prefix matches—any dynamic element (timestamp, session ID) in the system prompt before the persona block will bust the cache and eliminate the cost benefit.

Persistent Persona Chatbot with Compressed Session Memory

Who this is for

Build steps

Risks

Business Angle