Compact User Models for Personalized LLM Generation: What CURP Gets Right
Personalizing LLM outputs at scale is expensive, and most existing approaches trade quality for efficiency in unsatisfying ways. CURP introduces codebook-based user representations that compress individual behavioral patterns into reusable embeddings, offering a more practical path to personalized generation without the overhead of per-user fine-tuning or bloated prompts.
Personalizing language model outputs has long meant balancing what is technically desirable against what is computationally tractable. Fine-tuning a model per user is prohibitively expensive. Stuffing user history into a prompt works up to a point, then hits context limits and inference cost ceilings. How to represent a user compactly, without losing what makes them distinct, is the central engineering and research challenge in this space.
CURP (Codebook-based Continuous User Representation for Personalized Generation with LLMs) takes a direct approach to this problem. The framework builds a bidirectional user encoder that maps a user’s preferences and behavioral patterns into a continuous representation, then quantizes that representation against a learned codebook. The codebook acts as a shared vocabulary of user archetypes: individual users are expressed as combinations of codebook entries rather than as entirely unique parameter spaces.
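To make the quantization step concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy codebook, the `top_k` selection rule, and the averaging in `reconstruct` are all assumptions chosen for illustration, and `user_vec` stands in for whatever the bidirectional encoder would actually produce. It uses ordinary nearest-neighbor vector quantization, which may differ from the scheme CURP trains.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(user_vec, codebook, top_k=2):
    """Return the indices of the top_k codebook entries nearest to user_vec.

    Hypothetical scheme: a user is expressed as a small set of codebook
    indices rather than as a full continuous vector.
    """
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: squared_distance(user_vec, codebook[i]),
    )
    return ranked[:top_k]


def reconstruct(codes, codebook):
    """Approximate the user vector by averaging the selected entries."""
    dim = len(codebook[0])
    return [sum(codebook[c][d] for c in codes) / len(codes) for d in range(dim)]


# Toy codebook: four "archetypes" in a 3-dimensional space (illustrative values).
codebook = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

user_vec = [0.9, 0.2, 0.0]  # stand-in for the encoder's continuous output
codes = quantize(user_vec, codebook, top_k=2)
approx = reconstruct(codes, codebook)
print(codes, approx)
```

In a learned system the codebook entries would be trained jointly with the encoder; here they are fixed by hand so the lookup logic is easy to follow.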
This matters for a few reasons. First, codebook quantization compresses user representations enough to make them cacheable and reusable across inference calls. Rather than reconstructing a user profile from raw history on every generation call, you look up a compact code. Second, because the codebook is shared and learned jointly across users, it benefits from population-level signal: rare users with sparse histories can still be represented meaningfully by proximity to well-populated codebook entries. Third, the bidirectional encoder means the representation captures context in both directions through a user’s interaction history, rather than treating it as a left-to-right sequence.
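The caching point can be sketched as follows. Everything here is hypothetical: `UserCodeCache`, `toy_encode`, and the idea of keying on a user ID are illustrative names and choices, not part of CURP. The sketch only shows the pattern of running the encoder once and serving later requests from stored codes.

```python
class UserCodeCache:
    """Stores each user's quantized codes so the encoder runs once per user."""

    def __init__(self, encode_fn):
        self._encode_fn = encode_fn  # maps raw history -> iterable of code indices
        self._codes = {}
        self.encoder_calls = 0  # counter kept only to make the saving visible

    def get_codes(self, user_id, history):
        """Return cached codes, encoding the history only on a cache miss."""
        if user_id not in self._codes:
            self.encoder_calls += 1
            self._codes[user_id] = tuple(self._encode_fn(history))
        return self._codes[user_id]

    def invalidate(self, user_id):
        """Drop a user's codes, for example after their history changes."""
        self._codes.pop(user_id, None)


def toy_encode(history):
    """Hypothetical stand-in for encoder plus quantizer.

    Buckets each history item into one of 8 codes by its length; a real
    system would run the trained encoder and codebook lookup here.
    """
    return sorted({len(item) % 8 for item in history})


cache = UserCodeCache(toy_encode)
history = ["sci-fi", "hiking", "jazz"]

first = cache.get_codes("user-1", history)
second = cache.get_codes("user-1", history)  # served from the cache
print(first, cache.encoder_calls)
```

How often to invalidate is a design choice this sketch leaves open; it depends on how quickly user behavior changes in a given product.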
The practical implications reach across a range of applications. Chatbots that need to maintain consistent tone and style preferences across sessions, recommendation systems generating personalized summaries or explanations, and content platforms adapting writing style to individual readers all face variants of the same core problem CURP addresses. The framework is agnostic to the downstream LLM, which means the user representation layer can sit in front of whatever generation backbone a system already uses.
What CURP does not fully resolve is the cold-start problem: new users with no behavioral history remain difficult to represent with confidence. The paper also does not deeply engage with representation drift over time as user preferences evolve. These are known hard problems in user modeling generally, not unique failures of this approach.
One adjacent consideration worth noting: the semantic structure of what gets captured in a user representation depends heavily on how behavioral signals are encoded before they reach the codebook. Work on topic modeling, such as "Disentangling Similarity and Relatedness in Topic Models," highlights that different modeling choices capture fundamentally different types of semantic relationships: thematic relatedness versus taxonomic similarity, for instance. A user who reads articles about dogs and bones is different from one who reads about dogs and wolves, even if surface co-occurrence patterns look similar. User modeling pipelines that feed into systems like CURP inherit these upstream distinctions, and the quality of the final personalization depends on whether the input representations are capturing the right kind of semantic structure.
The broader trajectory here is toward modular personalization: user state as a portable, compact artifact that can be computed once, stored cheaply, and injected into generation without restructuring the underlying model. CURP is a credible step in that direction. The codebook approach is not novel in representation learning generally, since vector quantization has a long history. But applying it systematically to user modeling for LLM generation, together with the bidirectional encoding architecture, gives the framework a coherent identity rather than feeling like an assembly of existing parts.
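A quick back-of-the-envelope calculation shows why a quantized code is cheap to store. The numbers below (8 codes per user, a 1024-entry codebook, a 768-dimensional float32 vector) are illustrative assumptions, not figures from the paper; the arithmetic itself is just the standard bit count for indexing into a codebook.

```python
import math


def code_size_bits(num_codes, codebook_size):
    """Bits needed to store num_codes indices into a codebook of given size."""
    return num_codes * math.ceil(math.log2(codebook_size))


def dense_size_bits(dim, bits_per_value=32):
    """Bits needed to store a dense vector of dim values."""
    return dim * bits_per_value


# Hypothetical configuration, chosen only to illustrate the ratio.
code_bits = code_size_bits(num_codes=8, codebook_size=1024)
dense_bits = dense_size_bits(dim=768)

print(f"codes: {code_bits} bits, dense: {dense_bits} bits")
print(f"ratio: {dense_bits / code_bits:.1f}x")
```

The shared codebook itself still has to be stored, but that cost is paid once for the whole user population rather than once per user.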
For teams building production personalization systems, the efficiency argument is the most immediate one. Prompt-based personalization at scale gets expensive fast, and training-based methods require per-user infrastructure that most organizations cannot sustain. A codebook representation that amortizes user modeling cost across a shared structure is a pragmatic middle path worth taking seriously.