Thematic Relatedness vs. Taxonomic Similarity: What Topic Models Actually Learn
A new study formalizes a long-overlooked distinction in topic modeling: thematic relatedness (dog/bone) versus taxonomic similarity (dog/wolf). PLM-augmented topic models capture a fundamentally different semantic structure than classical LDA, and conflating the two leads to misleading evaluations and downstream misapplication.
When evaluating a topic model, it is tempting to treat coherence as a monolithic property. If a topic surfaces words that feel related, the model seems to be working. But “related” papers over a meaningful divide that the study makes explicit: the difference between words that tend to appear together in the same contexts and words that belong to the same category.
The paper, Disentangling Similarity and Relatedness in Topic Models, formalizes this distinction along two axes borrowed from psycholinguistics. Thematic relatedness captures associative co-occurrence: dog and bone belong together because one evokes the other in typical situations. Taxonomic similarity captures categorical membership: dog and wolf belong together because they share structural and biological properties. Human cognition handles both, but it handles them differently, and so do language models.
The paper’s central argument is that classical co-occurrence models like Latent Dirichlet Allocation lean toward thematic relatedness. LDA is fit to document-level word co-occurrence statistics, so it surfaces words that tend to appear together in the same documents, which is the associative, situational kind of relationship. Pre-trained language models, by contrast, encode richer distributional structure that includes taxonomic information. When PLMs are integrated into topic modeling pipelines, the resulting topics reflect a different semantic geometry than what LDA produces, even when both appear superficially coherent.
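To make the co-occurrence signal concrete, here is a minimal Python sketch that counts how often word pairs share a document. The corpus and the helper names (cooccurrence_counts, pair_count) are invented for illustration and do not come from the paper. It is not LDA itself, only the kind of document-level statistic that co-occurrence models are driven by.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to its set of word types.
docs = [
    {"dog", "bone", "leash", "walk"},
    {"dog", "bone", "treat"},
    {"dog", "leash", "park"},
    {"wolf", "pack", "forest"},
    {"wolf", "forest", "fox"},
]

def cooccurrence_counts(documents):
    """Count, for every unordered word pair, how many documents contain both."""
    counts = Counter()
    for doc in documents:
        for a, b in combinations(sorted(doc), 2):
            counts[(a, b)] += 1
    return counts

def pair_count(counts, a, b):
    """Look up a pair regardless of argument order."""
    return counts[tuple(sorted((a, b)))]

counts = cooccurrence_counts(docs)
thematic_pair = pair_count(counts, "dog", "bone")   # share two documents
taxonomic_pair = pair_count(counts, "dog", "wolf")  # never share a document
print(thematic_pair, taxonomic_pair)
```

In this toy corpus, dog and bone share two documents while dog and wolf share none, so a model that sees only these counts has no direct evidence that dog and wolf belong to the same category.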
This is not just a theoretical concern. Topic models feed into downstream tasks: document clustering, information retrieval, content recommendation, and qualitative analysis in social science. If a researcher assumes their PLM-augmented topic model is producing the same kind of semantic groupings as LDA — just better — they may draw incorrect inferences. A topic that clusters wolf, dog, and fox taxonomically is doing something categorically different from one that clusters dog, bone, and leash thematically, even if both score well on standard coherence metrics that do not distinguish between the two.
The formalization the authors provide is useful precisely because it gives practitioners a vocabulary and a measurement framework to ask: what kind of relatedness am I actually capturing, and is that what my application requires? Retrieval systems often benefit from taxonomic groupings — you want documents about wolves to surface when someone queries canids. Recommendation systems may benefit more from thematic groupings — someone reading about dogs is probably interested in training, nutrition, and veterinary care, not taxonomy.
The work also has implications for how PLM-augmented topic models should be evaluated. Standard coherence metrics tend to reward co-occurrence-based relationships, which means they may systematically undervalue or mischaracterize models that capture taxonomic structure well. Building evaluation benchmarks that separately probe thematic and taxonomic coherence would give a clearer picture of what any given model is actually doing.
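One way to picture a two-axis evaluation is the sketch below, which scores a topic separately on each dimension. It is an illustrative assumption, not the paper's measurement framework. The thematic score uses NPMI, a common co-occurrence-based coherence measure, computed over a hypothetical toy corpus. The taxonomic score uses a hand-written parent map standing in for a real lexical taxonomy. All names and data here are made up for the example.

```python
import math
from itertools import combinations

# Hypothetical toy corpus (sets of word types per document).
docs = [
    {"dog", "bone", "leash"},
    {"dog", "bone"},
    {"dog", "leash"},
    {"wolf", "forest"},
    {"fox", "forest"},
    {"wolf", "pack"},
]

# Hypothetical hand-written taxonomy: word -> parent category.
parent = {
    "dog": "canid", "wolf": "canid", "fox": "canid",
    "bone": "food", "leash": "equipment",
}

def npmi(a, b, documents):
    """Normalized pointwise mutual information from document co-occurrence."""
    n = len(documents)
    p_a = sum(a in d for d in documents) / n
    p_b = sum(b in d for d in documents) / n
    p_ab = sum(a in d and b in d for d in documents) / n
    if p_ab == 0:
        return -1.0  # conventional value when a pair never co-occurs
    if p_ab == 1:
        return 1.0   # both words occur in every document
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)

def thematic_score(topic, documents):
    """Mean pairwise NPMI over the topic's words."""
    pairs = list(combinations(topic, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)

def taxonomic_score(topic, taxonomy):
    """Fraction of word pairs that share a parent category."""
    pairs = list(combinations(topic, 2))
    shared = sum(
        taxonomy.get(a) is not None and taxonomy.get(a) == taxonomy.get(b)
        for a, b in pairs
    )
    return shared / len(pairs)

thematic_topic = ["dog", "bone", "leash"]
taxonomic_topic = ["dog", "wolf", "fox"]

for name, topic in [("thematic", thematic_topic), ("taxonomic", taxonomic_topic)]:
    print(name, thematic_score(topic, docs), taxonomic_score(topic, parent))
```

On this toy data the dog, bone, leash topic gets a positive NPMI and a taxonomic score of zero, while the dog, wolf, fox topic gets the minimum NPMI and a taxonomic score of one. A single co-occurrence-based number would rank the second topic as incoherent even though it is a clean category.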
There is a broader lesson here about the gap between performance and understanding in NLP. PLMs have improved results on a wide range of benchmarks, including those used for topic modeling. But improvement on a benchmark does not guarantee that the underlying semantic structure being captured is the one the benchmark was designed to measure. The field has largely moved faster than its evaluative frameworks, and work like this, which slows down to ask what is actually being learned, provides necessary conceptual grounding.
For anyone building systems that depend on semantic organization, whether topic models, knowledge graphs, or retrieval pipelines, the distinction between thematic and taxonomic relationships deserves more explicit attention than it typically receives.