Context Vocabulary Scope Visualizer

Interactive tool that shows, token by token, how small the ‘active’ vocabulary really is for any given prompt.

Difficulty: weekend | Stack: Python, Transformers (HuggingFace), Gradio, Plotly

Who this is for

ML engineers and researchers who want intuition for the NanoSpec insight: for any real context, more than 90% of vocabulary logits carry negligible probability mass. Seeing this empirically is what motivates dynamic vocabulary pruning work.

Build steps

Load a small causal LM (GPT-2 medium or TinyLlama) via HuggingFace and run greedy decoding on a user-supplied prompt.
At each generation step, capture the full logit vector over the vocabulary before softmax and record which tokens fall in the top-K (sweep K over 50, 3,000, and 30,000).
Compute cumulative probability mass vs. number of tokens kept (sorted by probability) and plot the curve; mark the ‘knee’, where roughly 3k tokens are expected to capture 99%+ of probability mass.
Build a Gradio interface with a text input, a token-step slider, and a Plotly bar chart of the top-100 active tokens colored by rank.
Add a side-by-side latency estimate panel comparing projected compute cost at a full vocabulary (100k), static pruning (30k), and NanoSpec size (3k), using FLOP counts for the output projection, the part of the model whose cost scales with vocabulary size.
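
The coverage curve and the FLOP comparison from the steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the tool's actual implementation: the function names (`coverage_at_k`, `knee`, `lm_head_flops`) are hypothetical, the logit vector is a plain Python list, and the FLOP estimate assumes only the final hidden-to-vocabulary matrix multiply is counted. In the real tool you would likely do the same arithmetic with torch tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coverage_at_k(logits, ks):
    """Return {k: probability mass captured by the top-k tokens}."""
    probs = sorted(softmax(logits), reverse=True)
    cumulative, running = [], 0.0
    for p in probs:
        running += p
        cumulative.append(running)
    # Clamp k to the vocabulary size so oversized sweeps do not fail.
    return {k: cumulative[min(k, len(cumulative)) - 1] for k in ks}


def knee(logits, target=0.99):
    """Smallest k whose top-k tokens reach `target` probability mass."""
    probs = sorted(softmax(logits), reverse=True)
    running = 0.0
    for k, p in enumerate(probs, start=1):
        running += p
        if running >= target:
            return k
    return len(probs)


def lm_head_flops(hidden_size, vocab_size):
    """Approximate FLOPs for one output projection (one multiply and one add per weight)."""
    return 2 * hidden_size * vocab_size
```

With a hidden size of 1024 (GPT-2 medium), `lm_head_flops(1024, 100_000) / lm_head_flops(1024, 3_000)` gives roughly a 33x reduction for the output projection alone. The end-to-end saving will be smaller, since the transformer layers are unaffected by vocabulary size.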

Risks

Storing full logit vectors for every step is memory-intensive (vocabulary size × steps × bytes per value), and larger models add their own weight footprint; stick to models ≤7B or you’ll OOM on a consumer GPU.
Vocabulary ‘knees’ vary significantly by domain (code vs. prose vs. math), so a single demo prompt may not generalize; include diverse example prompts to tell a compelling story.
Gradio’s real-time slider updates can be sluggish if you’re re-running inference per step; pre-compute all steps and cache results instead.
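
One way to follow the pre-compute-and-cache advice is sketched below, under stated assumptions. All function names are hypothetical. `greedy_capture` takes any callable that maps token ids to a next-token logit list, so it can be exercised without a model. `make_hf_next_logits` is one possible adapter for a HuggingFace causal LM; it re-runs the whole sequence each step with no KV cache, which is slow but simple. Each step is reduced to its top-N tokens so the cache stays small.

```python
import heapq
import math


def greedy_capture(next_logits, prompt_ids, max_new_tokens):
    """Greedy-decode and keep the raw logit vector seen at every step."""
    ids = list(prompt_ids)
    captured = []
    for _ in range(max_new_tokens):
        logits = next_logits(ids)  # full-vocabulary logits, before softmax
        captured.append(logits)
        ids.append(max(range(len(logits)), key=logits.__getitem__))  # argmax
    return ids, captured


def make_hf_next_logits(model_name="gpt2-medium"):
    """Assumed adapter for a HuggingFace causal LM (imports kept local)."""
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def next_logits(ids):
        with torch.no_grad():
            out = model(torch.tensor([ids]))
        return out.logits[0, -1].tolist()  # logits at the last position

    return next_logits


def summarize_step(logits, top_n=100):
    """Reduce one logit vector to its top_n (token_id, probability) pairs."""
    m = max(logits)
    total = sum(math.exp(x - m) for x in logits)
    best = heapq.nlargest(top_n, enumerate(logits), key=lambda pair: pair[1])
    return [(tok, math.exp(val - m) / total) for tok, val in best]


def build_cache(step_logits, top_n=100):
    """Pre-compute the per-step summaries once, before the UI is shown."""
    return [summarize_step(logits, top_n) for logits in step_logits]


def logits_memory_bytes(num_steps, vocab_size, bytes_per_value=4):
    """Memory needed to keep every full logit vector (float32 by default)."""
    return num_steps * vocab_size * bytes_per_value
```

The slider callback then only needs to return `cache[step]`, so moving the slider never triggers inference. For scale, `logits_memory_bytes(256, 50257)` is about 51 MB for 256 steps over GPT-2's 50,257-token vocabulary, which is why keeping only the top-N summary per step is attractive.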
