Latent-State Streaming Chat UI
Build a streaming chat interface whose ‘thinking indicator’ is driven by real, live reasoning tokens, not a spinner hack
Difficulty: weekend | Stack: TypeScript, Next.js, Vercel AI SDK, Claude Sonnet 4.5 (claude-sonnet-4-5, extended thinking mode), Tailwind CSS
Who this is for
Developers demoing agent UX who want to show the model's live reasoning state instead of hiding latency behind fake animations
Build steps
- Scaffold a Next.js app that uses the Vercel AI SDK useChat hook against Claude Sonnet 4.5 (claude-sonnet-4-5) with extended thinking enabled (budget_tokens: 8000; max_tokens must be larger than the budget)
- Stream thinking blocks and text blocks separately: render thinking-block content in a collapsible side panel that updates in real time
- Add a per-token timing overlay: show thinking tokens/sec vs. response tokens/sec as live sparklines (stream chunks usually carry several tokens, so rates are estimates unless you count tokens explicitly)
- Implement a ‘reasoning compression’ toggle: when on, summarize the thinking block with a second, cheaper LLM call before displaying it
- Deploy to Vercel; measure perceived latency (time to first text token) with and without the thinking block visible
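For the first build step, here is a minimal sketch of the raw Anthropic Messages API request body with extended thinking enabled. The field names (model, max_tokens, stream, thinking.type, thinking.budget_tokens) follow the public Messages API; the helper name buildThinkingRequest and the default max_tokens of 16000 are illustrative assumptions. It checks the two documented constraints up front: the budget must be at least 1024 tokens and must be smaller than max_tokens.

```typescript
// Sketch: build a raw Anthropic Messages API body with extended thinking.
// buildThinkingRequest and its defaults are hypothetical, for illustration.

type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ThinkingRequest {
  model: string;
  max_tokens: number;
  stream: boolean;
  thinking: { type: "enabled"; budget_tokens: number };
  messages: ChatMessage[];
}

function buildThinkingRequest(
  messages: ChatMessage[],
  budgetTokens = 8000,
  maxTokens = 16000,
): ThinkingRequest {
  // The API rejects thinking budgets below 1024 tokens.
  if (budgetTokens < 1024) {
    throw new Error("budget_tokens must be at least 1024");
  }
  // Thinking tokens count toward max_tokens, so the cap must exceed the budget
  // or no room is left for the visible answer.
  if (budgetTokens >= maxTokens) {
    throw new Error("max_tokens must be greater than budget_tokens");
  }
  return {
    model: "claude-sonnet-4-5",
    max_tokens: maxTokens,
    stream: true, // stream so thinking deltas can be rendered as they arrive
    thinking: { type: "enabled", budget_tokens: budgetTokens },
    messages,
  };
}
```

In the AI SDK the same settings are passed through the Anthropic provider's thinking options rather than a hand-built body; the exact option names depend on the SDK version, so check the provider docs before copying this shape.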
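For the second build step, the split between thinking and answer text can be sketched as a reducer over the Anthropic streaming events, assuming the server-sent events have already been parsed into JSON objects. The event and delta type names (content_block_start, content_block_delta, thinking_delta, text_delta, signature_delta, content_block_stop) come from the Messages API; the PanelState shape and applyEvent name are hypothetical. The AI SDK surfaces the same split as separate reasoning and text parts, whose exact shape depends on the SDK version.

```typescript
// Sketch: route streamed deltas into a side-panel string and an answer string.
// Only the event fields this reducer reads are modeled.

type BlockKind = "thinking" | "text" | "other";

type Delta =
  | { type: "thinking_delta"; thinking: string }
  | { type: "text_delta"; text: string }
  | { type: "signature_delta"; signature: string };

type StreamEvent =
  | { type: "message_start" | "message_delta" | "message_stop" | "ping" }
  | { type: "content_block_start"; index: number; content_block: { type: string } }
  | { type: "content_block_delta"; index: number; delta: Delta }
  | { type: "content_block_stop"; index: number };

interface PanelState {
  kinds: Record<number, BlockKind>; // block index -> kind, from content_block_start
  thinking: string;                 // text shown in the collapsible side panel
  text: string;                     // the visible answer
  thinkingOpen: boolean;            // true while a thinking block is streaming
}

const initialPanel: PanelState = { kinds: {}, thinking: "", text: "", thinkingOpen: false };

// Pure reducer, so it can back React's useReducer without mutation.
function applyEvent(state: PanelState, event: StreamEvent): PanelState {
  switch (event.type) {
    case "content_block_start": {
      const t = event.content_block.type;
      const kind: BlockKind = t === "thinking" ? "thinking" : t === "text" ? "text" : "other";
      return {
        ...state,
        kinds: { ...state.kinds, [event.index]: kind },
        thinkingOpen: kind === "thinking" ? true : state.thinkingOpen,
      };
    }
    case "content_block_delta": {
      const d = event.delta;
      if (d.type === "thinking_delta") return { ...state, thinking: state.thinking + d.thinking };
      if (d.type === "text_delta") return { ...state, text: state.text + d.text };
      return state; // signature_delta is integrity data, not display text
    }
    case "content_block_stop":
      return state.kinds[event.index] === "thinking" ? { ...state, thinkingOpen: false } : state;
    default:
      return state; // message-level events and pings do not change the panels
  }
}
```

Keeping the reducer pure means the side panel and the answer pane re-render from one state object, and the panel can collapse automatically when thinkingOpen flips to false.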
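For the third build step, a sketch of the rate estimator behind the sparklines. Because a streamed delta usually holds several tokens, it converts characters to tokens with an assumed ratio of about 4 characters per token (a rough heuristic for English text, not a tokenizer). RateTracker, pushPoint, and the 1-second window are hypothetical choices for illustration.

```typescript
// Sketch: estimate thinking vs. response tokens/sec for live sparklines.
// Assumption: ~4 characters per token; use a real tokenizer for exact counts.

type StreamKind = "thinking" | "text";

const CHARS_PER_TOKEN = 4;

interface Sample {
  tMs: number;    // arrival time of the chunk
  tokens: number; // estimated tokens in the chunk
}

class RateTracker {
  private readonly windowMs: number;
  private readonly samples: Record<StreamKind, Sample[]> = { thinking: [], text: [] };

  constructor(windowMs = 1000) {
    this.windowMs = windowMs;
  }

  // Call once per streamed delta with its text and arrival time.
  record(kind: StreamKind, chunk: string, tMs: number): void {
    this.samples[kind].push({ tMs, tokens: chunk.length / CHARS_PER_TOKEN });
  }

  // Estimated tokens/sec over the trailing window ending at nowMs.
  // Assumes nowMs never decreases between calls.
  rate(kind: StreamKind, nowMs: number): number {
    const list = this.samples[kind];
    const cutoff = nowMs - this.windowMs;
    // Evict samples that have left the window so memory stays bounded.
    while (list.length > 0 && list[0].tMs <= cutoff) list.shift();
    const tokens = list.reduce((sum, s) => sum + s.tokens, 0);
    return tokens / (this.windowMs / 1000);
  }
}

// Append one sparkline point, keeping only the most recent maxPoints values.
function pushPoint(series: number[], value: number, maxPoints = 60): number[] {
  const next = [...series, value];
  return next.length > maxPoints ? next.slice(next.length - maxPoints) : next;
}
```

A UI would typically sample rate("thinking", now) and rate("text", now) on a fixed timer and feed each value into its own series with pushPoint, so both sparklines share one time axis.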
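For the fourth build step, one way to wire the ‘reasoning compression’ toggle is a small view model that decides what the panel shows and when to fire the summarizer. This sketch assumes the summary is requested once, after the thinking block completes; the summarizer itself (the second, cheaper LLM call) is left out, and ThinkingView, shouldRequestSummary, thinkingPanelText, and the placeholder strings are all hypothetical.

```typescript
// Sketch: view-model logic for the reasoning-compression toggle.
// The actual summarizer call is out of scope and assumed to set `summary`.

interface ThinkingView {
  raw: string;             // full thinking text as streamed
  blockDone: boolean;      // content_block_stop seen for the thinking block
  compress: boolean;       // toggle state
  summary?: string;        // set when the summarizer returns
  summaryPending: boolean; // a summarizer request is in flight
}

// Summarize at most once per block, only when complete and toggled on.
function shouldRequestSummary(v: ThinkingView): boolean {
  return (
    v.compress &&
    v.blockDone &&
    v.summary === undefined &&
    !v.summaryPending &&
    v.raw.length > 0
  );
}

// What the side panel should display for the current state.
function thinkingPanelText(v: ThinkingView): string {
  if (!v.compress) return v.raw; // toggle off: show raw reasoning as it streams
  if (v.summary !== undefined) return v.summary;
  // Toggle on but no summary yet: show real progress instead of raw text.
  return v.blockDone ? "Summarizing reasoning..." : `Reasoning in progress (${v.raw.length} chars)`;
}
```

Waiting for the block to finish keeps it to one cheap call per turn; summarizing incrementally would show a summary sooner but costs several calls and can produce summaries that contradict each other.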
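For the last build step, a sketch of the latency measurement. It records when the first thinking delta and the first text delta arrive relative to the request start (timestamps from something like performance.now() in the browser). Treating "perceived latency" as the time to the first visible token, thinking or text, when the panel is shown is an assumption of this sketch, as are the names latencyMarks and perceivedLatencyMs.

```typescript
// Sketch: time-to-first-thinking and time-to-first-text from timed deltas.

interface TimedEvent {
  tMs: number; // arrival time of the parsed delta
  deltaType: "thinking_delta" | "text_delta" | "signature_delta" | "other";
}

interface LatencyMarks {
  firstThinkingMs: number | null; // first reasoning token the panel could show
  firstTextMs: number | null;     // time to first text token
}

function latencyMarks(requestStartMs: number, events: TimedEvent[]): LatencyMarks {
  let firstThinkingMs: number | null = null;
  let firstTextMs: number | null = null;
  for (const e of events) {
    if (e.deltaType === "thinking_delta" && firstThinkingMs === null) {
      firstThinkingMs = e.tMs - requestStartMs;
    }
    if (e.deltaType === "text_delta" && firstTextMs === null) {
      firstTextMs = e.tMs - requestStartMs;
    }
  }
  return { firstThinkingMs, firstTextMs };
}

// Assumed definition: with the panel visible, the wait ends at the first
// visible token of either kind; with it hidden, only answer text counts.
function perceivedLatencyMs(m: LatencyMarks, thinkingVisible: boolean): number | null {
  if (!thinkingVisible) return m.firstTextMs;
  const seen = [m.firstThinkingMs, m.firstTextMs].filter((x): x is number => x !== null);
  return seen.length > 0 ? Math.min(...seen) : null;
}
```

Logging both numbers per request lets the with-panel and without-panel conditions be compared from the same runs rather than from separate deployments.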
Risks
- Extended thinking adds wall-clock latency before the first text token, so the UI must handle a 5-15 s blank period gracefully
- Thinking-block content is not always coherent prose, so displaying it raw may confuse non-technical users
- The Vercel Hobby plan's function timeout (10 s) is too short for long thinking budgets; you need the Pro plan or an edge-streaming workaround
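The blank-period risk above can be softened by deriving the indicator label from real stream progress instead of a looping animation. This sketch assumes the client tracks character counts of thinking and text deltas; the StreamProgress shape, the phase names, and the label wording are hypothetical.

```typescript
// Sketch: a thinking indicator driven by actual stream progress.

type Phase = "waiting" | "thinking" | "answering" | "done";

interface StreamProgress {
  startedAtMs: number;   // when the request was sent
  thinkingChars: number; // characters of thinking received so far
  textChars: number;     // characters of answer text received so far
  finished: boolean;     // message_stop (or equivalent) seen
}

function phaseOf(p: StreamProgress): Phase {
  if (p.finished) return "done";
  if (p.textChars > 0) return "answering";
  if (p.thinkingChars > 0) return "thinking";
  return "waiting"; // request sent, nothing streamed yet
}

// Label refreshed on a timer; elapsed seconds plus real reasoning volume
// show the user that work is happening during the wait.
function indicatorLabel(p: StreamProgress, nowMs: number): string {
  const secs = Math.floor((nowMs - p.startedAtMs) / 1000);
  switch (phaseOf(p)) {
    case "waiting":
      return `Connecting... ${secs}s`;
    case "thinking":
      return `Thinking... ${secs}s, ${p.thinkingChars} chars of reasoning`;
    case "answering":
      return "Answering";
    case "done":
      return "";
  }
}
```

Separating "waiting" from "thinking" also helps diagnose the timeout risk: a request stuck in "waiting" points at the connection or function limit rather than at the model.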
Business Angle
SaaS boilerplate + live demo for streaming Claude extended-thinking UIs, sold to devs who need to ship agent interfaces fast
Customer: An indie developer or small agency building AI-powered SaaS products that needs to demo reasoning-aware chat to investors or clients within days, not weeks; they know TypeScript but haven't wired extended thinking into a streaming UI before
Pricing: one-time purchase at $49, targeting about $800 in month 1 (16 sales × $49 = $784), then layer in a $29/mo hosted demo tier by month 3