AI Pulse

Latent-State Streaming Chat UI

Build a streaming chat interface that shows a ‘thinking indicator’ driven by the model's live reasoning tokens, not a spinner hack

Difficulty: weekend | Stack: TypeScript, Next.js, Vercel AI SDK, Claude Sonnet 4.5 (model ID claude-sonnet-4-5, extended thinking mode), Tailwind CSS

Who this is for

Developers demoing agent UX who want to visualize the model's live reasoning state rather than hide latency behind fake animations

Build steps

  1. Scaffold a Next.js app with the Vercel AI SDK useChat hook, targeting claude-sonnet-4-5 with extended thinking enabled (budget_tokens: 8000)
  2. Stream thinking blocks and text blocks separately — render thinking block content in a collapsible side panel that updates in real-time
  3. Add per-token timing overlay: show thinking tokens/sec vs. response tokens/sec as live sparklines
  4. Implement ‘reasoning compression’ toggle: when on, summarize the thinking block via a second cheap LLM call before displaying
  5. Deploy to Vercel; measure perceived latency (time-to-first-text-token) with and without thinking block visible
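
The bookkeeping behind steps 2, 3, and 5 can be sketched as a small pure reducer that keeps thinking and response content in separate channels and derives a rate and a time-to-first-text value from timestamps. This is a minimal sketch under stated assumptions: the StreamEvent shape and the names initialState, applyEvent, chunksPerSecond, and timeToFirstTextMs are hypothetical, not types or functions from the Vercel AI SDK or the Anthropic API, and it counts streamed chunks as a rough proxy for tokens because deltas do not map one-to-one to tokens.

```typescript
// Hypothetical, simplified event shape for illustration only.
// A real app would map the SDK's stream parts into this form.
type StreamEvent = {
  kind: "thinking" | "text";
  delta: string;
  atMs: number; // milliseconds since the request was sent
};

type ChannelStats = {
  content: string;
  chunks: number;
  firstMs: number | null;
  lastMs: number | null;
};

type ChatStreamState = { thinking: ChannelStats; text: ChannelStats };

function emptyChannel(): ChannelStats {
  return { content: "", chunks: 0, firstMs: null, lastMs: null };
}

function initialState(): ChatStreamState {
  return { thinking: emptyChannel(), text: emptyChannel() };
}

// Append one delta to its channel without mutating the previous state,
// so the result can be handed straight to a React state setter.
function applyEvent(state: ChatStreamState, ev: StreamEvent): ChatStreamState {
  const prev = state[ev.kind];
  const next: ChannelStats = {
    content: prev.content + ev.delta,
    chunks: prev.chunks + 1,
    firstMs: prev.firstMs ?? ev.atMs,
    lastMs: ev.atMs,
  };
  return ev.kind === "thinking"
    ? { thinking: next, text: state.text }
    : { thinking: state.thinking, text: next };
}

// Chunks per second over the channel's active window.
// n chunks span n - 1 intervals; returns 0 until two chunks have arrived.
function chunksPerSecond(ch: ChannelStats): number {
  if (ch.firstMs === null || ch.lastMs === null || ch.lastMs <= ch.firstMs) {
    return 0;
  }
  return ((ch.chunks - 1) * 1000) / (ch.lastMs - ch.firstMs);
}

// Time-to-first-text-token (step 5); null while only thinking has streamed.
function timeToFirstTextMs(state: ChatStreamState): number | null {
  return state.text.firstMs;
}
```

Feeding each incoming event through applyEvent gives the side panel its content (state.thinking.content), while sampling chunksPerSecond for each channel at a fixed interval yields the points for the two sparklines. Keeping the reducer free of timers and network calls also makes the latency comparison in step 5 reproducible from a recorded event log.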

Risks

  • Claude extended thinking adds wall-clock latency before first text token — UI must handle 5-15s blank period gracefully
  • Thinking block content is not always coherent prose — raw display may confuse non-technical users
  • Vercel Hobby plan function timeout (10s) is too short for long thinking budgets; this needs a Pro plan or an edge streaming workaround
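
One way to handle the blank period in the first risk is to derive an explicit UI phase from how much of each channel has arrived, so the indicator always has something truthful to show. The sketch below is illustrative only: the Progress shape, the phase names, and the label wording are assumptions, not part of any SDK.

```typescript
// Hypothetical progress snapshot; a real app would compute this
// from its own stream state.
type Progress = {
  thinkingChars: number; // characters of reasoning received so far
  textChars: number; // characters of the visible answer received so far
  finished: boolean; // true once the stream has closed
};

type Phase = "connecting" | "thinking" | "answering" | "done";

// Later stages win: any answer text means we are past the thinking phase.
function derivePhase(p: Progress): Phase {
  if (p.finished) return "done";
  if (p.textChars > 0) return "answering";
  if (p.thinkingChars > 0) return "thinking";
  return "connecting";
}

// Status line for the indicator. It reports elapsed time and reasoning
// volume rather than raw thinking text, which may not read as coherent prose.
function indicatorLabel(p: Progress, elapsedMs: number): string {
  const secs = Math.floor(elapsedMs / 1000);
  switch (derivePhase(p)) {
    case "connecting":
      return "Connecting...";
    case "thinking":
      return `Thinking (${secs}s, ${p.thinkingChars} chars of reasoning so far)`;
    case "answering":
      return "Writing answer...";
    case "done":
      return "";
  }
}
```

Showing a counter instead of the raw reasoning text also gives a safe default for non-technical viewers; the collapsible panel can remain available for those who want the full content.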

Business Angle

SaaS boilerplate + live demo for streaming Claude extended-thinking UIs, sold to devs who need to ship agent interfaces fast

Customer: Indie dev or small agency building AI-powered SaaS products who needs to demo reasoning-aware chat to investors/clients within days, not weeks — has TypeScript skills but hasn't wired extended thinking + streaming before

Pricing: one-time purchase at $49, targeting 16 sales in month 1 (16 × $49 = $784, roughly $800), then add a $29/mo hosted demo tier by month 3
