AI Pulse

CoT Graph Compressor

A Streamlit app that converts a model’s chain-of-thought trace into a Mermaid reasoning graph, lets you prune redundant nodes, and re-injects the compressed graph as a structured prompt prefix.

Difficulty: 1-week | Stack: Python, Streamlit, Anthropic SDK, Mermaid.js (via streamlit-mermaid), NetworkX, spaCy

Who this is for

Prompt engineers and researchers who want to test the Render-of-Thought hypothesis in practice: making CoT reasoning inspectable and shorter without sacrificing accuracy.

Build steps

  1. Build a Streamlit UI that sends a user question to Claude with an explicit chain-of-thought instruction and streams back the reasoning trace.
  2. Use spaCy’s dependency parser to extract (subject, predicate, object) triples from each CoT sentence and build a directed NetworkX reasoning graph.
  3. Render the graph as a Mermaid diagram embedded in Streamlit; highlight nodes by estimated semantic redundancy (cosine similarity between adjacent node embeddings using sentence-transformers).
  4. Add an interactive node-pruning panel where the user can collapse or remove redundant steps, then serialize the pruned graph back to a compact bullet-point summary.
  5. Re-inject the compact summary as a structured prefix for a second LLM call and display the accuracy delta and token savings side by side.
  6. Run 20 benchmark questions from GSM8K and compare the original CoT token count with the compressed-prefix token count, recording final-answer accuracy for both.
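
Steps 2 to 4 can be sketched without any of the heavy dependencies. The version below is a minimal illustration, not the app's actual code: the triples are hand-written stand-ins for what spaCy would extract, the 2-d vectors stand in for sentence-transformers embeddings, and the function names (`redundant_nodes`, `to_mermaid`, `prune`, `to_bullets`) and the 0.95 threshold are assumptions. Plain tuples are used where the project would hold a NetworkX graph.

```python
import math

# Hypothetical (subject, predicate, object) triples for a short CoT trace.
TRIPLES = [
    ("speed", "equals", "60 km/h"),
    ("60 km/h", "times 2 h gives", "120 km"),
    ("120 km", "restated as", "distance 120 km"),
    ("distance 120 km", "answers", "question"),
]

# Toy 2-d embeddings (assumption); a real run would use model-generated vectors.
EMBEDDINGS = {
    "speed": (1.0, 0.0),
    "60 km/h": (0.8, 0.6),
    "120 km": (0.0, 1.0),
    "distance 120 km": (0.1, 0.99),
    "question": (-1.0, 0.2),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def redundant_nodes(triples, embeddings, threshold=0.95):
    """Return target nodes whose embedding nearly duplicates their source node."""
    return {
        o for s, _, o in triples
        if cosine(embeddings[s], embeddings[o]) >= threshold
    }


def to_mermaid(triples, highlight=()):
    """Serialize triples to Mermaid flowchart text, tinting highlighted nodes."""
    ids = {}
    for s, _, o in triples:
        for label in (s, o):
            ids.setdefault(label, f"n{len(ids)}")
    lines = ["graph TD"]
    for label, node_id in ids.items():
        lines.append(f'    {node_id}["{label}"]')
    for s, p, o in triples:
        lines.append(f'    {ids[s]} -->|"{p}"| {ids[o]}')
    for label in highlight:
        lines.append(f"    style {ids[label]} fill:#fdd")
    return "\n".join(lines)


def prune(triples, node):
    """Remove `node` and connect each of its predecessors to each successor."""
    incoming = [s for s, _, o in triples if o == node]
    outgoing = [(p, o) for s, p, o in triples if s == node]
    kept = [t for t in triples if node not in (t[0], t[2])]
    bridged = [(s, p, o) for s in incoming for p, o in outgoing]
    return kept + bridged


def to_bullets(triples):
    """Flatten triples into a compact bullet-point summary."""
    return "\n".join(f"- {s} {p} {o}" for s, p, o in triples)
```

Here `redundant_nodes(TRIPLES, EMBEDDINGS)` flags only the restated node, and `prune(TRIPLES, "distance 120 km")` bridges its predecessor directly to the final step. The string from `to_mermaid` is what would be passed to the Mermaid component; labels containing double quotes would need escaping first.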
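
For the benchmark comparison, one possible way to aggregate the numbers is shown below. It is a sketch under assumptions: the `Run` record and `summarize` function are hypothetical names, and the sample values are placeholders to exercise the code, not measured results. In the app, token counts would presumably come from the usage figures the API returns with each response.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark question answered with full CoT and with the compressed prefix."""
    cot_tokens: int
    prefix_tokens: int
    cot_correct: bool
    prefix_correct: bool


def summarize(runs):
    """Aggregate token savings and accuracy delta across benchmark runs."""
    n = len(runs)
    cot_total = sum(r.cot_tokens for r in runs)
    prefix_total = sum(r.prefix_tokens for r in runs)
    cot_acc = sum(r.cot_correct for r in runs) / n
    prefix_acc = sum(r.prefix_correct for r in runs) / n
    return {
        "token_savings": 1 - prefix_total / cot_total,  # fraction of tokens saved
        "cot_accuracy": cot_acc,
        "prefix_accuracy": prefix_acc,
        "accuracy_delta": prefix_acc - cot_acc,  # negative means accuracy was lost
    }


# Placeholder values only; replace with real per-question measurements.
SAMPLE_RUNS = [
    Run(200, 50, True, True),
    Run(300, 100, True, False),
    Run(100, 50, False, False),
    Run(400, 100, True, True),
]

if __name__ == "__main__":
    for name, value in summarize(SAMPLE_RUNS).items():
        print(f"{name}: {value:.2f}")
```

Reporting the savings as a fraction of total tokens, rather than averaging per-question ratios, keeps long traces from being underweighted; either choice is defensible with only 20 questions.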

Risks

  • Triple extraction from natural-language CoT is noisy: spaCy often misparses conditional or hypothetical sentences, which produces a garbled graph.
  • The re-injection format (bullets vs. JSON vs. pseudo-logic) significantly affects downstream accuracy. There is no principled answer, so the format has to be tuned empirically.
  • Mermaid graphs become unreadable beyond ~15 nodes, so longer reasoning chains (math proofs, multi-step code) will exceed the visualization limit quickly.
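
Since the re-injection format has to be tuned empirically, it may help to generate every candidate format from the same pruned graph so they can be compared on equal footing. The sketch below is illustrative only: the function names, the arrow notation used for "pseudo-logic", and the whitespace-based size estimate are all assumptions (a real comparison would use the model's tokenizer or API-reported usage). It also includes a simple guard for the approximate 15-node readability limit.

```python
import json

# Hypothetical pruned triples; in the app these would come from the pruning panel.
PRUNED = [
    ("speed", "equals", "60 km/h"),
    ("60 km/h", "times 2 h gives", "120 km"),
]

MAX_MERMAID_NODES = 15  # approximate readability limit, not a hard Mermaid constraint


def as_bullets(triples):
    return "\n".join(f"- {s} {p} {o}" for s, p, o in triples)


def as_json(triples):
    records = [{"subject": s, "predicate": p, "object": o} for s, p, o in triples]
    return json.dumps(records, separators=(",", ":"))


def as_pseudo_logic(triples):
    return "\n".join(f"{s} -[{p}]-> {o}" for s, p, o in triples)


def rough_size(text):
    """Crude size proxy: whitespace-separated chunks, NOT real model tokens."""
    return len(text.split())


def candidate_prefixes(triples):
    """Build every candidate prefix format from the same graph."""
    return {
        "bullets": as_bullets(triples),
        "json": as_json(triples),
        "pseudo_logic": as_pseudo_logic(triples),
    }


def node_count(triples):
    return len({node for s, _, o in triples for node in (s, o)})


def too_large_to_render(triples, limit=MAX_MERMAID_NODES):
    """True when the graph likely exceeds what a single diagram can show legibly."""
    return node_count(triples) > limit


if __name__ == "__main__":
    for name, text in candidate_prefixes(PRUNED).items():
        print(name, rough_size(text))
    print("too large:", too_large_to_render(PRUNED))
```

Each candidate prefix can then be run through the same benchmark questions, with accuracy logged per format.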

Business Angle

Sell CoT Graph Compressor as a token-cost reduction tool for AI teams spending heavily on long reasoning chains.

Customer: Solo AI engineers or small-startup CTOs (1-5 person teams) who use Claude or GPT-4o with extended thinking or chain-of-thought prompting and want to cut token bills of $500–$2,000/month.

Pricing: one-time license at $49. Target: $800 in sales within 3 months (roughly 16 licenses, sold via Gumroad or Lemon Squeezy).