CoT Graph Compressor

A Streamlit app that converts a model’s chain-of-thought trace into a Mermaid reasoning graph, lets you prune redundant nodes, and re-injects the compressed graph as a structured prompt prefix.

Difficulty: 1-week | Stack: Python, Streamlit, Anthropic SDK, Mermaid.js (via streamlit-mermaid), NetworkX, spaCy

Who this is for

Prompt engineers and researchers who want to test the Render-of-Thought hypothesis in practice: making CoT reasoning inspectable and shorter without sacrificing accuracy.

Build steps

Build a Streamlit UI that sends a user question to Claude with an explicit chain-of-thought instruction and streams back the reasoning trace.
Use spaCy’s dependency parser to extract (subject, predicate, object) triples from each CoT sentence and build a directed NetworkX reasoning graph.
Render the graph as a Mermaid diagram embedded in Streamlit; highlight nodes by estimated semantic redundancy (cosine similarity between adjacent node embeddings using sentence-transformers).
Add an interactive node-pruning panel where the user can collapse or remove redundant steps, then serialize the pruned graph back to a compact bullet-point summary.
Re-inject the compact summary as a structured prefix for a second LLM call and display accuracy delta and token savings side-by-side.
Run 20 benchmark questions from GSM8K and report, for each, the original CoT token count, the compressed prefix token count, and whether the final answer stayed correct.
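
The middle of the pipeline (graph, Mermaid rendering, redundancy highlighting, pruning, bullet summary) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the app's implementation: the triples are hard-coded and hypothetical rather than produced by spaCy, plain lists stand in for a NetworkX DiGraph, toy_embed is a bag-of-words stand-in for a sentence-transformers embedding, the 0.8 threshold is arbitrary, and every function name is invented for this example.

```python
import math
from collections import Counter

# Hypothetical (subject, predicate, object) triples; in the app these would
# come from the spaCy extraction step.
TRIPLES = [
    ("Tom", "has", "3 boxes of apples"),
    ("3 boxes of apples", "means", "3 boxes of 4 apples"),
    ("3 boxes of 4 apples", "contain", "12 apples"),
    ("12 apples", "minus 2 eaten leaves", "10 apples"),
]

def toy_embed(text):
    """Assumption: bag-of-words counts stand in for a real sentence embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_graph(triples):
    """Return node labels in first-seen order plus the labelled edge list."""
    nodes = []
    for subj, _, obj in triples:
        for label in (subj, obj):
            if label not in nodes:
                nodes.append(label)
    return nodes, list(triples)

def redundant_nodes(edges, threshold=0.8):
    """Flag the target of every edge whose two endpoints are near-duplicates."""
    return {dst for src, _, dst in edges
            if cosine(toy_embed(src), toy_embed(dst)) >= threshold}

def to_mermaid(nodes, edges, flagged=()):
    """Serialize to Mermaid flowchart text (labels must not contain quotes or pipes)."""
    ids = {label: f"n{i}" for i, label in enumerate(nodes)}
    lines = ["flowchart TD"]
    lines += [f'    {ids[n]}["{n}"]' for n in nodes]
    lines += [f"    {ids[s]} -->|{p}| {ids[o]}" for s, p, o in edges]
    marked = [ids[n] for n in nodes if n in flagged]
    if marked:
        lines.append("    classDef redundant fill:#fdd,stroke:#c00")
        lines.append("    class " + ",".join(marked) + " redundant")
    return "\n".join(lines)

def prune(nodes, edges, drop):
    """Remove the nodes in `drop`, bridging each predecessor to each successor."""
    for node in drop:
        incoming = [e for e in edges if e[2] == node]
        bridged = []
        for src, pred, dst in edges:
            if dst == node:
                continue  # edge into the removed node disappears
            if src == node:
                # Re-attach the outgoing edge to every predecessor, in place.
                bridged += [(p, pred, dst) for p, _, _ in incoming if p not in (node, dst)]
            else:
                bridged.append((src, pred, dst))
        edges = bridged
    return [n for n in nodes if n not in drop], edges

def to_bullets(edges):
    return "\n".join(f"- {s} {p} {o}" for s, p, o in edges)

nodes, edges = build_graph(TRIPLES)
flagged = redundant_nodes(edges)
print(to_mermaid(nodes, edges, flagged))
kept, pruned = prune(nodes, edges, flagged)
print(to_bullets(pruned))
```

In the Streamlit app the Mermaid string would be handed to the rendering component, and the set passed to prune would come from the user's selections in the pruning panel rather than directly from the redundancy score.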

Risks

Triple extraction from natural-language CoT is noisy: spaCy often misparses conditional or hypothetical sentences, producing a garbled graph.
The re-injection format (bullets vs. JSON vs. pseudo-logic) significantly affects downstream accuracy and has no principled answer, so it must be tuned empirically.
Mermaid graphs become unreadable beyond ~15 nodes, so longer reasoning chains (math proofs, multi-step code) will exceed the visualization limit quickly.
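
Since the re-injection format has to be chosen empirically, it helps to serialize the same pruned graph into each candidate format and compare sizes before spending API calls. The sketch below is illustrative only: the edge list is hypothetical, the three serializers are one possible reading of "bullets", "JSON", and "pseudo-logic", and rough_tokens is a crude regex proxy, not the tokenizer Claude uses.

```python
import json
import re

# Hypothetical pruned edges, as (subject, predicate, object) tuples.
EDGES = [
    ("Tom", "has", "3 boxes of apples"),
    ("3 boxes of apples", "contain", "12 apples"),
    ("12 apples", "minus 2 eaten leaves", "10 apples"),
]

def as_bullets(edges):
    return "\n".join(f"- {s} {p} {o}" for s, p, o in edges)

def as_json(edges):
    # Compact separators avoid spending tokens on whitespace.
    records = [{"from": s, "rel": p, "to": o} for s, p, o in edges]
    return json.dumps(records, separators=(",", ":"))

def as_pseudo_logic(edges):
    # Predicate becomes a function-like name; arguments are quoted strings.
    return "\n".join(
        f"{p.replace(' ', '_')}({json.dumps(s)}, {json.dumps(o)})" for s, p, o in edges
    )

def rough_tokens(text):
    """Assumption: words plus punctuation marks approximate the token count."""
    return len(re.findall(r"\w+|[^\w\s]", text))

FORMATS = {"bullets": as_bullets, "json": as_json, "pseudo_logic": as_pseudo_logic}
counts = {name: rough_tokens(fn(EDGES)) for name, fn in FORMATS.items()}

for name, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: ~{n} tokens")
```

Size is only half the comparison: each format would still need to be run through the second LLM call on the benchmark questions, because a shorter prefix does not guarantee equal accuracy.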

Business Angle