AI Pulse

Pipe-level Token Filter for Agent CLIs

A configurable stdin→stdout filter that strips low-signal CLI output before it hits your LLM context.

Difficulty: weekend | Stack: Python, Click, regex/AST rules, pytest

Who this is for

Agent builders who run shell tools and pay per token. The filter cuts the cost of ls/git/build output without touching prompts.

Build steps

  1. Define a YAML rule schema: each rule has a pattern (regex), a mode (drop-line | truncate | summarize), and an optional max-lines cap.
  2. Build a streaming stdin reader that applies rules in priority order, emitting filtered bytes to stdout.
  3. Ship 5 built-in rule sets: git diff, directory listing, pytest output, docker logs, npm/pip install traces.
  4. Add a --benchmark flag that prints original vs filtered token counts using tiktoken.
  5. Write a CLI entry point installable via pipx so it composes with any agent framework via pipe.
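
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Rule` class, its `priority` field, the `filter_lines` function, and the sample pip patterns are all hypothetical names chosen for the example. It assumes "priority order" means the first matching rule decides a line's fate, works on text lines rather than raw bytes, and builds rules directly in Python where a real tool would load them from YAML.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Rule:
    pattern: str                     # regex searched in each line
    mode: str                        # "drop-line" | "truncate" | "summarize"
    priority: int = 0                # assumption: lower values are tried first
    max_lines: Optional[int] = None  # cap on matching lines kept by "truncate"

    def __post_init__(self) -> None:
        self.regex = re.compile(self.pattern)
        self.seen = 0                # matching lines seen so far


def filter_lines(lines: Iterable[str], rules: List[Rule]) -> Iterator[str]:
    """Stream lines through the rules; the first matching rule wins."""
    ordered = sorted(rules, key=lambda r: r.priority)
    for line in lines:
        rule = next((r for r in ordered if r.regex.search(line)), None)
        if rule is None:
            yield line               # no rule matched: pass through unchanged
            continue
        rule.seen += 1
        if rule.mode == "truncate":
            # Keep matching lines only until the cap is reached.
            if rule.max_lines is None or rule.seen <= rule.max_lines:
                yield line
        elif rule.mode not in ("drop-line", "summarize"):
            yield line               # unknown mode: fail open, keep the line
    # "summarize" here is a simple stand-in: one count line per rule at the end.
    for rule in ordered:
        if rule.mode == "summarize" and rule.seen:
            yield f"[{rule.seen} lines matching {rule.pattern!r} omitted]\n"


def main() -> None:
    # Hypothetical rule set for pip install traces.
    rules = [
        Rule(r"^Collecting ", "drop-line"),
        Rule(r"^\s+Downloading ", "truncate", priority=1, max_lines=1),
    ]
    sys.stdout.writelines(filter_lines(sys.stdin, rules))
```

If `main` were registered as a console-script entry point (step 5), the filter would sit in a pipe between the noisy command and the agent. Failing open on an unknown mode is a deliberate choice in this sketch: a typo in a rule file should leave output untouched rather than silently drop it.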

Risks

  • Overly aggressive rules can silently drop signal the model needed; a --dry-run mode that shows what would be stripped is needed to catch this.
  • Streaming line by line breaks on multi-line constructs (stack traces, JSON blobs); these need lookahead buffer logic.
  • Token count benchmarks vary by model tokenizer: tiktoken's cl100k_base encoding may not match the target model's actual count.
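
The first two risks can be addressed together in a small sketch. It assumes a deliberately simple heuristic: a line that starts with whitespace continues the block opened by the previous unindented line, which covers the indented frames of a Python traceback but not every multi-line format (a JSON blob would need a different grouping rule). The names `group_blocks` and `dry_run` are hypothetical, and the dry run only labels lines instead of removing them.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def group_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Buffer each unindented line together with the indented lines after it."""
    block: List[str] = []
    for line in lines:
        is_continuation = line[:1] in (" ", "\t")
        if block and not is_continuation:
            yield block              # a new unindented line closes the block
            block = []
        block.append(line)
    if block:
        yield block                  # flush whatever is still buffered


def dry_run(lines: Iterable[str], drop_pattern: str) -> Iterator[Tuple[str, str]]:
    """Label every line KEEP or DROP without removing anything."""
    regex = re.compile(drop_pattern)
    for block in group_blocks(lines):
        # The decision is made once per block, on its first line, so a rule
        # cannot strip the middle of a stack trace and leave the rest behind.
        action = "DROP" if regex.search(block[0]) else "KEEP"
        for line in block:
            yield action, line
```

Printing each `(action, line)` pair to stderr would show what a rule set would strip while stdout stays unfiltered.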

Business Angle

Pipe-level token filter that strips noisy CLI output before it reaches your LLM context window

Customer: Solo devs or indie hackers running LLM-powered coding agents (Claude Code, aider, Cursor background agents) who shell out to git/npm/pytest and watch token costs spike from verbose stdout

Pricing: freemium, targeting $800 MRR in 4 months
