Pipe-level Token Filter for Agent CLIs

A configurable stdin→stdout filter that strips low-signal CLI output before it hits your LLM context.

Difficulty: weekend | Stack: Python, Click, regex/AST rules, pytest

Who this is for

Agent builders who run shell tools and pay per token. The filter cuts the cost of ls, git, and build output without touching prompts.

Define a YAML rule schema: each rule has a pattern (regex), a mode (drop-line | truncate | summarize), and an optional max-lines cap.
Build a streaming stdin reader that applies rules in priority order, emitting filtered bytes to stdout.
Ship 5 built-in rule sets: git diff, directory listing, pytest output, docker logs, npm/pip install traces.
Add a --benchmark flag that prints the original and filtered token counts, measured with tiktoken.
Write a CLI entry point installable via pipx so the filter composes with any agent framework through a shell pipe.
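
The rule schema and the streaming reader from the steps above can be sketched roughly as follows. This is a minimal illustration, not a reference design: the Rule dataclass, its priority field, and the filter_lines function are hypothetical names. It assumes "truncate" means "keep the first max-lines matching lines and drop the rest", works on text lines rather than raw bytes, leaves "summarize" unimplemented, and skips YAML loading (a parsed YAML mapping would be passed to Rule as keyword arguments).

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence


@dataclass
class Rule:
    """One filter rule (hypothetical shape, mirroring the YAML schema)."""
    pattern: str                      # regex searched in each line
    mode: str                         # "drop-line" or "truncate" in this sketch
    priority: int = 0                 # assumed field: lower values are tried first
    max_lines: Optional[int] = None   # cap on matching lines kept by "truncate"

    def __post_init__(self) -> None:
        self.regex = re.compile(self.pattern)
        self.kept = 0                 # matching lines emitted so far


def filter_lines(lines: Iterable[str], rules: Sequence[Rule]) -> Iterator[str]:
    """Yield lines one at a time, applying the first matching rule by priority."""
    ordered = sorted(rules, key=lambda r: r.priority)
    for line in lines:
        rule = next((r for r in ordered if r.regex.search(line)), None)
        if rule is None:
            yield line                # no rule matched: pass through
        elif rule.mode == "drop-line":
            continue                  # matched line is removed
        elif rule.mode == "truncate":
            if rule.max_lines is None or rule.kept < rule.max_lines:
                rule.kept += 1
                yield line            # still under the cap
        else:
            yield line                # unimplemented modes pass through unchanged


if __name__ == "__main__":
    # Example rules: drop blank lines, keep at most 20 added lines of a diff.
    demo_rules = [
        Rule(pattern=r"^\s*$", mode="drop-line", priority=0),
        Rule(pattern=r"^\+", mode="truncate", priority=1, max_lines=20),
    ]
    sys.stdout.writelines(filter_lines(sys.stdin, demo_rules))
```

Saved under a hypothetical name such as tokenfilter.py, the sketch would sit in a pipe as `git diff | python tokenfilter.py`. Because filter_lines is a generator, output starts flowing before the upstream command finishes.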

Overly aggressive rules silently drop signal the model needed, so add a --dry-run mode that shows what would be stripped.
Line-by-line streaming breaks multi-line constructs (stack traces, JSON blobs), so the reader needs lookahead buffering.
Token counts depend on the model's tokenizer: tiktoken's cl100k_base encoding may not match the target model's actual count.
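
One way to approach the first two risks is sketched below. It is a rough illustration under stated assumptions: group_blocks and filter_blocks are hypothetical names, and a "continuation line" is assumed to be any line that starts with whitespace. That heuristic keeps indented stack frames and nested JSON bodies attached to the line that opens them, but it does not capture unindented closers such as a final "}" or the exception message at the end of a Python traceback. Dry-run is assumed to mean "pass everything through and report the would-be drops on stderr".

```python
import re
import sys
from typing import Iterable, Iterator, List, TextIO

# Assumption: an indented, non-empty line continues the block above it.
CONTINUATION = re.compile(r"^[ \t]+\S")


def group_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Buffer each head line together with the indented lines that follow it."""
    block: List[str] = []
    for line in lines:
        if block and not CONTINUATION.match(line):
            yield block               # next head line seen: flush the buffer
            block = []
        block.append(line)
    if block:
        yield block                   # flush whatever remains at end of input


def filter_blocks(
    lines: Iterable[str],
    drop_pattern: str,
    dry_run: bool = False,
    report: TextIO = sys.stderr,
) -> Iterator[str]:
    """Drop whole blocks whose head line matches; in dry-run, only report them."""
    drop = re.compile(drop_pattern)
    for block in group_blocks(lines):
        matched = drop.search(block[0]) is not None
        if matched and dry_run:
            report.writelines("would drop: " + line for line in block)
        if not matched or dry_run:
            yield from block          # dry-run leaves the stream untouched


if __name__ == "__main__":
    # Example: preview which DEBUG blocks a rule would remove.
    sys.stdout.writelines(filter_blocks(sys.stdin, r"^DEBUG", dry_run=True))
```

Deciding a block's fate from its head line means a dropped log entry takes its indented detail lines with it, instead of leaving orphaned fragments in the model's context.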