Self-Improvement Loop Sandbox

A small automated research pipeline in which a language model iteratively rewrites its own few-shot prompts and measures whether downstream task performance actually improves from run to run.

Difficulty: 1-week | Stack: Python, OpenAI API (or Anthropic API), LangChain, SQLite, Rich (CLI dashboard)

Who this is for

Developers and AI safety practitioners who want hands-on empirical evidence of how hard (or easy) LLM self-improvement actually is in a controlled, observable setting. The project directly instantiates the blog post’s core argument.

Build steps

Pick a narrow, evaluable task with an automated scorer, e.g., solving grade-school math word problems (a GSM8K subset) or generating syntactically valid SQL from natural language
Implement a baseline few-shot prompt and an automated eval loop that scores the model on a fixed 50-question held-out set, storing each run's score and prompt version in SQLite
Add a ‘self-improver’ step: after each eval, pass the failing examples plus the current prompt to the model with instructions to propose a revised prompt; record the proposed change
Run 10-20 improvement iterations automatically, tracking score trajectory, prompt diff size, and the frequency of regressions (iterations where the new prompt scores worse than its predecessor)
Render a Rich CLI dashboard showing the iteration history, a diff of each prompt mutation, and a running chart of pass rate, so that the ‘slow climb with frequent regressions’ dynamic is visible
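
The build steps above can be sketched in a few functions. This is a minimal illustration, not a reference implementation. The names evaluate, diff_size, run_loop and count_regressions, the runs table layout, and the ask_model callable (any function that takes a prompt string and returns the model's text) are all hypothetical. Exact-match scoring is assumed as the automated scorer, and the Rich dashboard is left out.

```python
import difflib
import sqlite3
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)


def evaluate(prompt: str, examples: List[Example],
             ask_model: Callable[[str], str]) -> Tuple[float, List[Example]]:
    """Return (pass_rate, failing_examples); assumes exact-match scoring."""
    if not examples:
        raise ValueError("examples must not be empty")
    failures = []
    for question, expected in examples:
        answer = ask_model(f"{prompt}\n\nQ: {question}\nA:")
        if answer.strip() != expected.strip():
            failures.append((question, expected))
    return (len(examples) - len(failures)) / len(examples), failures


def diff_size(old: str, new: str) -> int:
    """Count lines added or removed between two prompt versions."""
    delta = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in delta if line.startswith(("+ ", "- ")))


def run_loop(prompt: str, examples: List[Example],
             ask_model: Callable[[str], str], iterations: int,
             db_path: str = ":memory:") -> sqlite3.Connection:
    """Evaluate, record, ask the model for a revised prompt, repeat."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "iteration INTEGER, prompt TEXT, pass_rate REAL, diff_lines INTEGER)"
    )
    previous = prompt
    for i in range(iterations + 1):  # iteration 0 is the baseline prompt
        pass_rate, failures = evaluate(prompt, examples, ask_model)
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?)",
            (i, prompt, pass_rate, diff_size(previous, prompt)),
        )
        conn.commit()
        if not failures or i == iterations:
            break  # nothing left to fix, or iteration budget used up
        failed = "\n".join(f"Q: {q}\nExpected: {a}" for q, a in failures)
        previous = prompt
        # Self-improver step: the same model proposes the next prompt.
        prompt = ask_model(
            "Rewrite the prompt below so the failing examples would be "
            "answered correctly. Return only the revised prompt.\n\n"
            f"PROMPT:\n{prompt}\n\nFAILURES:\n{failed}"
        )
    return conn


def count_regressions(conn: sqlite3.Connection) -> int:
    """Number of iterations whose pass rate fell below the previous one."""
    scores = [row[0] for row in conn.execute(
        "SELECT pass_rate FROM runs ORDER BY iteration")]
    return sum(1 for prev, cur in zip(scores, scores[1:]) if cur < prev)
```

Passing the model as a plain callable keeps the loop independent of any particular API client, so the task, scorer, and provider can be swapped without touching the bookkeeping. The stored rows are what a dashboard would read to draw the score trajectory and prompt diffs.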

Risks

API costs can escalate quickly: 20 iterations × 50 eval questions is 1,000 eval calls, plus one self-improver call per iteration; set a hard budget cap and use a cheap model (e.g., gpt-4o-mini) for the eval loop to control spend
The model may overfit the prompt to the eval set rather than genuinely improving reasoning, because the self-improver sees the failing eval examples; score each prompt on a separate validation set that the self-improver never sees to catch this, and document it as a finding, since it illustrates a real self-improvement failure mode
Results are heavily prompt-sensitive and may not generalize; frame the project as an exploration tool, not a definitive experiment, and make the task and eval set swappable
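
The first two risks lend themselves to small guards. The sketch below is one possible approach, with hypothetical names throughout (with_call_cap, planned_calls, overfit_iterations, BudgetExceeded). It caps the number of model calls rather than dollars, on the assumption that per-call pricing is looked up separately, and it treats "eval score rose while validation score did not" as a rough overfitting signal rather than a rigorous test.

```python
from typing import Callable, List


class BudgetExceeded(RuntimeError):
    """Raised when another call would exceed the configured cap."""


def with_call_cap(ask_model: Callable[[str], str],
                  max_calls: int) -> Callable[[str], str]:
    """Wrap a model callable so it refuses to run past max_calls."""
    calls = 0

    def capped(text: str) -> str:
        nonlocal calls
        if calls >= max_calls:
            raise BudgetExceeded(f"call cap of {max_calls} reached")
        calls += 1
        return ask_model(text)

    return capped


def planned_calls(iterations: int, eval_questions: int) -> int:
    """Per iteration: one call per eval question plus one improver call."""
    return iterations * (eval_questions + 1)


def overfit_iterations(eval_scores: List[float],
                       val_scores: List[float]) -> List[int]:
    """Indices where the eval score rose but the validation score did not."""
    flagged = []
    for i in range(1, min(len(eval_scores), len(val_scores))):
        eval_up = eval_scores[i] > eval_scores[i - 1]
        val_flat_or_down = val_scores[i] <= val_scores[i - 1]
        if eval_up and val_flat_or_down:
            flagged.append(i)
    return flagged
```

For the numbers in the first risk, planned_calls(20, 50) gives 1,020 calls, which is a reasonable starting point for max_calls. Scoring on a validation set adds further calls, so the cap should be raised accordingly.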