RL Environment Spec Generator

A web app that takes a natural-language task description and generates a complete reinforcement learning environment specification — reward function, observation space, termination conditions, and verification harness.

Difficulty: 1-week | Stack: Next.js, TypeScript, Vercel AI SDK, shadcn/ui, Zod, Python (generated output), Claude API

Who this is for

ML engineers and researchers who want to bootstrap a new RL environment without writing boilerplate from scratch. The generated stub is a Gymnasium-compatible starting point, with TODO comments marking where task logic goes, not a finished environment.

Build steps

Design a Zod schema for the RL environment spec: observation space definition, action space, reward function signature, termination conditions, and a verification test suite skeleton.
Build a Next.js form where users describe their task in plain English, optionally annotate verifiability properties (objective truth, noise level), and submit.
Stream the structured spec generation through the Vercel AI SDK, using Claude with structured output validated against the Zod schema, and render each section in an editable code block as it arrives.
Add a ‘verifiability audit’ panel that scores the generated reward function against Verifier’s Law criteria (including the objective-truth and noise-level properties annotated in the form) and flags reward functions that are likely too noisy or too sparse.
Generate a downloadable Python file with a complete Gymnasium-compatible Env class stub, populated with the spec’s values and TODO comments for user logic.
Include three built-in example tasks (code formatting, receipt parsing, math proof checking) so users can see the tool’s output before committing their own description.
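
As a rough illustration of the stub-generation step, the sketch below renders a simplified spec into the source text of a Gymnasium-style Env class. It is written in Python for brevity, although the app itself would likely do this in TypeScript. EnvSpec, its fields, and render_env_stub are hypothetical names invented for this example, and the version string is an example value supplied by the caller, not a recommendation. The reset and step signatures in the template follow Gymnasium's (obs, info) and five-tuple conventions.

```python
from dataclasses import dataclass
from textwrap import dedent


@dataclass
class EnvSpec:
    """Hypothetical, simplified spec; the real Zod schema would carry more fields."""
    class_name: str
    obs_shape: tuple[int, ...]
    obs_low: float
    obs_high: float
    n_actions: int
    max_steps: int
    gymnasium_version: str  # pinned version, chosen by the caller


def render_env_stub(spec: EnvSpec) -> str:
    """Render the spec into the source text of a Gymnasium-style Env stub."""
    return dedent(f'''\
        # SPEC DRAFT: requires human review before use.
        # Written against gymnasium=={spec.gymnasium_version}
        import gymnasium as gym
        import numpy as np
        from gymnasium import spaces


        class {spec.class_name}(gym.Env):
            def __init__(self):
                super().__init__()
                self.observation_space = spaces.Box(
                    low={spec.obs_low!r}, high={spec.obs_high!r},
                    shape={spec.obs_shape!r}, dtype=np.float32,
                )
                self.action_space = spaces.Discrete({spec.n_actions})
                self.max_steps = {spec.max_steps}
                self._steps = 0

            def reset(self, *, seed=None, options=None):
                super().reset(seed=seed)
                self._steps = 0
                obs = np.zeros({spec.obs_shape!r}, dtype=np.float32)  # TODO: initial observation
                return obs, {{}}

            def step(self, action):
                self._steps += 1
                obs = np.zeros({spec.obs_shape!r}, dtype=np.float32)  # TODO: apply action
                reward = 0.0  # TODO: reward function
                terminated = False  # TODO: task-specific termination condition
                truncated = self._steps >= self.max_steps
                return obs, reward, terminated, truncated, {{}}
        ''')


# Example values only; "1.0.0" stands in for whatever version the project pins.
spec = EnvSpec("ReceiptParsingEnv", (4,), -1.0, 1.0, 3, 100, "1.0.0")
source = render_env_stub(spec)
compile(source, "receipt_env.py", "exec")  # syntax check only; does not import gymnasium
```

Writing the pinned version into the header of the generated file keeps it visible to whoever opens the download, and the compile call gives a cheap syntax check on the output without requiring Gymnasium on the server.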

Risks

Generated reward functions will often be syntactically correct but semantically wrong for the user’s domain — frame the output explicitly as a ‘spec draft’ requiring human review, not a final artifact.
Gymnasium’s API surface changes between versions; pin to a specific version in the generated output and note it prominently.
Users may describe tasks with fundamentally low verifiability (e.g., ‘write a good poem’) — the verifiability audit panel must catch these and surface a clear warning rather than generating a junk spec.
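
One way the audit panel could guard against the last risk is a small rule-based check that runs before any spec is generated. The sketch below is an assumption-heavy illustration in Python, not the audit's actual scoring method: VerifiabilityAnnotations, audit, the reward_density field, and both thresholds are hypothetical, and only the objective-truth and noise-level properties come from the form described in the build steps.

```python
from dataclasses import dataclass


@dataclass
class VerifiabilityAnnotations:
    """Hypothetical shape for the user's verifiability annotations."""
    has_objective_truth: bool
    noise_level: float      # 0.0 (deterministic check) to 1.0 (very noisy)
    reward_density: float   # assumed field: estimated fraction of steps with a reward signal


def audit(
    a: VerifiabilityAnnotations,
    noise_max: float = 0.3,     # illustrative threshold, not a calibrated value
    density_min: float = 0.01,  # illustrative threshold, not a calibrated value
) -> list[str]:
    """Return human-readable warnings; an empty list means no flags were raised."""
    warnings: list[str] = []
    if not a.has_objective_truth:
        warnings.append("No objective ground truth: a reward function cannot be checked reliably.")
    if a.noise_level > noise_max:
        warnings.append("Reward signal is likely too noisy.")
    if a.reward_density < density_min:
        warnings.append("Reward signal is likely too sparse.")
    return warnings


# A task like 'write a good poem' trips every rule; code formatting trips none.
poem = VerifiabilityAnnotations(has_objective_truth=False, noise_level=0.8, reward_density=0.0)
formatting = VerifiabilityAnnotations(has_objective_truth=True, noise_level=0.0, reward_density=1.0)
poem_warnings = audit(poem)
formatting_warnings = audit(formatting)
```

If the list is non-empty, the app can show the warnings and require confirmation instead of generating a spec straight away.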