Project Ideas

Buildable projects inspired by the latest AI frontier research.

Agent Behavior Pattern Library (ADRA-Bank Clone)

A personal catalogue of recorded agent trajectories, tagged by failure mode, that you can replay, diff, and query to understand why an agent regressed between versions.

Python, FastAPI, SQLite + SQLAlchemy, Pydantic

weekend
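
To make the catalogue idea concrete, here is a minimal sketch of the storage and diff core. It uses the standard-library `sqlite3` and `difflib` modules rather than SQLAlchemy so that it runs without dependencies. The table layout and the function names (`open_catalogue`, `record`, `by_failure_mode`, `diff_steps`) are assumptions made for illustration, not a prescribed design.

```python
import difflib
import json
import sqlite3


def open_catalogue(path=":memory:"):
    """Open (or create) the trajectory catalogue. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trajectories ("
        "id INTEGER PRIMARY KEY, agent_version TEXT, task TEXT, "
        "failure_mode TEXT, steps TEXT)"
    )
    return conn


def record(conn, agent_version, task, steps, failure_mode=None):
    """Store one trajectory; steps is a list of strings, saved as JSON."""
    cur = conn.execute(
        "INSERT INTO trajectories (agent_version, task, failure_mode, steps) "
        "VALUES (?, ?, ?, ?)",
        (agent_version, task, failure_mode, json.dumps(steps)),
    )
    conn.commit()
    return cur.lastrowid


def by_failure_mode(conn, failure_mode):
    """Return (id, agent_version, task) for every trajectory with this tag."""
    rows = conn.execute(
        "SELECT id, agent_version, task FROM trajectories "
        "WHERE failure_mode = ? ORDER BY id",
        (failure_mode,),
    )
    return rows.fetchall()


def diff_steps(conn, id_a, id_b):
    """Unified diff of the step lists of two stored trajectories."""
    def load(tid):
        row = conn.execute(
            "SELECT steps FROM trajectories WHERE id = ?", (tid,)
        ).fetchone()
        return json.loads(row[0])

    return list(difflib.unified_diff(load(id_a), load(id_b), lineterm="", n=0))
```

Recording the same task under two agent versions and calling `diff_steps` on the two ids shows which steps the newer version dropped or added, which is the regression question the project is meant to answer.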

Agent PII Sentinel

A proxy layer that intercepts and redacts PII before an autonomous web agent submits it to any endpoint.

Python, mitmproxy, Playwright, spaCy

weekend
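
The redaction step at the heart of the sentinel can be sketched with plain regular expressions. This is an illustrative starting point only: the three patterns below (email, US-style SSN, US-style phone number) and the `redact` function are assumptions, a real build would likely add spaCy entity recognition on top, and wiring the function into a mitmproxy request hook is left out here.

```python
import re

# Hypothetical pattern set; deliberately narrow and US-centric.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each PII match with a [LABEL] placeholder.

    Returns the redacted text and a dict of how many matches
    each label produced (labels with zero matches are omitted).
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Returning the per-label counts alongside the text lets the proxy log what it removed without ever logging the sensitive values themselves.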

Agent Session Archivist

A CLI tool that captures, tags, and links AI coding-session transcripts to the git commits they produced.

Python, Click, SQLite, GitPython

1-month
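
One way to model the link between transcripts and commits is a many-to-many table keyed by a content hash of the transcript. The sketch below uses only the standard library; the schema and the names `open_archive`, `archive_session`, and `sessions_for_commit` are hypothetical, and commit SHAs are passed in as plain strings rather than read from a repository with GitPython.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per session, plus tag and commit link tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY, transcript TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS session_tags (
    session_id TEXT, tag TEXT, UNIQUE(session_id, tag));
CREATE TABLE IF NOT EXISTS session_commits (
    session_id TEXT, commit_sha TEXT, UNIQUE(session_id, commit_sha));
"""


def open_archive(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def archive_session(conn, transcript, tags=(), commits=()):
    """Store a transcript with its tags and commit links.

    The id is derived from the transcript text, so archiving the
    same transcript twice does not create duplicate rows.
    """
    session_id = hashlib.sha256(transcript.encode("utf-8")).hexdigest()[:12]
    conn.execute(
        "INSERT OR IGNORE INTO sessions VALUES (?, ?)", (session_id, transcript)
    )
    conn.executemany(
        "INSERT OR IGNORE INTO session_tags VALUES (?, ?)",
        [(session_id, t) for t in tags],
    )
    conn.executemany(
        "INSERT OR IGNORE INTO session_commits VALUES (?, ?)",
        [(session_id, c) for c in commits],
    )
    conn.commit()
    return session_id


def sessions_for_commit(conn, sha_prefix):
    """Find sessions linked to a commit, given a full or abbreviated SHA."""
    rows = conn.execute(
        "SELECT DISTINCT session_id FROM session_commits "
        "WHERE commit_sha LIKE ? ORDER BY session_id",
        (sha_prefix + "%",),
    )
    return [r[0] for r in rows]
```

Matching on a SHA prefix mirrors how abbreviated hashes are normally typed at the command line, so a CLI command built on this could accept either form.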

Agentic PR Review Bot

A GitHub App that assigns sub-tasks from an open PR to a Codex agent (writing missing tests, fixing lint errors, suggesting refactors), then pushes the results as commits.

Agentic Task Runner with Hardware-Aware Model Routing

AGI Takeoff Speed Simulator

AI Claim Veracity Auditor

AI Policy Tracker & Stance Comparator

AI Velocity Ledger

Architecture-Aware Model Router

Async Codex Task Dashboard

Backdoor Trigger Generalization Stress-Tester

Benchmark Blindspot Detector

Branch-Aware Trajectory Sampler for Multi-Turn Agents

Code-to-Math Problem Synthesizer

Codebase Context Index

Confidence-Gated Distillation Trainer

Constraint-Violation Detector for Robot Trajectory Descriptions

Context Vocabulary Scope Visualizer

Cooperative SFT+RL Interleaving Scheduler

CoT Graph Compressor

Counterfactual Consistency Probe for Vision-Language Models

Critic-Generator Research Agent

Cross-Problem Failure Memory for Coding Agents

Cultural Commonsense Probe Harness

CulturalBench: Automated Cultural-Knowledge Probe for LLMs

CultureCaptions: Native-Sourced Image-Text Collector

Data-Residency Compliance Checker for AI Pipelines

Decision Log Weaver

Dense Reward Agent Trainer: From Sparse Outcomes to Step Signals

Depth-Memory Spatial Q&A

Developer Session Productivity Estimator

Dialect-Adaptive ASR Benchmark Dashboard

Domain Capability Ceiling Tracker

Domain-Specialized Offline Assistant via Synthetic Fine-Tuning

Enterprise AI Adoption Tracker

Evolving-World Memory Probe

Financial-Stakes Agent Eval Harness

Hidden-State Lie Detector

Hybrid Moderation Queue

InfoDensity Reasoning Compressor

Information-Weighted Video Frame Compressor for Vision LLMs

Interactive Algorithm Visualizer from Paper Abstract

Language-Agnostic SWE Mini-Bench Runner

Latent-State Streaming Chat UI

LLM Architecture Throughput Benchmarker

Local Inference Benchmark Dashboard

Logic Drift Detector

Long-Context Local RAG Without Chunking

LowResAdapt: Principled LoRA Fine-Tuning CLI for Low-Resource Languages

Mini RoboTrustBench: Four-Scenario Robustness Suite for Pluggable World Models

Modality Gap Probe

Model Edit Reversal Curse Auditor

Multi-Agent Safety Debate Arena

Multi-Hop RAG with Evolving Evidence Tracker

Multi-Tenant Personalization Sidecar API

Multimodal RAG Evaluator

Natural-Language-to-Simulation Scenario Expander for Embodied AI

Natural-Language Video Edit Agent

Negation Ablation Sandbox

Novelty Memory Bot for Your Reading List

Observational Equivalence Test Generator

On-Device Private Code Reviewer with Nemotron Ultra

On-Device Whisper Fine-Tuner for Noisy Telephony Audio

Ontology-Grounded Agent Compliance Checker

Persistent Persona Chatbot with Compressed Session Memory

Personal Workflow Distiller

Physical Plausibility Filter for Synthetic Video Datasets

Physics-Regime Gym Wrapper

Pipe-level Token Filter for Agent CLIs

Privacy-First Desktop Automation Agent

Privacy-Preserving Federated ASR Adapter Aggregator

Probe-Based Topic Coherence Benchmark Generator

Probe Format Confounder Benchmark

RAG Parser Canary Suite

Regulatory Landscape Briefing Bot

Rendering-Aware Document Preprocessor

Repo Pattern Guard

RL Environment Spec Generator