On-Device Private Code Reviewer with Nemotron Ultra
A git pre-push hook that runs Nemotron Ultra locally via llama.cpp and outputs a structured JSON review of your diff before it leaves your machine.
Difficulty: weekend | Stack: Python, llama.cpp (GGUF backend), Nemotron-3-Ultra-GGUF weights, Click CLI, Pydantic
Who this is for
Developers at companies with strict data-residency rules who want LLM code review without sending source to a cloud API.
Build steps
- Download Nemotron 3 Ultra GGUF weights and verify they fit in VRAM/RAM; benchmark tokens/sec on target hardware (RTX 3090 or better recommended).
- Write a Python wrapper around llama-cpp-python that accepts a git diff string and a Pydantic schema (ReviewComment list with file, line, severity, message fields) and returns structured JSON via constrained decoding (grammar mode).
- Build a Click CLI entry point: `nemoreview --staged` that calls `git diff --cached`, feeds it to the wrapper, and pretty-prints findings grouped by severity.
- Wire up a git pre-push hook that invokes the CLI, blocks the push if HIGH severity findings exist, and writes a `.nemoreview_report.json` artifact.
- Add a `--threshold` flag and a project-level `.nemoreview.toml` config for per-repo severity cutoffs and file-pattern excludes.
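The grouping and push-blocking logic from the steps above can be sketched without the model in the loop. This is a minimal, dependency-free sketch: it uses a standard-library dataclass as a stand-in for the Pydantic `ReviewComment` model, and the `LOW` and `MEDIUM` levels, along with the function names `parse_review`, `group_by_severity`, and `should_block`, are assumptions for illustration (the source only names `HIGH`).

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Ordered so a threshold can be compared numerically.
    # LOW and MEDIUM are assumed levels; only HIGH appears in the write-up.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ReviewComment:
    # The four fields named in the build steps (stand-in for the Pydantic model).
    file: str
    line: int
    severity: Severity
    message: str


def parse_review(raw_json: str) -> list:
    """Turn a JSON array of findings into typed comments (raises on bad input)."""
    return [
        ReviewComment(
            file=item["file"],
            line=int(item["line"]),
            severity=Severity[item["severity"].upper()],
            message=item["message"],
        )
        for item in json.loads(raw_json)
    ]


def group_by_severity(comments) -> dict:
    """Group comments by severity, most severe group first."""
    groups = defaultdict(list)
    for comment in comments:
        groups[comment.severity].append(comment)
    return dict(sorted(groups.items(), key=lambda kv: kv[0], reverse=True))


def should_block(comments, threshold: Severity = Severity.HIGH) -> bool:
    """True when any finding is at or above the threshold."""
    return any(c.severity >= threshold for c in comments)
```

In a hook, the CLI would exit non-zero when `should_block(...)` returns true, which is what makes git abort the push. The `--threshold` flag would map onto the `threshold` argument.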
Risks
- Nemotron Ultra at full precision may exceed 24 GB VRAM; a quantized GGUF (Q4_K_M) trades quality for fit, so verify that structured-output reliability remains acceptable after quantization before relying on it in CI.
- Constrained JSON decoding via llama.cpp grammars can time out or loop on malformed partial output; add a hard token-budget limit and fall back to a lenient regex parser.
- Git diffs for large refactors can exceed the context window; the tool needs a chunking strategy that preserves enough surrounding context to avoid false positives on moved code.
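Two of the mitigations above can be sketched with the standard library alone: a file-level chunker for oversized diffs and a lenient fallback parser for truncated JSON. This is a rough sketch under stated assumptions, not a complete strategy. The function names and the character-based budget (`max_chars`, a crude proxy for tokens) are hypothetical; the chunker never splits inside a single file's section; and the regex salvage only handles flat objects whose string values contain no braces. The hard token budget itself would be set on the generation call and is not shown.

```python
import json
import re


def split_diff_by_file(diff_text: str) -> list:
    """Split a unified git diff into one string per file section."""
    # Each file section in `git diff` output starts with a "diff --git" line.
    parts = re.split(r"(?m)^(?=diff --git )", diff_text)
    return [p for p in parts if p.strip()]


def pack_chunks(sections: list, max_chars: int) -> list:
    """Greedily pack file sections into chunks of at most max_chars.

    A single section larger than max_chars becomes its own oversized chunk;
    splitting inside one file is left out of this sketch.
    """
    chunks, current = [], ""
    for section in sections:
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks


def parse_review_lenient(raw: str) -> list:
    """Try strict JSON first, then salvage flat {...} objects with a regex."""
    try:
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        pass
    salvaged = []
    # Matches objects with no nested braces, which fits a flat comment schema.
    for match in re.finditer(r"\{[^{}]*\}", raw):
        try:
            salvaged.append(json.loads(match.group(0)))
        except json.JSONDecodeError:
            continue  # skip fragments that are still not valid JSON
    return salvaged
```

Packing whole file sections keeps each hunk next to its own file header, but it does nothing for code moved between files, so the false-positive concern for moved code still needs its own handling.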
Business Angle
A CLI tool plus git hook that runs Nemotron Ultra locally to review your diffs before push, so no data leaves the machine.
Customer: Mid-level to senior software engineer at a fintech, healthtech, or govtech company with a strict data-residency or IP-protection policy, who is personally frustrated that tools like Copilot and CodeRabbit are blocked by InfoSec but still wants LLM-assisted review without filing a ticket.
Pricing: one-time license at $49; target of about $800 in sales within 3 months (~16 licenses).