AI Pulse

On-Device Private Code Reviewer with Nemotron Ultra

A git pre-push hook that runs Nemotron Ultra locally via llama.cpp and outputs a structured JSON review of your diff before it leaves your machine.

Difficulty: weekend | Stack: Python, llama.cpp (GGUF backend), Nemotron-3-Ultra-GGUF weights, Click CLI, Pydantic

Who this is for

Developers at companies with strict data-residency rules who want LLM code review without sending source to a cloud API.

Build steps

  1. Download Nemotron 3 Ultra GGUF weights and verify they fit in VRAM/RAM; benchmark tokens/sec on target hardware (RTX 3090 or better recommended).
  2. Write a Python wrapper around llama-cpp-python that accepts a git diff string and a Pydantic schema (ReviewComment list with file, line, severity, message fields) and returns structured JSON via constrained decoding (grammar mode).
  3. Build a Click CLI entry point: nemoreview --staged that calls git diff --cached, feeds it to the wrapper, and pretty-prints findings grouped by severity.
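  4. Wire up a git pre-push hook that invokes the CLI, blocks the push if HIGH severity findings exist, and writes a .nemoreview_report.json artifact. At push time the staging area is usually empty, so the hook should review the commit range being pushed (git passes the refs to the hook on stdin) rather than relying on --staged.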
  5. Add a --threshold flag and a project-level .nemoreview.toml config for per-repo severity cutoffs and file-pattern excludes.
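
Steps 4 and 5 can be sketched as a small gate function. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the LOW/MEDIUM/HIGH ordering, the function names, and the glob-style exclude patterns are hypothetical (the source only names HIGH and the report filename). Findings are plain dicts carrying the file, line, severity, and message fields from step 2; in the real tool they would come from the validated Pydantic models.

```python
import fnmatch
import json
from pathlib import Path

# Assumed severity scale; the source only names HIGH.
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def filter_findings(findings, exclude_patterns):
    """Drop findings whose file path matches any exclude glob."""
    return [
        f for f in findings
        if not any(fnmatch.fnmatch(f["file"], pat) for pat in exclude_patterns)
    ]


def blocking_findings(findings, threshold="HIGH"):
    """Return the findings at or above the threshold severity."""
    cutoff = SEVERITY_ORDER[threshold]
    return [f for f in findings if SEVERITY_ORDER[f["severity"]] >= cutoff]


def gate(findings, threshold="HIGH", exclude_patterns=(),
         report_path=".nemoreview_report.json"):
    """Write the report artifact and return a hook exit code (0 allows the push)."""
    kept = filter_findings(findings, exclude_patterns)
    Path(report_path).write_text(json.dumps(kept, indent=2))
    return 1 if blocking_findings(kept, threshold) else 0
```

A hook script could end with sys.exit(gate(findings, threshold, excludes)), since git aborts the push when a pre-push hook exits non-zero. The threshold and exclude values would be read from the --threshold flag and .nemoreview.toml; that loading code is omitted here.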

Risks

  • Nemotron Ultra at full precision may exceed 24 GB VRAM. A quantized GGUF (Q4_K_M) trades quality for fit, so verify that review quality and structured-output reliability remain acceptable at that quantization before relying on it as a gate.
  • Constrained JSON decoding via llama.cpp grammars can time out or loop on malformed partial output; add a hard token-budget limit and fall back to a lenient regex parser when the output is truncated.
  • Git diffs for large refactors can exceed the context window; you need a chunking strategy that preserves enough surrounding context to avoid false positives on moved code.
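
Two of the mitigations above can be sketched with the standard library alone. This is an illustrative sketch, not a definitive design: the function names, the character-based budget (a stand-in for a real token count), and the per-file split are all assumptions. Splitting per file keeps each file's hunks together but does not solve the moved-code problem, and the fallback regex only salvages flat objects whose string values contain no braces.

```python
import json
import re


def split_diff_by_file(diff_text):
    """Split a unified git diff into one string per 'diff --git' section."""
    sections, current = [], []
    for line in diff_text.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def pack_chunks(sections, max_chars):
    """Greedily pack file sections into chunks of at most max_chars.

    A single section larger than max_chars becomes its own oversized chunk;
    splitting it further (for example by hunk) is left out of this sketch.
    """
    chunks, current, size = [], [], 0
    for section in sections:
        if current and size + len(section) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(section)
        size += len(section)
    if current:
        chunks.append("".join(current))
    return chunks


# Matches flat JSON objects only (no nested braces).
_OBJECT_RE = re.compile(r"\{[^{}]*\}")
REQUIRED_KEYS = {"file", "line", "severity", "message"}


def lenient_parse(raw_output):
    """Try strict JSON first, then salvage any complete flat objects."""
    try:
        data = json.loads(raw_output)
        if isinstance(data, list):
            return [d for d in data
                    if isinstance(d, dict) and REQUIRED_KEYS.issubset(d)]
    except json.JSONDecodeError:
        pass
    salvaged = []
    for match in _OBJECT_RE.finditer(raw_output):
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue
        if REQUIRED_KEYS.issubset(obj):
            salvaged.append(obj)
    return salvaged
```

If generation is cut off by the token budget mid-object, lenient_parse keeps the comments that were completed and drops the truncated tail, so a partial review is still usable.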

Business Angle

A CLI tool + git hook that runs Nemotron Ultra locally to review your diffs before push — zero data leaves the machine.

Customer: A mid-level to senior software engineer at a fintech, healthtech, or govtech company with a strict data-residency or IP-protection policy. InfoSec blocks tools like Copilot and CodeRabbit, and this engineer still wants LLM-assisted review without filing a ticket.

Pricing: one-time license at $49. Target: about $800 in sales within 3 months (~16 licenses).
