AI Pulse

Domain-Specialized Offline Assistant via Synthetic Fine-Tuning

Fine-tune a small open-weight model on a narrow regulated domain using cloud-generated synthetic data, then deploy fully air-gapped.

Difficulty: 1-month | Stack: Python, Axolotl or TRL for LoRA fine-tuning, Claude API or GPT-4o for synthetic data generation (one-time), Qwen2.5 7B or Mistral 7B as base, CUDA GPU (A100/H100 rented, or RTX 4090 local), vLLM for serving, FastAPI for the API wrapper, pytest for eval harness

Who this is for

Teams in healthcare, legal, or industrial settings that need reliable domain Q&A but cannot send queries to cloud APIs due to compliance — they want a model they can audit and run on-prem.

Build steps

  1. Pick a narrow domain with clear input/output structure (e.g., ICD-10 coding from clinical notes, contract clause extraction, PLC fault diagnosis); collect 20–50 real examples to anchor quality
  2. Generate synthetic dataset: use a capable cloud model to produce 2,000–5,000 instruction/response pairs in the domain; apply rejection sampling — score outputs with a rubric, drop bottom 20%
  3. Fine-tune with LoRA (r=16, alpha=32) on the base model using Axolotl (or TRL); train for 3 epochs, evaluate on a held-out 10% split, and checkpoint every epoch
  4. Build a domain eval harness: 50 hand-labeled test cases, automated scoring (exact match / F1 / GPT-judge), compare fine-tuned vs. base vs. cloud model
  5. Serve with vLLM behind a FastAPI wrapper; add a confidence-threshold layer that flags low-certainty answers for human review
  6. Package into a Docker image with model weights baked in; verify it runs fully offline and document VRAM requirements
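
Sketch for step 2 (rejection sampling). This is a minimal illustration, assuming each generated pair already carries a numeric rubric score where higher is better. The `Pair` class and `reject_bottom` function are hypothetical names, and how the rubric score is produced is left open.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    instruction: str
    response: str
    score: float  # rubric score, higher is better (assumed to be assigned upstream)


def reject_bottom(pairs: list[Pair], drop_fraction: float = 0.20) -> list[Pair]:
    """Keep the highest-scoring pairs and drop the lowest `drop_fraction` of them."""
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    # Rank best-first so the slice below removes the tail of the ranking.
    ranked = sorted(pairs, key=lambda p: p.score, reverse=True)
    n_drop = int(len(ranked) * drop_fraction)
    return ranked[: len(ranked) - n_drop]
```

With 10 scored pairs and the default fraction, the two lowest-scoring pairs are dropped and eight remain.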
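
Sketch for step 4 (automated scoring). The two deterministic metrics named above, exact match and F1, could look roughly like this. The whitespace-and-lowercase normalization and the token-overlap definition of F1 are assumptions, and a real harness may need domain-specific normalization (for example, for code formats). All function names are hypothetical.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a prediction and a gold answer."""
    p, g = normalize(pred), normalize(gold)
    if not p or not g:
        # Both empty counts as a match; one empty counts as a miss.
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def score_model(predictions: list[str], golds: list[str]) -> dict[str, float]:
    """Average both metrics over the test set for one model."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("predictions and golds must be non-empty and equal length")
    n = len(golds)
    return {
        "exact_match": sum(exact_match(p, g) for p, g in zip(predictions, golds)) / n,
        "f1": sum(token_f1(p, g) for p, g in zip(predictions, golds)) / n,
    }
```

Running `score_model` once per system (fine-tuned, base, cloud) on the same 50 cases gives directly comparable numbers, and the results can be asserted against minimum thresholds in pytest.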
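
Sketch for step 5 (confidence-threshold layer). The brief does not say how confidence is measured. One common option, assumed here, is to take the per-token log-probabilities of the generated answer and use their geometric-mean probability as a rough score. The function names and the 0.75 threshold are hypothetical, and the threshold should be calibrated against the eval set.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities; 0.0 if no tokens were generated."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(answer: str, token_logprobs: list[float], threshold: float = 0.75) -> dict:
    """Attach a confidence score and flag low-certainty answers for human review."""
    confidence = sequence_confidence(token_logprobs)
    return {
        "answer": answer,
        "confidence": confidence,
        "needs_review": confidence < threshold,  # threshold is an assumed value
    }
```

Token-level probability is only a proxy for correctness, so the flag is best treated as a triage signal rather than a guarantee.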

Risks

  • Synthetic data quality ceiling: if the cloud model makes domain errors, fine-tuning bakes them in — need SME spot-checks on at least 10% of training data
  • Catastrophic forgetting on general reasoning if LoRA rank or learning rate is too high — monitor eval on a general benchmark (MMLU subset) in parallel
  • Regulated domains have compliance requirements beyond model accuracy (audit logs, version pinning, explainability) — shipping to prod requires more than a fine-tuned weight file
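
The first two risks can be turned into simple checks. The sketch below is a minimal illustration under stated assumptions: it picks a reproducible random sample of at least 10% of training examples for SME review, and flags forgetting when accuracy on the general benchmark drops by more than a tolerance. The function names and the 0.02 tolerance are hypothetical.

```python
import math
import random


def spot_check_sample(n_examples: int, fraction: float = 0.10, seed: int = 0) -> list[int]:
    """Return sorted indices of training examples to send for SME review."""
    # Round up so the sample never falls below the requested fraction.
    k = math.ceil(n_examples * fraction)
    return sorted(random.Random(seed).sample(range(n_examples), k))


def forgetting_detected(base_acc: float, tuned_acc: float, max_drop: float = 0.02) -> bool:
    """True if the fine-tuned model lost more than `max_drop` accuracy vs. the base model."""
    return (base_acc - tuned_acc) > max_drop
```

Fixing the seed keeps the review sample stable across runs, which also helps with audit trails.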

Business Angle

Sell air-gapped domain AI to compliance-locked teams who can't touch cloud APIs

Customer: IT director or lead engineer at a 50-500 person healthcare clinic, law firm, or industrial manufacturer — they have a GPU server gathering dust, a compliance officer blocking cloud AI, and junior staff drowning in repetitive document Q&A

Pricing: $8,000 one-time per client; 2 clients in the first 3 months = $16k, then aim for 1 new client per month at steady state
