Domain-Specialized Offline Assistant via Synthetic Fine-Tuning
Fine-tune a small open-weight model on a narrow regulated domain using cloud-generated synthetic data, then deploy fully air-gapped.
Difficulty: 1-month | Stack: Python, Axolotl or TRL for LoRA fine-tuning, Claude API or GPT-4o for synthetic data generation (one-time), Qwen2.5-7B or Mistral 7B as base, CUDA GPU (A100/H100 rented, or RTX 4090 local), vLLM for serving, pytest for eval harness
Who this is for
Teams in healthcare, legal, or industrial settings that need reliable domain Q&A but cannot send queries to cloud APIs due to compliance — they want a model they can audit and run on-prem.
Build steps
- Pick a narrow domain with clear input/output structure (e.g., ICD-10 coding from clinical notes, contract clause extraction, PLC fault diagnosis); collect 20–50 real examples to anchor quality
- Generate synthetic dataset: use a capable cloud model to produce 2,000–5,000 instruction/response pairs in the domain; filter with rejection sampling by scoring each pair against a rubric and dropping the bottom 20%
- Fine-tune with LoRA (r=16, alpha=32) on the base model using Axolotl; train 3 epochs, eval on held-out 10% set; checkpoint every epoch
- Build a domain eval harness: 50 hand-labeled test cases, automated scoring (exact match, F1, or an LLM judge such as GPT-4o), compare fine-tuned vs. base vs. cloud model
- Serve with vLLM behind a FastAPI wrapper; add a confidence-threshold layer that flags low-certainty answers for human review
- Package into a Docker image with model weights baked in; verify it runs fully offline and document VRAM requirements
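The filtering and hold-out steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed pipeline: it assumes each synthetic pair already carries a numeric rubric score under a hypothetical `score` key, and the function names `drop_lowest_scored` and `split_train_eval` are invented for this example.

```python
import random


def drop_lowest_scored(examples, drop_percent=20):
    """Keep the best-scoring examples, dropping the bottom drop_percent."""
    ranked = sorted(examples, key=lambda ex: ex["score"], reverse=True)
    n_drop = len(ranked) * drop_percent // 100  # integer math avoids float rounding
    return ranked[: len(ranked) - n_drop]


def split_train_eval(examples, eval_percent=10, seed=0):
    """Shuffle reproducibly, then hold out eval_percent of the data for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, len(shuffled) * eval_percent // 100)
    return shuffled[n_eval:], shuffled[:n_eval]


# Toy stand-in for rubric-scored synthetic pairs; the scores here are made up.
pairs = [
    {"instruction": f"question {i}", "response": f"answer {i}", "score": i % 10}
    for i in range(100)
]

kept = drop_lowest_scored(pairs)
train, held_out = split_train_eval(kept)
print(len(kept), len(train), len(held_out))
```

Fixing the shuffle seed keeps the held-out set stable across runs, so scores from different checkpoints stay comparable.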
Risks
- Synthetic data quality ceiling: if the cloud model makes domain errors, fine-tuning bakes them in; have a subject-matter expert (SME) spot-check at least 10% of the training data
- Catastrophic forgetting on general reasoning if LoRA rank or learning rate is too high — monitor eval on a general benchmark (MMLU subset) in parallel
- Regulated domains have compliance requirements beyond model accuracy (audit logs, version pinning, explainability) — shipping to prod requires more than a fine-tuned weight file
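Two of the mitigations above lend themselves to small helpers: drawing a reproducible sample for SME review, and flagging a drop on the general benchmark. The sketch below is one possible shape, under stated assumptions. The names `sme_review_sample` and `forgetting_detected` are hypothetical, and the 2-point tolerance (`max_drop=0.02`) is an arbitrary placeholder rather than a recommended threshold.

```python
import random


def sme_review_sample(examples, percent=10, seed=0):
    """Pick a reproducible random subset covering at least `percent` of the data."""
    k = max(1, (len(examples) * percent + 99) // 100)  # ceiling division
    return random.Random(seed).sample(examples, k)


def forgetting_detected(base_accuracy, tuned_accuracy, max_drop=0.02):
    """Return True if the fine-tuned model lost more than max_drop on the general benchmark."""
    return (base_accuracy - tuned_accuracy) > max_drop


# Toy data: 2,000 placeholder training examples and made-up benchmark accuracies.
training_data = [{"id": i} for i in range(2000)]
to_review = sme_review_sample(training_data)
print(len(to_review))

print(forgetting_detected(base_accuracy=0.62, tuned_accuracy=0.55))
print(forgetting_detected(base_accuracy=0.62, tuned_accuracy=0.61))
```

Running the benchmark check after every checkpointed epoch would show whether general ability degrades as training progresses, instead of only at the end.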
Business angle
Sell air-gapped domain AI to compliance-locked teams who can't touch cloud APIs
Customer: IT director or lead engineer at a 50-500 person healthcare clinic, law firm, or industrial manufacturer — they have a GPU server gathering dust, a compliance officer blocking cloud AI, and junior staff drowning in repetitive document Q&A
Pricing: $8,000 one-time fee per client; target 2 clients in the first 3 months ($16k), then 1 new client per month at steady state