AI Pulse

Privacy-First Desktop Automation Agent

Natural-language task runner for GUI automation using a locally-hosted computer-use model — screen data never leaves the machine.

Difficulty: weekend | Stack: Python, Holo3.1 (via Ollama or Hugging Face transformers), PyAutoGUI for input control (optionally pygetwindow for window management), Pillow (PIL) for screenshots, FastAPI for an optional local REST trigger

Who this is for

Developers and power users who want Copilot-style automation for their desktop but won’t send screenshots to a cloud endpoint, a constraint common in finance, law, and healthcare.

Build steps

  1. Serve Holo3.1 locally via Ollama or transformers pipeline; verify screenshot → action inference works on a simple open-browser task
  2. Build a screen-capture loop: grab screenshot every N ms, encode to base64, send to local model with a task prompt
  3. Parse model output into pyautogui calls (click x,y / type text / key combo); add a dry-run mode that prints actions without executing
  4. Add a simple task queue: user types goal in terminal, agent loops until done or hits max-steps guard
  5. Wire a stop-hotkey (global keyboard listener) to kill the loop safely
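
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not Holo3.1's actual interface: it assumes the model can be prompted to reply with one JSON object per step (a hypothetical schema such as `{"action": "click", "x": 10, "y": 20}`), and `ask_model` is a placeholder callable standing in for the local inference call. The names `Action`, `parse_action`, `execute`, and `run_task` are invented for this example.

```python
import json
import threading
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "click", "type", "hotkey", or "done"
    x: int = 0
    y: int = 0
    text: str = ""
    keys: tuple = ()


def parse_action(raw: str) -> Action:
    """Parse one model reply. The JSON schema here is an ASSUMPTION."""
    data = json.loads(raw)
    kind = data["action"]
    if kind == "click":
        return Action("click", x=int(data["x"]), y=int(data["y"]))
    if kind == "type":
        return Action("type", text=str(data["text"]))
    if kind == "hotkey":
        return Action("hotkey", keys=tuple(data["keys"]))
    if kind == "done":
        return Action("done")
    raise ValueError(f"unknown action: {kind!r}")


def execute(action: Action, dry_run: bool = True) -> str:
    """Describe the action; touch the real GUI only when dry_run is False."""
    if action.kind == "click":
        desc = f"click({action.x}, {action.y})"
    elif action.kind == "type":
        desc = f"write({action.text!r})"
    elif action.kind == "hotkey":
        desc = f"hotkey({', '.join(action.keys)})"
    else:
        desc = action.kind
    if dry_run:
        return "[dry-run] " + desc
    import pyautogui  # imported lazily so dry runs need no display
    if action.kind == "click":
        pyautogui.click(action.x, action.y)
    elif action.kind == "type":
        pyautogui.write(action.text)
    elif action.kind == "hotkey":
        pyautogui.hotkey(*action.keys)
    return desc


def run_task(goal, ask_model, max_steps=20, dry_run=True, stop=None):
    """Loop until the model says done, the stop event fires, or max_steps hits."""
    if stop is None:
        stop = threading.Event()
    log = []
    for step in range(max_steps):
        if stop.is_set():          # kill-switch checked before every action
            log.append("stopped")
            break
        action = parse_action(ask_model(goal, step))
        if action.kind == "done":
            log.append("done")
            break
        log.append(execute(action, dry_run))
    else:
        log.append("max-steps reached")  # guard against a runaway loop
    return log
```

The stop-hotkey from step 5 fits here as a global keyboard listener (for example from the pynput package) that calls `stop.set()` on the shared `threading.Event`. PyAutoGUI's built-in fail-safe, which raises `pyautogui.FailSafeException` when the mouse is moved to a screen corner, is a useful second line of defence once `dry_run` is switched off.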

Risks

  • Holo3.1's action output format may not map directly onto pyautogui calls; expect to need a prompt template tuned to its output schema, plus a parser that translates each action
  • Per-screenshot inference latency on CPU-only machines will likely make the loop too slow for fast-changing UIs; may need to cap screenshot resolution or run the model on a GPU
  • Runaway agent with no stop condition can destructively click through anything — must ship the kill-switch before testing
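
If screenshots are downscaled to reduce latency, the coordinates the model predicts refer to the smaller image and have to be mapped back to the real screen before clicking. A minimal sketch of that bookkeeping follows; the helper names `capped_size` and `to_screen` and the 1280-pixel default are illustrative assumptions, not values recommended by the model's authors.

```python
def capped_size(width: int, height: int, max_side: int = 1280):
    """Return (new_w, new_h, scale) so the longer side is at most max_side.

    Images already within the cap are left at their original size (scale 1.0).
    """
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale), scale


def to_screen(x: float, y: float, scale: float):
    """Map a point predicted on the downscaled image back to screen pixels."""
    return round(x / scale), round(y / scale)
```

With Pillow, the first two return values can be passed to `Image.resize((new_w, new_h))` on the captured screenshot, and `to_screen` applied to every click coordinate the model returns.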

Business Angle

Local GUI automation agent for regulated-industry knowledge workers who can't send screenshots to the cloud

Customer: Solo compliance analyst or paralegal at a 10–50 person firm — owns their own machine, runs repetitive multi-app workflows (copy from court portal → paste into case management → log in spreadsheet), IT won't approve cloud tools, personally accountable if data leaks

Pricing: one-time $99 lifetime license; target of about $1,200 in the first 90 days (12 licenses × $99 = $1,188)
