AI Pulse

Privacy-Preserving Federated ASR Adapter Aggregator

A minimal federated learning server that aggregates LoRA adapter updates from multiple edge clinic nodes without ever collecting raw audio, then redistributes an improved shared adapter.

Difficulty: 1-month | Stack: Python, PyTorch, PEFT, Flower (flwr), FastAPI, gRPC, Docker Compose, SQLite

Who this is for

A hospital network or telehealth platform running identical ASR devices at multiple sites — each site adapts to its own speakers, but all sites benefit from aggregated dialect knowledge without sharing patient audio.

Build steps

  1. Stand up a Flower federated learning server with a custom FedAvg strategy that aggregates only LoRA adapter weight deltas (not full model weights), keeping communication payload under 10 MB per round.
  2. Implement a Flower client that wraps the on-device Whisper + PEFT training loop from Project 1; each round it receives the global adapter, fine-tunes locally for N steps on its private audio buffer, then sends back the weight delta.
  3. Add differential privacy to the client’s updates before transmission: clip the gradients and inject Gaussian noise (Gaussian mechanism via Opacus), with a configurable ε budget so operators can tune the privacy-utility tradeoff.
  4. Build a FastAPI coordination layer that tracks client participation, enforces a minimum quorum (e.g., 3 of 5 nodes must contribute) before aggregation, and stores global adapter checkpoints with metadata (round, participating_sites, avg_WER).
  5. Create a Docker Compose setup with one server container and three simulated client containers, each training on a different accent subset of Mozilla Common Voice, so a developer can run the full federated cycle locally and measure WER improvement per round.
  6. Write an evaluation script that replays federation rounds and plots per-site WER over rounds, showing that the shared adapter outperforms both the base model and site-only adaptation after ~5 rounds.
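
The aggregation and quorum rules in steps 1 and 4 can be sketched without any framework. The block below is a minimal illustration in plain Python, not Flower's strategy API: `aggregate_deltas`, `is_lora_key`, and the `AdapterDelta` type are hypothetical names, and the filter assumes LoRA tensors can be recognised by `lora_A` / `lora_B` appearing in the parameter name. Each update is weighted by the number of local examples, as in standard FedAvg.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flattened delta values.
AdapterDelta = Dict[str, List[float]]


def is_lora_key(name: str) -> bool:
    """Assumption: LoRA tensors carry 'lora_A' or 'lora_B' in their name."""
    return "lora_A" in name or "lora_B" in name


def aggregate_deltas(
    updates: List[Tuple[AdapterDelta, int]],
    min_quorum: int = 3,
) -> AdapterDelta:
    """Weighted average of LoRA deltas; each update is (delta, num_examples)."""
    # Refuse to aggregate until enough sites have contributed this round.
    if len(updates) < min_quorum:
        raise ValueError(
            f"quorum not met: got {len(updates)}, need {min_quorum}"
        )
    total_examples = sum(n for _, n in updates)
    if total_examples <= 0:
        raise ValueError("updates must report a positive example count")

    # Only adapter tensors are aggregated; any other keys are ignored.
    keys = [k for k in updates[0][0] if is_lora_key(k)]
    aggregated: AdapterDelta = {}
    for key in keys:
        acc = [0.0] * len(updates[0][0][key])
        for delta, n in updates:
            weight = n / total_examples  # FedAvg weight for this site
            for i, value in enumerate(delta[key]):
                acc[i] += weight * value
        aggregated[key] = acc
    return aggregated
```

In a real server the same logic would live inside a custom Flower strategy and operate on tensors rather than lists. Note that averaging the A and B matrices separately is not the same as averaging their product, so this is one possible aggregation choice rather than the only one.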

Risks

  • Differential privacy kills utility on small datasets: at clinically realistic dataset sizes (hundreds of utterances per site), the noise required for a privacy budget of ε < 2 often prevents any meaningful WER improvement. You may need to demonstrate the tradeoff honestly rather than claim a working system.
  • Flower version instability: flwr’s API changes significantly between minor versions. Pin your dependencies tightly and explicitly test client-server version mismatches; otherwise federation can fail with cryptic gRPC errors.
  • Simulated federation gap: Docker Compose clients on one machine share CPU cache and memory bandwidth, so real-time factor (RTF) and timing results won’t transfer to real multi-device deployments. Flag this clearly in your README so users don’t deploy based on simulated performance numbers.
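
To make the first risk concrete, the sketch below clips an update to a fixed L2 norm and adds Gaussian noise, then reports a rough signal-to-noise ratio. It is a simplified update-level illustration in plain Python, not Opacus's per-sample DP-SGD; `clip_and_noise`, `signal_to_noise`, `clip_norm`, and `noise_multiplier` are hypothetical names. Translating a noise multiplier into an ε budget needs a privacy accountant and is deliberately left out.

```python
import math
import random
from typing import List


def l2_norm(values: List[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


def clip_and_noise(
    delta: List[float],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip delta to L2 norm <= clip_norm, then add N(0, sigma^2) per entry."""
    norm = l2_norm(delta)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm  # assumed noise calibration
    return [v * scale + rng.gauss(0.0, sigma) for v in delta]


def signal_to_noise(
    delta: List[float], clip_norm: float, noise_multiplier: float
) -> float:
    """Clipped update norm divided by the root-mean-square noise norm."""
    if noise_multiplier == 0:
        return math.inf
    clipped_norm = min(l2_norm(delta), clip_norm)
    # The RMS norm of d independent N(0, sigma^2) draws is sigma * sqrt(d).
    noise_norm = noise_multiplier * clip_norm * math.sqrt(len(delta))
    return clipped_norm / noise_norm
```

Because the noise norm grows with the square root of the number of parameters while the clipped update norm is capped, the ratio shrinks as the adapter grows. Logging it per round is one inexpensive way to show the privacy-utility tradeoff rather than assert it.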