Privacy-Preserving Federated ASR Adapter Aggregator
A minimal federated learning server that aggregates LoRA adapter updates from multiple edge clinic nodes without ever collecting raw audio, then redistributes an improved shared adapter.
Difficulty: 1-month | Stack: Python, PyTorch, PEFT, Flower (flwr), FastAPI, gRPC, Docker Compose, SQLite
Who this is for
A hospital network or telehealth platform running identical ASR devices at multiple sites — each site adapts to its own speakers, but all sites benefit from aggregated dialect knowledge without sharing patient audio.
Build steps
- Stand up a Flower federated learning server with a custom FedAvg strategy that aggregates only LoRA adapter weight deltas (not full model weights), keeping communication payload under 10 MB per round.
- Implement a Flower client that wraps the on-device Whisper + PEFT training loop from Project 1; each round it receives the global adapter, fine-tunes locally for N steps on its private audio buffer, then sends back the weight delta.
- Add differential privacy to the client’s updates using the Gaussian mechanism via Opacus, which clips per-sample gradients and adds noise during local training so the weight delta is already privatized before transmission. Expose a configurable ε budget so operators can tune the privacy-utility tradeoff.
- Build a FastAPI coordination layer that tracks client participation, enforces a minimum quorum (e.g., 3 of 5 nodes must contribute) before aggregation, and stores global adapter checkpoints with metadata (round, participating_sites, avg_WER).
- Create a Docker Compose setup with one server container and three simulated client containers, each training on a different accent subset of Mozilla Common Voice, so a developer can run the full federated cycle locally and measure WER improvement per round.
- Write an evaluation script that replays federation rounds and plots per-site WER over rounds, showing that the shared adapter outperforms both the base model and site-only adaptation after ~5 rounds.
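The aggregation and quorum steps above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the Flower strategy itself: tensors are flattened to plain float lists, and the names `aggregate_lora_deltas`, `is_lora_key`, and the `min_quorum` default of 3 are hypothetical. It assumes LoRA tensors can be recognized by a `lora_` substring in their name, which matches PEFT's usual `lora_A` / `lora_B` naming but should be checked against your adapter's actual state dict.

```python
from typing import Dict, List, Tuple

# One adapter delta: tensor name -> flattened values (plain lists keep this dependency-free).
AdapterDelta = Dict[str, List[float]]
# One client update: (adapter delta, number of local training examples).
ClientUpdate = Tuple[AdapterDelta, int]


def is_lora_key(name: str) -> bool:
    """Assumption: LoRA tensors carry 'lora_' in their name (e.g. lora_A, lora_B)."""
    return "lora_" in name


def aggregate_lora_deltas(updates: List[ClientUpdate], min_quorum: int = 3) -> AdapterDelta:
    """Example-weighted FedAvg over LoRA deltas only, refusing to run below quorum."""
    if len(updates) < min_quorum:
        raise ValueError(f"quorum not met: got {len(updates)}, need {min_quorum}")
    total_examples = sum(n for _, n in updates)
    if total_examples <= 0:
        raise ValueError("clients reported no training examples")

    averaged: AdapterDelta = {}
    reference_delta = updates[0][0]
    for key in (k for k in reference_delta if is_lora_key(k)):
        acc = [0.0] * len(reference_delta[key])
        for delta, n in updates:
            weight = n / total_examples  # FedAvg weights each client by its data size
            for i, value in enumerate(delta[key]):
                acc[i] += weight * value
        averaged[key] = acc
    return averaged
```

In a real server the same logic would operate on NumPy arrays or tensors. In Flower 1.x releases it would typically sit in an overridden `aggregate_fit` of a `FedAvg` subclass, though the exact hook should be confirmed against the pinned flwr version.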
Risks
- Differential privacy kills utility on small datasets: at clinically realistic dataset sizes (hundreds of utterances per site), the noise required for ε < 2 often prevents any meaningful WER improvement. You may need to demonstrate the tradeoff honestly rather than claim a working system.
- Flower version instability: flwr’s API changes significantly between minor versions. Pin your dependencies tightly and test client-server version mismatch explicitly; otherwise federation can fail with cryptic gRPC errors.
- Simulated federation gap: Docker Compose clients on one machine share CPU cache and memory bandwidth, so real-time factor (RTF) and timing results won’t transfer to real multi-device deployments. Flag this clearly in your README so users don’t deploy based on simulated performance numbers.
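To make the first risk concrete, the sketch below shows how the noise scale grows as ε shrinks. It is a simplified stand-in, not what Opacus does: Opacus applies DP-SGD per sample and tracks the budget with a privacy accountant, whereas this example clips one whole update and applies the classical Gaussian-mechanism bound σ = sensitivity · sqrt(2 ln(1.25/δ)) / ε, which is only proven for 0 < ε < 1. The function names and the choice of sensitivity are assumptions for illustration; the correct sensitivity depends on the adjacency definition you adopt.

```python
import math
import random
from typing import List


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise std from the classical Gaussian-mechanism bound (valid for 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical bound only holds for 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip_and_noise(update: List[float], clip_norm: float, sigma: float,
                   rng: random.Random) -> List[float]:
    """Scale the update to L2 norm <= clip_norm, then add i.i.d. Gaussian noise."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = 1.0 if norm <= clip_norm else clip_norm / norm
    return [v * scale + rng.gauss(0.0, sigma) for v in update]


def rms_noise_norm(sigma: float, dim: int) -> float:
    """Root-mean-square L2 norm of the noise vector: sigma * sqrt(dim)."""
    return sigma * math.sqrt(dim)
```

Comparing `rms_noise_norm(gaussian_sigma(eps, delta, clip_norm), dim)` against `clip_norm` for a few ε values gives a quick sense of how far the noise can exceed the clipped signal when the adapter has many parameters and each site contributes only a few hundred utterances.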