Privacy

Federated Learning Without Compromise

The Privacy-Utility Tradeoff

Federated learning promises to train models on distributed data without centralizing sensitive information. In practice, existing approaches force uncomfortable tradeoffs:

- Differential privacy adds noise that degrades model quality
- Secure aggregation increases communication costs
- Data heterogeneity causes convergence problems
- Byzantine participants can poison the model

We present techniques that mitigate these tradeoffs.

Our Approach

Adaptive Clipping

Standard gradient clipping uses a fixed threshold $C$:

$$g_i^{\text{clipped}} = g_i \cdot \min\left(1, \frac{C}{\|g_i\|}\right)$$...
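The fixed-threshold clipping formula above can be sketched in a few lines. This is a minimal illustration, not code from the post; the function name `clip_gradient` is hypothetical.

```python
import numpy as np

def clip_gradient(g, C):
    """Scale gradient g so its L2 norm is at most C (fixed threshold)."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g  # zero gradient stays zero
    # min(1, C / ||g||) leaves small gradients untouched and
    # rescales large ones onto the ball of radius C.
    return g * min(1.0, C / norm)

g = np.array([3.0, 4.0])            # ||g|| = 5
clipped = clip_gradient(g, C=1.0)   # rescaled to norm 1, same direction
```

Gradients already inside the threshold pass through unchanged, which is why the `min` with 1 appears in the formula.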

May 30, 2022 · 3 min · 438 words · Zach Kelling

Federated Learning for Open AI

Training large language models requires vast amounts of data. That data often contains sensitive information. Federated learning offers a path to train on distributed, private data without centralizing it.

The Centralization Problem

Traditional ML training follows a simple pattern: collect data, aggregate it centrally, train models. This creates problems:

- Privacy risk: sensitive data leaves user control
- Legal barriers: regulations prevent data movement across jurisdictions
- Trust requirements: data holders must trust the training party
- Single points of failure: central aggregation creates vulnerabilities

Federated Learning Basics

Federated learning inverts the pattern....
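The inverted pattern can be sketched as one round of federated averaging: clients train locally on data that never leaves them, and only model weights are aggregated. This is an illustrative sketch, not the post's implementation; the local least-squares objective and the names `local_update` and `federated_round` are assumptions.

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    # Hypothetical local step: one gradient-descent update on a
    # least-squares objective, standing in for each client's private training.
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(global_weights, clients):
    """One round: each client trains locally; only weights are shared."""
    updates = [local_update(global_weights.copy(), d) for d in clients]
    sizes = np.array([len(d[1]) for d in clients], dtype=float)
    # Size-weighted average of client models; raw data stays on-device.
    return np.average(updates, axis=0, weights=sizes)
```

The aggregator sees only weight vectors, never examples, which is what removes the central data collection step from the traditional pipeline.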

May 9, 2022 · 3 min · 510 words · Zach Kelling