Practical Locally Private Federated Learning with Communication Efficiency
Federated learning (FL) trains machine learning models from decentralized data sources. We study FL under local differential privacy constraints, which provide strong protection against sensitive data disclosure by obfuscating data before it leaves the client. We identify two major concerns in designing practical privacy-preserving FL algorithms: communication efficiency and high-dimensional compatibility. We then develop a gradient-based learning algorithm called \emph{sqSGD} (selective quantized stochastic gradient descent) that addresses both concerns. The proposed algorithm is based on a novel privacy-preserving quantization scheme that uses a constant number of bits per dimension per client. We improve the base algorithm in two ways: first, we apply a gradient subsampling strategy that simultaneously offers better training performance and smaller communication costs under a fixed privacy budget; second, we utilize randomized rotation as a preprocessing step to reduce quantization error. We also initiate a discussion of the roles of quantization and perturbation in FL algorithm design under privacy and communication constraints. Finally, we demonstrate the practicality of the proposed framework on benchmark datasets. Experimental results show that sqSGD successfully learns large models like LeNet and ResNet under local privacy constraints. In addition, at a fixed privacy and communication level, sqSGD significantly outperforms baselines that do not involve quantization.
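To make the two core ingredients concrete, the following is a minimal illustrative sketch of unbiased stochastic quantization combined with a randomized-rotation preprocessing step. It is not the paper's sqSGD algorithm: the function names are hypothetical, the privacy-noise mechanism is omitted, and a Haar-random orthogonal matrix stands in for whatever structured rotation the paper actually uses.

```python
import numpy as np

def random_rotation(d, rng):
    """Haar-random orthogonal d x d matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix makes the distribution uniform

def stochastic_quantize(v, num_bits, rng):
    """Unbiased quantization of v onto 2**num_bits uniform levels in [-r, r]."""
    levels = 2 ** num_bits - 1
    r = np.max(np.abs(v)) + 1e-12          # grid radius (guard against r == 0)
    scaled = (v + r) / (2 * r) * levels    # map [-r, r] -> [0, levels]
    low = np.floor(scaled)
    # Randomized rounding: round up with probability equal to the remainder,
    # so the quantized value is unbiased in expectation.
    q = low + (rng.random(v.shape) < scaled - low)
    return q.astype(np.int64), r

def dequantize(q, r, num_bits):
    levels = 2 ** num_bits - 1
    return q / levels * (2 * r) - r

def reconstruct(grad, rotation, num_bits, rng):
    """One round trip: rotate, quantize, dequantize, rotate back."""
    q, r = stochastic_quantize(rotation @ grad, num_bits, rng)
    return rotation.T @ dequantize(q, r, num_bits)
```

Because the rounding is unbiased and the rotation is linear, averaging many reconstructions recovers the original gradient; the rotation spreads each gradient's energy evenly across coordinates, which shrinks the grid radius `r` and hence the per-coordinate quantization error for vectors with a few large entries.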