Quantizing data for distributed learning

We consider machine learning applications that train a model by leveraging data distributed over a network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck by compressing gradient updates. However, as models grow larger, so does the size of the gradient updates. In this paper, we propose an alternative approach that quantizes data instead of gradients, and can support learning in applications where the size of gradient updates is prohibitive. Our approach combines aspects of: (1) sample selection; (2) dataset quantization; and (3) gradient compensation. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order-optimal convergence rates with communication that primarily depends on the data rather than the model (gradient) dimension. We use our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order of magnitude savings over gradient compression methods.
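The abstract only outlines the approach, so the following minimal Python sketch is meant merely to convey the core idea of communicating quantized data once rather than full-precision gradients at every round. The function names (`uniform_quantize`, `train_on_quantized_data`) and all hyperparameters are illustrative assumptions, not the paper's algorithm, and the sample-selection and gradient-compensation components are omitted.

```python
import numpy as np

def uniform_quantize(x, num_bits=4):
    """Uniformly quantize an array to 2**num_bits levels over its range."""
    levels = 2 ** num_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)
    return q * scale + lo  # dequantized representation shipped over the network

def train_on_quantized_data(X_q, y, steps=100, lr=0.1):
    """Server-side least-squares training on the quantized dataset."""
    w = np.zeros(X_q.shape[1])
    for _ in range(steps):
        grad = X_q.T @ (X_q @ w - y) / len(y)
        w -= lr * grad
    return w

# Worker side: quantize the local data once and send it, instead of
# exchanging a model-dimensional gradient update every iteration.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=256)
X_q = uniform_quantize(X, num_bits=4)
w_hat = train_on_quantized_data(X_q, y)
```

In this toy setting the communication cost scales with the (quantized) data rather than with the number of model parameters, which is the trade-off the abstract highlights.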
