Withthegrowingimportanceoflargenetwork modelsandenormoustrainingdatasets,GPUs havebecomeincreasinglynecessarytotrainneural networks.Thisislargelybecauseconventional optimizationalgorithmsrelyonstochastic gradientmethodsthatdon’tscalewelltolarge numbersofcoresina