ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA