A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay, by Leslie N. Smith