我们建议 米什 ,一种新颖的自规则非单调激活函数,可以在数学上定义为: F(X)=X谭⁡(sØFŤp升üs(X)) 。由于激活功能在神经网络的性能和训练动力学中起着至关重要的作用,因此我们针对几种架构和激活功能的最佳组合,在几个知名基准上进行了实验验证。..

Mish: A Self Regularized Non-Monotonic Activation Function

We propose $\textit{Mish}$, a novel self-regularized non-monotonic activation function which can be mathematically defined as: $f(x)=x\tanh(softplus(x))$. As activation functions play a crucial role in the performance and training dynamics in neural networks, we validated experimentally on several well-known benchmarks against the best combinations of architectures and activation functions.We also observe that data augmentation techniques have a favorable effect on benchmarks like ImageNet-1k and MS-COCO across multiple architectures. For example, Mish outperformed Leaky ReLU on YOLOv4 with a CSP-DarkNet-53 backbone on average precision ($AP_{50}^{val}$) by 2.1$\%$ in MS-COCO object detection and ReLU on ResNet-50 on ImageNet-1k in Top-1 accuracy by $\approx$1$\%$ while keeping all other network parameters and hyperparameters constant. Furthermore, we explore the mathematical formulation of Mish in relation with the Swish family of functions and propose an intuitive understanding on how the first derivative behavior may be acting as a regularizer helping the optimization of deep neural networks. Code is publicly available at https://github.com/digantamisra98/Mish.