ByteCover: Cover Song Identification via Multi-Loss Training

We present ByteCover, a new feature learning method for cover song identification (CSI). ByteCover builds on the classical ResNet model and adds two improvements that strengthen the model for CSI. First, we integrate instance normalization (IN) and batch normalization (BN) into IBN blocks, the major components of our ResNet-IBN model. With the IBN blocks, our CSI model learns features that are invariant to changes in musical attributes such as key, tempo, timbre, and genre, while preserving version information. Second, we employ the BNNeck method to enable multi-loss training, jointly optimizing a classification loss and a triplet loss so that the inter-class discrimination and intra-class compactness of cover songs are ensured at the same time. Experiments on multiple datasets demonstrate the effectiveness and efficiency of ByteCover; on the Da-TACOS dataset, ByteCover outperformed the best competing system by 20.9%.
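
The multi-loss idea can be illustrated with a minimal pure-Python sketch. It is not the ByteCover implementation: it assumes the usual BNNeck arrangement, in which the triplet loss is computed on the raw embedding and the classification (cross-entropy) loss on a batch-normalized copy of it. The function names, the margin of 0.3, the equal weighting of the two losses, and the toy data are all illustrative assumptions, and the batch normalization here has no learned scale or shift.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature dimension across the batch (no learned scale/shift)."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(d)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
            for row in batch]

def linear(x, weights):
    """Bias-free linear classifier: one weight row per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def multi_loss(embeddings, labels, weights, triplets, margin=0.3):
    """Sum of mean cross-entropy (on normalized features) and mean triplet loss (on raw features)."""
    normed = batch_norm(embeddings)
    ce = sum(cross_entropy(linear(x, weights), y)
             for x, y in zip(normed, labels)) / len(labels)
    tri = sum(triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
              for a, p, n in triplets) / len(triplets)
    return ce + tri

# Toy batch: two versions of song 0 and one version of song 1 (made-up numbers).
embeddings = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
labels = [0, 0, 1]
weights = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical 2-class classifier
triplets = [(0, 1, 2)]               # (anchor, positive, negative) indices
print(multi_loss(embeddings, labels, weights, triplets))
```

In this arrangement the triplet term pulls versions of the same song together and pushes other songs away in the embedding space, while the classification term separates song classes on the normalized features; a real system would compute both on the output of the trained network and backpropagate through it.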
