Densely connected multidilated convolutional networks for dense prediction tasks
Tasks that involve high-resolution dense prediction require modeling both local and global patterns in a large input field. Although local and global structures often depend on each other and modeling them simultaneously is important, many convolutional neural network (CNN)-based approaches exchange representations between different resolutions only a few times. In this paper, we argue for the importance of dense, simultaneous modeling of multiresolution representations and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net). D3Net uses a novel multidilated convolution that applies different dilation factors within a single layer to model different resolutions simultaneously. By combining the multidilated convolution with the DenseNet architecture, D3Net incorporates multiresolution learning with an exponentially growing receptive field in almost all layers, while avoiding the aliasing problem that occurs when dilated convolution is naively incorporated into DenseNet. Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method outperforms state-of-the-art methods.
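To make the idea of a multidilated convolution more concrete, the following is a minimal pure-Python sketch in one dimension. It is not the authors' implementation: it assumes that each input channel is filtered with its own dilation factor and that the per-channel results are summed into one output. The function names, the zero padding, and the kernel size of 3 are illustrative assumptions.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated correlation (odd kernel length assumed)."""
    half = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position t.
            idx = t + (k - half) * dilation
            if 0 <= idx < n:  # positions outside the signal count as zero
                acc += w * signal[idx]
        out[t] = acc
    return out


def multidilated_conv1d(channels, kernels, dilations):
    """Sum of per-channel dilated correlations, one dilation factor per channel.

    Hypothetical helper: `channels`, `kernels`, and `dilations` are parallel
    lists, so channel i is filtered with kernels[i] at dilation dilations[i].
    """
    n = len(channels[0])
    out = [0.0] * n
    for signal, kernel, dilation in zip(channels, kernels, dilations):
        filtered = dilated_conv1d(signal, kernel, dilation)
        out = [a + b for a, b in zip(out, filtered)]
    return out


# Two channels carrying the same unit impulse, filtered at dilations 1 and 2.
impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
result = multidilated_conv1d(
    channels=[impulse, impulse],
    kernels=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    dilations=[1, 2],
)
```

In this toy setup the single output mixes a narrow response (dilation 1) with a wider one (dilation 2), which is the sense in which one layer can cover several resolutions at once.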