Conditioning Trick for Training Stable GANs

In this paper we propose a conditioning trick, called difference departure from normality, applied to the generator network in response to instability issues during GAN training. We force the generator to get closer to the departure-from-normality function of real samples, computed in the spectral domain of the Schur decomposition. This binding makes the generator amenable to truncation without limiting its exploration of all possible modes. We slightly modify the BigGAN architecture, incorporating residual networks, to synthesize 2D representations of audio signals, which enables reconstructing high-quality sounds with some phase information preserved. Additionally, the proposed conditional training scenario makes a trade-off between fidelity and variety in the generated spectrograms. Experimental results on the UrbanSound8k and ESC-50 environmental sound datasets and the Mozilla Common Voice dataset show that the proposed GAN configuration with the conditioning trick remarkably outperforms baseline architectures according to three objective metrics: inception score, Fréchet inception distance, and signal-to-noise ratio.
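As a concrete illustration of the quantity the abstract refers to (a sketch of the standard definition, not the authors' implementation), Henrici's departure from normality of a matrix can be computed in the Schur spectral domain: for A = QTQ^H with T upper triangular, it is the Frobenius norm of the strictly upper-triangular part of T, which is zero exactly when A is normal.

```python
import numpy as np
from scipy.linalg import schur

def departure_from_normality(a: np.ndarray) -> float:
    """Henrici's departure from normality via the Schur decomposition.

    For A = Q T Q^H with T upper triangular, the Frobenius norm of the
    strictly upper-triangular part of T equals
    sqrt(||A||_F^2 - sum_i |lambda_i|^2) and vanishes iff A is normal.
    """
    t, _ = schur(a.astype(complex), output="complex")
    off_diag = t - np.diag(np.diag(t))  # strictly upper-triangular residue
    return float(np.linalg.norm(off_diag, "fro"))

# A symmetric (hence normal) matrix has zero departure from normality.
sym = np.array([[2.0, 1.0], [1.0, 3.0]])
print(round(departure_from_normality(sym), 6))  # 0.0

# A nilpotent (highly non-normal) matrix has positive departure.
nonnormal = np.array([[0.0, 5.0], [0.0, 0.0]])
print(round(departure_from_normality(nonnormal), 6))  # 5.0
```

The conditioning trick described above would then penalize the gap between this scalar computed on generated feature matrices and on real ones; the function and variable names here are illustrative only.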
