Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition

In this paper, we propose novel stochastic modeling of various components of a continuous sign language recognition (CSLR) system that is based on the transformer encoder and connectionist temporal classification (CTC). Most importantly, we model each sign gloss with multiple states, where the number of states is a categorical random variable that follows a learned probability distribution, providing stochastic fine-grained labels for training the CTC decoder. We further propose a stochastic frame dropping mechanism and a gradient stopping method to deal with the severe overfitting problem in training the transformer model with CTC loss. These two methods also significantly reduce the training computation, in terms of both time and space. We evaluate our model on popular CSLR datasets and show its effectiveness compared to state-of-the-art methods.
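To make the two stochastic components concrete, the following is a minimal sketch, not the method as implemented in the paper. It assumes a fixed, hand-written table `state_probs` in place of the learned distribution, and it assumes frames are dropped independently with a fixed probability; the abstract does not specify either detail. The names `expand_glosses`, `drop_frames`, `state_probs`, and `keep_prob` are hypothetical.

```python
import random


def expand_glosses(glosses, state_probs, rng):
    """Expand each gloss into a sampled number of state labels.

    state_probs[g][k - 1] is the assumed probability that gloss g has k states.
    In the paper this distribution is learned; here it is a fixed table.
    """
    labels = []
    for g in glosses:
        probs = state_probs[g]
        # Sample the number of states from a categorical distribution.
        n_states = rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]
        # Emit one fine-grained label (gloss, state index) per state.
        labels.extend((g, s) for s in range(n_states))
    return labels


def drop_frames(frames, keep_prob, rng):
    """Keep each frame independently with probability keep_prob (assumed scheme)."""
    kept = [f for f in frames if rng.random() < keep_prob]
    # Avoid an empty sequence: fall back to a single randomly chosen frame.
    return kept or [rng.choice(frames)]


rng = random.Random(0)
state_probs = {"HELLO": [0.2, 0.5, 0.3], "WORLD": [0.0, 1.0]}
labels = expand_glosses(["HELLO", "WORLD"], state_probs, rng)
frames = drop_frames(list(range(20)), keep_prob=0.5, rng=rng)
```

In this sketch, `labels` would serve as the target sequence for a CTC loss in place of the original gloss sequence, and `frames` would be the shortened input passed to the encoder. Because both are resampled at every training step, the model sees a different labeling and a different subset of frames each time.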
