Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU) because of automatic speech recognition (ASR) errors. To improve the performance of slot filling, a successful approach is to use a statistical model that is trained on ASR one-best hypotheses. The state of the art models for slot filling rely on using discriminative sequence modeling methods, such as conditional random fields (CRFs), recurrent neural networks (RNNs) and the recent recurrent CRF