Preface v
Foreword 1
1 Introduction 3
  1.1 Introduction to Multimodal Deep Learning . . . 3
  1.2 Outline of the Booklet . . . 4
2 Introducing the modalities 7
  2.1 State-of-the-art in NLP . . . 9
  2.2 State-of-the-art in Computer Vision . . . 33
  2.3 Resources and Benchmarks for NLP, CV and multimodal tasks . . . 54
3 Multimodal architectures 83
  3.1 Image2Text . .