Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering