Bilinear attention networks for visual question answering