Contextual bandit learning is a reinforcement learning problem in which the learner repeatedly receives a set of features (the context), takes an action, and receives a reward that depends on both the action and the context.
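
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the two contexts ("morning", "evening"), the two actions, the reward probabilities, and the epsilon-greedy exploration rule are all hypothetical choices made for the example. The learner sees the reward only for the action it actually took.

```python
import random


def run_contextual_bandit(n_rounds=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on a toy contextual bandit.

    Everything about the environment here is an assumption made for
    illustration: two contexts, two actions, Bernoulli rewards.
    """
    rng = random.Random(seed)
    n_actions = 2
    counts = {}  # (context, action) -> number of times tried
    means = {}   # (context, action) -> running mean of observed reward
    total_reward = 0.0

    for _ in range(n_rounds):
        # 1. The learner receives a context (a set of features).
        context = rng.choice(["morning", "evening"])

        # 2. The learner takes an action: explore with probability
        #    epsilon, otherwise pick the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(
                range(n_actions),
                key=lambda a: means.get((context, a), 0.0),
            )

        # 3. The (hypothetical) environment returns a reward that depends
        #    on both the context and the chosen action: action 0 pays off
        #    more often in the morning, action 1 in the evening.
        matches = (context == "morning") == (action == 0)
        reward = 1.0 if rng.random() < (0.8 if matches else 0.2) else 0.0

        # 4. Update the estimate for the chosen (context, action) pair
        #    only; rewards of untaken actions are never observed.
        key = (context, action)
        counts[key] = counts.get(key, 0) + 1
        old = means.get(key, 0.0)
        means[key] = old + (reward - old) / counts[key]
        total_reward += reward

    return means, total_reward / n_rounds


if __name__ == "__main__":
    estimates, average_reward = run_contextual_bandit()
    for key in sorted(estimates):
        print(key, round(estimates[key], 3))
    print("average reward:", round(average_reward, 3))
```

A tabular estimate per (context, action) pair works here only because the toy context takes two values; with richer feature vectors the table would typically be replaced by a model that generalizes across contexts.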