Efficient average reward reinforcement learning using constant shifting values