Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking