3_Monte-Carlo_RL

Table of Contents
1.1. Preface
  1.1.1. Algorithm characteristics
  1.1.2. Goal
1.2. Two Monte-Carlo approaches to estimating the value function
  1.2.1. First Visit
  1.2.2. Every Visit
  1.2.3. Tip: Incremental Mean
1.3. Monte Carlo Control (Approximate optimal policies)
  1.3.1. Overall idea
2. Temporal-difference reinforcement learning (TD)
  2.1. Concepts
  2.2. MC and TD
    TD target, TD error