
Course Contents
Policy Gradient (Review)
Proximal Policy Optimization (PPO)
Q-learning (Basic Idea)
Q-learning (Advanced Tips)
Q-learning (Continuous Action)
Actor-Critic
Sparse Reward
Imitation Learning