徒然weed

A place for output

Human-Level Control Through Deep Reinforcement Learning (Volodymyr Mnih, Koray Kavukcuoglu, et al. 2015)


 Today's Paper

Volodymyr Mnih, Koray Kavukcuoglu, et al. Human-Level Control Through Deep Reinforcement Learning | DeepMind, Nature 2015

deepmind.com

 

 Summary

- Similar to the previous paper ([1312.5602] Playing Atari with Deep Reinforcement Learning).

- In [1312.5602] Playing Atari with Deep Reinforcement Learning, the target Q-value was generated by the network with the parameters from the previous iteration. Here, they prepare two networks,  Q and \hat{Q}.

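-  Q estimates the current action value and is updated at every step by a gradient descent step on the squared error against the target.  \hat{Q} generates the target and is updated only every C steps, by copying the parameters of Q into it (C is a given hyperparameter).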

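- In standard online Q-learning, an update that increases  Q(s_t, a_t) often also increases Q(s_{t+1}, a) for all a, and hence also increases the target y_j, which can cause the policy to oscillate or diverge. Generating the targets from the older parameters of \hat{Q} adds a delay between an update to Q and its effect on y_j, which makes this less likely.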
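
To make the two-network idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the paper trains a convolutional network on Atari frames with experience replay, whereas this sketch assumes a small tabular Q (a list of lists) so that it runs with the standard library only. The names `td_target`, `train`, `sync_every` (playing the role of C), and `lr` are hypothetical and chosen only for illustration. The target is y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'), or just r_j when the episode ends at step j+1.

```python
import copy


def td_target(q_hat, reward, next_state, done, gamma=0.99):
    """Target y_j computed from the frozen copy q_hat, never from the live q."""
    if done:
        return reward
    return reward + gamma * max(q_hat[next_state])


def train(transitions, n_states, n_actions, sync_every=4, lr=0.1, gamma=0.99):
    """Tabular stand-in for the two-network update (an assumption of this sketch).

    transitions: iterable of (state, action, reward, next_state, done) tuples.
    sync_every:  plays the role of C, the number of steps between copies.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]  # live estimate Q
    q_hat = copy.deepcopy(q)                          # target copy \hat{Q}
    sync_steps = []

    for step, (s, a, r, s_next, done) in enumerate(transitions, start=1):
        y = td_target(q_hat, r, s_next, done, gamma)
        # Gradient step on 0.5 * (Q(s, a) - y)^2 with respect to Q(s, a).
        q[s][a] -= lr * (q[s][a] - y)

        # Every C steps, reset \hat{Q} = Q; in between, the targets stay fixed.
        if step % sync_every == 0:
            q_hat = copy.deepcopy(q)
            sync_steps.append(step)

    return q, q_hat, sync_steps
```

Between two copies, changes to `q` cannot feed back into the targets, which is the delay described in the last bullet. With a neural network, the copy step would instead copy the network's parameters.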

 

 

 Today's Bonus

youtu.be