Today's Paper
Volodymyr Mnih, Koray Kavukcuoglu, et al. Human-level control through deep reinforcement learning | DeepMind, Nature 2015
Summary
- Similar to the previous paper ([1312.5602] Playing Atari with Deep Reinforcement Learning).
- In [1312.5602] Playing Atari with Deep Reinforcement Learning, the target Q was generated by the network with its parameters from one time-step earlier; here they prepare two networks, $Q$ and $\hat{Q}$.
- $Q$ estimates the current action value and is updated by a gradient descent step on the squared error, while $\hat{Q}$ generates the targets and is updated only every $C$ steps by copying the weights of $Q$ ($C$ is a given hyperparameter).
- In standard Q-learning, an update that increases $Q(s_t, a_t)$ often also increases $Q(s_{t+1}, a)$ for all $a$, and hence also increases the target $y_j$, which can cause the policy to oscillate or diverge. Generating the targets from the delayed $\hat{Q}$ makes this much less likely.
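The two-network update in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses a small Q-table as a stand-in for the convolutional network, leaves out experience replay, and the names `td_target` and `train` as well as the values of `GAMMA`, `ALPHA`, and `C` are assumptions made for this example.

```python
import copy

GAMMA = 0.99  # discount factor (assumed value)
ALPHA = 0.1   # learning rate (assumed value)
C = 4         # sync period for the target table (assumed value)


def td_target(q_hat, reward, next_state, done):
    """Target y: r if terminal, else r + gamma * max_a Q_hat(s', a)."""
    if done:
        return reward
    return reward + GAMMA * max(q_hat[next_state])


def train(transitions, n_states, n_actions):
    """Run Q-learning updates with a delayed target table Q_hat."""
    # q is the online estimate; q_hat is the frozen copy used for targets.
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_hat = copy.deepcopy(q)
    syncs = 0
    for step, (s, a, r, s_next, done) in enumerate(transitions, start=1):
        y = td_target(q_hat, r, s_next, done)
        # Gradient step on 0.5 * (y - Q(s, a))^2 with respect to Q(s, a);
        # y is treated as a constant because it comes from q_hat.
        q[s][a] += ALPHA * (y - q[s][a])
        # Every C steps, reset Q_hat to the current Q.
        if step % C == 0:
            q_hat = copy.deepcopy(q)
            syncs += 1
    return q, q_hat, syncs


if __name__ == "__main__":
    # Hypothetical transitions: (state, action, reward, next_state, done)
    demo = [
        (0, 0, 1.0, 1, False),
        (1, 0, 0.0, 0, False),
        (0, 0, 1.0, 1, False),
        (1, 1, 2.0, 0, True),
        (0, 1, 0.0, 1, False),
    ]
    q, q_hat, syncs = train(demo, n_states=2, n_actions=2)
    print(q, q_hat, syncs)
```

Between syncs, changes to `q` do not move the targets, because `td_target` reads only from `q_hat`. That is the point of the delay: an increase in $Q(s_t, a_t)$ cannot immediately raise its own target.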
Today's Bonus