Playing Atari with Deep Reinforcement Learning (Volodymyr Mnih et al., 2013)

This series gives easy summaries (introductions) of papers I have read.

[Today's paper]

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou,
Daan Wierstra, Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop 2013.

 

Summary

- a classic paper introducing the "deep Q-network" (DQN)

- the reason for constructing a Q-network: when the number of states or actions gets large, we can no longer keep a state-action table (an Atari state here is a stack of four preprocessed 84×84 frames, so enumerating states one by one is hopeless)

- so what should we do instead of updating a table of action values entry by entry with the Bellman equation? → take the state as the input and build a network whose outputs are the action values, one per action, so that the whole network is an approximate Q-function
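
As a concrete picture of "state in, one Q-value per action out", here is a minimal PyTorch sketch (PyTorch, the class name QNetwork, and the variable names are my own choices, not the paper's code; the layer sizes follow the architecture the paper describes: four stacked 84×84 frames, two convolutional layers, and a 256-unit hidden layer):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of 4 preprocessed 84x84 frames to one Q-value per action,
    so the whole network is an approximation of Q(s, a; theta)."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 32x9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one output unit per legal action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)                           # shape: (batch, n_actions)

q = QNetwork(n_actions=4)
print(q(torch.zeros(1, 4, 84, 84)).shape)                # torch.Size([1, 4])
```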

- the aim of this technique is to bring the current estimate Q(s, a; \theta_{t}) closer to the optimal action-value function, estimated from one step of experience as Q^{*}(s_{t}, a_{t}) ≈ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a')
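
Concretely, the paper turns this into a regression problem (written here with the t subscripts used above; \gamma is the discount factor and \theta_{t-1} are the parameters from the previous iteration): each sampled transition defines a target

y_{t} = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'; \theta_{t-1})

and the network is trained by minimizing the expected squared error

L(\theta_{t}) = \mathbb{E}\left[ \left( y_{t} - Q(s_{t}, a_{t}; \theta_{t}) \right)^{2} \right]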

- how do you update the network? → construct the squared-error loss above, with the previous parameters \theta_{t-1} held fixed inside the target, and update \theta_{t} by stochastic gradient descent on it
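
A minimal sketch of that update step in PyTorch (my own illustration rather than the paper's code; QNetwork is the sketch above, q_old plays the role of \theta_{t-1}, and the discount factor and learning rate are assumptions — the paper itself only says it uses RMSProp with minibatches of size 32):

```python
import copy

import torch
import torch.nn.functional as F

gamma = 0.99                          # assumed discount factor
q = QNetwork(n_actions=4)             # Q(., .; theta_t), the network being trained
q_old = copy.deepcopy(q)              # frozen copy playing the role of theta_{t-1}
optimizer = torch.optim.RMSprop(q.parameters(), lr=2.5e-4)   # lr is an assumption

def dqn_update(s, a, r, s_next, done):
    """One gradient step on the squared TD error for a minibatch of transitions."""
    with torch.no_grad():
        # y = r + gamma * max_a' Q(s', a'; theta_{t-1}); just r if the episode ended
        y = r + gamma * q_old(s_next).max(dim=1).values * (1.0 - done)
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a; theta_t) for the actions taken
    loss = F.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```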

- when training the network, to break the correlation between consecutive samples, you keep a replay memory, store every transition in it, sample tuples (s_{j}, a_{j}, r_{j}, s_{j+1}) from it uniformly at random, and run the update above on those minibatches
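
The replay memory itself can be as simple as a bounded buffer sampled uniformly at random (again just a sketch; the capacity mirrors the paper's replay memory of roughly the last million frames and the batch size of 32 matches the paper, but the interface is my own choice):

```python
import random
from collections import deque

import torch

class ReplayMemory:
    """Bounded store of transitions; uniform random sampling breaks up the
    correlation between consecutive frames of gameplay."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)            # oldest transitions fall out automatically

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)  # uniform, without replacement
        s, a, r, s_next, done = zip(*batch)
        return (torch.stack(s),
                torch.tensor(a),
                torch.tensor(r, dtype=torch.float32),
                torch.stack(s_next),
                torch.tensor(done, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)
```

During play you would push every transition into the memory and, once it holds enough samples, call dqn_update(*memory.sample()) at each step, copying the current parameters into q_old after each iteration so that it always holds \theta_{t-1}.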

 


 

 

 
