This series gives short, easy introductions to the papers I have read.
[Today's paper]
- A classic paper introducing the "deep Q-network" (DQN).
- The reason for constructing a Q-network is that, once the number of states or actions grows large, we can no longer use a state-action table.
- So what should we do instead of updating a tabular action-value function with the Bellman equation? → Use the state as the input and construct a network whose output is the action-value function, which means the whole network is a function approximator for the Q-value.
- The aim of this technique is to bring the current action-value function closer to the optimal action-value function.
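- How do you update the network? → Construct the loss function using the parameters from the previous iteration: they are held fixed to compute the target value, and the current parameters are trained to match that target.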
- When you train the network, consecutive samples are strongly correlated. To avoid their influence, you store transitions in a replay memory, choose tuples from it at random, and update the parameters on them.
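
To make the last two points concrete, here is a minimal sketch of my own, not the paper's code. It assumes a tiny linear function in place of the deep network, and every name in it (`ReplayMemory`, `q_values`, `td_target`, `sgd_step`) as well as the values of `GAMMA` and `LR` are made up for illustration. The target is y = r + γ max_a' Q(s', a'; previous parameters), and the loss is the squared difference between y and Q(s, a; current parameters).

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor (assumed value, for illustration only)
LR = 0.1      # learning rate (assumed value, for illustration only)


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, rng):
        # A uniform random draw breaks the correlation between consecutive steps.
        return rng.choice(list(self.buffer))


def q_values(weights, state):
    """Linear stand-in for the network: one weight row (one output) per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def td_target(prev_weights, reward, next_state, done):
    """y = r + gamma * max_a' Q(s', a'; previous parameters)."""
    if done:
        return reward
    return reward + GAMMA * max(q_values(prev_weights, next_state))


def sgd_step(weights, prev_weights, transition):
    """One gradient step on the squared error (y - Q(s, a; weights))^2."""
    state, action, reward, next_state, done = transition
    y = td_target(prev_weights, reward, next_state, done)
    error = y - q_values(weights, state)[action]
    # For a linear Q, the gradient w.r.t. the chosen action's row is the state.
    # The constant factor 2 from the square is folded into LR.
    weights[action] = [w + LR * error * x for w, x in zip(weights[action], state)]
    return error ** 2


# Usage: a single terminal transition with reward 1, replayed 50 times.
rng = random.Random(0)
memory = ReplayMemory(capacity=100)
memory.push(([1.0, 0.0], 0, 1.0, [0.0, 1.0], True))
weights = [[0.0, 0.0], [0.0, 0.0]]
prev_weights = [row[:] for row in weights]  # frozen copy used only for targets
for _ in range(50):
    sgd_step(weights, prev_weights, memory.sample(rng))
```

In this sketch `prev_weights` is never refreshed. In a real training loop you would replace it with the latest parameters as the iterations proceed, and you would normally update on a batch of sampled tuples rather than a single one.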