Simplifying Deep Temporal Difference Learning
·6 mins
The authors propose PQN, a simplified version of DQN that leverages LayerNorm and L2 regularization to dispense with target networks while maintaining stable training. They also prove a convergence guarantee for PQN, something that is not available for the current de facto algorithm, PPO.
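The core idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the network sizes, initialization, and the `l2` coefficient are made-up placeholders. The two points it shows are (1) a LayerNorm after the hidden layer, and (2) a TD target bootstrapped from the *same* network instead of a frozen target copy, with an L2 penalty added to the loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean / unit variance (no learned affine here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Hypothetical tiny Q-network: Linear -> LayerNorm -> ReLU -> Linear.
obs_dim, hidden, n_actions = 4, 32, 2
params = {
    "W1": rng.normal(0, 0.1, (obs_dim, hidden)),
    "b1": np.zeros(hidden),
    "W2": rng.normal(0, 0.1, (hidden, n_actions)),
    "b2": np.zeros(n_actions),
}

def q_values(params, obs):
    h = layer_norm(obs @ params["W1"] + params["b1"])
    return np.maximum(h, 0.0) @ params["W2"] + params["b2"]

def pqn_style_loss(params, batch, gamma=0.99, l2=1e-4):
    obs, actions, rewards, next_obs, dones = batch
    q = q_values(params, obs)[np.arange(len(actions)), actions]
    # Bootstrap target from the SAME network -- no frozen target copy.
    # (In training this term would be treated as a constant, i.e. gradient-stopped.)
    next_q = q_values(params, next_obs).max(axis=-1)
    target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = np.mean((q - target) ** 2)
    l2_penalty = l2 * sum(np.sum(p ** 2) for p in params.values())
    return td_loss + l2_penalty

batch = (
    rng.normal(size=(8, obs_dim)),       # observations
    rng.integers(0, n_actions, size=8),  # actions taken
    rng.normal(size=8),                  # rewards
    rng.normal(size=(8, obs_dim)),       # next observations
    np.zeros(8),                         # done flags
)
loss = pqn_style_loss(params, batch)
```

The intuition is that LayerNorm bounds the scale of the Q-estimates and the L2 penalty keeps the weights small, which together stabilize bootstrapping enough that the frozen target-network copy (DQN's usual fix) becomes unnecessary.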