Simplifying Deep Temporal Difference Learning
·6 mins
The authors propose PQN, a simplified version of DQN that leverages LayerNorm and L2 regularization to dispense with target networks while maintaining stable training. They also prove a convergence guarantee for PQN, something that is not available for the current de facto algorithm, PPO.
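The core idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the network sizes, initialization, and the `l2` coefficient are made-up placeholders. The two points it shows are (1) a LayerNorm after the hidden layer, and (2) a TD target bootstrapped from the *same* network instead of a frozen target copy, with an L2 penalty added to the loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean / unit variance (no learned affine here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Hypothetical tiny Q-network: Linear -> LayerNorm -> ReLU -> Linear.
obs_dim, hidden, n_actions = 4, 32, 2
params = {
    "W1": rng.normal(0, 0.1, (obs_dim, hidden)),
    "b1": np.zeros(hidden),
    "W2": rng.normal(0, 0.1, (hidden, n_actions)),
    "b2": np.zeros(n_actions),
}

def q_values(params, obs):
    h = layer_norm(obs @ params["W1"] + params["b1"])
    return np.maximum(h, 0.0) @ params["W2"] + params["b2"]

def pqn_style_loss(params, batch, gamma=0.99, l2=1e-4):
    obs, actions, rewards, next_obs, dones = batch
    q = q_values(params, obs)[np.arange(len(actions)), actions]
    # Bootstrap target from the SAME network -- no frozen target copy.
    # (In training this term would be treated as a constant, i.e. gradient-stopped.)
    next_q = q_values(params, next_obs).max(axis=-1)
    target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = np.mean((q - target) ** 2)
    l2_penalty = l2 * sum(np.sum(p ** 2) for p in params.values())
    return td_loss + l2_penalty

batch = (
    rng.normal(size=(8, obs_dim)),       # observations
    rng.integers(0, n_actions, size=8),  # actions taken
    rng.normal(size=8),                  # rewards
    rng.normal(size=(8, obs_dim)),       # next observations
    np.zeros(8),                         # done flags
)
loss = pqn_style_loss(params, batch)
```

The intuition is that LayerNorm bounds the scale of the Q-estimates and the L2 penalty keeps the weights small, which together stabilize bootstrapping enough that the frozen target-network copy (DQN's usual fix) becomes unnecessary.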