
RL

Simplifying Deep Temporal Difference Learning

·6 mins
The authors propose PQN, a simplified version of DQN that leverages LayerNorm and L2 regularization to remove the target network while maintaining stable training. They also prove a convergence property for PQN, something not available for the current de-facto algorithm, PPO.
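To make the idea concrete, here is a minimal PyTorch sketch (not the authors' code) of the core ingredients the summary mentions: a Q-network with LayerNorm after each hidden layer, a TD loss that bootstraps from the online network itself instead of a frozen target copy, and L2 regularization applied through the optimizer's weight decay. All names and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network with LayerNorm after each hidden layer (hypothetical architecture)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def td_loss(q_net: QNetwork, batch, gamma: float = 0.99) -> torch.Tensor:
    """TD loss without a target network: bootstrap from the same online network."""
    obs, actions, rewards, next_obs, dones = batch
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = q_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.mse_loss(q, target)

# L2 regularization on the weights via the optimizer's weight decay (illustrative values):
# q_net = QNetwork(obs_dim=4, n_actions=2)
# optimizer = torch.optim.AdamW(q_net.parameters(), lr=3e-4, weight_decay=1e-4)
```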

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

·4 mins
Review of a MARL algorithm that learns a set of distinct policies while sharing gradients across agents, so each agent learns both from its own on-policy data and from off-policy data generated by the other agents in the environment.
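A minimal PyTorch sketch of that gradient-sharing idea (not the paper's code): each agent's actor loss combines the usual on-policy policy-gradient term with terms computed on the other agents' trajectories, reweighted by an importance ratio between the learner's policy and the behaviour policy that collected the data. The `agent`, `trajectories`, and `lambda_` names are hypothetical placeholders.

```python
import torch

def shared_experience_actor_loss(agent, trajectories, lambda_: float = 1.0) -> torch.Tensor:
    """Actor loss for one agent, using its own data plus other agents' data.

    trajectories[j] is assumed to hold (obs, actions, advantages, logp_behaviour)
    collected by agent j under its own policy.
    """
    own = trajectories[agent.idx]
    logp_own = agent.policy.log_prob(own.obs, own.actions)
    loss = -(logp_own * own.advantages).mean()  # standard on-policy term

    for j, traj in enumerate(trajectories):
        if j == agent.idx:
            continue
        logp_i = agent.policy.log_prob(traj.obs, traj.actions)
        # Off-policy correction: ratio of this agent's policy to agent j's behaviour policy.
        ratio = torch.exp(logp_i - traj.logp_behaviour).detach()
        loss = loss - lambda_ * (ratio * logp_i * traj.advantages).mean()
    return loss
```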