
RL

Simplifying Deep Temporal Difference Learning

·6 mins
The authors propose PQN, a simplified version of DQN that leverages LayerNorm and L2 regularization to remove the target network while maintaining stable training. They also prove a convergence property for PQN, something not available for the current de-facto algorithm, PPO.
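To make the idea concrete, here is a minimal PyTorch sketch (not the authors' code) of the core ingredients the summary mentions: a Q-network with LayerNorm after each hidden layer, a TD loss that bootstraps from the online network itself instead of a frozen target copy, and L2 regularization applied through the optimizer's weight decay. All names and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network with LayerNorm after each hidden layer (hypothetical architecture)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def td_loss(q_net: QNetwork, batch, gamma: float = 0.99) -> torch.Tensor:
    """TD loss without a target network: bootstrap from the same online network."""
    obs, actions, rewards, next_obs, dones = batch
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = q_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.mse_loss(q, target)

# L2 regularization on the weights via the optimizer's weight decay (illustrative values):
# q_net = QNetwork(obs_dim=4, n_actions=2)
# optimizer = torch.optim.AdamW(q_net.parameters(), lr=3e-4, weight_decay=1e-4)
```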

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

·4 mins
Review of a MARL algorithm that learns a set of distinct policies while sharing gradients across agents, so each agent learns both from its own on-policy data and from off-policy data generated by the other agents in the environment.
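A minimal PyTorch sketch of that gradient-sharing idea (not the paper's code): each agent's actor loss combines the usual on-policy policy-gradient term with terms computed on the other agents' trajectories, reweighted by an importance ratio between the learner's policy and the behaviour policy that collected the data. The `agent`, `trajectories`, and `lambda_` names are hypothetical placeholders.

```python
import torch

def shared_experience_actor_loss(agent, trajectories, lambda_: float = 1.0) -> torch.Tensor:
    """Actor loss for one agent, using its own data plus other agents' data.

    trajectories[j] is assumed to hold (obs, actions, advantages, logp_behaviour)
    collected by agent j under its own policy.
    """
    own = trajectories[agent.idx]
    logp_own = agent.policy.log_prob(own.obs, own.actions)
    loss = -(logp_own * own.advantages).mean()  # standard on-policy term

    for j, traj in enumerate(trajectories):
        if j == agent.idx:
            continue
        logp_i = agent.policy.log_prob(traj.obs, traj.actions)
        # Off-policy correction: ratio of this agent's policy to agent j's behaviour policy.
        ratio = torch.exp(logp_i - traj.logp_behaviour).detach()
        loss = loss - lambda_ * (ratio * logp_i * traj.advantages).mean()
    return loss
```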