RLHF · Jeremi's Blog

Deep reinforcement learning from human preferences

2 February 2024·4 mins

A review of the original technique, which is widely used today for LLM alignment. It’s the foundation for many other alignment techniques proposed in the recent literature.