Deep reinforcement learning from human preferences
2 February 2024 · 4 mins
Review of the original technique, which is widely used today for LLM alignment. It’s the foundation of many other alignment techniques proposed in the recent literature.