Arash Ahmadian on Rethinking RLHF

TalkRL: The Reinforcement Learning Podcast

Treść dostarczona przez Robin Ranjit Singh Chauhan. Cała zawartość podcastów, w tym odcinki, grafika i opisy podcastów, jest przesyłana i udostępniana bezpośrednio przez Robin Ranjit Singh Chauhan lub jego partnera na platformie podcastów. Jeśli uważasz, że ktoś wykorzystuje Twoje dzieło chronione prawem autorskim bez Twojej zgody, możesz postępować zgodnie z procedurą opisaną tutaj https://pl.player.fm/legal.

7M ago 33:30

MP3•Źródło odcinka

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References

Self-Rewarding Language Models, Yuan et al 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1992
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

61 odcinków

#Reinforcement Learning #Machine Learning #Robin Ranjit Singh Chauhan #Artificial Intelligence #Tech