2025-09-27
In this way, the RL that AI labs have been doing is more akin to what we call “imitation learning” for humans. The LLM is shown a question and its desired output, and must discover the internal actions that produce that output.
Dwarkesh Podcast
Q&A with reinforcement learning pioneer Richard Sutton on why LLMs are not the path to achieving human intelligence, world models, continual learning, and more
Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson.
Everyone posting about the Dwarkesh interview (including Dwarkesh himself!) is missing this subtle point. When LLMs imitate, they imitate the ACTION (i.e., the token predictions that produce the sequence). When humans imitate, they imitate the OUTPUT but must discover the action.
2024-05-09
@thesteinegger @MoAlQuraishi They did give a pretty detailed breakdown of the model and loss function in the supplementary information (about as much info as was given for AlphaFold2), so an open source reproduction should be possible, assuming access to enough compute
Financial Times
Google DeepMind and Isomorphic Labs detail AlphaFold 3, an AI model to predict interactions and structures of proteins, DNA, RNA, more, beating many top methods
DeepMind adds a diffusion engine to latest protein-folding software … Glyn Moody / @glynmoody@mastodon.social: AlphaFold 3 predicts the structure and interactions of all of life's...