chris_hayduk1 · TEXXR

2025-09-27

In this way, the RL that AI labs have been doing is more akin to what we call “imitation learning” for humans. The LLM is shown a question and its desired output, and must discover the internal actions that produce that output.

Dwarkesh Podcast

Q&A with reinforcement learning pioneer Richard Sutton on why LLMs are not the path to achieving human intelligence, world models, continual learning, and more

Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson.

Everyone posting about the Dwarkesh interview (including Dwarkesh himself!) is missing this subtle point. When LLMs imitate, they imitate the ACTION (i.e., the token prediction that produces the sequence). When humans imitate, they imitate the OUTPUT but must discover the action.
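The distinction can be made concrete with a toy sketch. It is not taken from the thread or the podcast: the four-action policy, the `environment` mapping, and both function names are hypothetical, chosen only to contrast a learner that is handed the action as a label with one that sees only the desired output and has to find the action by trial. The second learner uses a plain REINFORCE-style update as a stand-in for outcome-based RL; actual lab training setups are far more involved.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def environment(action):
    """Hypothetical world: each action index yields one observable output."""
    return ["a", "b", "c", "d"][action]


def imitate_action(demo_action, n_actions=4, steps=200, lr=0.5):
    """Imitate the ACTION: the demonstrated action is given as the label
    (analogous to next-token prediction on a fixed corpus)."""
    logits = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(logits)
        for a in range(n_actions):
            # Gradient of log p(demo_action) w.r.t. logit a is one_hot - probs.
            grad = (1.0 if a == demo_action else 0.0) - probs[a]
            logits[a] += lr * grad
    return softmax(logits)


def imitate_output(target_output, env, n_actions=4, steps=2000, lr=0.5, seed=0):
    """Imitate the OUTPUT: only the desired result is known, so the learner
    samples actions and reinforces the ones that reproduce it."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = 1.0 if env(action) == target_output else 0.0
        for a in range(n_actions):
            # REINFORCE update: reward times gradient of log p(sampled action).
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * reward * grad
    return softmax(logits)


if __name__ == "__main__":
    print("action imitation:", imitate_action(demo_action=2))
    print("output imitation:", imitate_output("c", environment))
```

Both learners end up preferring action 2, but only the first was ever told that action 2 was the right one; the second was told only that the result should be "c".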

2024-05-09

@thesteinegger @MoAlQuraishi They did give a pretty detailed breakdown of the model and loss function in the supplementary information (about as much info as was given for AlphaFold2), so an open source reproduction should be possible, assuming access to enough compute

Financial Times

Google DeepMind and Isomorphic Labs detail AlphaFold 3, an AI model to predict interactions and structures of proteins, DNA, RNA, more, beating many top methods

DeepMind adds a diffusion engine to latest protein-folding software … Glyn Moody / @glynmoody@mastodon.social : AlphaFold 3 predicts the structure and interactions of all of life's...
