2025-12-20
“In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage”
“Supervision bits-wise, human neural nets are optimized for survival ... but LLM neural nets are optimized for imitating humanity's text”
karpathy
2025 LLM Year in Review: the shift toward RLVR, Claude Code emerged as the first convincing example of an LLM agent, Nano Banana was paradigm-shifting, and more
Andrej Karpathy / karpathy :