METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year
just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030 : This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
Sources: Trump's team recently discussed letting Nvidia sell H200 chips to China, a major departure from the administration's earlier public stances
If It Happens Emily Jarvie / Proactive : US weighs allowing Nvidia to sell H200 chips to China, sources say Ukrainian National News : Trump's team considers allowing Nvidia H200 chip sales to China Ca...
OpenAI says its new o3 and o4-mini AI models hallucinate more often than its previous reasoning and traditional models, and the company doesn't know why
OpenAI's internal tests show o3 hallucinated on 33% of person-related questions, double the rate of previous models. Even worse, o4-mini hit 48%. Mastodon: Aulia Masna / @aulia@mementomori.social : “...
The Arc Prize Foundation says its new ARC-AGI-2 test stumps most AI models; humans get 60% of the questions right but GPT-4.5 and Claude 3.7 Sonnet score ~1%
[image] François Chollet / @fchollet : Unlike ARC-AGI-1, this new version is not easily brute-forced. Current top AI approaches score 0-4%. All base LLMs (GPT-4.5, Claude 3.7 Sonnet, Gemini 2, etc.)...
OpenAI calls DeepSeek “state-controlled” and recommends that the US ban “PRC-produced equipment and models that violate user privacy and create security risks”
https://techcrunch.com/... Threads: Vishvanand Subramanian / @vishvanands : trying hard to steelman this position from openai but unless it's possible to hide malware in the model weights, what exactl...
Anthropic's Claude 3.7 Sonnet reportedly cost “a few tens of millions of dollars” to train, similar to Claude 3.5 and cheaper than GPT-4, which cost over $100M
“Assuming Claude 3.7 Sonnet indeed cost just ‘a few tens of millions of dollars’ to train, not factoring in related expenses, it's a sign of how relatively cheap it's becoming to release state-of-the-...
Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool
and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jindal / Analytics India...