METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year
just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030 : This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluation...
Google says Gemini 3 Pro sets new vision AI benchmark records, including in complex visual reasoning, beating Claude Opus 4.5 and GPT-5.1 in some categories
Raising Concerns for Real-World Use Will McCurdy / PCMag : ChatGPT Overtakes Amazon, X, Reddit, WhatsApp, and Wikipedia in Visitors X: Demis Hassabis / @demishassabis : Gemini has always had exception...
Sources: OpenAI is developing a new LLM, codenamed Garlic, that outperforms Gemini 3 and Claude Opus 4.5 in coding and reasoning tasks, per internal evaluations
OpenAI, which in recent weeks has appeared to fall behind Google in AI development, is fighting back with a new large language model codenamed Garlic. X: Amir Efrati / @amir : new: OpenAI dev...
Study: on the SCONE-bench benchmark of 405 blockchain smart contracts, Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 collectively developed exploits worth $4.6M
AI models are increasingly good at cyber tasks, as we've written about before. But what is the economic impact of these capabilities?
Anthropic prices Claude Opus 4.5 at $5/1M input and $25/1M output tokens, much cheaper than Opus 4.1 at $15/$75 but still pricier than GPT-5.1 and Gemini 3 Pro
Opus 4.5 was responsible for most of the work across 20 commits, 39 files changed, 2,022 additions and 1,173 deletions in a two-day period. … Forums: r/BetterOffline : Claude Opus 4.5, and why evaluat...
Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”
Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient …