OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost
OpenAI eyes January exit from “code red” John Werner / Forbes : The Wonder And The Promise Of GPT 5.2 Is Here Benj Edwards / Ars Technica : OpenAI releases GPT-5.2 after “code red” Google threat alert...
OpenAI launches AgentKit, a toolkit for building and deploying AI agents, including Agent Builder, which Sam Altman described as like Canva for building agents
New tools for building, deploying, and optimizing agents. NDTV Profit : What Is AI Agent Builder And How Does It Work? OpenAI Launches New Set Of Tools For Developers Aman Gupta / Livemint : OpenAI ag...
GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers
And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks of testing — The Ve...
OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM
gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models Simon Willison / Simon Willison's Weblog : OpenAI's new open weight (Apache 2) models are really good OpenAI on GitHub : ...
Composio, which provides tools to help companies build AI agents, raised a $25M Series A led by Lightspeed, taking its total funding to $29M
we are responsible with our money https://x.com/... Malay Vasa / @malayvasa : Earlier this month I joined @composiohq, where we're building the skill layer for AI agents so they can continuously learn...
Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool
and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jindal / Analytics India...
OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025
12 Days of OpenAI: Day 12 Naomi Li Gan / Tech in Asia : OpenAI unveils AI model for advanced reasoning Bojan Stojkovski / Interesting Engineering : OpenAI unveils o3 reasoning AI model to tackle compl...
A postmortem of HyperWrite's Reflection 70B model blames “a bug in the initial code for benchmarking”, after evaluators couldn't reproduce some claimed results
On September 5th, 2024, Matt Shumer, co-founder and CEO of the startup Hyperwrite AI (also known as OthersideAI) …
Matt Shumer, who was accused of fraud over HyperWrite's 70B-parameter AI model, says he “got ahead” of himself but doesn't explain why his model underperformed
Matt Shumer, co-founder and CEO of OthersideAI, also known as its signature AI assistant writing product HyperWrite …
HyperWrite's 70B parameter AI model, Reflection, has its performance questioned, after CEO Matt Shumer said something about its upload to Hugging Face was off
something got fucked up during the upload process. Will fix today. Forums: r/LocalLLaMA : Smh: Reflection was too good to be true - reference article