Research: AI's ability to complete lengthy software engineering tasks has doubled roughly every six months, but there is a “messiness tax” for real-world tasks
METR has published very influential work by Kwa, West, et al. on measuring AI's ability to complete long tasks. X: @kirillzzy , @boazbaraktcs , @benshindel , @jasonfurman , and @sama X: Ki...
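As a rough illustration of what "doubling roughly every six months" implies, here is a minimal sketch. The function name, the starting horizon, and the clean exponential form are assumptions for illustration, not METR's methodology, and the sketch ignores the "messiness tax" the headline mentions.

```python
def projected_horizon(current_horizon_hours: float,
                      months_ahead: float,
                      doubling_months: float = 6.0) -> float:
    """Extrapolate a task-length horizon under a fixed doubling period.

    Assumes pure exponential growth: the horizon doubles once per
    `doubling_months`. The six-month default comes from the headline;
    everything else here is a hypothetical simplification.
    """
    return current_horizon_hours * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Hypothetical starting point: tasks that take about one hour today.
    for months in (6, 12, 24):
        hours = projected_horizon(1.0, months)
        print(f"after {months} months: about {hours:g} hours")
```

Under these assumptions, two doubling periods quadruple the horizon; any real-world discount for messy tasks would lower the figures.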
OpenAI launches AgentKit, a toolkit for building and deploying AI agents, including Agent Builder, which Sam Altman likened to Canva for building agents
New tools for building, deploying, and optimizing agents. NDTV Profit : What Is AI Agent Builder And How Does It Work? OpenAI Launches New Set Of Tools For Developers Aman Gupta / Livemint : OpenAI ag...
GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers
And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks of testing — The Ve...
Margit Wennmachers, who has shaped a16z's marketing approach since joining in 2010, is stepping down and “graduating” from operating partner to partner emeritus
they're hiring some talented and influential people. Gina Bianchini / @ginab : @wennmachers @a16z You were impressive the first time Liz Hamren introduced us. You seemed like a natural introduction to...
Anthropic plans to debut new rate limits for Claude Pro and Max on August 28, likely affecting fewer than 5% of users, saying some run Claude Code “continuously in the background”
It's Bad Business Bluesky: Ed Zitron / @edzitron.com : That is not what is happening here, this is not Anthropic “doing the drug dealer model” - www.wheresyoured.at/anthropic- is...
Apple announces visionOS 26 with spatial widgets, including Clock, Weather, Music, and Photos, all-new Personas, Spatial scenes powered by AI, and more
visionOS 26 sets the stage for killer smart glasses Ben Lang / Road to VR : Vision Pro Will Allow ‘Optional’ and ‘Required’ Designations for Apps Using Motion Controllers David Heaney / UploadVR : vis...
A look at The Technology Brothers Podcast Network, a daily tech news talk show that has captured the attention of Silicon Valley's investors and founders
In recent months, John Coogan and Jordi Hays, co-hosts of “TBPN,” a daily tech news talk show that has captured the attention … X: @willmanidis , @dwr , @manasjsaloi , @thefrandawg , @aginnt , @cjgbe...
In a video and a letter signed “Sam & Jony”, Altman and Ive say io, founded in 2024 by Ive, Scott Cannon, Evans Hankey, and Tang Tan, will develop new products
Update! It's this: — openai.com/sam-and-jony/ Emma Jacobs / @emmavj : Your dad introducing his new partner — openai.com/sam-and-jony/ Kate Knibbs / @knibbs : but what are they actually making????...
OpenAI's o3-mini costs $1.10 per 1M input tokens and $4.40 per 1M output tokens, cheaper than GPT-4o, which costs $2.50 and $10, and o1, which costs $15 and $60
Simon Willison's Weblog : X: @simonw and @daniel_mac8 . Forums: Hacker News X: Simon Willison / @simonw : Has anyone seen anything interesting done with that increased output limit yet? o1 has 100,00...
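To make the per-token prices in the headline concrete, here is a minimal sketch of a cost comparison. The prices are the ones quoted above (USD per 1M input and output tokens); the dictionary, the function name, and the example token counts are hypothetical, and the sketch ignores any caching or batch discounts.

```python
# (input, output) prices in USD per 1M tokens, as quoted in the headline.
PRICES_PER_MILLION = {
    "o3-mini": (1.10, 4.40),
    "gpt-4o": (2.50, 10.00),
    "o1": (15.00, 60.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the quoted list prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

At these list prices, the same workload costs a little under half as much on o3-mini as on GPT-4o, and less than a tenth of what it costs on o1.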
OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier
A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't worth its $200-per-mo...