As AI and agents are adopted to accelerate development, cognitive load and cognitive debt are likely to become bigger threats to developers than technical debt
The term technical debt is often used to refer to the accumulation of design or implementation choices that later make the software harder …
A programmer estimates that his typical day of coding with Claude Code uses about as much energy as running the dishwasher one extra time, far more than a “median query”
Most of the discourse about the environmental impact of LLM use focuses on a ‘median query.’ What about a Claude Code session?
Cursor recently experimented with using hundreds of AI agents to build a web browser; they ran for close to a week, writing 1M+ lines of code across 1,000 files
Scaling long-running autonomous coding. Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of “autonomous” coding agents:
Some 2025 takeaways in LLMs: reasoning as a signature feature, coding agents were useful, subscriptions hit $200/month, and Chinese open-weight models impressed
This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months.
OpenAI CISO Dane Stuckey outlines prompt injection mitigations in ChatGPT Atlas, including a “logged out mode” that blocks agent access to user credentials
Yesterday we launched ChatGPT Atlas, our new web browser. In Atlas, ChatGPT agent can get things done for you. We're excited to see how this feature makes work and day-to-day life ...
Anthropic announces Claude Code on the web and in the Claude iOS app, available in beta as a research preview for Pro and Max users
Today, Anthropic's Claude Code agentic coding tool is moving beyond the terminal and coming to the web and the company's mobile app.
OpenAI announces apps that work inside ChatGPT, piloting Booking.com, Canva, Coursera, Figma, Expedia, Spotify, and Zillow for logged-in users outside of the EU
A new generation of apps you can chat with and the tools for developers to build them.
DeepMind says video models like Veo 3 could become general purpose foundation models for vision, like LLMs for text, using zero-shot “chain-of-frames” reasoning
Video models are zero-shot learners and reasoners. Fascinating new paper from Google DeepMind which makes …
Google says Gemini 2.5 Deep Think achieved a gold medal performance at the 2025 ICPC World Finals programming competition, solving 10 of 12 problems
Gemini achieves gold level performance at ICPC! … Lalit Jain : It was amazing and humbling to be a core contributor to this Gold-winning ICPC effort. First the IMO, and two months...
OpenAI says its reasoning system solved all 12 problems at the 2025 ICPC World Finals; GPT-5 solved 11 and an experimental model solved #12 after GPT-5 couldn't
An OpenAI system has solved every problem at the world's most prestigious collegiate programming championship …
A new Artificial Analysis benchmark, focusing on OpenAI's gpt-oss-120b, shows how open-weight LLMs exhibit inconsistent performance across hosting providers
Artificial Analysis published a new benchmark the other day, this time focusing on how an individual model - OpenAI's gpt-oss-120b - performs across different hosted providers.
Z.ai, formerly known as Zhipu, which has raised $1.5B from Tencent and others, releases GLM-4.5, an open-source AI model that it says is cheaper than DeepSeek
chinese models really are taking over huh Simon Willison / @simonwillison.net : Pretty decent pelicans from the new GLM-4.5 and GLM-4.5 Air models. Both models are MIT licensed, r...
Alibaba releases its Qwen3-235B-A22B-Thinking-2507 reasoning LLM on Hugging Face, topping several benchmarks, as Alibaba moves away from hybrid reasoning models
If the AI industry had an equivalent to the recording industry's “song of the summer” — a hit that catches on in the warmer months …
Mistral releases a study on the environmental impact of its LLMs, conducting what it claims is the first comprehensive lifecycle analysis of an AI model
At Mistral AI, our mission is to bring artificial intelligence in everyone's hands. For this purpose, we have consistently advocated …
[Thread] An OpenAI researcher says the company's latest experimental reasoning LLM achieved gold medal-level performance on the 2025 International Math Olympiad
1/N I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most pres...
When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query
If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find …
Tests reveal that Grok 4 seems to search for Elon Musk's views online when asked about sensitive topics, and its answers tend to align with Musk's opinions
During xAI's launch of Grok 4 on Wednesday night, Elon Musk said — while live-streaming the event on his social media platform …