On March 5, 2026, OpenAI launched GPT-5.4 with improved tool calling — the mechanism through which AI models operate software by clicking buttons, filling forms, and invoking functions. The same day, Google expanded its Canvas workspace to every US user, giving AI a creation surface inside Search. A week earlier, Karpathy observed that AI coding agents had made "a huge leap forward since December," completing complex projects with minimal oversight. "Programming," he wrote, "is becoming unrecognizable."

None of these are the same product. But they are the same shift. Seventeen months after Anthropic shipped a beta that could click buttons on a desktop, the entire industry has converged on a single idea: AI needs to stop answering questions and start doing things. The trajectory from that beta to this convergence is the most significant interface change since the smartphone — and the fastest capability convergence in AI history.

The Beta

It started in October 2024.

October 2024: Anthropic releases a new Claude 3.5 Sonnet model that can interact with desktop apps by imitating mouse and keyboard input via a "computer use" API (TechCrunch).

Anthropic called it "computer use." Claude 3.5 Sonnet could see a screenshot, identify interface elements, move the cursor, click buttons, and type text. It was slow, it was janky, and it failed often. But it worked. The model wasn't generating text about what to do. It was doing it — operating the computer through the same interface a human uses.
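The loop described here — screenshot in, input event out — can be sketched as a simple perceive–decide–act cycle. The sketch below is illustrative only: the model call and the screen/input functions are stand-ins, since a real agent would send each screenshot to a vision-language model and inject actual OS-level mouse and keyboard events.

```python
import dataclasses

@dataclasses.dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def take_screenshot() -> bytes:
    """Stand-in for a real screen capture via an OS API."""
    return b"placeholder-pixels"

def model_decide(screenshot: bytes, goal: str) -> Action:
    """Stand-in for a vision-language model call. A real agent sends
    the screenshot plus the goal and parses the model's structured
    response into an Action. Here we simply declare the goal met."""
    return Action(kind="done")

def perform(action: Action) -> None:
    """Stand-in for injecting mouse/keyboard input into the OS."""
    print(f"performing {action.kind} at ({action.x}, {action.y})")

def computer_use_loop(goal: str, max_steps: int = 20) -> bool:
    """Perceive-decide-act: screenshot -> model -> input event,
    repeated until the model signals completion or the budget runs out."""
    for _ in range(max_steps):
        action = model_decide(take_screenshot(), goal)
        if action.kind == "done":
            return True
        perform(action)
    return False  # step budget exhausted without completion
```

The essential property is visible even in this skeleton: nothing in the loop knows what application is on screen. The same cycle works against any software that renders pixels and accepts input events.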

This was a conceptual break from everything that came before. Chatbots answered questions. APIs connected to specific services. Computer use was different: it worked with any software that had a screen. No integration required. No API needed. The graphical interface humans had been using for forty years became, retroactively, an AI-accessible interface too.

The significance wasn't obvious in October 2024. The demos were clunky. Reviewers focused on what it couldn't do. But within weeks, two things became clear. First, Google had been working on the same idea — sources revealed a project codenamed "Jarvis" that would take over a Chrome browser. Second, Project Mariner launched in December 2024, Google DeepMind's prototype web agent. Anthropic wasn't the only lab that had concluded AI needed hands.

The Convergence

2025 was the year every major AI company followed. The sequence of launches was striking for its density.

Anthropic, OpenAI, Google, Amazon, Microsoft, Mozilla. Every major AI company and two of the three major browsers, all arriving at the same capability within twelve months. This wasn't a trend. It was a phase transition. The labs had independently concluded that intelligence without agency was a ceiling — and that the fastest path to agency was teaching AI to use the interfaces that already exist.

The Ecosystem

The corporate convergence was impressive. What happened next was faster.

On January 31, 2026, Peter Steinberger rebranded his open-source AI agent project to OpenClaw. It was the second name change — the project had previously been called Clawdbot, and before that it was a personal experiment. Within ten days, Tencent Cloud, DigitalOcean, and Alibaba Cloud added support. Within two weeks, Google's VirusTotal was scanning its extension marketplace for malware. Within three weeks, Sam Altman personally recruited Steinberger to OpenAI. Meta counter-offered with more money. Steinberger chose OpenAI. OpenClaw remained open source.

What OpenClaw did was turn computer use from a proprietary feature into open-source infrastructure. The labs had built computer use into their own products — Operator for ChatGPT, Claude for Chrome, Gemini Computer Use. Each required using that company's model through that company's interface. OpenClaw was model-agnostic. It ran on any LLM, any computer, any operating system. Karpathy highlighted the emergence of NanoClaw and other "claws" — smaller systems that could run on phones and Raspberry Pis.

The surest sign of adoption was the malware. By February 2, 230 malicious OpenClaw extensions had been identified, posing as crypto tools and utilities. When attackers build for your platform, your platform is real.

By late February, Perplexity had launched Perplexity Computer, billing it as "a general-purpose digital worker." Google released Auto Browse for Chrome. Anthropic shipped Remote Control for Claude Code. The proprietary and open-source tracks were converging into a single capability layer that every AI company offered and every user could access.

Every GUI Is an API

The conventional way to make software AI-accessible is to build an API — a structured interface that lets programs talk to programs. This is how most AI integrations work today. Slack has an API. Salesforce has an API. If you want AI to operate software, you build a connector between the model and the API.
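The connector pattern works roughly like this: the model emits a structured tool call, and glue code routes it to the matching function. The sketch below is a minimal illustration with invented names (`slack_post`, the JSON shape) standing in for any real vendor's tool-calling format.

```python
import json

def slack_post(channel: str, text: str) -> str:
    """Stand-in for a real Slack API call."""
    return f"posted to {channel}"

# Registry of functions the model is allowed to invoke.
TOOLS = {"slack_post": slack_post}

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to the matching function.
    The model never touches the service directly; it only names
    a registered tool and supplies arguments."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A tool-calling model emits structured JSON like this:
result = dispatch(
    '{"name": "slack_post", '
    '"arguments": {"channel": "#ops", "text": "report ready"}}'
)
print(result)  # -> posted to #ops
```

Note what the pattern requires: every service the AI should operate needs its own registered function, its own authentication, its own maintenance. That per-service integration cost is exactly what computer use eliminates.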

Computer use bypasses all of this.

When AI can see a screen and click buttons, every piece of software with a graphical interface becomes AI-accessible — retroactively, without modification, without permission.

This is what makes computer use different from any previous AI capability. A smarter model answers harder questions. Better embeddings find better documents. Computer use does something categorically different: it makes the entire existing software stack available to AI, including software whose developers never anticipated AI integration and may never support it.

The implications run in two directions. For users, it means AI can operate the tools they already use — fill out the insurance form, navigate the government portal, run the report in the legacy system. No migration needed. For software companies, it means their interfaces are now accessible to entities they don't control. The moat of a complex UI — the enterprise software defense of "it takes a human six months to learn this" — dissolves when an AI can see the screen and click the right sequence of buttons.

The early autonomous agents of 2023 — Auto-GPT and BabyAGI — tried to solve this problem through code generation: write a script, execute it, read the output. They failed because the world isn't scriptable. Forms have CAPTCHAs. Websites change layouts. Enterprise software has thirty years of UI accretion. Computer use works because it operates at the same layer humans do — the visual interface — where the complexity is already managed by design.

The Pace

What makes this convergence historically unusual is its speed.

| Capability | First Lab | Universal Availability | Time to Convergence |
| --- | --- | --- | --- |
| Web search | Microsoft (Feb 2023) | All major labs | ~24 months |
| Image generation | OpenAI DALL-E (Jan 2021) | All major labs | ~36 months |
| Code generation | OpenAI Codex (Aug 2021) | All major labs | ~30 months |
| Computer use | Anthropic (Oct 2024) | All major labs + open source | ~17 months |

Seventeen months from one lab's beta to an industry standard with open-source alternatives, cloud provider support, a malware ecosystem, and an acqui-hire war. The gap between Anthropic's demo and the industry catching up was the shortest for any major AI capability. The labs weren't surprised by computer use — they were waiting for it.

The Financial Times observed in September 2024 that "AI copilots are evolving into AI agents designed to take actions on behalf of users." That was the theory. What Anthropic shipped a month later was the proof that it could work. Once one lab proved the concept, the others moved immediately — not because they were copying, but because they'd already been building toward the same destination.

What Changed

The chatbot era assumed a division of labor: AI thinks, humans act. You ask a question, get an answer, then go do the thing yourself. Computer use dissolves that boundary. The model doesn't tell you what to click. It clicks.

That's what March 5, 2026 looks like: GPT-5.4 ships with improved tool calling. Google Canvas goes wide. AI coding agents are completing complex projects autonomously. Not chatbot updates. The interface shifting under our feet.

In October 2024, Anthropic gave AI hands. By March 2026, everyone had them. The significance isn't the hands. It's what they can reach — forty years of interfaces designed for people, now an AI's workspace.