At GTC 2026, Jensen Huang doubled Nvidia's AI chip forecast: $500 billion in cumulative sales through 2026 became $1 trillion through 2027. One additional year. An equal amount of revenue. All prior years combined, matched in a single year. But the product that shared the keynote stage tells you something the number alone doesn't. Nvidia's headline new hardware was the Groq 3 LPX — a rack-scale inference server. Not a bigger training chip. An inference machine. The trillion-dollar thesis has a new engine, and it isn't training.
The Double
The math is worth isolating. Nvidia's previous forecast, issued about a year ago, projected $500 billion in total AI chip revenue from the start of the boom through the end of 2026 — spanning roughly four years of the training era. The new forecast adds $500 billion in a single year.
This is not a linear extrapolation. It's a doubling that implies a phase change in demand. Four years of building AI models produced $500 billion in chip sales. One year of something else is expected to produce the same.
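The run-rate shift can be made concrete. The revenue figures below are from the forecasts themselves; treating the training era as exactly four years is an approximation.

```python
# Back-of-envelope on the two forecasts. Revenue totals are from the
# article; the four-year training-era span is an approximation.
prior_total = 500e9   # cumulative AI chip revenue through 2026 (USD)
new_total = 1e12      # cumulative revenue through 2027 (USD)
prior_years = 4       # roughly four years of the training era

avg_training_era = prior_total / prior_years  # average annual run rate
final_year = new_total - prior_total          # revenue in 2027 alone

print(f"Training-era average: ${avg_training_era / 1e9:.0f}B/year")
print(f"2027 alone:           ${final_year / 1e9:.0f}B/year")
print(f"Run-rate multiple:    {final_year / avg_training_era:.0f}x")
```

On these numbers, the final year runs at four times the training-era average — which is what "phase change" means in concrete terms.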
The Product
GTC announcements always include next-generation training hardware — this year's Vera Rubin architecture continues that tradition. But the product that arrived with a pre-conference Wall Street Journal exclusive was the Groq 3 LPX: a rack of 256 inference processing units with 128GB of SRAM and 40 petabytes per second of memory bandwidth. Available in the second half of 2026. The Journal noted that OpenAI is a customer.
Inference hardware has a different job than training hardware. Training builds the model — a one-time computation per model version, measured in weeks on thousands of GPUs. Inference runs the model — a computation that happens every time a user sends a query, an agent executes a step, or a product calls an API. Training is construction. Inference is electricity.
Nvidia has been moving toward inference for years. In 2024, it launched NIM — Nvidia Inference Microservices — a platform for packaging models as deployable services, and by January 2025 it had expanded the offering for enterprise deployment. By March 2025, GTC's focus was already shifting to address inference demand — and the Financial Times reported that Cerebras, Groq, and Big Tech were targeting AI inference specifically to challenge Nvidia's dominance.
GTC 2026 is the conference where that shift became the headline. The Groq 3 LPX isn't an add-on to a training announcement. It's the product that explains where the extra $500 billion comes from.
The Demand
The same day Jensen forecast a trillion dollars in chip sales, OpenAI launched GPT-5.4 mini and nano — models aimed at agents, coding, and multimodal workflows, running more than twice as fast as their predecessors at a fraction of the cost. Mistral released Small 4, its first model to unify reasoning and coding in a small form factor. Nathan Lambert at Interconnects argued that open model providers will lose if they keep chasing closed frontier models — and should instead position as inference tools.
The direction is unanimous. The models that customers want for production deployment — agents that execute tasks, code tools that run continuously, APIs that respond in milliseconds — are small, fast, and cheap to run. They are optimized for inference, not training. And they need to run billions of times.
This is not a contradiction with the trillion-dollar forecast. It's the explanation for it. A single training run of a frontier model uses thousands of GPUs for weeks. But deploying mini and nano models to millions of users, each sending dozens of queries per day, uses more total compute than any training run. The models are getting smaller. The aggregate demand for running them is getting larger. That gap is where the second $500 billion lives.
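The scale argument can be sketched in back-of-envelope form. Every figure below is an assumed order-of-magnitude placeholder — none comes from the article — chosen only to show the shape of the comparison: a one-time frontier training run against a year of serving a small model at consumer scale.

```python
# Illustrative sketch only: all numbers are assumed placeholders, not
# figures from the article. The point is the shape of the comparison,
# not the specific values.
SECONDS_PER_DAY = 86_400

# One frontier training run: thousands of GPUs for weeks (assumed).
gpus = 10_000
flops_per_gpu = 1e15        # ~1 PFLOP/s effective per accelerator (assumed)
training_days = 30
training_flops = gpus * flops_per_gpu * training_days * SECONDS_PER_DAY

# A deployed small model: millions of users, dozens of queries per day
# (assumed), served every day for a year.
users = 200_000_000
queries_per_user_per_day = 50   # agents and coding tools query constantly
flops_per_query = 1e13          # small model, moderate output length (assumed)
days = 365
inference_flops = users * queries_per_user_per_day * flops_per_query * days

print(f"One training run:  {training_flops:.1e} FLOPs")
print(f"One year serving:  {inference_flops:.1e} FLOPs")
print(f"Ratio:             {inference_flops / training_flops:.1f}x")
```

Under these assumptions, a single year of serving already exceeds the training run — and the serving total scales with users, queries, and agent steps, while the training run is a fixed cost per model version.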
The Stack
Three executives gave three numbers on the same day, and together they describe the AI infrastructure stack from bottom to top.
Jensen Huang: $1 trillion in chip sales through 2027. The hardware layer.
Andy Jassy, at an Amazon all-hands: AI will help AWS reach $600 billion in annual revenue by 2036 — double his prior estimate, and nearly five times AWS's $128.7 billion in 2025 revenue. The cloud layer.
And Apple, via Horace Dediu at Asymco: $14 billion in 2026 capex while hyperscalers spend a combined $650 billion, or 90% of their cash flow. The application layer. Apple's theory, as this series has tracked, is that you don't need to build the infrastructure when the models running on it are becoming commodities you can buy via API.
In the training era, these three positions seemed contradictory — either you need massive infrastructure or you don't. In the inference era, all three can be right simultaneously. Nvidia sells the chips. AWS runs the cloud. Apple pays per query. The question isn't who's correct. It's where the margins concentrate.
The Meter
The first half trillion in AI infrastructure spending was a construction project. Companies were building — models, data centers, GPU clusters, training pipelines. The market sold off when the bills came due because construction spending is finite. You build it and then it's done.
The second five hundred billion is something different. It's a utility bill. Inference compute recurs every time someone talks to an AI, every time an agent takes an action, every time a model processes an image or writes a line of code. The meter is always running. And as AI embeds into more products, more workflows, more devices — the meter runs faster.
Jensen didn't just double a number. He revealed that the AI hardware market is transitioning from a construction cycle to a utility cycle. Construction ends. Utilities don't. That's how a single year matches all prior years combined — and why the year after that could be larger still.