fchollet · TEXXR

I gained a lot of respect for Dario for being principled on the issues of mass surveillance and autonomous killbots. Principled leaders are rare these days

2026-02-27

Anthropic

Dario Amodei says Anthropic cannot “in good conscience” accede to DOD's request to remove safeguards and will work to ensure a smooth transition if offboarded

I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries.

I gained a lot of respect for Dario for being principled on the issues of mass surveillance and autonomous killbots. Principled leaders are rare these days

2026-02-27

Axios

President Trump calls Anthropic a “radical left, woke company” and says he is directing every federal agency in the US to stop using its products

The Trump administration has decided to blacklist Anthropic in the most consequential and controversial policy decision to date …

“Any software engineer can now produce much more value than before” “No one will want to hire software engineers” Both of these cannot be true at the same time.

2026-02-25

Financial Times

An Evercore ISI economist criticizes Citrini's AI report, calling its assumptions “extreme and improbable”, but says it's a thought-provoking exercise

“Any software engineer can now produce much more value than before” “No one will want to hire software engineers” Both of these cannot be true at the same time.

2026-02-24

Financial Times

An Evercore ISI economist criticizes Citrini's AI report, calling its assumptions “extreme and improbable”, but says it's a thought-provoking exercise

Most of the sell-side has remained hilariously silent at a mere Substacker seemingly shaking markets, even though the anguish and frustration is almost palpable.

The new Gemini Deep Think is achieving some truly incredible numbers on ARC-AGI-2. We certified these scores in the past few days. [image]

2026-02-13

The Keyword

Google updates Gemini 3 Deep Think to better solve modern science, research, and engineering challenges and expands it via the Gemini API to some researchers

Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.

Gemini 3 scores 31.1% on ARC-AGI-2. Impressive progress.

2025-11-18

The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

Impressive work.

2025-10-09

VentureBeat

Samsung introduces the Tiny Recursion Model, a 7M-parameter model that can outperform LLMs 10,000x larger, like Gemini 2.5 Pro and o3-mini, on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger …

Grok 4 is still state-of-the-art on ARC-AGI-2 among frontier models. 15.9% for Grok 4 vs 9.9% for GPT-5. [image]

2025-08-08

Simon Willison's Weblog

GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers

GPT-5 results on ARC-AGI 1 & 2! Top line: 65.7% on ARC-AGI-1, 9.9% on ARC-AGI-2.

2025-08-08

Simon Willison's Weblog

GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers

Grok 4 is still state-of-the-art on ARC-AGI-2 among frontier models. 15.9% for Grok 4 vs 9.9% for GPT-5. [image]

2025-08-08

VentureBeat

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

GPT-5 results on ARC-AGI 1 & 2! Top line: 65.7% on ARC-AGI-1, 9.9% on ARC-AGI-2.

2025-08-08

VentureBeat

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

Unlike ARC-AGI-1, this new version is not easily brute-forced. Current top AI approaches score 0-4%. All base LLMs (GPT-4.5, Claude 3.7 Sonnet, Gemini 2, etc.) score 0%. Single-CoT reasoning models (Claude Thinking, R1, o3-mini...) score 0-1%. So you can't solve these tasks via memorization alone. You need the ability to recombine concepts on the fly - you need test-time adaptation...

2025-03-26

TechCrunch

The Arc Prize Foundation says its new ARC-AGI-2 test stumps most AI models; humans get 60% of the questions right but GPT-4.5 and Claude 3.7 Sonnet score ~1%

Today, we're releasing ARC-AGI-2. It's an AI benchmark designed to measure general fluid intelligence, not memorized skills - a set of never-seen-before tasks that humans find easy, but current AI struggles with. It keeps the same format as ARC-AGI-1, while significantly increasing the signal strength it provides about a system's actual fluid intelligence...

2025-03-26

TechCrunch

The Arc Prize Foundation says its new ARC-AGI-2 test stumps most AI models; humans get 60% of the questions right but GPT-4.5 and Claude 3.7 Sonnet score ~1%

The key is really this: AI usefulness scales logarithmically with inference time compute. Right now for many use cases the amount of compute you need to operate at human-level is such that AI isn't economically viable for that use case. The more compute efficient AI gets, the more use cases start becoming economically viable, the more we'll deploy AI, and the more compute we'll need.

2025-01-28

Bloomberg

Sam Altman says DeepSeek's R1 is an “impressive model, particularly around what they're able to deliver for the price” and OpenAI “will pull up some releases”

OpenAI Chief Executive Officer Sam Altman welcomed the debut of DeepSeek's R1 model in a post on X late on Monday.

The key is really this: AI usefulness scales logarithmically with inference time compute. Right now for many use cases the amount of compute you need to operate at human-level is such that AI isn't economically viable for that use case. The more compute efficient AI gets, the more use cases start becoming economically viable, the more we'll deploy AI, and the more compute we'll need.

2025-01-28

CNBC

Nvidia calls DeepSeek's work “an excellent AI advancement”, reiterating “inference requires significant numbers of Nvidia GPUs and high-performance networking”

Nvidia called DeepSeek's R1 model “an excellent AI advancement,” despite the Chinese startup's emergence causing …

I'm joining forces with @mikeknoop to start Ndea (@ndeainc), a new AI lab. Our focus: deep learning-guided program synthesis. We're betting on a different path to build AI capable of true invention, adaptation, and innovation. [image]

2025-01-16

TechCrunch

AI researcher François Chollet and Zapier co-founder Mike Knoop launch Ndea, an AI research and science lab focused on “developing and operationalizing AGI”

François Chollet, an influential AI researcher, is launching a new startup that aims to build frontier AI systems with novel designs.

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task [image]

2024-12-22

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025

So, is this AGI? While the new model is very impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI — there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely challenging for o3.

2024-12-22

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025

One very important thing to understand about the future: the economics of AI are about to change completely. We'll soon be in a world where you can turn test-time compute into competence — for the first time in the history of software, marginal cost will become critical.

2024-12-22

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025
