Braintrust, which helps companies evaluate and monitor their AI tools' performance, raised an $80M Series B led by Iconiq at an $800M post-money valuation
Braintrust, a startup building AI observability and evaluation tools, raised an $80 million Series B led by Iconiq …
Thoughts on AI progress and why AI labs' actions hint at a worldview in which AI models will continue to fare poorly at generalization and on-the-job learning
Why I'm moderately bearish in the short term, and explosively bullish in the long term. What are we scaling?
Trump signs an EO establishing the Genesis Mission to boost AI innovation, including by using federal scientific datasets to train models and create AI agents
President Donald Trump on Monday signed an executive order to launch a government-wide effort to build an integrated artificial …
Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond
Google today announced Gemini 3 with the goal of bringing “any idea to life.” The first model available in this family …
Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”
The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’
Samsung introduces the Tiny Recursion Model, a 7M-parameter model that can outperform LLMs 10,000x larger, like Gemini 2.5 Pro and o3-mini, on specific problems
The trend of AI researchers developing new, small open source generative models that outperform far larger …
GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers
And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks o...
OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard
After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …
Google unveils benchmarking platform Kaggle Game Arena, where LLMs compete head-to-head in strategic games, starting with a chess tournament from August 5 to 7
Watch models compete in complex games providing a verifiable and dynamic measure of their capabilities. Kaggle : Chess Text Input Leaderboard Nick Bild / Hackster : Shall We Play a...
xAI introduces Grok 4, trained on its Colossus supercomputer, with multimodal features, faster reasoning, Grok 4 Voice, Grok 4 Code, a new interface, and more
Deeper thinking and greater reasoning is promised — An hour after the live stream was supposed to start last night (July 9) …
Anthropic releases new API features for building agents: a code execution tool, an MCP connector, a Files API, and extended prompt caching, all in public beta
Today, we're announcing four new capabilities on the Anthropic API that enable developers to build more powerful AI agents …
OpenAI updates its Responses API for building agentic applications to include remote MCP server support, image generation and Code Interpreter tools, and more
OpenAI is rolling out a set of significant updates to its newish Responses API, aiming to make it easier for developers and enterprises …
Many AI features, like Gmail's AI assistant, feel useless because they don't allow users to edit system prompts, constraining the AI models they're built with
Millions Of Email Users Now At Risk Of Attack Mastodon: Dare Obasanjo / @carnage4life@mas.to : This blog captures my frustration with AI tools for work. Microsoft and Google are p...
The Arc Prize Foundation says its new ARC-AGI-2 test stumps most AI models; humans get 60% of the questions right but GPT-4.5 and Claude 3.7 Sonnet score ~1%
François Chollet / @fchollet : Unlike ARC-AGI-1, this new version is not easily brute-forced. Current top AI approaches score 0-4%. All base LLMs (GPT-4.5, Claude 3.7 Son...
Sam Altman says GPT-5 will include o3, which is no longer set to ship as a standalone model, GPT-4.5 will be OpenAI's last non-chain-of-thought model, and more
OpenAI has effectively canceled the release of o3, which was slated to be the company's next major AI model …
AI researcher François Chollet and Zapier co-founder Mike Knoop launch Ndea, an AI research and science lab focused on “developing and operationalizing AGI”
François Chollet, an influential AI researcher, is launching a new startup that aims to build frontier AI systems with novel designs.
OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025
12 Days of OpenAI: Day 12 Naomi Li Gan / Tech in Asia : OpenAI unveils AI model for advanced reasoning Bojan Stojkovski / Interesting Engineering : OpenAI unveils o3 reasoning AI m...
OpenAI announced its new o3 models on Friday. — In a tweet ahead of its final livestream for its …
California Governor Gavin Newsom vetoes AI safety bill SB 1047, saying it applies only to large AI models and doesn't account for whether a deployment is high risk
Governor seeks more encompassing rules than the bill, which was opposed by OpenAI and Meta and supported by research scientists