2025-11-18
Gemini 3 Pro (preview) scores 91% on VPCT (spatial reasoning) Uhhhh jesus christ [image]
9to5Google
Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond
Google today announced Gemini 3 with the goal of bringing “any idea to life.” The first model available in this family …
The Verge
Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”
The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding and reasoning, and less ‘flattery.’
2025-04-08
This is the first time for any major LLM that I'm genuinely thinking they just straight up trained on the benchmark answers for the mainline benchmarks. Llama 4 is failing spectacularly on like every 3rd-party bench I've seen
TechCrunch
Meta VP of Generative AI Ahmad Al-Dahle denies a rumor that the company trained Llama 4 Maverick and Scout on test sets, saying that Meta “would never do that”
… but the EU doesn't get everything.
Related: Pascale Davies / Euronews: From a political shift to a more powerful AI: Everything to know about Meta's Llama 4 models; Jay Bonggolto / Android C...
The Verge
LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot
With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.