lmarena_ai · TEXXR

🚨 Leaderboard Disrupted! Grok-4-fast by @xAI has arrived in the Arena, and it's shaking things up! ⚡️ 🏆 #1 on the Search Leaderboard: tested under the codename “menlo,” Grok-4-fast-search just rocketed to the top spot with the community. 💠 Tied for #8 on the Text Leaderboard [image]

2025-09-20

xAI

xAI launches Grok 4 Fast, a multimodal model with a 2M context window and a unified architecture that combines reasoning and non-reasoning modes

Pushing the Frontier of Cost-Efficient Intelligence — We're thrilled to present Grok 4 Fast, our latest advancement in cost-efficient reasoning models.

Grok-4-fast shows strength in key categories: 🔸 Ranks #2 in Multi-Turn 🔸 Tied for #3 in Coding 🔸 Tied for #3 in Longer Query The competition just keeps heating up 🔥 Check out the full leaderboard details here: https://lmarena.ai/... [image]

2025-09-20

xAI

xAI launches Grok 4 Fast, a multimodal model with a 2M context window and a unified architecture that combines reasoning and non-reasoning modes

Pushing the Frontier of Cost-Efficient Intelligence — We're thrilled to present Grok 4 Fast, our latest advancement in cost-efficient reasoning models.

🚨Text Leaderboard Update: A new model provider, @MicrosoftAI has broken into the Top 15 this week! 💠MAI-1-preview by @MicrosoftAI debuts at #13. Congrats to the Microsoft AI team! As the Text Arena is one of the most competitive races, breaking into the Top 15 is no small [image]

2025-08-29

Semafor

Microsoft unveils MAI-Voice-1, a speech model that can generate a full minute of audio in under a second on a single GPU, and a text model called MAI-1-preview

On Thursday, Microsoft announced two powerful AI models it built that it says perform at the level of the world's top offerings …

🚨🍌Big Reveal: who was “Nano Banana?” The anonymous model, “nano-banana,” that caught the world's attention with its ability to follow complex instructions, preserve character identity, and maintain contextual details was: Gemini-2.5-Flash-Image-Preview by @GoogleDeepMind 🍌✨ [video]

2025-08-26

TechCrunch

Google says it is behind the viral “nano-banana” image model and launches it as Gemini 2.5 Flash Image with finer edit controls in the Gemini app, API, and more

Google is upgrading its Gemini chatbot with a new AI image model that gives users finer control over editing photos …

🚨Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again! 🥇 #1 in Text, Vision, WebDev 🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories Huge congrats @GoogleDeepMind! [image]

2025-06-06

9to5Google

Google releases an upgraded preview of Gemini 2.5 Pro, saying its Elo score jumped by 24 points on LMArena and it leads in coding benchmarks like Aider Polyglot

Abner Li / 9to5Google

We thank the authors for their feedback. However, there are a number of factual errors and misleading statements in this writeup: Regarding the statement that some model providers are not treated fairly: - This is not true. Given our capacity, we have always tried to honor all

2025-05-01

TechCrunch

A study from Cohere, Stanford, MIT, and Ai2 accuses LMArena of helping Meta, OpenAI, Google, and Amazon game its popular crowdsourced AI benchmark Chatbot Arena

A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI …

Thanks for the authors' feedback, we're always looking to improve the platform! If a model does well on LMArena, it means that our community likes it! Yes, pre-release testing helps model providers identify which variant our community likes best. But this doesn't mean the

2025-05-01

TechCrunch

A study from Cohere, Stanford, MIT, and Ai2 accuses LMArena of helping Meta, OpenAI, Google, and Amazon game its popular crowdsourced AI benchmark Chatbot Arena

A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI …

We're excited to invite everyone to a new Beta version of LMArena! 🎉 For months, we've been poring through community feedback to improve the site—fixing errors/bugs, improving our UI layout, and more. To keep supporting the development and continual improvement of this platform, we're also forming a company. Future improvements will continue to be community-driven. We can't wait to hear more feedback! 🧵👇

2025-04-18

Bloomberg

LMArena says it's starting a company, whose corporate name will be Arena Intelligence, with plans to raise money, and releases a new beta version of its website

LMArena started with just a handful of PhD students and undergrads working day and night on a basic research prototype. We hope that becoming a company will give us the resources to continue to improve it significantly. LMArena will be staying true to its original mission. It

2025-04-18

Bloomberg

LMArena says it's starting a company, whose corporate name will be Arena Intelligence, with plans to raise money, and releases a new beta version of its website

The Beta was rebuilt from the ground up. It's our first step to address all the feedback we've been receiving from our amazing community. Go and check it out 👀 - you'll notice: ⚡️ Faster, smoother experience 📱 Better on mobile 💬 Chat history + voting tweaks 🧭 Clearer UI &

2025-04-18

Bloomberg

LMArena says it's starting a company, whose corporate name will be Arena Intelligence, with plans to raise money, and releases a new beta version of its website

We've seen questions from the community about the latest release of Llama-4 on Arena. To ensure full transparency, we're releasing 2,000+ head-to-head battle results for public review. This includes user prompts, model responses, and user preferences. (link in next tweet)

2025-04-08

The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.

BREAKING: Meta's Llama 4 Maverick just hit #2 overall - becoming the 4th org to break 1400+ on Arena!🔥 Highlights: - #1 open model, surpassing DeepSeek - Tied #1 in Hard Prompts, Coding, Math, Creative Writing - Huge leap over Llama 3 405B: 1268 → 1417 - #5 under style control [image]

2025-04-06

Meta launches Llama 4 Maverick with 400B parameters and Scout with 109B parameters and a 10M context window, and previews Behemoth with 2T total parameters

Takeaways — We're sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences.

Meta's Llama 4 Maverick lands in the top 5 across all categories. Tied for #1 in Hard Prompts, Coding, Math, Creative Writing, Longer Query, and Multi-Turn! [image]

2025-04-06

Meta launches Llama 4 Maverick with 400B parameters and Scout with 109B parameters and a 10M context window, and previews Behemoth with 2T total parameters

Takeaways — We're sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences.

🎉 Congrats to @GoogleDeepMind on Gemma-3-27B, the newest and one of the strongest open models in Arena! 💠 Top 10 overall - beating out many proprietary models with only 27B parameters 💠 2nd best open model, only below DeepSeek-R1 💠 128K context window Check out their blog to [image]

2025-03-12

9to5Google

Google unveils Gemma 3, the “world's best single-accelerator model”, running on a single GPU, in 1B, 4B, 12B, and 27B sizes, and says it outperforms Llama-405B

Following version 1 in February 2024 and 2 in May, Google today announced Gemma 3 as its latest open model for developers.

Here you can see @xai Grok-3's performance across all the top categories: 🔹 Overall w/ Style Control 🔹 Hard Prompts & Hard Prompt w/ Style Control 🔹 Coding 🔹 Math 🔹 Creative Writing 🔹 Instruction Following 🔹 Longer Query 🔹 Multi-Turn [image]

2025-02-18

TechCrunch

xAI launches Grok-3 beta and Grok-3 mini, its latest AI models with reasoning, trained on 200K GPUs, or “10x” more compute than Grok-2, for X Premium+ users

Elon Musk's AI company, xAI, late on Monday released its latest flagship AI model, Grok 3, and unveiled new capabilities for the Grok iOS and web apps.

BREAKING: @xAI's early version of Grok-3 (codename “chocolate”) is now #1 in Arena! 🏆 Grok-3 is: - First-ever model to break 1400 score! - #1 across all categories, a milestone that keeps getting harder to achieve Huge congratulations to @xAI on this milestone! View thread 🧵 [image]

2025-02-18

TechCrunch

xAI launches Grok-3 beta and Grok-3 mini, its latest AI models with reasoning, trained on 200K GPUs, or “10x” more compute than Grok-2, for X Premium+ users

Elon Musk's AI company, xAI, late on Monday released its latest flagship AI model, Grok 3, and unveiled new capabilities for the Grok iOS and web apps.

Gemini-2.0-Flash-Thinking #1 across all categories!

2024-12-20

TechCrunch

Google releases Gemini 2.0 Flash Thinking, an experimental “reasoning” model that “explicitly shows its thoughts” and can use them to strengthen its reasoning

Quick: what sort of prompts should you run against GPT-4o vs Gemini 1.5 Flash vs o1 vs o1-pro vs gemini-2.0-flash-thinking-exp? X: Jeff Dean / @jeffdean : Introducing Gemini 2.0 Fl...

Breaking news from Chatbot Arena⚡🤔 @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories! The leap from Gemini-2.0-Flash: - Overall: #3 → #1 - Overall (Style Control): #4 → #1 - Math: #2 → #1 - Creative Writing: #2 → #1 - Hard Prompts: #1 → #1

2024-12-20

TechCrunch

Google releases Gemini 2.0 Flash Thinking, an experimental “reasoning” model that “explicitly shows its thoughts” and can use them to strengthen its reasoning

Quick: what sort of prompts should you run against GPT-4o vs Gemini 1.5 Flash vs o1 vs o1-pro vs gemini-2.0-flash-thinking-exp? X: Jeff Dean / @jeffdean : Introducing Gemini 2.0 Fl...
