Google releases Gemini 2.0 Flash Thinking, an experimental “reasoning” model that “explicitly shows its thoughts” and can use them to strengthen its reasoning

Quick: what sort of prompts should you run against GPT-4o vs Gemini 1.5 Flash vs o1 vs o1-pro vs gemini-2.0-flash-thinking-exp?

X:
Jeff Dean / @jeffdean : Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash's speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time
Logan Kilpatrick / @officiallogank : Just when you thought it was over... we're introducing Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more 🧵
Noam Shazeer / @noamshazeer : We've been *thinking* about how to improve model reasoning and explainability Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try
@lmarena_ai : Breaking news from Chatbot Arena⚡🤔 @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories! The leap from Gemini-2.0-Flash: - Overall: #3 → #1 - Overall (Style Control): #4 → #1 - Math: #2 → #1 - Creative Writing: #2 → #1 - Hard Prompts: #1 → #1
Logan Kilpatrick / @officiallogank : It's still an early version, but check out how the model handles a challenging puzzle involving both visual and textual clues: (2/3) [video]
Jeff Dean / @jeffdean : Want to see Gemini 2.0 Flash Thinking in action? Check out this demo where the model solves a physics problem and explains its reasoning.
Sundar Pichai / @sundarpichai : Our most thoughtful model yet:)
Andrej Karpathy / @karpathy : The new Gemini 2.0 Flash Thinking model (Gemini version of GPT o1 that takes a while to think before responding) is very nice and fast and now available to try on Google AI Studio 🧑‍🍳👏. The prominent and pleasant surprise here is that unlike o1 the reasoning traces of the model
@lmarena_ai : Gemini-2.0-Flash-Thinking #1 across all categories!
@scaling01 : instead of shipping GPT-4.5 OpenAI launched Saxophone guy on day 11 of shipmas Google doesn't even need to do anything to win [image]
Ethan Mollick / @emollick : Gemini 2 Flash Thinking pulls off a good sestina, and then schools me on the correct form. If you didn't know, sestinas are HARD & this was impossible for models before test time compute (o1 preview was first to crack it), and the presumably tiny Gemini model does it very fast. [image]
Erin Woo / @erinkwoo : the good news: google released a reasoning model the bad news: it still thinks there are two r's in strawberry (screenshots mine) https://www.theinformation.com/ ... [image]
Simon Willison / @simonw : The most impressive example I've seen so far involves vision input: llm -m gemini-2.0-flash-thinking-exp-1219 -a https://storage.googleapis.com/generativeai-downloads/images/geometry.png "What's the area of the overlapping region?" [image]
Max Weinbach / @maxwinebach : I think all of OpenAIs model/technical advantage is gone. Google and Anthropic have caught up or exceeded in production models (using production very liberally here) and likely in development pipeline and absolutely in agentic frameworks. Now, the OpenAI advantage is brand.
Jaana Dogan / @rakyll : Unpopular opinion: Benchmarking Google against a small research institute is a low bar. With the infra and the quality of engineers we have, we should be consistently five years ahead.
@axelgarciak : gemini-2.0-flash-thinking-exp-1219 released. Google's version of the OpenAI o1 model. 12 days of GoogleAI. 🤓
Alex Volkov / @altryne : I evaluated o1-2024-12-17 (with all 3 reasoning efforts) and gemini-2.0-flash-thinking-exp-1219 on 10 challenging questions from @AIExplainedYT simple bench and got some surprising results! Flash thinking is standing up to o1 while being MUCH faster 😮 [image]
Sholto Douglas / @_sholtodouglas : A taste of what we've been thinking about recently :) Try it out! Its still a little raw, we expect it to have sharp edges - but it represents incredible algorithmic progress on test time compute. Also check out the thoughts - its fun, and a little humanizing.
Rohit / @krishnanrohit : The new Gemini reasoning model is quite good! Cracked my wordgrid puzzle, and answered an economic question rather well. [video]
Zitong Yang / @zitongyang0 : OMG, this model got the three gambler's problem right (a problem I reserved for testing these reasoning models), it's the first model that got this problem correct, out of o1-preview, o1, r1, QwQ. Problem: Consider three gamblers initially having (a, b, c) dollars. Each trial [image]
Rishabh Srivastava / @rishdotblog : Ooh exciting to see Google's new Flash reasoning model! Comparable to o1-mini in my tests so far. o1 is significantly better on harder code and data analysis problems in my tests, though I'm excited to see what a larger Gemini Pro reasoning model might look like!
Dustin Tran / @dustinvtran : Here is what Gemini can do on *Flash*. My favorite perk: Gemini 2.0 Flash Thinking has significant gains in core capabilities while also excellent in user preferences (co-#1 with gemini-exp-1206 on @lmarena_ai). The best of both worlds.
Sholto Douglas / @_sholtodouglas : I really like the thoughts in this problem, a cute example of out of the box thinking. As models get stronger, taking them seriously will continue to be the right way to understand both the current gen - and what will be possible in even 3 months. [image]
Melvin Johnson / @melvinjohnsonp : Checking out our experimental thinking model built on top of 2.0 Flash. I'm excited about the improvements in hard tasks with this model. More to come in this space. 🤔
Demis Hassabis / @demishassabis : been thinking about thinking for a long time... 🧠
Brij Singh / @brij : Gary Marcus has hit a wall
Deedy / @deedydas : Google really cooked with Gemini 2.0 Flash Thinking. It thinks AND it's fast AND it's high quality. Not only is it #1 on LMArena on every category, but it crushes my goto Math riddle in 14s - 5x faster than any other model that can solve it! Google is making OpenAI dance.
Vint / @minty_vint : Looks like the safety settings on Gemini 2.0 Flash Thinking Experimental apply on both the chain of thought step and the actual output step. Also it sees through the trivial Monty Hall where the host opens a car instead of a goat.
Richard Seroter / @rseroter : Wow. Loving these model experiments, and that Gemini 2.0 Flash Thinking shows it “thinking” in a transparent way. You can try this out in @googleaistudio RIGHT NOW. Tell us what you think.
Thang Luong / @lmthang : Last announcement of the year from @GoogleDeepMind? Not sure :) but glad to be part of the team that launched Gemini 2.0 Flash Thinking, a model that is both smart & fast! Welcome to the Thinking era. What's next for 2025? Give Thinking a try at https://aistudio.google.com/ ...
Vint / @minty_vint : Looks like the safety settings on Gemini 2.0 Flash Thinking Experimental apply on both the chain of thought step and the actual output step
Paul Couvert / @itspaulai : Google has just released a very good reasoning model... and it's free 🔥 Already the best model on Chatbot Arena. Select “Gemini 2.0 Flash Thinking” in AI Studio and you can start using it right away. Based on Flash so VERY fast.
Thomas Kurian / @thomasortk : Looking forward to seeing how developers use Gemini 2.0 Flash Thinking, available now on Vertex AI & AI Studio. https://aistudio.google.com/ ...
Denis Shiryaev / @literallydenis : A unicorns by Gemini 2.0 Flash Thinking Experimental: looks beautiful, like in real life
@_akhaliq : Google drops Gemini 2.0 Flash Thinking a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more now available in anychat, try it out:
Jeff Dean / @jeffdean : Considering its speed, we're pretty happy with how the experimental Gemini 2.0 Flash Thinking model is performing on lmsys.
Dan Sutera / @dansutera : Testing out Gemini Flash 2.0 Thinking Experimental. Just ranked #1 on reasoning. Some pretty mindblowing possibilities...
Pallav Agarwal / @pallavmac : @OfficialLoganK Gemini 2.0 Flash Thinking successfully solves this riddle by answering ‘Blue Cone’
@thelobbyistguy : I gave Gemini 2.0 Flash Thinking this screenshot of the Putnam Math Competition exam and it started forming “Mental Sandbox Simulations” to ideate-on and solve each of the problems simultaneously. Kind of freaky.
Anton / @abacaj : Not feeling the vibe with the new gemini flash 2.0 thinking model (it seems a lot worse than o1). I am impressed with gemini flash 2.0 vision capabilities though pretty sure it is SOTA or very close
Jack Rae / @drjwrae : We released Gemini 2.0 Flash Thinking today! ⚡️🤔 It's a small step towards improved reasoning via inference-time compute, built on top of our small and mighty 2.0 Flash!
João / @joaoxcruz : “How many r's in “strawberry””? It's weird that Gemini 2.0 Flash gets it right but Gemini 2.0 Flash Thinking doesn't
Aditya Kusupati / @adityakusupati : 2.0 ⚡️🤔 model is amazing & improving quickly! Been playing with it for a while now & the thoughts sometimes surprise me, glad all can see the thoughts as well! “Num w/ same letters as value” in different languages (Telugu, Hindi, Farsi) - pretty good! https://aistudio.google.com/ ...
@tsendeemts : I have been thinking a lot about what is a good thought. Here is an exp. Gemini 2.0 flash with thinking.
Dan Mac / @daniel_mac8 : 🎥 🔥 🤔 First LOOK at Gemini 2.0 Flash Thinking >>> Google's response to OpenAI's o1 🔑 Key Takeaways: - MUCH faster than o1 - FREE to use on Google AI Studio - Reasoning process more transparent (?) - Hard to say if it's more powerful or not - need to wait for benchmark
@theaiveteran : New AI model by Google: Gemini 2.0 Flash Thinking. This model leaves an audit trail of its thoughts behind but is otherwise the same as Gemini 2.0 Flash. This audit trail is a big development for transparent AI, a movement that seeks to have just these features to double check
Cedric Chee / @cedric_chee : Google enters the era of reasoning with Gemini 2.0 Flash Thinking. It is not hiding its thinking process, unlike OpenAI o1. I'm curious how it ranks against Qwen QwQ and DeepSeek r1
@koltregaskes : Does Gemini Flash 2.0 Thinking best the MisguidedAttention test? No, but it's the best response to date. It starts off well and spots the “visit” phrase, but then it rephrases it incorrectly:
Zoubin Ghahramani / @zoubinghahrama1 : It's cool to see the reasoning steps of Gemini 2.0 Thinking Flash! And it's pretty clear that for some AI use cases more inference time computation is the right way to go.
Zoubin Ghahramani / @zoubinghahrama1 : Gemini 2.0 Flash Thinking thinks so fast, I can't keep up!
Pallav Agarwal / @pallavmac : AGI achieved by Gemini 2.0 Flash Thinking /s
@koltregaskes : Gemini 2.0 Flash Thinking Experimental, Coming to AI Studio
@keytryer : New flash thinking Gemini 2.0 on Google AI Studio can kiiind of play Hangman, but it still makes very basic mistakes.
Subhash Peshwa / @subhash_peshwa : The new model THINKS FAST! Twice as fast as o1-mini infact! Here's the response time comparison between @GoogleDeepMind Gemini 2.0-flash-thinking and @OpenAI o1-mini.
AshutoshShrivastava / @ai_for_success : Google bullying OpenAI on daily basis now.. Google Gemini 2.0 Flash Thinking is now number 1 over all on lmsys arena.
@test_tm7873 : Gemini 2.0 Flash Thinking Experimental VS Openai o1 (in short Gemini won) On image task. Task was to fill the tabel from the image. the tabel was also send as a image. Gemini 6 / 8 correct O1 3 / 8 correct (correct according to the key of answers) source of the images.
AshutoshShrivastava / @ai_for_success : Google totally won 12 days of Shipmas 🎅 Google new Gemini 2.0 Flash Thinking, a new experimental model in action. Congratulations Google 👏
Alexander Jia / @alexanderjiazx : Gemini-2.0-flash-thinking-exp just answer my tricky “ML accuracy” question correctly and precisely Even OpenAI o1 got it wrong
Jeff Dean / @jeffdean : Here's how the new experimental Gemini 2.0 Flash Thinking model compares on the lmsys arena.
Joseph Mambwe / @mrmambwe : I didn't even realise the OpenAI livestream came and went while playing around with Gemini 2.0 Flash thinking model, what an absolute contrast from the start of this month to now
Tulio Manfredi / @manfreditu : Gemini 2.0 Flash Thinking is really good. Better reasoning and better verbosity on the way to arrive to the answer
Xinyun Chen / @xinyun_chen_ : Very excited to be part of the team that builds Gemini 2.0 Flash Thinking. Try our experimental model at https://aistudio.google.com/ .... Any feedback is welcome and appreciated!
Simon / @tokumin : “Thinking”, a self portrait. - Gemini 2.0 Flash Thinking & Veo2
@testingcatalog : Looks like we may be getting a new 2.0 Gemini model soon 👀 > “gemini-2.0-flash-thinking-exp” - will it be 2.0 Flash-based Deep Research? h/t @swishfever
Omar Sanseviero / @osanseviero : Gemini 2.0 Flash Thinking is out! 🚀(experimental) 🤯Solve complex reasoning 🤔Transparent thinking process 👀Text and image input Try it out for free: https://aistudio.google.com/ ... Docs: https://ai.google.dev/...
Max Weinbach / @maxwinebach : New Gemini 2.0 Flash Thinking model! It's a reasoning model like o1, but shows the full output of CoT and reasoning
@koltregaskes : Gemini 2.0 Flash Thinking goes straight in at #1 on Chat Arena overall. Sadly, I'm getting less and less confident in this benchmark. Will await @bindureddy's LiveBench results before getting excited:
@kimmonismus : Gemini 2.0 Flash Thinking Experimental is available in AI Studio
@gm8xx8 : Gemini 2.0 Flash Thinking Experimental > Multimodal Understanding: Handles tasks involving multiple data types. > Reasoning Capabilities: Excels at solving complex problems. > Coding: Tackles difficult code and math challenges. > Visible Thinking Process: Shows the model's
Dave Goldblatt / @davegoldblatt : I asked Gemini 2.0 Flash Thinking for a long tail possible outcomes of humanity over the next 10 years, and, interestingly, one of the scenarios sounds a lot like what @balajis speaks to
Ale𝕏 Fazio / @alxfazio : Google is absolutely on fire this week. they just announced a model with reasoning: “Gemini 2.0 Flash Thinking”
@thejacobdean_ : Google totally won 12 days of Shipmas 🎅 Google new Gemini 2.0 Flash Thinking, a new experimental model in action. Congratulations Google 👏 https://x.com/... - AshutoshShrivastava (@ai_for_success) Dec 19, 2024
Ankesh Anand / @ankesh_anand : Excited to share an early preview of our gemini 2.0 flash thinking model with all it's raw thoughts visible. Here's the model trying to solve a Putnam 2024 with multiple approaches, and then self-verifies that it's answer was correct.
AshutoshShrivastava / @ai_for_success : It's so over guys... GOOGLE has new model Gemini 2.0 Flash Thinking Experimental, thier reasoning model and I was correct with my prediction. It's available on AI Studio Thanks you @BalajiAkiri for sharing the info. This is real shipmas🎅
@sullyomarr : This is insane. Gemini flash 2.0 is 2x faster and cheaper while being SIGNIFICANTLY smarter than before Guys deepmind is cooking

Forums:
r/artificial : One-Minute Daily AI News 12/19/2024
BeauHD / Slashdot : Google Releases Its Own ‘Reasoning’ AI Model
Ars OpenForum : Not to be outdone by OpenAI, Google releases its own “reasoning” AI model

TechCrunch 2024-12-20

