sullyomarr · TEXXR

model routing almost never works well cause people suck at explaining what they want thoroughly. it's much easier to just pick the right one according to the use case (assuming 1-3 options). for ex, sometimes I type stuff like “best monitor 1440p Reddit” and it's much easier to just click “best model” vs typing out 3 additional sentences on how much thinking it should do...

2025-08-09 View on X

Joanna Stern's Newsletter

OpenAI says ChatGPT Pro users can select old models for now but plans to deprecate them in 60 days; Sam Altman says Plus users will be able to keep using GPT-4o

Mourning the model that knew you best. … RIP, ChatGPT 4o and the rest of the crew. — I won't say I'm deeply attached to GPT-4o, OpenAI's previous main model.


model routing almost never works well cause people suck at explaining what they want thoroughly. it's much easier to just pick the right one according to the use case (assuming 1-3 options). for ex, sometimes I type stuff like “best monitor 1440p Reddit” and it's much easier to just click “best model” vs typing out 3 additional sentences on how much thinking it should do...

2025-08-09 View on X

PCMag

With GPT-5's launch, OpenAI has removed its older models from the ChatGPT model selector for some users, sparking backlash from them

‘They have completely ruined ChatGPT,’ one user complains. Some are even canceling their paid subscriptions to ChatGPT, claiming GPT-5 is inferior to the company's previous models...


Good morning [image]

2025-04-03 View on X

CNBC

Tech stocks fall after President Trump announced new global tariffs: Meta drops as much as ~8%, Amazon drops ~7%, Nvidia ~5%, Alphabet ~4%, and Microsoft ~3%

Apple slid more than 6% in late trading Wednesday and led a broader decline in tech stocks after President Donald Trump announced …


Wake up babe new Claude model dropped

2025-02-25 View on X

One Useful Thing

Claude 3.7 and Grok-3 are the first “Gen3” models with big gains in handling complex tasks, using 10x more compute than GPT-4-class models, and better reasoning

Note: After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered …


the only let down of claude 3.7 is we didn't get any price reductions. it's my go to model, but it's slowly getting harder to justify the price (especially once you add reasoning tokens). for comparison gemini 2.0 is 30x cheaper @ 0.1m input and 0.4 output [image]

2025-02-25 View on X

One Useful Thing

Claude 3.7 and Grok-3 are the first “Gen3” models with big gains in handling complex tasks, using 10x more compute than GPT-4-class models, and better reasoning

Note: After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered …


the only let down of claude 3.7 is we didn't get any price reductions. it's my go to model, but it's slowly getting harder to justify the price (especially once you add reasoning tokens). for comparison gemini 2.0 is 30x cheaper @ 0.1m input and 0.4 output [image]

2025-02-25 View on X

TechCrunch

Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool

and it could be a game changer. Ghacks: Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model. Rowan Cheung / The Rundown AI: Claude enters the reasoning era. Siddharth Jind...


Wake up babe new Claude model dropped

2025-02-25 View on X

TechCrunch

Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool

and it could be a game changer. Ghacks: Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model. Rowan Cheung / The Rundown AI: Claude enters the reasoning era. Siddharth Jind...


Btw I don't say this lightly but SWE in the traditional sense is dead in < 2 years. You will still need smart, capable engineers. But anything that involves raw coding and no taste is done for. o6 will build you basically anything

2024-12-22 View on X

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025

12 Days of OpenAI: Day 12. Naomi Li Gan / Tech in Asia: OpenAI unveils AI model for advanced reasoning. Bojan Stojkovski / Interesting Engineering: OpenAI unveils o3 reasoning AI m...


yeah it's over for coding with o3. this is mindboggling. looks like the first big jump since gpt4, because these numbers make 0 sense [image]

2024-12-22 View on X

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025

12 Days of OpenAI: Day 12. Naomi Li Gan / Tech in Asia: OpenAI unveils AI model for advanced reasoning. Bojan Stojkovski / Interesting Engineering: OpenAI unveils o3 reasoning AI m...


yeah it's over for coding with o3. this is mindboggling. looks like the first big jump since gpt4, because these numbers make 0 sense [image]

2024-12-21 View on X

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025

OpenAI announced its new o3 models on Friday. — In a tweet ahead of its final livestream for its …


Btw I don't say this lightly but SWE in the traditional sense is dead in < 2 years. You will still need smart, capable engineers. But anything that involves raw coding and no taste is done for. o6 will build you basically anything

2024-12-21 View on X

TechCrunch

OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025

OpenAI announced its new o3 models on Friday. — In a tweet ahead of its final livestream for its …


This is insane. Gemini flash 2.0 is 2x faster and cheaper while being SIGNIFICANTLY smarter than before. Guys deepmind is cooking

2024-12-20 View on X

TechCrunch

Google releases Gemini 2.0 Flash Thinking, an experimental “reasoning” model that “explicitly shows its thoughts” and can use them to strengthen its reasoning

Quick: what sort of prompts should you run against GPT-4o vs Gemini 1.5 Flash vs o1 vs o1-pro vs gemini-2.0-flash-thinking-exp? X: Jeff Dean / @jeffdean: Introducing Gemini 2.0 Fl...


this is kinda huge - o1 is as smart as PhD students - solves 83% of IMO math problems, vs 13% for gpt4o [image]

2024-09-13 View on X

TechCrunch

OpenAI claims that in a qualifying exam for the International Mathematics Olympiad, o1 correctly solved 83.3% of the problems, while GPT-4o solved only 13.4%

Sam Altman says it “doesn't constitute AGI”. Poulami Saha / Financial Express: OpenAI makes big AI breakthrough, ChatGPT can now think and reason: Details. Emilia David / VentureBea...


So o1 is as smart as PhD students and solves 83% of IMO math problems, vs 13% for gpt4o. Insane improvements in reasoning. [image]

2024-09-13 View on X

Simon Willison's Weblog

OpenAI's o1 models aren't as simple as the next step up from GPT-4o as they introduce major cost and performance trade-offs in exchange for improved “reasoning”

OpenAI released two major new preview models today: o1-preview and o1-mini (that mini one is also a preview …


Fully expect to rearchitect your entire system with new thinking models. These are absolutely NOT drop and replace for existing models (see model card, o1-mini is worse than 4mini at some tasks). I see a lot of really cool ways to maximize them (esp with multi-agent systems)

2024-09-13 View on X

Simon Willison's Weblog

OpenAI's o1 models aren't as simple as the next step up from GPT-4o as they introduce major cost and performance trade-offs in exchange for improved “reasoning”

OpenAI released two major new preview models today: o1-preview and o1-mini (that mini one is also a preview …


So o1 is as smart as PhD students and solves 83% of IMO math problems, vs 13% for gpt4o. Insane improvements in reasoning. [image]

2024-09-13 View on X

TechCrunch

OpenAI claims that in a qualifying exam for the International Mathematics Olympiad, o1 correctly solved 83.3% of the problems, while GPT-4o solved only 13.4%

Sam Altman says it “doesn't constitute AGI”. Poulami Saha / Financial Express: OpenAI makes big AI breakthrough, ChatGPT can now think and reason: Details. Emilia David / VentureBea...


Fully expect to rearchitect your entire system with new thinking models. These are absolutely NOT drop and replace for existing models (see model card, o1-mini is worse than 4mini at some tasks). I see a lot of really cool ways to maximize them (esp with multi-agent systems)

2024-09-13 View on X

The Verge

OpenAI releases o1, the first of its rumored reasoning-focused Strawberry models, in preview, alongside a smaller o1-mini, for ChatGPT Plus and Team subscribers

Advancing cost-efficient reasoning. Contributions. Sabrina Ortiz / ZDNET: OpenAI trained its new o1 AI models to think before they speak - how to access them. Ethan Mollick / On...


this is kinda huge - o1 is as smart as PhD students - solves 83% of IMO math problems, vs 13% for gpt4o [image]

2024-09-13 View on X

Simon Willison's Weblog

OpenAI's o1 models aren't as simple as the next step up from GPT-4o as they introduce major cost and performance trade-offs in exchange for improved “reasoning”

OpenAI released two major new preview models today: o1-preview and o1-mini (that mini one is also a preview …


after using replit's coding agent i think...it's over for a lot of traditional saas. wanted slack notifications when customers subscribed/cancelled. Zapier was 30/mo JUST to add a price filter. instead replit's agent built & deployed one in < 5 mins, with tests. 1/10 of the cost [video]

2024-09-08 View on X

Maginative

Replit launches Replit Agent, an AI agent that can build entire apps from scratch based on prompts, available in beta to Replit Core and Teams subscribers

available today for subscribers: [video] @sullyomarr: after using replit's coding agent i think...it's over for a lot of traditional saas. wanted slack notifications when customers ...


after using replit's coding agent i think...it's over for a lot of traditional saas. wanted slack notifications when customers subscribed/cancelled. Zapier was 30/mo JUST to add a price filter. instead replit's agent built & deployed one in < 5 mins, with tests. 1/10 of the cost [video]

2024-09-07 View on X

Maginative

Replit launches Replit Agent, an AI agent that can build entire apps from scratch based on prompts, available in beta to Replit Core and Teams subscribers

Replit has launched an AI agent capable of building entire applications from scratch. This isn't just another copilot coding assistant …
