gneubig · TEXXR

To be honest, I'm a bit of a skeptic of claims that models are on par with Claude/GPT, but this is definitely one that I feel is getting there, especially for tasks that focus on code (as opposed to other things like writing, math, etc.). More in the thread above.

2026-02-13 View on X

MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5. — Extensively trained with reinforcement learning …

View original

MiniMax-M2.5 is a surprising new step in open coding models. It is the first model where I've been able to independently confirm that it's better than the most recent Claude Sonnet. It showed up in our benchmarks below, and in my vibe checks it felt strong and diverse.

2026-02-13 View on X

MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5. — Extensively trained with reinforcement learning …

View original

Obvious caveat: LLM-generated text detection is not perfect, so there will be mistakes. Take this as a guide, not as the truth. But the reason I asked for this was that I suspected several of my reviews were AI generated, and the results matched my intuition.

2025-11-30 View on X

Nature

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

By Miryam Naddaf, a science writer based in London.

View original

ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI? [image]

2025-11-30 View on X

Nature

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

By Miryam Naddaf, a science writer based in London.

View original

In case anyone was wondering, 10 GW is about 6% of the power that all humans in the world spend thinking.

2025-09-23 View on X

CNBC

A look at the Nvidia-OpenAI deal, where Nvidia will invest in $10B tranches; sources say OpenAI informed Microsoft about the deal a day before it was signed

ABILENE, Texas - Sam Altman had a deadline. OpenAI's CEO was headed to Texas to unveil his company's next big infrastructure push …

View original
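
The 6% figure in the 10 GW post above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is not from the post: it assumes a human brain draws about 20 W and a world population of about 8.2 billion, both commonly cited approximations. Only the 10 GW value comes from the post itself.

```python
# Back-of-the-envelope check of the "10 GW is about 6%" comparison.
# ASSUMPTIONS (not stated in the post): ~20 W per human brain and
# ~8.2 billion people. Only the 10 GW figure comes from the post.

BRAIN_POWER_WATTS = 20.0          # assumed average power draw of one brain
WORLD_POPULATION = 8.2e9          # assumed world population
DATACENTER_POWER_WATTS = 10e9     # 10 GW, from the post


def share_of_human_thinking_power(
    datacenter_watts: float, brain_watts: float, population: float
) -> float:
    """Return datacenter power as a fraction of all human brain power."""
    total_brain_watts = brain_watts * population
    return datacenter_watts / total_brain_watts


share = share_of_human_thinking_power(
    DATACENTER_POWER_WATTS, BRAIN_POWER_WATTS, WORLD_POPULATION
)
print(f"All human brains: {BRAIN_POWER_WATTS * WORLD_POPULATION / 1e9:.0f} GW")
print(f"10 GW share: {share:.1%}")
```

Under these assumptions, all human brains together draw roughly 164 GW, so 10 GW comes out at about 6%. The result is sensitive to the assumed per-brain wattage, so treat it as an order-of-magnitude check rather than a precise figure.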

What would it take to create an open-source Operator? In anticipation of the OpenAI Operator release, I have started gathering together some resources related to Operator and other solutions to task automation: https://github.com/... Let's gather resources and discuss 😃 [image]

2025-01-24 View on X

TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

View original

@invariant_labs Overall impressions: at the moment Operator seems to function solely on the web, significantly less expansive than what some had imagined, such as a macOS integration. Nice, polished user interface, although not far from what we have in OpenHands or closed alternatives like MultiOn.

2025-01-24 View on X

Every

Hands-on with Operator: limited in what it can browse, can autonomously perform repetitive workflows, and can do lengthy tasks on its own with minimal prompting

but with hiccups Efe Udin / Gizchina.com : OpenAI launches Operator to carry out online tasks for users Mitra Sorrells / Engage Feed : OpenAI debuts “Operator” agent that can book ...

View original

OpenAI Operator was mainly benchmarked on OSWorld and WebArena. I did some (agent-assisted) research and summarized the top open and closed solutions on these two benchmarks. Details here: https://github.com/... [image]

2025-01-24 View on X

The Verge

OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier

A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't w...

View original

A summary of Operator safety risks and mitigations: 1. Refusing harmful tasks 2. Blocking particular web sites 3. Asking for confirmation in the case of possibly risky actions [image]

2025-01-24 View on X

TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

View original

A combo of training the model to be well-aligned, and also post-hoc detection that attempts to monitor anything unsafe. This sort of confirmation+post-hoc monitoring is really important! OpenHands has a “confirmation mode” co-developed with @invariant_labs for this reason. [image]

2025-01-24 View on X

TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

View original

Currently doing a demo of web navigation booking a table on OpenTable and shopping for groceries. Pretty standard web agent stuff implemented in many agent frameworks and evaluated using WebArena, AssistantBench: * https://webarena.dev/ * https://assistantbench.github.io/

2025-01-24 View on X

TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

View original
