VOICE ARCHIVE

Graham Neubig

@gneubig
24 posts
2026-02-13
To be honest, I'm a bit of a skeptic of claims that models are on par with Claude/GPT, but this is definitely one that I feel is getting there. Especially for tasks that focus on code (as opposed to other things like writing, math, etc.) More in the thread above.
2026-02-13 View on X
MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5.  —  Extensively trained with reinforcement learning …

MiniMax-M2.5 is a surprising new step in open coding models. The first model where I've been able to independently confirm that it's better than the most recent Claude Sonnet. It showed up in our benchmarks below, and in my vibe checks it felt strong and diverse.
2026-02-13 View on X
MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5.  —  Extensively trained with reinforcement learning …

2026-02-12
MiniMax-M2.5 is a surprising new step in open coding models. The first model where I've been able to independently confirm that it's better than the most recent Claude Sonnet. It showed up in our benchmarks below, and in my vibe checks it felt strong and diverse.
2026-02-12 View on X
MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5.  —  Extensively trained with reinforcement learning …

To be honest, I'm a bit of a skeptic of claims that models are on par with Claude/GPT, but this is definitely one that I feel is getting there. Especially for tasks that focus on code (as opposed to other things like writing, math, etc.) More in the thread above.
2026-02-12 View on X
MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5.  —  Extensively trained with reinforcement learning …

2025-11-30
Obvious caveat: LLM-generated text detection is not perfect so there will be mistakes. Take this as a guide, not as the truth. But the reason why I asked for this was because I suspected several of my reviews were AI generated, and the results matched w/ my intuition.
2025-11-30 View on X
Nature

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

By Miryam Naddaf, a science writer based in London.

ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI? [image]
2025-11-30 View on X
Nature

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

By Miryam Naddaf, a science writer based in London.

2025-11-29
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI? [image]
2025-11-29 View on X
Nature

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

By Miryam Naddaf, a science writer based in London.

Obvious caveat: LLM-generated text detection is not perfect so there will be mistakes. Take this as a guide, not as the truth. But the reason why I asked for this was because I suspected several of my reviews were AI generated, and the results matched w/ my intuition.
2025-11-29 View on X
Nature

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

By Miryam Naddaf, a science writer based in London.

2025-09-23
In case anyone was wondering, 10GW is about 6% of the energy that all humans in the world spend thinking.
2025-09-23 View on X
CNBC

A look at the Nvidia-OpenAI deal, where Nvidia will invest in $10B tranches; sources say OpenAI informed Microsoft about the deal a day before it was signed

ABILENE, Texas - Sam Altman had a deadline.  OpenAI's CEO was headed to Texas to unveil his company's next big infrastructure push …
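The 10GW comparison in the post above can be checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 20 W of power per human brain and a world population of about 8 billion; neither figure appears in the post, so treat both as assumptions. Strictly speaking, gigawatts measure power rather than energy, so the comparison is between rates.

```python
# Assumed figures (not from the post): ~20 W per brain, ~8 billion people.
BRAIN_POWER_WATTS = 20.0
WORLD_POPULATION = 8.0e9

# The 10 GW figure is the one quoted in the post.
DATACENTER_POWER_WATTS = 10.0e9


def share_of_human_thinking(datacenter_watts: float,
                            brain_watts: float = BRAIN_POWER_WATTS,
                            population: float = WORLD_POPULATION) -> float:
    """Return datacenter power as a fraction of total human brain power."""
    total_brain_watts = brain_watts * population
    return datacenter_watts / total_brain_watts


share = share_of_human_thinking(DATACENTER_POWER_WATTS)
print(f"{share:.2%}")  # prints 6.25%
```

Under these assumptions all human brains together draw about 160 GW, so 10 GW comes to 6.25%, consistent with the "about 6%" in the post.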

2025-01-24
What would it take to create an open-source Operator? In anticipation of the OpenAI Operator release, I have started gathering together some resources related to Operator and other solutions to task automation: https://github.com/... Let's gather resources and discuss 😃 [image]
2025-01-24 View on X
TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

@invariant_labs Overall impressions: at the moment Operator seems to function solely on the web, significantly less expansive than some had imagined (a MacOS integration). Nice polished user interface, although not far from what we have in OpenHands or other closed alternatives like MultiOn.
2025-01-24 View on X
Every

Hands-on with Operator: limited in what it can browse, can autonomously perform repetitive workflows, and can do lengthy tasks on its own with minimal prompting

but with hiccups Efe Udin / Gizchina.com : OpenAI launches Operator to carry out online tasks for users Mitra Sorrells / Engage Feed : OpenAI debuts “Operator” agent that can book ...

OpenAI Operator mainly benchmarked on OSWorld and WebArena. I did some (agent-assisted) research and summarized the top open and closed solutions on these two benchmarks. Details here: https://github.com/... [image]
2025-01-24 View on X
The Verge

OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier

A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't w...

A summary of operator safety risks and mitigations. 1. Refusing harmful tasks 2. Blocking particular web sites 3. Asking for confirmation in the case of possibly risky actions [image]
2025-01-24 View on X
TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.
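The three mitigations listed in the post above (refusing harmful tasks, blocking particular web sites, asking for confirmation on risky actions) can be pictured as a gate that reviews each agent action before it runs. The following is a minimal sketch of that idea only. It is not OpenAI's or OpenHands's implementation, and every name in it (`Action`, `Decision`, `review`, the keyword, host, and action-kind sets) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Decision(Enum):
    REFUSE = "refuse"    # mitigation 1: harmful task
    BLOCK = "block"      # mitigation 2: disallowed web site
    CONFIRM = "confirm"  # mitigation 3: ask the user first
    ALLOW = "allow"


@dataclass(frozen=True)
class Action:
    task: str  # natural-language description of the user's request
    url: str   # page the agent wants to act on
    kind: str  # e.g. "read", "click", "purchase"


# Hypothetical policy data, chosen purely for illustration.
HARMFUL_KEYWORDS = {"malware", "phishing"}
BLOCKED_HOSTS = {"blocked.example"}
RISKY_KINDS = {"purchase", "send_email", "delete"}


def review(action: Action) -> Decision:
    """Apply the three checks in order and return the first that fires."""
    if any(word in action.task.lower() for word in HARMFUL_KEYWORDS):
        return Decision.REFUSE
    host = urlparse(action.url).hostname or ""
    if host in BLOCKED_HOSTS:
        return Decision.BLOCK
    if action.kind in RISKY_KINDS:
        return Decision.CONFIRM
    return Decision.ALLOW


if __name__ == "__main__":
    order = Action("buy groceries", "https://shop.example/cart", "purchase")
    print(review(order).value)  # a purchase is risky, so this asks for confirmation
```

A real system would likely replace the keyword match with a trained classifier and pair the gate with post-hoc monitoring; the ordering of the checks here is a design choice, not something the post specifies.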

A combo of training the model to be well-aligned, and also post-hoc detection that attempts to monitor anything unsafe. This sort of confirmation+post-hoc monitoring is really important! OpenHands has a “confirmation mode” co-developed with @invariant_labs for this reason. [image]
2025-01-24 View on X
TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

Currently doing a demo of web navigation booking a table on OpenTable and shopping for groceries. Pretty standard web agent stuff implemented in many agent frameworks and evaluated using WebArena, AssistantBench: * https://webarena.dev/ * https://assistantbench.github.io/
2025-01-24 View on X
TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.

A summary of operator safety risks and mitigations. 1. Refusing harmful tasks 2. Blocking particular web sites 3. Asking for confirmation in the case of possibly risky actions [image]
2025-01-24 View on X
The Verge

OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier

A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't w...

A combo of training the model to be well-aligned, and also post-hoc detection that attempts to monitor anything unsafe. This sort of confirmation+post-hoc monitoring is really important! OpenHands has a “confirmation mode” co-developed with @invariant_labs for this reason. [image]
2025-01-24 View on X
The Verge

OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier

A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't w...

Currently doing a demo of web navigation booking a table on OpenTable and shopping for groceries. Pretty standard web agent stuff implemented in many agent frameworks and evaluated using WebArena, AssistantBench: * https://webarena.dev/ * https://assistantbench.github.io/
2025-01-24 View on X
The Verge

OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier

A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't w...

What would it take to create an open-source Operator? In anticipation of the OpenAI Operator release, I have started gathering together some resources related to Operator and other solutions to task automation: https://github.com/... Let's gather resources and discuss 😃 [image]
2025-01-24 View on X
The Verge

OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching to US subscribers of its $200/month ChatGPT Pro tier

A research preview of an agent that can use its own browser to perform tasks for you. OpenAI on YouTube : Introduction to Operator & Agents David Gewirtz / ZDNET : Operator isn't w...

@invariant_labs Overall impressions: at the moment Operator seems to function solely on the web, significantly less expansive than some had imagined (a MacOS integration). Nice polished user interface, although not far from what we have in OpenHands or other closed alternatives like MultiOn.
2025-01-24 View on X
TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements

OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.