VOICE ARCHIVE

Brendan

@brendanfoody
7 posts
2026-03-06
GPT 5.4 is the best model we've ever tested on APEX-Agents. It's also the first model to pass 50% mean score. A year ago, frontier models couldn't even edit an Excel sheet and scored less than 5%. Now, in less than 3 months GPT 5.4 has improved by 15.7%. ChatGPT will imminently [image]
The Verge

OpenAI launches GPT-5.4, saying it is its “most capable and efficient frontier model for professional work” and its first with native computer use capabilities

The latest model comes with native computer use capabilities, allowing it to take on jobs across your device and applications.

2026-03-05
GPT 5.4 is the best model we've ever tested on APEX-Agents. It's also the first model to pass 50% mean score. A year ago, frontier models couldn't even edit an Excel sheet and scored less than 5%. Now, in less than 3 months GPT 5.4 has improved by 15.7%. ChatGPT will imminently [image]
The Verge

OpenAI launches GPT-5.4, saying it is its “most capable and efficient frontier model for professional work” and its first with native computer use capabilities

The latest model comes with native computer use capabilities, allowing it to take on jobs across your device and applications.

2026-02-21
Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard. Gemini jumped from 18.4% to 33.5% on Pass@1 in just 90 days. It also completes 5 tasks that no model has ever been able to do before. @GeminiApp shows how quickly agents are improving at real knowledge work. It [image]
9to5Google

Google rolls out Gemini 3.1 Pro, which it says is “a step forward in core reasoning”, for all users in the Gemini app; the .1 increment is a first for Google

2026-02-20
Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard. Gemini jumped from 18.4% to 33.5% on Pass@1 in just 90 days. It also completes 5 tasks that no model has ever been able to do before. @GeminiApp shows how quickly agents are improving at real knowledge work. It [image]
9to5Google

Google rolls out Gemini 3.1 Pro, which it says is “a step forward in core reasoning”, for all users in the Gemini app; the .1 increment is a first for Google

In November, Google introduced Gemini 3 Pro in preview, with Gemini 3 Flash following a month later.

2026-02-19
Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard. Gemini jumped from 18.4% to 33.5% on Pass@1 in just 90 days. It also completes 5 tasks that no model has ever been able to do before. @GeminiApp shows how quickly agents are improving at real knowledge work. It [image]
9to5Google

Google rolls out Gemini 3.1 Pro, which it says is “a step forward in core reasoning”, for all users in the Gemini app; the .1 increment is a first for Google

In November, Google introduced Gemini 3 Pro in preview, with Gemini 3 Flash following a month later.

2025-10-03
We collaborated with the world's leading experts to create APEX:
- Larry Summers (@LHSummers), former US Treasury Secretary
- Cass Sunstein (@CassSunstein), the most cited legal scholar
- Eric Topol (@EricTopol), physician and best-selling author
- Dominic Barton, former [image]
Mercor

Mercor launches the AI Productivity Index (APEX), which evaluates AI models' ability to perform “economically valuable knowledge work”; GPT-5 leads at 64.2%

... still not production-ready. Related coverage: Nikita Ostrovsky / Time: AI Is Learning to Do the Jobs of Doctors, Lawyers, and Consultants; arXiv.org: The AI Productivity Index (APEX); Agnee Ghosh / B...

AI has its PhD and now it's on the job market.  Introducing the AI Productivity Index (APEX), a benchmark that measures how well we've automated the most valuable industries in the world.  Most benchmarks study abstract capabilities.  APEX evaluates model performance on real deliverables across law, finance, consulting, and medicine...
2025-10-03
Mercor

Mercor launches the AI Productivity Index (APEX), which evaluates AI models' ability to perform “economically valuable knowledge work”; GPT-5 leads at 64.2%

... still not production-ready. Related coverage: Nikita Ostrovsky / Time: AI Is Learning to Do the Jobs of Doctors, Lawyers, and Consultants; arXiv.org: The AI Productivity Index (APEX); Agnee Ghosh / B...