VOICE ARCHIVE

@nrehiew_
13 posts
2026-02-24
This will make headlines among people who don't know better. But I am extremely curious to know what novel distillation method they have cooked in China, which requires only ~10M samples (not even logits!) to compete at the frontier. DeepSeek needed only 150,000 samples! [image]
2026-02-24 View on X
Reuters

A Trump administration official says DeepSeek's new model, expected next week, was trained on Nvidia Blackwell chips, in a potential US export control violation

This will make headlines among people who don't know better. But I am extremely curious to know what novel distillation method they have cooked in China, which requires only ~10M samples (not even logits!) to compete at the frontier. DeepSeek needed only 150,000 samples! [image]
2026-02-24 View on X
Wall Street Journal

Anthropic says DeepSeek, MiniMax, and Moonshot violated its ToS by prompting Claude a combined 16M+ times and using distillation to train their own products

The allegations mirror those of OpenAI, which told House lawmakers that DeepSeek used ‘distillation’ to improve models
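
For readers puzzling over the "not even logits!" aside above: below is a minimal, purely illustrative PyTorch sketch of the difference between logit-level distillation and training only on teacher-generated samples (sequence-level KD). The shapes, names, and toy data are assumptions for illustration; nothing here is the method the tweet speculates about.

```python
# Hedged sketch contrasting logit-level distillation with "samples only" distillation
# (training on teacher-generated text with plain cross-entropy). Purely illustrative.
import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic KD (Hinton et al.): needs the teacher's full logits for every token."""
    t = temperature
    s = F.log_softmax(student_logits / t, dim=-1).flatten(0, -2)
    p = F.softmax(teacher_logits / t, dim=-1).flatten(0, -2)
    return F.kl_div(s, p, reduction="batchmean") * (t * t)

def sample_only_kd_loss(student_logits, teacher_token_ids):
    """Sequence-level KD: only the teacher's sampled tokens are available,
    so the loss is ordinary next-token cross-entropy on teacher outputs."""
    vocab = student_logits.size(-1)
    return F.cross_entropy(student_logits.reshape(-1, vocab), teacher_token_ids.reshape(-1))

# Toy shapes: 2 sequences, 8 tokens each, vocabulary of 100.
student_logits = torch.randn(2, 8, 100, requires_grad=True)
teacher_logits = torch.randn(2, 8, 100)         # only available with logit access to the teacher
teacher_tokens = teacher_logits.argmax(dim=-1)  # all that "samples, not even logits" implies

print(logit_kd_loss(student_logits, teacher_logits))
print(sample_only_kd_loss(student_logits, teacher_tokens))
```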

2025-11-29
Incredible release. It also comes with 13 pages of infra details 🔥 [image]
2025-11-29 View on X
Prime Intellect

Prime Intellect debuts INTELLECT-3, an RL-trained 106B-parameter open-source MoE model it claims outperforms larger models across math, code, science, and reasoning

Today, we release INTELLECT-3, a 100B+ parameter Mixture-of-Experts model trained on our RL stack, achieving state …

2025-08-08
Whenever OpenAI releases something new, everyone else plays catch-up and tries to replicate whatever the new innovation is. When o1-preview/reasoning was released, everyone was speculating about the underlying research. There has been no talk about the GPT-5 router at all.
2025-08-08 View on X
VentureBeat

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

Whenever OpenAI releases something new, everyone else plays catch-up and tries to replicate whatever the new innovation is. When o1-preview/reasoning was released, everyone was speculating about the underlying research. There has been no talk about the GPT-5 router at all.
2025-08-08 View on X
TechCrunch

OpenAI says GPT-5 is a unified system with an efficient model for most questions, a reasoning model for harder problems, and a router that decides which to use

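To make the router description above concrete, here is a toy, hypothetical sketch of routing easy prompts to a fast model and harder ones to a reasoning model. The heuristic, model names, and thresholds are invented; OpenAI has not published how the GPT-5 router actually works.

```python
# Hedged toy sketch of the routing idea described above: a fast model for most
# questions, a reasoning model for hard ones, and a router choosing between them.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    def answer(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt!r}"

FAST = Model("fast-model")
REASONER = Model("reasoning-model")

def route(prompt: str) -> Model:
    """Toy router: send prompts that look like multi-step problems to the reasoner."""
    hard_markers = ("prove", "step by step", "optimize", "debug", "integral")
    looks_hard = len(prompt.split()) > 40 or any(m in prompt.lower() for m in hard_markers)
    return REASONER if looks_hard else FAST

for prompt in ("What's the capital of France?",
               "Prove that the sum of two even numbers is even, step by step."):
    model = route(prompt)
    print(model.name, "->", model.answer(prompt))
```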

2025-07-23
Every time I see one of Owain's papers, I always find them hard to wrap my head around. Fascinating work, and I think it has really nice implications for watermarking too [image]
2025-07-23 View on X
Anthropic

Anthropic and other researchers detail “subliminal learning”, where LLMs learn traits from model-generated data that is semantically unrelated to those traits

We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits.

2025-07-20
Takeaways + (guesses): 1) This is likely a multi-agent system, so it isn't a single reasoner thinking for a million tokens in one go. 2) (This likely doesn't use much training compute, if any.) 3) They have a general-purpose verifier beyond just rule-based final-answer checking. [image]
2025-07-20 View on X
@alexwei_

[Thread] An OpenAI researcher says the company's latest experimental reasoning LLM achieved gold medal-level performance on the 2025 International Math Olympiad

1/N I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most pres...

2025-05-22
This video is generated btw. Almost entirely, pretty sure
2025-05-22 View on X
OpenAI

In a video and a letter signed “Sam & Jony”, Altman and Ive say io, founded in 2024 by Ive, Scott Cannon, Evans Hankey, and Tang Tan, will develop new products

openai.com/sam-and-jony/

This video is generated btw. Almost entirely, pretty sure
2025-05-22 View on X
Bloomberg

OpenAI acquires io, Jony Ive's secretive AI startup, for nearly $6.5B in stock; Ive and LoveFrom will remain independent but take over design for all of OpenAI

OpenAI will acquire the AI device startup co-founded by Apple Inc. veteran Jony Ive in a nearly $6.5 billion all-stock deal …

2025-04-08
These examples are extremely damning for the utility of Chatbot Arena as a serious benchmark. Look through all the examples that Maverick won, and it's slop after slop after slop. This is the nonsense you are optimizing for if you are trying to Goodhart LMSYS. Let's be serious [image]
2025-04-08 View on X
The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.

2024-12-27
Architecture-wise they differ significantly from Meta, which just used a single massive dense transformer. For open-source Mixture of Experts, Mixtral was the first (I think) and DeepSeek popularised it. Multi-head Latent Attention (MLA) comes from their DeepSeek-V2 paper, which basically makes [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.
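
For the numbers in the headline above (671B total parameters, 37B activated per token), here is a minimal PyTorch sketch of top-k Mixture-of-Experts routing with toy sizes. This shows the general MoE idea only, not DeepSeek-V3's actual architecture, and it does not cover Multi-head Latent Attention.

```python
# Hedged sketch of top-k Mixture-of-Experts routing: every token is scored against
# all experts, but only its top-k experts actually run, so the parameters active
# per token are a small fraction of the total. Toy sizes, illustrative only.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # only the chosen experts run,
            for e, expert in enumerate(self.experts):    # so active params << total params
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)   # torch.Size([10, 64])
```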

This is the image that has been going around, so you probably know how nuts this is, but some added context is that Llama 3 405B was trained on 16K H100s https://x.com/... [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.

How to train a 670B-parameter model. Let's talk about the DeepSeek-V3 report + some comparisons with what Meta did with Llama 405B [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.