VOICE ARCHIVE

@nrehiew_
13 posts
2026-02-24
This will make headlines among people who don't know better. But I am extremely curious to know what novel distillation method they have cooked in China, which requires only ~10M samples (not even logits!) to compete at the frontier. DeepSeek needed only 150,000 samples! [image]
2026-02-24 View on X
Reuters

A Trump administration official says DeepSeek's new model, expected next week, was trained on Nvidia Blackwell chips, in a potential US export control violation

This will make headlines among people who don't know better. But I am extremely curious to know what novel distillation method they have cooked in China, which requires only ~10M samples (not even logits!) to compete at the frontier. DeepSeek needed only 150,000 samples! [image]
2026-02-24 View on X
Wall Street Journal

Anthropic says DeepSeek, MiniMax, and Moonshot violated its ToS by prompting Claude a combined 16M+ times and using distillation to train their own products

The allegations mirror those of OpenAI, which told House lawmakers that DeepSeek used ‘distillation’ to improve models
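
For readers puzzling over the "not even logits!" aside above: below is a minimal, purely illustrative PyTorch sketch of the difference between logit-level distillation and training only on teacher-generated samples (sequence-level KD). The shapes, names, and toy data are assumptions for illustration; nothing here is the method the tweet speculates about.

```python
# Hedged sketch contrasting logit-level distillation with "samples only" distillation
# (training on teacher-generated text with plain cross-entropy). Purely illustrative.
import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic KD (Hinton et al.): needs the teacher's full logits for every token."""
    t = temperature
    s = F.log_softmax(student_logits / t, dim=-1).flatten(0, -2)
    p = F.softmax(teacher_logits / t, dim=-1).flatten(0, -2)
    return F.kl_div(s, p, reduction="batchmean") * (t * t)

def sample_only_kd_loss(student_logits, teacher_token_ids):
    """Sequence-level KD: only the teacher's sampled tokens are available,
    so the loss is ordinary next-token cross-entropy on teacher outputs."""
    vocab = student_logits.size(-1)
    return F.cross_entropy(student_logits.reshape(-1, vocab), teacher_token_ids.reshape(-1))

# Toy shapes: 2 sequences, 8 tokens each, vocabulary of 100.
student_logits = torch.randn(2, 8, 100, requires_grad=True)
teacher_logits = torch.randn(2, 8, 100)         # only available with logit access to the teacher
teacher_tokens = teacher_logits.argmax(dim=-1)  # all that "samples, not even logits" implies

print(logit_kd_loss(student_logits, teacher_logits))
print(sample_only_kd_loss(student_logits, teacher_tokens))
```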

2025-11-29
Incredible release. It also comes with 13 pages of infra details 🔥 [image]
2025-11-29 View on X
Prime Intellect

Prime Intellect debuts INTELLECT-3, an RL-trained 106B-parameter open-source MoE model it claims outperforms larger models across math, code, science, and reasoning

Today, we release INTELLECT-3, a 100B+ parameter Mixture-of-Experts model trained on our RL stack, achieving state …

2025-08-08
Whenever OpenAI releases something new, everyone else plays catch-up and tries to replicate whatever the new innovation is. When o1-preview/reasoning was released, everyone was speculating about the underlying research. There has been no talk about the GPT-5 router at all.
2025-08-08 View on X
VentureBeat

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

Whenever OpenAI releases something new, everyone else plays catch-up and tries to replicate whatever the new innovation is. When o1-preview/reasoning was released, everyone was speculating about the underlying research. There has been no talk about the GPT-5 router at all.
2025-08-08 View on X
TechCrunch

OpenAI says GPT-5 is a unified system with an efficient model for most questions, a reasoning model for harder problems, and a router that decides which to use

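To make the router description above concrete, here is a toy, hypothetical sketch of routing easy prompts to a fast model and harder ones to a reasoning model. The heuristic, model names, and thresholds are invented; OpenAI has not published how the GPT-5 router actually works.

```python
# Hedged toy sketch of the routing idea described above: a fast model for most
# questions, a reasoning model for hard ones, and a router choosing between them.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    def answer(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt!r}"

FAST = Model("fast-model")
REASONER = Model("reasoning-model")

def route(prompt: str) -> Model:
    """Toy router: send prompts that look like multi-step problems to the reasoner."""
    hard_markers = ("prove", "step by step", "optimize", "debug", "integral")
    looks_hard = len(prompt.split()) > 40 or any(m in prompt.lower() for m in hard_markers)
    return REASONER if looks_hard else FAST

for prompt in ("What's the capital of France?",
               "Prove that the sum of two even numbers is even, step by step."):
    model = route(prompt)
    print(model.name, "->", model.answer(prompt))
```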

2025-07-23
Every time I see one of Owain's papers, I always find them hard to wrap my head around. Fascinating work, and I think it has really nice implications for watermarking too [image]
2025-07-23 View on X
Anthropic

Anthropic and other researchers detail “subliminal learning”, where LLMs learn traits from model-generated data that is semantically unrelated to those traits

We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits.

2025-07-20
Takeaways + (guesses): 1) This is likely a multi-agent system, so it isn't a single reasoner thinking for a million tokens in one go. 2) (This likely doesn't use much training compute, if any.) 3) They have a general-purpose verifier beyond just rule-based final-answer checking. [image]
2025-07-20 View on X
@alexwei_

[Thread] An OpenAI researcher says the company's latest experimental reasoning LLM achieved gold medal-level performance on the 2025 International Math Olympiad

1/N I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most pres...

2025-05-22
This video is generated btw. Almost entirely, pretty sure
2025-05-22 View on X
OpenAI

In a video and a letter signed “Sam & Jony”, Altman and Ive say io, founded in 2024 by Ive, Scott Cannon, Evans Hankey, and Tang Tan, will develop new products

openai.com/sam-and-jony/

This video is generated btw. Almost entirely, pretty sure
2025-05-22 View on X
Bloomberg

OpenAI acquires io, Jony Ive's secretive AI startup, for nearly $6.5B in stock; Ive and LoveFrom will remain independent but take over design for all of OpenAI

OpenAI will acquire the AI device startup co-founded by Apple Inc. veteran Jony Ive in a nearly $6.5 billion all-stock deal …

2025-04-08
These examples are extremely damning for the utility of Chatbot Arena as a serious benchmark. Look through all the examples that Maverick won, and it's slop after slop after slop. This is the nonsense you are optimizing for if you are trying to Goodhart LMSYS. Let's be serious [image]
2025-04-08 View on X
The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.

2024-12-27
Architecture-wise they differ significantly from Meta, which just used a single massive dense transformer. For open-source Mixture of Experts, Mixtral was the first (I think) and DeepSeek popularised it. Multi-head Latent Attention (MLA) comes from their DeepSeek-V2 paper, which basically makes [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.
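
For the numbers in the headline above (671B total parameters, 37B activated per token), here is a minimal PyTorch sketch of top-k Mixture-of-Experts routing with toy sizes. This shows the general MoE idea only, not DeepSeek-V3's actual architecture, and it does not cover Multi-head Latent Attention.

```python
# Hedged sketch of top-k Mixture-of-Experts routing: every token is scored against
# all experts, but only its top-k experts actually run, so the parameters active
# per token are a small fraction of the total. Toy sizes, illustrative only.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # only the chosen experts run,
            for e, expert in enumerate(self.experts):    # so active params << total params
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)   # torch.Size([10, 64])
```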

This is the image that has been going around, so you probably know how nuts this is, but some added context is that Llama 3 405B was trained on 16K H100s https://x.com/... [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.

How to train a 670B-parameter model. Let's talk about the DeepSeek-V3 report + some comparisons with what Meta did with Llama 405B [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.