2026-01-02
residuals in transformers are great for stability and scaling; deeper layers update the signal along the residual stream. few people have questioned this choice publicly, and since 2025 there's been progress. a few thoughts on hyper-connections (wrt the newly released DeepSeek paper)
South China Morning Post
DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
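to make the contrast concrete, here's a minimal sketch of the hyper-connections idea (a hypothetical simplification in the spirit of DeepSeek's mHC, not the paper's actual formulation): instead of one residual stream, each layer reads from and writes to n parallel streams through mixing weights. all names, shapes, and the toy `layer_fn` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_fn(x):
    # stand-in for an attention/MLP block: any transformation of the hidden state
    return np.tanh(x)

def hyper_connection_step(streams, alpha, beta, layer=layer_fn):
    """One layer with hyper-connections (illustrative, not mHC's exact form).

    streams: (n, d) array of n parallel residual streams
    alpha:   (n,) weights mixing the streams into the layer input
    beta:    (n,) weights distributing the layer output back to the streams
    """
    layer_in = alpha @ streams            # (d,) weighted read from all streams
    out = layer(layer_in)                 # layer transformation
    return streams + np.outer(beta, out)  # weighted write back to each stream

n, d = 4, 8
streams = np.tile(rng.standard_normal(d), (n, 1))  # expand the input into n streams
alpha = np.ones(n) / n    # uniform read weights (learned in practice)
beta = np.eye(n)[0]       # write only to stream 0 (learned in practice)

for _ in range(3):        # three "layers"
    streams = hyper_connection_step(streams, alpha, beta)

final = streams.mean(axis=0)  # collapse the streams back into one vector
print(final.shape)            # (8,)
```

with n=1 and alpha = beta = [1.0], this collapses back to the ordinary residual update x + layer(x), which is exactly why it reads as a generalization of the residual stream rather than a replacement.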
2024-12-07
OpenAI o1 fine-tuning looks a lot like RL with verifiable rewards: a list of pre-defined reward functions plus their specific reasoning stack to arrive at answers in a specific format.
OpenAI
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader, a [...
OpenAI improved their fine-tuning API with o1 fine-tuning using "reinforcement" fine-tuning (instead of a supervised one). They prepared a list of "graders", basically pre-defined reward functions, and they use true RL to make o1-mini task-specific