2026-01-02
🚨 DeepSeek just dropped a paper that quietly exposes why modern neural networks get unstable as they scale. It's called mHC: Manifold-Constrained Hyper-Connections, and the core idea is deceptively simple: Neural networks keep breaking their own geometry. Here's what that means. Modern deep models stack layers and then add skip connections everywhere. Residuals, dense connections, cross-layer shortcuts. These help gradients flow, but they also do something subtle and bad: they mix representations that live on different manifolds as if they were compatible. They usually aren't....
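To make the "mixing representations" point concrete, here is a minimal, hypothetical sketch in PyTorch. It contrasts a plain skip connection, which simply adds the branch output onto the stream, with a connection whose mixing weights are constrained (here, softmax-normalized to a convex combination, purely as an assumed stand-in for "keeping the geometry"). The class names and the simplex constraint are illustrative assumptions, not DeepSeek's actual mHC formulation, which the post does not spell out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainResidualBlock(nn.Module):
    """Standard skip connection: output = x + f(x).
    The unconstrained sum can drift in scale/geometry as depth grows."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)

class ConstrainedMixBlock(nn.Module):
    """Hypothetical illustration: learnable weights over {skip, branch},
    projected onto the simplex via softmax so the output is a convex
    mixture of the two representations. This sketches the general idea of
    constraining the connection, not the paper's mHC construction."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.mix_logits = nn.Parameter(torch.zeros(2))  # logits for [skip, branch]

    def forward(self, x):
        w = F.softmax(self.mix_logits, dim=0)  # constraint: weights are positive and sum to 1
        return w[0] * x + w[1] * self.f(x)

# Quick check that both blocks run and preserve shape.
x = torch.randn(4, 64)
print(PlainResidualBlock(64)(x).shape, ConstrainedMixBlock(64)(x).shape)
```

The contrast is only meant to show where a constraint could live: in the plain block the skip and branch outputs are summed with no restriction, while the constrained block forces the combination onto a fixed set of admissible mixtures.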
South China Morning Post
DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture