2026-01-02
On the first day of the New Year, DeepSeek released a major paper. They try to fix the training instability that plagues advanced neural network designs. Enter mHC: Manifold-Constrained Hyper-Connections. They take the powerful but unstable “Hyper-Connections” architecture …
South China Morning Post
DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
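For readers unfamiliar with the architecture the paper builds on: Hyper-Connections widen the single residual stream of a transformer into several parallel streams, with learnable weights deciding how each block reads from and writes back to them; mHC, per its name, additionally constrains those connection weights to a manifold to tame the instability. The sketch below illustrates only the basic hyper-connection read/write step. The function names, shapes, and weight values are simplified assumptions for illustration, not DeepSeek's implementation.

```python
def layer(x):
    # Stand-in for a transformer block: any function from R^d to R^d.
    return [2.0 * v for v in x]

def hyper_connection_step(streams, read_w, write_w, residual_w, layer_fn):
    """One block with hyper-connections (simplified).

    streams:    list of n residual streams, each a length-d vector
    read_w:     n weights mixing the streams into the layer input
    write_w:    n weights distributing the layer output back
    residual_w: n x n matrix mixing the streams with each other
    """
    n, d = len(streams), len(streams[0])
    # Read: a weighted sum of the streams feeds the layer.
    x = [sum(read_w[i] * streams[i][j] for i in range(n)) for j in range(d)]
    y = layer_fn(x)
    # Write: each stream gets a mixed residual plus a share of the output.
    new_streams = []
    for i in range(n):
        mixed = [sum(residual_w[i][k] * streams[k][j] for k in range(n))
                 for j in range(d)]
        new_streams.append([mixed[j] + write_w[i] * y[j] for j in range(d)])
    return new_streams

streams = [[1.0, 0.0], [0.0, 1.0]]     # n=2 streams, d=2
read_w = [0.5, 0.5]
write_w = [1.0, 0.0]
residual_w = [[1.0, 0.0], [0.0, 1.0]]  # identity: plain residual mixing
out = hyper_connection_step(streams, read_w, write_w, residual_w, layer)
```

With identity `residual_w` and a single nonzero write weight this collapses back to an ordinary residual connection on stream 0; the instability mHC targets arises when these mixing weights are left fully unconstrained across many layers.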
2025-10-21
Compress everything visually! DeepSeek has just released DeepSeek-OCR, a state-of-the-art OCR model with 3B parameters. Core idea: explore long-context compression via 2D optical mapping. Architecture: - DeepEncoder → compresses high-res inputs into few vision tokens; …
The Decoder
DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute
the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8....
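To see why rendering text as an image can save context, here is a back-of-the-envelope sketch: a page fed as raw text costs roughly one token per few characters, while the same page fed as an image costs one token per encoder output patch. All the figures below (characters per text token, patch size, downsampling factor, page size) are illustrative assumptions, not numbers from the DeepSeek-OCR paper.

```python
def text_token_count(n_chars, chars_per_token=4):
    # Rough BPE cost of feeding the raw text to an LLM
    # (assumed average of ~4 characters per token).
    return n_chars // chars_per_token

def vision_token_count(img_w, img_h, patch=16, downsample=4):
    # ViT-style patching of the page image, followed by an encoder
    # that downsamples the patch grid by `downsample` in each axis
    # before handing tokens to the decoder.
    return (img_w // patch) * (img_h // patch) // (downsample * downsample)

# A dense page: ~3000 characters rendered at 1024x1024 pixels.
text_tokens = text_token_count(3000)            # 750 text tokens
vision_tokens = vision_token_count(1024, 1024)  # 4096 patches / 16 = 256
```

Under these toy assumptions the optical route costs roughly a third of the tokens, which is the kind of trade the "vision-text compression" framing refers to; the real gains depend on render resolution and how aggressively the DeepEncoder downsamples.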