VOICE ARCHIVE

Merve

@mervenoyann
14 posts
2026-02-12
GLM-5 is out on @huggingface 🔥 > A40B/744B, trained on more tokens (28.5T) > outperforms/on par with closed sota > allows commercial use (MIT licensed) 💗 use with vLLM/SGLang locally or through HF Inference Providers thanks to @novita_labs and @Zai_org 📦 [image]
Z.ai

Z.ai launches GLM-5, saying its flagship open-weight model has “best-in-class performance among all open-source models” in reasoning, coding, and agentic tasks

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks.  Scaling is still one of the most important ways …

Reuters

Z.ai says it will raise prices by at least 30% for new GLM coding plan subscribers to accommodate surging demand for its AI coding tools

2025-11-30
DeepSeek released DeepSeekMathv2 (based on DeepSeek-V3.2-Exp-Base) outperforming Gemini DeepThink on IMO ProofBench and CNML from paper: > they train an LLM-based verifier for reward function > they train this model using verifier, and ask it to resolve issues on its own > [image]
The Decoder

DeepSeek says its new DeepSeekMath-V2 model got gold-medal level status on the International Mathematical Olympiad 2025 and Chinese Mathematical Olympiad 2024

where models prove formal mathematical theorems—GPT-5 scores 20%.  Gemini Deep Think IMO Gold hits 65.7%.  DeepSeek Math V2 (Heavy) scores 61.9%.  That's second place—but Gemini is...

2025-10-21
DeepSeek-OCR is out! 🔥 my take ⤵️ > pretty insane it can parse and re-render charts in HTML > it uses CLIP and SAM features concatenated, so better grounding > very efficient vision-tokens-to-performance ratio > covers 100 languages [image]
The Decoder

DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute

the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8....

2025-09-24
my vibe tests with Qwen3-Omni family of models > document performance with Instruct is very good 🎯 > video understanding is nice ⏯️ > Thinking performs better in English > I suggest using Captioner if you really want audio output; the other two hallucinate a bit [video]
Simon Willison's Weblog

Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters

Julian Nabil / Forbes Middle East: Alibaba Introduces Qwen3-Max AI Model With Over 1T Parameters. Markus Kasanmascheff / WinBuzzer: Alibaba...

Bloomberg

Alibaba's Hong Kong-listed shares hit a nearly four-year high after CEO Eddie Wu announced plans to increase AI spending beyond the $53B target over three years

Alibaba Group Holding Ltd.'s shares surged to their highest in nearly four years after revealing plans to ramp up AI spending past …

2025-03-25
Qwen just dropped Qwen2.5-VL-32B-Instruct ☄️ > 32B vision LM > Qwen-2.5-VL-72B in math, reasoning, RAG > better aligned for human preferences > same architecture, available in @huggingface transformers find model and demo on the next one ⤵️ [image]
Simon Willison's Weblog

Alibaba releases Qwen2.5-VL-32B, a 32B open model under Apache 2.0, claiming better math reasoning and alignment with human preferences than earlier 2.5 models

Qwen2.5-VL-32B: Smarter and Lighter.  The second big open weight LLM release from China today - the first being DeepSeek v3-0324.

2025-01-27
people who are baffled by DeepSeek have been and still are sleeping on Qwen, InternLM, ByteDance and Tencent. here's a couple of fan-favorite models from them 🪭
Bloomberg

DeepSeek's iOS app tops the App Store's Top Free Apps chart in the US, beating ChatGPT, stirring doubts in Silicon Valley about the strength of the US' AI lead

- App's lower-cost model upends premise for AI spending boom  — Stocks of chip gear makers ASML and Advantest plunge

2024-12-26
QwQ can see 🔥 @Alibaba_Qwen released QvQ, a vision LM with reasoning 😱 it outperforms proprietary VLMs on several benchmarks, comes with open weights and a demo! in the next one ⬇️ [image]
Qwen

Alibaba releases QvQ-72B-Preview, an experimental research model focused on “enhancing visual reasoning capabilities”, built on Qwen2-VL-72B

QVQ-72B-Preview is an experimental research model developed by the Qwen team … QwenLM on GitHub : Qwen2-VL  —  Introduction After a year's relentless efforts, today we are thrilled...

2024-04-11
8x22B but checkpoints are also here if you feel like checking them out 👀 https://huggingface.co/...
VentureBeat

Mistral AI launches Mixtral 8x22B, its latest sparse mixture-of-experts model, after releasing Mixtral 8x7B in December 2023

As Google unleashed a barrage of artificial intelligence announcements at its Cloud Next conference, Mistral AI decided to jump into action with the launch …

2023-03-23
you cannot just blame an open-source library for a GDPR breach. open source doesn't work like that 🙂 https://twitter.com/...
Reuters

Sam Altman says OpenAI has fixed a “significant issue” in ChatGPT after a bug in an open-source library let some users see titles of other users' chat history

ChatGPT-owner OpenAI said on Wednesday it had fixed a bug that caused a “significant issue” of a small set of users …

2023-03-16
I mean not explaining one thing you don't do and expecting others to understand why you don't do it is a bit (????) also, why we keep using the phrase AGI so conveniently is a bit weird as well https://twitter.com/...
The Verge

Experts criticize OpenAI for not disclosing GPT-4's training data or methods used; OpenAI co-founder says its past approach to openly sharing research was wrong

The system's capabilities are still being assessed, but as researchers and experts pore over its accompanying materials …

2021-05-19
Google has announced MUM (https://blog.google/...) “1000 times more powerful than BERT and can generate as well” First thing that comes to mind: how is it different from T5 then? (an encoder-decoder model) Answer: it is trained on downstream tasks in 75 non-English languages!
The Verge

Google announces LaMDA, a language model for dialogue applications that it says represents a “breakthrough” for having natural conversations with AI