VOICE ARCHIVE

Merve

@mervenoyann
14 posts
2026-02-12
GLM-5 is out on @huggingface 🔥 > A40B/744B, trained on more tokens (28.5T) > outperforms/on par with closed sota > allows commercial use (MIT licensed) 💗 use with vLLM/SGLang locally or through HF Inference Providers thanks to @novita_labs and @Zai_org 📦 [image]
Z.ai

Z.ai launches GLM-5, saying its flagship open-weight model has “best-in-class performance among all open-source models” in reasoning, coding, and agentic tasks

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks.  Scaling is still one of the most important ways …

Reuters

Z.ai says it will raise prices by at least 30% for new GLM coding plan subscribers to accommodate surging demand for its AI coding tools

2025-11-30
DeepSeek released DeepSeekMathv2 (based on DeepSeek-V3.2-Exp-Base) outperforming Gemini DeepThink on IMO ProofBench and CNML from paper: > they train an LLM-based verifier for reward function > they train this model using verifier, and ask it to resolve issues on its own > [image]
The Decoder

DeepSeek says its new DeepSeekMath-V2 model got gold-medal level status on the International Mathematical Olympiad 2025 and Chinese Mathematical Olympiad 2024

where models prove formal mathematical theorems—GPT-5 scores 20%.  Gemini Deep Think IMO Gold hits 65.7%.  DeepSeek Math V2 (Heavy) scores 61.9%.  That's second place—but Gemini is...

2025-10-21
DeepSeek-OCR is out! 🔥 my take ⤵️ > pretty insane it can parse and re-render charts in HTML > it uses CLIP and SAM features concatenated, so better grounding > very efficient vision-tokens-to-performance ratio > covers 100 languages [image]
The Decoder

DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute

the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8....

2025-09-24
my vibe tests with Qwen3-Omni family of models > document performance with Instruct is very good 🎯 > video understanding is nice ⏯️ > Thinking performs better in English > I suggest using Captioner if you really want audio output; the other two hallucinate a bit [video]
Simon Willison's Weblog

Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters

Julian Nabil / Forbes Middle East: Alibaba Introduces Qwen3-Max AI Model With Over 1T Parameters. Markus Kasanmascheff / WinBuzzer: Alibaba...

Bloomberg

Alibaba's Hong Kong-listed shares hit a nearly four-year high after CEO Eddie Wu announced plans to increase AI spending beyond the $53B target over three years

Alibaba Group Holding Ltd.'s shares surged to their highest in nearly four years after revealing plans to ramp up AI spending past …

2025-03-25
Qwen just dropped Qwen2.5-VL-32B-Instruct ☄️ > 32B vision LM > Qwen-2.5-VL-72B in math, reasoning, RAG > better aligned for human preferences > same architecture, available in @huggingface transformers find model and demo on the next one ⤵️ [image]
Simon Willison's Weblog

Alibaba releases Qwen2.5-VL-32B, a 32B open model under Apache 2.0, claiming better math reasoning and alignment with human preferences than earlier 2.5 models

Qwen2.5-VL-32B: Smarter and Lighter.  The second big open weight LLM release from China today - the first being DeepSeek v3-0324.

2025-01-27
people who are baffled by DeepSeek have been and still are sleeping on Qwen, InternLM, ByteDance and Tencent. here's a couple of fan-favorite models from them 🪭
Bloomberg

DeepSeek's iOS app tops the App Store's Top Free Apps chart in the US, beating ChatGPT, stirring doubts in Silicon Valley about the strength of the US' AI lead

- App's lower-cost model upends premise for AI spending boom  — Stocks of chip gear makers ASML and Advantest plunge

2024-12-26
QwQ can see 🔥 @Alibaba_Qwen released QvQ, a vision LM with reasoning 😱 it outperforms proprietary VLMs on several benchmarks, comes with open weights and a demo! in the next one ⬇️ [image]
Qwen

Alibaba releases QvQ-72B-Preview, an experimental research model focused on “enhancing visual reasoning capabilities”, built on Qwen2-VL-72B

QVQ-72B-Preview is an experimental research model developed by the Qwen team … QwenLM on GitHub : Qwen2-VL  —  Introduction After a year's relentless efforts, today we are thrilled...

2024-04-11
8x22B but checkpoints are also here if you feel like checking them out 👀 https://huggingface.co/...
VentureBeat

Mistral AI launches Mixtral 8x22B, its latest sparse mixture-of-experts model, after releasing Mixtral 8x7B in December 2023

As Google unleashed a barrage of artificial intelligence announcements at its Cloud Next conference, Mistral AI decided to jump into action with the launch …

2023-03-23
you cannot just blame an open-source library for a GDPR breach. open source doesn't work like that 🙂 https://twitter.com/...
Reuters

Sam Altman says OpenAI has fixed a “significant issue” in ChatGPT after a bug in an open-source library let some users see titles of other users' chat history

ChatGPT-owner OpenAI said on Wednesday it had fixed a bug that caused a “significant issue” of a small set of users …

2023-03-16
I mean not explaining one thing you don't do and expecting others to understand why you don't do it is a bit (????) also, why we keep using the phrase AGI so conveniently is a bit weird as well https://twitter.com/...
The Verge

Experts criticize OpenAI for not disclosing GPT-4's training data or methods used; OpenAI co-founder says its past approach to openly sharing research was wrong

The system's capabilities are still being assessed, but as researchers and experts pore over its accompanying materials …

2021-05-19
Google has announced MUM (https://blog.google/...) “1000 times more powerful than BERT and can generate as well” First thing that comes to mind: how is it different from T5 then? (an encoder-decoder model) Answer: it is trained on downstream tasks in 75 non-English languages!
The Verge

Google announces LaMDA, a language model for dialogue applications that it says represents a “breakthrough” for having natural conversations with AI