VOICE ARCHIVE

Elie

@eliebakouch
24 posts
2026-02-14
ok this is very interesting, this is not the same perf than gpt5.3, and might not be the same arch as well? > Codex-Spark marks the first milestone in our partnership with Cerebras. Codex-Spark is optimized to feel near-instant when served on ultra-low latency hardware (from [image]
ZDNET

OpenAI debuts a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex that it claims generates code 15 times faster, for ChatGPT Pro users

ZDNET's key takeaways  — OpenAI targets “conversational” coding, not slow batch-style agents.  — Big latency wins: 80% faster roundtrip, 50% faster time-to-first-token.

2026-02-13
wtf, minimax M2.5 benchmark are insane and it's probably the same base model so only 10B active parameters??? [image]
MiniMax

MiniMax releases M2.5, claiming the model delivers on the “intelligence too cheap to meter” promise, priced at $0.30/1M input tokens and $1.20/1M output tokens

Today we're introducing our latest model, MiniMax-M2.5.  —  Extensively trained with reinforcement learning …

2026-02-12
GLM-5 is out, amazing release with very very good benchmark scores even on tasks like @andonlabs vending bench 2 i think one of the most crazy parts of this is that the RL framework that they use is open (based on megatron for training, @sgl_project for inference), it's somewhat [image]
Z.ai

Z.ai launches GLM-5, saying its flagship open-weight model has “best-in-class performance among all open-source models” in reasoning, coding, and agentic tasks

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks.  Scaling is still one of the most important ways …

GLM-5 is out, amazing release with very very good benchmark scores even on tasks like @andonlabs vending bench 2 i think one of the most crazy parts of this is that the RL framework that they use is open (based on megatron for training, @sgl_project for inference), it's somewhat [image]
Reuters

Z.ai says it will raise prices by at least 30% for new GLM coding plan subscribers to accommodate surging demand for its AI coding tools

2026-02-10
GLM 5 is 2x the total parameter of GLM 4.5 + deepseek sparse attention for efficient long context this is going to be a crazy model [image]
Nikkei Asia

Alibaba and Tencent are releasing new models and spending millions on “red envelope” freebies to woo users ahead of the Lunar New Year

HONG KONG — China's biggest AI companies are releasing new models and handing out “red envelope” …

GLM 5 is 2x the total parameter of GLM 4.5 + deepseek sparse attention for efficient long context this is going to be a crazy model [image]
The Information

Source: Chinese AI startup Zhipu anonymously released its new AI model GLM-5 on OpenRouter under the name Pony Alpha; Zhipu plans to debut GLM-5 later this week

Zhipu, one of China's prominent AI developers, has anonymously released its new large language model under a different name on OpenRouter …

2026-01-27
Kimi K2.5 is NOT just a small iteration on top of k2, it's now have fully multimodal understanding INCLUDING video! [image]
Kimi

Moonshot says Kimi K2.5 builds on K2 with “pretraining over ~15T mixed visual and text tokens” and “can self-direct an agent swarm with up to 100 sub-agents”

Today, we are introducing Kimi K2.5, the most powerful open-source model to date.

very nice release by the kimi team, benchmarks are on par with opus 4.5, gpt 5.2 xhigh, gemini 3.0 pro there is also some nice details on the parallel RL part in the tech blog explaining how they build K2.5 agent swarm [image]
Bloomberg

Chinese startup Moonshot releases Kimi K2.5, saying the model can process text, images, and videos simultaneously and beats its open-source peers in some tests

Alibaba Group Holding Ltd.-backed Moonshot AI released an upgrade of its flagship model, heating up a domestic arms race ahead …

2025-12-23
the gap in design taste and vibe coding ability between GLM 4.6 and GLM 4.7 is impressive (see the blog for more examples), seems to be the main focus of this release expecting minimax M2.1 to focus on the same thing so it's going to be interesting! [image]
Z.ai

Chinese AI startup Z.ai releases GLM-4.7, an open-weight model that Z.ai says delivers significant improvements in coding performance compared to GLM-4.6


2025-12-01
very interesting table from deepseek v3.2 that compares the output token count on different benchmarks, dsv3.2 speciale version thinks much more than any other model, BUT since they are using sparse attention the inference cost will still be ok? [image]
Bloomberg

DeepSeek releases DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which it calls “reasoning-first models built for agents”, after releasing V3.2-Exp in September

China's DeepSeek unveiled two new versions of an experimental artificial-intelligence model it released weeks ago …

2025-11-30
deepseek math v2 is the first open source model to reach gold on IMO? and we get a tech report, what an amazing release [image]
The Decoder

DeepSeek says its new DeepSeekMath-V2 model got gold-medal level status on the International Mathematical Olympiad 2025 and Chinese Mathematical Olympiad 2024

where models prove formal mathematical theorems—GPT-5 scores 20%.  Gemini Deep Think IMO Gold hits 65.7%.  DeepSeek Math V2 (Heavy) scores 61.9%.  That's second place—but Gemini is...

2025-11-07
we're very close to 50% on HLE, and bonus point: it's with an open model :) [image]
CNBC

Chinese startup Moonshot releases Kimi K2 Thinking, an open-weight model it claims beats GPT-5 in agentic capabilities; source: the model cost $4.6M to train

Chinese startup Moonshot on Thursday released its latest generative artificial intelligence model which claims to beat OpenAI's ChatGPT in …

> “200-300 sequential tool calls” this is really the impressive part of this release imo, can't wait to see how they did it [image]
CNBC

Chinese startup Moonshot releases Kimi K2 Thinking, an open-weight model it claims beats GPT-5 in agentic capabilities; source: the model cost $4.6M to train

Chinese startup Moonshot on Thursday released its latest generative artificial intelligence model which claims to beat OpenAI's ChatGPT in …