ChaoBro

17 Days, 4 Models: China Open Source AI Arms Race and the Performance Landscape Reshuffle

Intelligence Summary

From late April to early May 2026, China’s AI industry released four open-source/open-weight flagship models within just 17 days:

  • GLM-5.1 (Zhipu AI): 754B MoE, MIT open-source license
  • Kimi K2.6 (Moonshot AI): 1T MoE, open weights
  • DeepSeek V4 (DeepSeek): Trillion-scale MoE, open source
  • MiMo V2.5 Pro (Xiaomi): Multimodal open-source model

Community testing yielded a concise but powerful conclusion: Kimi K2.6 is the fastest, GLM-5.1 is the most “fancy,” DeepSeek V4 is the most comprehensive, and Xiaomi MiMo is the slowest.

But behind this simple evaluation lies a profound transformation in Chinese open-source AI—from a “catch-up narrative” to “differentiated competition.”

Capability Profiles of the Four Models

GLM-5.1: The Most Versatile All-Rounder

GLM-5.1’s keyword is functional completeness. Its 754B-parameter MoE architecture yields a capability profile with no weak dimensions:

  • Coding: Ranked first among domestic models in the code arena, surpassing both Kimi K2.6 and DeepSeek V4 Pro
  • Agent Tool Calling: Optimized specifically for long-duration autonomous execution and complex engineering tasks
  • Trained on Huawei Ascend: Fully trained on domestic chips, with zero NVIDIA dependency

What “most fancy” really means is this: GLM-5.1 comes closest to closed-source flagship models in functional breadth. It is not the champion of any single category, but it is the closest thing to an all-rounder among open-source options.

Kimi K2.6: The Speed King

Kimi K2.6’s killer advantage is inference speed. Its 1T-parameter MoE architecture activates only about 32 billion parameters per token, so inference cost tracks the roughly 3% of weights that are active rather than the full trillion. Beyond raw speed, it also offers:

  • Free to use: Available for free inference on platforms like Fireworks AI
  • Strong in both coding and math: LiveCodeBench v6 score of 53.7%, surpassing Claude Sonnet 4
  • 256K context window: Supports image and video input

The community consensus is clear: if you need rapid iteration and low-cost prototyping, Kimi K2.6 is currently the best choice. Its “speed” isn’t just inference speed—it’s the speed from idea to code.
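The sparse-activation arithmetic above is the core of any MoE design: a router picks a small top-k subset of experts per token, so only a fraction of the total weights do work. A minimal toy sketch (illustrative dimensions only, not Kimi K2.6’s real configuration or routing code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration -- not the real model's config.
d_model = 16       # hidden size
n_experts = 32     # total experts
top_k = 1          # experts activated per token
d_ff = 64          # expert FFN width

# Each expert is a tiny two-layer feed-forward network.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ gate
    chosen = np.argsort(logits)[-top_k:]          # top-k expert indices
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                      # softmax over chosen experts
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)   # ReLU feed-forward
    return out, chosen

token = rng.standard_normal(d_model)
out, chosen = moe_forward(token)

# Only top_k of n_experts worth of FFN parameters are touched per token:
expert_params = n_experts * (d_model * d_ff + d_ff * d_model)
active_params = top_k * (d_model * d_ff + d_ff * d_model)
print(f"active fraction: {active_params / expert_params:.3f}")  # → 0.031
```

With top_k = 1 of 32 experts, about 3% of expert parameters are active per token, which is the same order as Kimi K2.6’s reported ~32B active out of 1T total.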

DeepSeek V4: Comprehensive but “Last Place”?

DeepSeek V4 ranked fourth among domestic models in the Arena coding leaderboard, a result that sparked some discussion. But “last place” needs to be understood correctly:

  • The comparison baseline is domestic flagships: Fourth place is still world-class
  • SWE-bench 80.6%: Only 0.2 percentage points behind Claude Opus 4.6 (80.8%)
  • Cost advantage: API pricing is significantly lower than closed-source models with equivalent performance

DeepSeek V4’s “comprehensiveness” lies in maintaining first-class standards across coding, reasoning, math, and multimodal dimensions, with no obvious weaknesses. But at this level of competition, “no weaknesses” does not mean “standout strengths.”

MiMo V2.5 Pro: Slow but Surprising

Xiaomi MiMo V2.5 Pro is the slowest among the four in inference speed, but it has a unique positioning: runnable on consumer-grade GPUs.

  • Native multimodal: Designed from the ground up as a multimodal model, not pieced together afterward
  • Xiaomi ecosystem integration: Deep integration with Xiaomi phones, cars, and IoT devices
  • GDPVal evaluation leader: Outstanding performance in specific evaluation dimensions

“Slowness” may not be a problem for Xiaomi—the company’s business model means it prioritizes end-user experience over extreme inference speed.

Landscape Reshuffle: From “Who Is Better” to “Who Fits Best”

The simultaneous existence of these four models marks an important paradigm shift:

Before: The goal of open-source models was to “catch up to GPT/Claude,” with evaluation based on a single performance leaderboard.

Now: All four domestic open-source models are at or near closed-source flagship levels on the Arena, and the evaluation standard has shifted to scenario fit:

  • Need fastest prototyping → Kimi K2.6
  • Need most comprehensive capabilities → GLM-5.1
  • Need lowest-cost production deployment → DeepSeek V4
  • Need end-device integration → MiMo V2.5 Pro

This isn’t a “who replaces whom” story—it’s the formation of a “division of labor” ecosystem.

Signal Interpretation

The release density of 4 flagship models in 17 days is itself a signal. This isn’t coincidental—it reflects:

  • Technology convergence: The maturation of core technologies like MoE architecture, GRPO optimization, and Thinking Tokens has significantly shortened R&D cycles across all vendors
  • Accelerated competition: When any one company releases a new model, others must follow within weeks or risk being perceived as “falling behind” by the market
  • Cost collapse: The continuous decline in training and inference costs is rapidly lowering the barrier to releasing flagship models

Meanwhile, the fact that GLM-5.1 was trained on Huawei Ascend further breaks the narrative that “only NVIDIA chips can train frontier models.” The diversification of the compute supply chain is moving from theory to practice.

Action Recommendations

  • Agent Framework Developers: Recommend establishing a “multi-model routing” strategy—use Kimi K2.6 as the default model for fast responses, GLM-5.1 as backup for complex tasks, and DeepSeek V4 as the cost optimization option for batch processing.
  • Enterprise Technology Selection: Don’t just look at single scores on leaderboards. Choose models based on your actual scenario (latency sensitivity, concurrency requirements, data privacy requirements).
  • Individual Developers: Kimi K2.6’s free inference service is currently the lowest-barrier way to experience a flagship model. Start with it.
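The multi-model routing strategy in the first recommendation can be sketched as a simple policy function. The model identifiers and thresholds below are hypothetical placeholders, not the vendors’ real API strings; a real router would also call each provider’s endpoint, which is omitted here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRoute:
    model: str    # placeholder identifier, not a real API model string
    reason: str

# Mirrors the recommendation: fast default, complex-task backup, cheap batch.
ROUTES = {
    "fast":    ModelRoute("kimi-k2.6",   "lowest-latency default"),
    "complex": ModelRoute("glm-5.1",     "backup for long agent tasks"),
    "batch":   ModelRoute("deepseek-v4", "cost-optimized bulk processing"),
}

def pick_route(est_tokens: int, latency_sensitive: bool) -> ModelRoute:
    """Crude routing policy; the 50k-token threshold is an assumption."""
    if latency_sensitive:
        return ROUTES["fast"]
    if est_tokens > 50_000:      # long-horizon agent / engineering work
        return ROUTES["complex"]
    return ROUTES["batch"]       # everything else: optimize for cost

print(pick_route(est_tokens=120_000, latency_sensitive=False).model)  # → glm-5.1
```

The point of the sketch is the shape of the decision, not the thresholds: route on observable request properties (latency sensitivity, estimated scope) rather than on a single leaderboard score.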

Cross-Verification

This assessment is corroborated by multiple independent signals: community testing (a comparison post with 497 likes and 185 bookmarks), ranking changes on the Arena Leaderboard, and consistent performance across SWE-bench and LiveCodeBench. Meanwhile, the strong sales of Zhipu’s Coding Plan and Kimi’s intensive fundraising (over $3.9 billion in less than half a year) confirm these models’ market competitiveness from a commercial standpoint.

When four domestic open-source models simultaneously reach frontier levels, the nature of competition has shifted from “can we catch up” to “how do we differentiate.” This is a mark of maturity for China’s AI industry.