Bottom Line: Chinese Models’ “Open-Source Symbiosis” Is Rewriting the Rules
In late April 2026, the AI community noticed something striking: Kimi K2.6’s underlying architecture inherits DeepSeek v3’s design, while DeepSeek V4’s training adopts the Muon optimizer from the Kimi team. This isn’t simple “borrowing”; it’s a technology loop enabled by open-source licensing. Each side keeps building on the other’s innovations, ultimately reaching performance comparable to frontier closed models at roughly 1/8 the training cost.
This “cross-innovation” model is becoming a unique competitive advantage for Chinese open-source AI.
Technical Breakdown of Cross-Innovation
Kimi K2.6 → Inheriting DeepSeek v3 Architecture
Kimi K2.6 (Moonshot AI) adopted DeepSeek v3’s MoE (Mixture of Experts) + MLA (Multi-head Latent Attention) design at the architecture level. Key features:
| Dimension | DeepSeek v3 Architecture | Kimi K2.6’s Evolution |
|---|---|---|
| Parameters | 671B total, 37B active | Extended to 1.6T total |
| Context Window | 128K | 256K public; up to 1M depending on hardware |
| Inference Efficiency | MLA reduces KV Cache | Combined with proprietary scheduling |
| Agent Capability | Basic tool calling | Leads on HLE, DeepSearchQA |
Kimi K2.6 built on this foundation to strengthen tool-augmented agent capabilities, excelling at HLE (Humanity’s Last Exam), DeepSearchQA, and real-world software-engineering tasks, which earned it the community label of “elite agentic generalist.”
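To make the MLA side of this concrete, below is a minimal PyTorch sketch of latent-KV attention. It is illustrative only: it follows the publicly described idea of caching one small latent vector per token instead of full per-head keys and values, omits details such as decoupled rotary embeddings and causal masking, and every dimension (`d_model`, `n_heads`, `d_latent`) is an assumed toy value rather than a real model configuration.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Sketch of MLA-style attention: the KV cache stores one small latent
    vector per token instead of full per-head keys and values."""

    def __init__(self, d_model: int = 1024, n_heads: int = 8, d_latent: int = 128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress: this is all we cache
        self.k_up = nn.Linear(d_latent, d_model)     # decompress keys at attention time
        self.v_up = nn.Linear(d_latent, d_model)     # decompress values at attention time
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent)
        if latent_cache is not None:                 # append to earlier tokens' latents
            latent = torch.cat([latent_cache, latent], dim=1)
        S = latent.size(1)                           # total sequence length so far
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), latent            # latent is the new, small KV cache
```

The saving is in the cache shape: it grows as S × d_latent per layer instead of S × 2 × n_heads × d_head, which is the “MLA reduces KV Cache” effect noted in the table above.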
DeepSeek V4 → Adopting Kimi’s Muon Optimizer
DeepSeek V4 introduced the Muon optimizer into its training, an optimizer the Kimi/Moonshot AI team scaled up for large-scale LLM training. Muon’s core advantages (a minimal update sketch follows this list):
- More efficient gradient updates: converges more stably than traditional AdamW under MoE architectures
- Lower VRAM usage: optimizer state is momentum only (versus AdamW’s two moment buffers), leaving room for larger batch sizes
- Chinese chip compatibility: adapts to Huawei Ascend NPUs better than traditional optimizers do
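For reference, here is a minimal sketch of the core Muon update as publicly described: momentum accumulation followed by approximate orthogonalization of the update matrix via a Newton-Schulz iteration. This is not DeepSeek V4’s actual training code; the coefficients are the commonly cited defaults, the learning rate and beta are assumed values, and distributed and MoE-specific details are omitted.

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix G (push its singular values
    toward 1) with a quintic Newton-Schulz iteration instead of an exact SVD."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the public Muon recipe
    X = G / (G.norm() + 1e-7)           # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:                      # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon step for a 2-D weight matrix. Muon is typically
    applied only to hidden weight matrices; embeddings and norms stay on AdamW."""
    momentum_buf.mul_(beta).add_(grad)               # heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum_buf)
    param.add_(update, alpha=-lr)
```

Note that the entire per-parameter state is `momentum_buf`; this single buffer, versus AdamW’s two moment buffers, is where the VRAM saving in the list above comes from.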
DeepSeek V4 then built new attention mechanisms on this foundation, improving training and inference efficiency at once, and ultimately demonstrated its “brute reasoning” strength in theorem proving, competition math, and algorithmic coding tasks.
Performance: Open Source vs Closed Source
Based on comprehensive community evaluations (April 2026 data):
| Model | Composite Score | Parameters | Context | API Cost (relative to GPT-5.5) |
|---|---|---|---|---|
| Kimi K2.6 | 73 | 1.6T | 256K-1M | ~1/8 |
| DeepSeek V4 Flash | 73 | Undisclosed | 1M | ~1/8 |
| DeepSeek V4 Pro | 73 | Undisclosed | 1M | ~1/10 |
| Gemma 4 31B | 72 | 31B | 128K | ~1/5 |
| Qwen3.6 27B | 71 | 27B | 128K | ~1/6 |
| MiniMax M2.7 | 61 | Undisclosed | 128K | ~1/7 |
| GLM 5.1 | 60 | Undisclosed | 128K | ~1/8 |
Key observation: the top three, Kimi K2.6 and DeepSeek V4 Flash/Pro, all score 73 and tie for first. Given API costs of just 1/8 to 1/10 of GPT-5.5’s, their cost-performance advantage is substantial.
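As a back-of-the-envelope illustration of what those ratios mean in dollars, here is a tiny script. Every absolute number in it is hypothetical; only the 1/8 and 1/10 ratios come from the table above.

```python
# Hypothetical cost comparison. ASSUMPTIONS: the GPT-5.5 price and the
# monthly token volume are invented placeholders, not published figures.
gpt_5_5_price = 10.00   # assumed $ per 1M tokens (illustrative only)
monthly_tokens_m = 500  # assumed workload: 500M tokens per month

ratios = {"Kimi K2.6": 1 / 8, "DeepSeek V4 Flash": 1 / 8, "DeepSeek V4 Pro": 1 / 10}

baseline = gpt_5_5_price * monthly_tokens_m
print(f"GPT-5.5 baseline: ${baseline:,.0f}/month")
for model, ratio in ratios.items():
    cost = baseline * ratio
    print(f"{model}: ${cost:,.0f}/month (saves ${baseline - cost:,.0f})")
```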
Landscape Judgment: Why Only Chinese Models Can Play This Game
1. A Technology Loop Enabled by Open-Source Licenses
Leading Chinese model companies generally adopt permissive open-source licenses (Apache 2.0 or similar), which lets architectures and optimizers move between companies. In contrast, the architectural details of US closed models are trade secrets, preventing any similar “technology spillover.”
2. Compute Constraints Drive Architecture Innovation
As Andrej Karpathy noted: “Creativity loves constraints.” Chinese model companies have limited access to high-end NVIDIA GPUs, with some turning to Huawei Ascend chips. This compute constraint forces teams to make deep architectural optimizations rather than simply throwing more compute at problems.
3. Talent Flow as Accelerator
High talent mobility among Chinese AI companies naturally creates a “knowledge sharing network” as technical ideas and practical experience spread.
Actionable Advice
For Developers
- API selection: for Agent/tool-call scenarios, test Kimi K2.6 first; for reasoning/math/coding, test DeepSeek V4 Pro first (see the routing sketch after this list)
- Cost-sensitive use cases: DeepSeek V4 Flash’s 1M context at roughly 1/8 of GPT-5.5’s cost makes it one of the strongest options for long-document processing
- Chinese chip adaptation: watch DeepSeek V4’s Ascend optimization progress; Ascend-native versions already exist
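A minimal sketch of that selection advice in code: routing requests to OpenAI-compatible endpoints by task type. The model identifiers below are assumptions for illustration, not confirmed product names; substitute whatever the providers actually expose.

```python
# Hypothetical task-based routing across OpenAI-compatible endpoints.
# ASSUMPTION: the model names are illustrative placeholders.
from openai import OpenAI

ROUTES = {
    "agent":     {"base_url": "https://api.moonshot.cn/v1", "model": "kimi-k2.6"},
    "reasoning": {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-v4-pro"},
    "long_doc":  {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-v4-flash"},
}

def complete(task_type: str, prompt: str, api_key: str) -> str:
    """Send a prompt to the endpoint chosen for this task type."""
    route = ROUTES[task_type]
    client = OpenAI(api_key=api_key, base_url=route["base_url"])
    resp = client.chat.completions.create(
        model=route["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example: long-document work goes to the cheap 1M-context route.
# print(complete("long_doc", "Summarize: ...", api_key="sk-..."))
```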
For Investors
- The “symbiotic evolution” pattern among Chinese open-source models is producing collective competitiveness: a single model falling behind no longer means the entire ecosystem falls behind
- Moonshot AI (Kimi) and DeepSeek’s valuation logic should shift from “single company” to “ecosystem contributor”