Kimi Uses DeepSeek Architecture, DeepSeek Uses Kimi Optimizer: The "Open-Source Symbiosis" of Chinese AI Models

Bottom Line: Chinese Models’ “Open-Source Symbiosis” Is Rewriting the Rules

In late April 2026, the AI community noticed a striking pattern: Kimi K2.6's underlying architecture inherits DeepSeek v3's design, while DeepSeek V4's training optimizer comes from the Kimi team's Muon optimizer. This is not simple borrowing; it is a technology cycle built on open-source licenses, in which each side keeps building on the other's innovations, ultimately achieving performance comparable to frontier closed models at roughly 1/8 of the training cost.

This “cross-innovation” model is becoming a unique competitive advantage for Chinese open-source AI.

Technical Breakdown of Cross-Innovation

Kimi K2.6 → Inheriting DeepSeek v3 Architecture

Kimi K2.6 (Moonshot AI) adopted DeepSeek v3's MoE (Mixture of Experts) + MLA (Multi-head Latent Attention) design at the architecture level. Key features:

| Dimension | DeepSeek v3 Architecture | Kimi K2.6's Evolution |
| --- | --- | --- |
| Parameters | 671B total, 37B active | Extended to 1.6T total |
| Context window | 128K | 256K public, up to 1M (hardware-limited) |
| Inference efficiency | MLA reduces KV cache | Combined with proprietary scheduling |
| Agent capability | Basic tool calling | Leading in HLE, DeepSearchQA |

Kimi K2.6 built on this to strengthen tool-augmented Agent capabilities, excelling in HLE (Humanity’s Last Exam), DeepSearchQA, and real-world software engineering tasks — earning the community label of “elite agentic generalist.”
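The MoE half of this design can be sketched in a few lines of numpy: a gate scores every expert for each token, but only the top-k experts actually run, which is why only a small fraction of total parameters (37B of 671B in DeepSeek v3) are active per token. This is a minimal illustration under stated assumptions, not the production routing (which adds load balancing, shared experts, and fused kernels); all array names here are hypothetical.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Minimal top-k MoE routing sketch.

    x:              (d_in,) token activation
    expert_weights: (n_experts, d_in, d_out) one weight matrix per expert
    gate_weights:   (d_in, n_experts) router projection
    Only the top_k experts by gate score are evaluated.
    """
    logits = x @ gate_weights                  # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]          # indices of the selected experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over the selected experts only
    # Weighted sum of the selected experts' outputs; unselected experts never run.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top))
```

With top_k = 2 of 8 experts, only a quarter of the expert parameters touch each token; scaling the expert count while keeping top_k small is the mechanism behind "671B total, 37B active."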

DeepSeek V4 → Adopting Kimi’s Muon Optimizer

DeepSeek V4 introduced the Muon optimizer in its training. Muon originated in the open-source research community and was scaled to trillion-parameter LLM training by the Kimi/Moonshot AI team. Its core advantages:

  • More efficient gradient updates: Converges more stably under MoE architecture compared to traditional AdamW
  • Lower VRAM usage: Smaller optimizer state allows larger batch sizes
  • Chinese chip compatibility: Better adaptation on Huawei Ascend NPUs than traditional optimizers
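The update rule behind these advantages can be sketched in numpy. This follows the publicly described Muon recipe (momentum accumulation, then Newton-Schulz orthogonalization of the update); the quintic coefficients and shape-based scaling come from the open-source reference implementation, while the learning rate and momentum values here are illustrative, not DeepSeek V4's actual settings.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a matrix via a quintic Newton-Schulz iteration,
    as used in Muon. Pushes the singular values of g toward 1 without an SVD."""
    a, b, c = 3.4445, -4.7750, 2.0315          # coefficients from the public reference
    x = g / (np.linalg.norm(g) + 1e-7)         # normalize by Frobenius norm
    transposed = x.shape[0] > x.shape[1]
    if transposed:                             # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    if transposed:
        x = x.T
    return x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a 2-D weight matrix: momentum, orthogonalize, step.
    Note the optimizer state is a single momentum buffer (vs. two for AdamW)."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Scale by sqrt(rows/cols) so tall matrices get proportionate step sizes.
    w = w - lr * update * max(1.0, w.shape[0] / w.shape[1]) ** 0.5
    return w, momentum
```

The single momentum buffer is where the VRAM saving in the list above comes from: AdamW keeps two per-parameter state tensors (first and second moments), Muon keeps one.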

On this foundation, DeepSeek V4 also introduced new attention architectures that improve both training and inference efficiency, ultimately demonstrating brute-force reasoning strength in theorem proving, competition math, and algorithmic coding tasks.

Performance: Open Source vs Closed Source

Based on community comprehensive evaluation (April 2026 data):

| Model | Score | Parameters | Context | API Cost (vs GPT-5.5) |
| --- | --- | --- | --- | --- |
| Kimi K2.6 | 73 | 1.6T | 256K-1M | ~1/8 |
| DeepSeek V4 Flash | 73 | N/A | 1M | ~1/8 |
| DeepSeek V4 Pro | 73 | N/A | 1M | ~1/10 |
| Gemma 4 31B | 72 | 31B | 128K | ~1/5 |
| Qwen3.6 27B | 71 | 27B | 128K | ~1/6 |
| MiniMax M2.7 | 61 | N/A | 128K | ~1/7 |
| GLM 5.1 | 60 | N/A | 128K | ~1/8 |

Key observation: The top three — Kimi K2.6, DeepSeek V4 Flash/Pro — all score 73, tied for first. Given their API costs are just 1/8 to 1/10 of GPT-5.5, the cost-performance advantage is extremely significant.

Landscape Judgment: Why Only Chinese Models Can Play This Game

1. A Technology Cycle Built on Open-Source Licenses

Leading Chinese model companies generally release under permissive open-source licenses (Apache 2.0 or similar), enabling architecture and optimizer sharing across companies. In contrast, US frontier labs treat their closed models' architectural details as trade secrets, which prevents similar technology spillover.

2. Compute Constraints Drive Architecture Innovation

As Andrej Karpathy noted: “Creativity loves constraints.” Chinese model companies have limited access to high-end NVIDIA GPUs, with some turning to Huawei Ascend chips. This compute constraint forces teams to make deep architectural optimizations rather than simply throwing more compute at problems.

3. Talent Flow as Accelerator

High talent mobility among Chinese AI companies naturally creates a “knowledge sharing network” as technical ideas and practical experience spread.

Actionable Advice

For Developers

  • API selection: Test Kimi K2.6 first for agent/tool-call scenarios; test DeepSeek V4 Pro first for reasoning, math, and coding
  • Cost-sensitive use cases: DeepSeek V4 Flash’s 1M context + 1/10 cost is one of the best solutions for long document processing
  • Chinese chip adaptation: Watch DeepSeek V4’s Ascend optimization progress — Ascend-Native versions already exist
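The routing advice above can be captured in a small helper. Both vendors expose OpenAI-compatible chat endpoints, and the base URLs below match their documented endpoints; the model IDs ("kimi-k2.6", "deepseek-v4-pro", "deepseek-v4-flash") are hypothetical placeholders for these releases, so check each provider's model list before relying on them.

```python
def pick_model(task: str) -> tuple[str, str]:
    """Map a task category to a (base_url, model_id) pair per the advice above.
    Base URLs are the vendors' OpenAI-compatible endpoints; model IDs are
    illustrative placeholders, not confirmed API names."""
    if task in {"agent", "tool_call", "deep_search"}:
        return "https://api.moonshot.cn/v1", "kimi-k2.6"
    if task in {"reasoning", "math", "coding"}:
        return "https://api.deepseek.com", "deepseek-v4-pro"
    # Default: cost-sensitive, long-document workloads (1M context, lowest cost)
    return "https://api.deepseek.com", "deepseek-v4-flash"
```

Any OpenAI-compatible client can then be pointed at the returned base_url with the matching provider API key.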

For Investors

  • The “symbiotic evolution” model of Chinese open-source models is forming collective competitiveness — a single model falling behind no longer means the entire ecosystem falls behind
  • Moonshot AI (Kimi) and DeepSeek’s valuation logic should shift from “single company” to “ecosystem contributor”
