Kimi Uses DeepSeek Architecture, DeepSeek Uses Kimi Optimizer: The "Open-Source Symbiosis" of Chinese AI Models

Bottom Line: Chinese Models’ “Open-Source Symbiosis” Is Rewriting the Rules

In late April 2026, the AI community noticed a striking pattern: Kimi K2.6's underlying architecture inherits DeepSeek v3's design, while DeepSeek V4's training optimizer comes from the Kimi team's Muon optimizer. This is not simple borrowing; it is a technology cycle built on open-source licenses, in which each side keeps building on the other's innovations, ultimately achieving performance comparable to frontier closed models at roughly 1/8 of the training cost.

This “cross-innovation” model is becoming a unique competitive advantage for Chinese open-source AI.

Technical Breakdown of Cross-Innovation

Kimi K2.6 → Inheriting DeepSeek v3 Architecture

Kimi K2.6 (Moonshot AI) adopted DeepSeek v3's MoE (Mixture of Experts) + MLA (Multi-head Latent Attention) design at the architecture level. Key features:

| Dimension | DeepSeek v3 Architecture | Kimi K2.6's Evolution |
| --- | --- | --- |
| Parameters | 671B total, 37B active | Extended to 1.6T total |
| Context window | 128K | 256K public, up to 1M (hardware-limited) |
| Inference efficiency | MLA reduces KV cache | Combined with proprietary scheduling |
| Agent capability | Basic tool calling | Leading in HLE, DeepSearchQA |

Kimi K2.6 built on this to strengthen tool-augmented Agent capabilities, excelling in HLE (Humanity’s Last Exam), DeepSearchQA, and real-world software engineering tasks — earning the community label of “elite agentic generalist.”
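The MoE half of this design can be sketched in a few lines of numpy: a gate scores every expert for each token, but only the top-k experts actually run, which is why only a small fraction of total parameters (37B of 671B in DeepSeek v3) are active per token. This is a minimal illustration under stated assumptions, not the production routing (which adds load balancing, shared experts, and fused kernels); all array names here are hypothetical.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Minimal top-k MoE routing sketch.

    x:              (d_in,) token activation
    expert_weights: (n_experts, d_in, d_out) one weight matrix per expert
    gate_weights:   (d_in, n_experts) router projection
    Only the top_k experts by gate score are evaluated.
    """
    logits = x @ gate_weights                  # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]          # indices of the selected experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over the selected experts only
    # Weighted sum of the selected experts' outputs; unselected experts never run.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top))
```

With top_k = 2 of 8 experts, only a quarter of the expert parameters touch each token; scaling the expert count while keeping top_k small is the mechanism behind "671B total, 37B active."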

DeepSeek V4 → Adopting Kimi’s Muon Optimizer

DeepSeek V4 introduced the Muon optimizer in its training. Muon originated in the open-source research community and was scaled to trillion-parameter LLM training by the Kimi/Moonshot AI team. Its core advantages:

  • More efficient gradient updates: Converges more stably under MoE architecture compared to traditional AdamW
  • Lower VRAM usage: Smaller optimizer state allows larger batch sizes
  • Chinese chip compatibility: Better adaptation on Huawei Ascend NPUs than traditional optimizers
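The update rule behind these advantages can be sketched in numpy. This follows the publicly described Muon recipe (momentum accumulation, then Newton-Schulz orthogonalization of the update); the quintic coefficients and shape-based scaling come from the open-source reference implementation, while the learning rate and momentum values here are illustrative, not DeepSeek V4's actual settings.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a matrix via a quintic Newton-Schulz iteration,
    as used in Muon. Pushes the singular values of g toward 1 without an SVD."""
    a, b, c = 3.4445, -4.7750, 2.0315          # coefficients from the public reference
    x = g / (np.linalg.norm(g) + 1e-7)         # normalize by Frobenius norm
    transposed = x.shape[0] > x.shape[1]
    if transposed:                             # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    if transposed:
        x = x.T
    return x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a 2-D weight matrix: momentum, orthogonalize, step.
    Note the optimizer state is a single momentum buffer (vs. two for AdamW)."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Scale by sqrt(rows/cols) so tall matrices get proportionate step sizes.
    w = w - lr * update * max(1.0, w.shape[0] / w.shape[1]) ** 0.5
    return w, momentum
```

The single momentum buffer is where the VRAM saving in the list above comes from: AdamW keeps two per-parameter state tensors (first and second moments), Muon keeps one.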

On this foundation, DeepSeek V4 also introduced new attention architectures that improve both training and inference efficiency, ultimately demonstrating brute-force reasoning strength in theorem proving, competition math, and algorithmic coding tasks.

Performance: Open Source vs Closed Source

Based on community comprehensive evaluation (April 2026 data):

| Model | Score | Parameters | Context | API Cost (vs GPT-5.5) |
| --- | --- | --- | --- | --- |
| Kimi K2.6 | 73 | 1.6T | 256K-1M | ~1/8 |
| DeepSeek V4 Flash | 73 | N/A | 1M | ~1/8 |
| DeepSeek V4 Pro | 73 | N/A | 1M | ~1/10 |
| Gemma 4 31B | 72 | 31B | 128K | ~1/5 |
| Qwen3.6 27B | 71 | 27B | 128K | ~1/6 |
| MiniMax M2.7 | 61 | N/A | 128K | ~1/7 |
| GLM 5.1 | 60 | N/A | 128K | ~1/8 |

Key observation: The top three — Kimi K2.6, DeepSeek V4 Flash/Pro — all score 73, tied for first. Given their API costs are just 1/8 to 1/10 of GPT-5.5, the cost-performance advantage is extremely significant.

Landscape Judgment: Why Only Chinese Models Can Play This Game

1. A Technology Cycle Built on Open-Source Licenses

Leading Chinese model companies generally release under permissive open-source licenses (Apache 2.0 or similar), enabling architecture and optimizer sharing across companies. In contrast, US frontier labs treat their closed models' architectural details as trade secrets, which prevents similar technology spillover.

2. Compute Constraints Drive Architecture Innovation

As Andrej Karpathy noted: “Creativity loves constraints.” Chinese model companies have limited access to high-end NVIDIA GPUs, with some turning to Huawei Ascend chips. This compute constraint forces teams to make deep architectural optimizations rather than simply throwing more compute at problems.

3. Talent Flow as Accelerator

High talent mobility among Chinese AI companies naturally creates a “knowledge sharing network” as technical ideas and practical experience spread.

Actionable Advice

For Developers

  • API selection: Test Kimi K2.6 first for agent/tool-call scenarios; test DeepSeek V4 Pro first for reasoning, math, and coding
  • Cost-sensitive use cases: DeepSeek V4 Flash’s 1M context + 1/10 cost is one of the best solutions for long document processing
  • Chinese chip adaptation: Watch DeepSeek V4’s Ascend optimization progress — Ascend-Native versions already exist
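The routing advice above can be captured in a small helper. Both vendors expose OpenAI-compatible chat endpoints, and the base URLs below match their documented endpoints; the model IDs ("kimi-k2.6", "deepseek-v4-pro", "deepseek-v4-flash") are hypothetical placeholders for these releases, so check each provider's model list before relying on them.

```python
def pick_model(task: str) -> tuple[str, str]:
    """Map a task category to a (base_url, model_id) pair per the advice above.
    Base URLs are the vendors' OpenAI-compatible endpoints; model IDs are
    illustrative placeholders, not confirmed API names."""
    if task in {"agent", "tool_call", "deep_search"}:
        return "https://api.moonshot.cn/v1", "kimi-k2.6"
    if task in {"reasoning", "math", "coding"}:
        return "https://api.deepseek.com", "deepseek-v4-pro"
    # Default: cost-sensitive, long-document workloads (1M context, lowest cost)
    return "https://api.deepseek.com", "deepseek-v4-flash"
```

Any OpenAI-compatible client can then be pointed at the returned base_url with the matching provider API key.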

For Investors

  • The “symbiotic evolution” model of Chinese open-source models is forming collective competitiveness — a single model falling behind no longer means the entire ecosystem falls behind
  • Moonshot AI (Kimi) and DeepSeek’s valuation logic should shift from “single company” to “ecosystem contributor”
