Open Source Models Closing In on Closed Source: What a 6-Point Gap Means

The Signal

The latest Intelligence Index data reveals an underappreciated trend: the capability gap between Chinese open-source models and global closed-source flagships is rapidly converging.

| Model | Intelligence Index | Open Source | Price Positioning |
|---|---|---|---|
| GPT-5.5 | 60 | Closed | $5/$30 per M |
| Gemini 3 / Claude | 57 | Closed | $3.50/$15 per M |
| Kimi K2.6 | 54 | Open | ~$1.70/$3 per M |
| MiMo V2.5 Pro | 54 | Open | MIT License |
| DeepSeek V4 Pro | 52 | Open | $2.20/$3.48 per M |
| GLM-5.1 | ~50 | Open | Subscription |
| MiniMax M2.7 | ~49 | Open | Low-cost |

The gap between GPT-5.5 and Kimi K2.6 is only 6 points. Given that Kimi K2.6's API costs roughly one-tenth of GPT-5.5's, the cost-performance curve is steep enough to change most enterprises' model-selection decisions.

The Practical Meaning of a 6-Point Gap

The Intelligence Index was designed to comprehensively evaluate model capabilities in real-world scenarios — not memorized benchmark scores, but a weighted score across reasoning, coding, instruction following, long context, and more.

What does a 6-point gap mean?

In 80% of daily development scenarios, users cannot tell the difference.

A developer sharing their “budget AI package” on VEX put it plainly:

“I use DeepSeek V4 Flash for coding — the free tier is enough for daily use. When I need reasoning power, I switch to Pro, pay-per-use, and it costs just a few bucks a month.”

This isn’t theoretical “good enough” — it’s a choice in real production environments. When Kimi K2.6 beat Claude Opus 4.7 on LiveBench (a dynamic anti-cheating evaluation), the narrative of closed-source models’ “capability moat” began to crumble.

The Catch-Up Path of Open Source Models

Looking at the Intelligence Index trajectory:

  • 2025 Q2: GPT-5.0 (50) vs DeepSeek V3 (38) → 12-point gap
  • 2025 Q4: GPT-5.2 (55) vs DeepSeek V4 (45) → 10-point gap
  • 2026 Q1: GPT-5.5 (60) vs Kimi K2.6 (54) → 6-point gap

The catch-up pace is accelerating: 2 points closed over the half year to 2025 Q4, then 4 more in the single quarter that followed. If the gap keeps shrinking by 2-4 points per release cycle, open-source models could reach the current GPT-5.5 level by the end of 2026.
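Under the naive assumption that each future release cycle closes another 2-4 points, the remaining 6-point gap can be projected. This is a toy extrapolation; `project_gap` and the cycle counts are our illustration, not anything from the Index itself:

```python
# Toy extrapolation of the Intelligence Index gap. The per-cycle shrink
# rates (2 and 4 points) come from the trajectory above; the number of
# remaining cycles is an illustrative assumption, not a forecast.
def project_gap(current_gap: int, shrink_per_cycle: int, cycles: int) -> int:
    """Gap after `cycles` more release cycles, floored at zero."""
    return max(0, current_gap - shrink_per_cycle * cycles)

conservative = project_gap(6, 2, 2)  # two slow cycles: a 2-point gap remains
optimistic = project_gap(6, 4, 2)    # two fast cycles: parity reached
print(conservative, optimistic)      # → 2 0
```

Even the conservative branch lands within the noise floor of most day-to-day use, which is the article's core claim.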

But this isn’t a simple “more parameters = better” story. Both Kimi K2.6 and MiMo V2.5 Pro use MoE (Mixture of Experts) architecture, achieving trillion-level total parameters while keeping active parameters around 50B. This means inference costs can be drastically reduced without sacrificing capability.

The Overlooked Variable: Practical Gap

The US CAISI agency’s evaluation report stated that DeepSeek V4 Pro’s comprehensive capability “lags the frontier by about 8 months.” This judgment is partially reflected in the Intelligence Index — 52 points is indeed below 60.

But the “8-month gap” interpretation needs full context:

  • GPT-5.5 is an iteration of GPT-5.0 released last August, and DeepSeek V4 Pro’s capability has already caught up to that version
  • In coding, Chinese language understanding, and long-text processing, domestic models perform in the same tier as international flagships
  • Open weights + local deployment capability is something closed-source models can never provide

One developer’s summary was precise:

“Parameters aren’t lacking, benchmark scores aren’t lacking — so where’s the gap? The biggest gap is real-world practice. But if your scenario doesn’t need the frontier’s 100% capability, then 92% capability at 1/10 the price is the better choice.”

Landscape Assessment

The Intelligence Index data is rewriting a fundamental assumption: that closed-source models’ capability advantage is permanent.

When open-source models approach closed-source flagships within 6 points while costing 1/5 to 1/10 the price, market competition logic shifts from “who’s the strongest” to “who’s the best fit.”

The cascading effects of this shift:

  1. Enterprise procurement: Moving from “buy the most expensive” to “allocate by scenario” — core reasoning with GPT-5.5, daily development with DeepSeek, long documents with Kimi
  2. Individual developers: Multi-model routing becomes a standard skill — knowing how to orchestrate models matters more than mastering a single one
  3. Model vendors: Closed-source vendors must prove that the “6-point gap” has irreplaceable value in specific scenarios, otherwise price stratification will directly translate into market share loss
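The "allocate by scenario" and "multi-model routing" ideas above reduce to a dispatch function. A minimal sketch, using the article's model lineup but with routing rules and thresholds that are purely illustrative:

```python
# Minimal multi-model router. Model names come from the article; the
# task categories and the 100k-token threshold are assumptions.
def route(task_type: str, context_tokens: int) -> str:
    if context_tokens > 100_000:
        return "kimi-k2.6"        # long documents
    if task_type == "core-reasoning":
        return "gpt-5.5"          # pay frontier prices only where they matter
    return "deepseek-v4-pro"      # daily-development default

print(route("coding", 2_000))         # → deepseek-v4-pro
print(route("core-reasoning", 500))   # → gpt-5.5
print(route("summarize", 250_000))    # → kimi-k2.6
```

In practice the routing signal might be a classifier or a cost budget rather than a hand-written rule, but the structure (cheap default, expensive escalation path) stays the same.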

Action Items

  • If you’re evaluating model migration: Test Kimi K2.6 or DeepSeek V4 Pro in 20% of your real business scenarios first — the 6-point Intelligence Index gap is likely imperceptible in daily use
  • If you’re making model procurement decisions: Don’t just look at absolute Intelligence Index scores; calculate “cost per Intelligence point.” On output pricing, Kimi K2.6 comes to about $0.055 per point versus about $0.50 for GPT-5.5, a 9x difference
  • If you’re building Agent applications: Open-source MoE models have even more pronounced cost advantages in Agent scenarios, because Agents typically require massive token consumption, magnifying the per-unit cost impact
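The "cost per Intelligence point" figures in the checklist can be reproduced directly from the table's output prices and Index scores (our reading of the metric: output price per M tokens divided by Index score):

```python
# Reproducing the cost-per-point comparison from the table:
# Kimi K2.6 at $3/M output with a 54 Index score, GPT-5.5 at $30/M with 60.
def cost_per_point(output_price_per_m: float, index_score: int) -> float:
    """Dollars per M output tokens, per Intelligence Index point."""
    return output_price_per_m / index_score

kimi = cost_per_point(3.00, 54)   # ≈ $0.056 per point
gpt = cost_per_point(30.00, 60)   # = $0.50 per point
print(round(gpt / kimi, 1))       # → 9.0
```

Swap in your own blended input/output price if your workload is input-heavy; the ratio shifts, but with the table's prices it stays well above 5x.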