Qwen3.6 Family Tops Intelligence Index: 27B Leads but Inference Costs 21x More Than Gemma 4

Bottom Line First

After the full Qwen3.6 family rolled out in late April 2026, it delivered a highly debated scorecard:

  • Qwen3.6-27B tops the Artificial Analysis Intelligence Index (open-source models under 150B parameters) with a score of 46
  • Qwen3.6-35B quantized versions achieve 95/92/73 tps on the DGX-Spark leaderboard, surpassing GPT-OSS-120B and Gemma 4 26B
  • However, completing the full Intelligence Index test suite requires approximately 3.7x more output tokens, for a total cost 21x higher than Gemma 4 31B

This isn’t a “who’s stronger” story; it’s a “performance tax” story. Qwen3.6 trades more tokens for higher scores, and the inference bill balloons accordingly.

Intelligence Index Data Overview

| Model | Intelligence Index | Parameters | Output Token Multiplier | Relative Cost |
| --- | --- | --- | --- | --- |
| Qwen3.6-27B | 46 | 27B | 3.7x | 21x |
| Gemma 4 31B | 39 | 31B | 1.0x | 1.0x |
| Qwen3.6-35B (Q8) | n/a | 35B | n/a | n/a |
| Qwen3.6-35B (Q6) | n/a | 35B | n/a | n/a |
| Qwen3.6-35B (Q4) | n/a | 35B | n/a | n/a |
| GPT-OSS-120B | n/a | 120B | n/a | n/a |

(n/a = not reported in the source data)

Source: Artificial Analysis Intelligence Index, DGX-Spark Leaderboard (Apr 2026)

Qwen3.6-27B’s score of 46 is indeed impressive, ranking first among open-source models under 150B parameters. But deeper analysis reveals:

  1. Abnormally high token consumption: To complete the same test suite, Qwen3.6-27B generates 3.7x more output tokens than Gemma 4 31B
  2. Massive cost gap: Combining API pricing and inference time, Qwen3.6’s total cost is approximately 21x that of Gemma 4 (see the cost sketch after this list)
  3. Quantization fills the gap: The 35B Q8/Q6/Q4 quantized builds are rolling out on DGX-Spark, reaching 95/92/73 tps respectively
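
To see how a 3.7x token multiplier compounds into a 21x bill, separate token volume from per-token price. The sketch below is a back-of-the-envelope illustration: the baseline token count and price are hypothetical placeholders, and only the 3.7x and 21x ratios come from the benchmark data. The implied conclusion is that Qwen3.6’s per-token price must also be roughly 5.7x higher.

```python
# Decomposing the 21x total-cost gap: total cost = tokens x price per token.
# BASE_TOKENS and BASE_PRICE are illustrative assumptions, not published figures.

BASE_TOKENS = 1_000_000      # assumed output tokens for Gemma 4 31B's full run
BASE_PRICE = 0.10            # assumed $ per 1M output tokens

qwen_tokens = BASE_TOKENS * 3.7        # 3.7x more output tokens (reported)
qwen_price = BASE_PRICE * (21 / 3.7)   # per-token price implied by the 21x total

def run_cost(tokens: float, price_per_million: float) -> float:
    """Total cost in dollars for one full benchmark run."""
    return tokens / 1_000_000 * price_per_million

ratio = run_cost(qwen_tokens, qwen_price) / run_cost(BASE_TOKENS, BASE_PRICE)
print(f"total cost ratio: {ratio:.1f}x")                             # -> 21.0x
print(f"implied per-token premium: {qwen_price / BASE_PRICE:.1f}x")  # -> 5.7x
```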

Quantized Models: The Consumer Hardware Entry Ticket

The performance of Qwen3.6-35B’s three quantized versions (Q8/Q6/Q4) on DGX-Spark is noteworthy:

  • Q8 (8-bit): 95 tps — minimal precision loss, ideal for quality-sensitive scenarios
  • Q6 (6-bit): 92 tps — best price-performance ratio, the sweet spot between accuracy and speed
  • Q4 (4-bit): 73 tps — lowest VRAM usage, suitable for edge deployment

Notably, even the Q4 quantized build of the 35B model cannot run on an RTX 3090/4090 (24GB VRAM); it goes straight to OOM. In practice, consumer users need hardware with 40GB+ VRAM (such as an RTX 5090 or a professional card) to run it.

By comparison, a quantized 27B build can just barely run on a 24GB card, but only with a significantly reduced context length. A rough weight-memory estimate (sketched below) shows why the margins are this thin.
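
The arithmetic behind these limits: weight memory ≈ parameters × bits per weight / 8. The bits-per-weight values below are typical for common Q4/Q8 quantization formats, an assumption rather than a published spec for these builds, and real deployments add KV cache and runtime overhead on top.

```python
# Back-of-the-envelope weight-memory estimate for quantized models.
# Bits-per-weight values are typical of common quant formats (assumed),
# and KV cache plus runtime overhead come on top of these figures.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params, bits in [
    ("35B Q8", 35, 8.5),
    ("35B Q4", 35, 4.5),
    ("27B Q4", 27, 4.5),
]:
    print(f"{name}: ~{weight_gb(params, bits):.1f} GB of weights")

# 35B Q4 needs ~19.7 GB for weights alone, so on a 24GB card the KV cache
# and overhead tip it into OOM; 27B Q4 needs ~15.2 GB, which fits but leaves
# little room for a long context's KV cache.
```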

Landscape Assessment

The Qwen3.6 family release reveals an industry trend: open-source models’ “leaderboard-chasing strategy” is being checked by cost awareness.

  • Qwen camp: Maximizing Intelligence Index scores by spending more output tokens on complex reasoning
  • Gemma camp: Lightweight efficiency; the A4B architecture (only 4B parameters active per token) enables multi-instance inference on consumer hardware (see the sketch after this list)
  • Middle ground: Quantized models are becoming the practical balance between performance and cost
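
Why sparse activation changes the deployment math: per-token decode compute scales with active parameters, even though all weights must still fit in memory. The “2 × active parameters” FLOPs rule of thumb below is a standard approximation, not a measured figure for these specific models.

```python
# Sketch: per-token decode compute, dense vs. sparsely activated (A4B).
# Rule of thumb (assumed): decode FLOPs per token ~ 2 x active parameters.
# Note this bounds compute only; all 26B weights still occupy memory.

def decode_flops_per_token(active_params_billion: float) -> float:
    return 2.0 * active_params_billion * 1e9

dense = decode_flops_per_token(27)   # Qwen3.6-27B: every weight active
sparse = decode_flops_per_token(4)   # Gemma 4 26B A4B: ~4B active per token
print(f"per-token compute ratio: {dense / sparse:.1f}x")   # ~6.8x
```

That roughly 7x compute headroom is what makes several concurrent instances on one machine plausible, provided the full weight set fits in memory.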

For enterprise users, the key choice is: do you need the absolute top Intelligence Index score, or the best result per token spent?

Action Recommendations

| Scenario | Recommended | Reason |
| --- | --- | --- |
| Academic research / leaderboard chasing | Qwen3.6-27B | Highest Intelligence Index score |
| Production inference | Gemma 4 31B | 21x cheaper, only a 7-point gap |
| Consumer hardware deployment | Qwen3.6-35B Q4 | Lowest VRAM usage, 73 tps |
| Best value | Qwen3.6-35B Q6 | 92 tps, acceptable precision loss |
| Multi-instance concurrency | Gemma 4 26B A4B | Can run multiple instances on one laptop |

Key judgment: If your use case doesn’t require absolute top-5% performance on the Intelligence Index, Gemma 4’s cost advantage is decisive. But if you’re doing code generation or complex reasoning, Qwen3.6’s extra token consumption buys real score gains; whether that trade is worth it comes down to your budget. One rough way to frame it is cost per Index point, sketched below.
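
As a closing sanity check, here is a minimal cost-per-point comparison using only the figures quoted above. The “cost per point” framing is an editorial simplification, and the relative cost units make absolute prices irrelevant.

```python
# Cost per Intelligence Index point, in relative cost units. The scores
# (46 vs 39) and the 21x relative cost come from the article; "cost per
# point" is a simplifying lens, not an official metric.

models = {
    "Qwen3.6-27B": {"score": 46, "relative_cost": 21.0},
    "Gemma 4 31B": {"score": 39, "relative_cost": 1.0},
}

for name, m in models.items():
    per_point = m["relative_cost"] / m["score"]
    print(f"{name}: {per_point:.3f} cost units per point")

# Qwen3.6-27B pays roughly 17.8x more per point: 7 extra points for 20x
# extra budget. Worth it only if those points land on tasks you actually run.
```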