## Bottom Line First
The full Qwen3.6 family rolled out in late April 2026 and delivered a hotly debated scorecard:
- Qwen3.6-27B tops the Artificial Analysis Intelligence Index (open-source models under 150B parameters) with a score of 46
- Qwen3.6-35B quantized versions achieve 95/92/73 tps on the DGX-Spark leaderboard, surpassing GPT-OSS-120B and Gemma 4 26B
- However, Qwen3.6-27B needs roughly 3.7x the output tokens of Gemma 4 31B to complete the full Intelligence Index, at a total cost roughly 21x higher
This isn’t a “who’s stronger” story; it’s a “performance tax” story. Qwen3.6 trades more tokens for higher scores, and the inference bill balloons accordingly.
## Intelligence Index Data Overview
| Model | Intelligence Index | Parameters | Output Token Multiplier | Relative Cost |
|---|---|---|---|---|
| Qwen3.6-27B | 46 | 27B | 3.7x | 21x |
| Gemma 4 31B | 39 | 31B | 1.0x | 1.0x |
| Qwen3.6-35B (Q8) | — | 35B | — | — |
| Qwen3.6-35B (Q6) | — | 35B | — | — |
| Qwen3.6-35B (Q4) | — | 35B | — | — |
| GPT-OSS-120B | — | 120B | — | — |
Source: Artificial Analysis Intelligence Index, DGX-Spark Leaderboard (Apr 2026)
Qwen3.6-27B’s score of 46 is indeed impressive, ranking first among open-source models under 150B parameters. But deeper analysis reveals:
- Abnormally high token consumption: To complete the same test suite, Qwen3.6-27B generates 3.7x more output tokens than Gemma 4 31B
- Massive cost gap: Combining API calls and inference time, Qwen3.6’s total cost is approximately 21x that of Gemma 4
- Quantization fills the hardware gap: the 35B Q8/Q6/Q4 quantized builds are landing on the DGX-Spark leaderboard at 95/92/73 tps respectively
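The 21x figure deserves one more step of arithmetic. Treating total cost as output tokens times per-token price (a simplification; real billing also reflects inference time), the headline numbers imply a per-token price gap on top of the token-count gap:

```python
# Decompose the reported 21x cost gap (back-of-envelope, using the
# headline multipliers above; real pricing will vary).
token_multiplier = 3.7        # Qwen3.6-27B output tokens vs Gemma 4 31B
total_cost_multiplier = 21.0  # Qwen3.6-27B total cost vs Gemma 4 31B

# If cost ~= tokens * per-token price, the residual after removing the
# token gap is the implied per-token price gap:
implied_price_ratio = total_cost_multiplier / token_multiplier
print(f"Implied per-token price ratio: ~{implied_price_ratio:.1f}x")
```

In other words, only part of the 21x comes from verbosity; under this simplification, roughly 5.7x is implied to come from per-token cost.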
## Quantized Models: The Consumer Hardware Entry Ticket
The performance of Qwen3.6-35B’s three quantized versions (Q8/Q6/Q4) on DGX-Spark is noteworthy:
- Q8 (8-bit): 95 tps — minimal precision loss, ideal for quality-sensitive scenarios
- Q6 (6-bit): 92 tps — best price-performance ratio, the sweet spot between accuracy and speed
- Q4 (4-bit): 73 tps — lowest VRAM usage, suitable for edge deployment
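To put those throughput figures in concrete terms, here is a quick conversion from tps to wall-clock generation time (the 1,000-token response length is an illustrative choice, not a benchmark figure):

```python
# Wall-clock generation time implied by the DGX-Spark tps figures above.
RESPONSE_TOKENS = 1_000  # illustrative response length (assumption)

for label, tps in [("Q8", 95), ("Q6", 92), ("Q4", 73)]:
    seconds = RESPONSE_TOKENS / tps
    print(f"{label}: {seconds:.1f} s per {RESPONSE_TOKENS:,}-token response")
```

The Q8/Q6 gap is barely perceptible (~10.5 s vs ~10.9 s), while Q4 adds about three seconds per long response, which matches the "Q6 as sweet spot" reading above.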
Notably, even the Q4 quantization of the 35B model cannot run on an RTX 3090/4090 (24GB VRAM); it OOMs immediately. Consumer users therefore need hardware with at least 40GB+ VRAM (such as an RTX 5090 or a professional card) to run it.
By comparison, the quantized 27B can just about fit on 24GB cards, but at a sharply reduced context length.
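The OOM claim is easy to sanity-check with weight-size arithmetic. The sketch below uses a flat allowance for KV cache, activations, and runtime buffers (the 8 GB overhead is an assumption; actual overhead scales with context length and varies by inference engine):

```python
def approx_vram_gb(params_b: float, bits_per_weight: int,
                   overhead_gb: float = 8.0) -> float:
    """Rough VRAM estimate: quantized weight size plus a flat overhead
    allowance (assumed; real usage depends on context and runtime)."""
    weights_gb = params_b * bits_per_weight / 8  # 1B params ~ 1 GB at 8-bit
    return weights_gb + overhead_gb

for label, bits in [("Q8", 8), ("Q6", 6), ("Q4", 4)]:
    print(f"35B {label}: ~{approx_vram_gb(35, bits):.1f} GB total")
# Q4 weights alone are ~17.5 GB; with cache and buffers the total
# comfortably exceeds a 24 GB RTX 3090/4090.
```

The same arithmetic shows why a quantized 27B (~13.5 GB of Q4 weights) squeezes onto 24 GB cards only by shrinking the context-dependent overhead, i.e., the context length.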
## Landscape Assessment
The Qwen3.6 family release reveals an industry trend: open-source models’ “leaderboard-chasing strategy” is being checked by cost awareness.
- Qwen camp: Maximizing Intelligence Index scores by increasing output tokens to enhance complex reasoning
- Gemma camp: Lightweight efficiency approach, A4B (activating 4B parameters) architecture enables multi-instance inference on consumer hardware
- Middle ground: Quantized models are becoming the practical balance between performance and cost
For enterprise users, the key question is: do you need the absolute top score on the Intelligence Index, or the best result per token spent?
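One crude way to frame that choice is score per unit cost, using the table's headline numbers (crude because Intelligence Index points are not linear in value, and the cost multiplier is workload-dependent):

```python
# Score-per-cost comparison from the headline figures above.
qwen_score, qwen_cost = 46, 21.0    # Qwen3.6-27B
gemma_score, gemma_cost = 39, 1.0   # Gemma 4 31B (cost baseline)

qwen_eff = qwen_score / qwen_cost     # ~2.2 index points per cost unit
gemma_eff = gemma_score / gemma_cost  # 39.0 index points per cost unit
print(f"Gemma 4 delivers ~{gemma_eff / qwen_eff:.0f}x more score per cost unit")
```

By this metric Gemma 4 is better by roughly an order of magnitude, which is the "performance tax" in a single number.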
## Action Recommendations
| Scenario | Recommended | Reason |
|---|---|---|
| Academic research / leaderboard chasing | Qwen3.6-27B | Highest Intelligence Index score |
| Production inference | Gemma 4 31B | 21x cheaper, only 7-point gap |
| Consumer hardware deployment | Qwen3.6-35B Q4 | Lowest VRAM usage, 73 tps |
| Best value | Qwen3.6-35B Q6 | 92 tps, acceptable precision loss |
| Multi-instance concurrency | Gemma 4 26B A4B | Can run multiple instances on one laptop |
Key judgment: if your use case doesn’t demand top-5% Intelligence Index performance, Gemma 4’s cost advantage is overwhelming. If you’re doing code generation or complex reasoning, Qwen3.6’s extra token consumption does translate into real score gains; whether that trade pays off comes down to your budget constraints.