## Bottom Line First
The full Qwen3.6 family rolled out in late April 2026 and delivered a hotly debated scorecard:
- Qwen3.6-27B tops the Artificial Analysis Intelligence Index (open-source models under 150B parameters) with a score of 46
- Qwen3.6-35B quantized versions achieve 95/92/73 tps on the DGX-Spark leaderboard, surpassing GPT-OSS-120B and Gemma 4 26B
- However, Qwen3.6-27B needs roughly 3.7x the output tokens of Gemma 4 31B to complete the full Intelligence Index, at a total cost roughly 21x higher
This isn’t a “who’s stronger” story; it’s a “performance tax” story. Qwen3.6 trades more tokens for higher scores, and the inference bill balloons accordingly.
## Intelligence Index Data Overview
| Model | Intelligence Index | Parameters | Output Token Multiplier | Relative Cost |
|---|---|---|---|---|
| Qwen3.6-27B | 46 | 27B | 3.7x | 21x |
| Gemma 4 31B | 39 | 31B | 1.0x | 1.0x |
| Qwen3.6-35B (Q8) | — | 35B | — | — |
| Qwen3.6-35B (Q6) | — | 35B | — | — |
| Qwen3.6-35B (Q4) | — | 35B | — | — |
| GPT-OSS-120B | — | 120B | — | — |
Source: Artificial Analysis Intelligence Index, DGX-Spark Leaderboard (Apr 2026)
Qwen3.6-27B’s score of 46 is indeed impressive, ranking first among open-source models under 150B parameters. But deeper analysis reveals:
- Abnormally high token consumption: To complete the same test suite, Qwen3.6-27B generates 3.7x more output tokens than Gemma 4 31B
- Massive cost gap: Combining API calls and inference time, Qwen3.6’s total cost is approximately 21x that of Gemma 4
- Quantization fills the hardware gap: the 35B Q8/Q6/Q4 quantized builds are landing on the DGX-Spark leaderboard at 95/92/73 tps respectively
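The 21x figure deserves one more step of arithmetic. Treating total cost as output tokens times per-token price (a simplification; real billing also reflects inference time), the headline numbers imply a per-token price gap on top of the token-count gap:

```python
# Decompose the reported 21x cost gap (back-of-envelope, using the
# headline multipliers above; real pricing will vary).
token_multiplier = 3.7        # Qwen3.6-27B output tokens vs Gemma 4 31B
total_cost_multiplier = 21.0  # Qwen3.6-27B total cost vs Gemma 4 31B

# If cost ~= tokens * per-token price, the residual after removing the
# token gap is the implied per-token price gap:
implied_price_ratio = total_cost_multiplier / token_multiplier
print(f"Implied per-token price ratio: ~{implied_price_ratio:.1f}x")
```

In other words, only part of the 21x comes from verbosity; under this simplification, roughly 5.7x is implied to come from per-token cost.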
## Quantized Models: The Consumer Hardware Entry Ticket
The performance of Qwen3.6-35B’s three quantized versions (Q8/Q6/Q4) on DGX-Spark is noteworthy:
- Q8 (8-bit): 95 tps — minimal precision loss, ideal for quality-sensitive scenarios
- Q6 (6-bit): 92 tps — best price-performance ratio, the sweet spot between accuracy and speed
- Q4 (4-bit): 73 tps — lowest VRAM usage, suitable for edge deployment
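To put those throughput figures in concrete terms, here is a quick conversion from tps to wall-clock generation time (the 1,000-token response length is an illustrative choice, not a benchmark figure):

```python
# Wall-clock generation time implied by the DGX-Spark tps figures above.
RESPONSE_TOKENS = 1_000  # illustrative response length (assumption)

for label, tps in [("Q8", 95), ("Q6", 92), ("Q4", 73)]:
    seconds = RESPONSE_TOKENS / tps
    print(f"{label}: {seconds:.1f} s per {RESPONSE_TOKENS:,}-token response")
```

The Q8/Q6 gap is barely perceptible (~10.5 s vs ~10.9 s), while Q4 adds about three seconds per long response, which matches the "Q6 as sweet spot" reading above.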
Notably, even the Q4 quantization of the 35B model cannot run on an RTX 3090/4090 (24GB VRAM); it OOMs immediately. Consumer users therefore need hardware with at least 40GB+ VRAM (such as an RTX 5090 or a professional card) to run it.
By comparison, the quantized 27B can just about fit on 24GB cards, but at a sharply reduced context length.
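The OOM claim is easy to sanity-check with weight-size arithmetic. The sketch below uses a flat allowance for KV cache, activations, and runtime buffers (the 8 GB overhead is an assumption; actual overhead scales with context length and varies by inference engine):

```python
def approx_vram_gb(params_b: float, bits_per_weight: int,
                   overhead_gb: float = 8.0) -> float:
    """Rough VRAM estimate: quantized weight size plus a flat overhead
    allowance (assumed; real usage depends on context and runtime)."""
    weights_gb = params_b * bits_per_weight / 8  # 1B params ~ 1 GB at 8-bit
    return weights_gb + overhead_gb

for label, bits in [("Q8", 8), ("Q6", 6), ("Q4", 4)]:
    print(f"35B {label}: ~{approx_vram_gb(35, bits):.1f} GB total")
# Q4 weights alone are ~17.5 GB; with cache and buffers the total
# comfortably exceeds a 24 GB RTX 3090/4090.
```

The same arithmetic shows why a quantized 27B (~13.5 GB of Q4 weights) squeezes onto 24 GB cards only by shrinking the context-dependent overhead, i.e., the context length.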
## Landscape Assessment
The Qwen3.6 family release reveals an industry trend: open-source models’ “leaderboard-chasing strategy” is being checked by cost awareness.
- Qwen camp: Maximizing Intelligence Index scores by increasing output tokens to enhance complex reasoning
- Gemma camp: Lightweight efficiency approach, A4B (activating 4B parameters) architecture enables multi-instance inference on consumer hardware
- Middle ground: Quantized models are becoming the practical balance between performance and cost
For enterprise users, the key question is: do you need the absolute top score on the Intelligence Index, or the best result per token spent?
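One crude way to frame that choice is score per unit cost, using the table's headline numbers (crude because Intelligence Index points are not linear in value, and the cost multiplier is workload-dependent):

```python
# Score-per-cost comparison from the headline figures above.
qwen_score, qwen_cost = 46, 21.0    # Qwen3.6-27B
gemma_score, gemma_cost = 39, 1.0   # Gemma 4 31B (cost baseline)

qwen_eff = qwen_score / qwen_cost     # ~2.2 index points per cost unit
gemma_eff = gemma_score / gemma_cost  # 39.0 index points per cost unit
print(f"Gemma 4 delivers ~{gemma_eff / qwen_eff:.0f}x more score per cost unit")
```

By this metric Gemma 4 is better by roughly an order of magnitude, which is the "performance tax" in a single number.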
## Action Recommendations
| Scenario | Recommended | Reason |
|---|---|---|
| Academic research / leaderboard chasing | Qwen3.6-27B | Highest Intelligence Index score |
| Production inference | Gemma 4 31B | 21x cheaper, only 7-point gap |
| Consumer hardware deployment | Qwen3.6-35B Q4 | Lowest VRAM usage, 73 tps |
| Best value | Qwen3.6-35B Q6 | 92 tps, acceptable precision loss |
| Multi-instance concurrency | Gemma 4 26B A4B | Can run multiple instances on one laptop |
Key judgment: if your use case doesn’t demand top-5% Intelligence Index performance, Gemma 4’s cost advantage is overwhelming. If you’re doing code generation or complex reasoning, Qwen3.6’s extra token consumption does translate into real score gains; whether that trade pays off comes down to your budget constraints.