The Qwen team hid an interesting model inside the Qwen 3.6 line: 35B-A3B. It has 35B total parameters, but each token activates only about 3.6B.
Community coding tests are already drawing attention: on repository-level coding tasks, this MoE model with roughly 3.6B active parameters is reportedly approaching the level of a 397B dense model.
The Numbers
Core Qwen3.6-35B-A3B configuration:
- Total parameters: 35B
- Active parameters: 3.6B per token, roughly 10% of the total
- Architecture: sparse Mixture of Experts, similar in spirit to the Mixtral route
- Deployment: available through AWS JumpStart
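To give a rough idea of what the JumpStart path looks like, here is a minimal SageMaker deployment sketch. The model ID, instance type, and payload schema are assumptions, not taken from the actual catalog entry; check AWS JumpStart for the real values.

```python
# Minimal sketch of deploying a JumpStart model with the SageMaker Python SDK.
# The model_id and instance_type below are placeholders / assumptions:
# check the AWS JumpStart catalog for the actual Qwen3.6-35B-A3B entry.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-qwen3-6-35b-a3b")  # hypothetical ID

# Deploy to a managed endpoint; the MoE only activates ~3.6B parameters per
# token, but all 35B weights still need GPU memory, hence a large instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # assumption; follow the catalog's recommendation
)

# Simple text-generation request; the payload shape follows the usual
# Hugging Face TGI-style containers that JumpStart LLMs commonly use.
response = predictor.predict({
    "inputs": "Write a Python function that parses a git diff.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
})
print(response)

# Clean up the endpoint when done to stop billing.
predictor.delete_endpoint()
```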
The natural comparison is Qwen3.6-27B dense: it activates all of its parameters and performs well on coding, but its latency and inference cost are much higher.
The point of 35B-A3B is simple: spend about one tenth of the compute and get close to full-size coding performance.
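To make "about one tenth of the compute" concrete, the usual back-of-the-envelope estimate is roughly 2 FLOPs per active parameter per generated token. The numbers below are illustrative intuition, not measurements:

```python
# Back-of-the-envelope decode cost: ~2 FLOPs per (active) parameter per token.
# Rough estimates for intuition only, not benchmark results.
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

moe_active = 3.6e9   # Qwen3.6-35B-A3B: ~3.6B parameters active per token
dense_total = 27e9   # Qwen3.6-27B dense: all parameters active
moe_total = 35e9     # total weights the MoE still keeps in memory

print(f"MoE per-token compute:   {flops_per_token(moe_active):.2e} FLOPs")
print(f"Dense per-token compute: {flops_per_token(dense_total):.2e} FLOPs")
print(f"MoE / dense compute ratio: {moe_active / dense_total:.0%}")  # ~13%
print(f"Active / total parameters: {moe_active / moe_total:.0%}")    # ~10%
```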
MoE's Cost-Performance Moment
MoE is not new. Mixtral 8x7B proved the direction in late 2023. But Qwen3.6-35B-A3B pushes the practical side further:
- 3.6B active parameters per token keeps compute and latency in consumer-GPU territory (a rough memory estimate follows below)
- Repository-level coding ability gets close to what previously required far larger dense models
- AWS JumpStart deployment means teams can call it without managing GPUs themselves
For independent developers and small teams, that is a useful combination. You do not need to rent an H100 to run a coding-capable local agent.
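One nuance worth spelling out: active parameters set the per-token compute, but all 35B weights still have to sit in memory. A rough sizing sketch, assuming standard weight-only quantization and ignoring KV-cache and runtime overhead, shows why a single 24 GB consumer card can still be plausible:

```python
# Rough VRAM estimate for serving the full 35B-parameter MoE locally.
# Quantization overhead, KV cache, and activations are ignored; treat the
# numbers as order-of-magnitude intuition, not a deployment guarantee.
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    return total_params * bits_per_weight / 8 / 1e9

total_params = 35e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(total_params, bits):.0f} GB")

# 16-bit: ~70 GB -> multi-GPU / server territory
#  8-bit: ~35 GB -> still above a single 24 GB consumer card
#  4-bit: ~18 GB -> fits a 24 GB card with room for the KV cache,
#                   while per-token compute stays at the 3.6B-active level
```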
The Trade-Offs
Community tests also show the usual MoE weaknesses:
- Routing instability: edge cases can route to the wrong experts and drop quality
- Long-context decay: quality over very long contexts is less stable than with comparable dense models
- Harder fine-tuning: LoRA and QLoRA tooling for MoE is still less mature (a common workaround is sketched below)
These are not fatal, but they do mean MoE is currently better for inference-heavy use than heavy domain adaptation.
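On the fine-tuning point, one common workaround (not specific to Qwen, and assuming the usual Hugging Face transformers + PEFT stack) is to restrict LoRA to the attention projections and leave the expert FFNs and the router frozen. The checkpoint name below is a placeholder and the target module names are typical for Qwen-style models, but both should be checked against the actual release:

```python
# Sketch: LoRA on attention projections only, leaving MoE experts and the
# routing gate frozen. Assumes Hugging Face transformers + peft; the model
# ID is a placeholder and module names may differ for the real checkpoint.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-35B-A3B",  # placeholder ID; check the actual Hugging Face page
    torch_dtype="auto",
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Adapt only the attention projections; skipping expert MLPs and the
    # routing gate sidesteps most of the MoE-specific training instability.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 35B total
```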
Alibaba Cloud's Bet
The Qwen 3.6 lineup is fairly clear: 35B MoE targets the cost-performance middle, 27B dense carries the quality benchmark, smaller models cover edge devices, and Max Preview guards the flagship API tier.
Putting 35B-A3B on AWS JumpStart is also a signal: Qwen is trying to enter the daily workflow of global developers, not just remain a "Chinese model" people test out of curiosity.
The next model worth watching is Qwen 3.6 8B dense. If that gets close to today's 27B coding level, edge deployment gets much more interesting.
Main sources:
- Qwen Hugging Face page
- Community coding test threads
- AWS JumpStart model catalog