Bottom Line
In late April, Ant Group (Inclusion AI / Ant Ling) open-sourced two MoE models under the MIT license, Ling-2.6-Flash and Ling-2.6-1T, each shipping in BF16/FP8/INT4 precision variants. Against models of similar parameter scale, Ling's core differentiation is an extremely low active-parameter count plus an execution-oriented design: not a benchmark-padding machine, but a model purpose-built for Agent workloads. (A back-of-the-envelope footprint estimate follows the table below.)
| Dimension | Ling-2.6-Flash | Ling-2.6-1T |
|---|---|---|
| Total Parameters | 104B | ~1T |
| Active Parameters | 7.4B | ~63B |
| Context Window | 256K | 256K+ |
| License | MIT | MIT |
| SWE-Bench Verified | 62 | 67+ |
| BFCL-V4 | 67 | 72+ |
| TAU2-Bench (Telecom) | 93.86 | 95+ |
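A quick sketch puts those parameter counts in context. This is weight footprint only, assuming the headline figures above; KV cache and runtime overhead come on top:

```python
# Rough weight-memory and per-token compute estimates for both models.
# Assumes the headline parameter counts above; real deployments also need
# KV-cache and runtime memory on top of the raw weights.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

MODELS = {
    "Ling-2.6-Flash": {"total": 104e9, "active": 7.4e9},
    "Ling-2.6-1T":    {"total": 1000e9, "active": 63e9},
}

for name, p in MODELS.items():
    # ~2 FLOPs per active parameter per generated token (standard estimate).
    gflops = 2 * p["active"] / 1e9
    sizes = ", ".join(f"{prec}: ~{p['total'] * b / 1e9:.0f} GB"
                      for prec, b in BYTES_PER_PARAM.items())
    print(f"{name}: {sizes} | ~{gflops:.0f} GFLOPs/token")
```

Note that even at INT4, Flash's weights alone come to roughly 52 GB, so the single-RTX-4090 setups mentioned below presumably offload inactive experts to CPU RAM.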
What Happened
Ling-2.6-Flash: Ultra-Lightweight Agent Model
- April 29: Ling-2.6-Flash weights officially open-sourced. 104B total parameters with only 7.4B activated per token, which puts it within reach of consumer-grade GPUs (a single RTX 4090 with INT4 quantization).
- Built on Ling 2.0 with a hybrid linear attention mechanism that replaces the previous GQA attention, significantly reducing inference latency (a toy sketch of the idea follows this list).
- SWE-Bench Verified 62, BFCL-V4 67, TAU2-Telecom 93.86: all hard-scenario metrics, not academic leaderboard-padding datasets.
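The release notes don't spell out the exact attention mix, so treat the following as intuition only: hybrid designs interleave O(n) linear-attention layers with a few full softmax-attention layers to keep long-range retrieval sharp. Layer counts, dimensions, and the feature map below are illustrative assumptions, not Ling's actual architecture:

```python
import torch

def softmax_attention(q, k, v):
    # Standard attention: O(n^2) in sequence length.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized linear attention (elu+1 feature map, as in Katharopoulos
    # et al. 2020): O(n) because the (d x d) summary K^T V is built once.
    q, k = torch.nn.functional.elu(q) + 1, torch.nn.functional.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                                  # (d, d) summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps   # normalizer
    return (q @ kv) / z

# Hypothetical hybrid stack: mostly linear layers, a few full-attention
# layers for precise retrieval. The actual ratio in Ling is not public.
layers = ["linear"] * 6 + ["softmax"] + ["linear"] * 6 + ["softmax"]

x = torch.randn(1, 4096, 64)  # (batch, seq_len, head_dim)
for kind in layers:
    attn = linear_attention if kind == "linear" else softmax_attention
    x = x + attn(x, x, x)     # residual; projections omitted for brevity
```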
Ling-2.6-1T: Flagship Execution Model
- Ling-2.6-1T was released the same day as Flash: ~1T total parameters, ~63B active.
- The core design philosophy is "Execution-First": reduce token waste during reasoning, skip verbose internal-monologue thinking, and output executable results directly.
- Community feedback: many frontier models' reasoning outputs are essentially wasted tokens; users pay for every internal thought, yet task completion rates don't improve proportionally. Ling-2.6-1T targets exactly this problem (a simple way to measure it follows).
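"Fewer tokens for the same completion rate" is measurable if you track tokens per completed task rather than raw accuracy. A minimal scoring sketch; the numbers are placeholders, not benchmark results:

```python
# Tokens-per-completed-task: a simple efficiency metric for comparing
# execution-oriented models against verbose reasoners.
def tokens_per_completed_task(runs):
    total_tokens = sum(r["output_tokens"] for r in runs)
    completed = sum(r["success"] for r in runs)
    return total_tokens / max(completed, 1)

# Illustrative placeholder runs, not measured results.
verbose_model = [{"output_tokens": 3200, "success": True},
                 {"output_tokens": 4100, "success": False},
                 {"output_tokens": 2900, "success": True}]
terse_model   = [{"output_tokens": 700, "success": True},
                 {"output_tokens": 950, "success": True},
                 {"output_tokens": 800, "success": False}]

print(tokens_per_completed_task(verbose_model))  # 5100.0 tokens per success
print(tokens_per_completed_task(terse_model))    # 1225.0 tokens per success
```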
Why It Matters
1. A New Variable in the Chinese MoE Camp
Previously, the main Chinese open-source MoE models were DeepSeek V4 (1.6T/37B active) and Kimi K2.6 (~1T). Ling-2.6’s entry means:
- Flash tier (7.4B active): fills the gap for consumer GPU-runnable Agent models in Chinese open source
- 1T tier (63B active): active parameter count comparable to DeepSeek V4's, but with a more radical design philosophy: fewer tokens consumed for the same task completion rate
2. Cost Revolution for Agent Scenarios
What do Ling-2.6-Flash's 7.4B active parameters mean in practice?
- With a frontier reasoner like GPT-5.5, a single API call's reasoning output can consume hundreds of extra tokens
- Ling-2.6-Flash cuts per-call cost to a tenth or less through a streamlined reasoning path (worked numbers follow this list)
- For Agent workloads that require high-frequency calls, this is the threshold between "experimental" and "production-grade"
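The 1/10 figure is the product of two separate savings: fewer output tokens per call, and a lower per-token price enabled by the small active-parameter count. A hedged illustration, where all prices and token counts are made-up placeholders rather than published pricing:

```python
# Illustrative per-call cost comparison; substitute real figures from
# your provider before drawing conclusions.
def call_cost(output_tokens, usd_per_mtok):
    return output_tokens * usd_per_mtok / 1e6

frontier = call_cost(output_tokens=1200, usd_per_mtok=5.0)  # verbose reasoner
flash    = call_cost(output_tokens=600,  usd_per_mtok=1.0)  # terse executor

print(f"frontier: ${frontier:.4f}/call, flash: ${flash:.4f}/call, "
      f"ratio: {frontier / flash:.0f}x")  # -> 10x under these assumptions
```

At hundreds of thousands of Agent calls per day, that ratio compounds into the difference between a demo budget and a production one.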
3. Ant’s Open-Source Strategy Shift
Ant Ling previously focused on API services (Ling Chat). This full open-source release means:
- Shift from closed services to ecosystem building
- MIT license (rather than Apache 2.0 or a custom commercial license), allowing essentially unrestricted commercial use
- Available on both Hugging Face and ModelScope, covering international and domestic developers
Actionable Advice
Who Should Pay Attention
- Agent developers: Ling-2.6-Flash’s 7.4B active parameters make it ideal for low-latency Agent calls
- Cost-sensitive teams: in high-volume API call scenarios, Flash's cost advantage is significant
- Consumer GPU users: the INT4 quantized variant is claimed to run the 104B MoE on a single RTX 4090 (likely with expert offloading; see the footprint estimate above)
How to Get Started
```bash
# Hugging Face installation (bitsandbytes is only needed for on-the-fly 4-bit loading)
pip install transformers accelerate bitsandbytes
```

```python
# Load Ling-2.6-Flash in 4-bit. The BitsAndBytesConfig route quantizes on
# the fly; if Ant publishes dedicated INT4 weights, load that repo directly
# and drop quantization_config.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "InclusionAI/Ling-2.6-Flash",
    device_map="auto",
    torch_dtype="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```
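From there, assuming the repo ships a standard chat template (unverified), a minimal generation call would look like:

```python
# Minimal generation sketch; the prompt is arbitrary.
tokenizer = AutoTokenizer.from_pretrained("InclusionAI/Ling-2.6-Flash")
messages = [{"role": "user", "content": "List the 10 largest files in a git repo."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```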
- Hugging Face: huggingface.co/InclusionAI
- ModelScope: modelscope.cn/organization/AntLingAGI
- Official deployment docs: github.com/AntLingAGI/Ling
Caveats
- As a newly open-sourced model, community tooling (Ollama, vLLM adapters) may still be catching up
- SWE-Bench Verified 62 vs. DeepSeek V4's 68+: a gap remains in pure coding ability
- The 1T version has steep hardware requirements; start with Flash to evaluate fit before scaling up