Ant Group Ling-2.6 Fully Open-Sourced: Flash Activates Only 7.4B, 1T Flagship Built for "Execution-First"

Bottom Line

In late April, Ant Group (Inclusion AI / Ant Ling) open-sourced two models: Ling-2.6-Flash and Ling-2.6-1T. Both use an MoE architecture, carry the MIT license, and ship in BF16/FP8/INT4 precision variants. Compared to models of similar parameter scale, Ling’s core differentiation lies in its extremely low activation parameter count and execution-oriented design: not a benchmark-padding machine, but purpose-built for Agent workloads.

Dimension            | Ling-2.6-Flash | Ling-2.6-1T
Total Parameters     | 104B           | ~1T
Active Parameters    | 7.4B           | ~63B
Context Window       | 256K           | 256K+
License              | MIT            | MIT
SWE-Bench Verified   | 62             | 67+
BFCL-V4              | 67             | 72+
TAU2-Bench (Telecom) | 93.86          | 95+

What Happened

Ling-2.6-Flash: Ultra-Lightweight Agent Model

  • April 29: Ling-2.6-Flash weights officially open-sourced. 104B total parameters, only 7.4B activated per inference — meaning it can run on consumer-grade GPUs (single RTX 4090 with INT4 quantization).
  • Built on the Ling 2.0 architecture with a hybrid linear attention mechanism that replaces the previous GQA attention and significantly reduces inference latency (a conceptual sketch of the idea follows this list).
  • SWE-Bench Verified 62, BFCL-V4 67, TAU2-Telecom 93.86 — all hard-scenario metrics, not academic leaderboard-padding datasets.
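
The release text above doesn't spell out the mechanism. As a rough conceptual sketch, hybrid linear attention generally means interleaving kernelized linear-attention layers, which avoid the quadratic score matrix, with a smaller number of exact softmax-attention layers. Everything below (feature map, layer ratio, module names) is a hypothetical illustration, not Ling-2.6's actual architecture.

# Conceptual sketch of a hybrid attention stack: most layers use a linear
# (kernelized, O(n)) attention, a few keep standard softmax attention.
# Ratios and names are hypothetical, not Ling-2.6's real design.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    # Kernelized attention: phi(q) @ (phi(k)^T v), no n x n score matrix.
    q, k = F.elu(q) + 1, F.elu(k) + 1                       # positive feature map
    kv = torch.einsum("bnd,bne->bde", k, v)                 # running key-value summary
    z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

def softmax_attention(q, k, v):
    # Exact scaled dot-product attention, quadratic in sequence length.
    return F.scaled_dot_product_attention(q, k, v)

class HybridBlock(torch.nn.Module):
    def __init__(self, dim, use_softmax):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.attn = softmax_attention if use_softmax else linear_attention

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return x + self.attn(q, k, v)

# Hypothetical ratio: one softmax layer for every four linear layers.
layers = [HybridBlock(64, use_softmax=(i % 4 == 3)) for i in range(8)]
x = torch.randn(2, 128, 64)
for blk in layers:
    x = blk(x)
print(x.shape)  # torch.Size([2, 128, 64])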

Ling-2.6-1T: Flagship Execution Model

  • Ling-2.6-1T was released the same day as Flash, with ~1T total parameters and ~63B active parameters.
  • Core design philosophy is “Execution-First”: reducing token waste during reasoning, skipping verbose internal monologue-style thinking, outputting executable results directly.
  • Community feedback: many frontier models’ reasoning outputs are essentially wasted tokens — users pay for every internal thought, but task completion rates don’t improve proportionally. Ling-2.6-1T directly addresses this problem.

Why It Matters

1. A New Variable in the Chinese MoE Camp

Previously, the main Chinese open-source MoE models were DeepSeek V4 (1.6T/37B active) and Kimi K2.6 (~1T). Ling-2.6’s entry means:

  • Flash tier (7.4B active): fills the gap for consumer GPU-runnable Agent models in Chinese open source
  • 1T tier (63B active): comparable active parameter count to DeepSeek V4, but with a more radical design philosophy — fewer tokens consumed, same task completion rate

2. Cost Revolution for Agent Scenarios

What do Ling-2.6-Flash’s 7.4B active parameters mean in practice?

  • With a model like GPT-5.5, a single API call’s reasoning output may consume hundreds of extra tokens
  • Ling-2.6-Flash cuts per-call cost to roughly 1/10 or lower through streamlined reasoning paths (see the back-of-envelope sketch after this list)
  • For Agent workloads requiring high-frequency calls, this is the key threshold from “experimental” to “production-grade”
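
As a back-of-envelope illustration of that 1/10 figure, the snippet below compares per-call cost for a verbose-reasoning model against a lean one. All token counts and prices are hypothetical placeholders, not published pricing or measured benchmarks.

# Hypothetical per-call cost comparison for a high-frequency agent loop.
# Every number below is an illustrative placeholder, not real pricing.
def call_cost(in_tokens, out_tokens, price_in_per_1k, price_out_per_1k):
    return in_tokens / 1000 * price_in_per_1k + out_tokens / 1000 * price_out_per_1k

# Verbose model: long internal monologue billed as output tokens.
verbose = call_cost(in_tokens=800, out_tokens=1200 + 200,
                    price_in_per_1k=0.005, price_out_per_1k=0.010)
# Lean model: short reasoning path and a lower per-token price.
lean = call_cost(in_tokens=800, out_tokens=100 + 200,
                 price_in_per_1k=0.001, price_out_per_1k=0.002)
print(f"verbose=${verbose:.4f}  lean=${lean:.4f}  ratio={verbose / lean:.1f}x")
# verbose=$0.0180  lean=$0.0014  ratio=12.9x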

3. Ant’s Open-Source Strategy Shift

Ant Ling previously focused on API services (Ling Chat). This full open-source release means:

  • Shift from closed services to ecosystem building
  • MIT license (rather than Apache 2.0 or a commercial license), allowing unrestricted commercial use
  • Available on both Hugging Face and ModelScope, covering international and domestic developers

Actionable Advice

Who Should Pay Attention

  • Agent developers: Ling-2.6-Flash’s 7.4B active parameters make it ideal for low-latency Agent calls
  • Cost-sensitive teams: in high-volume API call scenarios, Flash’s cost advantage is significant
  • Consumer GPU users: the INT4 quantized version runs the 104B MoE on a single RTX 4090

How to Get Started

# Install dependencies (shell)
pip install transformers accelerate

# Load Ling-2.6-Flash from Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InclusionAI/Ling-2.6-Flash"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # spread weights across available devices
    torch_dtype="auto"      # use the precision stored in the checkpoint
)
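
For a quick smoke test, a minimal generation call looks like the sketch below. It assumes the checkpoint ships a chat template usable via transformers' apply_chat_template; check the model card for the recommended prompt format.

# Minimal inference sketch; assumes the repo provides a chat template.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
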
  • Hugging Face: huggingface.co/InclusionAI
  • ModelScope: modelscope.cn/organization/AntLingAGI
  • Official deployment docs: github.com/AntLingAGI/Ling

Caveats

  • As a newly open-sourced model, community tooling (Ollama, vLLM adapters) may still be catching up
  • SWE-Bench Verified 62 (Flash) vs DeepSeek V4’s 68+: pure coding ability still has a gap
  • The 1T version has high hardware requirements; try Flash first to evaluate whether the direction fits your workload