DeepSeek V4 Fully Compatible with Huawei Ascend: First Domestic Large Model Trained and Deployed on Domestic Chips

On April 24, DeepSeek released the V4 series of models: the flagship V4-Pro with 1.6 trillion parameters and the efficient V4-Flash with 284 billion parameters. More important than the models themselves, however, is that V4 is the first domestic large model trained on Huawei Ascend chips from the outset.

Key Metrics

| Metric | Value |
| --- | --- |
| V4-Pro Total Parameters | 1.6T (49B activated) |
| First Token Latency | 20 ms |
| Inference Compute Consumption | Only 27% of the previous generation (V3.2) |
| Ascend 950 Single-Card Throughput | 4,700 TPS (8k input) |
| FP4 Compute | Ascend 950PR reaches 1.56 PFLOPS, 2.87x that of H20 |
| Procurement Cost | Only 1/3 to 1/4 of H200 |
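The table's figures imply a few derived numbers worth spelling out. The sketch below is a back-of-envelope check using only the article's own values (not independent benchmarks); the implied H20 figure is simply the stated 1.56 PFLOPS divided by the stated 2.87x ratio.

```python
# Back-of-envelope arithmetic on the article's reported figures.
# All inputs come from the Key Metrics table above; nothing here is
# an independent measurement.

ascend_950pr_fp4_pflops = 1.56   # reported FP4 compute of the Ascend 950PR
ratio_vs_h20 = 2.87              # reported multiple over NVIDIA's H20

# FP4 compute the article implicitly attributes to the H20
implied_h20_fp4 = ascend_950pr_fp4_pflops / ratio_vs_h20

# Reported single-card throughput at 8k input, scaled to per-minute
single_card_tps = 4700
tokens_per_minute = single_card_tps * 60

print(f"Implied H20 FP4 compute: {implied_h20_fp4:.2f} PFLOPS")  # ~0.54
print(f"Tokens per minute per card: {tokens_per_minute:,}")      # 282,000
```

Combined with the claim that V4 needs only 27% of V3.2's inference compute, these numbers are the basis for the cost argument the article makes below.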

From “After-the-Fact Adaptation” to “Native First Release”

Previous domestic models were all first trained in NVIDIA’s CUDA ecosystem, then spent months migrating to the Ascend CANN framework. This time, DeepSeek V4 was trained directly on the Ascend 950, with Huawei announcing full compatibility across the entire Ascend supernode series within hours.

This means domestic computing power has gone from a “backup option” to a “primary choice.”

Leapfrog Breakthrough in Agent Capabilities

V4-Pro achieves a leapfrog improvement in Agent capabilities, with a coding experience that surpasses Sonnet 4.5 and delivery quality approaching Opus 4.6. The release also introduces a "Fast Mode" and an "Expert Mode", and begins a phased rollout of an image recognition mode.

Signal to the Industry

When the largest open-source model vendor and the largest domestic chip vendor integrate this deeply, the entire ecosystem's flywheel starts spinning. Following the announcement, domestic AI chip concept stocks surged more than 10% on the day.


Primary Sources: Toutiao, chinaz, Bilibili Ascend Livestream