Core Conclusion
On April 26, DeepSeek rolled out two pricing adjustments simultaneously: V4-series API cache-hit prices permanently cut to 1/10 of the original, and a limited-time 75% discount on V4-Pro (through May 5). Stacked together, V4-Pro cache-hit pricing drops to approximately $0.0036/M tokens, roughly 1/139 the price of GPT-5.5 and 1/83 that of Claude Sonnet 4.6. This isn’t a simple promotion; it’s a systemic restructuring of the cost economics of long-context scenarios.
Data Comparison
| Model | Cache Hit Price | vs DeepSeek V4-Pro | Notes |
|---|---|---|---|
| DeepSeek V4-Pro (discounted) | ~$0.0036/M | baseline | Cache 1/10 + 75% discount stacked |
| GPT-5.5 | ~$0.50/M | 139x more expensive | OpenAI official pricing |
| Claude Sonnet 4.6 | ~$0.30/M | 83x more expensive | Anthropic official pricing |
| DeepSeek V4-Pro (no promo) | ~$0.014/M | 3.9x more expensive | Cache 1/10 only, no 75% discount |
Key shift: Cache hits went from “nice-to-have savings” to “absolute cost dominance.” For scenarios requiring repeated calls to the same context (RAG, multi-turn Agent conversations, codebase analysis), the cost difference jumped from percentage-level to orders of magnitude.
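The stacked figures above can be reproduced with simple arithmetic. A minimal sketch; the pre-adjustment cache-hit price of ~$0.144/M tokens is inferred from the table (1/10 cut without discount ≈ $0.014/M), not an officially stated number:

```python
# Reproduce the stacked V4-Pro cache-hit price from the two adjustments.
# PRE_CUT_CACHE_PRICE is inferred from the table, not an official figure.
PRE_CUT_CACHE_PRICE = 0.144  # $/M tokens, assumed pre-adjustment baseline
CACHE_CUT = 1 / 10           # permanent: cache-hit prices drop to 1/10
PROMO_FACTOR = 0.25          # limited-time 75% off => pay 25%

after_cache_cut = PRE_CUT_CACHE_PRICE * CACHE_CUT   # ~$0.0144/M (no promo)
stacked = after_cache_cut * PROMO_FACTOR            # ~$0.0036/M (stacked)

GPT_55_CACHE = 0.50      # $/M tokens, cache hit (from the table)
CLAUDE_46_CACHE = 0.30   # $/M tokens, cache hit (from the table)

print(f"stacked V4-Pro cache hit: ${stacked:.4f}/M")
print(f"vs GPT-5.5: {GPT_55_CACHE / stacked:.0f}x")               # ~139x
print(f"vs Claude Sonnet 4.6: {CLAUDE_46_CACHE / stacked:.0f}x")  # ~83x
```

The two multipliers compound: paying 25% of 1/10 of the original price leaves 2.5% of the pre-adjustment cache-hit cost.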
What Happened
Cache Hit Pricing Slashed to 1/10
DeepSeek officially announced on April 26 that input cache-hit prices across the entire V4 model series are permanently reduced to 1/10 of the original. Unlike the V4-Pro promotion, this is not a limited-time offer.
V4-Pro 75% Discount Running Concurrently
On top of the cache price reduction, V4-Pro models simultaneously enjoy a 75% discount (through May 5). Combined, the effective cost for cache-hit scenarios is compressed to extremely low levels.
National Supercomputing Internet Free Access
China’s National Supercomputing Internet platform simultaneously announced limited-time free access to DeepSeek V4, further lowering the barrier for developer trials. Third-party aggregator ZenMux also followed with limited free testing.
Why It Matters
Million-Token Long Context: From “Possible” to “Practical”
Previously, million-token context in commercial APIs was priced high enough to cause hesitation. For 1M tokens:
- GPT-5.5: ~$500
- Claude Sonnet 4.6: ~$300
- DeepSeek V4-Pro (discounted cache hit): ~$3.60
The gap went from “several times more expensive” to “two orders of magnitude.” This means long-text scenarios previously abandoned due to cost — full codebase analysis, complete legal document processing, ultra-long meeting transcript summarization — are now economically viable.
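These totals can be sanity-checked directly from the cache-hit prices in the comparison table; they correspond to roughly 1B cached input tokens, e.g. a 1M-token context re-read 1,000 times:

```python
# Cost of re-reading a 1M-token cached context 1,000 times
# (1B cached input tokens), at each provider's cache-hit price.
CALLS = 1_000
CONTEXT_M_TOKENS = 1.0   # million tokens per call
prices = {               # $/M tokens, cache hit (from the table)
    "GPT-5.5": 0.50,
    "Claude Sonnet 4.6": 0.30,
    "DeepSeek V4-Pro (discounted)": 0.0036,
}
for model, price in prices.items():
    print(f"{model}: ${CALLS * CONTEXT_M_TOKENS * price:,.2f}")
```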
Cascading Effects for Agent Scenarios
A core pain point of Agentic AI is repeated context consumption. An Agent workflow may repeatedly read the same system prompts, tool definitions, and context documents across multiple steps. Cache hit pricing directly targets this bottleneck:
- Tool call loops: Every Agent step requires re-reading tool definitions; cache hits make these repeated tokens nearly free
- Multi-Agent collaboration: When multiple Agents share the same knowledge base context, cache reuse cost benefits multiply
- RAG pipelines: Retrieved document fragments appear repeatedly across multi-turn conversations; the higher the cache hit rate, the lower the cost
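To see how the hit rate dominates effective cost in these loops, here is a minimal estimator. The hit price comes from the article; the cache-miss price is a hypothetical placeholder for illustration, not an official figure:

```python
# Estimate per-run input cost of an agent workflow as a function of
# cache hit rate. All prices in $/M tokens.
def agent_input_cost(total_tokens_m: float, hit_rate: float,
                     hit_price: float, miss_price: float) -> float:
    """Cost of `total_tokens_m` million input tokens at a given hit rate."""
    hits = total_tokens_m * hit_rate
    misses = total_tokens_m * (1 - hit_rate)
    return hits * hit_price + misses * miss_price

HIT = 0.0036   # discounted V4-Pro cache-hit price (from the article)
MISS = 0.07    # hypothetical cache-miss price, illustration only

# A 50-step agent run that re-reads a 0.2M-token context each step:
tokens_m = 50 * 0.2
for rate in (0.0, 0.5, 0.8, 0.95):
    cost = agent_input_cost(tokens_m, rate, HIT, MISS)
    print(f"hit rate {rate:.0%}: ${cost:.3f}")
```

With these assumed prices, moving from a 0% to a 95% hit rate cuts the run's input cost by roughly an order of magnitude, which is why the >80% hit-rate target below matters.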
Price War Enters Deep Water
DeepSeek’s adjustment sends a clear signal: capture developer ecosystem through extreme cost efficiency. When cache hit prices drop to $0.0036/M token, competitors must either follow with price cuts or lose the cost-efficiency narrative. This foreshadows 2026 AI API market competition expanding from “model capability” to “usage cost” dimensions.
Landscape Assessment
| Dimension | DeepSeek | Others |
|---|---|---|
| Cache hit pricing | Industry lowest | Generally 50-140x higher |
| Long-context pricing strategy | Aggressive cuts | Conservative adjustments |
| Free trial coverage | Supercomputing platform + third-party aggregators | Independent offerings |
| Ecosystem impact | Strong developer migration incentive | Increased retention pressure |
Trend: DeepSeek is using a cost-domination strategy to capture the default-choice position for mid-to-long-context scenarios. If competitors don’t follow, developers in budget-sensitive scenarios will naturally flow to DeepSeek.
Action Recommendations
What You Can Do Now
- Switch cache-hit scenarios to DeepSeek V4-Pro: If your application has heavy repeated context (RAG, Agent tool calls), test migration immediately. Cost advantage is greatest during the discount period
- Optimize cache hit rates: Ensure system prompts, tool definitions, and other fixed parts remain stable to maximize cache hit ratios. Target >80% hit rate
- Use free periods for stress testing: Leverage National Supercomputing Internet or ZenMux limited free quotas to conduct large-scale V4 testing and evaluate whether model quality meets your needs
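Provider-side context caches generally match on a stable token prefix, so request structure determines the hit rate. A sketch of keeping fixed content first and volatile content last, assuming an OpenAI-compatible chat payload (the field layout is the common convention, not a specific SDK call):

```python
# Keep fixed content as a byte-identical prefix so prefix-based caches
# can hit; put per-request content last. Anti-pattern: embedding a
# timestamp or request ID in the system prompt breaks the prefix on
# every call.
SYSTEM_PROMPT = "You are a code-analysis assistant."  # fixed, verbatim

def build_messages(codebase_digest: str, user_query: str) -> list[dict]:
    return [
        # Stable prefix: identical on every call -> cacheable.
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Codebase context:\n{codebase_digest}"},
        # Volatile suffix: changes per request -> only this part misses.
        {"role": "user", "content": user_query},
    ]
```

The same principle applies to tool definitions: serialize them in a fixed order between calls, since any reordering changes the prefix and forfeits the hit.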
What to Watch
- Post-May 5 V4-Pro discount continuation: The 75% discount is time-limited; costs will rebound ~4x after expiry (but remain significantly below competitors)
- Competitor response speed: Watch whether OpenAI and Anthropic respond to cache pricing changes
- Cache mechanism reliability: DeepSeek cache consistency and hit rates need verification across different scenarios
Risk Notes
- The muted US stock market reaction post-launch and below-expectation domestic user growth suggest the price cut may be a defensive response to slowing adoption
- Extreme low pricing may come with tradeoffs in service quality or availability; thorough testing required for production
- Cache hits depend on request consistency; dynamically changing context scenarios cannot capture the full cost advantage