DeepSeek Slashes API Cache Prices to 1/10: V4 Series Cuts Make Million-Token Context Truly Usable

Core Conclusion

On April 26, DeepSeek rolled out two pricing adjustments simultaneously: V4 series API cache-hit prices permanently cut to 1/10 of the original, and a limited-time 75% discount on V4-Pro (through May 5). Stacked, V4-Pro cache-hit pricing drops to approximately $0.0036 per million tokens: 139x cheaper than GPT-5.5 and 83x cheaper than Claude Sonnet 4.6. This isn't a simple promotion; it's a systemic restructuring of the cost economics of long-context scenarios.

Data Comparison

| Model | Cache Hit Price | vs DeepSeek V4-Pro | Notes |
| --- | --- | --- | --- |
| DeepSeek V4-Pro (discounted) | ~$0.0036/M | baseline | Cache 1/10 + 75% discount stacked |
| GPT-5.5 | ~$0.50/M | 139x more expensive | OpenAI official pricing |
| Claude Sonnet 4.6 | ~$0.30/M | 83x more expensive | Anthropic official pricing |
| DeepSeek V4-Pro (original) | ~$0.014/M | 3.9x more expensive | Cache 1/10 only, no discount |

Key shift: Cache hits went from “nice-to-have savings” to “absolute cost dominance.” For scenarios requiring repeated calls to the same context (RAG, multi-turn Agent conversations, codebase analysis), the cost difference jumped from percentage-level to orders of magnitude.

What Happened

Cache Hit Pricing Slashed to 1/10

DeepSeek officially announced on April 26 that input cache-hit prices across the entire V4 model series are permanently reduced to 1/10 of the original price. This is not a limited-time offer; it is a permanent pricing change.

V4-Pro 75% Discount Running Concurrently

On top of the cache price reduction, V4-Pro models simultaneously enjoy a 75% discount (through May 5). Combined, the effective cost for cache-hit scenarios is compressed to extremely low levels.

National Supercomputing Internet Free Access

China’s National Supercomputing Internet platform simultaneously announced limited-time free access to DeepSeek V4, further lowering the barrier for developer trials. Third-party aggregator ZenMux also followed with limited free testing.

Why It Matters

Million-Token Long Context: From “Possible” to “Practical”

Previously, million-token context in commercial APIs was priced high enough to cause hesitation. For 1,000 calls that each hit a cached 1M-token context:

  • GPT-5.5: ~$500
  • Claude Sonnet 4.6: ~$300
  • DeepSeek V4-Pro (discounted cache hit): ~$3.60

The gap went from “several times more expensive” to “two orders of magnitude.” This means long-text scenarios previously abandoned due to cost — full codebase analysis, complete legal document processing, ultra-long meeting transcript summarization — are now economically viable.
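The figures above follow directly from the per-million-token cache-hit prices quoted in the comparison table. A minimal sketch of the arithmetic (prices as quoted in this article; actual billing may differ):

```python
# Cache-hit prices per 1M input tokens, as quoted above (USD).
CACHE_HIT_PRICE_PER_M = {
    "GPT-5.5": 0.50,
    "Claude Sonnet 4.6": 0.30,
    "DeepSeek V4-Pro (discounted)": 0.0036,
}

def repeated_context_cost(price_per_m: float, context_tokens: int, calls: int) -> float:
    """USD cost of re-reading the same cached context on every call."""
    return price_per_m * (context_tokens / 1_000_000) * calls

for model, price in CACHE_HIT_PRICE_PER_M.items():
    cost = repeated_context_cost(price, context_tokens=1_000_000, calls=1_000)
    print(f"{model}: ${cost:,.2f}")
# GPT-5.5: $500.00
# Claude Sonnet 4.6: $300.00
# DeepSeek V4-Pro (discounted): $3.60
```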

Cascading Effects for Agent Scenarios

A core pain point of Agentic AI is repeated context consumption. An Agent workflow may repeatedly read the same system prompts, tool definitions, and context documents across multiple steps. Cache hit pricing directly targets this bottleneck:

  • Tool call loops: Every Agent step requires re-reading tool definitions; cache hits make these repeated tokens nearly free
  • Multi-Agent collaboration: When multiple Agents share the same knowledge base context, cache reuse cost benefits multiply
  • RAG pipelines: Retrieved document fragments appear repeatedly across multi-turn conversations; the higher the cache hit rate, the lower the cost
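Provider-side context caches typically match on an exact, stable prompt prefix (this sketch assumes that behavior; check DeepSeek's caching documentation for the precise rules). A minimal Python sketch of cache-friendly Agent prompt construction, with hypothetical tool names:

```python
import json

# Keep the expensive, repeated parts of an Agent prompt byte-identical
# across steps so a prefix-based cache can hit on every call.
SYSTEM_PROMPT = "You are a coding agent. Follow the tool protocol strictly."
TOOL_DEFS = [  # hypothetical tool definitions for illustration
    {"name": "read_file", "parameters": {"path": "string"}},
    {"name": "run_tests", "parameters": {"target": "string"}},
]
# Serialize once, deterministically (sorted keys), so every request
# shares exactly the same prefix bytes.
TOOLS_BLOCK = json.dumps(TOOL_DEFS, sort_keys=True)

def build_messages(history: list[dict], user_turn: str) -> list[dict]:
    """Stable prefix first (cacheable), volatile content last (cache miss)."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT + "\n" + TOOLS_BLOCK}]
        + history  # grows append-only, so earlier turns remain a cached prefix
        + [{"role": "user", "content": user_turn}]
    )
```

The key design choice is ordering: anything that varies per step (timestamps, retrieved snippets, the latest user turn) goes after the stable prefix, not inside it, or it invalidates the cache for everything that follows.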

Price War Enters Deep Water

DeepSeek’s adjustment sends a clear signal: capture developer ecosystem through extreme cost efficiency. When cache hit prices drop to $0.0036/M token, competitors must either follow with price cuts or lose the cost-efficiency narrative. This foreshadows 2026 AI API market competition expanding from “model capability” to “usage cost” dimensions.

Landscape Assessment

| Dimension | DeepSeek | Others |
| --- | --- | --- |
| Cache hit pricing | Industry lowest | Generally 50-140x higher |
| Long-context pricing strategy | Aggressive cuts | Conservative adjustments |
| Free trial coverage | Supercomputing platform + third-party aggregators | Independent offerings |
| Ecosystem impact | Strong developer migration incentive | Increased retention pressure |

Trend: DeepSeek is using a cost-crushing strategy to compete for the position of default choice in mid-to-long context scenarios. If competitors don't follow, developers in budget-sensitive scenarios will naturally flow to DeepSeek.

Action Recommendations

What You Can Do Now

  1. Switch cache-hit scenarios to DeepSeek V4-Pro: If your application has heavy repeated context (RAG, Agent tool calls), test migration immediately. Cost advantage is greatest during the discount period
  2. Optimize cache hit rates: Ensure system prompts, tool definitions, and other fixed parts remain stable to maximize cache hit ratios. Target >80% hit rate
  3. Use free periods for stress testing: Leverage National Supercomputing Internet or ZenMux limited free quotas to conduct large-scale V4 testing and evaluate whether model quality meets your needs
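To see why the >80% hit-rate target in step 2 matters, the effective input cost can be estimated as a blend of hit and miss prices. A sketch (the hit price is the discounted figure quoted above; the miss price is a hypothetical placeholder assumed at 10x the hit price, since this article quotes only the hit price):

```python
def blended_cost_per_m(hit_price: float, miss_price: float, hit_rate: float) -> float:
    """Effective USD cost per 1M input tokens at a given cache hit rate."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

HIT = 0.0036   # discounted cache-hit price quoted in this article ($/M)
MISS = 0.036   # hypothetical cache-miss price, assumed 10x the hit price

for rate in (0.0, 0.5, 0.8, 0.95):
    print(f"hit rate {rate:.0%}: ${blended_cost_per_m(HIT, MISS, rate):.5f}/M")
```

Under these assumed prices, moving from a 50% to a 95% hit rate cuts the effective per-token cost by roughly 4x, which is why keeping fixed prompt parts stable is worth engineering effort.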

What to Watch

  • Post-May 5 V4-Pro discount continuation: The 75% discount is time-limited; costs will rebound ~4x after expiry (but remain significantly below competitors)
  • Competitor response speed: Watch whether OpenAI and Anthropic respond to cache pricing changes
  • Cache mechanism reliability: DeepSeek cache consistency and hit rates need verification across different scenarios

Risk Notes

  • The muted US stock-market reaction to the launch and domestic user growth below expectations suggest the price cut may be a response to sluggish user growth rather than a position of strength
  • Extreme low pricing may come with tradeoffs in service quality or availability; thorough testing required for production
  • Cache hits depend on request consistency; dynamically changing context scenarios cannot capture the full cost advantage