DeepSeek Slashes API Cache Prices to 1/10: V4 Series Cuts Make Million-Token Context Truly Usable

Core Conclusion

On April 26, DeepSeek rolled out two pricing adjustments simultaneously: V4 series API cache-hit prices permanently cut to 1/10 of the original, and a limited-time 75% discount on V4-Pro (through May 5). Stacked, V4-Pro cache-hit pricing drops to approximately $0.0036 per million tokens: 139x cheaper than GPT-5.5 and 83x cheaper than Claude Sonnet 4.6. This isn't a simple promotion; it's a systemic restructuring of the cost economics of long-context scenarios.

Data Comparison

| Model | Cache Hit Price | vs DeepSeek V4-Pro | Notes |
| --- | --- | --- | --- |
| DeepSeek V4-Pro (discounted) | ~$0.0036/M | baseline | Cache 1/10 + 75% discount stacked |
| GPT-5.5 | ~$0.50/M | 139x more expensive | OpenAI official pricing |
| Claude Sonnet 4.6 | ~$0.30/M | 83x more expensive | Anthropic official pricing |
| DeepSeek V4-Pro (original) | ~$0.014/M | 3.9x more expensive | Cache 1/10 only, no discount |

Key shift: Cache hits went from “nice-to-have savings” to “absolute cost dominance.” For scenarios requiring repeated calls to the same context (RAG, multi-turn Agent conversations, codebase analysis), the cost difference jumped from percentage-level to orders of magnitude.

What Happened

Cache Hit Pricing Slashed to 1/10

DeepSeek officially announced on April 26 that input cache-hit prices across the entire V4 model series are permanently reduced to 1/10 of the original price. This is not a limited-time offer; it is a permanent pricing change.

V4-Pro 75% Discount Running Concurrently

On top of the cache price reduction, V4-Pro models simultaneously enjoy a 75% discount (through May 5). Combined, the effective cost for cache-hit scenarios is compressed to extremely low levels.

National Supercomputing Internet Free Access

China’s National Supercomputing Internet platform simultaneously announced limited-time free access to DeepSeek V4, further lowering the barrier for developer trials. Third-party aggregator ZenMux also followed with limited free testing.

Why It Matters

Million-Token Long Context: From “Possible” to “Practical”

Previously, million-token context in commercial APIs was priced high enough to cause hesitation. For 1,000 calls that each hit a cached 1M-token context:

  • GPT-5.5: ~$500
  • Claude Sonnet 4.6: ~$300
  • DeepSeek V4-Pro (discounted cache hit): ~$3.60

The gap went from “several times more expensive” to “two orders of magnitude.” This means long-text scenarios previously abandoned due to cost — full codebase analysis, complete legal document processing, ultra-long meeting transcript summarization — are now economically viable.
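The figures above follow directly from the per-million-token cache-hit prices quoted in the comparison table. A minimal sketch of the arithmetic (prices as quoted in this article; actual billing may differ):

```python
# Cache-hit prices per 1M input tokens, as quoted above (USD).
CACHE_HIT_PRICE_PER_M = {
    "GPT-5.5": 0.50,
    "Claude Sonnet 4.6": 0.30,
    "DeepSeek V4-Pro (discounted)": 0.0036,
}

def repeated_context_cost(price_per_m: float, context_tokens: int, calls: int) -> float:
    """USD cost of re-reading the same cached context on every call."""
    return price_per_m * (context_tokens / 1_000_000) * calls

for model, price in CACHE_HIT_PRICE_PER_M.items():
    cost = repeated_context_cost(price, context_tokens=1_000_000, calls=1_000)
    print(f"{model}: ${cost:,.2f}")
# GPT-5.5: $500.00
# Claude Sonnet 4.6: $300.00
# DeepSeek V4-Pro (discounted): $3.60
```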

Cascading Effects for Agent Scenarios

A core pain point of Agentic AI is repeated context consumption. An Agent workflow may repeatedly read the same system prompts, tool definitions, and context documents across multiple steps. Cache hit pricing directly targets this bottleneck:

  • Tool call loops: Every Agent step requires re-reading tool definitions; cache hits make these repeated tokens nearly free
  • Multi-Agent collaboration: When multiple Agents share the same knowledge base context, cache reuse cost benefits multiply
  • RAG pipelines: Retrieved document fragments appear repeatedly across multi-turn conversations; the higher the cache hit rate, the lower the cost
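Provider-side context caches typically match on an exact, stable prompt prefix (this sketch assumes that behavior; check DeepSeek's caching documentation for the precise rules). A minimal Python sketch of cache-friendly Agent prompt construction, with hypothetical tool names:

```python
import json

# Keep the expensive, repeated parts of an Agent prompt byte-identical
# across steps so a prefix-based cache can hit on every call.
SYSTEM_PROMPT = "You are a coding agent. Follow the tool protocol strictly."
TOOL_DEFS = [  # hypothetical tool definitions for illustration
    {"name": "read_file", "parameters": {"path": "string"}},
    {"name": "run_tests", "parameters": {"target": "string"}},
]
# Serialize once, deterministically (sorted keys), so every request
# shares exactly the same prefix bytes.
TOOLS_BLOCK = json.dumps(TOOL_DEFS, sort_keys=True)

def build_messages(history: list[dict], user_turn: str) -> list[dict]:
    """Stable prefix first (cacheable), volatile content last (cache miss)."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT + "\n" + TOOLS_BLOCK}]
        + history  # grows append-only, so earlier turns remain a cached prefix
        + [{"role": "user", "content": user_turn}]
    )
```

The key design choice is ordering: anything that varies per step (timestamps, retrieved snippets, the latest user turn) goes after the stable prefix, not inside it, or it invalidates the cache for everything that follows.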

Price War Enters Deep Water

DeepSeek’s adjustment sends a clear signal: capture developer ecosystem through extreme cost efficiency. When cache hit prices drop to $0.0036/M token, competitors must either follow with price cuts or lose the cost-efficiency narrative. This foreshadows 2026 AI API market competition expanding from “model capability” to “usage cost” dimensions.

Landscape Assessment

| Dimension | DeepSeek | Others |
| --- | --- | --- |
| Cache hit pricing | Industry lowest | Generally 50-140x higher |
| Long-context pricing strategy | Aggressive cuts | Conservative adjustments |
| Free trial coverage | Supercomputing platform + third-party aggregators | Independent offerings |
| Ecosystem impact | Strong developer migration incentive | Increased retention pressure |

Trend: DeepSeek is using a cost-crushing strategy to compete for the position of default choice in mid-to-long context scenarios. If competitors don't follow, developers in budget-sensitive scenarios will naturally flow to DeepSeek.

Action Recommendations

What You Can Do Now

  1. Switch cache-hit scenarios to DeepSeek V4-Pro: If your application has heavy repeated context (RAG, Agent tool calls), test migration immediately. Cost advantage is greatest during the discount period
  2. Optimize cache hit rates: Ensure system prompts, tool definitions, and other fixed parts remain stable to maximize cache hit ratios. Target >80% hit rate
  3. Use free periods for stress testing: Leverage National Supercomputing Internet or ZenMux limited free quotas to conduct large-scale V4 testing and evaluate whether model quality meets your needs
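To see why the >80% hit-rate target in step 2 matters, the effective input cost can be estimated as a blend of hit and miss prices. A sketch (the hit price is the discounted figure quoted above; the miss price is a hypothetical placeholder assumed at 10x the hit price, since this article quotes only the hit price):

```python
def blended_cost_per_m(hit_price: float, miss_price: float, hit_rate: float) -> float:
    """Effective USD cost per 1M input tokens at a given cache hit rate."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

HIT = 0.0036   # discounted cache-hit price quoted in this article ($/M)
MISS = 0.036   # hypothetical cache-miss price, assumed 10x the hit price

for rate in (0.0, 0.5, 0.8, 0.95):
    print(f"hit rate {rate:.0%}: ${blended_cost_per_m(HIT, MISS, rate):.5f}/M")
```

Under these assumed prices, moving from a 50% to a 95% hit rate cuts the effective per-token cost by roughly 4x, which is why keeping fixed prompt parts stable is worth engineering effort.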

What to Watch

  • Post-May 5 V4-Pro discount continuation: The 75% discount is time-limited; costs will rebound ~4x after expiry (but remain significantly below competitors)
  • Competitor response speed: Watch whether OpenAI and Anthropic respond to cache pricing changes
  • Cache mechanism reliability: DeepSeek cache consistency and hit rates need verification across different scenarios

Risk Notes

  • The muted US stock-market reaction to the launch and domestic user growth below expectations suggest the price cut may be a response to sluggish user growth rather than a position of strength
  • Extreme low pricing may come with tradeoffs in service quality or availability; thorough testing required for production
  • Cache hits depend on request consistency; dynamically changing context scenarios cannot capture the full cost advantage