DeepSeek API Input Cache Pricing Drops to 1/10: Model Price War Enters New Phase

The price war for AI model APIs has moved into its second phase — shifting from competing on base inference pricing to competing on your actual monthly bill.

On April 26, DeepSeek announced a massive reduction in its API input cache hit pricing: down to 1/10th of the original price across the entire product line. The change is effective immediately. At the same time, the previously announced 75% discount on DeepSeek-V4-Pro remains active through May 5.

The implication is straightforward: if your application has repetitive system prompts or fixed instruction templates, the cost per cache-hit call is now nearly negligible.

How Input Cache Saves Money

DeepSeek’s input cache mechanism reuses intermediate computation results (the KV Cache) when identical input prefixes appear in subsequent calls, skipping redundant forward passes. Previously, cache hits were cheaper than full inference but still represented a meaningful cost. At 1/10th of the original price, the cache-hit portion of a call’s input cost becomes nearly negligible.
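To make the effect concrete, here is a back-of-envelope sketch of per-call input cost under prefix caching. The prices and token counts are hypothetical placeholders for illustration, not DeepSeek’s actual rates; only the 1/10th cache ratio comes from the announcement.

```python
def input_cost(prefix_tokens: int, variable_tokens: int, cache_hit: bool,
               base_price_per_mtok: float = 1.0,  # placeholder $/MTok
               cache_ratio: float = 0.1) -> float:
    """Dollar cost of one call's input tokens.

    On a cache hit, the shared prefix is billed at cache_ratio
    (now 1/10th) of the base price; the variable tail is billed in full.
    """
    prefix_price = base_price_per_mtok * (cache_ratio if cache_hit else 1.0)
    return (prefix_tokens * prefix_price
            + variable_tokens * base_price_per_mtok) / 1_000_000

# A 4,000-token fixed system prompt plus a 200-token user message:
miss = input_cost(4_000, 200, cache_hit=False)  # full price for everything
hit = input_cost(4_000, 200, cache_hit=True)    # prefix billed at 1/10th
```

With a prompt dominated by its fixed prefix, the cache-hit call above costs roughly a seventh of the cache-miss call, which is where the “nearly negligible” framing comes from.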

Key numbers:

  • Cache hit price: 1/10th of original, covering the full series
  • V4-Pro discount: 75% OFF, valid through May 5
  • No migration needed: existing calls automatically benefit

Phase Two of the Price War

AI model API pricing has gone through two waves.

The first wave was the rapid descent of base inference prices — vendors went from GPT-4 levels of $30/MTok down to $1-3/MTok across the board. When DeepSeek V3 launched, it pushed prices to a point that made competitors uncomfortable.

The second wave targets “actual spend.” Base prices are already low enough that further cuts have diminishing returns. So vendors started looking at cache hits, batch processing, and context reuse to push down the developer’s real bill. DeepSeek’s 1/10th cache pricing is a landmark moment in this phase — it’s not competing on model capability, it’s competing on the developer’s cost of use.

For other vendors, the pressure to follow is mounting. If a developer makes 100K calls per day and 80K hit the cache, DeepSeek’s cost could be a fraction of what they’d pay elsewhere. Price-sensitive AI applications — especially Agent systems that repeatedly send the same system prompt — will naturally gravitate toward the cheaper API.
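The 100K-calls-per-day scenario above can be sketched numerically. The per-call token count and base price below are illustrative assumptions; the hit/miss split (80K of 100K) comes from the text.

```python
# Back-of-envelope for the 100K-calls/day scenario.
CALLS_PER_DAY = 100_000
CACHE_HITS = 80_000
PREFIX_TOKENS = 2_000   # shared system prompt size (assumed)
BASE_PRICE = 1.0        # $/MTok, placeholder rate
CACHE_RATIO = 0.1       # the announced 1/10th cache-hit pricing

def daily_prefix_cost(ratio_for_hits: float) -> float:
    """Dollar cost per day for the shared-prefix tokens only."""
    hit_cost = CACHE_HITS * PREFIX_TOKENS * BASE_PRICE * ratio_for_hits
    miss_cost = (CALLS_PER_DAY - CACHE_HITS) * PREFIX_TOKENS * BASE_PRICE
    return (hit_cost + miss_cost) / 1_000_000

without_discount = daily_prefix_cost(1.0)          # every call billed in full
with_discount = daily_prefix_cost(CACHE_RATIO)
# with_discount / without_discount == 0.28: the prefix bill drops by 72%
```

Under these assumptions, an 80% hit rate cuts the prefix portion of the daily bill by about 72%, and the savings grow with hit rate and prefix length.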

What This Means for Your Application

If your application has any of these patterns, the cache price drop will have a noticeable impact:

  • RAG systems: Knowledge base segments appear as fixed prefixes in every query
  • Agent multi-turn conversations: System prompts are re-sent with every turn
  • Batch processing: Large volumes of similarly structured inputs processed in the same way

In these cases, check your cache hit rate in the DeepSeek dashboard; the higher the hit rate, the bigger the savings. And the V4-Pro 75% discount window is still open (through May 5), making it a good time to run high-cost development and testing workloads.

If your prompts vary significantly per request (e.g., mostly free-form user input), cache hit rates may be lower and the impact will be limited. One approach: structure requests so the fixed prefixes (system instructions, tool definitions) come first and the variable content comes last, so it doesn’t invalidate the shared prefix.
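The structuring advice above can be sketched as follows. The message shape follows the common chat-completions convention; the field names and prompt contents are illustrative, not a specific SDK’s types.

```python
# Keep the fixed prefix (system prompt, tool definitions) first so that
# identical leading tokens across requests can hit the prefix cache.
SYSTEM_PROMPT = "You are a support agent. Follow the policy below."
TOOL_DEFS = '{"tools": []}'  # fixed tool schema, placeholder content

def build_messages(user_input: str) -> list[dict]:
    return [
        # Fixed, byte-identical prefix across all requests -> cacheable
        {"role": "system", "content": SYSTEM_PROMPT + "\n" + TOOL_DEFS},
        # Variable part goes last so it doesn't break the shared prefix
        {"role": "user", "content": user_input},
    ]
```

A corollary: avoid injecting per-request values (timestamps, user IDs) into the system prompt, since even one changed byte early in the input invalidates the shared prefix for caching purposes.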

Primary Sources