When selecting AI models, many teams compare listed API prices per million tokens. But Stanford CRFM’s latest research reveals a serious flaw in that approach: a model with a cheaper list price can actually cost dozens of times more to run.
## Stanford’s 28x Reversal
The research team found:
- Gemini 3 Flash listed price: 1.7x cheaper than Claude Haiku 4.5
- Gemini 3 Flash actual cost (same task): 28x more expensive than Claude Haiku 4.5
Two core reasons:
- Token efficiency differences: some models need more rounds and longer outputs to work through complex questions, so they consume far more tokens per task
- Task completion rate: if a model can’t answer correctly on the first try, retry costs accumulate rapidly
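The interaction of these two factors can be sketched with simple arithmetic. The function and all numbers below are hypothetical illustrations, not figures from the study: under independent retries with success probability p, the expected number of attempts per solved task is 1/p, so effective cost is (price × tokens per attempt) / p.

```python
def expected_cost_per_solved_task(price_per_mtok: float,
                                  tokens_per_attempt: float,
                                  success_rate: float) -> float:
    """Expected dollar cost to get one correct answer.

    Assumes independent retries, so expected attempts = 1 / success_rate
    (geometric distribution). All inputs here are illustrative.
    """
    cost_per_attempt = price_per_mtok * tokens_per_attempt / 1_000_000
    return cost_per_attempt / success_rate

# A model with a cheap list price but verbose outputs and a low success rate...
cheap_listed = expected_cost_per_solved_task(
    price_per_mtok=0.10, tokens_per_attempt=50_000, success_rate=0.30)

# ...versus a pricier-per-token model that answers tersely and reliably.
pricier_listed = expected_cost_per_solved_task(
    price_per_mtok=0.80, tokens_per_attempt=4_000, success_rate=0.90)

print(f"cheap list price:   ${cheap_listed:.4f} per solved task")
print(f"pricier list price: ${pricier_listed:.4f} per solved task")
```

With these made-up inputs, the model that is 8x cheaper per token ends up several times more expensive per solved task, which is the same reversal mechanism the study describes at larger scale.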
The team estimates that about 20% of pairwise model cost rankings reverse depending on which benchmark is used.
## Artificial Analysis Index Data
The latest cost data, as of April 25:
| Model | Total Evaluation Cost |
|---|---|
| Claude Opus 4.7 | $4,811 |
| Sonnet 4.6 | $3,959 |
| GPT-5.5 (xhigh) | $3,357 |
| GPT-5.5 (high) | $2,159 |
| GPT-5.5 (medium) | $1,199 |
| DeepSeek V4 Pro | $1,071 |