AI Model Real Cost Study: Cheap Listed Price Does Not Mean Cheap in Practice

When selecting AI models, many teams compare API per-million-token prices. But Stanford CRFM’s latest research reveals a serious flaw in that approach: models with cheap listed prices can actually cost dozens of times more to run.

Stanford’s 28x Reversal

The research team found:

  • Gemini 3 Flash’s listed price is 1.7x lower than Claude Haiku 4.5’s
  • On the same task, Gemini 3 Flash’s actual cost was 28x higher than Claude Haiku 4.5’s

Two core reasons:

  1. Token efficiency differences: Some models require more rounds and longer outputs for complex questions
  2. Task completion rate: If a model can’t answer correctly in one try, retry costs accumulate rapidly
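The two effects above compound: a verbose model with a low completion rate pays both per attempt and per retry. A minimal sketch of the arithmetic, assuming retries until success follow a geometric distribution (all prices, token counts, and success rates below are hypothetical illustrations, not the Stanford figures):

```python
# Expected cost per successfully completed task.
# Assumption: each attempt succeeds independently with probability p,
# so the expected number of attempts is 1/p (geometric distribution).

def cost_per_attempt(input_tokens: int, output_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Dollar cost of one attempt; prices are $ per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def expected_cost_per_success(attempt_cost: float, success_rate: float) -> float:
    """Expected attempts until success = 1/p, so cost scales by 1/p."""
    return attempt_cost / success_rate

# "Cheap" model: low listed price, but verbose output and low success rate.
cheap = expected_cost_per_success(
    cost_per_attempt(2_000, 8_000, in_price=0.10, out_price=0.40),
    success_rate=0.25)

# "Pricey" model: higher listed price, but concise and reliable.
pricey = expected_cost_per_success(
    cost_per_attempt(2_000, 1_000, in_price=0.80, out_price=4.00),
    success_rate=0.95)

print(f"cheap model:  ${cheap:.4f} per solved task")
print(f"pricey model: ${pricey:.4f} per solved task")
```

With these made-up numbers the nominally cheap model ends up costing more per solved task than the model with the higher listed price, which is exactly the kind of reversal the study describes.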

The team estimates about 20% of model cost rankings reverse across different benchmarks.

Artificial Analysis Index Data

Latest cost data from April 25:

Model               Total Evaluation Cost
Claude Opus 4.7     $4,811
Sonnet 4.6          $3,959
GPT-5.5 (xhigh)     $3,357
GPT-5.5 (high)      $2,159
GPT-5.5 (medium)    $1,199
DeepSeek V4 Pro     $1,071

Key Takeaways

1. Test with actual workloads, not listed prices

2. Focus on “blended cost” rather than a single listed price

3. Watch for early cost traps when new models launch
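A “blended cost” weights input and output token prices by the workload’s actual token mix, rather than quoting either price in isolation. A minimal sketch, with hypothetical prices and token counts:

```python
# Blended $/M-token price for a workload, weighting the input and
# output prices by the observed token mix. All numbers are hypothetical.

def blended_price(in_price: float, out_price: float,
                  input_tokens: int, output_tokens: int) -> float:
    """Effective $ per million tokens for a given token mix."""
    total = input_tokens + output_tokens
    return (input_tokens * in_price + output_tokens * out_price) / total

# Same listed prices, very different blended costs per workload:
chat_heavy = blended_price(0.50, 2.50,
                           input_tokens=9_000_000, output_tokens=1_000_000)
gen_heavy = blended_price(0.50, 2.50,
                          input_tokens=2_000_000, output_tokens=8_000_000)

print(f"chat-heavy workload: ${chat_heavy:.2f}/M tokens")
print(f"generation-heavy workload: ${gen_heavy:.2f}/M tokens")
```

The point of the sketch: two teams looking at the same price sheet can face very different effective prices, which is why measuring against your own workload mix matters.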
