OpenAI released GPT-5.5 on April 23, its first major version upgrade since GPT-4.5. The new model scored 82.7% on Terminal-Bench 2.0, pulling ahead of Claude Opus 4.7 (69.4%) and Gemini 3.1 Pro (68.5%). This benchmark measures long tasks in command-line scenarios requiring planning, iteration, and tool coordination—precisely the area Anthropic highlighted in Opus 4.7’s launch. GPT-5.5 improved 7.6 percentage points over GPT-5.4 (75.1%).
The pricing signal is equally noteworthy. GPT-5.5’s API pricing is $5.00/M input and $30.00/M output, a significant jump from GPT-5.4 ($3.50/$18.00). Claude Opus 4.7 is priced at $5.00/$25.00, while DeepSeek V4 Pro is just $2.20/$3.48. GPT-5.5 is now the most expensive frontier model.
Key Data Comparison
| Model | Input ($/M) | Output ($/M) | Terminal-Bench 2.0 | Context Window |
|---|---|---|---|---|
| GPT-5.5 | 5.00 | 30.00 | 82.7% (SOTA) | 200K |
| Claude Opus 4.7 | 5.00 | 25.00 | 69.4% | 200K |
| Gemini 3.1 Pro | 3.50 | 15.00 | 68.5% | 1M |
| DeepSeek V4 Pro | 2.20 | 3.48 | Not disclosed | 1M |
From GPT-5.0 to GPT-5.5, OpenAI’s pricing followed a steep curve: input price rose from $0.625 to $5.00 (8x), output from $5.00 to $30.00 (6x). Community feedback suggests GPT-5.5 also consumes more tokens per task, making actual costs even higher than advertised.
Landscape Assessment
GPT-5.5’s strategy is clear: use capability advantages to offset price disadvantages. The lead on Terminal-Bench 2.0 (13 percentage points over second place) is significant, but it’s concentrated in code and terminal tasks. Community feedback on general conversation and Chinese writing is more mixed.
Meanwhile, DeepSeek V4 Pro delivers near-comparable performance at less than one-ninth the price of GPT-5.5, moving from “budget alternative” to “primary choice.” As one developer put it: “DeepSeek’s pricing is doing to enterprise software margins what Costco did.” If this trend continues, price-sensitive users migrating to open-source models could accelerate.
Action Items
- Heavy terminal/code users: GPT-5.5’s Terminal-Bench advantage is real. Worth trying if your workflow relies heavily on CLI tools. Watch for unexpected token consumption.
- General conversation and long text: Gemini 3.1 Pro’s 1M context and lower price remain the better value proposition.
- Cost-sensitive scenarios: DeepSeek V4 Pro’s API pricing is low enough to replace GPT-5.5 in most non-frontier tasks.
- Watch for Codex quota changes: Community speculation suggests OpenAI may reduce Codex subscription GPT call quotas in June. Plan usage accordingly.