Conclusion
GPT-5.5 Pro scores 159 on the Epoch Capabilities Index (ECI), a composite metric from Epoch AI that aggregates 37 benchmarks into a single score and weights harder benchmarks more heavily. The result is a generational leap over the previous high score, held by GPT-5.4 Pro.
In practical terms, GPT-5.5 achieves 36% on MLE-Bench (machine learning engineering, vs. 23% for GPT-5.4) and 78.7% on OSWorld (computer operation tasks), surpassing Claude Opus 4.7 on the latter. Its 73% success rate on 20-hour software engineering tasks makes it the strongest coding model currently available.
Test Dimensions
ECI Composite Index Explained
ECI’s core advantage is that it is hard to game by piling up wins on easy benchmarks. Because the weighting tilts toward harder tasks, a score of 159 reflects genuine improvement on challenging tasks rather than benchmark overfitting.
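To see why difficulty weighting blunts gains on easy benchmarks, here is a toy sketch of a difficulty-weighted average. This is not Epoch AI's actual ECI methodology, which is not described here; the function name `weighted_composite`, the weights, and the scores are all hypothetical and chosen only to illustrate the idea.

```python
def weighted_composite(results):
    """Difficulty-weighted mean of benchmark scores.

    results: list of (score, difficulty_weight) pairs, score in [0, 1].
    Illustrative only; NOT the real ECI formula.
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(score * weight for score, weight in results) / total_weight


# Hypothetical models, each evaluated on one easy benchmark (weight 1.0)
# and one hard benchmark (weight 4.0).
easy_specialist = [(1.0, 1.0), (0.2, 4.0)]  # saturates the easy benchmark
hard_improver = [(0.8, 1.0), (0.4, 4.0)]    # trades easy points for hard ones

# Both models have the same unweighted mean (0.6), but the weighted
# composite favors the model that improved on the hard benchmark.
print(weighted_composite(easy_specialist))
print(weighted_composite(hard_improver))
```

Under these assumed weights, saturating the easy benchmark yields a lower composite than a smaller gain on the hard one, which is the property the paragraph above attributes to ECI.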
FrontierMath (frontier mathematical reasoning) is a key component of ECI. GPT-5.5 Pro demonstrated unprecedented reasoning ability on this benchmark, handling unsolved or extremely difficult research-level math problems.
Coding and Agent Capabilities
| Benchmark | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 |
|---|---|---|---|
| MLE-Bench | 36% | 23% | - |
| OSWorld | 78.7% | - | Below 78.7% |
| CyberGym | 81.8% | - | - |
| SWE-bench (20h) | 73% | - | - |
GPT-5.5 matches GPT-5.4’s per-token latency while using significantly fewer tokens to complete the same Codex tasks. API pricing is $5 per million input tokens and $30 per million output tokens, with a 1-million-token context window.
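The pricing above translates into per-request cost with simple arithmetic. The sketch below uses only the two quoted prices; the function name `request_cost` and the token counts in the example are hypothetical, and it ignores any discounts or pricing tiers that may apply in practice.

```python
# Prices quoted in the text, in US dollars per million tokens.
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 30.00


def request_cost(input_tokens, output_tokens):
    """Return the estimated cost in dollars of a single API request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 10,000 output tokens.
baseline = request_cost(200_000, 10_000)

# Same prompt, but the model finishes the task in half the output tokens.
efficient = request_cost(200_000, 5_000)

print(f"baseline:  ${baseline:.2f}")
print(f"efficient: ${efficient:.2f}")
```

Because output tokens cost six times as much as input tokens, a model that completes the same task with fewer output tokens reduces cost even at an unchanged price per token.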
Knowledge Work and Research
GDPval covers 44 professional knowledge work scenarios. GPT-5.5 achieves a win-or-tie rate of 84.9% (GPT-5.4: 83.0%, Claude Opus 4.7: 80.3%). On GeneBench (multi-stage genetics and quantitative biology data analysis), a new internal evaluation added by OpenAI, GPT-5.5 also leads.
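A win-or-tie rate counts every comparison the model wins or ties, divided by the total number of comparisons. The sketch below assumes each comparison is labeled with one of the strings "win", "tie", or "loss"; that labeling and the sample data are hypothetical, not GDPval's actual grading format.

```python
from collections import Counter


def win_or_tie_rate(outcomes):
    """Fraction of comparisons labeled 'win' or 'tie'.

    outcomes: list of strings, each 'win', 'tie', or 'loss' (assumed labels).
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    return (counts["win"] + counts["tie"]) / len(outcomes)


# Hypothetical sample: 6 wins, 2 ties, 2 losses across 10 comparisons.
sample = ["win"] * 6 + ["tie"] * 2 + ["loss"] * 2
print(win_or_tie_rate(sample))
```

Note that ties count toward the rate, so it is an upper bound on the strict win rate rather than a substitute for it.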
Selection Guidance
- Coding/Agent Development: GPT-5.5 currently has the strongest overall coding ability, leading on both MLE-Bench and SWE-bench
- Research/Math Reasoning: GPT-5.5 Pro leads on FrontierMath and ECI, suitable for high-difficulty research scenarios
- Cost Control: GPT-5.5 is more token-efficient than GPT-5.4, consuming fewer tokens for the same tasks
- Enterprise Knowledge Work: 84.9% GDPval win-or-tie rate, suitable for document analysis and strategy formulation