GPT-5.5 Pro Scores 159 on ECI: Composite Index Surpasses All Previous Models

Conclusion

GPT-5.5 Pro scores 159 on the Epoch Capabilities Index (ECI), a composite metric designed by Epoch AI that aggregates 37 benchmarks into a single score and gives more weight to the harder ones. The score surpasses the previous high, held by GPT-5.4 Pro, and is described as a generational leap.

In practical terms, GPT-5.5 achieves 36% on MLE-Bench (machine learning engineering; GPT-5.4 scored 23%) and 78.7% on OSWorld (computer operation tasks), the latter surpassing Claude Opus 4.7. Its 73% success rate on 20-hour software engineering tasks makes it the strongest coding model currently available.

Test Dimensions

ECI Composite Index Explained

ECI’s core advantage is that it is hard to inflate by gaming easy benchmarks. Because the weighting tilts toward harder tasks, a score of 159 is more likely to reflect genuine improvement on challenging tasks than benchmark overfitting.
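To make the weighting argument concrete, here is a minimal toy sketch in Python. It is not Epoch AI's actual aggregation method, which is not described here; it assumes a plain difficulty-weighted mean, and the benchmark names, weights, and scores are all hypothetical.

```python
def weighted_composite(scores, difficulty):
    """Difficulty-weighted mean of benchmark scores (illustrative only)."""
    total_weight = sum(difficulty[name] for name in scores)
    weighted_sum = sum(scores[name] * difficulty[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical weights and scores; these are NOT real ECI inputs.
difficulty = {"easy_bench": 1.0, "hard_bench": 4.0}
baseline = {"easy_bench": 0.80, "hard_bench": 0.20}
gamed = {"easy_bench": 0.95, "hard_bench": 0.20}     # +15 points on the easy benchmark only
improved = {"easy_bench": 0.80, "hard_bench": 0.35}  # +15 points on the hard benchmark only

base = weighted_composite(baseline, difficulty)
gain_from_gaming = weighted_composite(gamed, difficulty) - base
gain_from_hard_progress = weighted_composite(improved, difficulty) - base

print(f"gain from easy-benchmark gaming: {gain_from_gaming:.2f}")
print(f"gain from hard-benchmark progress: {gain_from_hard_progress:.2f}")
```

Under these assumed weights, the same 15-point improvement moves the composite four times as far when it lands on the hard benchmark, which is the property the paragraph above describes.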

FrontierMath (frontier mathematical reasoning) is a key component of ECI. GPT-5.5 Pro demonstrated unprecedented reasoning ability on this benchmark, handling unsolved or extremely difficult research-level math problems.

Coding and Agent Capabilities

Benchmark	GPT-5.5	GPT-5.4	Claude Opus 4.7
MLE-Bench	36%	23%	-
OSWorld	78.7%	-	Below 78.7%
CyberGym	81.8%	-	-
SWE-bench (20h)	73%	-	-

GPT-5.5 matches GPT-5.4’s per-token latency while using significantly fewer tokens to complete the same Codex tasks. API pricing is $5/M input tokens, $30/M output tokens, with a 1 million token context window.
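The pricing arithmetic can be sketched as follows. The two rates come from the figures above; the function name and the example token counts are hypothetical, and the sketch ignores any discounts or surcharges not mentioned here.

```python
INPUT_USD_PER_MILLION = 5.0    # $5 per million input tokens (from the text)
OUTPUT_USD_PER_MILLION = 30.0  # $30 per million output tokens (from the text)


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request at the listed per-token rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 10,000 output tokens.
cost = estimate_cost_usd(200_000, 10_000)
print(f"${cost:.2f}")  # prints "$1.30"
```

Because output tokens cost six times as much as input tokens at these rates, a model that finishes the same task with fewer generated tokens lowers the bill mainly through the output term.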

Knowledge Work and Research

GDPval covers 44 professional knowledge work scenarios. GPT-5.5 achieves a win-or-tie rate of 84.9% (GPT-5.4: 83.0%, Claude Opus 4.7: 80.3%). On GeneBench (multi-stage genetics and quantitative biology data analysis), a new internal evaluation added by OpenAI, GPT-5.5 also leads.
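As a rough illustration of how a win-or-tie rate is computed and how the reported gaps compare, consider the sketch below. GDPval's grading protocol is not described here, so the sketch assumes each comparison is labeled "win", "tie", or "loss"; the sample outcomes are hypothetical, while the three percentages are the ones quoted above.

```python
from collections import Counter


def win_or_tie_rate(outcomes):
    """Fraction of comparisons labeled 'win' or 'tie' (assumed label scheme)."""
    counts = Counter(outcomes)
    return (counts["win"] + counts["tie"]) / len(outcomes)


# Hypothetical outcomes for ten graded tasks.
sample = ["win"] * 6 + ["tie"] * 2 + ["loss"] * 2
print(win_or_tie_rate(sample))  # prints 0.8

# Reported GDPval win-or-tie rates, in percent (from the text).
reported = {"GPT-5.5": 84.9, "GPT-5.4": 83.0, "Claude Opus 4.7": 80.3}
leader = reported["GPT-5.5"]
margins = {
    name: round(leader - rate, 1)
    for name, rate in reported.items()
    if name != "GPT-5.5"
}
print(margins)  # percentage-point lead of GPT-5.5 over each model
```

On the quoted figures, GPT-5.5 leads GPT-5.4 by 1.9 percentage points and Claude Opus 4.7 by 4.6 percentage points.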

Selection Guidance

Coding/Agent Development: GPT-5.5 currently has the strongest overall coding ability, leading on both MLE-Bench and SWE-bench.
Research/Math Reasoning: GPT-5.5 Pro leads on FrontierMath and ECI, making it suitable for high-difficulty research scenarios.
Cost Control: GPT-5.5 is more token-efficient than GPT-5.4, consuming fewer tokens for the same tasks.
Enterprise Knowledge Work: An 84.9% GDPval win-or-tie rate makes GPT-5.5 suitable for document analysis and strategy formulation.
