GPT-5.5 Pro Scores 159 on ECI: Composite Index Surpasses All Previous Models

Conclusion

GPT-5.5 Pro scores 159 on the ECI (Epoch Capabilities Index), a composite metric designed by Epoch AI that aggregates 37 different benchmarks into a single score, with higher weight given to more difficult benchmarks. Compared to the previous high score held by GPT-5.4 Pro, 159 represents a generational leap.

In practical terms, GPT-5.5 achieves 36% on MLE-Bench (machine learning engineering capability, vs. 23% for GPT-5.4) and 78.7% on OSWorld (computer operation tasks), surpassing Claude Opus 4.7 on the latter. Its 73% success rate on 20-hour software engineering tasks makes it the strongest coding model currently available.

Test Dimensions

ECI Composite Index Explained

ECI’s core advantage is that it is hard to game by optimizing for easy benchmarks. Because the weighting tilts toward harder tasks, a score of 159 reflects genuine improvement on truly challenging tasks rather than benchmark overfitting.
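
To make the weighting idea concrete, here is a minimal sketch of a difficulty-weighted composite. It is not Epoch AI's actual ECI methodology, which this article does not describe in detail. The benchmark names, scores, and difficulty weights below are hypothetical and chosen only to show why gains on an easy benchmark move such a composite less than the same gains on a hard one.

```python
def weighted_composite(scores: dict[str, float], difficulty: dict[str, float]) -> float:
    """Return the difficulty-weighted mean of per-benchmark scores.

    Illustrative assumption only: a plain weighted average, not the real ECI formula.
    """
    total_weight = sum(difficulty[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("difficulty weights must sum to a positive value")
    return sum(scores[name] * difficulty[name] for name in scores) / total_weight


# Hypothetical weights: the hard benchmark counts four times as much as the easy one.
difficulty = {"easy_bench": 1.0, "hard_bench": 4.0}

# Hypothetical scores in percent; none of these numbers come from a real evaluation.
baseline = {"easy_bench": 90.0, "hard_bench": 40.0}
easy_gain = {"easy_bench": 100.0, "hard_bench": 40.0}  # +10 points on the easy benchmark
hard_gain = {"easy_bench": 90.0, "hard_bench": 50.0}   # +10 points on the hard benchmark

base = weighted_composite(baseline, difficulty)
print("gain from easy benchmark:", weighted_composite(easy_gain, difficulty) - base)
print("gain from hard benchmark:", weighted_composite(hard_gain, difficulty) - base)
```

Under these made-up weights, the same 10-point improvement lifts the composite four times as much when it comes from the hard benchmark, which is the property the paragraph above attributes to ECI.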

FrontierMath (frontier mathematical reasoning) is a key component of ECI. GPT-5.5 Pro demonstrated unprecedented reasoning ability on this benchmark, handling unsolved or extremely difficult research-level math problems.

Coding and Agent Capabilities

Benchmark | GPT-5.5 | GPT-5.4 | Claude Opus 4.7
MLE-Bench | 36% | 23% | -
OSWorld | 78.7% | - | Below 78.7%
CyberGym | 81.8% | - | -
SWE-bench (20h) | 73% | - | -

GPT-5.5 matches GPT-5.4’s per-token latency while using significantly fewer tokens to complete the same Codex tasks. API pricing is $5/M input tokens, $30/M output tokens, with a 1 million token context window.
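
As a rough illustration of what those prices mean per request, the sketch below estimates cost from token counts. The per-million rates come from the paragraph above; the function name and the example token counts are hypothetical, and the calculation ignores anything the article does not mention, such as caching discounts or tiered pricing.

```python
# Rates quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 30.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
cost = estimate_cost_usd(200_000, 20_000)
print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens cost six times as much as input tokens at these rates, a model that finishes the same task with fewer output tokens lowers the bill directly, even when per-token latency is unchanged.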

Knowledge Work and Research

GDPval covers 44 professional knowledge work scenarios. GPT-5.5 achieves a win-or-tie rate of 84.9% (GPT-5.4: 83.0%, Claude Opus 4.7: 80.3%). On GeneBench (multi-stage genetics and quantitative biology data analysis), a new internal evaluation added by OpenAI, GPT-5.5 also leads.

Selection Guidance

  • Coding/Agent Development: GPT-5.5 currently has the strongest overall coding ability, leading on both MLE-Bench and SWE-bench
  • Research/Math Reasoning: GPT-5.5 Pro leads on FrontierMath and ECI, suitable for high-difficulty research scenarios
  • Cost Control: GPT-5.5’s token efficiency surpasses GPT-5.4’s, consuming fewer tokens for the same tasks
  • Enterprise Knowledge Work: 84.9% GDPval win-or-tie rate, suitable for document analysis and strategy formulation

Primary Sources