As of late April 2026, the latest LMSYS Chatbot Arena rankings reveal a clear landscape: Anthropic leads both text and code tracks, but the open-source camp is accelerating its catch-up.
Text Top 10: Anthropic Takes Four Seats
The Arena text leaderboard top 10 (Elo scores, higher is better):
| Rank | Model | Score | Lab |
|---|---|---|---|
| 1 | claude-opus-4-7-thinking | 1503 ±8 | Anthropic |
| 2 | claude-opus-4-6-thinking | 1501 ±5 | Anthropic |
| 3 | claude-opus-4-6 | 1496 ±5 | Anthropic |
| 4 | claude-opus-4-7 | 1493 ±7 | Anthropic |
| 5 | gemini-3.1-pro-preview | 1493 ±5 | Google |
| 6 | muse-spark | 1489 ±7 | Meta |
| 7 | gpt-5.5-high | 1488 ±10 | OpenAI |
| 8 | gemini-3-pro | 1486 ±4 | Google |
| 9 | grok-4.20-beta1 | 1481 ±5 | xAI |
| 10 | gpt-5.4-high | 1479 ±6 | OpenAI |
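Elo scores are most useful when read as head-to-head win rates rather than absolute numbers. A minimal sketch using the standard Elo expectation formula (the specific pairing below is illustrative; scores are taken from the table above):

```python
def elo_win_prob(score_a: float, score_b: float) -> float:
    """Expected probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))

# claude-opus-4-7-thinking (1503) vs gpt-5.5-high (1488):
# even a 15-point lead translates to only a slight edge per matchup.
print(round(elo_win_prob(1503, 1488), 3))  # → 0.522
```

This is why small rank differences near the top of the table matter less than they appear: a 15-point gap means the leader wins barely 52% of head-to-head votes.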
Four key observations:
Anthropic’s thinking mode shows a clear advantage. claude-opus-4-7-thinking leads at 1503, 10 points above its non-thinking counterpart (1493). The pattern holds on the code leaderboard, where thinking mode reaches 1571, 6 points above the non-thinking 1565, though the margin there is smaller.
OpenAI GPT-5.5 underperforms expectations. gpt-5.5-high ranks seventh at 1488, behind all Claude variants and Gemini 3.1 Pro. The error margin of ±10 is the largest among the top 10, indicating the widest divergence in user evaluations.
Meta muse-spark enters the top 6 for the first time. At 1489, it surpasses GPT-5.5 and becomes the highest-ranking non-Anthropic/Google model. If confirmed as open-source, it would be the strongest open-source text model currently available.
Google’s two Gemini entries are stable but show no breakthrough. gemini-3.1-pro-preview (1493) and gemini-3-pro (1486) rank fifth and eighth; the small gap between them suggests users perceive little improvement from 3 Pro to 3.1 Pro.
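The ± figures can be read as rough confidence intervals on each rating, which is the right lens for the GPT-5.5 observation above. A hypothetical overlap check (treating the margins as symmetric intervals is an assumption; the Arena's actual statistics are more involved):

```python
def intervals_overlap(score_a: float, margin_a: float,
                      score_b: float, margin_b: float) -> bool:
    """True if the two rating intervals overlap, i.e. the ranking
    between the two models is not statistically settled."""
    return abs(score_a - score_b) <= margin_a + margin_b

# gpt-5.5-high (1488 ±10) vs gemini-3.1-pro-preview (1493 ±5)
print(intervals_overlap(1488, 10, 1493, 5))  # True
```

By this reading, ranks 5 through 7 are statistically indistinguishable; only the gap between the leader and the bottom of the top 10 is clearly settled.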
Code Leaderboard: Anthropic’s Dominance Is Stronger
The code Arena shows an even wider gap:
| Rank | Model | Score |
|---|---|---|
| 1 | claude-opus-4-7-thinking | 1571 |
| 2 | claude-opus-4-7 | 1565 |
| 3 | claude-opus-4-6-thinking | 1551 |
| 4 | claude-opus-4-6 | 1548 |
| 5 | glm-5.1 | 1534 |
| 6 | kimi-k2.6 | 1529 |
| 7 | claude-sonnet-4-6 | 1525 |
| 8 | muse-spark | 1510 |
| 9 | gpt-5.5-high (codex-harness) | 1500 |
| 10 | claude-opus-4-5-thinking-32k | 1491 |
Anthropic’s advantage is even more pronounced in code: the top four slots are all Claude variants. GLM-5.1 and Kimi-K2.6, at 1534 and 1529 respectively, are the best-performing Chinese models in the code Arena.
Notably, GPT-5.5 requires the Codex harness to reach 1500 in code, with the standalone version ranking even lower. This suggests that for pure code generation and editing, GPT-5.5 needs additional engineering integration to perform at its best.
Open-Source Progress
Combining Arena data with known open-source status:
- muse-spark (Meta): If confirmed open-source, its 1489 text score and 1510 code score both exceed GPT-5.5's corresponding scores (1488 and 1500).
- Xiaomi MiMo-V2.5-Pro: Ranked #1 among open-source models in text and sixth globally, with the top Agent index among open-source models.
- GLM-5.1 (Zhipu): Fifth in code Arena at 1534, the highest-ranking Chinese model in code.
The gap between open-source and closed-source #1 has narrowed from 50+ points a year ago to 15-20 points, meaning open-source models are approaching closed-source flagships in real-world usability.
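In Elo terms the narrowing is substantial. A rough calculation using the standard win-probability formula (the 50-point and 20-point gaps are the figures quoted above):

```python
def win_prob(diff: float) -> float:
    """Expected win rate of the higher-rated model given an Elo gap `diff`."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))

print(round(win_prob(50), 3))  # a year ago:  ≈ 0.571
print(round(win_prob(20), 3))  # today:       ≈ 0.529
```

A 50-point deficit meant the closed-source flagship won roughly 57% of head-to-head votes; at 20 points, the edge shrinks to about 53%, close to a coin flip in practice.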
Landscape Assessment
The current Arena reflects a tri-polar landscape: Anthropic leads in both text and code, Google maintains a stable second tier with Gemini, and OpenAI’s GPT-5.5 has not reproduced its past dominance in crowdsourced evaluation. In the open-source camp, Meta and Chinese models are closing the gap but remain some distance from fully surpassing closed-source flagships.
For readers: if you need a model that is strong in both conversation and code, Claude Opus 4.7 remains the top choice. For cost-effectiveness and controllability, Xiaomi MiMo-V2.5-Pro and GLM-5.1 are worth trying.
Main sources: