Chatbot Arena April 2026: Anthropic Sweeps Top Four, Open-Source Gap Narrows

As of late April 2026, the latest LMSYS Chatbot Arena rankings show a clear landscape: Anthropic leads both the text and code tracks, while the open-source camp continues to close the gap.

Text Top 10: Anthropic Takes Four Seats

The Arena text leaderboard top 10 (Elo scores, higher is better):

Rank | Model                    | Score    | Lab
1    | claude-opus-4-7-thinking | 1503 ±8  | Anthropic
2    | claude-opus-4-6-thinking | 1501 ±5  | Anthropic
3    | claude-opus-4-6          | 1496 ±5  | Anthropic
4    | claude-opus-4-7          | 1493 ±7  | Anthropic
5    | gemini-3.1-pro-preview   | 1493 ±5  | Google
6    | muse-spark               | 1489 ±7  | Meta
7    | gpt-5.5-high             | 1488 ±10 | OpenAI
8    | gemini-3-pro             | 1486 ±4  | Google
9    | grok-4.20-beta1          | 1481 ±5  | xAI
10   | gpt-5.4-high             | 1479 ±6  | OpenAI
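To put these point gaps in perspective, the standard Elo expected-score formula converts a rating difference into a head-to-head win probability. A minimal sketch, assuming Arena scores behave like classic Elo ratings (the helper name is mine, not the leaderboard's):

```python
# Convert Arena-style Elo differences into expected head-to-head win rates.
# Scores are taken from the April 2026 text leaderboard above.
def elo_win_prob(score_a: float, score_b: float) -> float:
    """Expected probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))

scores = {
    "claude-opus-4-6-thinking": 1501,
    "gemini-3.1-pro-preview": 1493,
    "gpt-5.5-high": 1488,
}

top = "claude-opus-4-7-thinking"  # 1503 on the text board
for rival, s in scores.items():
    p = elo_win_prob(1503, s)
    print(f"{top} vs {rival}: {p:.1%}")
```

By this reading, even the leader's 15-point edge over gpt-5.5-high translates to only about a 52% expected win rate per matchup, which is why the ± margins matter.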

Four key observations:

Anthropic’s thinking mode shows a clear advantage. claude-opus-4-7-thinking leads at 1503, 10 points above its non-thinking counterpart (1493). The pattern holds on the code leaderboard, where thinking mode reaches 1571, 6 points above the non-thinking 1565, though the margin there is narrower.

OpenAI GPT-5.5 underperforms expectations. gpt-5.5-high ranks seventh at 1488, behind all Claude variants and Gemini 3.1 Pro. The error margin of ±10 is the largest among the top 10, indicating the widest divergence in user evaluations.

Meta’s muse-spark enters the top 6 for the first time. At 1489, it edges past GPT-5.5 and is the highest-ranked model from any lab other than Anthropic or Google. If confirmed as open-source, it would be the strongest open-source text model currently available.

Google’s Gemini pair is stable but lacks a breakthrough. gemini-3.1-pro-preview (1493) and gemini-3-pro (1486) rank fifth and eighth; the small gap between them suggests users perceive little improvement from 3.0 to 3.1 Pro.
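The ± margins above can be read as rough confidence intervals: where two models' intervals overlap, the ranking between them is not clearly separated. A small sketch of that check, using scores from the text table (the helper is illustrative, not Arena's actual methodology):

```python
# Check which adjacent text-leaderboard gaps exceed the reported ± margins.
def intervals_overlap(score_a: float, err_a: float,
                      score_b: float, err_b: float) -> bool:
    """True if the two +/- intervals overlap, i.e. the ranking is not clearly separated."""
    return (score_a - err_a) <= (score_b + err_b) and \
           (score_b - err_b) <= (score_a + err_a)

rows = [
    ("claude-opus-4-7-thinking", 1503, 8),
    ("claude-opus-4-6-thinking", 1501, 5),
    ("gpt-5.5-high", 1488, 10),
    ("gpt-5.4-high", 1479, 6),
]

for (name_a, s_a, e_a), (name_b, s_b, e_b) in zip(rows, rows[1:]):
    status = "overlaps" if intervals_overlap(s_a, e_a, s_b, e_b) else "separated"
    print(f"{name_a} vs {name_b}: {status}")
```

Notably, by this crude test gpt-5.5-high's ±10 interval overlaps both the model above and the model below it, so its exact rank is the least certain in the top 10.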

Code Leaderboard: Anthropic’s Dominance Is Stronger

The code Arena shows an even wider gap:

Rank | Model                         | Score
1    | claude-opus-4-7-thinking      | 1571
2    | claude-opus-4-7               | 1565
3    | claude-opus-4-6-thinking      | 1551
4    | claude-opus-4-6               | 1548
5    | glm-5.1                       | 1534
6    | kimi-k2.6                     | 1529
7    | claude-sonnet-4-6             | 1525
8    | muse-spark                    | 1510
9    | gpt-5.5-high (codex-harness)  | 1500
10   | claude-opus-4-5-thinking-32k  | 1491

Anthropic’s advantage is even more pronounced in code — the top four are all Claude. GLM-5.1 and Kimi-K2.6, at 1534 and 1529 respectively, represent the best performance from Chinese models in the code Arena.

Notably, GPT-5.5 requires the Codex harness to reach 1500 in code, with the standalone version ranking even lower. This suggests that for pure code generation and editing, GPT-5.5 needs additional engineering integration to perform at its best.

Open-Source Progress

Combining Arena data with known open-source status:

  • muse-spark (Meta): If confirmed open-source, its 1489 text score and 1510 code score both exceed GPT-5.5.
  • Xiaomi MiMo-V2.5-Pro: Reached open-source model #1 in text and global sixth, with Agent index #1 among open-source models.
  • GLM-5.1 (Zhipu): Fifth in code Arena at 1534, the highest-ranking Chinese model in code.

The gap between open-source and closed-source #1 has narrowed from 50+ points a year ago to 15-20 points, meaning open-source models are approaching closed-source flagships in real-world usability.
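Under the standard Elo expected-score formula, that narrowing has a concrete head-to-head meaning. A quick sketch, assuming Arena scores behave like classic Elo ratings:

```python
# What an open-vs-closed Elo gap implies for the closed-source leader's
# expected win rate in direct matchups.
def elo_win_prob(score_a: float, score_b: float) -> float:
    """Expected probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))

for gap in (50, 20, 15):
    p = elo_win_prob(gap, 0)  # only the difference matters
    print(f"{gap}-point gap: closed leader expected to win {p:.1%} of matchups")
```

A 50-point gap implies the closed-source leader wins about 57% of matchups; at 15-20 points that drops to roughly 52%, not far from a coin flip.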

Landscape Assessment

The current Arena reflects a tri-polar landscape: Anthropic leads in both text and code, Google maintains a stable second tier with Gemini, and OpenAI’s GPT-5.5 has not reproduced its past dominance in crowdsourced evaluation. In the open-source camp, Meta and Chinese models are closing the gap but remain some distance from fully surpassing closed-source flagships.

For readers: if you need a model stable in both conversation and code, Claude Opus 4.7 remains the top choice. For cost-effectiveness and controllability, Xiaomi MiMo-V2.5-Pro and GLM-5.1 are worth trying.


Main sources: