ERNIE 5.1 Preview Breaks into LMArena Global Top 15: The Sole Chinese Model to Break Through

Key Conclusion

The latest LMSYS Chatbot Arena (LMArena) ranking, dated April 30, shows Baidu’s ERNIE 5.1 Preview at 1476 points, first among Chinese models and inside the global top 15. It is currently the only Chinese model in the global Top 15, and it ranks above GPT-5.5 and DeepSeek-V4-Pro.

Meanwhile, Zhipu GLM-5.1 and Kimi K2.6 have reached the “passed entry” tier in coding agent scenarios, so together with ERNIE 5.1 they form a three-way race among Chinese models.

LMArena Text Leaderboard: Latest Landscape

Rank   Model                                Score    Vendor         Notes
1-5    GPT-5.5 and other frontier models    1500+    OpenAI, etc.   Global leaders
~10    ERNIE 5.1 Preview                    1476     Baidu          Only Chinese model in Top 15
-      GPT-5.5                              <1476    OpenAI         Surpassed by ERNIE 5.1
-      DeepSeek-V4-Pro                      <1476    DeepSeek       Surpassed by ERNIE 5.1

ERNIE 5.1’s key breakthrough is in pure text conversation quality—the hardest metric to “game” in LMArena’s crowdsourced blind evaluation system, where real users vote on anonymous model responses.
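
To make the voting mechanism concrete, here is a minimal sketch of how pairwise blind votes can be turned into a score like 1476. It uses a plain Elo-style update, which is an illustrative assumption: the article does not describe LMArena's exact statistical pipeline, and the function names, the starting rating of 1000, and the step size `k` are all hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B (Elo logistic model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 4.0):
    """Apply one vote. outcome_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating mass is conserved.
    return rating_a + delta, rating_b - delta


def rate(votes, initial: float = 1000.0, k: float = 4.0) -> dict:
    """Process (model_a, model_b, outcome_a) votes in order; return final ratings."""
    ratings = {}
    for model_a, model_b, outcome_a in votes:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = update(ra, rb, outcome_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical anonymous votes between two placeholder models.
    print(rate([("model_x", "model_y", 1.0), ("model_x", "model_y", 0.5)]))
    # Expected preference for a 1500-rated model over a 1476-rated one.
    print(round(expected_score(1500, 1476), 3))
```

Under this model a 24-point gap (1500 versus 1476) corresponds to an expected preference of only about 53%, so small score differences near the top of the leaderboard reflect narrow margins in head-to-head votes.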

Chinese Model “Big Three” Positioning

From developer feedback, Chinese models have formed a clear division of labor:

First Tier (Passed Entry):

  • GLM-5.1 (Zhipu) — Strongest in coding agent scenarios, but experienced garbled text/repetition issues at high concurrency + long context (70K+ tokens); Zhipu has published a post-mortem
  • Kimi K2.6 (Moonshot AI) — Tied with GLM-5.1, strong agent capabilities
  • ERNIE 5.1 Preview (Baidu) — Strongest in text conversation quality, backed by LMArena data

Second Tier (Not Yet Passed Entry):

  • DeepSeek-V4-Pro, Qwen 3.6 Plus, Tencent Hunyuan HY-3, etc.

This stratification shows that the question for Chinese models is no longer “which is better” but “which model for which scenario”, closely mirroring how the smartphone market evolved from 2012 to 2016.

Why This Ranking Matters

  1. LMArena’s credibility: Unlike vendor-reported benchmarks, LMArena uses real user blind evaluations that are hard to manipulate
  2. Text vs. Multimodal: Amid 2026’s hype around multimodal and agent systems, ERNIE 5.1 shows that pure text conversation quality remains an independent competitive dimension
  3. Baidu’s AI inflection point: The ERNIE series has long been seen as “large but not refined”; the 5.1 Preview result suggests Baidu has found a breakthrough in foundational text models

Action Recommendations

  • Chinese long-text tasks: ERNIE 5.1 Preview is worth priority testing, especially for conversation quality and Chinese comprehension
  • Coding Agent scenarios: GLM-5.1 and Kimi K2.6 remain more mature choices, but watch Zhipu’s high-concurrency bug fixes
  • Cost-sensitive scenarios: DeepSeek-V4-Pro and Qwen 3.6 Plus still offer strong cost-performance advantages
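
The recommendations above can be sketched as a simple scenario-to-model lookup. This is only an illustration: the scenario keys and the `candidates_for` helper are hypothetical, and the strings are the display names used in this article, not real API model identifiers.

```python
# Scenario keys are hypothetical labels; model names are the article's display names.
RECOMMENDATIONS = {
    "chinese_long_text": ["ERNIE 5.1 Preview"],
    "coding_agent": ["GLM-5.1", "Kimi K2.6"],
    "cost_sensitive": ["DeepSeek-V4-Pro", "Qwen 3.6 Plus"],
}


def candidates_for(scenario: str) -> list[str]:
    """Return the models worth testing first for a scenario, in listed order."""
    try:
        # Copy the list so callers cannot mutate the shared table.
        return list(RECOMMENDATIONS[scenario])
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown scenario {scenario!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(candidates_for("coding_agent"))
```

Keeping the mapping in one table makes it easy to revise as rankings change, for example once Zhipu's high-concurrency fixes land.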

LMArena rankings will keep updating, and whether ERNIE 5.1 can hold this position in its full release remains to be seen. Still, as a Top 15 breakthrough for a Chinese model on an authoritative global leaderboard, the signal is clear enough.