Zhipu GLM-5.1 Released: 600 Iterations of Continuous Optimization, A New Domestic Choice for Long-Horizon Agent Tasks

Core Conclusion

Zhipu released GLM-5.1 in early April, positioning it as its next-generation flagship model for AI Agents. Its core selling point is not absolute scores on static benchmarks but sustained optimization on long-horizon tasks: according to Zhipu, the model keeps improving its results across 600 iterations of long-range reasoning. The release coincides with Zhipu winding down GLM-5’s “unlimited weekly quota” plan, and the two moves point in different directions: GLM-5 is converging on commercialization, while GLM-5.1 is opening up new Agent scenarios.

GLM-5.1 Technical Highlights

Long-Horizon Task Capability

GLM-5.1’s core claim is sustained improvement across many iterations. Models commonly suffer “capability degradation” in multi-round Agent loops: output quality declines as the number of rounds grows. According to Zhipu, GLM-5.1 instead keeps improving its results across 600 iterations, which the company attributes to architectural optimization.
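One way to make “continuous improvement” versus “capability degradation” concrete is to score the agent’s output at every iteration and look at the trend. The sketch below is our own illustration, not Zhipu’s evaluation method; it assumes you already have one numeric quality score per iteration (for example, a test pass rate), and the function names are hypothetical.

```python
from itertools import accumulate
from statistics import mean


def trend_slope(scores: list[float]) -> float:
    """Least-squares slope of per-iteration scores against iteration index.

    Positive: quality tends to rise as iterations accumulate.
    Negative: quality tends to fall ("capability degradation").
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need scores from at least two iterations")
    x_bar = (n - 1) / 2  # mean of the indices 0..n-1
    y_bar = mean(scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(scores))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def best_so_far(scores: list[float]) -> list[float]:
    """Best score reached up to each iteration, as a loop that keeps its best attempt would report."""
    return list(accumulate(scores, max))
```

The best-so-far curve never decreases by construction, so on its own it cannot reveal degradation; the slope of the raw per-iteration scores is the more telling number over a 600-iteration run.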

Capability Dimension | GLM-5 | GLM-5.1 | Improvement Direction
Long-range reasoning | Baseline | Significantly enhanced | Multi-step task decomposition and backtracking
Iterative optimization | Limited | 600 iterations of continuous improvement | Agent self-correction loops
SWE-Bench Pro | Industry-leading | Further ahead | Code repair tasks
Agent tool calling | Supported | Enhanced | Tool selection accuracy

Leading in SWE-Bench Pro

On SWE-Bench Pro, a harder, professional-grade variant of the SWE-bench software engineering benchmark, GLM-5.1 ranks in the industry’s top tier. The benchmark simulates real code repair: given a GitHub issue and its codebase, the model must understand the problem, locate the relevant code, and produce a fix that passes the repository’s tests.
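In the original SWE-bench convention, a task counts as resolved only when the tests the fix is supposed to make pass (FAIL_TO_PASS) now pass and the previously passing tests (PASS_TO_PASS) still pass; the headline score is the fraction of resolved tasks. Assuming SWE-Bench Pro scores tasks in a similar way, a minimal sketch looks like this. `TaskResult` and the test names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Test outcomes after applying a model's patch (hypothetical structure)."""
    fail_to_pass: dict[str, bool]  # tests the fix must make pass
    pass_to_pass: dict[str, bool]  # tests that must keep passing


def is_resolved(result: TaskResult) -> bool:
    # Resolved only if every target test passes and nothing regressed.
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)
```

The regression check is what makes this stricter than “the bug went away”: a patch that fixes the issue but breaks an existing test scores zero for that task.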

For Agent scenarios, SWE-Bench Pro is a more meaningful metric than traditional Q&A benchmarks because it measures:

  • Understanding complex codebases
  • Multi-step reasoning (locate → analyze → fix → verify)
  • Tool usage (search, read, edit, test)

Why It Matters

Differentiation of Domestic Models in the Agent Race

In the domestic large model competition, each vendor is finding its differentiated positioning:

Vendor | Core Positioning | Advantage Scenarios
DeepSeek | Extreme cost efficiency | Large-scale API calls, long text
Kimi/Moonshot | Long context + search enhancement | Information retrieval, knowledge organization
MiniMax | Multimodal + safety | Content creation, safety-sensitive scenarios
Zhipu GLM | Agent + code | Programming assistance, automated workflows

GLM-5.1 further strengthens Zhipu’s position in the Agent + code track. Sustained optimization over long-horizon tasks is a core requirement for Agents: for these workloads, a model that can work for hundreds of rounds without degrading is more valuable in practice than one that only excels in single-turn conversation.

GLM-5 Commercialization vs GLM-5.1 Innovation

Notably, Zhipu is simultaneously doing two things:

  • GLM-5 commercialization convergence: Discontinuing the old “unlimited weekly quota” plan in favor of more finely tiered pricing
  • GLM-5.1 technical breakthrough: Building technical barriers in Agent long-horizon capabilities

This “tighten the old product, launch the new one” strategy is increasingly common among domestic model vendors, who rely on product iteration to protect profit margins during price wars.

Comparison with Competitors

Long-Horizon Agent Capability

Model | Iteration Stability | Degradation at 600+ Rounds | Agent Scenario Fit
GLM-5.1 | Continuous improvement | Minimal | High
Claude Sonnet 4.6 | Stable | Low | High
GPT-5.5 | Medium | Medium | Medium
Qwen 3.5 | Good | Low | Medium-High
Kimi K2.5 | Good | Low | Medium-High

Pricing Reference

Zhipu’s pricing strategy has shifted from “unlimited weekly quota” to more structured plans:

Plan | Billing | Use Case
New plan (former unlimited users) | Pay-per-use | High-frequency Agent usage
Standard plan | Monthly subscription | Daily development assistance
Free trial | Limited quota | Evaluation and testing

Note: On April 30, Zhipu stopped automatically renewing the old GLM Coding Plan with the unlimited weekly quota; affected users received two months of new-plan benefits.

Action Recommendations

Scenarios Suitable for GLM-5.1

  1. Agent-driven code repair: Scenarios requiring continuous work in large codebases with multi-step reasoning
  2. Long-horizon automated workflows: Tasks requiring the model to maintain consistency and improvement trends across many rounds of interaction
  3. SWE-Bench-type evaluation tasks: Scenarios requiring high-accuracy code understanding and repair capabilities

Testing Strategy

  1. Run a 600-round stress test first: Long-horizon stability is GLM-5.1’s core selling point, so verify it over a long run of iterations on your own tasks
  2. Compare code repair performance: SWE-Bench Pro results are a starting point, but if your team cares about code quality, compare GLM-5.1 against other models on your own code repair tasks
  3. Evaluate tool call accuracy: In Agent scenarios, tool call accuracy directly impacts task completion rate
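The stress test and the tool-call check can share one harness. A minimal sketch, assuming you wrap your own model client in a hypothetical `step(i)` function that runs round `i` of a single long session and returns a quality score plus the tool the model chose, with the expected tool coming from your labeled task set. The 600-round default mirrors the figure above; the 50-round comparison window is an arbitrary choice.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Round:
    score: float         # task-quality score for this round, e.g. 0..1
    tool_expected: str   # tool the task required (from your labeled set)
    tool_called: str     # tool the model actually chose


def run_stress_test(step: Callable[[int], Round], rounds: int = 600) -> list[Round]:
    """Call the agent `rounds` times in one session and record every round."""
    return [step(i) for i in range(rounds)]


def tool_call_accuracy(log: list[Round]) -> float:
    """Share of rounds where the model picked the expected tool."""
    return sum(r.tool_called == r.tool_expected for r in log) / len(log)


def degradation(log: list[Round], window: int = 50) -> float:
    """Mean score of the last `window` rounds minus the first `window`.

    Negative values mean quality fell as the session grew longer.
    """
    if len(log) < 2 * window:
        raise ValueError("log too short for two non-overlapping windows")
    first = mean(r.score for r in log[:window])
    last = mean(r.score for r in log[-window:])
    return last - first
```

Running the same harness against each candidate model with identical tasks keeps the comparison fair; only `step` changes between runs.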

Migration Recommendations

  • GLM-5 users: If you were on the unlimited weekly quota plan, note that its automatic renewal stopped on April 30. Use the two months of new-plan benefits you received to test GLM-5.1
  • New developers: GLM-5.1 represents Zhipu’s current technical frontier in the Agent track and is worth considering as one of the domestic Agent model options
  • Budget-conscious users: Watch Zhipu’s pricing adjustments. The new plan may cost more than the old unlimited plan, so evaluate the ROI for your own usage
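A first pass at that ROI question is a break-even calculation between pay-per-use and a flat monthly fee. The sketch below is illustrative arithmetic only: it assumes a single blended price per million tokens and ignores input/output price splits and any quota inside a subscription. All prices are hypothetical placeholders, not Zhipu’s rates; substitute the figures from your own plan.

```python
def pay_per_use_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Monthly cost if every token is billed at one blended rate."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_tokens(flat_fee: float, price_per_million: float) -> float:
    """Monthly token volume at which pay-per-use costs exactly the flat fee."""
    return flat_fee / price_per_million * 1_000_000


def cheaper_plan(tokens_per_month: float, flat_fee: float, price_per_million: float) -> str:
    """Name the cheaper option for a given monthly volume (ties go to the subscription)."""
    cost = pay_per_use_cost(tokens_per_month, price_per_million)
    return "pay-per-use" if cost < flat_fee else "flat subscription"
```

With a hypothetical flat fee of 100 and a price of 2 per million tokens, break-even sits at 50 million tokens a month; high-frequency Agent workloads that run for hundreds of rounds can cross that line quickly, which is why the old unlimited plan was attractive to them.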