In March 2026, MiniMax launched the M2.7 model. It is not just another "more parameters, higher benchmarks" update: it introduces a new training paradigm in which the model participates deeply in its own iteration.
Core Innovation: Model Self-Evolution
The highlight of M2.7 can be summarized in one sentence:
The model is no longer merely an object of training, but an active participant in the training process.
Specific mechanisms:
- Building an Agent Harness: M2.7 drives its own complex Agent workflows.
- Reinforcement Learning Loop: Through feedback from tasks executed by the Agent, the model directly participates in optimizing its own strategies.
- Self-Iteration: The model continuously improves on programming tasks like SWE-bench, forming a closed loop.
This fundamentally differs from traditional RLHF (Reinforcement Learning from Human Feedback) — which relies on human annotators providing preference signals. In M2.7’s self-evolution loop, the model autonomously discovers errors, fixes issues, and validates results through the Agent framework, forming an optimization cycle without human intervention.
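The closed loop described above can be sketched in toy form. Everything in this sketch is illustrative: `run_agent_task`, the scalar "policy", and the baseline-based update are stand-ins invented for this example, not MiniMax's actual training code. The point it demonstrates is the shape of the loop: the agent harness produces a task-level reward, and that reward feeds straight back into the policy with no human preference labels anywhere in the cycle.

```python
import random

random.seed(0)

def run_agent_task(policy_strength: float) -> float:
    """Simulate an agent executing a SWE-bench-style task.
    Returns a reward in [0, 1], e.g. the fraction of tests passed."""
    noise = random.uniform(-0.1, 0.1)
    return max(0.0, min(1.0, policy_strength + noise))

def self_evolution_loop(steps: int = 50, lr: float = 0.05) -> float:
    """Toy closed loop: the 'model' acts, the harness scores the result,
    and the score directly updates the policy (no human in the loop)."""
    policy = 0.3      # initial task success tendency
    baseline = 0.0    # running estimate of typical reward
    for _ in range(steps):
        reward = run_agent_task(policy)
        advantage = reward - baseline        # did this run beat expectations?
        policy += lr * advantage             # reinforce what worked
        policy = max(0.0, min(1.0, policy))
        baseline = 0.9 * baseline + 0.1 * reward
    return policy

print(f"policy after self-iteration: {self_evolution_loop():.2f}")
```

In this toy, performance climbs because the reward signal is perfectly aligned with the policy. The risks section below exists precisely because real feedback signals are not this clean.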
Performance on SWE-bench
M2.7’s performance on SWE-bench is close to that of Anthropic Opus. Although the exact scores are not fully disclosed in official materials, community comparison data shows:
| Model | SWE-bench (Estimated) | Price ($/M Input) |
|---|---|---|
| Claude Opus 4.7 Max | 87.6% | $15.00 |
| MiniMax M2.7 | ≈ 82-85% | $0.30 |
| Kimi K2.6 | ≈ 80-83% | ~$0.50 |
| DeepSeek V4-Pro | ≈ 80-83% | $0.60 |
| GPT-5.5 | ≈ 83-85% | $5.00 |
Considering M2.7’s input price is only $0.30/million tokens (about 2.1 RMB), its cost-effectiveness is highly competitive among programming models.
Pricing: The Price War Continues for Domestic Models
The API pricing landscape for major models in March 2026:
| Model | Input ($/M) | Output ($/M) |
|---|---|---|
| Grok | $0.20 | Not disclosed |
| MiniMax M2.7 | $0.30 | Not disclosed |
| DeepSeek V4 | ~$0.60 | ~$1.20 |
| GPT-5.5 | $5.00 | $30.00 |
| Claude Opus 4.7 | $15.00 | $75.00 |
MiniMax’s pricing strategy is very aggressive — offering near-Opus level programming capabilities at a price point close to DeepSeek. For scenarios requiring extensive API calls in Agent workflows, the cost difference is significant.
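The cost difference is easy to make concrete. The prices below come from the table above; the call volume and per-call token count are illustrative assumptions for a busy agent workflow, not figures from the source.

```python
# Input-token cost for a hypothetical agent workflow:
# 500 API calls/day at ~20k input tokens per call (illustrative numbers).
PRICES_PER_M_INPUT = {        # $/M input tokens, from the pricing table
    "MiniMax M2.7": 0.30,
    "DeepSeek V4": 0.60,
    "GPT-5.5": 5.00,
    "Claude Opus 4.7": 15.00,
}

calls_per_day = 500
tokens_per_call = 20_000
daily_m_tokens = calls_per_day * tokens_per_call / 1_000_000  # 10 M tokens/day

for model, price in PRICES_PER_M_INPUT.items():
    print(f"{model:16s} ${daily_m_tokens * price:8.2f}/day (input only)")
```

Under these assumptions the same workload costs about $3/day on M2.7 versus $150/day on Claude Opus 4.7, which is why the gap matters most for call-heavy agent loops rather than occasional interactive use.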
The Significance and Risks of Self-Evolution
Why Self-Evolution is Important
The bottlenecks of traditional model training include:
- Data Dependency: Requires large amounts of high-quality training data.
- Human Annotation: RLHF requires a large number of human annotators.
- Iteration Cycle: Each model update takes months.
If M2.7’s self-evolution approach proves feasible, it means:
- The model can continuously learn from real-world usage.
- Iteration cycles may be shortened from months to weeks or even days.
- Optimization in specific domains can be more precise.
Potential Risks
Self-evolution is not without concerns:
- Risk of Capability Degradation: If the feedback signals in the Agent loop are biased, the model might degrade other capabilities while optimizing one.
- Safety Alignment Issues: Self-optimization might bypass human-set safety constraints.
- Unexplainability: The decision paths of capabilities learned through autonomous loops are harder to trace.
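A common mitigation for the capability-degradation risk (not described in the source material) is to gate every self-proposed update on a held-out evaluation suite, rejecting updates that regress any tracked capability. The function and score dictionaries below are hypothetical, sketched purely to show the gating idea.

```python
def accept_update(candidate_scores: dict, baseline_scores: dict,
                  tolerance: float = 0.02) -> bool:
    """Accept a self-proposed update only if no tracked capability
    regresses by more than `tolerance` versus the current model."""
    return all(candidate_scores[k] >= baseline_scores[k] - tolerance
               for k in baseline_scores)

# Hypothetical held-out scores: coding improved, but safety regressed.
baseline = {"swe_bench": 0.82, "reasoning": 0.75, "safety": 0.99}
candidate = {"swe_bench": 0.85, "reasoning": 0.74, "safety": 0.93}

print(accept_update(candidate, baseline))  # rejected: safety dropped too far
```

A gate like this does not solve alignment or explainability, but it turns silent degradation into an explicit, auditable rejection event.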
Comparison with Other Domestic Models
| Dimension | MiniMax M2.7 | Kimi K2.6 | DeepSeek V4-Pro | GLM-5.1 |
|---|---|---|---|---|
| Programming Ability | Near Opus | Entry Level | Entry Level | Entry Level |
| Self-Optimization | ✅ Agent-driven RL | ❌ | ❌ | ❌ |
| Open Source | Partially open | Open weights | Fully open (MIT) | Partially open |
| Price Advantage | Extremely High | High | High | Extremely High |
| Ecosystem Maturity | Medium | High | High | Medium |
MiniMax M2.7 is currently the only domestic model making substantive moves on the "self-evolution" dimension, a differentiation that could matter for its long-term competitiveness.
Action Recommendations
Suitable Scenarios for MiniMax M2.7
- High-Frequency Agent Programming Workflows: Code generation, review, and repair scenarios requiring extensive API calls.
- Cost-Sensitive Projects: Teams with limited budgets but high requirements for programming quality.
- Technology Exploration: Developers and researchers who want to experience the effects of model self-evolution.
Currently Unsuitable Scenarios
- Core Systems in Production Environments: The stability of self-evolving models needs more time to be verified.
- Scenarios Requiring Explainability: If the decision process needs to be audited, consider more mature models.
- Non-Programming Scenarios: M2.7’s advantages are concentrated in the programming domain; general tasks may not be as strong as other models.
Signal of Practical Application of Domestic AI in 2026
MiniMax’s president previously stated publicly that starting from M2.5, the product “is already practical,” and M2.7 further enhances the programming Agent capabilities. Combined with the industry consensus that Kimi 2.5, GLM-5, and MiniMax M2.5 all crossed the “practical threshold” in early 2026, 2026 is indeed the year of practical application for domestic AI.
Whether M2.7’s self-evolution approach represents the future direction of model training will take more time to verify. But it proves at least one thing: Chinese model makers are no longer merely following the technical routes of OpenAI and Anthropic; they are exploring differentiated, innovative paths of their own.
Main Sources: