Conclusion First
Among the 512K lines of leaked code from Claude Sonnet 4.8, the most underrated detail is not the 98% vision accuracy or the +12 coding benchmark improvement, but a new effort level: X-high. This tier will fundamentally change the cost-effectiveness model of Claude-based Agent workflows.
What X-High Actually Is
With X-high added to the existing Medium and High levels, Anthropic’s effort levels fall into three tiers:
| Level | Behavior Characteristics | Typical Scenarios |
|---|---|---|
| Medium | Quick answers, fewer reasoning steps | Simple Q&A, information lookup |
| High | Deep reasoning, multi-step thinking | Code generation, complex analysis |
| X-high (New) | Extreme reasoning, maximized exploration space | Architecture design, hard debugging problems, security audits |
The core change with X-high is a much higher ceiling on the reasoning budget. Analysis of the leaked code suggests:
- Reasoning steps: Increased from ~50 steps in High to 200+ steps
- Self-verification loops: Built-in multi-round self-correction, automatically verifying after each generation
- Tool call depth: Support for deeper file scanning and codebase traversal
- Memory retention: More effective use of longer context, reducing intermediate information loss
Attribution Analysis of the +12 Coding Benchmark Improvement
A 12-point coding benchmark improvement in a single release is extremely rare. Reverse engineering the leaked code lets us attribute Sonnet 4.8’s gain to three factors:
| Factor | Estimated Contribution | Explanation |
|---|---|---|
| X-high reasoning depth | ~40% | More reasoning steps directly improve complex task resolution rates |
| 98% vision accuracy | ~30% | Improved screenshot/UI analysis capabilities indirectly help coding tasks |
| Training data updates | ~30% | Underlying improvement in codebase understanding |
This means if you focus only on “the model changed” while ignoring “the reasoning strategy changed,” you’ll miss Sonnet 4.8’s greatest value.
Practical Impact on Agent Workflows
The Previous Cost Model
Simple tasks → Medium (cheap) → Quick completion
Complex tasks → High (medium) → May fail → Human intervention
The New Model After Sonnet 4.8
Simple tasks → Medium (cheap) → Quick completion
Medium tasks → High (medium) → High probability of completion
Difficult tasks → X-high (expensive) → Extremely high resolution rate → No human intervention needed
The key insight: Although X-high is expensive, if it can replace human intervention, the overall cost is actually lower.
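The trade-off above can be sketched as an expected-cost comparison. This is a minimal sketch under stated assumptions: the function name `expected_cost` is hypothetical, the intervention probabilities (30% for High, 5% for X-high) are invented for illustration, and the fee and engineer-cost figures are borrowed from the speculative pricing table later in this article.

```python
def expected_cost(api_fee: float, p_human: float, human_cost: float) -> float:
    """Expected cost per task: API fee plus the chance-weighted cost of a human stepping in."""
    return api_fee + p_human * human_cost


# Illustrative numbers only (assumptions, not measurements).
high = expected_cost(api_fee=2.00, p_human=0.30, human_cost=50.0)
xhigh = expected_cost(api_fee=6.00, p_human=0.05, human_cost=50.0)

print(f"High:   ${high:.2f} per task")    # $17.00
print(f"X-high: ${xhigh:.2f} per task")   # $8.50
```

Under these assumed probabilities the more expensive tier comes out cheaper overall; with a lower intervention rate for High, the conclusion can flip.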
Workflow Restructuring Recommendations
Scenario 1: Code Review Pipeline
# Old approach
- Phase 1: Sonnet 4.7 High → Automated review
- Phase 2: Human review (edge cases High cannot handle)
- Cost: API fees + engineer time
# New approach (Sonnet 4.8)
- Phase 1: Sonnet 4.8 Medium → Routine review
- Phase 2: Sonnet 4.8 X-high → Complex review (replaces human)
- Cost: API fees (potentially lower than engineer time cost)
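The two-phase pipeline above might be wired up roughly as follows. This is a sketch, not a real client: `run_review` is a hypothetical callable you would implement against whatever API eventually exposes the effort setting, and the effort strings `"medium"` and `"x-high"` are assumed names taken from this article, not confirmed parameter values.

```python
from typing import Callable, Tuple

# Hypothetical signature: (effort, diff) -> (verdict, confident)
Reviewer = Callable[[str, str], Tuple[str, bool]]


def review(diff: str, run_review: Reviewer) -> Tuple[str, str]:
    """Run a cheap pass first and escalate only when it is not confident.

    Returns (effort_used, verdict).
    """
    verdict, confident = run_review("medium", diff)  # Phase 1: routine review
    if confident:
        return "medium", verdict
    verdict, _ = run_review("x-high", diff)          # Phase 2: complex review
    return "x-high", verdict
```

Escalating on a confidence signal keeps routine diffs on the cheap tier; how that signal is produced (a model self-report, a diff-size heuristic, a path filter) is left open here.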
Scenario 2: Large Codebase Refactoring
X-high’s deep reasoning capability is particularly suited for tasks requiring understanding of global architecture:
- File scanning depth: Expanded from hundreds of files to thousands
- Dependency analysis: Automatically builds complete dependency graphs
- Refactoring plans: Generates complete refactoring plans including rollback strategies
Scenario 3: Security Auditing
X-high’s multi-round self-verification loops are particularly suited for security scenarios:
- Round 1: Identify potential vulnerabilities
- Round 2: Verify exploitability of vulnerabilities
- Round 3: Generate fix plans
- Round 4: Verify fix plans don’t introduce new problems
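The four rounds above describe behavior the article attributes to the model itself. For readers who want the same shape in their own orchestration code, here is an external analogue. All names (`Finding`, `audit`, and the four injected callables) are hypothetical, and each callable stands in for a model call or a static-analysis step you would supply.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    description: str
    exploitable: bool = False
    fix: Optional[str] = None
    fix_verified: bool = False


def audit(
    code: str,
    identify: Callable[[str], List[str]],
    is_exploitable: Callable[[str, str], bool],
    propose_fix: Callable[[str, str], str],
    fix_is_safe: Callable[[str, str], bool],
) -> List[Finding]:
    """Run the four audit rounds in order and record the result of each."""
    findings = [Finding(d) for d in identify(code)]                       # Round 1
    for finding in findings:
        finding.exploitable = is_exploitable(code, finding.description)  # Round 2
        if not finding.exploitable:
            continue  # skip fix generation for non-exploitable findings
        finding.fix = propose_fix(code, finding.description)             # Round 3
        finding.fix_verified = fix_is_safe(code, finding.fix)            # Round 4
    return findings
```

Keeping each round as a separate callable makes it easy to log where a finding was dropped and to swap in a human check for any single round.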
Pricing Guesses and Cost Calculations
Based on Anthropic’s pricing history, X-high may cost 2-3x as much as High. Once the improvement in resolution rates is factored in, the comparison looks like this (the X-high figures below assume 3x):
| Scenario | High Mode | X-high Mode | Cost-Effectiveness |
|---|---|---|---|
| Simple code generation | $0.50/task | $1.50/task | High is better |
| Complex debugging | $2.00 + human $50 | $6.00 | X-high is better |
| Architecture review | $5.00 + human $100 | $15.00 | X-high is better |
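The table assumes High always needs human follow-up on hard tasks and X-high never does. A more cautious reading of the same numbers asks how often High must fall back to a human before X-high becomes the cheaper option. The sketch below computes that break-even rate; `break_even_rate` is a hypothetical helper, and it keeps the assumption that X-high needs no human at all.

```python
def break_even_rate(high_fee: float, xhigh_fee: float, human_cost: float) -> float:
    """Human-intervention rate under High above which X-high is cheaper.

    Solves high_fee + p * human_cost = xhigh_fee for p,
    assuming X-high never requires human intervention.
    """
    return (xhigh_fee - high_fee) / human_cost


# (High fee, X-high fee, human cost) taken from the table above.
rows = {
    "Complex debugging": (2.00, 6.00, 50.0),
    "Architecture review": (5.00, 15.00, 100.0),
}

for name, (high_fee, xhigh_fee, human_cost) in rows.items():
    rate = break_even_rate(high_fee, xhigh_fee, human_cost)
    print(f"{name}: X-high wins once High needs a human in more than {rate:.0%} of tasks")
```

With the table’s figures this gives 8% for complex debugging and 10% for architecture review, so X-high pays off even if High fails only occasionally, provided the guessed prices hold.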
Action Recommendations
- Test immediately after the May 6 conference: Once Sonnet 4.8 launches, compare High and X-high on your actual tasks
- Redesign Agent routing: Add X-high as a new routing target in your Agent frameworks
- Monitor cost changes: X-high’s larger reasoning budget means token consumption may increase significantly; set budget limits
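One way to combine the last two recommendations is a budget guard that the router consults before choosing X-high. This is a minimal sketch: `TokenBudget` and `choose_effort` are hypothetical names, the effort strings are assumed from this article, and the token estimate is something you would have to measure on your own workload.

```python
class TokenBudget:
    """Tracks token spend against a hard cap (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def can_afford(self, estimated_tokens: int) -> bool:
        return self.used + estimated_tokens <= self.limit

    def charge(self, tokens: int) -> None:
        """Record actual usage; refuse to go past the cap."""
        if not self.can_afford(tokens):
            raise RuntimeError(
                f"token budget exceeded: {self.used + tokens} > {self.limit}"
            )
        self.used += tokens


def choose_effort(budget: TokenBudget, estimated_xhigh_tokens: int) -> str:
    """Route to X-high only when its estimated token cost still fits the budget."""
    return "x-high" if budget.can_afford(estimated_xhigh_tokens) else "high"
```

Falling back to High instead of failing outright keeps the agent running when the budget is nearly spent, at the price of a lower expected resolution rate on the remaining tasks.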