Core Conclusion
xAI is training seven Grok models in parallel on its Colossus 2 cluster, the largest parallel training plan publicly disclosed to date. Combined with the just-released Grok 4.3 topping agentic tool-calling benchmarks, xAI is building out a complete model matrix from lightweight to ultra-large.
Training Scale Overview
According to disclosures on X, the model matrix currently training on Colossus 2 (five of the seven models have been detailed so far):
| Model Codename | Parameters | Positioning | Competing Against |
|---|---|---|---|
| Current Grok | 0.5T (500B) | Existing flagship | GPT-5.5, Claude Opus 4.7 |
| Grok 5 Small | 1T | Efficient inference | Gemini 2.5 Pro |
| Grok 5 Mid | 1.5T | Balanced performance | Claude Sonnet 4.5 |
| Grok 5 Large | 6T | Deep reasoning | GPT-6 (expected) |
| Grok 5 Max | 10T | Peak performance | No direct competitor |
A 10T-parameter Grok 5 Max, if trained successfully, would be the largest language model publicly known. For reference, GPT-4 is unofficially estimated at ~1.76T total parameters and Claude 3 Opus at 1-2T; neither figure has been confirmed by its vendor.
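To make these sizes concrete, here is a back-of-envelope sketch of what merely holding each model’s weights for inference would require; the dense-weights assumption, FP8 precision, and 192 GB-per-GPU figure are illustrative, not xAI disclosures.

```python
# Back-of-envelope serving footprint for the disclosed parameter counts.
# Assumptions (not from the source): dense weights, FP8 inference
# (1 byte/param), and 192 GB of HBM per B200-class accelerator.

HBM_PER_GPU_GB = 192  # illustrative per-GPU memory

models = {
    "Current Grok": 0.5e12,
    "Grok 5 Small": 1e12,
    "Grok 5 Mid": 1.5e12,
    "Grok 5 Large": 6e12,
    "Grok 5 Max": 10e12,
}

for name, params in models.items():
    weights_gb = params / 1e9               # FP8: 1 byte per parameter
    min_gpus = weights_gb / HBM_PER_GPU_GB  # weights only; KV cache is extra
    print(f"{name:13s} ~{weights_gb / 1e3:5.1f} TB weights, "
          f">= {min_gpus:5.1f} GPUs just to hold them")
```

Even at one byte per parameter, the 10T model’s weights span dozens of accelerators before a single KV cache is allocated, which is why inference cost and latency dominate the commercialization question.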
Colossus 2: Training Infrastructure
Colossus 2 is xAI’s ultra-large GPU cluster in Memphis. Key features:
- GPU scale: 200,000+ NVIDIA H100/B200 GPUs (exact number not fully disclosed)
- Network: custom InfiniScale architecture, addressing communication bottlenecks at the 10K+ GPU scale
- Power: Dedicated substation, peak power consumption exceeding 500MW
- Cooling: Full liquid cooling, PUE below 1.1
Infrastructure at this scale is what makes training seven large models at once feasible: each can be allocated tens of thousands of GPUs and finish training in weeks rather than months.
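As a sanity check on “weeks rather than months”, here is a rough estimate using the standard ~6ND training-FLOPs rule for dense transformers; the token count, GPU allocation, and sustained per-GPU throughput are all assumptions, not disclosed figures.

```python
# Rough training-time estimate via the ~6*N*D FLOPs rule for dense
# transformers. Every input below is an assumption for illustration;
# xAI has not disclosed these numbers.

N = 1.5e12              # parameters (the Grok 5 Mid row)
D = 10e12               # training tokens -- assumed
GPUS = 30_000           # assumed per-model slice of Colossus 2
SUSTAINED_FLOPS = 1e15  # assumed sustained per-GPU throughput (FLOP/s)

total_flops = 6 * N * D                           # ~6ND rule
seconds = total_flops / (GPUS * SUSTAINED_FLOPS)
print(f"~{seconds / 86_400:.0f} days (~{seconds / 604_800:.0f} weeks)")
```

Under these assumptions a mid-size run finishes in roughly five weeks; the 10T Max would need a larger GPU slice, a longer calendar window, or both.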
Grok 4.3: Already Delivered Capabilities
While the Grok 5 series trains, xAI released Grok 4.3 in early May 2026:
- Agentic tool calling: ranked #1 on the agentic tool-calling evaluation
- Inference speed: 100 tokens/second (server-side)
- Context window: 1M tokens
- Pricing: $1.25/MTok input, highly competitive (filling the full 1M-token context costs about $1.25 in input tokens)
Grok 4.3’s tool-calling capability is especially noteworthy. In the agent ecosystem, tool-calling accuracy directly determines an agent’s usability and reliability. That Grok 4.3 surpassed GPT-5.5 and Claude Opus 4.7 on this evaluation suggests xAI’s investment in agent infrastructure is paying off.
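For developers who want to evaluate the tool-calling claim firsthand, here is a minimal sketch against xAI’s OpenAI-compatible endpoint; the model name `grok-4.3` and the `get_weather` tool are assumptions for illustration, so check the xAI docs for the actual identifiers.

```python
# Minimal tool-calling sketch against xAI's OpenAI-compatible endpoint.
# The model name "grok-4.3" and the get_weather tool are illustrative
# assumptions, not confirmed identifiers.
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["XAI_API_KEY"],
                base_url="https://api.x.ai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for the demo
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="grok-4.3",  # assumed name, based on this article
    messages=[{"role": "user", "content": "What's the weather in Memphis?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect its arguments.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```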
Landscape Judgment: From “Single Flagship” to “Model Matrix”
xAI’s strategy shift is notable. Vendors have typically maintained two or three model sizes (large/medium/small); xAI training seven simultaneously signals:
- Use-case segmentation intensifying: different parameter sizes target different deployment scenarios (see the routing sketch after this list)
- Training efficiency improvements: Colossus 2’s compute surplus makes parallel training economically viable
- Rapid iteration: seven models in flight at once enable fast trial-and-error
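One way the segmentation argument could cash out in practice is tiered routing: send each task to the cheapest model whose capability ceiling covers it. The sketch below is hypothetical; the tier names, thresholds, and prices are illustrative, not a disclosed xAI API.

```python
# Hypothetical router over a small/mid/large model matrix. Tier names,
# difficulty thresholds, and prices are illustrative only.
from dataclasses import dataclass


@dataclass
class Tier:
    model: str
    max_difficulty: float  # route here if task difficulty is at or below this
    usd_per_mtok: float    # illustrative input price


TIERS = [
    Tier("grok-5-small", 0.3, 0.50),
    Tier("grok-5-mid",   0.7, 1.25),
    Tier("grok-5-large", 1.0, 5.00),
]


def route(difficulty: float) -> Tier:
    """Pick the cheapest tier whose difficulty ceiling covers the task."""
    for tier in TIERS:
        if difficulty <= tier.max_difficulty:
            return tier
    return TIERS[-1]  # fall back to the largest model


print(route(0.2).model)  # -> grok-5-small
print(route(0.9).model)  # -> grok-5-large
```

The point of a five-tier matrix is that this kind of cost-capability routing becomes much finer-grained than a large/medium/small lineup allows.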
Action Recommendations
| Your Role | Focus |
|---|---|
| Agent developers | Start with Grok 4.3 tool calling—low price, leading performance |
| Enterprise tech selection | Watch Grok 5 Small/Mid for optimal cost-performance balance |
| Researchers | Colossus 2’s parallel training architecture represents infrastructure evolution |
| Investors | Watch the 10T model’s commercialization path; balancing inference cost against latency is key |
Timeline: Grok 5 Small/Mid expected in 3-6 months, Large/Max in 6-12 months.