xAI Training 7 Grok Models Simultaneously on Colossus 2, Up to 10T Parameters

Core Conclusion

xAI is simultaneously training 7 different Grok models on its Colossus 2 cluster, the largest publicly disclosed parallel training effort to date. Combined with the just-released Grok 4.3 topping agentic tool-calling benchmarks, xAI is building a complete model matrix from lightweight to ultra-large.

Training Scale Overview

According to disclosures on the X platform, the model matrix currently training on Colossus 2 is:

| Model Codename | Parameters | Positioning | Competing Against |
|---|---|---|---|
| Current Grok | 0.5T (500B) | Existing flagship | GPT-5.5, Claude Opus 4.7 |
| Grok 5 Small | 1T | Efficient inference | Gemini 2.5 Pro |
| Grok 5 Mid | 1.5T | Balanced performance | Claude Sonnet 4.5 |
| Grok 5 Large | 6T | Deep reasoning | GPT-6 (expected) |
| Grok 5 Max | 10T | Peak performance | No direct competitor |

A 10T parameter Grok 5 Max, if successfully trained, would be the world’s largest single language model. For reference, GPT-4 is estimated at ~1.76T parameters, Claude 3 Opus at 1-2T.
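To put 10T parameters in perspective, a back-of-envelope memory calculation shows what serving such a model implies. This is a sketch under stated assumptions: it treats the parameter count as dense (if Grok 5 Max is a sparse mixture-of-experts, active memory would be lower) and uses the B200's published 192 GB of HBM.

```python
# Back-of-envelope memory math for a 10T-parameter model (assumption: dense
# parameters; a sparse MoE would need less active memory per token).

GB = 1024**3

def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory to hold the weights alone: no optimizer state, no activations."""
    return params * bytes_per_param / GB

params_10t = 10e12

bf16 = weight_memory_gb(params_10t, 2)   # bfloat16, typical for training/serving
fp8 = weight_memory_gb(params_10t, 1)    # aggressive inference quantization

# Minimum GPU count just to shard the bf16 weights across B200s (192 GB each),
# ignoring activations, KV cache, and optimizer state.
b200_mem_gb = 192
min_gpus = int(-(-bf16 // b200_mem_gb))  # ceiling division

print(f"bf16 weights: {bf16 / 1024:.1f} TiB")
print(f"fp8 weights:  {fp8 / 1024:.1f} TiB")
print(f"min B200s to hold bf16 weights: {min_gpus}")
```

Even before any KV cache or activation memory, the bf16 weights alone would occupy roughly 18 TiB, i.e. about a hundred B200s per model replica, which is why inference cost and latency (noted in the recommendations below) become the central commercialization question.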

Colossus 2: Training Infrastructure

Colossus 2 is xAI’s ultra-large GPU cluster in Memphis. Key features:

  • GPU scale: 200,000+ NVIDIA H100/B200 GPUs (exact number not fully disclosed)
  • Network: Custom InfiniScale architecture, addressing communication bottlenecks at 10K+ GPU scale
  • Power: Dedicated substation, peak power consumption exceeding 500MW
  • Cooling: Full liquid cooling, PUE below 1.1

This scale of infrastructure enables simultaneous training of 7 large models—each allocated tens of thousands of GPUs, completing training in weeks rather than months.
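The "weeks rather than months" claim can be sanity-checked with the standard C ≈ 6·N·D training-compute approximation (FLOPs ≈ 6 × parameters × training tokens). The token budget, per-model GPU allocation, per-GPU throughput, and utilization below are illustrative assumptions, not disclosed figures.

```python
# Rough training-time estimate via the common C ≈ 6·N·D approximation.
# All inputs below are illustrative assumptions, not xAI disclosures.

def training_days(params: float, tokens: float, gpus: int,
                  flops_per_gpu: float, mfu: float) -> float:
    total_flops = 6 * params * tokens          # C ≈ 6·N·D
    effective = gpus * flops_per_gpu * mfu     # sustained cluster FLOP/s
    return total_flops / effective / 86_400    # seconds -> days

# Assumptions: 10T training tokens, 30,000 GPUs allocated to this one model,
# ~2e15 dense bf16 FLOP/s per B200-class GPU, 40% model FLOPs utilization.
days_small = training_days(1e12, 10e12, 30_000, 2e15, 0.40)
print(f"Grok 5 Small (1T) estimate: ~{days_small:.0f} days")
```

Under these assumptions a 1T model finishes in about a month on a 30K-GPU slice, which is consistent with running several such slices in parallel on a 200K+ GPU cluster; the 6T and 10T models would need proportionally larger allocations or longer runs.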

Grok 4.3: Already Delivered Capabilities

Ahead of the Grok 5 series, xAI released Grok 4.3 in early May 2026:

  • Agentic tool calling: ranked #1 in agentic tool-calling evaluation
  • Inference speed: 100 tokens/second (server-side)
  • Context window: 1M tokens
  • Pricing: $1.25/MTok input, highly competitive
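The disclosed numbers above translate directly into per-request economics. A minimal arithmetic sketch (output-token pricing was not disclosed, so only input cost is computed; latency uses the quoted 100 tokens/second server-side decode rate):

```python
# Per-request cost/latency implied by the disclosed Grok 4.3 numbers.
# Output pricing was not disclosed, so only input cost is computed here.

INPUT_PRICE_PER_MTOK = 1.25   # USD per million input tokens (disclosed)
DECODE_TOK_PER_S = 100        # server-side generation speed (disclosed)

def input_cost(tokens: int) -> float:
    """USD cost of the prompt tokens for one request."""
    return tokens / 1e6 * INPUT_PRICE_PER_MTOK

def decode_seconds(output_tokens: int) -> float:
    """Time to generate a response at the quoted decode rate."""
    return output_tokens / DECODE_TOK_PER_S

# Filling the entire 1M-token context window costs:
print(f"1M-token prompt: ${input_cost(1_000_000):.2f}")
# A 4,096-token answer takes roughly:
print(f"4k-token answer: ~{decode_seconds(4_096):.0f}s")
```

In other words, a request that saturates the full context window costs $1.25 in input tokens alone, and a long answer arrives in well under a minute.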

Grok 4.3’s tool-calling capability is especially noteworthy. In the agent ecosystem, tool-calling accuracy directly determines an agent’s usability and reliability. Grok 4.3 surpassing GPT-5.5 and Claude Opus 4.7 in this evaluation suggests xAI’s investment in agent infrastructure is paying off.
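To make concrete why tool-calling accuracy gates agent reliability, here is a minimal dispatcher sketch. The article does not specify Grok 4.3's API shape, so this assumes the OpenAI-style function/tool schema most agent frameworks target; `get_weather` and its parameters are hypothetical examples.

```python
import json

# Hypothetical tool definition in the OpenAI-style schema (an assumption:
# the article does not describe Grok 4.3's actual API format).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch(tool_call: dict, registry: dict) -> str:
    """Validate and execute a model-emitted tool call against local handlers."""
    name = tool_call["function"]["name"]
    # Models emit arguments as a JSON string; malformed JSON or a wrong tool
    # name fails here, which is exactly where benchmark accuracy shows up.
    args = json.loads(tool_call["function"]["arguments"])
    if name not in registry:
        raise KeyError(f"model requested unknown tool: {name}")
    return registry[name](**args)

# Simulated model output for one round trip:
fake_call = {"function": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Memphis"})}}
result = dispatch(fake_call, {"get_weather": lambda city: f"{city}: 31C, clear"})
print(result)
```

Every hallucinated tool name, malformed JSON payload, or missing required argument surfaces as a hard failure at this dispatch boundary, which is why a benchmark lead in tool-calling accuracy translates directly into fewer broken agent runs.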

Landscape Judgment: From “Single Flagship” to “Model Matrix”

xAI’s strategy shift is notable. Vendors have typically maintained 2-3 models (large/medium/small); xAI training 7 simultaneously points to three drivers:

  1. Scenario segmentation intensifying: Different parameter sizes target different deployment scenarios
  2. Training efficiency improvements: Colossus 2’s compute surplus makes parallel training economically viable
  3. Rapid iteration: 7 models training simultaneously enables fast trial-and-error

Action Recommendations

| Your Role | Focus |
|---|---|
| Agent developers | Start with Grok 4.3 tool calling: low price, leading performance |
| Enterprise tech selection | Watch Grok 5 Small/Mid for optimal cost-performance balance |
| Researchers | Colossus 2's parallel training architecture represents infrastructure evolution |
| Investors | 10T model commercialization path; inference cost and latency balance is key |

Timeline: Grok 5 Small/Mid expected in 3-6 months, Large/Max in 6-12 months.