Core Conclusion
xAI is training seven Grok models in parallel on its Colossus 2 cluster, the largest parallel training plan publicly disclosed to date. Combined with the just-released Grok 4.3 topping agentic tool-calling benchmarks, xAI is building out a complete model matrix from lightweight to ultra-large.
Training Scale Overview
According to disclosures on X, the model matrix currently training on Colossus 2 (five of the seven models have been detailed so far):
| Model Codename | Parameters | Positioning | Competing Against |
|---|---|---|---|
| Current Grok | 0.5T (500B) | Existing flagship | GPT-5.5, Claude Opus 4.7 |
| Grok 5 Small | 1T | Efficient inference | Gemini 2.5 Pro |
| Grok 5 Mid | 1.5T | Balanced performance | Claude Sonnet 4.5 |
| Grok 5 Large | 6T | Deep reasoning | GPT-6 (expected) |
| Grok 5 Max | 10T | Peak performance | No direct competitor |
A 10T-parameter Grok 5 Max, if trained successfully, would be the largest language model publicly known. For reference, GPT-4 is unofficially estimated at ~1.76T total parameters and Claude 3 Opus at 1-2T; neither figure has been confirmed by its vendor.
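To make these sizes concrete, here is a back-of-envelope sketch of what merely holding each model’s weights for inference would require; the dense-weights assumption, FP8 precision, and 192 GB-per-GPU figure are illustrative, not xAI disclosures.

```python
# Back-of-envelope serving footprint for the disclosed parameter counts.
# Assumptions (not from the source): dense weights, FP8 inference
# (1 byte/param), and 192 GB of HBM per B200-class accelerator.

HBM_PER_GPU_GB = 192  # illustrative per-GPU memory

models = {
    "Current Grok": 0.5e12,
    "Grok 5 Small": 1e12,
    "Grok 5 Mid": 1.5e12,
    "Grok 5 Large": 6e12,
    "Grok 5 Max": 10e12,
}

for name, params in models.items():
    weights_gb = params / 1e9               # FP8: 1 byte per parameter
    min_gpus = weights_gb / HBM_PER_GPU_GB  # weights only; KV cache is extra
    print(f"{name:13s} ~{weights_gb / 1e3:5.1f} TB weights, "
          f">= {min_gpus:5.1f} GPUs just to hold them")
```

Even at one byte per parameter, the 10T model’s weights span dozens of accelerators before a single KV cache is allocated, which is why inference cost and latency dominate the commercialization question.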
Colossus 2: Training Infrastructure
Colossus 2 is xAI’s ultra-large GPU cluster in Memphis. Key features:
- GPU scale: 200,000+ NVIDIA H100/B200 GPUs (exact number not fully disclosed)
- Network: custom InfiniScale architecture, addressing communication bottlenecks at the 10K+ GPU scale
- Power: Dedicated substation, peak power consumption exceeding 500MW
- Cooling: Full liquid cooling, PUE below 1.1
Infrastructure at this scale is what makes training seven large models at once feasible: each can be allocated tens of thousands of GPUs and finish training in weeks rather than months.
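As a sanity check on “weeks rather than months”, here is a rough estimate using the standard ~6ND training-FLOPs rule for dense transformers; the token count, GPU allocation, and sustained per-GPU throughput are all assumptions, not disclosed figures.

```python
# Rough training-time estimate via the ~6*N*D FLOPs rule for dense
# transformers. Every input below is an assumption for illustration;
# xAI has not disclosed these numbers.

N = 1.5e12              # parameters (the Grok 5 Mid row)
D = 10e12               # training tokens -- assumed
GPUS = 30_000           # assumed per-model slice of Colossus 2
SUSTAINED_FLOPS = 1e15  # assumed sustained per-GPU throughput (FLOP/s)

total_flops = 6 * N * D                           # ~6ND rule
seconds = total_flops / (GPUS * SUSTAINED_FLOPS)
print(f"~{seconds / 86_400:.0f} days (~{seconds / 604_800:.0f} weeks)")
```

Under these assumptions a mid-size run finishes in roughly five weeks; the 10T Max would need a larger GPU slice, a longer calendar window, or both.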
Grok 4.3: Already Delivered Capabilities
While the Grok 5 series trains, xAI released Grok 4.3 in early May 2026:
- Agentic tool calling: ranked #1 on the agentic tool-calling evaluation
- Inference speed: 100 tokens/second (server-side)
- Context window: 1M tokens
- Pricing: $1.25/MTok input, highly competitive (filling the full 1M-token context costs about $1.25 in input tokens)
Grok 4.3’s tool-calling capability is especially noteworthy. In the agent ecosystem, tool-calling accuracy directly determines an agent’s usability and reliability. That Grok 4.3 surpassed GPT-5.5 and Claude Opus 4.7 on this evaluation suggests xAI’s investment in agent infrastructure is paying off.
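For developers who want to evaluate the tool-calling claim firsthand, here is a minimal sketch against xAI’s OpenAI-compatible endpoint; the model name `grok-4.3` and the `get_weather` tool are assumptions for illustration, so check the xAI docs for the actual identifiers.

```python
# Minimal tool-calling sketch against xAI's OpenAI-compatible endpoint.
# The model name "grok-4.3" and the get_weather tool are illustrative
# assumptions, not confirmed identifiers.
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["XAI_API_KEY"],
                base_url="https://api.x.ai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for the demo
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="grok-4.3",  # assumed name, based on this article
    messages=[{"role": "user", "content": "What's the weather in Memphis?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect its arguments.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```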
Landscape Judgment: From “Single Flagship” to “Model Matrix”
xAI’s strategy shift is notable. Vendors have typically maintained two or three model sizes (large/medium/small); xAI training seven simultaneously signals:
- Use-case segmentation intensifying: different parameter sizes target different deployment scenarios (see the routing sketch after this list)
- Training efficiency improvements: Colossus 2’s compute surplus makes parallel training economically viable
- Rapid iteration: seven models in flight at once enable fast trial-and-error
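One way the segmentation argument could cash out in practice is tiered routing: send each task to the cheapest model whose capability ceiling covers it. The sketch below is hypothetical; the tier names, thresholds, and prices are illustrative, not a disclosed xAI API.

```python
# Hypothetical router over a small/mid/large model matrix. Tier names,
# difficulty thresholds, and prices are illustrative only.
from dataclasses import dataclass


@dataclass
class Tier:
    model: str
    max_difficulty: float  # route here if task difficulty is at or below this
    usd_per_mtok: float    # illustrative input price


TIERS = [
    Tier("grok-5-small", 0.3, 0.50),
    Tier("grok-5-mid",   0.7, 1.25),
    Tier("grok-5-large", 1.0, 5.00),
]


def route(difficulty: float) -> Tier:
    """Pick the cheapest tier whose difficulty ceiling covers the task."""
    for tier in TIERS:
        if difficulty <= tier.max_difficulty:
            return tier
    return TIERS[-1]  # fall back to the largest model


print(route(0.2).model)  # -> grok-5-small
print(route(0.9).model)  # -> grok-5-large
```

The point of a five-tier matrix is that this kind of cost-capability routing becomes much finer-grained than a large/medium/small lineup allows.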
Action Recommendations
| Your Role | Focus |
|---|---|
| Agent developers | Start with Grok 4.3 tool calling—low price, leading performance |
| Enterprise tech selection | Watch Grok 5 Small/Mid for optimal cost-performance balance |
| Researchers | Colossus 2’s parallel training architecture represents infrastructure evolution |
| Investors | Watch the 10T model’s commercialization path; balancing inference cost against latency is key |
Timeline: Grok 5 Small/Mid expected in 3-6 months, Large/Max in 6-12 months.