Core Conclusion
The AI chip market is undergoing a structural shift: from NVIDIA's near-monopoly in the training era to multi-vendor competition in the inference era. NVIDIA's Vera Rubin architecture promises a 35x inference throughput improvement, but competitors such as AMD, Groq, and Cerebras are eroding its share across market segments. Hyperscaler AI Capex, projected to exceed $600B annually within the next 4-5 years, is shifting from “buying GPUs for training” to “buying inference chips for services.”
NVIDIA Rubin: Technical Details of the 35x Leap
NVIDIA disclosed key information about its next-generation inference architecture in late April 2026:
| Metric | Hopper (H200) | Blackwell (B200) | Vera Rubin (GB300) |
|---|---|---|---|
| Inference throughput | Baseline | ~5x | ~35x |
| Power efficiency | Baseline | ~3x | ~10x |
| Memory bandwidth | 3.35 TB/s | 8 TB/s | 12+ TB/s |
| Shipping | 2024 Q1 | 2025 Q2 | 2026 Q3 (ahead of schedule) |
| Primary scenario | Training + inference | Training-focused | Inference-optimized |
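To see why memory bandwidth is a headline metric for inference, note that autoregressive decoding streams the full set of model weights for every generated token, so single-stream throughput is roughly bounded by bandwidth divided by weight bytes. A back-of-the-envelope sketch; the 70B FP16 model is an illustrative assumption, not a benchmark:

```python
# Bandwidth-bound decode throughput: each generated token streams all model
# weights from memory, so tokens/s (batch 1) <= bandwidth / weight_bytes.
# KV-cache traffic and overlap effects are ignored for simplicity.

def decode_tokens_per_second(bandwidth_tbs: float,
                             params_billions: float,
                             bytes_per_param: float = 2.0) -> float:
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_tbs * 1e12 / weight_bytes

# Hypothetical 70B-parameter model served in FP16 (2 bytes per parameter):
for name, bw_tbs in [("H200", 3.35), ("B200", 8.0), ("Rubin", 12.0)]:
    print(f"{name}: ~{decode_tokens_per_second(bw_tbs, 70):.0f} tokens/s per stream")
```

By this estimate, raw bandwidth accounts for only ~3.6x of Rubin's gain over Hopper; the quoted 35x must therefore come mostly from batching, lower-precision formats, and architectural changes.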
Key insight: Rubin shipping ahead of schedule indicates NVIDIA is already feeling competitive pressure from AMD and custom ASICs.
Hyperscaler Capex: Where Is $600B Flowing?
According to the latest analyst forecasts (April 29, 2026), hyperscaler AI Capex is trending as follows:
| Year | Google | Amazon | Microsoft | Meta | Total |
|---|---|---|---|---|---|
| 2024 | $52B | $75B | $48B | $38B | ~$213B |
| 2025 | $75B | $100B | $65B | $55B | ~$295B |
| 2026E | $90B+ | $130B+ | $80B+ | $65B+ | $365B+ |

Analysts project the combined annual run rate to exceed $600B over the next 4-5 years.
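These totals imply the following growth math (a quick check using only the table's own figures, nothing independently sourced):

```python
# Implied year-over-year growth from the Capex totals above, plus the rate
# needed to reach the projected $600B+ annual run rate.
capex_totals = {2024: 213, 2025: 295, 2026: 365}  # $B, table estimates

years = sorted(capex_totals)
for prev, cur in zip(years, years[1:]):
    yoy = (capex_totals[cur] / capex_totals[prev] - 1) * 100
    print(f"{prev} -> {cur}: ~{yoy:.0f}% YoY")

# Reaching $600B from $365B in ~3 years implies roughly 18%/yr growth:
print(f"Implied growth to $600B by ~2029: ~{((600 / 365) ** (1 / 3) - 1) * 100:.0f}%/yr")
```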
Structural shift in Capex:
- From training to inference: training accounted for ~60% of AI Capex in 2025; inference spend is expected to exceed 50% in 2026
- From general-purpose to specialized: procurement of custom inference ASICs is increasing
- From a single GPU vendor to a diverse mix: the AMD MI series, Groq LPU, and Cerebras wafer-scale systems are winning more orders
AMD’s Inference Counterattack
AMD is transforming from “training follower” to “inference leader”:
AMD Halo Box: A New Class of Edge Inference Device
- Hardware: Ryzen AI MAX+ 395 (16 Zen 5 cores + 40 RDNA 3.5 CUs + XDNA 2 NPU)
- Memory: 128GB unified memory (see the sizing sketch after this list)
- Positioning: Personal/edge AI inference device
- Shipping: June 2026
- Price: Estimated $1,500-$2,000
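The 128GB of unified memory is the headline spec for edge inference, since it bounds the largest model the box can host. A rough sizing sketch; the quantization levels and the 20% headroom reserved for KV cache and the OS are illustrative assumptions, not AMD specifications:

```python
# Rough capacity check: max model size (in parameters) that fits in unified
# memory at a given weight precision, leaving headroom for KV cache and OS.
# The 20% headroom figure is an assumption, not a vendor number.

def max_params_billions(memory_gb: float, bits_per_param: float,
                        headroom: float = 0.20) -> float:
    usable_bytes = memory_gb * (1 - headroom) * 1e9
    return usable_bytes / (bits_per_param / 8) / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: up to ~{max_params_billions(128, bits):.0f}B parameters")
# Roughly: 16-bit ~51B, 8-bit ~102B, 4-bit ~205B
```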
AMD MI Series: Datacenter Inference
- Hyperscalers have confirmed increased procurement of AMD MI350/MI400
- MI350 offers better price-performance than NVIDIA's H200 for inference (illustrative cost-per-token sketch below)
- AMD datacenter GPU revenue expected to grow 80%+ in 2026
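“Better price-performance for inference” ultimately cashes out as cost per token: effective hourly cost divided by tokens generated per hour. A minimal sketch of the comparison; the throughput and $/hour inputs are placeholders, not measured MI350 or H200 figures:

```python
# Cost per million output tokens from effective hourly cost and sustained
# throughput. Both inputs below are HYPOTHETICAL placeholders for illustration.

def cost_per_million_tokens(dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    return dollars_per_hour / (tokens_per_second * 3600) * 1e6

# Hypothetical accelerator A (cheaper, slower) vs accelerator B:
print(f"A: ${cost_per_million_tokens(2.5, 1800):.2f} / 1M tokens")
print(f"B: ${cost_per_million_tokens(4.0, 2500):.2f} / 1M tokens")
```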
Inference Chip Competitive Landscape
| Player | Solution | Advantage Scenario | Market Share Trend |
|---|---|---|---|
| NVIDIA | Vera Rubin / GB300 | High-performance inference | Dominant but declining share |
| AMD | MI350 / Halo Box | Cost-performance + edge | Rapidly rising |
| Groq | LPU | Ultra-low latency inference | Niche growth |
| Cerebras | Wafer-Scale | Large model inference | Niche |
| Google | TPU v5p/v6 | Internal use | Stable |
| Amazon | Trainium/Inferentia | AWS internal | Growing |
| Huawei | Ascend 910C | China market | Rapid growth |
Investment Logic
Positive Directions
- AI semiconductor full stack: Not just GPUs, but EDA software, custom ASICs, advanced packaging, optical interconnects, HBM memory
- Edge inference: AMD Halo Box represents a new track for personal AI inference
- Inference optimization software: serving stacks such as vLLM and TensorRT-LLM will grow alongside the hardware (minimal example below)
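One reason the software layer matters: switching hardware vendors is only practical if the serving stack abstracts the accelerator. A minimal offline-generation sketch using vLLM's LLM/SamplingParams API; the model checkpoint is a placeholder, swap in whatever you serve:

```python
# Minimal vLLM offline inference: the same script can target different
# accelerators depending on the vLLM build, which is what makes the
# serving layer a natural vendor-abstraction point.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder checkpoint for a smoke test
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The shift from training to inference means"], params)
for out in outputs:
    print(out.outputs[0].text)
```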
Risk Factors
- NVIDIA's valuation already prices in most of the expected growth
- Intensifying inference-chip competition may lead to price wars
- Advances in model compression may reduce inference hardware demand
Action Recommendations
For technology decision-makers:
- Evaluate multiple vendors for H2 2026 inference hardware procurement rather than defaulting to NVIDIA (see the benchmarking sketch after this list)
- Assess AMD Halo Box feasibility for edge inference scenarios
- Monitor inference optimization software stack maturity
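When comparing vendors as recommended above, run every candidate through the same measurement harness so throughput numbers are apples-to-apples. A sketch of such a harness; the generate_fn callables and the backends dict are hypothetical stand-ins for each vendor's client code:

```python
# Vendor-agnostic throughput harness: time identical generation work across
# backends. Each backend supplies a generate_fn(prompt, new_tokens) callable;
# those callables (and the backends dict) are hypothetical placeholders.
import time
from typing import Callable

def tokens_per_second(generate_fn: Callable[[str, int], str],
                      prompt: str, new_tokens: int = 256, runs: int = 5) -> float:
    generate_fn(prompt, new_tokens)            # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(runs):
        generate_fn(prompt, new_tokens)
    elapsed = time.perf_counter() - start
    return runs * new_tokens / elapsed

# backends = {"nvidia": nv_generate, "amd": amd_generate, "groq": groq_generate}
# for name, fn in backends.items():
#     print(f"{name}: {tokens_per_second(fn, 'benchmark prompt'):.0f} tokens/s")
```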
For investors:
- AI semiconductors are no longer a “just buy NVIDIA” trade; the opportunity now spans the full stack
- Edge inference, HBM memory, and advanced packaging are high-conviction growth directions
- Watch AMD growth delivery in both datacenter and edge markets