Core Conclusion
The AI chip market is undergoing a structural shift: from NVIDIA's near-monopoly in the training era to multi-vendor competition in the inference era. NVIDIA's Vera Rubin architecture promises a 35x inference throughput improvement, but competitors such as AMD, Groq, and Cerebras are eroding its share across market segments. Hyperscaler AI Capex, projected to exceed $600B annually within the next 4-5 years, is shifting from “buying GPUs for training” to “buying inference chips for services.”
NVIDIA Rubin: Technical Details of the 35x Leap
NVIDIA disclosed key information about its next-generation inference architecture in late April 2026:
| Metric | Hopper (H200) | Blackwell (B200) | Vera Rubin (GB300) |
|---|---|---|---|
| Inference throughput | Baseline | ~5x | ~35x |
| Power efficiency | Baseline | ~3x | ~10x |
| Memory bandwidth | 3.35 TB/s | 8 TB/s | 12+ TB/s |
| Shipping | 2024 Q1 | 2025 Q2 | 2026 Q3 (ahead of schedule) |
| Primary scenario | Training + inference | Training-focused | Inference-optimized |
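To see why memory bandwidth is a headline metric for inference, note that autoregressive decoding streams the full set of model weights for every generated token, so single-stream throughput is roughly bounded by bandwidth divided by weight bytes. A back-of-the-envelope sketch; the 70B FP16 model is an illustrative assumption, not a benchmark:

```python
# Bandwidth-bound decode throughput: each generated token streams all model
# weights from memory, so tokens/s (batch 1) <= bandwidth / weight_bytes.
# KV-cache traffic and overlap effects are ignored for simplicity.

def decode_tokens_per_second(bandwidth_tbs: float,
                             params_billions: float,
                             bytes_per_param: float = 2.0) -> float:
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_tbs * 1e12 / weight_bytes

# Hypothetical 70B-parameter model served in FP16 (2 bytes per parameter):
for name, bw_tbs in [("H200", 3.35), ("B200", 8.0), ("Rubin", 12.0)]:
    print(f"{name}: ~{decode_tokens_per_second(bw_tbs, 70):.0f} tokens/s per stream")
```

By this estimate, raw bandwidth accounts for only ~3.6x of Rubin's gain over Hopper; the quoted 35x must therefore come mostly from batching, lower-precision formats, and architectural changes.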
Key insight: Rubin shipping ahead of schedule indicates NVIDIA is already feeling competitive pressure from AMD and custom ASICs.
Hyperscaler Capex: Where Is $600B Flowing?
According to the latest analyst forecasts (April 29, 2026), hyperscaler AI Capex is trending as follows:
| Year | Google | Amazon | Microsoft | Meta | Total |
|---|---|---|---|---|---|
| 2024 | $52B | $75B | $48B | $38B | ~$213B |
| 2025 | $75B | $100B | $65B | $55B | ~$295B |
| 2026E | $90B+ | $130B+ | $80B+ | $65B+ | $365B+ |

Analysts project the combined annual run rate to exceed $600B over the next 4-5 years.
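These totals imply the following growth math (a quick check using only the table's own figures, nothing independently sourced):

```python
# Implied year-over-year growth from the Capex totals above, plus the rate
# needed to reach the projected $600B+ annual run rate.
capex_totals = {2024: 213, 2025: 295, 2026: 365}  # $B, table estimates

years = sorted(capex_totals)
for prev, cur in zip(years, years[1:]):
    yoy = (capex_totals[cur] / capex_totals[prev] - 1) * 100
    print(f"{prev} -> {cur}: ~{yoy:.0f}% YoY")

# Reaching $600B from $365B in ~3 years implies roughly 18%/yr growth:
print(f"Implied growth to $600B by ~2029: ~{((600 / 365) ** (1 / 3) - 1) * 100:.0f}%/yr")
```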
Structural shift in Capex:
- From training to inference: training accounted for ~60% of AI Capex in 2025; inference spend is expected to exceed 50% in 2026
- From general-purpose to specialized: procurement of custom inference ASICs is increasing
- From a single GPU vendor to a diverse mix: the AMD MI series, Groq LPU, and Cerebras wafer-scale systems are winning more orders
AMD’s Inference Counterattack
AMD is transforming from “training follower” to “inference leader”:
AMD Halo Box: A New Class of Edge Inference Device
- Hardware: Ryzen AI MAX+ 395 (16 Zen 5 cores + 40 RDNA 3.5 CUs + XDNA 2 NPU)
- Memory: 128GB unified memory (see the sizing sketch after this list)
- Positioning: Personal/edge AI inference device
- Shipping: June 2026
- Price: Estimated $1,500-$2,000
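The 128GB of unified memory is the headline spec for edge inference, since it bounds the largest model the box can host. A rough sizing sketch; the quantization levels and the 20% headroom reserved for KV cache and the OS are illustrative assumptions, not AMD specifications:

```python
# Rough capacity check: max model size (in parameters) that fits in unified
# memory at a given weight precision, leaving headroom for KV cache and OS.
# The 20% headroom figure is an assumption, not a vendor number.

def max_params_billions(memory_gb: float, bits_per_param: float,
                        headroom: float = 0.20) -> float:
    usable_bytes = memory_gb * (1 - headroom) * 1e9
    return usable_bytes / (bits_per_param / 8) / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: up to ~{max_params_billions(128, bits):.0f}B parameters")
# Roughly: 16-bit ~51B, 8-bit ~102B, 4-bit ~205B
```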
AMD MI Series: Datacenter Inference
- Hyperscalers have confirmed increased procurement of AMD MI350/MI400
- MI350 offers better price-performance than NVIDIA's H200 for inference (illustrative cost-per-token sketch below)
- AMD datacenter GPU revenue expected to grow 80%+ in 2026
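“Better price-performance for inference” ultimately cashes out as cost per token: effective hourly cost divided by tokens generated per hour. A minimal sketch of the comparison; the throughput and $/hour inputs are placeholders, not measured MI350 or H200 figures:

```python
# Cost per million output tokens from effective hourly cost and sustained
# throughput. Both inputs below are HYPOTHETICAL placeholders for illustration.

def cost_per_million_tokens(dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    return dollars_per_hour / (tokens_per_second * 3600) * 1e6

# Hypothetical accelerator A (cheaper, slower) vs accelerator B:
print(f"A: ${cost_per_million_tokens(2.5, 1800):.2f} / 1M tokens")
print(f"B: ${cost_per_million_tokens(4.0, 2500):.2f} / 1M tokens")
```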
Inference Chip Competitive Landscape
| Player | Solution | Advantage Scenario | Market Share Trend |
|---|---|---|---|
| NVIDIA | Vera Rubin / GB300 | High-performance inference | Dominant but declining share |
| AMD | MI350 / Halo Box | Cost-performance + edge | Rapidly rising |
| Groq | LPU | Ultra-low latency inference | Niche growth |
| Cerebras | Wafer-Scale | Large model inference | Niche |
| Google | TPU v5p/v6 | Internal use | Stable |
| Amazon | Trainium/Inferentia | AWS internal | Growing |
| Huawei | Ascend 910C | China market | Rapid growth |
Investment Logic
Positive Directions
- AI semiconductor full stack: Not just GPUs, but EDA software, custom ASICs, advanced packaging, optical interconnects, HBM memory
- Edge inference: AMD Halo Box represents a new track for personal AI inference
- Inference optimization software: serving stacks such as vLLM and TensorRT-LLM will grow alongside the hardware (minimal example below)
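One reason the software layer matters: switching hardware vendors is only practical if the serving stack abstracts the accelerator. A minimal offline-generation sketch using vLLM's LLM/SamplingParams API; the model checkpoint is a placeholder, swap in whatever you serve:

```python
# Minimal vLLM offline inference: the same script can target different
# accelerators depending on the vLLM build, which is what makes the
# serving layer a natural vendor-abstraction point.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder checkpoint for a smoke test
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The shift from training to inference means"], params)
for out in outputs:
    print(out.outputs[0].text)
```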
Risk Factors
- NVIDIA's valuation already prices in most of the expected growth
- Intensifying inference-chip competition may lead to price wars
- Advances in model compression may reduce inference hardware demand
Action Recommendations
For technology decision-makers:
- Evaluate multiple vendors for H2 2026 inference hardware procurement rather than defaulting to NVIDIA (see the benchmarking sketch after this list)
- Assess AMD Halo Box feasibility for edge inference scenarios
- Monitor inference optimization software stack maturity
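When comparing vendors as recommended above, run every candidate through the same measurement harness so throughput numbers are apples-to-apples. A sketch of such a harness; the generate_fn callables and the backends dict are hypothetical stand-ins for each vendor's client code:

```python
# Vendor-agnostic throughput harness: time identical generation work across
# backends. Each backend supplies a generate_fn(prompt, new_tokens) callable;
# those callables (and the backends dict) are hypothetical placeholders.
import time
from typing import Callable

def tokens_per_second(generate_fn: Callable[[str, int], str],
                      prompt: str, new_tokens: int = 256, runs: int = 5) -> float:
    generate_fn(prompt, new_tokens)            # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(runs):
        generate_fn(prompt, new_tokens)
    elapsed = time.perf_counter() - start
    return runs * new_tokens / elapsed

# backends = {"nvidia": nv_generate, "amd": amd_generate, "groq": groq_generate}
# for name, fn in backends.items():
#     print(f"{name}: {tokens_per_second(fn, 'benchmark prompt'):.0f} tokens/s")
```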
For investors:
- AI semiconductors are no longer a “just buy NVIDIA” trade; the opportunity now spans the full stack
- Edge inference, HBM memory, and advanced packaging are high-conviction growth directions
- Watch AMD growth delivery in both datacenter and edge markets