Bottom Line
Wall Street is reassessing the investment logic for AI infrastructure: as AI moves from “single inference” to “autonomous operation,” the center of compute demand is shifting from GPUs to CPUs and memory.
This is not just analysts’ armchair theorizing. DeepSeek V4’s million-token context, OpenClaw’s 24/7 local Agent operations, Gemini Daily Brief’s daily data scanning — these use cases are consuming massive amounts of CPU compute and memory bandwidth, not just GPU parallel computing power.
Core Arguments from the Report
Why the Agent Era Needs More CPU and Memory
| Workload Type | Primary Hardware | Trend |
|---|---|---|
| Model training | GPU (NVIDIA H200/B200) | Continued growth |
| Single inference (chat) | GPU | Growing |
| Autonomous Agent operation | CPU + memory | Explosive growth |
| Context management (million tokens) | Memory | Explosive growth |
| Local model deployment | CPU + memory | Rapid growth |
Key logic:
- Agents need continuous operation: Unlike one-off chat requests, Agents need to continuously monitor, decide, and execute in the background — this requires CPUs to stay online long-term
- Context windows are ballooning: DeepSeek V4 supports 1M token context, which must reside in memory
- Edge inference is rising: The local deployment trend (like OpenClaw) means more inference is happening on CPUs rather than cloud GPUs
List of Beneficiary Companies
CPUs & Accelerators:
- NVIDIA (not just GPUs, CPU product line also expanding)
- AMD (EPYC server CPU + MI300 accelerator)
- Intel (Xeon server CPU + Gaudi accelerator)
- Arm (architecture licensing, used in virtually all mobile and edge AI)
Memory:
- Micron (HBM and DDR5 demand surging)
- Samsung (HBM3E capacity expansion)
- SK hynix (HBM market leader, NVIDIA’s primary supplier)
Chip Manufacturing & Equipment:
- TSMC (dominant foundry for advanced process nodes)
- ASML (EUV lithography monopoly)
Supporting Data
Several key data points validate this trend:
- DeepSeek V4: 1M token context means each conversation requires approximately 2GB of memory to store context state
- OpenClaw: 320K GitHub stars, mostly deployed on personal devices (primarily CPU inference)
- Huawei Ascend: Expected 2026 AI chip revenue of $12B, much of it in CPU-coprocessor architectures
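The memory-per-context figure comes from KV-cache arithmetic: every token in the window keeps a key and value tensor resident per layer. A back-of-envelope formula, with illustrative hyperparameters that are assumptions on my part (not DeepSeek V4's actual architecture — real footprints vary by orders of magnitude depending on KV compression, which is how figures as low as ~2GB per million tokens become possible):

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int) -> int:
    """KV-cache size: 2 tensors (K and V) per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Assumed config: 32 layers, 8 KV heads, head dim 128, FP16 (2 bytes).
per_token = kv_cache_bytes(1, 32, 8, 128, 2)        # 131,072 bytes/token
uncompressed_gib = kv_cache_bytes(1_000_000, 32, 8, 128, 2) / 2**30
```

Under these assumed numbers an uncompressed 1M-token cache would run to over 100 GiB, which is exactly why aggressive KV compression and large, fast memory pools are the binding constraints for long-context agents.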
Landscape Assessment
Impact on NVIDIA
NVIDIA remains the dominant force in AI chips, but this report reminds us:
- NVIDIA’s moat is in GPU training
- On the Agent operations side (CPU + memory), NVIDIA’s market share is not as dominant as in training
- NVIDIA’s Grace CPU + BlueField DPU is the response strategy, but still in early stages
Opportunities for AMD and Intel
- AMD’s EPYC + MI300 combination has a cost advantage in inference
- Intel’s Gaudi 3 accelerator is capturing part of the inference market
- Both companies are betting on the “AI PC” concept — local CPU inference is the core selling point
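Why memory matters so much for local CPU inference: autoregressive decoding is typically memory-bandwidth-bound, because generating each token streams (roughly) the entire weight set through the cores. A rough upper-bound estimate, using illustrative numbers I am assuming (a 7B model quantized to 4-bit, a desktop DDR5 system):

```python
def decode_tokens_per_sec(model_bytes: float, mem_bandwidth_bytes_s: float) -> float:
    """Upper bound on single-stream decode throughput when memory-bound:
    each token reads ~all weights once, so tokens/s <= bandwidth / model size."""
    return mem_bandwidth_bytes_s / model_bytes

# Assumed: ~3.5 GB weights (7B params at 4-bit), ~80 GB/s DDR5 bandwidth.
tps = decode_tokens_per_sec(3.5e9, 80e9)   # ~23 tokens/s ceiling
```

The same formula explains the "AI PC" pitch: doubling memory bandwidth roughly doubles the local-inference ceiling, independent of raw FLOPS, so memory channels and bandwidth, not just core counts, become the selling point.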
Impact on Memory Companies
HBM (High Bandwidth Memory) is the most certain semiconductor growth story of 2025-2026:
- SK hynix leads in HBM3E
- Samsung is catching up
- Micron’s HBM yields are improving
Actionable Advice
- Investors: If you only hold NVIDIA, consider adding memory and CPU names alongside GPUs to diversify your compute exposure for the Agent era
- Developers: Local Agent deployment (like OpenClaw) demands far more CPU and memory than expected — don’t just look at GPUs when selecting hardware
- Chip industry professionals: Chip optimization for Agent inference (CPU inference acceleration, memory bandwidth optimization) may be the next technology hotspot