Bottom Line
Wall Street is reassessing the investment logic for AI infrastructure: as AI moves from “single inference” to “autonomous operation,” the center of compute demand is shifting from GPUs to CPUs and memory.
This is not just analysts’ armchair theorizing. DeepSeek V4’s million-token context, OpenClaw’s 24/7 local Agent operations, Gemini Daily Brief’s daily data scanning — these use cases are consuming massive amounts of CPU compute and memory bandwidth, not just GPU parallel computing power.
Core Arguments from the Report
Why the Agent Era Needs More CPU and Memory
| Workload Type | Primary Hardware | Trend |
|---|---|---|
| Model training | GPU (NVIDIA H200/B200) | Continued growth |
| Single inference (chat) | GPU | Growing |
| Autonomous Agent operation | CPU + memory | Explosive growth |
| Context management (million tokens) | Memory | Explosive growth |
| Local model deployment | CPU + memory | Rapid growth |
Key logic:
- Agents need continuous operation: Unlike one-off chat requests, Agents need to continuously monitor, decide, and execute in the background — this requires CPUs to stay online long-term
- Context windows are ballooning: DeepSeek V4 supports 1M token context, which must reside in memory
- Edge inference is rising: The local deployment trend (like OpenClaw) means more inference is happening on CPUs rather than cloud GPUs
List of Beneficiary Companies
CPUs & Accelerators:
- NVIDIA (not just GPUs, CPU product line also expanding)
- AMD (EPYC server CPU + MI300 accelerator)
- Intel (Xeon server CPU + Gaudi accelerator)
- Arm (architecture licensing, used in virtually all mobile and edge AI)
Memory:
- Micron (HBM and DDR5 demand surging)
- Samsung (HBM3E capacity expansion)
- SK hynix (HBM market leader, NVIDIA’s primary supplier)
Chip Manufacturing & Equipment:
- TSMC (dominant foundry for advanced process nodes)
- ASML (EUV lithography monopoly)
Supporting Data
Several key data points validate this trend:
- DeepSeek V4: 1M token context means each conversation requires approximately 2GB of memory to store context state
- OpenClaw: 320K GitHub stars, mostly deployed on personal devices (primarily CPU inference)
- Huawei Ascend: Expected 2026 AI chip revenue of $12B, much of it in CPU-coprocessor architectures
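The memory-per-context figure comes from KV-cache arithmetic: every token in the window keeps a key and value tensor resident per layer. A back-of-envelope formula, with illustrative hyperparameters that are assumptions on my part (not DeepSeek V4's actual architecture — real footprints vary by orders of magnitude depending on KV compression, which is how figures as low as ~2GB per million tokens become possible):

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int) -> int:
    """KV-cache size: 2 tensors (K and V) per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Assumed config: 32 layers, 8 KV heads, head dim 128, FP16 (2 bytes).
per_token = kv_cache_bytes(1, 32, 8, 128, 2)        # 131,072 bytes/token
uncompressed_gib = kv_cache_bytes(1_000_000, 32, 8, 128, 2) / 2**30
```

Under these assumed numbers an uncompressed 1M-token cache would run to over 100 GiB, which is exactly why aggressive KV compression and large, fast memory pools are the binding constraints for long-context agents.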
Landscape Assessment
Impact on NVIDIA
NVIDIA remains the dominant force in AI chips, but this report reminds us:
- NVIDIA’s moat is in GPU training
- On the Agent operations side (CPU + memory), NVIDIA’s market share is not as dominant as in training
- NVIDIA’s Grace CPU + BlueField DPU is the response strategy, but still in early stages
Opportunities for AMD and Intel
- AMD’s EPYC + MI300 combination has a cost advantage in inference
- Intel’s Gaudi 3 accelerator is capturing part of the inference market
- Both companies are betting on the “AI PC” concept — local CPU inference is the core selling point
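Why memory matters so much for local CPU inference: autoregressive decoding is typically memory-bandwidth-bound, because generating each token streams (roughly) the entire weight set through the cores. A rough upper-bound estimate, using illustrative numbers I am assuming (a 7B model quantized to 4-bit, a desktop DDR5 system):

```python
def decode_tokens_per_sec(model_bytes: float, mem_bandwidth_bytes_s: float) -> float:
    """Upper bound on single-stream decode throughput when memory-bound:
    each token reads ~all weights once, so tokens/s <= bandwidth / model size."""
    return mem_bandwidth_bytes_s / model_bytes

# Assumed: ~3.5 GB weights (7B params at 4-bit), ~80 GB/s DDR5 bandwidth.
tps = decode_tokens_per_sec(3.5e9, 80e9)   # ~23 tokens/s ceiling
```

The same formula explains the "AI PC" pitch: doubling memory bandwidth roughly doubles the local-inference ceiling, independent of raw FLOPS, so memory channels and bandwidth, not just core counts, become the selling point.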
Impact on Memory Companies
HBM (High Bandwidth Memory) is the most certain semiconductor growth story of 2025-2026:
- SK hynix leads in HBM3E
- Samsung is catching up
- Micron’s HBM yields are improving
Actionable Advice
- Investors: If you only hold NVIDIA, consider adding memory and CPU names alongside GPUs to diversify your compute exposure for the Agent era
- Developers: Local Agent deployment (like OpenClaw) demands far more CPU and memory than expected — don’t just look at GPUs when selecting hardware
- Chip industry professionals: Chip optimization for Agent inference (CPU inference acceleration, memory bandwidth optimization) may be the next technology hotspot