Meta Joins AMD, Broadcom, Intel, Microsoft, NVIDIA to Release MRC Protocol: Solving Network Bottlenecks in AI Training Clusters

Core Takeaway

On May 6, 2026, Meta released the Multipath Reliable Connection (MRC) open network protocol jointly with five other tech giants: AMD, Broadcom, Intel, Microsoft, and NVIDIA. It is a new network protocol designed specifically for large-scale AI training clusters, with the core goals of reducing GPU wait time, minimizing training interruptions caused by network failures, and improving overall training efficiency.

The announcement tweet received 4,485 likes, 488 retweets, and 1,250 bookmarks on the day of publication, with more than 580,000 views, an unusually high level of engagement for the AI infrastructure domain.

What Happened

The MRC protocol's core positioning: make large-scale AI training clusters run faster and more stably, and waste less GPU time.

Participant Lineup

| Company | Role | Position in AI Infrastructure |
| --- | --- | --- |
| Meta | Initiator | Ultra-large model training demand side (Llama series) |
| AMD | Co-publisher | GPU/CPU computing power supplier |
| Broadcom | Co-publisher | Custom AI network chip designer |
| Intel | Co-publisher | CPU/network processor supplier |
| Microsoft | Co-publisher | Cloud infrastructure operator (Azure) |
| NVIDIA | Co-publisher | GPU and network solution supplier (InfiniBand) |

The significance of this lineup is that it covers almost the entire AI training infrastructure chain: from compute chips to network hardware, and from cloud operators to the model-training teams themselves.

What Problem MRC Protocol Solves

The core network challenges faced by large-scale AI training clusters:

Traditional approach problems:
┌─────┐    ┌─────┐    ┌─────┐
│GPU 0│────│GPU 1│────│GPU 2│  ← Single path dependency, any link failure causes training interruption
└─────┘    └─────┘    └─────┘
    │          │          │
    └──────────┴──────────┘
         Single network path
MRC approach improvement:
┌─────┐    ┌─────┐    ┌─────┐
│GPU 0│════│GPU 1│════│GPU 2│  ← Multipath reliable connection, automatic failover
└─────┘    └─────┘    └─────┘
    │   ╲    │   ╲    │
    │    ╲   │    ╲   │
    │     ╲  │     ╲  │
    └══════╲═┴══════╲═┘
      Multipath redundancy + reliable transport
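
The announcement does not publish wire-level details, so the sketch below is only an illustration of the failover idea in the diagram. It is written in Python with hypothetical names (Path, MultipathConnection, and the simulated failure model are all assumptions, not MRC APIs): one logical connection owns several redundant paths, and a send that fails on one path is transparently retried on the next.

```python
import random

class PathDown(Exception):
    """Raised by the simulated link when a path drops a packet."""

class Path:
    """One physical route between two endpoints (simulated; hypothetical
    name, not an MRC API)."""
    def __init__(self, name: str, fail_rate: float = 0.0):
        self.name = name
        self.fail_rate = fail_rate

    def send(self, payload: bytes) -> None:
        if random.random() < self.fail_rate:
            raise PathDown(self.name)
        print(f"delivered {len(payload)} bytes via {self.name}")

class MultipathConnection:
    """The diagram's core idea: several redundant paths behind one logical
    connection, with automatic failover when a path fails."""
    def __init__(self, paths: list[Path]):
        self.paths = paths

    def send(self, payload: bytes) -> None:
        for path in self.paths:        # try paths in preference order
            try:
                path.send(payload)
                return                 # delivered; caller never sees the fault
            except PathDown:
                continue               # failover: try the next path
        raise ConnectionError("all paths down; training would stall here")

conn = MultipathConnection([
    Path("spine-A", fail_rate=0.5),    # flaky primary route
    Path("spine-B", fail_rate=0.0),    # healthy backup route
])
conn.send(b"gradient shard 0")         # succeeds even when spine-A drops it
```

The point of the sketch is the control flow: the caller (the training loop) never observes the path failure, which is exactly the "automatic failover" property the diagram claims.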

Technical Advantages

| Dimension | Traditional Approach | MRC Protocol |
| --- | --- | --- |
| Network path | Single path; failure means interruption | Multipath redundancy, automatic failover |
| Reliability | Depends on physical link stability | Reliable connection layer, software-level fault tolerance |
| GPU utilization | Network issues leave GPUs idle and waiting | Reduced GPU wait time |
| Openness | Vendor-proprietary protocols (e.g., InfiniBand) | Open protocol, cross-vendor compatibility |
| Ecosystem support | Lock-in to specific vendor solutions | Jointly backed by six tech giants as an open standard |
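
The "Reliability" row is the other half of the design: instead of assuming a lossless fabric, a reliable connection layer keeps retransmitting until delivery is acknowledged. Here is a stop-and-wait sketch of that idea in Python (illustrative only, not the MRC mechanism; a real protocol pipelines many packets with sequence numbers and selective acknowledgments rather than waiting on each one):

```python
import random

def flaky_link(packet: bytes) -> bool:
    """Simulated lossy hop: delivers and acknowledges ~70% of packets."""
    return random.random() < 0.7

def reliable_send(packet: bytes, link, max_tries: int = 8) -> int:
    """Retransmit one packet over an unreliable link until it is acked,
    returning the number of attempts used. Hypothetical helper, not MRC API."""
    for attempt in range(1, max_tries + 1):
        if link(packet):
            return attempt             # ack received: the packet is durable
    raise ConnectionError("link effectively down after retries")

tries = reliable_send(b"all-reduce chunk 42", flaky_link)
print(f"delivered after {tries} attempt(s)")
```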

Why It Matters

1. AI Training Bottleneck Shifting from Compute to Network

As model sizes grow from hundreds of billions to trillions of parameters, training clusters have expanded from hundreds of GPUs to tens of thousands. At that scale, communication overhead rises steeply, and with tens of thousands of links, NICs, and switches, the probability that some component is failing at any given moment approaches certainty.

A typical trillion-parameter model training task:

  • Requires thousands of GPUs working simultaneously
  • Parameter synchronization between GPUs consumes significant network bandwidth
  • A network failure on any single GPU's path can pause the entire training task

The MRC protocol directly addresses this pain point, reducing the impact of network failures on training through multipath redundancy and reliable connection layers.
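
To put rough numbers on the bandwidth claim above, here is a back-of-envelope calculation; every figure (parameter count, gradient precision, NIC speed) is an illustrative assumption, not a measured value:

```python
# Back-of-envelope gradient-sync arithmetic for a 1-trillion-parameter model.
# Every number here is an illustrative assumption, not a measured figure.
params         = 1_000_000_000_000    # 1T parameters
bytes_per_grad = 2                    # bf16/fp16 gradients
grad_tb        = params * bytes_per_grad / 1e12   # -> 2.0 TB per step

nic_gbps       = 400                  # assumed per-GPU NIC speed (Gb/s)
nic_tb_per_s   = nic_gbps / 8 / 1000  # -> 0.05 TB/s

# Naive lower bound: time for one GPU to push its full gradient once.
print(f"gradients per step: {grad_tb:.1f} TB")
print(f"naive transfer time at {nic_gbps} Gb/s: {grad_tb / nic_tb_per_s:.0f} s")
```

Real systems shard gradients across GPUs and overlap communication with compute, so actual per-step overheads are far smaller than this naive 40-second figure; the point is only that terabytes cross the network on every step, so any stall or retransmission storm translates directly into idle GPUs.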

2. Open Protocol vs. Proprietary Protocol Competition

Today's AI training cluster networks are dominated by NVIDIA's proprietary InfiniBand. The emergence of MRC as an open protocol means:

  • Reduced vendor lock-in risk: Cluster operators can mix network equipment from different vendors
  • Lower infrastructure costs: Competition from open protocols may reduce network equipment prices
  • Accelerated technical innovation: Multi-vendor participation drives protocol iteration

3. AMD Signals 80% Datacenter AI Business Growth

On the same day, AMD announced that its datacenter AI business is expected to grow 80%, driven primarily by GPU/CPU orders from cloud and infrastructure operators. AMD specifically noted that market forecasts are now catching up to actual deployment cycles, a signal of sustained demand ahead.

This echoes the release of the MRC protocol — the AI infrastructure market is at a turning point from planning to large-scale deployment.

Impact on the Industry

For Model Training Parties

  • Higher training stability: Reduced training interruptions and restarts caused by network issues
  • Lower GPU idle costs: Less GPU time spent waiting on the network, improved training efficiency
  • More flexible hardware choices: No longer locked into specific vendors’ network solutions

For Cloud Service Providers

  • Infrastructure differentiation: Cloud platforms that support the MRC protocol gain a training-efficiency advantage
  • Reduced operations complexity: Multipath redundancy reduces dependency on physical network stability

For Chip Vendors

  • New competitive dimension: Competition at the network protocol level will affect GPU/network chip market dynamics
  • Open ecosystem opportunities: Smaller vendors can enter the AI infrastructure market by supporting the MRC protocol

Landscape Assessment

The release of the MRC protocol is a watershed event in the AI infrastructure domain. It marks:

  1. Shifting bottleneck awareness in AI training — from “need more GPUs” to “need better networks”
  2. Open protocols challenging proprietary protocol monopolies — InfiniBand’s moat is being eroded
  3. Industry giants jointly setting standards — participation from all six publishers (Meta, NVIDIA, AMD, Broadcom, Intel, Microsoft) shows that AI infrastructure standardization is accelerating

For China’s AI industry, there are two reasons to watch the MRC protocol’s development: domestic large-model training faces the same cluster network bottlenecks, and the rise of open protocols may lower the barrier for domestic vendors to enter the AI training infrastructure market.