Luo Fuli 3.5-Hour Interview: After Pre-train Gap Closes, Agent RL Becomes the Deciding Factor for Chinese Models

Key Assessment

Luo Fuli, head of Xiaomi's large model team, gave a 3.5-hour technical interview in late April 2026. It was her first long-form public technical discussion since joining Xiaomi after earlier roles at Alibaba DAMO Academy and DeepSeek.

Core Viewpoints

1. Pre-train Gap Nearly Closed

Luo Fuli believes the pre-training gap between top Chinese teams and Anthropic is narrowing rapidly and has already closed in some dimensions.

Dimension	Past	Present
Model Quality	International lead	Gap significantly narrowed
Training Methods	Insufficient experience	Methodologies converging
Compute Scale	Severely limited	Optimizations can compensate
Competition Focus	Pre-train scale	Agent RL

2. Agent RL Is the Next Battleground

When pre-training is no longer a moat, competition shifts to Agent Reinforcement Learning:

Real environment interaction: Agents must learn in real toolchains, not just synthetic data
Multi-step decision making: From single-turn dialogue to multi-turn tool calling
Self-correction: Whether agents can discover and fix their own errors autonomously
Task decomposition: Planning and execution strategies for complex tasks
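
The four points above can be made concrete with a small rollout loop. The following is a minimal sketch under stated assumptions, not anything described in the interview: the `divide` tool, the `scripted_policy` stand-in for a learned model, and the outcome-only reward are all hypothetical names chosen for illustration. It shows one multi-step episode in which a tool call fails, the agent sees the error and corrects itself, and the finished trajectory receives a reward of the kind an RL trainer would consume. The training update itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    action: str       # tool name the agent chose
    argument: str     # argument passed to the tool
    observation: str  # what the environment returned
    ok: bool          # whether the tool call succeeded


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)
    reward: float = 0.0


def divide(argument: str) -> str:
    """Hypothetical tool: integer-divides 'a/b' and raises on bad input."""
    a, b = argument.split("/")
    return str(int(a) // int(b))


# Hypothetical tool registry standing in for a real toolchain.
TOOLS: dict[str, Callable[[str], str]] = {"divide": divide}


def scripted_policy(history: list[Step]) -> tuple[str, str]:
    """Stand-in for a learned policy: makes a bad call first, then corrects it."""
    if not history:
        return "divide", "10/0"            # deliberate mistake
    if not history[-1].ok:
        return "divide", "10/2"            # self-correction after the error
    return "finish", history[-1].observation


def rollout(policy, target: str, max_steps: int = 5) -> Trajectory:
    """Run one multi-step episode and attach an outcome reward at the end."""
    traj = Trajectory()
    for _ in range(max_steps):
        action, argument = policy(traj.steps)
        if action == "finish":
            traj.reward = 1.0 if argument == target else 0.0
            break
        try:
            obs, ok = TOOLS[action](argument), True
        except Exception as exc:           # tool errors become observations
            obs, ok = f"error: {exc}", False
        traj.steps.append(Step(action, argument, obs, ok))
    return traj


traj = rollout(scripted_policy, target="5")
for step in traj.steps:
    print(step.action, step.argument, "->", step.observation)
print("reward:", traj.reward)
```

In this sketch the failed call is kept in the trajectory rather than hidden, on the assumption that error observations are part of what the agent has to learn from when it interacts with real tools.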

3. Talent Selection: Empty-Cup Mindset

Luo Fuli described her intern selection criteria: strong learning ability and curiosity.

People who can maintain an empty-cup mindset and think from first principles are rare. Strong learning ability gives them the power to quickly enter new roles.

From DeepSeek to Xiaomi: Technical Evolution

Organization	Core Direction	Key Work
Alibaba DAMO	Basic model pre-training	Early LLM exploration
DeepSeek	MoE + Open Source	DeepSeek series MoE architecture
Xiaomi	Edge-cloud + Agent	MiMo series + hardware ecosystem

Industry Reflection on Claude Opus 4.6

Luo Fuli discussed the impact of Claude Opus 4.6 and similar 2026 technologies:

Anthropic path: Building a complete developer toolchain via Claude Code → Cowork → Agent Teams
Domestic response: Chinese teams cannot simply follow; they need to differentiate through Agent RL and vertical scenarios
Open vs. closed: The feedback speed of the open-source community is an advantage that cannot be replicated

Recommendations

Role	Action
Model Developers	Make Agent RL the core R&D direction; marginal returns on pre-training are diminishing
App Developers	Use the MiMo Orbit free quota for low-cost testing of Agent scenarios
Job Seekers	Strengthen Agent framework and toolchain experience
Investors	Focus on teams with Agent RL capabilities and real-scenario data

