ChaoBro

Luo Fuli 3.5-Hour Interview: After Pre-train Gap Closes, Agent RL Becomes the Deciding Factor for Chinese Models

Key Assessment

Luo Fuli, head of Xiaomi's large model team, gave a 3.5-hour technical interview in late April 2026—her first long-form public technical discussion since joining Xiaomi from Alibaba DAMO Academy and DeepSeek.

Core Viewpoints

1. Pre-train Gap Nearly Closed

Luo Fuli believes the pre-training gap between top domestic teams and Anthropic is narrowing rapidly, and in some dimensions has already closed.

| Dimension | Past | Present |
|---|---|---|
| Model Quality | International lead | Gap significantly narrowed |
| Training Methods | Insufficient experience | Methodologies converging |
| Compute Scale | Severely limited | Optimizations can compensate |
| Competition Focus | Pre-train scale | Agent RL |

2. Agent RL Is the Next Battleground

When pre-training is no longer a moat, competition shifts to Agent Reinforcement Learning:

  • Real environment interaction: Agents must learn in real toolchains, not just synthetic data
  • Multi-step decision making: From single-turn dialogue to multi-turn tool calling
  • Self-correction: Whether agents can discover and fix their own errors autonomously
  • Task decomposition: Planning and execution strategies for complex tasks
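
The interview summary gives no implementation details, so the following is only a minimal sketch of what one multi-step, tool-calling episode with self-correction might look like. Everything in it is an assumption made for illustration: the `calc` tool, the `scripted_policy` stand-in for a learned policy, the `run_episode` loop, and the 0/1 outcome reward are hypothetical names and choices, not anything described by Luo Fuli or used in MiMo.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, argument, observation)


def calc(expr: str) -> str:
    """Hypothetical toy tool: adds two integers written as 'a+b'."""
    left, right = expr.split("+")  # raises ValueError on malformed input
    return str(int(left) + int(right))


# Hypothetical toolchain the agent can call into.
TOOLS: Dict[str, Callable[[str], str]] = {"calc": calc}


@dataclass
class Episode:
    steps: List[Step] = field(default_factory=list)
    reward: float = 0.0


def scripted_policy(task: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for a learned policy: picks the next (tool, argument)."""
    compact = task.replace(" ", "")
    if not history:
        # First attempt is deliberately malformed to trigger a tool error.
        return "calc", compact + "?"
    if history[-1][2].startswith("ERROR"):
        # Self-correction: retry with a cleaned-up argument.
        return "calc", compact
    # Last tool call succeeded, so submit its result as the answer.
    return "finish", history[-1][2]


def run_episode(task: str, expected: str,
                policy: Callable[[str, List[Step]], Tuple[str, str]],
                max_steps: int = 4) -> Episode:
    """Roll out one multi-step episode and assign an outcome reward."""
    ep = Episode()
    for _ in range(max_steps):
        tool, arg = policy(task, ep.steps)
        if tool == "finish":
            ep.reward = 1.0 if arg == expected else 0.0
            break
        try:
            obs = TOOLS[tool](arg)
        except (KeyError, ValueError) as exc:
            # Tool failures are fed back as observations, not hidden.
            obs = f"ERROR: {exc}"
        ep.steps.append((tool, arg, obs))
    return ep


if __name__ == "__main__":
    episode = run_episode("2 + 3", "5", scripted_policy)
    for step in episode.steps:
        print(step)
    print("reward:", episode.reward)
```

In an actual Agent RL setup, the recorded steps and the final reward would be the training signal for updating the policy; this sketch only rolls out the episode and performs no learning.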

3. Talent Selection: Empty-Cup Mindset

Luo Fuli revealed her intern selection criteria—people with strong learning ability and curiosity:

People who can maintain an empty-cup mindset and think from first principles are rare. Strong learning ability gives them the power to quickly enter new roles.

From DeepSeek to Xiaomi: Technical Evolution

| Phase (Organization) | Core Direction | Representative Work |
|---|---|---|
| Alibaba DAMO | Basic model pre-training | Early LLM exploration |
| DeepSeek | MoE + Open Source | MiMo series MoE architecture |
| Xiaomi | Edge-cloud + Agent | MiMo series + hardware ecosystem |

Industry Reflection on Claude Opus 4.6

Luo Fuli discussed the impact of Claude Opus 4.6 and similar 2026 technologies:

  • Anthropic path: Building a complete developer toolchain via Claude Code → Cowork → Agent Teams
  • Domestic response: Simply following is not enough; teams need to differentiate in Agent RL and vertical scenarios
  • Open vs. closed: The feedback speed of the open source community is an advantage that cannot be replicated

Recommendations

| Role | Action |
|---|---|
| Model Developers | Make Agent RL the core R&D direction; marginal returns from pre-training are diminishing |
| App Developers | Use the MiMo Orbit free quota for low-cost testing of Agent scenarios |
| Job Seekers | Strengthen experience with Agent frameworks and toolchains |
| Investors | Focus on teams with Agent RL capabilities and real-scenario data |