ChaoBro

Luo Fuli 3.5-Hour Interview: After Pre-train Gap Closes, Agent RL Becomes the Deciding Factor for Chinese Models

Key Assessment

Luo Fuli, head of Xiaomi’s large model team, gave a 3.5-hour technical interview in late April 2026—her first long-form public technical discussion since joining Xiaomi from Alibaba DAMO Academy and DeepSeek.

Core Viewpoints

1. Pre-train Gap Nearly Closed

Luo Fuli believes the pre-training gap between top domestic teams and Anthropic is narrowing rapidly, and in some dimensions has already closed.

| Dimension | Past | Present |
| --- | --- | --- |
| Model Quality | International lead | Gap significantly narrowed |
| Training Methods | Insufficient experience | Methodologies converging |
| Compute Scale | Severely limited | Optimizations can compensate |
| Competition Focus | Pre-train scale | Agent RL |

2. Agent RL is Next Battleground

When pre-training is no longer a moat, competition shifts to Agent Reinforcement Learning:

  • Real environment interaction: Agents must learn in real toolchains, not just synthetic data
  • Multi-step decision making: From single-turn dialogue to multi-turn tool calling
  • Self-correction: whether agents can discover and fix their own errors autonomously
  • Task decomposition: Planning and execution strategies for complex tasks
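The four capabilities above can be illustrated with a toy sketch of a single multi-turn tool-calling episode, the unit an Agent RL trainer would assign a reward to and optimize over. Everything here is hypothetical for illustration (`ToolEnv`, `run_episode`, the flaky calculator tool); it is not from any real framework or from the interview itself.

```python
import random
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class ToolEnv:
    """Toy stand-in for a 'real toolchain': an adder tool that sometimes
    fails, so the agent must detect errors and recover."""
    target: int
    fail_rate: float = 0.3
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def call_tool(self, a: int, b: int) -> Optional[int]:
        # Real environments are flaky; a tool call can simply error out.
        if self.rng.random() < self.fail_rate:
            return None  # tool error
        return a + b

def run_episode(env: ToolEnv, max_turns: int = 8) -> Tuple[float, int]:
    """Multi-turn loop: decompose -> call tool -> check -> self-correct.
    Returns (reward, turns_used); reward is 1.0 only if the target is
    reached, the sparse outcome signal typical of agentic RL."""
    total, turns = 0, 0
    while turns < max_turns and total < env.target:
        turns += 1
        step = min(3, env.target - total)   # task decomposition: chunks of <= 3
        result = env.call_tool(total, step) # real environment interaction
        if result is None:                  # self-correction: retry on tool error
            continue
        total = result                      # multi-step state carried across turns
    reward = 1.0 if total == env.target else 0.0
    return reward, turns

# With the seeded RNG: run_episode(ToolEnv(target=7)) -> (1.0, 3)
```

The point of the sketch is the shape of the credit-assignment problem: the reward arrives only at the end of a variable-length trajectory of tool calls and retries, which is what distinguishes Agent RL from single-turn preference tuning.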

3. Talent Selection: Empty-Cup Mindset

Luo Fuli revealed her intern selection criteria—people with strong learning ability and curiosity:

People who can maintain an empty-cup mindset and think from first principles are rare. Strong learning ability gives them the power to quickly enter new roles.

From DeepSeek to Xiaomi: Technical Evolution

| Organization | Core Direction | Key Work |
| --- | --- | --- |
| Alibaba DAMO | Basic model pre-training | Early LLM exploration |
| DeepSeek | MoE + open source | MiMo series MoE architecture |
| Xiaomi | Edge-cloud + Agent | MiMo series + hardware ecosystem |

Industry Reflection on Claude Opus 4.6

Luo Fuli discussed the impact of Claude Opus 4.6 and similar 2026 technologies:

  • Anthropic path: Building complete developer toolchain via Claude Code → Cowork → Agent Teams
  • Domestic response: Chinese teams cannot simply follow Anthropic's path; they need to differentiate through Agent RL and vertical scenarios
  • Open vs. closed: the open-source community's feedback speed is an advantage closed ecosystems cannot replicate

Recommendations

| Role | Action |
| --- | --- |
| Model Developers | Make Agent RL the core R&D direction; pre-train marginal returns are diminishing |
| App Developers | Use the MiMo Orbit free quota to test Agent scenarios at low cost |
| Job Seekers | Build experience with Agent frameworks and toolchains |
| Investors | Focus on teams with Agent RL capability and real-scenario data |