On the day DeepSeek-V4 launched, alongside the discussion of the model itself, there was a quieter but significant announcement: SGLang and Miles shipped inference and RL training support for DeepSeek-V4 on Day 0.
In an April 25 blog post, LMSYS wrote: "SGLang and Miles form the first open-source stack to serve and train DeepSeek-V4 on launch day."
The first open-source stack to go fully operational on a new model's release day. That speed is no coincidence; it is a measure of infrastructure maturity.
What Day-0 support means
The traditional pattern: a new model releases, and the community waits days or weeks to get it running. The architecture needs adaptation, inference parameters need tuning, training scripts need modification. For MoE models the process is even more complex: expert routing and activation parameter configuration both require specialized handling.
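To make "activation parameter configuration" concrete, below is the kind of MoE routing metadata an inference engine must interpret correctly before the model will run at all. The field names follow DeepSeek-V3's published config.json; the values shown are placeholders, since DeepSeek-V4's actual configuration may differ.

```python
# Illustrative MoE routing fields a Day-0 inference engine must honor.
# Field names follow DeepSeek-V3's published config.json; the values
# here are placeholders, not DeepSeek-V4's real numbers.
moe_config = {
    "n_routed_experts": 256,        # total routed experts per MoE layer
    "n_shared_experts": 1,          # always-active shared expert(s)
    "num_experts_per_tok": 8,       # top-k experts activated per token
    "moe_intermediate_size": 2048,  # hidden size of each expert's FFN
}

# Top-k dispatch, expert-parallel sharding, and load balancing all key
# off these fields; misreading any one of them yields a model that
# loads cleanly but generates garbage.
```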
Day-0 support breaks this waiting cycle. The model launches, and the inference service and the RL training framework are ready on day one. This means:
The community can test immediately. No waiting for adaptation code; just run it, as sketched below. For researchers, this means validating DeepSeek-V4's performance on real tasks right away, rather than getting stuck on "can I even run it?"
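In practice, "just run it" amounts to a one-line server launch plus a standard API call. A minimal sketch, assuming SGLang's usual launch_server entry point and its OpenAI-compatible endpoint; the model path and parallelism settings here are illustrative, not taken from the announcement:

```python
# Minimal sketch: query a Day-0 SGLang deployment of DeepSeek-V4.
# Assumes the server was started with SGLang's standard entry point, e.g.:
#   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V4 \
#       --tp 8 --port 30000
# The model path and --tp value are illustrative placeholders.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",  # OpenAI-compatible API
    json={
        "model": "deepseek-ai/DeepSeek-V4",
        "messages": [
            {"role": "user", "content": "Prove that sqrt(2) is irrational."}
        ],
        "max_tokens": 512,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```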
RL training can start right away. Miles is LMSYS's large-scale RL post-training framework. Day-0 support means RLHF or DPO training pipelines can launch immediately after getting model weights, without waiting for framework adaptation.
The SGLang + Miles combination
These two components work together as a coordinated stack:
SGLang handles inference serving. Its continuous batching and RadixAttention prefix cache are industry-standard optimizations at this point. Day-0 support for DeepSeek-V4 means the model can be deployed on SGLang directly.
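The payoff of that cache is visible from the client side: requests that share a long prefix, such as a common system prompt, reuse the cached KV state instead of recomputing it during prefill. A sketch against the same illustrative endpoint as above:

```python
# Sketch: two requests sharing a long system prompt. With RadixAttention,
# the second request's prefill reuses the cached KV entries for the
# shared prefix rather than recomputing them.
import requests

SYSTEM = "You are a meticulous code reviewer. " * 50  # long shared prefix

def review(snippet: str) -> str:
    resp = requests.post(
        "http://localhost:30000/v1/chat/completions",
        json={
            "model": "deepseek-ai/DeepSeek-V4",
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": snippet},
            ],
            "max_tokens": 256,
        },
    )
    return resp.json()["choices"][0]["message"]["content"]

review("def add(a, b): return a - b")  # fills the radix cache for SYSTEM
review("def mul(a, b): return a + b")  # prefix hit: prefill cost drops
```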
Miles handles large-scale RL post-training. From PPO to DPO to the more recent GRPO, Miles covers the mainstream RL training paradigms, and its distributed training architecture is designed for models in the hundred-billion to trillion-parameter range.
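Miles's own APIs aren't documented here, so as a framework-agnostic illustration, this is the group-relative advantage computation at the heart of the GRPO paradigm the framework covers:

```python
# Framework-agnostic sketch of GRPO's group-relative advantage (the
# published algorithm, not Miles's internal API). For each prompt, sample
# a group of completions, score them with a reward model, and normalize
# rewards within the group; no learned value network is needed.
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """group_rewards: shape (G,), rewards for G completions of one prompt."""
    mean, std = group_rewards.mean(), group_rewards.std()
    return (group_rewards - mean) / (std + eps)

rewards = np.array([0.1, 0.9, 0.4, 0.6])  # e.g. reward-model scores
adv = grpo_advantages(rewards)
# Each completion's tokens are then reinforced in a clipped,
# PPO-style objective weighted by its group-relative advantage.
print(adv)
```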
Together, they form a complete open-source stack from inference to training to deployment. The key characteristic: no reliance on any closed-source components. From model weights to inference engine to training framework, everything is open source.
Why this timing matters
DeepSeek-V4 is the archetype of an open-source flagship model: a trillion-parameter MoE whose performance is comparable to leading closed-source models. But the value of an open-source model isn't just the published weights; it's whether the entire ecosystem can keep up quickly.
If a model releases but the inference framework takes two weeks and the training framework takes a month, the model's actual impact is significantly diminished. Day-0 support narrows the gap between "release" and "usable."
This is also a key difference between open-source and closed-source ecosystems. When OpenAI releases a new GPT version, only OpenAI can use it immediately. When DeepSeek releases a new version, the entire open-source community can use it immediately — provided the infrastructure is ready.
SGLang and Miles's Day-0 support shows that this infrastructure readiness is becoming reality.
Implications for future model releases
With this precedent, the release rhythm for future open-source models may change. Model teams won't just need to publish weights — they'll need to coordinate with infrastructure teams like SGLang and Miles ahead of time to ensure Day-0 support.
It's similar to how Linux kernel releases handle new hardware drivers — new CPUs launch with kernel drivers already prepared. Users don't need to compile drivers themselves; it works out of the box.
DeepSeek-V4 is just the beginning. For the next open-source flagship model, Day-0 support may become standard.