DeepSeek V4 Pro Matches GPT-5.2 on FoodTruck Bench: US-China Frontier Gap Shrinks to 10 Weeks

DeepSeek V4 Pro Matches GPT-5.2 on FoodTruck Bench: US-China Frontier Gap Shrinks to 10 Weeks

Core Signal

DeepSeek V4 Pro has matched GPT-5.2’s performance on the FoodTruck Bench agentic evaluation. This marks the first Chinese model to enter the frontier tier in this evaluation system.

The real kicker? Cost efficiency: DeepSeek V4 Pro costs approximately 1/8 of GPT-5.2—and when adjusted for equivalent output quality, the cost gap actually reaches 17 times.

What Is FoodTruck Bench?

FoodTruck Bench is an evaluation benchmark focused on agentic capabilities, measuring a model’s ability to autonomously plan, call tools, perform multi-step reasoning, and execute tasks in real-world scenarios. Unlike traditional static Q&A evaluations, it requires the model to complete end-to-end workflows like a real “digital employee.”

The evaluation team stated in their official announcement:

“DeepSeek V4 Pro just matched GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~8× cheaper. First Chinese model in our frontier tier.”

There are three layers of information behind this statement worth unpacking:

Layer one: Capability parity. DeepSeek V4 Pro performs on par with GPT-5.2 on agentic tasks. Given that GPT-5.2 is one of OpenAI’s strongest general-purpose models today, this is a milestone with symbolic significance.

Layer two: The time gap. “10 weeks later”—the evaluators deliberately emphasized the time difference. The gap between US and Chinese frontier models was previously estimated at about a year. It has now been compressed to under three months.

Layer three: Cost advantage. An 8x price difference means that if enterprises replace GPT-5.2 with DeepSeek V4 Pro for the same agentic workloads, annual API spending can drop from the million-dollar range to the hundred-thousand-dollar range.

Independent Verification

This news has been cross-validated by multiple sources:

  • Caisi Evaluations analysis indicates that while DeepSeek V4’s overall capabilities lag behind US frontier models by approximately 8 months, the V4 Pro version—through optimized reasoning paths and tool-calling strategies—has caught up on agentic tasks.
  • Multiple independent developers shared their hands-on experience with DeepSeek V4 Pro on X: “Now, a week in… it’s seamless man.” The transition from an initial adjustment period to smooth daily use means DeepSeek V4 Pro can already replace certain GPT scenarios in real workflows.
  • Notably, DeepSeek V4 Pro’s integration with Claude Code has also been established—switching requires just three environment variables, giving developers a plug-and-play alternative.

Practical Implications for Developers

Cost decision window: If you’re running high-frequency agentic workloads (data scraping, code generation, automated reports), now is the time to reassess your model selection. DeepSeek V4 Pro’s performance on agentic tasks no longer requires “settling”—it’s a genuine alternative.

Multi-model strategy: The risk of single-model dependence is growing in 2026. A sensible approach is to build a model matrix: GPT-5.2 for core tasks requiring the highest reliability, DeepSeek V4 Pro for high-volume, cost-sensitive agentic loops, and the Claude 4 series for scenarios demanding fine-grained reasoning.

Open-source ecosystem dividend: The DeepSeek series has always maintained an open-source tradition. While V4 Pro is currently available primarily via API, the transparency of its technical roadmap means community adaptation tools will emerge rapidly. Open-source projects like deepclaude have already proven this.

What to Watch Next

  • Whether FoodTruck Bench will include more Chinese enterprise models (Qwen, Kimi, GLM) in its next evaluation round
  • Whether DeepSeek V4 Pro’s API pricing will decrease further as scale effects kick in
  • OpenAI’s pricing response to GPT-5.2

The competition between US and Chinese frontier models is shifting from a “capability gap” narrative to a “cost-performance race.” DeepSeek V4 Pro’s performance on FoodTruck Bench sends a clear signal: Chinese models are no longer just “cheap alternatives”—they are starting to become “the better choice” in certain dimensions.