An Overlooked Efficiency Lever
Recently, a post in the Chinese developer community sharing hands-on AI Agent experience drew 13,000 views and 76 likes:
“With the help of excellent large models from both China and the US, combined with open-source Agent frameworks like Hermes Agent and OpenClaw and their corresponding Harness Engineering, the efficiency of ‘bug hunting’ and ‘incident response’ has improved dramatically. This was unimaginable just a year or two ago.”
The core keyword of this post is Harness Engineering — it doesn’t refer to a specific tool, but rather a methodology for systematically orchestrating AI Agents to solve real engineering problems.
What Is “Harness Engineering”?
If models are the “engine” and Agent frameworks are the “chassis,” then Harness Engineering is the “driving skill” — with the same hardware configuration, different driving approaches can produce a 10x difference in output.
Specifically, Harness Engineering consists of three levels:
Level 1: Model Selection and Orchestration
Not simply “calling APIs,” but dynamically selecting models based on task characteristics:
- Urgent bug fix → Claude Opus 4.7 (best code understanding)
- Batch code scanning → DeepSeek V4 Flash (low cost, high throughput)
- Architecture plan evaluation → GPT-5.5 (strong multi-step reasoning)
- Chinese document generation → Kimi K2.6 (Chinese fluency + long context)
This is exactly the strategy we described in our previous “multi-model routing” article. But in the context of Harness Engineering, this routing is automated — the Agent framework automatically selects the most suitable model based on the task description.
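The Level 1 mapping above can be sketched as a simple lookup with a cheap default. The task labels and the default model here are illustrative assumptions, not any framework's real API:

```python
# Task-type → model routing table, mirroring the examples above.
# All names are illustrative; real frameworks expose their own config format.
ROUTING = {
    "bug_fix": "claude-opus-4.7",        # best code understanding
    "batch_scan": "deepseek-v4-flash",   # low cost, high throughput
    "architecture_review": "gpt-5.5",    # strong multi-step reasoning
    "chinese_docs": "kimi-k2.6",         # Chinese fluency + long context
}

def select_model(task_type: str, default: str = "deepseek-v4-pro") -> str:
    """Pick the model for a task type, falling back to a cheap generalist."""
    return ROUTING.get(task_type, default)
```

In an automated setup, the Agent framework would derive `task_type` from the task description itself rather than requiring the developer to supply it.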
Level 2: Agent Workflow Design
“Bug hunting” (debugging) and “incident response” (firefighting) are the two most frequent, most time-consuming tasks in a developer’s daily work. After redesigning these workflows with Agent frameworks:
Traditional debug workflow:
1. Read error logs (5 minutes)
2. Locate suspicious code (15-30 minutes)
3. Write test to reproduce (20 minutes)
4. Attempt fix (30-60 minutes)
5. Verify fix (10 minutes)
Total: 1.5 - 2 hours
Agent-assisted debug workflow:
1. Feed error logs to Agent (30 seconds)
2. Agent automatically locates suspicious files + generates fix suggestions (2 minutes)
3. Developer reviews suggestions, confirms direction (3 minutes)
4. Agent automatically writes tests + applies fix (3 minutes)
5. Agent automatically runs tests to verify (1 minute)
Total: 10 minutes
Efficiency improvement: approximately 10x.
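The five agent-assisted steps above amount to a gated loop: the agent proposes, the human approves, the agent applies and verifies. The sketch below uses a `StubAgent` whose methods are invented for illustration; no real Agent framework API is being named:

```python
class StubAgent:
    """Toy stand-in for an Agent framework client (illustrative only)."""
    def locate_and_propose(self, error_log):
        # Steps 1-2: ingest the error log, return a suspicious file + fix suggestion.
        return {"file": "app.py", "patch": "return total or 0"}
    def apply_patch(self, proposal):
        # Step 4: write the test and apply the fix.
        self.applied = proposal
    def run_tests(self):
        # Step 5: run the suite to verify the fix.
        return True

def agent_debug(agent, error_log, approve=lambda proposal: True):
    """Agent-assisted debug loop; `approve` is the human review gate (step 3)."""
    proposal = agent.locate_and_propose(error_log)
    if not approve(proposal):
        return False  # developer rejected the direction; stop before touching code
    agent.apply_patch(proposal)
    return agent.run_tests()
```

The human gate in step 3 is what keeps the loop at ~10 minutes instead of zero: the developer spends their time confirming direction, not typing the fix.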
Level 3: Feedback Loop and Continuous Optimization
True Harness Engineering is not a one-time configuration, but an ongoing feedback mechanism:
- Agent fix suggestion adoption rate → optimize prompts and model selection
- Task completion time vs expectations → adjust Agent workflow design
- Cost consumption distribution → migrate more tasks to lower-cost models
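A minimal sketch of aggregating those three feedback signals, assuming each finished task is logged as a record; the field names are my own, not any framework's schema:

```python
def harness_metrics(tasks):
    """Aggregate the three feedback signals from a list of task records.

    Each record is assumed to carry: adopted (bool), minutes (actual time),
    expected_minutes, cost (USD), and model (name).
    """
    adoption_rate = sum(t["adopted"] for t in tasks) / len(tasks)
    time_ratio = (sum(t["minutes"] for t in tasks)
                  / sum(t["expected_minutes"] for t in tasks))
    cost_by_model = {}  # where the money actually goes, per model
    for t in tasks:
        cost_by_model[t["model"]] = cost_by_model.get(t["model"], 0.0) + t["cost"]
    return adoption_rate, time_ratio, cost_by_model
```

A falling adoption rate suggests prompt or model-selection problems; a time ratio above 1.0 suggests the workflow design needs adjustment; a lopsided cost distribution shows which tasks to migrate to cheaper models.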
In Practice: Best Combinations of Domestic Models + Open-Source Agent Frameworks
Based on community feedback and actual testing, the following combinations perform best in “bug hunting” and “incident response” scenarios:
Combination A: OpenClaw + DeepSeek V4 Pro
| Dimension | Data |
|---|---|
| Model cost | DeepSeek V4 Pro API costs roughly 1/40 as much as Claude Code |
| Agent framework | OpenClaw supports direct DeepSeek API connection |
| Applicable scenarios | Code generation/review, batch tasks, CI/CD integration |
| Advantage | Extremely low cost, performance gap with closed-source flagships is small |
A developer’s actual test feedback:
“I’ve basically switched my entire workflow to DeepSeek V4 Pro, and the experience is excellent. DeepSeek’s price is only 1/40 of Claude Code, and the performance compared to other models besides Claude Code isn’t much different.”
Combination B: Hermes Agent + Kimi K2.6
| Dimension | Data |
|---|---|
| Model cost | Kimi K2.6 subscription approximately $80/month (Coding Plan Max) |
| Agent framework | Hermes Agent desktop platform, supports multiple models |
| Applicable scenarios | Long document analysis, Chinese content, Agent cluster collaboration |
| Advantage | Kimi K2.6 supports 300 parallel sub-Agents + 4,000 collaboration steps |
Combination C: Hybrid Routing (Ultimate Form)
With LiteLLM or a custom routing layer, model selection becomes fully automatic:
```yaml
routing_rules:
  code_review:
    primary: claude-opus-4.7
    fallback: deepseek-v4-pro
    cost_limit: $0.50/task
  bug_fix:
    primary: deepseek-v4-pro
    fallback: kimi-k2.6
    cost_limit: $0.20/task
  long_context:
    primary: kimi-k2.6        # 1 million tokens
    fallback: deepseek-v4-pro # 1 million tokens
    cost_limit: $0.30/task
  batch_processing:
    primary: deepseek-v4-flash
    cost_limit: $0.05/task
```
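The fallback behavior those rules encode can be sketched as follows. This mimics what a LiteLLM-style router does but is not LiteLLM's actual API; the rule contents come from the config above, with `cost_limit` reduced to a plain float:

```python
# One rule from the routing_rules config above.
RULES = {
    "bug_fix": {"primary": "deepseek-v4-pro", "fallback": "kimi-k2.6",
                "cost_limit": 0.20},
}

def route(task_type, call_model, rules=RULES):
    """Try the primary model, then the fallback; `call_model` does the real work."""
    rule = rules[task_type]
    for model in (rule["primary"], rule.get("fallback")):
        if model is None:
            break
        try:
            return call_model(model, max_cost=rule["cost_limit"])
        except RuntimeError:
            continue  # primary unavailable or over budget → try the fallback
    raise RuntimeError(f"all models failed for task {task_type!r}")
```

In production, `call_model` would wrap the actual API client and raise when the call fails or the per-task budget is exceeded; the router itself stays model-agnostic.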
Tool Ecosystem: Who Is Providing an “Easy-to-Use” Harness Experience?
Notably, besides the two open-source frameworks OpenClaw and Hermes Agent, there are other products lowering the barrier to Harness Engineering:
- LazyCat: One of the few products in the world providing an easy-to-use Web interface for both OpenClaw and Hermes Agent, supporting direct connection to domestic models like Kimi, GLM, and DeepSeek — just fill in the AI Key and you’re ready to go
- Ollama Cloud: Provides cloud inference services for domestic models, deployment-free
- NVIDIA NIM: Offers free access to Chinese model APIs (reported on this site previously)
The common thread among these tools: they take Harness Engineering from “requires engineering skills” to “works out of the box.”
Landscape Assessment
The rise of Harness Engineering reflects a deeper trend: the focus of AI development is shifting from the “model layer” down to the “orchestration layer.”
When the capability gap between mainstream models narrows to 6-8 points (Intelligence Index), but the price gap is as high as 10x, the key to competition is no longer “whose model is stronger” but “who can better harness these models.”
In this paradigm:
- Open-source Agent frameworks (Hermes Agent, OpenClaw) are redefined — they are not “upper-layer wrappers for models” but “infrastructure for Harness Engineering”
- Domestic models’ cost advantage is amplified — because the core of Harness Engineering is “using the right tool for the right job,” and domestic models are already the “right tool” in most scenarios
- Developer competitiveness shifts from “familiarity with a certain API” to “the ability to design efficient Agent workflows”
Action Items
- If you’re still manually calling APIs: Try OpenClaw or Hermes Agent, configure common debug/code-review tasks as Agent workflows — efficiency could improve 5-10x
- If you’re evaluating Agent frameworks: Prioritize frameworks that support multi-model routing to avoid being locked into a single model
- If you’re leading a team: Include “Harness Engineering” in engineer skill requirements. A developer who can’t harness Agents is like a developer who doesn’t use an IDE: the efficiency gap is orders of magnitude
- If you’re building a startup: The Harness Engineering tool layer still has significant gaps (visual workflow editors, cost optimization engines, Agent performance monitoring), making it a promising direction for entrepreneurship and investment