Code as Agent Harness: When Code Is No Longer the Output—but the Operating System of Agents

If the 2024 LLM revolution was about “models writing code,” then the 2026 agentic AI revolution is about the reverse: code driving models. Code is no longer merely an Agent’s output; it has become the Agent’s own operating system.

This survey paper, written by 42 researchers, including leading scientists from top academic institutions and industry labs, reached #1 on Hugging Face’s Daily Papers today. It proposes a unified theoretical framework: Code as Agent Harness.

Core Thesis

The paper’s central claim is clear: in emerging agentic systems, code has transcended its role as a “target output” to become the Agent’s reasoning substrate, execution engine, environment modeling tool, and execution-based verification infrastructure.

The authors formalize this shift across three layers:

Layer 1: Harness Interface

How code connects the Agent to reasoning, action, and environment modeling. This goes beyond simple “API calls”: code serves as the structural backbone of the Agent’s perception–decision–action loop.
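As a rough illustration of code acting as the backbone of a perception–decision–action loop, the sketch below lets a policy return a Python snippet as its action, which the harness then executes against a small environment dictionary. The names run_loop and stub_policy are hypothetical and do not come from the paper, the policy is a hand-written stand-in for a model call, and the bare exec is for illustration only; it is not a sandbox.

```python
from typing import Any, Callable, Dict


def run_loop(
    policy: Callable[[Dict[str, Any]], str],
    env: Dict[str, Any],
    max_steps: int = 5,
) -> Dict[str, Any]:
    """Perceive -> decide -> act, with each action expressed as a code string."""
    for _ in range(max_steps):
        observation = dict(env)            # perceive: snapshot the environment state
        action_code = policy(observation)  # decide: the policy emits a Python snippet
        if action_code == "":              # an empty action means the policy is done
            break
        # act: run the snippet with the environment as its local namespace.
        # Builtins are emptied here, but exec is NOT a security boundary.
        exec(action_code, {"__builtins__": {}}, env)
    return env


def stub_policy(observation: Dict[str, Any]) -> str:
    """Hypothetical stand-in for a model: increment a counter until it reaches 3."""
    return "counter = counter + 1" if observation["counter"] < 3 else ""


final_env = run_loop(stub_policy, {"counter": 0})
print(final_env)  # {'counter': 3}
```

The point of the sketch is only that the action space is code itself, so the same loop can express arbitrary state changes without a fixed menu of API calls.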

Layer 2: Harness Mechanisms

Capabilities critical for long-horizon execution:

Planning: How code structures task decomposition and execution sequencing
Memory: Code state as persistent memory
Tool Use: Code as the glue layer for tool invocation and orchestration
Feedback-driven Control: Adaptive optimization based on execution outcomes
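The four mechanisms above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the paper's design: the plan is a plain list of step names, memory is a dictionary of step results, tools are ordinary functions, and feedback-driven control is reduced to retrying a failed step. All names (run_plan, fetch, flaky_sort) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Tool = Callable[[Dict[str, Any]], Any]


def run_plan(
    plan: List[str],
    tools: Dict[str, Tool],
    max_retries: int = 2,
) -> Tuple[Dict[str, Any], List[str]]:
    """Execute plan steps in order, storing results in memory and retrying failures."""
    memory: Dict[str, Any] = {}  # memory: persistent state shared across steps
    log: List[str] = []          # execution trace used as feedback
    for step in plan:            # planning: the plan fixes the execution order
        for attempt in range(1, max_retries + 2):
            try:
                memory[step] = tools[step](memory)  # tool use: invoke by name
                log.append(f"{step}: ok (attempt {attempt})")
                break
            except Exception as exc:  # feedback-driven control: retry on failure
                log.append(f"{step}: failed (attempt {attempt}): {exc}")
        else:
            # Every attempt failed, so stop instead of running dependent steps.
            log.append(f"{step}: giving up")
            break
    return memory, log


calls = {"sort": 0}


def fetch(memory: Dict[str, Any]) -> List[int]:
    return [3, 1, 2]


def flaky_sort(memory: Dict[str, Any]) -> List[int]:
    """Fails on its first call to simulate a transient tool error."""
    calls["sort"] += 1
    if calls["sort"] == 1:
        raise RuntimeError("transient error")
    return sorted(memory["fetch"])


memory, log = run_plan(["fetch", "sort"], {"fetch": fetch, "sort": flaky_sort})
print(memory["sort"])  # [1, 2, 3]
print(log)
```

A real harness would replace the fixed retry with replanning based on the error, but the division of roles stays the same.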

Layer 3: Scaling to Multi-Agent

When code becomes a shared artifact, coordination, auditing, and verification among multiple agents gain a unified semantic foundation. The paper discusses open challenges in this direction, especially cross-agent state consistency and human oversight for safety-critical operations.
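One way to picture a shared code artifact with auditing and a basic consistency check is the sketch below. It assumes a simple optimistic-concurrency scheme in which an agent must name the version it based its change on; this scheme, the SharedArtifact class, and the agent names are illustrative assumptions rather than anything the paper specifies.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SharedArtifact:
    """A code artifact shared by several agents, with a version and an audit log."""

    source: str = ""
    version: int = 0
    # Each audit entry records (new version, agent name, short content hash).
    audit_log: List[Tuple[int, str, str]] = field(default_factory=list)

    def commit(self, agent: str, new_source: str, based_on: int) -> bool:
        """Accept the change only if the agent saw the latest version."""
        if based_on != self.version:
            return False  # stale write: the agent must re-read and retry
        self.source = new_source
        self.version += 1
        digest = hashlib.sha256(new_source.encode()).hexdigest()[:12]
        self.audit_log.append((self.version, agent, digest))
        return True


artifact = SharedArtifact()
first = artifact.commit("planner", "x = 1", based_on=0)   # accepted
second = artifact.commit("coder", "x = 2", based_on=0)    # rejected as stale
third = artifact.commit("coder", "x = 2", based_on=1)     # accepted after re-read
print(first, second, third, artifact.version)  # True False True 2
```

The audit log gives a reviewer, human or agent, a record of who changed what, which is where oversight for safety-critical operations could hook in.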

Application Landscape

The paper surveys applications of Code as Agent Harness across diverse domains:

Coding assistants (e.g., Claude Code, Cursor)
GUI/OS automation
Embodied agents
Scientific discovery
Personalization and recommendation
DevOps
Enterprise workflows

Open Challenges

The paper does not shy away from hard problems. Several key open challenges are presented candidly:

Evaluation must go beyond final task success rates—intermediate harness states and decision quality must be assessed
Verification under incomplete feedback—how to judge whether a harness is “correct” when environmental feedback is sparse or noisy
Regression-free harness improvement—how to modify harness code without introducing performance degradation
Extension to multimodal environments—the current framework primarily targets text/code environments
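The regression-free improvement challenge can be made concrete with a small check: run the old and new harness versions on the same fixed task suite and report every task that used to pass and now fails. This is only a sketch under simplifying assumptions (harnesses are plain functions, tasks have exact expected outputs, feedback is neither sparse nor noisy), and find_regressions and the sample tasks are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Harness = Callable[[Any], Any]


def find_regressions(
    old_harness: Harness,
    new_harness: Harness,
    tasks: Dict[str, Tuple[Any, Any]],
) -> List[str]:
    """Return names of tasks the old harness passed but the new one fails."""
    regressions: List[str] = []
    for name, (task_input, expected) in tasks.items():
        old_ok = old_harness(task_input) == expected
        new_ok = new_harness(task_input) == expected
        if old_ok and not new_ok:
            regressions.append(name)
    return regressions


tasks = {
    "double_small": (2, 4),
    "double_zero": (0, 0),
    "double_negative": (-3, -6),
}


def old_version(x: int) -> int:
    return x * 2


def new_version(x: int) -> int:
    # A hypothetical "improvement" that mishandles negative inputs.
    return x * 2 if x >= 0 else 0


regressions = find_regressions(old_version, new_version, tasks)
print(regressions)  # ['double_negative']
```

Comparing per-task outcomes rather than an aggregate success rate is what exposes the regression here: both versions could score similarly overall while differing on individual tasks.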

One-Sentence Summary

The paper’s value lies not in introducing novel technology but in unifying “code-driven agent” practices, currently scattered across research directions, into a single coherent theoretical framework. For engineers building agentic AI systems, such a roadmap may prove more useful than any single technical paper.

The paper and associated code are open-sourced. For Agent developers, this may be one of the most worthwhile surveys to read closely this year.

Primary source:

arXiv:2605.18747 — “Code as Agent Harness” survey paper

