ChaoBro

USTC ACC Paper: Compiling Agent Trajectories into Long-Context Training Data—Bold Idea

Training a smarter Agent usually comes down to two paths: feed it more high-quality instruction-tuning data, or let it explore the environment on its own. Both have clear shortcomings: instruction data covers a limited range of scenarios, and autonomous exploration is too inefficient.

A research team from USTC submitted a paper today proposing a third path: "compile" Agent run trajectories into training data.

The paper is called ACC (Agent trajectory Compilation for long-Context training).

Trajectories Are Not Logs, They're Textbooks

Most Agent system run logs are only used for debugging. ACC's insight: these trajectories themselves contain structured information about model reasoning—when it called tools, when it needed more context, when it made a wrong decision and corrected itself.

The core challenge of compiling trajectories into training data is extracting "why it did this" rather than "what it did." If the model just learns to mimic the action sequence in a trajectory, it learns surface behavior and fails on new scenarios.

ACC's approach extracts key decision points and reasoning paths from trajectories as long-context training samples. During training, the model sees not just "input → output" but the full chain: "input → intermediate thinking → tool call → result → final output."
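To make the "full chain" idea concrete, here is a minimal sketch of how one trajectory might be flattened into a single long-context training sample. This is an illustration only, not ACC's actual format: the `Step` class, the `compile_trajectory` function, and the bracketed section markers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One step of an agent run (hypothetical schema, not from the paper)."""
    thought: str                        # intermediate reasoning text
    tool: Optional[str] = None          # tool name, if a tool was called
    tool_input: Optional[str] = None    # argument passed to the tool
    observation: Optional[str] = None   # result returned by the tool


def compile_trajectory(task: str, steps: List[Step], final_output: str) -> str:
    """Serialize input -> thinking -> tool call -> result -> final output."""
    parts = [f"[INPUT]\n{task}"]
    for i, step in enumerate(steps, start=1):
        parts.append(f"[THOUGHT {i}]\n{step.thought}")
        if step.tool is not None:
            # Keep the call and its result next to the thought that caused them.
            parts.append(f"[TOOL CALL {i}]\n{step.tool}({step.tool_input or ''})")
            parts.append(f"[RESULT {i}]\n{step.observation or ''}")
    parts.append(f"[FINAL OUTPUT]\n{final_output}")
    return "\n\n".join(parts)
```

A plain instruction-tuning sample would keep only the `[INPUT]` and `[FINAL OUTPUT]` sections; the sketch keeps everything in between, which is where the sample length comes from.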

Why Long Context Matters

Agent reasoning is often long. A complex task may need a dozen tool calls, interspersed with information retrieval and reasoning corrections. These intermediate steps constitute the model's "thinking process," but traditional instruction tuning usually only keeps the final output.

ACC preserves these intermediate steps as part of long-context training. During inference, the model can see a more complete "how someone before thought about this problem" rather than just a cold final answer.

Just Hit Hugging Face Daily Papers

This paper got 36 upvotes today, submitted by ustc-community. Full details of the paper aren't available yet; they will have to wait for the complete arXiv page.

But in terms of direction, the paper fits squarely within an active area of the field: Agent training data construction. The bottleneck of Agent capability is increasingly not the model itself, but "how to teach the model to use tools correctly." ACC provides a structured method to turn Agent run experience into training signals.

Two Open Questions

First, how can compilation quality be ensured? An Agent may take 50 steps to complete a task, but only 5 of those steps may be truly critical. How do you extract those 5 from the 50 rather than feed noise to the model?
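How the paper answers this is not yet known. Purely to illustrate the shape of the problem, the sketch below ranks steps with a naive hand-written heuristic and keeps the top `k` in their original order. The step fields (`tool`, `is_correction`, `is_final`) and the weights are assumptions made up for this example, not anything taken from ACC.

```python
from typing import Dict, List


def select_key_steps(steps: List[Dict], k: int = 5) -> List[Dict]:
    """Keep the k highest-scoring steps, preserving their original order."""

    def score(step: Dict) -> float:
        s = 0.0
        if step.get("tool"):            # the step called a tool
            s += 1.0
        if step.get("is_correction"):   # the step fixed an earlier mistake
            s += 2.0
        if step.get("is_final"):        # the step produced the final answer
            s += 3.0
        return s

    # Rank indices by descending score; ties go to the earlier step.
    ranked = sorted(range(len(steps)), key=lambda i: (-score(steps[i]), i))
    keep = sorted(ranked[:k])           # restore chronological order
    return [steps[i] for i in keep]
```

A fixed heuristic like this is exactly what could feed noise to the model if the weights are wrong, which is why the question matters.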

Second, trajectory formats vary widely across Agent systems. Some use ReAct, some use LangGraph, some have custom formats. Can ACC's compilation method work across frameworks, or is this an engineering problem that still needs solving?
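One common engineering answer to this kind of problem is an adapter layer that maps each framework's log into a shared step schema before any compilation happens. The sketch below assumes two simplified input shapes: a ReAct-style text log with `Thought:` / `Action: tool[input]` / `Observation:` lines, and a generic list of role-tagged messages. The message-list shape is hypothetical and does not reproduce any real framework's format; whether ACC does anything like this is unknown.

```python
import re
from typing import Dict, List

# Matches one "Thought / Action / Observation" triple in a ReAct-style log.
REACT_STEP = re.compile(
    r"Thought:\s*(?P<thought>.*?)\n"
    r"Action:\s*(?P<tool>\w+)\[(?P<tool_input>.*?)\]\n"
    r"Observation:\s*(?P<observation>.*?)(?=\nThought:|\Z)",
    re.DOTALL,
)


def from_react_text(log: str) -> List[Dict[str, str]]:
    """Parse a ReAct-style text log into the shared step schema."""
    return [m.groupdict() for m in REACT_STEP.finditer(log.strip())]


def from_message_list(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Convert a hypothetical role-tagged message list into the same schema."""
    steps: List[Dict[str, str]] = []
    current = None
    for msg in messages:
        if msg["role"] == "assistant":
            current = {
                "thought": msg.get("content", ""),
                "tool": msg.get("tool", ""),
                "tool_input": msg.get("tool_input", ""),
                "observation": "",
            }
            steps.append(current)
        elif msg["role"] == "tool" and current is not None:
            # Attach the tool result to the assistant step that requested it.
            current["observation"] = msg.get("content", "")
    return steps
```

Once both adapters emit the same dictionary keys, everything downstream can ignore where a trajectory came from. The hard part in practice is the long tail of custom formats, each of which needs its own adapter.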


Primary sources:

  • ACC paper (USTC Community, May 22, 2026)
  • Hugging Face Daily Papers (36 upvotes)