Agent Bugs in Production? This Paper Traces the Problem to an Overlooked Boundary

Why do production LLM Agents break?

The usual answers: the model isn't good enough, the prompt isn't right, or tool calling has bugs.

This paper offers a different angle: the problem may lie at the boundary between stochastic model outputs and deterministic systems, a boundary that the authors say has never been treated as a formal architectural object.

SDB: Stochastic-Deterministic Boundary

The authors give this boundary a name: the Stochastic-Deterministic Boundary (SDB).

It's a four-part contract:

Proposer: LLM generates candidate output
Verifier: checks if the output meets constraints
Commit step: turns verified output into system action
Reject signal: what to do when it fails

The paper argues that SDB is the load-bearing primitive of production agent runtimes.
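
To make the four parts concrete, here is a minimal Python sketch of one propose, verify, commit cycle with a reject signal fed back to the proposer. It is an illustration under stated assumptions, not the paper's implementation: `run_boundary`, `Outcome`, `verify_refund`, the refund limit of 100, and the scripted fake proposer are all hypothetical names and values invented for this example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    committed: bool
    attempts: int
    value: Optional[dict] = None
    reason: Optional[str] = None


def run_boundary(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    commit: Callable[[dict], None],
    max_attempts: int = 3,
) -> Outcome:
    """Run propose -> verify -> commit with a bounded number of retries."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)   # Proposer: the stochastic side
        error = verify(candidate)       # Verifier: deterministic, None means OK
        if error is None:
            value = json.loads(candidate)
            commit(value)               # Commit step: only verified output gets here
            return Outcome(True, attempt, value=value)
        feedback = error                # Reject signal: passed back to the proposer
    return Outcome(False, max_attempts, reason=feedback)


def verify_refund(candidate: str) -> Optional[str]:
    """Hypothetical constraint: a JSON object with 0 < amount <= 100."""
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return "output is not valid JSON"
    if not isinstance(data, dict):
        return "output must be a JSON object"
    amount = data.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return "amount must be a number"
    if not 0 < amount <= 100:
        return "amount must be in (0, 100]"
    return None


def make_fake_proposer(outputs):
    """Stand-in for an LLM call: returns scripted outputs, ignoring feedback."""
    scripted = iter(outputs)

    def propose(feedback: Optional[str]) -> str:
        return next(scripted)

    return propose


ledger = []  # stand-in for the deterministic system being acted on
outcome = run_boundary(
    make_fake_proposer(["refund 500 please", '{"amount": 500}', '{"amount": 40}']),
    verify_refund,
    ledger.append,
)
```

In this run the first two candidates are rejected (one is not JSON, one exceeds the limit) and only the third reaches `ledger`. A real proposer would use the `feedback` argument to repair its next attempt.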

Six runtime patterns

Around SDB, the authors organize Agent runtime design into three concerns: Coordination, State, Control.

Then they borrow six patterns from distributed systems, each mapping to different scenarios:

Hierarchical Delegation: conversational agents
Scatter-Gather + Saga: parallel sub-tasks that need aggregation
Event-Driven Sequencing: async task flows
Shared State Machine: multi-agent collaboration
Supervisor + Gate: autonomous agents
Human-in-the-Loop: critical decisions needing human review

Each pattern traces back to classical distributed systems concepts, but the paper identifies what changes when the worker becomes stochastic (an LLM).
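
As one illustration of what changes when the worker is stochastic, the sketch below combines scatter-gather with a saga-style compensation step: sub-tasks run in parallel, each output is checked, and if any output fails the check the sub-tasks that passed are undone. This is a reading of the pattern name, not the paper's definition. `scatter_gather_saga`, the `BOOKED:` output convention, and the lambda workers are hypothetical, and each worker is assumed to have a side effect that `compensate` can reverse.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def scatter_gather_saga(
    tasks: Dict[str, Callable[[], str]],
    verify: Callable[[str], bool],
    compensate: Callable[[str], None],
) -> Tuple[bool, Dict[str, str]]:
    """Run all tasks in parallel; undo the accepted ones if any output fails."""
    # Scatter: submit every sub-task, then gather results in submission order.
    # A worker exception propagates from result(); a fuller version would catch it.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    accepted = [name for name, out in results.items() if verify(out)]
    if len(accepted) == len(results):
        return True, results

    # Saga step: compensate the sub-tasks that passed, in reverse order.
    for name in reversed(accepted):
        compensate(name)
    return False, {}


undone = []  # records which sub-tasks were compensated
ok, results = scatter_gather_saga(
    {
        "flight": lambda: "BOOKED:F1",             # well-formed worker output
        "hotel": lambda: "sorry, I could not do",  # malformed worker output
    },
    verify=lambda out: out.startswith("BOOKED:"),
    compensate=undone.append,
)
```

Here the hotel worker returns free text instead of a booking, so the aggregate fails and the flight booking is compensated. With deterministic workers the check on each result is often implicit; with an LLM worker it has to be explicit.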

A key failure mode: Replay Divergence

The paper names a failure mode I hadn't seen named before: Replay Divergence.

Scenario: you record all agent inputs/outputs in a deterministic event log. Later, you change the model version or prompt and replay the same log—the downstream outputs differ.

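This wouldn't happen in a traditional distributed system, where replaying a deterministic log through deterministic handlers reproduces the same state. In LLM Agents it's inevitable: a new model version or prompt shifts the outputs, and because LLMs are stochastic, even the same input can produce different outputs.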

Naming this matters for debugging and auditing.
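
One way to surface replay divergence is to store a hash of each recorded output and compare it against the output of a replay. The sketch below assumes a very simple log of (input, output hash) pairs; `record`, `first_divergence`, and the two toy "model versions" are hypothetical stand-ins, not anything specified in the paper.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, str]  # (input, SHA-256 hex digest of the recorded output)


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record(inputs: List[str], model: Callable[[str], str]) -> List[Event]:
    """Build an event log by running the model once per input."""
    return [(x, digest(model(x))) for x in inputs]


def first_divergence(log: List[Event], model: Callable[[str], str]) -> Optional[int]:
    """Replay the log; return the index of the first differing output, else None."""
    for i, (x, recorded) in enumerate(log):
        if digest(model(x)) != recorded:
            return i
    return None


def model_v1(prompt: str) -> str:
    """Toy stand-in for the model version that produced the log."""
    return prompt.upper()


def model_v2(prompt: str) -> str:
    """Toy stand-in for an upgraded model: same on short inputs, different on long ones."""
    return prompt.upper() if len(prompt) < 6 else prompt.title()


log = record(["ping", "route order", "close"], model_v1)
same = first_divergence(log, model_v1)     # None: the replay matches the log
changed = first_divergence(log, model_v2)  # 1: "route order" now yields a different output
```

The toy models are deterministic so the example is reproducible. With a real LLM, even replaying against the same model version may diverge, which is why the recorded outputs, not a re-run, should be treated as the record of what the agent did.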

Practical takeaways

If you're running Agents in production:

Define your SDB explicitly. Don't pipe LLM output directly into system workflows. Define who proposes, who verifies, how to commit, and what happens when verification fails.
Pattern choice matters more than model choice. As model variance decreases, pattern choice and SDB strength become the more important levers for long-run reliability.
There's a framework for failure diagnosis. The paper provides a five-step methodology that maps production failures to pattern weaknesses.

Paper: Production LLM Agent Runtime Patterns
