Google Split Gemini API: No More User/Model Roles, Every Action Is a Separate Step

Google is changing Gemini's API structure. The change isn't huge, but the direction is clear.

Previously, Gemini API's interaction model was standard conversational: user sends a message, model replies. Round by round, roles clearly separated.

Now Google dismantled that model. In the new Gemini Interactions API, there are no longer strict user and model role distinctions. Each thought, each tool call, each output is represented as an independent step.

What does this mean?

From "Conversation" to "Workflow"

Standard conversation API is good for Q&A. You ask, I answer. Clean and simple.

But Agent scenarios are different. When an Agent completes a complex task, internal steps are messy: it might think first, then call tool A, get results and think again, then call tool B, realize it needs user input midway, and pause to wait.

The old user/model role model is awkward for this scenario. You had to disguise the Agent's internal thinking as user messages to the API, or package tool call results as model responses. Interface and actual behavior don't match.

The new step model exposes this complexity directly. Each action is a first-class citizen; the API no longer tries to stuff them into a conversation shell.

How exactly it changes

Google's official blog gave a key description:

"Instead of strict 'user' and 'model' roles, every action (from thinking to tool calls) is now represented as its own step."

This means developers can:

See the Agent's full thinking chain, not just the final answer
Intervene in specific steps during Agent execution, not just wait for a round to end
Serialize, persist, and replay multi-step Agent execution processes

Very practical for enterprise scenarios requiring audit and debugging.

Comparison with competitors

Anthropic's Claude API already has similar capabilities — its message API supports tool_use and tool_result as independent message types. OpenAI's Responses API is moving in this direction too.

What's special about Google's change is it didn't patch the existing API; it redesigned the interaction model. This implies Google is preparing for more complex multi-Agent collaboration scenarios.

If each step is independently addressable, then theoretically multiple Agents can interleave their own steps within the same Interaction without interfering with each other.

When will it be available

Google said this is "evolving," meaning still in progress. No GA timeline given.

But considering Google Cloud Next 2026 already demonstrated similar agentic workflow concepts, this API change will likely go GA soon.

My take

This is a "developers won't feel it but architecturally impactful" change. Regular users won't notice anything different, but people building on top of Gemini will find Agent development much smoother.

Worth watching: whether Google will launch accompanying Agent orchestration tools around this new step model, similar to Anthropic's MCP or Google's own ADK. If yes, the Agent development threshold for the Gemini ecosystem will drop noticeably.

Main sources:

From "Conversation" to "Workflow"

How exactly it changes

Comparison with competitors

When will it be available

My take

Related

Chrome DevTools Officially Releases MCP Server: AI Coding Agents Can Finally "See" the Browser

Google I/O 2026: The "Agentification" of Search Isn't an Upgrade, It's a Rewrite

Google's SynthID Watermarking Technology Adopted by Giants Like OpenAI and Nvidia: AI Content Provenance Enters the Standardization Era