Gemini 3.1 Flash-Lite Goes GA: Google Drops API Pricing to $0.25/M

No press conference. No blog post. Google just flipped the switch on Gemini 3.1 Flash-Lite to GA via OpenRouter.

The pricing is blunt: $0.25/M input tokens, $1.50/M output. In today's small-model price war, that's not competitive - it's predatory.

Specs at a Glance

Flash-Lite isn't a watered-down Flash - it's a different lane entirely:

Multimodal input: text, images, video, audio, PDF in → text out
1M context window: same tier as 3.1 Ultra
Selectable reasoning levels: low / medium / high, tune per use case
service_tier parameter: OpenRouter's new cost/latency toggle

The preview version gemini-3.1-flash-lite-preview stops updating May 11 and shuts off completely May 25. Google isn't leaving much runway here.

What This Price Means

Put Flash-Lite into the context of the current API price war:

$0.25/M input undercuts the cheapest front-tier models from just last month. If your workflow involves bulk document processing, translation, or high-frequency lightweight Agent calls, this isn't "worth considering" - it's "there's no reason not to switch."

$1.50/M output is reasonable too, but don't let the input price fool you. Long-response scenarios are where output tokens eat your budget. Flash-Lite's sweet spot is exactly the opposite: short outputs. Classification, summarization, translation, data cleaning.

Where It Fits with 3.2 Flash

Yesterday's piece on the Gemini 3.2 Flash leak mentioned Google reshuffling its naming scheme. Looking at the full lineup now, Flash-Lite is the bottom piece of a three-tier strategy:

Tier	Positioning
3.1 Ultra	Flagship, 2M context, most expensive
3.2 Flash	Mid-range, speed/reasoning balance
3.1 Flash-Lite	Low-cost, high-throughput, Agent bulk calls

Three tiers, clear division of labor. Flash-Lite isn't here to out-reason Opus or GPT-5.5 - it competes on volume, not depth.

Who Should Use It, Who Shouldn't

Good fit:

Pipelines processing large document/translation volumes
High-frequency lightweight calls in Agent frameworks (tool selection, intent classification, format validation)
Cost-sensitive batch jobs

Skip it:

Complex reasoning tasks (coding, math, long-chain logic)
Latency-critical scenarios unless you fine-tune service_tier
Tasks requiring multimodal output (text output only)

One Observation

Google's choice to launch via OpenRouter instead of waiting for Google I/O is telling. Last month's Google I/O teaser bet the spotlight on Gemini Omni, while Flash-Lite - this kind of infrastructure-grade model - doesn't need a stage. It goes straight into the API catalog and developers find it themselves.

This quiet-release playbook is becoming Google's norm. No keynote, no marketing blitz, just drop the price low enough that the invoice does the talking.

You've got less than three weeks before the preview shuts off. If your pipeline is still running gemini-3.1-flash-lite-preview, time to migrate.

Primary sources:

OpenRouter Gemini 3.1 Flash-Lite page
Google DeepMind official X account (@GoogleDeepMind), post dated 2026-05-07
OpenRouter announcement thread (preview deprecation schedule)

Specs at a Glance

What This Price Means

Where It Fits with 3.2 Flash

Who Should Use It, Who Shouldn't

One Observation

Related

Chrome DevTools Officially Releases MCP Server: AI Coding Agents Can Finally "See" the Browser

Google I/O 2026: The "Agentification" of Search Isn't an Upgrade, It's a Rewrite

Google's SynthID Watermarking Technology Adopted by Giants Like OpenAI and Nvidia: AI Content Provenance Enters the Standardization Era