No press conference. No blog post. Google just flipped the switch on Gemini 3.1 Flash-Lite to GA via OpenRouter.
The pricing is blunt: $0.25/M input tokens, $1.50/M output. In today's small-model price war, that's not competitive - it's predatory.
Specs at a Glance
Flash-Lite isn't a watered-down Flash - it's a different lane entirely:
- Multimodal input: text, images, video, audio, PDF in → text out
- 1M context window: same tier as 3.1 Ultra
- Selectable reasoning levels: low / medium / high, tune per use case
- service_tier parameter: OpenRouter's new cost/latency toggle
The preview version gemini-3.1-flash-lite-preview stops updating May 11 and shuts off completely May 25. Google isn't leaving much runway here.
What This Price Means
Put Flash-Lite into the context of the current API price war:
$0.25/M input undercuts the cheapest front-tier models from just last month. If your workflow involves bulk document processing, translation, or high-frequency lightweight Agent calls, this isn't "worth considering" - it's "there's no reason not to switch."
$1.50/M output is reasonable too, but don't let the input price fool you. Long-response scenarios are where output tokens eat your budget. Flash-Lite's sweet spot is exactly the opposite: short outputs. Classification, summarization, translation, data cleaning.
Where It Fits with 3.2 Flash
Yesterday's piece on the Gemini 3.2 Flash leak mentioned Google reshuffling its naming scheme. Looking at the full lineup now, Flash-Lite is the bottom piece of a three-tier strategy:
| Tier | Positioning |
|---|---|
| 3.1 Ultra | Flagship, 2M context, most expensive |
| 3.2 Flash | Mid-range, speed/reasoning balance |
| 3.1 Flash-Lite | Low-cost, high-throughput, Agent bulk calls |
Three tiers, clear division of labor. Flash-Lite isn't here to out-reason Opus or GPT-5.5 - it competes on volume, not depth.
Who Should Use It, Who Shouldn't
Good fit:
- Pipelines processing large document/translation volumes
- High-frequency lightweight calls in Agent frameworks (tool selection, intent classification, format validation)
- Cost-sensitive batch jobs
Skip it:
- Complex reasoning tasks (coding, math, long-chain logic)
- Latency-critical scenarios unless you fine-tune service_tier
- Tasks requiring multimodal output (text output only)
One Observation
Google's choice to launch via OpenRouter instead of waiting for Google I/O is telling. Last month's Google I/O teaser bet the spotlight on Gemini Omni, while Flash-Lite - this kind of infrastructure-grade model - doesn't need a stage. It goes straight into the API catalog and developers find it themselves.
This quiet-release playbook is becoming Google's norm. No keynote, no marketing blitz, just drop the price low enough that the invoice does the talking.
You've got less than three weeks before the preview shuts off. If your pipeline is still running gemini-3.1-flash-lite-preview, time to migrate.
Primary sources:
- OpenRouter Gemini 3.1 Flash-Lite page
- Google DeepMind official X account (@GoogleDeepMind), post dated 2026-05-07
- OpenRouter announcement thread (preview deprecation schedule)