Cloudflare Workers AI Refreshes Model Catalog: GLM-4.7-Flash and Gemma-4-26B Enter, Old Models Deprecating May 30

Cloudflare Workers AI's model catalog just got a major overhaul.

The new GLM-4.7-Flash and Gemma-4-26B-A4B-IT are now listed, while older Llama and Kimi models are marked for deprecation and will be gone after May 30.

If you're running inference on Workers AI, you need to check your model dependencies now.

New Models: GLM-4.7-Flash and Gemma-4-26B-A4B-IT

GLM-4.7-Flash comes from Zhipu and is positioned for lightweight, fast inference. The "Flash" naming logic is clear: trade some reasoning depth for speed and lower cost. If your workload involves high-frequency, latency-sensitive calls, Flash is the right choice.

Gemma-4-26B-A4B-IT is the MoE (mixture-of-experts) variant of Google's Gemma 4: 26B total parameters, 4B active per token, instruction-tuned. It is positioned as "strong among mid-sized models": a manageable parameter count that does not lag behind larger models in instruction following and code capability.

Real Impact of Old Model Deprecation

Cloudflare's announcement did not list every old Llama and Kimi model marked for deprecation. But if you hardcoded model names in your project (for example, writing @cf/meta/llama-3-8b directly in code), calls to a deprecated model will fail after May 30.

Two things to do:

1. Check your Workers AI-related code and confirm which models you're using.
2. Complete the migration before May 30. GLM-4.7-Flash and Gemma-4-26B-A4B-IT can likely replace the old Llama models directly.
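The first step, finding which models a codebase references, can be roughly sketched as a small text scan. This is a minimal sketch, not a Cloudflare tool: the function name findModelIds is hypothetical, and the pattern assumes identifiers shaped like the @cf/meta/llama-3-8b example above, so any other naming scheme would be missed.

```typescript
// Assumption: model identifiers look like "@cf/<vendor>/<model>", as in the
// article's example. Identifiers with a different prefix are not detected.
const MODEL_ID_PATTERN = /@cf\/[A-Za-z0-9._-]+\/[A-Za-z0-9._-]*[A-Za-z0-9]/g;

// Returns the distinct model identifiers found in a piece of source text,
// sorted so the result is stable across runs.
function findModelIds(source: string): string[] {
  const matches = source.match(MODEL_ID_PATTERN) ?? [];
  return Array.from(new Set(matches)).sort();
}
```

Running this over each source file's contents and comparing the result with the catalog's deprecation notices should show which call sites need attention.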

Why Cloudflare Swaps Models So Frequently

Workers AI's model catalog update cadence is faster than most cloud providers. This isn't a bug, it's by design.

Cloudflare's positioning for Workers AI is "edge inference platform" — not a training platform, not a large model hosting platform, but a place for developers to run inference at nodes closest to users. This means models must be small enough, fast enough, and cheap enough.

When a model stops meeting any of these three conditions, it gets replaced.

Advice for Developers

If you're building edge-side AI apps (chatbots, content moderation, simple text generation), Workers AI remains one of the most convenient choices. But you need to accept one fact: models aren't your asset, they're a platform service.

This means you should abstract model names into configuration rather than hardcoding them in business logic. When the platform swaps models, you only need to change one line of config.
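One way to keep the model name out of business logic is to read it from the Worker's environment and fall back to a default. The following is a minimal sketch under stated assumptions: CHAT_MODEL is a hypothetical variable name, AiBinding is a simplified stand-in for the Workers AI binding's run(model, inputs) call, and the { prompt } input shape is assumed for a text-generation model.

```typescript
// Simplified stand-in for the Workers AI binding; only the call used here.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
  // Hypothetical configuration variable holding the model identifier.
  CHAT_MODEL?: string;
}

// Default used when no variable is set. This is the article's example
// identifier; replace it with whichever model you migrate to.
const DEFAULT_CHAT_MODEL = "@cf/meta/llama-3-8b";

// Single place where the model name is decided.
function resolveChatModel(env: Pick<Env, "CHAT_MODEL">): string {
  const configured = env.CHAT_MODEL?.trim();
  return configured ? configured : DEFAULT_CHAT_MODEL;
}

// Business logic asks for "the chat model" and never names one directly.
async function runChat(env: Env, prompt: string): Promise<unknown> {
  return env.AI.run(resolveChatModel(env), { prompt });
}
```

With this shape, a catalog change means editing the configured value (or the one default constant) rather than every call site.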


