Inference · Google AI Blog
New ways to balance cost and reliability in the Gemini API
Introducing Flex and Priority inference: advanced controls for developers to optimize costs and reliability through a single, unified interface.
Key facts
- Priority inference will be available to users with Tier 2 and Tier 3 paid projects across the `GenerateContent` and Interactions API endpoints
- Flex inference is a new cost-optimized tier, designed for latency-tolerant workloads without the overhead of batch processing
- Priority inference offers the highest level of assurance, at a premium price point
Summary
Today, Google is adding two new service tiers to the Gemini API: Flex and Priority. As AI evolves from simple chat into complex, autonomous agents, developers typically have to manage two distinct types of logic:
- Background tasks: high-volume workflows like data enrichment or "thinking" processes that don't need instant responses.
- Interactive tasks: user-facing features like chatbots and copilots where high reliability is needed.
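The two task types map naturally onto the two tiers: route latency-tolerant background work to Flex and user-facing calls to Priority. As a rough sketch of what per-request tier selection could look like, the snippet below builds a `GenerateContent`-style request body; the `service_tier` field name and its values (`"flex"`, `"priority"`) are assumptions for illustration, not confirmed API surface, so check the Gemini API reference for the exact field once the tiers are available to your project.

```python
import json

def build_request(prompt: str, tier: str) -> dict:
    """Build a GenerateContent-style request body for the given service tier.

    The "service_tier" field is a hypothetical name used here to illustrate
    per-request tier selection; the real Gemini API field may differ.
    """
    if tier not in ("standard", "flex", "priority"):
        raise ValueError(f"unknown service tier: {tier}")
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "service_tier": tier,  # hypothetical field name
    }

# Latency-tolerant background job: use the cost-optimized Flex tier.
background = build_request("Enrich this record with company metadata.", "flex")

# User-facing copilot call: pay for Priority's higher reliability.
interactive = build_request("Summarize this thread for the user.", "priority")

print(json.dumps(background, indent=2))
```

Because the tier is just a per-request field in this sketch, the same codebase can serve both workload types through a single interface, switching tiers per call instead of maintaining separate batch and interactive pipelines.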