Google AI · Large Models

New ways to balance cost and reliability in the Gemini API

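Google has added two inference tiers, Flex and Priority, to the Gemini API, letting developers trade off cost against latency according to their use case. Flex prioritizes throughput and suits batch processing; Priority aims for faster response times. Both tiers use the same model capabilities and do not change generation quality, but they differ in billing and quota allocation. It is a pragmatic API-level adjustment that should lower the practical barrier to entry for developers.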

Domain: blog.google
Rating: 3 · Worth watching
Published: 2026-04-02

Original summary

Google is introducing two new inference tiers to the Gemini API, Flex and Priority, to balance cost and latency.
