Google Cuts AI Costs With New Gemini 3.5 Flash Launch

Major corporations are reportedly burning through their annual artificial intelligence budgets faster than ever, with some exhausting their entire allocation by May. This "token budget crisis" has sent tech giants looking for cheaper options, and Google has responded with a lower-cost model and a revamped pricing structure.

At its Google I/O conference on May 20, 2026, Alphabet CEO Sundar Pichai unveiled Gemini 3.5 Flash, a new AI model designed for speed and cost efficiency. The announcement came alongside a restructuring of Google's AI subscription pricing, details of which had surfaced in reports a day earlier, on May 19: a new $100-per-month AI Ultra plan was introduced, while the existing top-tier AI Ultra plan dropped from $250 to $200 per month. Gemini 3.5 Flash was made available globally at launch through Google platforms including the Gemini app, AI Mode in Google Search, Google AI Studio, Android Studio, and enterprise systems.

Pichai addressed the need for more economical AI directly in his keynote. He said top companies using Google Cloud are collectively processing an estimated one trillion tokens per day, a figure that underscores how much compute AI agents consume and how often that consumption is underestimated. "Companies are already blowing through their annual token budgets and it's only May," he said. His economic argument: if enterprises shifted 80% of their workloads from costly "frontier models" to a combination of Gemini 3.5 Flash and other advanced tools, the collective annual savings could exceed a billion dollars.
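The arithmetic behind a claim like this is simple to sketch. The Python snippet below is a back-of-the-envelope illustration, not Google's own calculation: it assumes the shifted workloads cost one third less, the upper bound reported elsewhere in this article, and the function names are hypothetical.

```python
def annual_savings(frontier_spend: float, shifted_share: float, discount: float) -> float:
    """Savings from moving `shifted_share` of spend to a model that costs `discount` less."""
    if not 0.0 <= shifted_share <= 1.0 or not 0.0 <= discount <= 1.0:
        raise ValueError("shifted_share and discount must be between 0 and 1")
    return frontier_spend * shifted_share * discount


def spend_needed_for_savings(target_savings: float, shifted_share: float, discount: float) -> float:
    """Invert the formula: the baseline spend required to hit a savings target."""
    if shifted_share <= 0.0 or discount <= 0.0:
        raise ValueError("shifted_share and discount must be positive")
    return target_savings / (shifted_share * discount)


SHIFTED_SHARE = 0.80   # share of workloads moved, from the keynote claim
DISCOUNT = 1 / 3       # ASSUMPTION: "up to a third less", taken at its maximum

baseline = spend_needed_for_savings(1e9, SHIFTED_SHARE, DISCOUNT)
print(f"Baseline frontier spend implied by $1B savings: ${baseline / 1e9:.2f}B")
# prints: Baseline frontier spend implied by $1B savings: $3.75B
```

Under those assumptions, a billion dollars in savings implies roughly $3.75 billion in collective annual spending on frontier models. A smaller discount would push the implied baseline higher.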

Google's pivot rests on its long-standing investment in infrastructure. The company's structural cost advantage is attributed to its extensive network, its custom Tensor Processing Units (TPUs), and a substantial capital expenditure commitment. Reports indicate that Google plans to invest up to $190 billion in 2026, up from $31 billion in 2022.

Gemini 3.5 Flash is touted as Google's fastest frontier model in the 3.5 generation, engineered for speed, cost efficiency, and agentic reliability. It reportedly matches the performance of other leading frontier models while costing up to a third less. The model is optimized for applications that require low latency and high throughput, such as agentic workflows, automation pipelines, and real-time operations. It accepts up to 1,048,576 input tokens and produces up to 65,536 output tokens, with a knowledge cutoff of January 2025. In exchange for the lower cost, Gemini 3.5 Flash may give up some raw reasoning depth relative to its more powerful counterpart, Gemini 3.5 Pro, which is anticipated for release in June 2026. Reported pricing for Gemini 3.5 Flash is $1.50 per million input tokens and $9 per million output tokens.
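To put those rates in concrete terms, the short Python sketch below estimates what a single call and a multi-step agent run would cost at the reported prices. The prices and token limits come from the figures above; the function names and the example workload are hypothetical, and actual billing may differ (for example through caching or tiered rates, which are not modeled here).

```python
# Reported figures from the article; treat them as unconfirmed.
INPUT_PRICE_PER_MILLION = 1.50    # USD per 1M input tokens
OUTPUT_PRICE_PER_MILLION = 9.00   # USD per 1M output tokens
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one model call at the reported rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS or output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("token counts exceed the reported model limits")
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION)


def agent_run_cost(calls: list[tuple[int, int]]) -> float:
    """Total cost of an agent run given (input_tokens, output_tokens) per call."""
    return sum(call_cost(i, o) for i, o in calls)


# HYPOTHETICAL workload: an agent that makes 20 calls,
# each with 50,000 input tokens and 2,000 output tokens.
run = [(50_000, 2_000)] * 20
print(f"One call:  ${call_cost(50_000, 2_000):.3f}")   # prints: One call:  $0.093
print(f"Agent run: ${agent_run_cost(run):.2f}")        # prints: Agent run: $1.86
```

The example shows why agents strain budgets: a single call costs under a dime, but because each step resends a large context, the run costs 20 times as much, and that figure multiplies again across every employee and every task.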

The launch comes amid growing anxiety among enterprise technology leaders about the spiraling cost of AI. AI agents perform complex, multi-step tasks that involve numerous model calls and extended context windows, and their swift adoption is consuming compute at a rate many finance teams did not foresee. Companies that first approached AI as a productivity experiment are now confronting the cost of running it as core operational infrastructure. Uber reportedly exhausted its entire 2026 budget for AI coding tools in just four months, and Microsoft allegedly canceled AI coding licenses in one of its divisions over cost concerns. In another reported incident, an unnamed company incurred a $500 million bill in a single month for Claude credits because usage limits for employees had not been properly set. Cases like these are what the "token budget crisis" refers to.

Google's move is a strategic play to win the AI market on value and cost-effectiveness rather than on raw capability alone. Because the company controls its full technology stack, from chips to cloud to models, it can offer more competitive pricing for AI inference. The approach mirrors Google's historical success in search, where it achieved global scale by being fast, efficient, and cost-effective. The restructured consumer subscriptions reflect the same cost-conscious strategy, with the repriced AI Ultra plan and the introduction of new tiers. The plans bundle features such as Gemini Spark, a 24/7 AI agent, and priority access to Google Antigravity, an agent-first development platform. Google has also changed how it meters usage, moving from daily prompt caps to a compute-based credit model that refreshes every five hours, similar to the system Anthropic uses for Claude Code. Under this model, usage is charged according to task complexity rather than a simple count of prompts.
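How such a credit model differs from a daily prompt cap can be illustrated with a minimal Python sketch. This is not Google's or Anthropic's implementation, whose details are not public in this article: the class name, the credit amounts, and the choice of a fixed window that resets five hours after it opens are all assumptions made for illustration.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60 * 60  # five-hour refresh, per the article


@dataclass
class CreditWindow:
    """HYPOTHETICAL compute-credit meter with a fixed five-hour window."""
    credits_per_window: float
    window_start: float = 0.0  # timestamp (seconds) when the current window opened
    used: float = 0.0          # credits spent in the current window

    def _refresh(self, now: float) -> None:
        # Open a fresh window once five hours have elapsed since the last one began.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start = now
            self.used = 0.0

    def try_spend(self, cost: float, now: float) -> bool:
        """Charge `cost` credits if the window has room; return whether it succeeded."""
        self._refresh(now)
        if self.used + cost > self.credits_per_window:
            return False
        self.used += cost
        return True


# ASSUMED numbers: 100 credits per window; a heavy task costs 60, a light one 5.
meter = CreditWindow(credits_per_window=100)
print(meter.try_spend(60, now=0))        # heavy task fits
print(meter.try_spend(60, now=3600))     # second heavy task is rejected
print(meter.try_spend(5, now=3600))      # a light task still fits
print(meter.try_spend(60, now=18_000))   # window has refreshed after five hours
```

The point of the design is visible in the second and third calls: under a prompt cap both requests would count the same, whereas here a light task goes through after a heavy one has been refused.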

Google's shift is aimed at businesses grappling with AI expenses, and it marks a notable moment in the race to make advanced AI both powerful and practical for everyday enterprise use. As companies integrate AI more deeply into their operations, the ability to manage costs will become increasingly important, and Google is positioning itself as a provider that can help them do so.