When Google held its Cloud Next conference last week at the Mandalay Bay convention center in Las Vegas, the atmosphere was the kind you only witness when an industry feels it’s on the verge of something. The demo floors were packed with engineers. Reporters gathered near the press rooms. And on stage, Google and Nvidia officially announced that their ten-year collaboration had entered a new phase, centered on a class of cloud supercomputers that most businesses could only imagine a year ago.
Google kept returning to the headline numbers, which were eye-catching. Compared with the previous generation, the new A5X bare-metal instances, which are powered by Nvidia’s Vera Rubin NVL72 rack-scale systems, promise up to ten times lower inference cost per token and ten times higher throughput per megawatt. The throughput-per-megawatt figure is perhaps the more intriguing of the two. Energy has quietly emerged as the true AI bottleneck, and data center operators, who have been at odds with utility companies for the past eighteen months, tend to pay particular attention to any company that claims to be able to solve it.
The announcements give the impression that Google is no longer trying to prove it can compete in AI infrastructure. It is trying to prove it can lead. The A5X clusters can accommodate up to 80,000 Rubin GPUs at a single site and up to 960,000 across multiple sites. Numbers like that used to belong to government-funded science labs. They are now being offered to startups behind an API key.

The client list resembles a who’s-who of the current AI scene. Thinking Machines Lab, the startup led by Mira Murati, is using Nvidia GB300 systems to scale its Tinker API on Google’s A4X Max virtual machines. Snap is moving data pipelines to GPU-accelerated Spark on Google Cloud to reduce the cost of A/B testing, while OpenAI is using the same family of chips for large-scale inference workloads, including ChatGPT. CrowdStrike, Schrödinger, and Salesforce are all present.
CrowdStrike’s presence on that list is worth pondering. The company, still rebuilding its reputation after its 2024 outage, is using Nvidia’s NeMo libraries on Google Cloud’s Blackwell GPUs to create synthetic data for cybersecurity training. It’s a subtly intriguing detail that shows how this infrastructure is permeating sectors unrelated to chatbots.
Confidential computing is another aspect of the announcement that hasn’t received enough attention. Google is now providing Nvidia Blackwell GPUs in encrypted environments for the first time, so that infrastructure operators cannot see the training data or prompts. For regulated industries such as finance and healthcare, and for defense contractors navigating export regulations, this matters more than the headline GPU counts. It’s the kind of feature that turns AI from an experiment into a tool that procurement can approve.
It’s difficult to overlook how heavily this partnership depends on Nvidia. Earlier this spring, Anthropic and Broadcom announced their own counter-bet, creating custom silicon designed to undermine Nvidia’s hegemony. Google continues to create its own TPUs. However, Google decided to highlight Nvidia at its own conference. It’s possible that the calculation is less complicated than the strategic narratives imply: customers are the ones writing the checks, and they want Nvidia.
As this develops, it’s easy to make historical comparisons. The cloud infrastructure wars of the 2010s changed the structure of software firms. Whoever owned the racks held sway over every startup running on them. With Nvidia and a few hyperscalers positioned closer to the center of every model trained, agent deployed, and robot simulated, AI may be following a similar trajectory. No one really knows yet whether that consolidation results in quicker progress or simply a tighter grip on who gets to build what. For now, a number of very ambitious companies are lining up to use the GPUs that are currently spinning up.
