Google has introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU), marking a major leap in AI inference capabilities. Unveiled during Google Cloud Next ’25, Ironwood is touted as the company’s most powerful and scalable custom AI chip to date, purpose-built to handle the growing demands of modern AI systems.
Designed specifically for AI inference—the process of executing trained models to generate results—Ironwood will soon be available to developers through Google Cloud.
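In plain terms, inference is the forward pass only: the model’s weights stay fixed, and no training takes place. The sketch below is a minimal, framework-neutral illustration in Python; the tiny model and its weights are hypothetical stand-ins, not anything from Google’s stack.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # stand-in for "trained" weights, fixed at inference time
b = rng.standard_normal(3)       # stand-in for a "trained" bias

def infer(x: np.ndarray) -> np.ndarray:
    """Inference = forward pass only: no gradients, no weight updates."""
    logits = x @ W + b
    e = np.exp(logits - logits.max())  # softmax turns scores into probabilities
    return e / e.sum()

print(infer(rng.standard_normal(4)))  # a probability distribution over 3 outputs
```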
Ironwood: Power and Scalability Combined
In a blog post, Google revealed that Ironwood will accelerate its shift from a reactive to a proactive AI infrastructure, supporting dense large language models (LLMs), mixture-of-experts (MoE) architectures, and agentic AI systems that autonomously retrieve and generate data.
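To make the mixture-of-experts term concrete: an MoE layer routes each input to only a few expert sub-networks, so compute per token stays modest even as the total parameter count grows. Below is a toy top-k router in Python with made-up dimensions; it is a sketch of the general idea, not Google’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, E, K = 8, 4, 2  # hidden size, number of experts, experts used per token (all hypothetical)

router_W = rng.standard_normal((D, E))                     # routing weights
experts = [rng.standard_normal((D, D)) for _ in range(E)]  # toy expert layers

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Top-k MoE: only K of the E experts run for this input."""
    scores = x @ router_W                  # one routing score per expert
    top = np.argsort(scores)[-K:]          # indices of the K best-scoring experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen K
    # Weighted sum of the selected experts; the other E - K experts are skipped entirely.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.standard_normal(D)).shape)  # (8,)
```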
Each Ironwood chip delivers a peak of 4,614 TFLOPS, significantly surpassing Trillium, the sixth-generation TPU introduced in May 2024. Like its predecessors, Ironwood is built for deep learning workloads, combining high parallelism with improved energy efficiency.
Cluster-Ready for AI at Scale
Ironwood scales up to 9,216 liquid-cooled chips connected via Google’s high-speed Inter-Chip Interconnect (ICI). At full configuration, a pod delivers 42.5 exaflops, which Google says is more than 24 times the compute of El Capitan, currently the world’s fastest supercomputer.
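The pod-level figure is consistent with the per-chip spec, as a quick check using only the numbers quoted above shows:

```python
# Sanity check: pod compute from the per-chip peak.
chips = 9_216
tflops_per_chip = 4_614                              # peak TFLOPS per Ironwood chip
pod_exaflops = chips * tflops_per_chip / 1_000_000   # 1 exaflop = 1,000,000 TFLOPS
print(f"{pod_exaflops:.1f} exaflops")                # -> 42.5 exaflops
```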
Key Features:
- Compute: 4,614 TFLOPS per chip
- Memory: 192 GB of HBM per chip (6x Trillium)
- Memory bandwidth: 7.2 TB/s per chip
- Pod configurations: 256 or 9,216 chips
- Cloud deployment: Coming soon on Google Cloud
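Taken together, the compute and bandwidth figures imply a rough compute-to-memory ratio for the chip. The back-of-the-envelope calculation below uses only the listed specs and is not an official Google figure:

```python
# Roofline-style ratio from the listed per-chip specs.
peak_flops = 4_614e12       # 4,614 TFLOPS
hbm_bytes_per_s = 7.2e12    # 7.2 TB/s of memory bandwidth
print(f"{peak_flops / hbm_bytes_per_s:.0f} FLOPs per byte of memory traffic")  # ~641
```

Roughly speaking, a workload needs on the order of 640 floating-point operations per byte moved to saturate the chip’s compute; lower-intensity stages, such as the memory-bound decode steps of LLM inference, are limited by the 7.2 TB/s figure instead, which is one reason inference-focused chips pair large HBM capacity with high bandwidth.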
Ironwood TPUs are part of Google’s AI Hypercomputer architecture, built to push the limits of model performance and responsiveness.
The new TPUs are not yet publicly available. Google is expected to integrate Ironwood into internal products such as the Gemini models first, before offering the hardware to external developers through Google Cloud.