GPU (Graphics Processing Unit)

A graphics processing unit (GPU) is a specialized chip designed for highly parallel computation, making it the dominant hardware for training and running modern AI models. Nvidia GPUs (H100, H200, B100/B200) power the majority of frontier-model training and inference; AMD’s MI300 series and custom hyperscaler accelerators (Google TPU, AWS Trainium) make up most of the rest.

How it works

GPUs contain thousands of small cores that execute the same operation on different data simultaneously. Neural-network training and inference are dominated by matrix multiplications, which break down into many independent multiply-and-add operations: exactly the kind of workload GPUs were built to accelerate. Modern AI clusters wire tens of thousands of GPUs together with high-bandwidth interconnects so a single model can be trained across the whole fleet.
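To make the parallelism concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any GPU library implements the operation, and the function names (output_cell, matmul) are made up for this example. It shows that each cell of a matrix product depends only on one row and one column of the inputs, so every cell could be handed to a separate core.

```python
# Illustrative sketch: each output cell of a matrix product is independent,
# which is what lets a GPU compute many cells at the same time.
# The names here are hypothetical and chosen for this example only.

def output_cell(a, b, i, j):
    """Dot product of row i of a with column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    rows, cols = len(a), len(b[0])
    # On a GPU these rows * cols cell computations could run concurrently;
    # this sketch simply runs them one after another.
    return [[output_cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

No call to output_cell reads the result of another, so the order in which the cells are computed does not matter. That independence is the property the hardware exploits.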

Why it matters

GPU supply is the binding constraint on AI today. Hyperscaler AI capex is projected to exceed $600B in 2026, with most of it going to GPUs and to the data centers and power needed to run them. The economics also flow the other way: falling per-token inference cost, down roughly 280-fold from late 2022 to late 2024, is a direct result of better GPUs and software. See AI Infrastructure 2026.
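For a sense of scale, the 280-fold figure can be turned into a yearly rate. The sketch below assumes the decline was spread evenly over two years. That is an assumption made purely for illustration, since only the two endpoints are given, and the variable names are hypothetical.

```python
# Rough arithmetic sketch: convert a total cost decline into an implied
# average yearly factor. ASSUMPTION: the 280-fold drop is spread evenly
# over 2 years; only the start and end points are actually stated.

total_decline = 280.0  # cost fell roughly 280-fold (from the text)
years = 2.0            # late 2022 to late 2024, treated as two years

# If cost falls by the same factor f each year, then f ** years == total_decline.
annual_factor = total_decline ** (1.0 / years)

print(f"Implied average decline: about {annual_factor:.1f}x per year")
```

Under that assumption, per-token cost fell by a factor of roughly 17 each year.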

Related terms: Hyperscaler · Capex · AI Inference