NVIDIA GB200 NVL72

Powering the new era of computing.

Introduction
Highlights
Features
Specs

Unlocking Real-Time Trillion-Parameter Models

GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs in a rack-scale, liquid-cooled design. It boasts a 72-GPU NVLink domain that acts as a single, massive GPU and delivers 30X faster real-time trillion-parameter large language model (LLM) inference.

The GB200 Grace Blackwell Superchip is a key component of the NVIDIA GB200 NVL72. It connects two high-performance NVIDIA Blackwell Tensor Core GPUs to an NVIDIA Grace™ CPU over the NVIDIA NVLink™-C2C interconnect.

The Blackwell Rack-Scale Architecture for Real-Time Trillion-Parameter Inference and Training

The NVIDIA GB200 NVL72 is an exascale computer in a single rack. Its 36 GB200 Superchips are interconnected by the largest NVIDIA® NVLink® domain ever offered, and the NVLink Switch System provides 130 terabytes per second (TB/s) of low-latency GPU communication for AI and high-performance computing (HPC) workloads.

Highlights

Supercharging Next-Generation AI and Accelerated Computing

LLM Inference

30X vs. NVIDIA H100 Tensor Core GPU

LLM Training

4X vs. H100

Energy Efficiency

25X vs. H100

Data Processing

18X vs. CPU

LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 5 s, 32,768 input/1,024 output tokens, NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. Training: 1.8T MoE model, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768 GPUs.
Data processing: a database join and aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72 vs. Intel Xeon 8480+.
Projected performance subject to change.

Real-Time LLM Inference

GB200 NVL72 introduces cutting-edge capabilities and a second-generation Transformer Engine, which enables FP4 AI. When coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time LLM inference performance for trillion-parameter language models. This advancement is made possible with a new generation of Tensor Cores, which introduce new microscaling formats, giving high accuracy and greater throughput. Additionally, the GB200 NVL72 uses NVLink and liquid cooling to create a single massive 72-GPU rack that can overcome communication bottlenecks.
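
The idea behind microscaling can be illustrated with a minimal sketch: a block of values shares one power-of-two scale, and each value is stored as a 4-bit float. The block below assumes the general shape of the OCP Microscaling (MX) FP4 layout (E2M1 elements with a shared power-of-two scale). It is not NVIDIA's implementation, and the names `quantize_block` and `dequantize_block` are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 value: 6 = 1.5 * 2**2


def quantize_block(values):
    """Return (scale, codes) so that each value is approximated by scale * code.

    Assumption: one shared power-of-two scale per block, as in MX-style formats.
    The MX specification uses blocks of 32 elements; any length works here.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(values)
    # Pick the scale so the largest element lands in the top binade of the grid.
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - E2M1_EMAX)
    codes = []
    for v in values:
        magnitude = min(abs(v) / scale, E2M1_GRID[-1])  # saturate at 6
        nearest = min(E2M1_GRID, key=lambda g: abs(g - magnitude))
        codes.append(math.copysign(nearest, v))
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from the shared scale and 4-bit codes."""
    return [scale * c for c in codes]


scale, codes = quantize_block([0.1, -0.4, 0.75, 3.0])
print(scale, codes, dequantize_block(scale, codes))
```

Small values in a block dominated by a large one lose precision (0.1 rounds to 0 here), which is why the scale is shared per small block rather than per tensor.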

Massive-Scale Training

GB200 NVL72 includes a faster second-generation Transformer Engine with FP8 precision, enabling 4X faster training for large language models at scale. This is complemented by fifth-generation NVLink, which provides 1.8 TB/s of GPU-to-GPU interconnect, along with InfiniBand networking and NVIDIA Magnum IO™ software.
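
As a rough consistency check, the 130 TB/s NVLink figure quoted for the rack appears to match the per-GPU NVLink bandwidth multiplied by the GPU count. The sketch below only performs that arithmetic; the function name `aggregate_nvlink_tbps` is hypothetical, and treating the rack figure as a simple sum is an assumption.

```python
GPUS_PER_RACK = 72          # Blackwell GPUs in one GB200 NVL72
NVLINK_TBPS_PER_GPU = 1.8   # fifth-generation NVLink bandwidth per GPU, in TB/s


def aggregate_nvlink_tbps(gpus, per_gpu_tbps):
    """Sum per-GPU NVLink bandwidth across the NVLink domain (assumed additive)."""
    return gpus * per_gpu_tbps


total = aggregate_nvlink_tbps(GPUS_PER_RACK, NVLINK_TBPS_PER_GPU)
print(f"{total:.1f} TB/s")  # compare with the quoted 130 TB/s
```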

Energy-Efficient Infrastructure

Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy consumption. Liquid cooling increases compute density, reduces the amount of floor space used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink domain architectures. Compared to NVIDIA H100 air-cooled infrastructure, GB200 delivers 25X more performance at the same power, while reducing water consumption.

Data Processing

Databases play critical roles in handling, processing, and analyzing large volumes of data for enterprises. GB200 takes advantage of high-bandwidth memory performance, NVLink-C2C, and the dedicated decompression engines in the NVIDIA Blackwell architecture to speed up key database queries by 18X compared to CPU and deliver 5X better total cost of ownership (TCO).

Features

Technological Breakthroughs

Blackwell Architecture

The NVIDIA Blackwell architecture delivers groundbreaking advancements in accelerated computing, powering a new era of computing with unparalleled performance, efficiency, and scale.

NVIDIA Grace CPU

The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, cloud, and HPC applications. It provides outstanding performance and memory bandwidth with 2X the energy efficiency of today’s leading server processors.

Fifth-Generation NVIDIA NVLink

Unlocking the full potential of exascale computing and trillion-parameter AI models requires swift, seamless communication between every GPU in a server cluster. The fifth generation of NVLink is a scale-up interconnect that unleashes accelerated performance for trillion- and multi-trillion-parameter AI models.

NVIDIA Networking

The data center’s network plays a crucial role in driving AI advancements and performance, serving as the backbone for distributed AI model training and generative AI performance. NVIDIA Quantum-X800 InfiniBand, NVIDIA Spectrum™-X800 Ethernet, and NVIDIA® BlueField®-3 DPUs enable efficient scalability across hundreds and thousands of Blackwell GPUs for optimal application performance.

AI Factory for the New Industrial Revolution

NVIDIA GB300 NVL72

The NVIDIA GB300 NVL72 features 40x more AI inference performance than Hopper platforms, 40 TB of fast memory, and networking platform integration with NVIDIA ConnectX®-8 SuperNICs using Quantum-X800 InfiniBand or Spectrum™-X Ethernet. Blackwell Ultra delivers breakthrough performance on the most complex workloads, from agentic systems and reasoning to 30x faster real-time video generation.

Specifications

GB200 NVL72 Specs

Specification	GB200 NVL72	GB200 Grace Blackwell Superchip
Configuration	36 Grace CPUs : 72 Blackwell GPUs	1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core¹	1,440 PFLOPS	40 PFLOPS
FP8/FP6 Tensor Core¹	720 PFLOPS	20 PFLOPS
INT8 Tensor Core¹	720 POPS	20 POPS
FP16/BF16 Tensor Core¹	360 PFLOPS	10 PFLOPS
TF32 Tensor Core	180 PFLOPS	5 PFLOPS
FP32	5,760 TFLOPS	160 TFLOPS
FP64	2,880 TFLOPS	80 TFLOPS
FP64 Tensor Core	2,880 TFLOPS	80 TFLOPS
GPU Memory | Bandwidth	Up to 13.4 TB HBM3e | 576 TB/s	Up to 372 GB HBM3e | 16 TB/s
NVLink Bandwidth	130 TB/s	3.6 TB/s
CPU Core Count	2,592 Arm® Neoverse V2 cores	72 Arm Neoverse V2 cores
CPU Memory | Bandwidth	Up to 17 TB LPDDR5X | Up to 18.4 TB/s	Up to 480 GB LPDDR5X | Up to 512 GB/s
¹ With sparsity.
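
The rack-level column appears to be the superchip column multiplied by the 36 superchips in the rack. The sketch below reproduces that arithmetic for a few rows, with figures copied from the table; the helper name `rack_totals` is hypothetical, and simple linear scaling is an assumption rather than a statement of how the figures were derived.

```python
SUPERCHIPS_PER_RACK = 36  # GB200 Superchips in one GB200 NVL72

# Per-superchip figures copied from the spec table above.
superchip = {
    "FP4 Tensor Core (PFLOPS)": 40,
    "FP8/FP6 Tensor Core (PFLOPS)": 20,
    "FP16/BF16 Tensor Core (PFLOPS)": 10,
    "TF32 Tensor Core (PFLOPS)": 5,
    "FP32 (TFLOPS)": 160,
    "FP64 (TFLOPS)": 80,
    "CPU cores": 72,
    "GPU memory (GB)": 372,
    "CPU memory (GB)": 480,
}


def rack_totals(per_superchip, count=SUPERCHIPS_PER_RACK):
    """Scale each per-superchip figure linearly by the number of superchips."""
    return {name: value * count for name, value in per_superchip.items()}


totals = rack_totals(superchip)
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

With decimal units, the memory totals of 13,392 GB and 17,280 GB correspond to the table's rounded "up to 13.4 TB" and "up to 17 TB" entries.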
