Powering the new era of computing.
GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs in a rack-scale, liquid-cooled design. Its 72-GPU NVLink domain acts as a single, massive GPU and delivers 30x faster real-time inference for trillion-parameter large language models (LLMs).
The GB200 Grace Blackwell Superchip is a key component of the NVIDIA GB200 NVL72. It connects two high-performance NVIDIA Blackwell Tensor Core GPUs to an NVIDIA Grace™ CPU over the NVIDIA NVLink™-C2C interconnect.
Highlights
LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first-token latency (FTL) = 5 seconds, 32,768 input / 1,024 output tokens; NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72.
Training: 1.8-trillion-parameter mixture-of-experts (MoE) model, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768 GPUs.
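The inference footnote fixes two latency targets. As a rough worked example, assuming each request receives its first token at exactly the FTL and every later token at exactly the TTL (a simplification; real serving timings vary), the per-user decode rate and the time to stream a full 1,024-token response follow directly. The function names are hypothetical, chosen for illustration.

```python
# Latency budget implied by the inference benchmark footnote.
# ASSUMPTION (not from the source): end-to-end time for one request is the
# first-token latency plus one token-to-token interval per remaining token.

FTL_S = 5.0           # first-token latency in seconds (from the footnote)
TTL_S = 0.050         # token-to-token latency in seconds, i.e. 50 ms (from the footnote)
OUTPUT_TOKENS = 1024  # output sequence length (from the footnote)


def tokens_per_second_per_user(ttl_s: float) -> float:
    """Steady-state decode rate seen by a single user stream."""
    return 1.0 / ttl_s


def end_to_end_latency_s(ftl_s: float, ttl_s: float, output_tokens: int) -> float:
    """Time to stream all output tokens if every interval equals the TTL."""
    return ftl_s + (output_tokens - 1) * ttl_s


if __name__ == "__main__":
    print(f"{tokens_per_second_per_user(TTL_S):.1f} tokens/s per user")
    print(f"{end_to_end_latency_s(FTL_S, TTL_S, OUTPUT_TOKENS):.2f} s per request")
```

Under these assumptions a 50 ms TTL corresponds to 20 tokens/s per user, and a full response takes about 5 + 1,023 × 0.05 = 56.15 seconds.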
Database join and aggregation workload with Snappy/Deflate compression, derived from TPC-H query 4 (Q4). Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72; baseline: Intel Xeon 8480+.
Projected performance subject to change.
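For readers unfamiliar with TPC-H Q4, the following plain-Python sketch shows the logic that query computes, using a few hypothetical toy rows: count orders per `o_orderpriority` within a three-month `o_orderdate` window, keeping only orders that have at least one line item with `l_commitdate < l_receiptdate`. The `EXISTS` clause behaves as a semi-join, so an order with several late line items is counted once. This is not NVIDIA's custom implementation and omits the Snappy/Deflate decompression step; `q4`, `ORDERS`, and `LINEITEM` are hypothetical names.

```python
from collections import Counter
from datetime import date

# Hypothetical toy rows; column names follow the TPC-H schema.
# orders: (o_orderkey, o_orderdate, o_orderpriority)
ORDERS = [
    (1, date(1993, 7, 5), "1-URGENT"),
    (2, date(1993, 8, 20), "3-MEDIUM"),
    (3, date(1993, 9, 30), "1-URGENT"),
    (4, date(1993, 11, 2), "2-HIGH"),   # outside the three-month window
    (5, date(1993, 8, 1), "2-HIGH"),    # in the window, but no late line item
]
# lineitem: (l_orderkey, l_commitdate, l_receiptdate)
LINEITEM = [
    (1, date(1993, 7, 20), date(1993, 7, 25)),    # late
    (1, date(1993, 7, 30), date(1993, 7, 28)),    # on time (order 1 still counts once)
    (2, date(1993, 8, 25), date(1993, 9, 2)),     # late
    (3, date(1993, 10, 10), date(1993, 10, 15)),  # late
    (4, date(1993, 11, 20), date(1993, 11, 25)),  # late, but order 4 is filtered out
    (5, date(1993, 8, 10), date(1993, 8, 9)),     # on time
]


def q4(orders, lineitem, start, end):
    """Order-priority count in the style of TPC-H Q4 (hypothetical helper)."""
    # EXISTS subquery as a semi-join: keys of orders with at least one late line item.
    late_keys = {key for key, commit, receipt in lineitem if commit < receipt}
    # Filter orders by the date window and semi-join membership, then GROUP BY priority.
    counts = Counter(
        priority
        for key, order_date, priority in orders
        if start <= order_date < end and key in late_keys
    )
    # ORDER BY o_orderpriority
    return sorted(counts.items())


if __name__ == "__main__":
    print(q4(ORDERS, LINEITEM, date(1993, 7, 1), date(1993, 10, 1)))
    # [('1-URGENT', 2), ('3-MEDIUM', 1)]
```

The semi-join set is the hash-build side of the join; at benchmark scale, this build-and-probe step plus the group-by aggregation is the kind of work the custom GPU implementations accelerate.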
Features
The NVIDIA GB300 NVL72 features 40x more AI inference performance than Hopper platforms, 40 TB of fast memory, and integrated networking through NVIDIA ConnectX®-8 SuperNICs with Quantum-X800 InfiniBand or Spectrum™-X Ethernet. Blackwell Ultra delivers breakthrough performance on the most complex workloads, from agentic systems and reasoning to 30x faster real-time video generation.
Specifications
Specification | GB200 NVL72 | GB200 Grace Blackwell Superchip
Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core¹ | 1,440 PFLOPS | 40 PFLOPS
FP8/FP6 Tensor Core¹ | 720 PFLOPS | 20 PFLOPS
INT8 Tensor Core¹ | 720 POPS | 20 POPS
FP16/BF16 Tensor Core¹ | 360 PFLOPS | 10 PFLOPS
TF32 Tensor Core | 180 PFLOPS | 5 PFLOPS
FP32 | 5,760 TFLOPS | 160 TFLOPS
FP64 | 2,880 TFLOPS | 80 TFLOPS
FP64 Tensor Core | 2,880 TFLOPS | 80 TFLOPS
GPU Memory / Bandwidth | Up to 13.4 TB HBM3e / 576 TB/s | Up to 372 GB HBM3e / 16 TB/s
NVLink Bandwidth | 130 TB/s | 3.6 TB/s
CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores
CPU Memory / Bandwidth | Up to 17 TB LPDDR5X / Up to 18.4 TB/s | Up to 480 GB LPDDR5X / Up to 512 GB/s

¹ With sparsity.
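Each rack-level figure in the table is 36 times the corresponding superchip figure (allowing for rounding), and each superchip carries two GPUs. The sketch below checks that scaling and derives per-GPU shares from the superchip column. It assumes, for illustration only, that "with sparsity" means 2:4 structured sparsity doubling peak throughput, so dense figures would be half the listed ones; `SPECS`, `scaling_mismatches`, and `per_gpu` are hypothetical names.

```python
"""Sanity-check the GB200 NVL72 specification table (a sketch, not an NVIDIA tool)."""

SUPERCHIPS_PER_RACK = 36  # one superchip per Grace CPU (36 Grace CPUs per rack)
GPUS_PER_SUPERCHIP = 2    # 1 Grace CPU : 2 Blackwell GPUs
SPARSITY_SPEEDUP = 2      # ASSUMPTION: 2:4 structured sparsity doubles peak throughput

# name: (rack total, per-superchip figure, unit, listed with sparsity?), from the table
SPECS = {
    "FP4 Tensor Core": (1440, 40, "PFLOPS", True),
    "FP8/FP6 Tensor Core": (720, 20, "PFLOPS", True),
    "FP16/BF16 Tensor Core": (360, 10, "PFLOPS", True),
    "FP64": (2880, 80, "TFLOPS", False),
    "HBM3e capacity": (13.4, 0.372, "TB", False),
    "HBM3e bandwidth": (576, 16, "TB/s", False),
    "NVLink bandwidth": (130, 3.6, "TB/s", False),
}


def scaling_mismatches(rel_tol: float = 0.01) -> list[str]:
    """Names whose rack total differs from 36x the superchip figure by more than rel_tol."""
    return [
        name
        for name, (rack, chip, _unit, _sparse) in SPECS.items()
        if abs(rack - chip * SUPERCHIPS_PER_RACK) > rel_tol * rack
    ]


def per_gpu(name: str, dense: bool = False) -> tuple[float, str]:
    """Per-GPU share of a superchip figure; optionally remove the assumed sparsity factor."""
    _rack, chip, unit, sparse = SPECS[name]
    value = chip / GPUS_PER_SUPERCHIP
    if dense and sparse:
        value /= SPARSITY_SPEEDUP
    return value, unit


if __name__ == "__main__":
    print("GPUs per rack:", SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP)
    print("Scaling mismatches:", scaling_mismatches() or "none")
    for name in SPECS:
        value, unit = per_gpu(name)
        print(f"{name}: {value:g} {unit} per GPU")
    print("FP4 dense per GPU (assumed):", per_gpu("FP4 Tensor Core", dense=True))
```

The 1% tolerance absorbs the rounded rack totals: 36 × 372 GB = 13.392 TB is listed as 13.4 TB, and 36 × 3.6 TB/s = 129.6 TB/s is listed as 130 TB/s.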