Tensor Cores enable mixed-precision computing, dynamically adapting calculations to accelerate throughput while preserving accuracy and providing enhanced security. The latest generation of Tensor Cores is faster than ever on a broad array of AI and high-performance computing (HPC) tasks. From 4X speedups in training trillion-parameter generative AI models to a 30X increase in inference performance, NVIDIA Tensor Cores accelerate all workloads for modern AI factories.
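As a concrete illustration of what mixed precision means at the Tensor Core level, the sketch below uses the CUDA WMMA API to multiply FP16 input tiles while accumulating products in FP32. The 16x16x16 tile shape, matrix layouts, and kernel name are illustrative choices for this sketch, not details taken from the text above.

```cuda
// Minimal mixed-precision sketch: FP16 inputs, FP32 accumulation on Tensor Cores.
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// One warp cooperatively computes one 16x16 output tile of C = A * B.
__global__ void wmma_gemm_16x16x16(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // start the FP32 accumulator at zero
    wmma::load_matrix_sync(a_frag, A, 16);           // load FP16 A tile (leading dimension 16)
    wmma::load_matrix_sync(b_frag, B, 16);           // load FP16 B tile (leading dimension 16)
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core multiply-accumulate
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);  // write FP32 results
}
```

Launched with a single warp, for example `wmma_gemm_16x16x16<<<1, 32>>>(dA, dB, dC);`, the 32 threads of the warp jointly issue the matrix multiply to the Tensor Cores while the wider-precision accumulator preserves accuracy.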
The Blackwell architecture delivers a 30X speedup over the previous NVIDIA Hopper™ generation for massive models such as GPT-MoE-1.8T. This performance boost is made possible by the fifth generation of Tensor Cores. Blackwell Tensor Cores add new precisions, including community-defined microscaling formats, which provide better accuracy and make it easier to replace higher precisions.
As generative AI models explode in size and complexity, it’s critical to improve training and inference performance. Blackwell Tensor Cores meet these compute needs by supporting new quantization formats and precisions, including the community-defined microscaling formats.
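Microscaling formats let small blocks of elements share a single scale factor, so very narrow element types such as FP4 keep usable dynamic range. The host-side sketch below is a simplified illustration of that idea, assuming the OCP MX convention of 32-element blocks with a power-of-two shared scale and an FP4 (E2M1) element grid; the function names and the rounding scheme are assumptions for illustration, not Blackwell's actual data path.

```cpp
// Simplified block-scaled (microscaling-style) quantization on the host:
// each 32-element block shares one power-of-two scale, and elements are
// rounded to the nearest representable FP4 (E2M1) magnitude.
#include <cmath>
#include <cstdio>
#include <vector>

// Representable magnitudes of an FP4 E2M1 element (assumed here per the OCP MX spec).
static const float kFp4Grid[] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Round one value to the nearest representable FP4 magnitude, keeping its sign.
static float quantize_fp4(float x) {
    float mag = std::fabs(x);
    float best = kFp4Grid[0];
    for (float g : kFp4Grid) {
        if (std::fabs(mag - g) < std::fabs(mag - best)) best = g;
    }
    return std::copysign(best, x);
}

// Quantize one block: pick a power-of-two shared scale so the largest element
// fits within the FP4 range [-6, 6], then quantize each element against it.
static void quantize_block_mx4(const float* in, float* out, int n, float* scale) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) amax = std::fmax(amax, std::fabs(in[i]));
    *scale = (amax > 0.0f) ? std::exp2(std::ceil(std::log2(amax / 6.0f))) : 1.0f;
    for (int i = 0; i < n; ++i)
        out[i] = quantize_fp4(in[i] / *scale) * (*scale);  // store the dequantized value
}

int main() {
    const int kBlock = 32;                        // one shared scale per 32 elements
    std::vector<float> x(kBlock), y(kBlock);
    for (int i = 0; i < kBlock; ++i) x[i] = 0.01f * (i - 16);  // toy data
    float scale = 1.0f;
    quantize_block_mx4(x.data(), y.data(), kBlock, &scale);
    std::printf("shared scale = %g, x[5] = %g -> %g\n", scale, x[5], y[5]);
    return 0;
}
```

Because the scale is per block rather than per tensor, outlier values in one block do not wash out the precision of every other block, which is what makes 4- and 6-bit element types practical for training and inference.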
Since the introduction of Tensor Core technology, NVIDIA GPUs have increased their peak performance by 60X, fueling the democratization of computing for AI and HPC. The NVIDIA Hopper architecture advances fourth-generation Tensor Cores with the Transformer Engine, which uses FP8 to deliver 6X higher performance over FP16 for trillion-parameter-model training. Coupled with 3X more performance using TF32, FP64, FP16, and INT8 precisions, Hopper Tensor Cores deliver speedups to all workloads.
Tensor Cores are essential building blocks of the complete NVIDIA data center solution that incorporates hardware, networking, software, libraries, and optimized AI models and applications from the NVIDIA NGC™ catalog. The most powerful end-to-end AI and HPC platform, it allows researchers to deliver real-world results and deploy solutions into production at scale.
|  | Blackwell | Hopper |
| --- | --- | --- |
| Supported Tensor Core precisions | FP64, TF32, BF16, FP16, FP8, INT8, FP6, FP4 | FP64, TF32, BF16, FP16, FP8, INT8 |
| Supported CUDA® Core precisions | FP64, FP32, FP16, BF16 | FP64, FP32, FP16, BF16, INT8 |
*Preliminary specifications, may be subject to change
Learn More About NVIDIA Blackwell.