MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. They’re all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.
MLPerf Inference v4.1 measures inference performance on nine different benchmarks, including several large language models (LLMs), text-to-image, natural language processing, recommenders, computer vision, and medical image segmentation.
MLPerf Training v4.1 measures the time to train on seven different benchmarks, including LLM pre-training, LLM fine-tuning, text-to-image, graph neural network (GNN), computer vision, recommendation, and natural language processing.
MLPerf HPC v3.0 measures training performance across four different scientific computing use cases, including climate atmospheric river identification, cosmology parameter prediction, quantum molecular modeling, and protein structure prediction.
The NVIDIA HGX™ B200 platform, powered by NVIDIA Blackwell GPUs, fifth-generation NVLink™, and the latest NVLink Switch, delivered yet another giant leap for LLM training in MLPerf Training v4.1. Through relentless full-stack engineering at data center scale, NVIDIA continues to push the boundaries of generative AI training performance, accelerating the creation and customization of increasingly capable AI models.
NVIDIA Blackwell Supercharges LLM Training
MLPerf™ Training v4.1 results retrieved from https://mlcommons.org on November 13, 2024, from the following entries: 4.1-0060 (HGX H100, 2024) in the available category, 4.1-0082 (HGX B200, 2024) in the preview category. MLPerf™ Training v3.0 results, used for HGX H100 (2023), retrieved from entry 3.0-2069. HGX A100 result not verified by MLCommons association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See https://mlcommons.org for more information.
The NVIDIA platform, powered by NVIDIA Hopper™ GPUs, fourth-generation NVLink with third-generation NVSwitch™, and Quantum-2 InfiniBand, continued to demonstrate unmatched performance and versatility in MLPerf Training v4.1. NVIDIA delivered the highest performance at scale on all seven benchmarks.
Benchmark | Time to Train | Number of GPUs |
---|---|---|
LLM (GPT-3 175B) | 3.4 minutes | 11,616 |
LLM Fine-Tuning (Llama 2 70B-LoRA) | 1.2 minutes | 1,024 |
Text-to-Image (Stable Diffusion v2) | 1.4 minutes | 1,024 |
Graph Neural Network (R-GAT) | 0.9 minutes | 512 |
Recommender (DLRM-DCNv2) | 1.0 minutes | 128 |
Natural Language Processing (BERT) | 0.1 minutes | 3,472 |
Object Detection (RetinaNet) | 0.8 minutes | 2,528 |
MLPerf™ Training v4.1 results retrieved from https://mlcommons.org on November 13, 2024, from the following entries: NVIDIA 4.0-0058, NVIDIA 4.0-0053, NVIDIA 4.0-0007, NVIDIA 4.0-0054, NVIDIA 4.0-0053, NVIDIA + CoreWeave 4.0-0008, NVIDIA 4.0-0057, NVIDIA 4.0-0056, NVIDIA 4.0-0067.
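Because each row in the table pairs a time to train with a GPU count, the two can be multiplied to estimate the aggregate GPU time behind each result. The sketch below does that arithmetic in Python. GPU-minutes is a derived quantity used here for illustration only, not a metric that MLPerf reports, and the helper names `gpu_minutes` and `summarize` are hypothetical. The inputs are the rounded values from the table, so the products are approximate, and the figures are not comparable across benchmarks because each one trains a different model to a different quality target.

```python
# Rows copied from the table above:
# (benchmark, time to train in minutes, number of GPUs).
RESULTS = [
    ("LLM (GPT-3 175B)", 3.4, 11616),
    ("LLM Fine-Tuning (Llama 2 70B-LoRA)", 1.2, 1024),
    ("Text-to-Image (Stable Diffusion v2)", 1.4, 1024),
    ("Graph Neural Network (R-GAT)", 0.9, 512),
    ("Recommender (DLRM-DCNv2)", 1.0, 128),
    ("Natural Language Processing (BERT)", 0.1, 3472),
    ("Object Detection (RetinaNet)", 0.8, 2528),
]


def gpu_minutes(minutes: float, gpus: int) -> float:
    """Aggregate GPU time: wall-clock minutes multiplied by GPU count."""
    return minutes * gpus


def summarize(results):
    """Return (benchmark, GPU-minutes) pairs, rounded to one decimal place."""
    return [(name, round(gpu_minutes(t, n), 1)) for name, t, n in results]


if __name__ == "__main__":
    for name, total in summarize(RESULTS):
        print(f"{name}: {total:,.1f} GPU-minutes")
```

Read this way, the table shows that the shortest wall-clock times come from spreading work across many GPUs, so a lower time to train does not by itself imply less total compute.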
The complexity of AI demands tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software. Together they form an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge.
An essential component of NVIDIA’s platform and MLPerf training and inference results, the NGC™ catalog is a hub for GPU-optimized AI, HPC, and data analytics software that simplifies and accelerates end-to-end workflows. NGC offers over 150 enterprise-grade containers, including workloads for generative AI, conversational AI, and recommender systems, along with hundreds of AI models and industry-specific SDKs that can be deployed on premises, in the cloud, or at the edge. With these, data scientists, researchers, and developers can build best-in-class solutions, gather insights, and deliver business value faster than ever.
Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell platform, the Hopper platform, NVLink™, NVSwitch™, and Quantum InfiniBand. These are at the heart of the NVIDIA data center platform, the engine behind our benchmark performance.
In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure.
NVIDIA Jetson Orin offers unparalleled AI compute, large unified memory, and comprehensive software stacks, delivering superior energy efficiency to drive the latest generative AI applications. It’s capable of fast inference for any generative AI model powered by the transformer architecture, providing superior edge performance on MLPerf.
Learn more about our data center training and inference performance.