MLPerf Benchmarks

The NVIDIA AI platform achieves world-class performance and versatility in MLPerf Training, Inference, and HPC benchmarks for the most demanding, real-world AI workloads.

See Our Results

What Is MLPerf?

MLPerf™ benchmarks are developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry, and are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. All tests are conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

Chalmers University is one of the leading research institutions in Sweden, specializing in multiple areas from nanotechnology to climate studies. As we incorporate AI to advance our research endeavors, we find that the MLPerf benchmark provides a transparent apples-to-apples comparison across multiple AI platforms to showcase actual performance in diverse real-world use cases.

— Chalmers University of Technology, Sweden

TSMC is driving the cutting edge of global semiconductor manufacturing, like our latest 5nm node, which leads the market in process technology. Innovations like machine-learning-based lithography and etch modeling dramatically improve our optical proximity correction (OPC) and etch simulation accuracy. To fully realize the potential of machine learning in model training and inference, we are working with the NVIDIA engineering team to port our Maxwell simulation and inverse lithography technology (ILT) engine to GPUs and see very significant speedups. The MLPerf benchmark is an important factor in our decision-making.

— Dr. Danping Peng, Director, OPC Department, TSMC, San Jose, CA, USA

Computer vision and imaging are at the core of AI research, driving scientific discovery and readily representing core components of medical care. We've worked closely with NVIDIA to bring innovations like 3DUNet to the healthcare market. Industry-standard MLPerf benchmarks provide relevant performance data to the benefit of IT organizations and developers to get the right solution to accelerate their specific projects and applications.

— Prof. Dr. Klaus Maier-Hein, Head of Medical Image Computing, Deutsches Krebsforschungszentrum (DKFZ, German Cancer Research Center)

As the preeminent leader in research and manufacturing, Samsung uses AI to dramatically boost product performance and manufacturing productivity. Productizing these AI advances requires us to have the best computing platform available. The MLPerf benchmark streamlines our selection process by providing us with an open, direct evaluation method to assess uniformly across platforms.

— Samsung Electronics

Inside the MLPerf Benchmarks

MLPerf Inference v5.1 measures inference performance on 10 different AI models, including a variety of large language models (LLMs), a reasoning LLM, text-to-image generative AI, recommendation, speech-to-text, and a graph neural network (GNN).

MLPerf Training v5.1 measures the time to train seven different models, covering the following use cases: LLMs (pretraining and fine-tuning), image generation, GNN, object detection, and recommendation.

Reasoning Large Language Model

Large language model that generates intermediate reasoning, or thinking, tokens to improve response accuracy.

Large Language Models

Deep learning algorithms trained on large-scale datasets that can recognize, summarize, translate, predict, and generate content for a breadth of use cases.

Text-to-Image

Generates images from text prompts.

Recommendation

Delivers personalized results in user-facing services such as social media or ecommerce websites by understanding interactions between users and service items, like products or ads.

Object Detection (Lightweight)

Finds instances of real-world objects such as faces, bicycles, and buildings in images or videos and specifies a bounding box around each.

Graph Neural Network

Uses neural networks designed to work with data structured as graphs.

Speech-to-Text

Converts spoken language into written text.

NVIDIA Blackwell Maximizes ROI in AI Inference

NVIDIA Blackwell enables the highest AI factory revenue: a $5 million investment in GB200 NVL72 generates $75 million in token revenue, a 15x return on investment. This is the result of deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-out; NVFP4 for accuracy at low precision; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility, as well as development with community frameworks such as SGLang and vLLM.
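
The 15x figure is the ratio of the two dollar amounts quoted above. A minimal sketch of that arithmetic follows; the function name `return_multiple` is illustrative and not part of any NVIDIA tool, and only the two quoted figures are used as inputs.

```python
def return_multiple(investment_usd: float, revenue_usd: float) -> float:
    """Revenue generated per dollar invested, expressed as a multiple."""
    if investment_usd <= 0:
        raise ValueError("investment must be positive")
    return revenue_usd / investment_usd


# Figures quoted in the text: $5 million invested, $75 million in token revenue.
multiple = return_multiple(5_000_000, 75_000_000)
print(f"{multiple:.0f}x")  # 15x
```

This treats the multiple as gross token revenue divided by the investment, which is how the two quoted figures relate; it does not model operating costs or time horizon.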

Explore Key Results

NVIDIA MLPerf Benchmark Results

Training
Inference

The NVIDIA platform achieved the fastest time to train on all seven MLPerf Training v5.1 benchmarks. Blackwell Ultra made its debut, delivering large leaps for large language model pretraining and fine-tuning, enabled by architectural enhancements and breakthrough NVFP4 training methods that increase performance while meeting strict MLPerf accuracy requirements. NVIDIA increased Blackwell Llama 3.1 405B pretraining performance at scale by 2.7x through a combination of twice the scale and large increases in performance per GPU enabled by NVFP4. NVIDIA also set performance records on both newly added benchmarks, Llama 3.1 8B and FLUX.1, while continuing to hold performance records on the existing recommender, object detection, and graph neural network benchmarks.
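
As a rough illustration of how the 2.7x figure breaks down, the sketch below assumes the overall speedup is the product of the scale factor and an effective per-GPU gain. That multiplicative model is an assumption made here for illustration, not something the results state; any change in scaling efficiency is folded into the per-GPU term. The function name `implied_per_gpu_gain` is hypothetical.

```python
def implied_per_gpu_gain(total_speedup: float, scale_factor: float) -> float:
    """Effective per-GPU gain, assuming total_speedup = scale_factor * per_gpu_gain."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return total_speedup / scale_factor


# Quoted in the text: 2.7x overall at twice the scale.
gain = implied_per_gpu_gain(total_speedup=2.7, scale_factor=2.0)
print(f"{gain:.2f}x effective per-GPU gain")  # 1.35x effective per-GPU gain
```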

NVIDIA Blackwell Ultra Delivers Large Leap in MLPerf Training Debut

MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 4.1-0050, 5.0-0014, 5.0-0067, 5.0-0076, 5.1-0058, 5.1-0060. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

Annual Rhythm and Extreme Co-Design for Sustained Training Leadership

The NVIDIA platform delivered the fastest time to train on every MLPerf Training v5.1 benchmark, with innovations across chips, systems, and software enabling sustained training performance leadership, as shown by industry-standard, peer-reviewed performance data.

Max-Scale Performance

Benchmark	Time to Train
LLM Pretraining (Llama 3.1 405B)	10 minutes
LLM Pretraining (Llama 3.1 8B)	5.2 minutes
LLM Fine-Tuning (Llama 2 70B LoRA)	0.40 minutes
Image Generation (FLUX.1)	12.5 minutes
Recommender (DLRM-DCNv2)	0.71 minutes
Graph Neural Network (R-GAT)	0.84 minutes
Object Detection (RetinaNet)	1.4 minutes

MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 5.0-0082, 5.1-0002, 5.1-0004, 5.1-0060, 5.1-0070, 5.1-0072. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

Blackwell Ultra Sets New Reasoning Inference Records in MLPerf Inference v5.1

The NVIDIA platform set many new records in MLPerf Inference v5.1, including on the challenging new DeepSeek-R1 reasoning and Llama 3.1 405B Interactive tests, and continues to hold every per-GPU MLPerf Inference performance record in the data center category. The GB300 NVL72 rack-scale system, based on the NVIDIA Blackwell Ultra GPU architecture, made its debut just six months after NVIDIA Blackwell, setting new records on the DeepSeek-R1 reasoning inference benchmark. NVIDIA Dynamo also made its debut this round; its disaggregated serving dramatically increased the performance of each Blackwell GPU on Llama 3.1 405B Interactive. The performance and pace of innovation in the NVIDIA platform enable higher intelligence, greater AI factory revenue potential, and lower cost per million tokens.

The NVIDIA Platform Holds Every Data Center per-GPU Record in MLPerf Inference

Benchmark	Offline	Server	Interactive
DeepSeek-R1	5,842 Tokens/Second	2,907 Tokens/Second	*
Llama 3.1 405B	224 Tokens/Second	170 Tokens/Second	138 Tokens/Second
Llama 2 70B 99.9%	12,934 Tokens/Second	12,701 Tokens/Second	7,856 Tokens/Second
Llama 3.1 8B	18,370 Tokens/Second	16,099 Tokens/Second	15,284 Tokens/Second
Mixtral 8x7B	16,099 Tokens/Second	16,131 Tokens/Second	*
Stable Diffusion XL	4.07 Samples/Second	3.59 Queries/Second	*
DLRMv2 99%	87,228 Samples/Second	80,515 Queries/Second	*
DLRMv2 99.9%	48,666 Samples/Second	46,259 Queries/Second	*
RetinaNet	1,875 Samples/Second	1,801 Queries/Second	*
Whisper	5,667 Tokens/Second	*	*
Graph Neural Network	81,404 Samples/Second	*	*

* Scenarios not part of the MLPerf Inference v5.0 or v5.1 benchmark suites.

MLPerf Inference v5.0 and v5.1, Closed Division. Results retrieved from www.mlcommons.org on September 9, 2025. NVIDIA platform results from the following entries: 5.0-0072, 5.1-0007, 5.1-0053, 5.1-0079, 5.1-0028, 5.1-0062, 5.1-0086, 5.1-0073, 5.1-0008, 5.1-0070, 5.1-0046, 5.1-0009, 5.1-0060, 5.1-0072, 5.1-0071, 5.1-0069. Per-chip performance derived by dividing total throughput by number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.0 or v5.1. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See http://www.mlcommons.org for more information.
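
The per-chip derivation described in the note above is a single division. The sketch below shows it with made-up inputs: the system throughput and chip count are hypothetical values chosen for illustration and do not correspond to any MLPerf entry, and `per_chip_throughput` is an illustrative name.

```python
def per_chip_throughput(total_throughput: float, num_chips: int) -> float:
    """Divide a system's total throughput by its number of reported chips."""
    if num_chips <= 0:
        raise ValueError("num_chips must be a positive integer")
    return total_throughput / num_chips


# Hypothetical example values (not taken from any MLPerf submission).
example_total_tokens_per_second = 1_000_000.0
example_num_chips = 8

per_chip = per_chip_throughput(example_total_tokens_per_second, example_num_chips)
print(f"{per_chip:,.0f} tokens/second per chip")  # 125,000 tokens/second per chip
```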

The Technology Behind the Results

The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.

Optimized Software That Accelerates AI Workflows

An essential component of NVIDIA’s platform and MLPerf training and inference results, the NGC™ catalog is a hub for GPU-optimized AI, HPC, and data analytics software that simplifies and accelerates end-to-end workflows. It offers over 150 enterprise-grade containers, including workloads for generative AI, conversational AI, and recommender systems; hundreds of AI models; and industry-specific SDKs that can be deployed on premises, in the cloud, or at the edge. With these, NGC enables data scientists, researchers, and developers to build best-in-class solutions, gather insights, and deliver business value faster than ever.

Visit the NGC Catalog

Leadership-Class AI Infrastructure

Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell and Blackwell Ultra platforms, including the NVIDIA GB300 NVL72 and GB200 NVL72 systems, NVLink and NVLink Switch, and Quantum InfiniBand. These are at the heart of AI factories powered by the NVIDIA data center platform, the engine behind our benchmark performance.

In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure.

Learn More About NVIDIA’s AI Factory Solutions

Unlocking Generative AI at the Edge With Transformative Performance

NVIDIA Jetson Orin offers unparalleled AI compute, large unified memory, and comprehensive software stacks, delivering superior energy efficiency to drive the latest generative AI applications. It’s capable of fast inference for any generative AI model powered by the transformer architecture, providing superior edge performance on MLPerf.

Learn More About NVIDIA Jetson Orin