NVIDIA Grace CPU Superchip

The breakthrough CPU for the modern data center.

Designed to Meet the Performance and Efficiency Needs of Today’s AI Data Centers

The NVIDIA Grace™ CPU is designed for a new type of data center—one that processes mountains of data to produce intelligence with maximum energy efficiency. These data centers run diverse workloads like AI, data analytics, hyperscale cloud applications, and high-performance computing (HPC). To meet the most demanding data center needs, Grace delivers 2X the performance per watt, 2X the packaging density, and the highest memory bandwidth compared to today’s leading servers.

The Grace CPU combines 72 high-performance, power-efficient Arm® Neoverse™ V2 cores, connected by the NVIDIA Scalable Coherency Fabric (SCF), which delivers 3.2TB/s of bisection bandwidth, double that of traditional CPUs, while maintaining full compatibility with the Arm ecosystem. Grace is the first data center CPU to use server-class high-speed LPDDR5X memory, with a wide memory subsystem that delivers up to 500GB/s of bandwidth at one fifth the power of traditional DDR memory and at similar cost.

NVIDIA Grace CPU Superchip LaunchPad

In this free lab, get hands-on experience with the NVIDIA Grace CPU Superchip and interact with demos of its memory bandwidth and software environment.

Meet the NVIDIA Grace CPU

NVIDIA Grace CPU Superchip

The Grace CPU Superchip consists of two Grace CPU chips connected coherently over NVIDIA NVLink™ Chip-to-Chip (C2C) at 900 GB/s. It packs 144 Neoverse V2 cores into a single module, with server-class LPDDR5X memory that delivers up to 1TB/s of memory bandwidth. The Grace CPU Superchip forms the heart of a two-socket server in a compact module, delivering 2X the performance at the same power as traditional server CPUs with DDR5 memory.

NVIDIA Grace CPU C1

The NVIDIA Grace CPU C1 is a single-socket, high-performance server platform optimized for scalable and edge deployments, including hyperscale cloud, CDN, storage, and telco, without compromising performance or bandwidth. The platform delivers performance on par with high-end x86 platforms while being configurable from 140W to 250W for the Grace CPU and LPDDR5X memory, compared to over 400W for similar x86 platforms. The NVIDIA-designed Scalable Coherency Fabric enables the Grace CPU to deliver 2X the energy efficiency of leading x86 platforms.

Highlights

Double Data Center Output or Cut Energy Usage by Half With Grace CPU

Graph Analytics: 3X
Data Analytics: 2X
Weather: 2X
Microservices: 1.6X

Test configuration: NVIDIA Grace Superchip with 480GB of LPDDR5X vs. AMD EPYC 9654 with 768GB of DDR5. OS: Ubuntu 22.04. Compilers: GCC 12.3 unless noted below. Power for energy efficiency is measured CPU + memory power.
Graph Analytics: The GAP Benchmark Suite BFS, arXiv:1508.03619 [cs.DC], 2015.
Data Analytics: HiBench + K-means Spark (HiBench 7.1.1, Hadoop 3.3.3, Spark 3.3.0; Grace: NVHPC 24.5, x86: Intel 2021.4).
Weather: ICON QUBICC, 80 km resolution; NVHPC 24.5 (Grace), ICC 2021.4 (x86).
Microservices: Google Protobufs (commit 7cd0b6fbf1643943560d8a9fe553fd206190b27f, N instances in parallel).
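The headline "double output or cut energy by half" follows from simple arithmetic on an efficiency ratio. The sketch below is illustrative only: it assumes each figure above is a performance-per-watt ratio against the x86 baseline and that the ratio holds uniformly, which real deployments may not match. The function names are hypothetical.

```python
# Energy-efficiency ratios (Grace vs. the x86 baseline) as listed above.
# Assumption: each value is a performance-per-watt ratio.
EFFICIENCY_RATIO = {
    "Graph Analytics": 3.0,
    "Data Analytics": 2.0,
    "Weather": 2.0,
    "Microservices": 1.6,
}


def output_at_fixed_power(ratio: float) -> float:
    """Relative work completed when the power budget is held constant."""
    return ratio


def energy_at_fixed_output(ratio: float) -> float:
    """Relative energy consumed when the amount of work is held constant."""
    return 1.0 / ratio


def summarize() -> list:
    """Return (workload, relative output, relative energy) for each entry."""
    return [
        (name, output_at_fixed_power(r), energy_at_fixed_output(r))
        for name, r in EFFICIENCY_RATIO.items()
    ]


if __name__ == "__main__":
    for name, output, energy in summarize():
        print(f"{name:16s} output x{output:.1f}  or  energy x{energy:.3f}")
```

Under these assumptions, a 2X ratio means either twice the work at the same power or the same work at half the energy; a 1.6X ratio corresponds to roughly 62.5% of the baseline energy.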

Graph Analytics

The NVIDIA Grace CPU Superchip connects the Arm Neoverse V2 cores with the custom NVIDIA Scalable Coherency Fabric, which delivers fast performance for workloads such as GAPBS Breadth-First Search that stress core-to-core communication and synchronization. NVIDIA Grace delivers over 2X the performance at the server level and 3X better energy efficiency compared to leading x86 systems.

Data Analytics

As data continues to grow, businesses need to learn as much as possible from it to compete. The HiBench suite tests K-means clustering for knowledge discovery and data mining, and it takes advantage of the high-bandwidth, low-power memory in the NVIDIA Grace CPU. The Grace CPU is over 2X more energy efficient than leading x86 CPUs on the market today.

Weather

Weather prediction models are an important use case for high-performance computing (HPC) and are critical for understanding and responding to weather patterns that are shifting as a result of climate change. The high-bandwidth, power-efficient LPDDR5X memory on the Grace CPU delivers up to 500 GB/s of bandwidth in only about 16W, enabling Grace to complete almost 2X the work in the same power envelope as existing x86 solutions.

Microservices

Microservices are a collection of small independent services that enable data centers to easily scale to meet demand. They also provide flexibility to manage individual services without impacting the entire application. The Google Protobufs benchmark measures how quickly the system can serialize and parse the data exchanged between systems, an operation essential to running microservices. The NVIDIA Grace CPU delivers leading performance and power efficiency on this workload to maximize data center throughput.

Features

Technological Breakthroughs

Arm Neoverse V2 Cores

At the heart of the Grace CPU are the Arm Neoverse V2 CPU cores, Arm’s highest-performing data center cores on the market today. The Neoverse V2 cores are optimized to deliver leading performance per core while providing high efficiency compared to traditional CPUs. The Grace CPU integrates 72 cores and, when paired with LPDDR5X memory and the NVIDIA Scalable Coherency Fabric, delivers twice the performance in the same power envelope as leading x86 CPUs.

NVIDIA Scalable Coherency Fabric

NVIDIA Scalable Coherency Fabric (SCF) is a mesh fabric and distributed cache architecture designed by NVIDIA to meet the challenges of scaling cores and bandwidth in a power- and area-efficient manner. SCF provides over 3.2 TB/s of total bisection bandwidth, double that of traditional CPUs, to keep data flowing between the CPU cores, memory, and system I/O. The SCF reduces bottlenecks in data-movement-heavy applications such as graph analytics, where NVIDIA Grace delivers up to 2X the performance of leading x86 servers.

LPDDR5X Memory

NVIDIA Grace is the first server CPU to use LPDDR5X memory with server-class reliability through mechanisms like error-correcting code (ECC). The LPDDR5X memory in NVIDIA Grace balances cost, power, bandwidth, and capacity. It delivers up to 500 GB/s in only about 16W, approximately one fifth the power of conventional DDR5 memory.
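To make the memory figures concrete, the following sketch works through the arithmetic implied by "up to 500 GB/s in only about 16W" and "approximately one fifth the power". It assumes the one-fifth comparison is made at equal bandwidth, which the text does not state explicitly; the constant and function names are hypothetical.

```python
# Figures quoted in the text above.
GRACE_BANDWIDTH_GB_S = 500.0  # up to 500 GB/s
GRACE_MEMORY_POWER_W = 16.0   # about 16 W
DDR5_POWER_MULTIPLE = 5.0     # Grace memory uses ~1/5 the power of DDR5


def bandwidth_per_watt(bandwidth_gb_s: float, power_w: float) -> float:
    """Memory bandwidth delivered per watt, in GB/s per W."""
    return bandwidth_gb_s / power_w


def implied_ddr5_power_w(lpddr5x_power_w: float) -> float:
    """DDR5 power implied by the one-fifth claim.

    Assumption: the comparison is made at the same delivered bandwidth.
    """
    return lpddr5x_power_w * DDR5_POWER_MULTIPLE


grace_gb_s_per_w = bandwidth_per_watt(GRACE_BANDWIDTH_GB_S, GRACE_MEMORY_POWER_W)
ddr5_power_w = implied_ddr5_power_w(GRACE_MEMORY_POWER_W)
ddr5_gb_s_per_w = bandwidth_per_watt(GRACE_BANDWIDTH_GB_S, ddr5_power_w)

if __name__ == "__main__":
    print(f"LPDDR5X: {grace_gb_s_per_w:.2f} GB/s per W")
    print(f"Implied DDR5 power at equal bandwidth: {ddr5_power_w:.0f} W")
    print(f"Implied DDR5: {ddr5_gb_s_per_w:.2f} GB/s per W")
```

Under that assumption the quoted numbers work out to about 31 GB/s per watt for LPDDR5X, against roughly 80W and about 6 GB/s per watt for an equivalent DDR5 subsystem.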

Single and Dual Socket

The NVIDIA Grace CPU portfolio includes the Grace CPU Superchip, which provides the heart of a dual-socket server with 144 Neoverse V2 cores and up to 960GB of LPDDR5X in a single compact module requiring only 500W for the CPU and memory. For additional flexibility, the Grace CPU C1 offers 72 Neoverse V2 cores connected by the NVIDIA Scalable Coherency Fabric in a single-socket configuration optimized for cloud, storage, edge, and telco deployments, delivering up to 2X the performance per watt of conventional x86 servers.

News

Revolutionizing Data Center Efficiency With the NVIDIA Grace Family

Offered as a single, compact two-socket module, the Grace CPU Superchip delivers 2X the performance in the same power envelope as leading traditional CPUs.

NVIDIA Grace CPU Superchip Architecture In Depth

Combining NVIDIA expertise with Arm processors, on-chip fabrics, system-on-chip (SoC) design, and resilient high-bandwidth low-power memory technologies, the Grace CPU was built from the ground up to create the world’s first superchip for computing.

Boosting Mathematical Optimization Performance and Energy Efficiency on the NVIDIA Grace CPU

As the demand for faster and better mathematical optimization solutions grows, full-stack innovation is needed. This blog post explores benchmark results and use cases showing improved efficiency using the Arm-based NVIDIA Grace CPU.

Specifications

Grace CPU Specs

| | NVIDIA Grace CPU C1 | NVIDIA Grace CPU Superchip |
| --- | --- | --- |
| Configuration | 1x Grace CPU | 2x Grace CPU |
| Core count | 72 Arm Neoverse V2 cores with 4x 128b SVE2 | 144 Arm Neoverse V2 cores with 4x 128b SVE2 |
| L1 cache | 64KB i-cache + 64KB d-cache per core | 64KB i-cache + 64KB d-cache per core |
| L2 cache | 1MB per core | 1MB per core |
| L3 cache | 114MB | 228MB |
| LPDDR5X size | 120GB, 240GB, and 480GB on-module memory options available | 240GB, 480GB, and 960GB options available |
| Memory bandwidth | Up to 384 GB/s for 480GB; up to 512 GB/s for 120GB, 240GB | Up to 768 GB/s for 960GB; up to 1024 GB/s for 240GB, 480GB |
| NVLink-C2C bandwidth | n/a | Up to 900 GB/s |
| PCIe links | Up to 4x PCIe Gen5 x16 with option to bifurcate | Up to 8x PCIe Gen5 x16 with option to bifurcate |
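Because the Superchip is two Grace CPUs in one module, its headline specifications should be exactly twice those of the C1. The sketch below encodes a subset of the table above and checks that relationship. The `GraceSpec` type and `scale` helper are hypothetical names introduced for illustration, and the values are copied from the table rather than from any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraceSpec:
    """A subset of the specification table (hypothetical helper type)."""
    cpus: int
    cores: int
    l3_cache_mb: int
    memory_options_gb: tuple
    peak_bandwidth_gb_s: int
    pcie_gen5_x16_links: int


# Values copied from the specification table above.
C1 = GraceSpec(
    cpus=1, cores=72, l3_cache_mb=114,
    memory_options_gb=(120, 240, 480),
    peak_bandwidth_gb_s=512, pcie_gen5_x16_links=4,
)
SUPERCHIP = GraceSpec(
    cpus=2, cores=144, l3_cache_mb=228,
    memory_options_gb=(240, 480, 960),
    peak_bandwidth_gb_s=1024, pcie_gen5_x16_links=8,
)


def scale(spec: GraceSpec, n: int) -> GraceSpec:
    """Multiply every per-module quantity by n CPUs."""
    return GraceSpec(
        cpus=spec.cpus * n,
        cores=spec.cores * n,
        l3_cache_mb=spec.l3_cache_mb * n,
        memory_options_gb=tuple(m * n for m in spec.memory_options_gb),
        peak_bandwidth_gb_s=spec.peak_bandwidth_gb_s * n,
        pcie_gen5_x16_links=spec.pcie_gen5_x16_links * n,
    )


if __name__ == "__main__":
    print("Superchip equals 2x C1:", scale(C1, 2) == SUPERCHIP)
```

The same doubling holds for the largest memory configurations: 384 GB/s for the 480GB C1 corresponds to 768 GB/s for the 960GB Superchip.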
