The emerging class of exascale HPC workloads and trillion-parameter AI models for tasks like superhuman conversational AI takes months to train, even on supercomputers. Compressing that to the speed of business, completing training within days, requires high-speed, seamless communication between every GPU in a server cluster so that performance scales with GPU count. The combination of NVIDIA NVLink, NVIDIA NVSwitch, NVIDIA Magnum IO libraries, and strong scaling across servers delivers AI training speedups of up to 9X on Mixture of Experts (MoE) models, allowing researchers to train massive models at the speed of business.
Magnum IO Libraries and Deep Learning Integrations
NCCL and other Magnum IO libraries transparently leverage the latest NVIDIA H100 GPUs, NVLink, NVSwitch, and InfiniBand networking to deliver significant speedups for deep learning workloads, particularly recommender systems and large language model training.
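As an illustration of what "transparently" means in practice, here is a minimal sketch, not taken from this page, of a single-process, multi-GPU all-reduce using NCCL's public C API (ncclCommInitAll, ncclAllReduce). NCCL probes the topology at communicator creation and routes traffic over NVLink/NVSwitch where available, falling back to PCIe or the network; the buffer size and zero-fill initialization below are arbitrary choices for the example.

```cuda
// Minimal sketch: sum-reduce one buffer per GPU across all visible GPUs.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CUDA_CHECK(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)
#define NCCL_CHECK(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
    fprintf(stderr, "NCCL: %s\n", ncclGetErrorString(r)); exit(1); } } while (0)

int main(void) {
  int ndev = 0;
  CUDA_CHECK(cudaGetDeviceCount(&ndev));
  const size_t count = 1 << 24;  // 16M floats per GPU (illustrative size)

  float **sendbuf = (float **)malloc(ndev * sizeof(float *));
  float **recvbuf = (float **)malloc(ndev * sizeof(float *));
  cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
  ncclComm_t *comms = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));
  int *devs = (int *)malloc(ndev * sizeof(int));

  for (int i = 0; i < ndev; i++) {
    devs[i] = i;
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaMalloc((void **)&sendbuf[i], count * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&recvbuf[i], count * sizeof(float)));
    CUDA_CHECK(cudaMemset(sendbuf[i], 0, count * sizeof(float)));  // demo fill
    CUDA_CHECK(cudaStreamCreate(&streams[i]));
  }

  // One communicator per GPU, all within this process; NCCL discovers
  // the NVLink/NVSwitch topology here.
  NCCL_CHECK(ncclCommInitAll(comms, ndev, devs));

  // Group the per-GPU calls so NCCL schedules them as one collective.
  NCCL_CHECK(ncclGroupStart());
  for (int i = 0; i < ndev; i++)
    NCCL_CHECK(ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat,
                             ncclSum, comms[i], streams[i]));
  NCCL_CHECK(ncclGroupEnd());

  for (int i = 0; i < ndev; i++) {
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaStreamSynchronize(streams[i]));
  }
  printf("all-reduce of %zu floats across %d GPUs complete\n", count, ndev);

  for (int i = 0; i < ndev; i++) {
    ncclCommDestroy(comms[i]);
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaFree(sendbuf[i]));
    CUDA_CHECK(cudaFree(recvbuf[i]));
    CUDA_CHECK(cudaStreamDestroy(streams[i]));
  }
  return 0;
}
```

Deep learning frameworks such as PyTorch and TensorFlow drive NCCL through this same API under the hood, which is why these hardware paths are exercised without application-level changes.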
Enabling researchers to continue pushing the envelope of what's possible with AI requires powerful performance and massive scalability. The combination of NVIDIA Quantum-2 InfiniBand networking, NVLink, NVSwitch, and the Magnum IO software stack delivers out-of-the-box scalability for hundreds to thousands of GPUs operating together.
Performance Increases 1.9X on LBANN with NVSHMEM vs. MPI
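The gain comes from NVSHMEM's communication model: kernels issue one-sided puts directly into a partitioned global address space spanning GPU memory, rather than staging two-sided MPI sends through the host. The sketch below is illustrative only, not LBANN's actual code; it shows the pattern with a simple ring shift using nvshmem_float_p, assuming one PE per GPU launched via nvshmrun or mpirun and compilation with nvcc -rdc=true linked against the NVSHMEM library.

```cuda
// Illustrative only: a GPU-initiated, one-sided ring shift with NVSHMEM.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE (one per GPU) pushes its buffer into its right-hand neighbor's
// symmetric memory directly from device code -- no host round trip.
__global__ void ring_shift(float *dst, const float *src, int n, int peer) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) nvshmem_float_p(&dst[i], src[i], peer);  // one-sided put
}

int main(void) {
  nvshmem_init();  // bootstrap: one PE per GPU (e.g. via nvshmrun/mpirun)
  int mype = nvshmem_my_pe();
  int npes = nvshmem_n_pes();
  cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));

  const int n = 1 << 20;
  // Symmetric allocations are remotely addressable by every PE.
  float *src = (float *)nvshmem_malloc(n * sizeof(float));
  float *dst = (float *)nvshmem_malloc(n * sizeof(float));

  // Fill the source buffer with this PE's rank.
  float *h = (float *)malloc(n * sizeof(float));
  for (int i = 0; i < n; i++) h[i] = (float)mype;
  cudaMemcpy(src, h, n * sizeof(float), cudaMemcpyHostToDevice);

  int peer = (mype + 1) % npes;  // right-hand neighbor in the ring
  ring_shift<<<(n + 255) / 256, 256>>>(dst, src, n, peer);
  cudaDeviceSynchronize();
  nvshmem_barrier_all();  // make every PE's puts globally visible

  cudaMemcpy(h, dst, sizeof(float), cudaMemcpyDeviceToHost);
  printf("PE %d received rank %.0f from its left neighbor\n", mype, h[0]);

  free(h);
  nvshmem_free(src);
  nvshmem_free(dst);
  nvshmem_finalize();
  return 0;
}
```

Because the put is issued from inside the kernel, communication can be overlapped with computation at fine grain, which is the kind of overlap behind results like the LBANN figure above.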