The emerging class of exascale HPC workloads and trillion-parameter AI models for tasks like superhuman conversational AI takes months to train, even on supercomputers. Compressing that to the speed of business, completing training within days, requires high-speed, seamless communication between every GPU in a server cluster so that performance scales with GPU count. The combination of NVIDIA NVLink, NVIDIA NVSwitch, NVIDIA Magnum IO libraries, and strong scaling across servers delivers AI training speedups of up to 9X on Mixture of Experts (MoE) models, allowing researchers to train massive models at the speed of business.
Magnum IO Libraries and Deep Learning Integrations
NCCL and other Magnum IO libraries transparently leverage the latest NVIDIA H100 GPUs, NVLink, NVSwitch, and InfiniBand networking to deliver significant speedups for deep learning workloads, particularly recommender systems and large language model training.
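As an illustration of what "transparently" means in practice, here is a minimal sketch, not taken from this page, of a single-process, multi-GPU all-reduce using NCCL's public C API (ncclCommInitAll, ncclAllReduce). NCCL probes the topology at communicator creation and routes traffic over NVLink/NVSwitch where available, falling back to PCIe or the network; the buffer size and zero-fill initialization below are arbitrary choices for the example.

```cuda
// Minimal sketch: sum-reduce one buffer per GPU across all visible GPUs.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CUDA_CHECK(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)
#define NCCL_CHECK(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
    fprintf(stderr, "NCCL: %s\n", ncclGetErrorString(r)); exit(1); } } while (0)

int main(void) {
  int ndev = 0;
  CUDA_CHECK(cudaGetDeviceCount(&ndev));
  const size_t count = 1 << 24;  // 16M floats per GPU (illustrative size)

  float **sendbuf = (float **)malloc(ndev * sizeof(float *));
  float **recvbuf = (float **)malloc(ndev * sizeof(float *));
  cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
  ncclComm_t *comms = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));
  int *devs = (int *)malloc(ndev * sizeof(int));

  for (int i = 0; i < ndev; i++) {
    devs[i] = i;
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaMalloc((void **)&sendbuf[i], count * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&recvbuf[i], count * sizeof(float)));
    CUDA_CHECK(cudaMemset(sendbuf[i], 0, count * sizeof(float)));  // demo fill
    CUDA_CHECK(cudaStreamCreate(&streams[i]));
  }

  // One communicator per GPU, all within this process; NCCL discovers
  // the NVLink/NVSwitch topology here.
  NCCL_CHECK(ncclCommInitAll(comms, ndev, devs));

  // Group the per-GPU calls so NCCL schedules them as one collective.
  NCCL_CHECK(ncclGroupStart());
  for (int i = 0; i < ndev; i++)
    NCCL_CHECK(ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat,
                             ncclSum, comms[i], streams[i]));
  NCCL_CHECK(ncclGroupEnd());

  for (int i = 0; i < ndev; i++) {
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaStreamSynchronize(streams[i]));
  }
  printf("all-reduce of %zu floats across %d GPUs complete\n", count, ndev);

  for (int i = 0; i < ndev; i++) {
    ncclCommDestroy(comms[i]);
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaFree(sendbuf[i]));
    CUDA_CHECK(cudaFree(recvbuf[i]));
    CUDA_CHECK(cudaStreamDestroy(streams[i]));
  }
  return 0;
}
```

Deep learning frameworks such as PyTorch and TensorFlow drive NCCL through this same API under the hood, which is why these hardware paths are exercised without application-level changes.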
Enabling researchers to continue pushing the envelope of what's possible with AI requires powerful performance and massive scalability. The combination of NVIDIA Quantum-2 InfiniBand networking, NVLink, NVSwitch, and the Magnum IO software stack delivers out-of-the-box scalability for hundreds to thousands of GPUs operating together.
Performance Increases 1.9X on LBANN with NVSHMEM vs. MPI
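The gain comes from NVSHMEM's communication model: kernels issue one-sided puts directly into a partitioned global address space spanning GPU memory, rather than staging two-sided MPI sends through the host. The sketch below is illustrative only, not LBANN's actual code; it shows the pattern with a simple ring shift using nvshmem_float_p, assuming one PE per GPU launched via nvshmrun or mpirun and compilation with nvcc -rdc=true linked against the NVSHMEM library.

```cuda
// Illustrative only: a GPU-initiated, one-sided ring shift with NVSHMEM.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE (one per GPU) pushes its buffer into its right-hand neighbor's
// symmetric memory directly from device code -- no host round trip.
__global__ void ring_shift(float *dst, const float *src, int n, int peer) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) nvshmem_float_p(&dst[i], src[i], peer);  // one-sided put
}

int main(void) {
  nvshmem_init();  // bootstrap: one PE per GPU (e.g. via nvshmrun/mpirun)
  int mype = nvshmem_my_pe();
  int npes = nvshmem_n_pes();
  cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));

  const int n = 1 << 20;
  // Symmetric allocations are remotely addressable by every PE.
  float *src = (float *)nvshmem_malloc(n * sizeof(float));
  float *dst = (float *)nvshmem_malloc(n * sizeof(float));

  // Fill the source buffer with this PE's rank.
  float *h = (float *)malloc(n * sizeof(float));
  for (int i = 0; i < n; i++) h[i] = (float)mype;
  cudaMemcpy(src, h, n * sizeof(float), cudaMemcpyHostToDevice);

  int peer = (mype + 1) % npes;  // right-hand neighbor in the ring
  ring_shift<<<(n + 255) / 256, 256>>>(dst, src, n, peer);
  cudaDeviceSynchronize();
  nvshmem_barrier_all();  // make every PE's puts globally visible

  cudaMemcpy(h, dst, sizeof(float), cudaMemcpyDeviceToHost);
  printf("PE %d received rank %.0f from its left neighbor\n", mype, h[0]);

  free(h);
  nvshmem_free(src);
  nvshmem_free(dst);
  nvshmem_finalize();
  return 0;
}
```

Because the put is issued from inside the kernel, communication can be overlapped with computation at fine grain, which is the kind of overlap behind results like the LBANN figure above.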