Training Deep Learning Models at Scale: How NCCL Enables Best Performance on AI Data Center Networks
Distinguished Engineer, NVIDIA
Discover how NCCL uses the full capabilities of DGX and HGX platforms to accelerate inter-GPU communication and allow deep learning training to scale further. See how Grace Hopper platforms can leverage multi-node NVLink to compute in parallel at unprecedented speeds. Compare platforms to understand how technology choices affect your training performance and time-to-completion. Understand how new mechanisms like NVLink SHARP and InfiniBand SHARP can accelerate every parallelism dimension of large language model training. Learn about new collective algorithms and their performance on many thousands of GPUs, and get a glimpse of how future improvements could push the boundaries even further.
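The collectives this session covers, all-reduce above all, are what NCCL accelerates over NVLink, NVLink SHARP, and InfiniBand SHARP. As background (not material from the talk itself), here is a minimal sketch of a single-process, multi-GPU in-place all-reduce using NCCL's public C API; the buffer size and the choice of in-place reduction are illustrative assumptions.

```c
/* Minimal sketch: sum-all-reduce one buffer per GPU with NCCL.
 * Assumptions: one process drives all visible GPUs; buffer size is arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
  fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
  fprintf(stderr, "NCCL error: %s\n", ncclGetErrorString(r)); exit(1); } } while (0)

int main(void) {
  int nDev = 0;
  CHECK_CUDA(cudaGetDeviceCount(&nDev));

  const size_t count = 1 << 20;  /* 1M floats per GPU: a gradient-buffer stand-in */
  float **buf = malloc(nDev * sizeof(float *));
  cudaStream_t *streams = malloc(nDev * sizeof(cudaStream_t));
  ncclComm_t *comms = malloc(nDev * sizeof(ncclComm_t));

  /* One buffer and one stream per GPU. */
  for (int i = 0; i < nDev; ++i) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaMalloc((void **)&buf[i], count * sizeof(float)));
    CHECK_CUDA(cudaStreamCreate(&streams[i]));
  }

  /* One communicator per GPU, all within this process (NULL = devices 0..nDev-1). */
  CHECK_NCCL(ncclCommInitAll(comms, nDev, NULL));

  /* In-place sum all-reduce across all GPUs. NCCL selects the algorithm and
   * transport (ring, tree, or switch offload where the fabric supports it). */
  CHECK_NCCL(ncclGroupStart());
  for (int i = 0; i < nDev; ++i)
    CHECK_NCCL(ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                             comms[i], streams[i]));
  CHECK_NCCL(ncclGroupEnd());

  /* Wait for completion, then release resources. */
  for (int i = 0; i < nDev; ++i) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaStreamSynchronize(streams[i]));
    CHECK_NCCL(ncclCommDestroy(comms[i]));
    CHECK_CUDA(cudaFree(buf[i]));
  }
  free(buf); free(streams); free(comms);
  return 0;
}
```

The same ncclAllReduce call runs unchanged whether the communicator spans one NVLink domain or thousands of GPUs across an InfiniBand fabric; the algorithm and offload choices the session discusses happen inside NCCL.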