NCCL: High-Speed Inter-GPU Communication for Large-Scale Training
NVIDIA
Inter-GPU communication is central to training DL networks on multiple GPUs. The NCCL library fills that role in most frameworks and is used by PyTorch, Horovod, and others. Learn how hardware choices directly impact performance at scale, and what performance to expect from various platforms, including DGX systems. Understand why NVLink is critical to large-scale computing and how it's combined with InfiniBand/RoCE to deliver orders-of-magnitude higher performance than standard off-the-shelf systems. Also discover the new communication patterns DL training uses and how node and fabric topology can impact them. Finally, learn how the NCCL API has evolved to serve new needs in parallel computing, including HPC workloads, as illustrated by the sketch below.
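As a concrete reference point for the communication patterns and API mentioned above, here is a minimal sketch of a single-process, multi-GPU all-reduce using NCCL's `ncclCommInitAll`, `ncclAllReduce`, and group-call APIs. The GPU count, buffer size, and initialization values are illustrative assumptions, not part of the session material.

```c
// Minimal sketch: single-process all-reduce across two GPUs with NCCL.
// Assumes CUDA and NCCL are installed; compile with: nvcc allreduce.c -lnccl
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define NGPUS 2            /* assumed number of visible GPUs */
#define COUNT (1 << 20)    /* elements per GPU, chosen arbitrarily */

int main(void) {
  int devs[NGPUS] = {0, 1};
  ncclComm_t comms[NGPUS];
  float* sendbuf[NGPUS];
  float* recvbuf[NGPUS];
  cudaStream_t streams[NGPUS];

  /* Allocate device buffers and a stream on each GPU. */
  for (int i = 0; i < NGPUS; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void**)&sendbuf[i], COUNT * sizeof(float));
    cudaMalloc((void**)&recvbuf[i], COUNT * sizeof(float));
    cudaMemset(sendbuf[i], 0, COUNT * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* Create one communicator per GPU within this single process. */
  ncclCommInitAll(comms, NGPUS, devs);

  /* Sum-reduce the buffers across all GPUs; the group calls batch the
     per-GPU launches so they can proceed concurrently. */
  ncclGroupStart();
  for (int i = 0; i < NGPUS; ++i)
    ncclAllReduce(sendbuf[i], recvbuf[i], COUNT, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  /* Wait for completion, then release resources. */
  for (int i = 0; i < NGPUS; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
    cudaFree(sendbuf[i]);
    cudaFree(recvbuf[i]);
    cudaStreamDestroy(streams[i]);
    ncclCommDestroy(comms[i]);
  }
  printf("all-reduce complete\n");
  return 0;
}
```

In multi-node training, the same `ncclAllReduce` call is typically driven by one communicator per rank (created with `ncclCommInitRank` and a shared `ncclUniqueId`), and NCCL routes the traffic over NVLink within a node and InfiniBand/RoCE between nodes.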