Large-Scale Deep Learning using Hybrid/Multi-Cloud MLOps, GPUs, OpenMPI, and DeepSpeed

Staff Field Data Scientist, Domino Data Lab
As Deep Learning and AI adoption expands, models such as GPT-3 and Megatron have grown larger and more complex, which makes them significantly harder to train. Several distributed computing frameworks have been developed to address this challenge, and MPI (Message Passing Interface) is among the most well-established and reliable options.

In this talk, we will examine the MLOps considerations involved in choosing a framework and explore the benefits of using MPI and the DeepSpeed library. We will also demonstrate how to train large deep networks with these tools, using real-world examples from the field of proteomics and a large language model, within Domino Data Lab’s Enterprise MLOps Platform. The first half of the talk introduces MPI and its use in Python; the second half covers the techniques used to train large deep networks and how to use DeepSpeed on MPI to enable them.
Event: GTC Digital Spring
Date: March 2023
Level: Advanced Technical
Industry: All Industries
Topic: MLOps
Language: English
Topic: Data Science and Machine Learning
Location: