Efficient At-Scale Training and Deployment of Large Language Models
Principal Product Manager, Conversational AI and Deep Learning, NVIDIA
NeMo Megatron enables enterprises to easily train and deploy large transformer models at scale using multiple parallelism techniques, including tensor and pipeline parallelism. In this talk, we will explain how to preprocess data in a multi-node environment, automatically select the best hyperparameters to minimize time-to-train for multiple GPT-3 and T5 configurations, train the model at scale, and deploy it in a multi-node production setting with an easy-to-use set of scripts. NeMo Megatron automates the workflow, shortens the time to deployment, and reduces the total cost of ownership. In addition, we will show how to automatically learn prompts that adapt the model to different downstream tasks.
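To illustrate the prompt-learning idea mentioned above, the sketch below shows the general technique of prompt tuning: a small set of trainable "soft prompt" embeddings is prepended to each input while the pretrained model stays frozen. This is only a minimal PyTorch sketch, not NeMo Megatron's actual API; the class, parameter names, and dimensions are hypothetical placeholders.

    # Minimal prompt-tuning sketch: train only a small prompt, keep the base model frozen.
    # All names and sizes here are illustrative, not NeMo Megatron's implementation.
    import torch
    import torch.nn as nn

    class SoftPromptModel(nn.Module):
        def __init__(self, base_model: nn.Module, embed_dim: int, prompt_len: int = 20):
            super().__init__()
            self.base_model = base_model
            # Freeze the pretrained model; only the prompt embeddings receive gradients.
            for p in self.base_model.parameters():
                p.requires_grad = False
            self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

        def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
            # token_embeds: (batch, seq_len, embed_dim) from the frozen embedding layer.
            batch = token_embeds.size(0)
            prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
            # Prepend the learned prompt to every sequence, then run the frozen model.
            return self.base_model(torch.cat([prompt, token_embeds], dim=1))

    if __name__ == "__main__":
        # A single transformer encoder layer stands in for the large frozen model.
        embed_dim = 64
        stand_in = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        model = SoftPromptModel(stand_in, embed_dim=embed_dim, prompt_len=10)
        dummy = torch.randn(2, 16, embed_dim)   # (batch, seq_len, embed_dim)
        out = model(dummy)                      # (batch, 10 + 16, embed_dim)
        print(out.shape)

Because only the prompt parameters are updated, a different prompt can be learned per downstream task while a single copy of the large model is shared across all of them.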