A Step-by-step Guide to Building Large Custom Language Models
, NVIDIA
, NVIDIA
The last two years have seen unprecedented progress in natural language processing (NLP), with models such as BERT, RoBERTa, ELECTRA, and now GPT-3 transforming countless NLP-based applications. Given the number of languages across the globe and the complexity of domain-specific language (e.g., specialized medical, engineering, or financial text), these advances are only starting to make an impact outside of general-purpose English. This walkthrough will not only provide an end-to-end demonstration of how to train custom large language models, from obtaining the training data and assessing and cleaning it, through distributed training, to evaluation, but will also show how to deploy them efficiently to production, including an overview of technologies for compressing the models and performing distributed, model-parallel inference. We'll discuss technologies such as Megatron-LM and Microsoft DeepSpeed for training, and TensorRT, FasterTransformer, and Triton Inference Server for inference.