PyTorch Distributed and Data Parallel

Model sizes are growing at an unprecedented pace. Many applications are exploring larger models to achieve higher accuracy, and some recent massive language models have exceeded 100 billion parameters. Such use cases pose major challenges to the underlying deep learning frameworks: the frameworks need to expose features that facilitate training large models, and at the same time those features have to be flexible enough to cope with various distributed training paradigms. PyTorch RPC fulfills these requirements by letting users decompose a large model into user-defined RPC functions while handling serialization, communication, autograd, and data referencing transparently. We'll briefly introduce the design and implementation of the PyTorch RPC package. Then we'll demonstrate how RPC enables different training scenarios, including vanilla model parallelism, pipeline parallelism, and hybrid data and model parallelism.
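To make the decomposition idea concrete, below is a minimal sketch of how a model can be split across two RPC workers with `torch.distributed.rpc`, using an `RRef` to reference the remote shard and distributed autograd to stitch gradients across processes. The worker names (`worker0`, `worker1`), the toy `Shard` module, and the launch details are illustrative assumptions, not taken from the text above.

```python
# Hypothetical two-process sketch: the Shard module, worker names, and env-var
# launch scheme are assumptions for illustration only.
import os
import torch
import torch.nn as nn
import torch.distributed.rpc as rpc
import torch.distributed.autograd as dist_autograd
from torch.distributed.optim import DistributedOptimizer


class Shard(nn.Module):
    """One user-defined piece of the decomposed model."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        return torch.relu(self.fc(x))

    def parameter_rrefs(self):
        # RRefs let the caller reference remote parameters without copying them.
        return [rpc.RRef(p) for p in self.parameters()]


def run_trainer():
    # Keep the first half locally; place the second half on "worker1".
    # RPC serializes and ships inputs/outputs transparently.
    local_shard = Shard(32, 16)
    remote_shard = rpc.remote("worker1", Shard, args=(16, 8))

    # Collect parameter RRefs from both shards for a distributed optimizer.
    params = [rpc.RRef(p) for p in local_shard.parameters()]
    params += remote_shard.rpc_sync().parameter_rrefs()
    opt = DistributedOptimizer(torch.optim.SGD, params, lr=0.05)

    for _ in range(3):
        with dist_autograd.context() as ctx:
            x = torch.randn(4, 32)
            y = remote_shard.rpc_sync().forward(local_shard(x))
            loss = y.sum()
            # Distributed autograd runs backward across both workers.
            dist_autograd.backward(ctx, [loss])
            opt.step(ctx)


if __name__ == "__main__":
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ["RANK"])  # assumed to be set by a 2-process launcher
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=2)
    if rank == 0:
        run_trainer()
    rpc.shutdown()  # blocks until all outstanding RPC work drains
```

The same primitives generalize to the scenarios mentioned above: pipeline parallelism chains several remote shards and overlaps micro-batches, while hybrid data and model parallelism wraps each pipeline replica in DistributedDataParallel.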