Exploring Next-Generation Methods for Optimizing PyTorch Models for Inference with Torch-TensorRT
Senior Automotive Deep Learning Solutions Architect, NVIDIA
Software Engineer, Meta
The PyTorch optimization and deployment ecosystem on NVIDIA GPUs is constantly evolving. Most recently, a new deployment workflow centered around torch.fx and TorchDynamo has matured in PyTorch. TorchDynamo is the next-generation machine learning compiler in PyTorch. This new approach to deploying PyTorch enables easier and more accurate tracing than TorchScript, and allows the source model to be modified entirely in Python.
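As a rough illustration of the two capture mechanisms mentioned above, the sketch below traces a small, hypothetical module with torch.fx and then captures it with TorchDynamo via torch.compile; the module definition, layer sizes, and input shapes are assumptions for illustration only.

```python
import torch
import torch.fx


class TinyModel(torch.nn.Module):
    """Hypothetical example module used only to demonstrate tracing."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()

# torch.fx symbolically traces the model into a GraphModule whose graph
# can be inspected and rewritten entirely in Python.
fx_module = torch.fx.symbolic_trace(model)
print(fx_module.graph)

# TorchDynamo captures graphs from the running Python bytecode;
# torch.compile is its user-facing entry point.
compiled = torch.compile(model)
out = compiled(torch.randn(8, 16))
```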
Torch-TensorRT is making FX + Dynamo a first-class workflow for users seeking to optimize their PyTorch models with TensorRT. We'll dive into work we're doing today toward this goal, showing what we've done to support this new stack. Finally, we'll demonstrate how you can start experimenting with FX, Dynamo, and TensorRT today to get a preview of the direction Torch-TensorRT is headed.
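For readers who want to try this direction themselves, the following sketch shows one way to exercise the FX/Dynamo path in Torch-TensorRT, assuming a recent torch_tensorrt build where the Dynamo IR and the "torch_tensorrt" torch.compile backend are available; the model, input shapes, and precision settings are illustrative assumptions, not the speakers' exact configuration.

```python
import torch
import torch_tensorrt

# Hypothetical example model; any traceable nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 4),
).eval().cuda()

example_input = torch.randn(8, 16).cuda()

# Option 1: the torch_tensorrt frontend, selecting the Dynamo/FX IR
# instead of the TorchScript path.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[example_input],
    enabled_precisions={torch.float16},  # assumption: FP16 acceleration is desired
)
print(trt_model(example_input).shape)

# Option 2: use Torch-TensorRT as a torch.compile backend, letting
# TorchDynamo drive graph capture.
trt_compiled = torch.compile(model, backend="torch_tensorrt")
print(trt_compiled(example_input).shape)
```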