Quantization Aware Training in PyTorch with TensorRT 8.0
NVIDIA
Quantization is used to reduce the latency and resource requirements of deep neural networks during inference. Quantization Aware Training (QAT) improves the accuracy of quantized networks by emulating quantization errors in the forward and backward passes during training. TensorRT 8.0 brings improved support for QAT with PyTorch, in conjunction with NVIDIA's open-source pytorch-quantization toolkit. This session gives an overview of the improvements to QAT in TensorRT 8.0 and walks through an end-to-end usage example.
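The core mechanism the abstract refers to, emulating quantization error during training, is often called "fake quantization": a value is quantized and immediately dequantized, so the forward pass sees the same rounding and clamping error that INT8 inference will introduce. The sketch below illustrates the idea for symmetric INT8 quantization in plain Python; it is an assumption-laden illustration of the concept, not the pytorch-quantization toolkit's actual API (the function name `fake_quantize` and its parameters are hypothetical).

```python
def fake_quantize(x, amax, num_bits=8):
    """Symmetric fake quantization (illustrative sketch, not the toolkit API):
    quantize x to a signed integer grid, then dequantize back to float,
    so downstream computation sees the quantization error."""
    bound = 2 ** (num_bits - 1) - 1          # 127 for INT8
    scale = amax / bound                     # maps [-amax, amax] onto [-127, 127]
    q = max(-bound - 1, min(bound, round(x / scale)))  # quantize and clamp
    return q * scale                         # dequantize back to float

# During QAT, the network trains against values like xq rather than x,
# learning weights that tolerate the rounding error of INT8 inference.
x = 0.337
xq = fake_quantize(x, amax=1.0)
error = x - xq  # small rounding error the training process adapts to
```

In a real QAT flow, the backward pass typically treats the rounding step as an identity (the straight-through estimator) so gradients can flow through the quantizer; the toolkit handles this inside its quantized module wrappers.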