Inference with TensorFlow 2 Integrated with TensorRT
Speakers: Google; NVIDIA; NVIDIA
Learn how to run inference using TensorFlow 2 with TensorRT integration, and the performance this can offer. TensorFlow is a machine learning platform, and TensorRT is an SDK for high-performance deep learning inference on NVIDIA GPUs. TensorFlow models are usually written in FP32 precision so that they work for both training and inference. The TensorFlow-TensorRT integration automatically offloads portions of the TensorFlow graph to run with TensorRT in FP16 or INT8 precision, improving inference throughput without sacrificing much accuracy. We'll describe: how to use the TensorFlow-TensorRT integration in TensorFlow 2; the dynamic shape feature we recently added to better handle TensorFlow graphs with unknown shapes; the lazy calibration mode we recently added to improve the workflow for INT8 inference; some details on how TensorFlow-TensorRT works; and the performance benefits of using TensorFlow-TensorRT for inference.
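
For reference, the following is a minimal sketch of the FP16 conversion workflow the abstract describes, using the TF-TRT converter API available in TensorFlow 2. The SavedModel paths are placeholders, and the exact converter arguments can vary slightly between TensorFlow releases, so treat this as an illustration rather than a definitive recipe.

# Minimal TF-TRT FP16 conversion sketch (paths are placeholders).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Request FP16 precision for the TensorRT-accelerated subgraphs.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/tmp/my_saved_model",  # placeholder input model
    conversion_params=params)

# Replaces supported portions of the TensorFlow graph with TRTEngineOp nodes;
# unsupported ops continue to run in native TensorFlow.
converter.convert()

# Save the converted SavedModel; TensorRT engines are built at first inference
# (or ahead of time via converter.build() with representative inputs).
converter.save("/tmp/my_saved_model_trt_fp16")  # placeholder output directory

The saved model can then be loaded with tf.saved_model.load and called like any other SavedModel; INT8 conversion follows the same pattern but additionally requires a calibration input function passed to convert().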