A great AI inference accelerator has to deliver not only great performance but also the versatility to accelerate diverse neural networks, along with the programmability to let developers build new ones. Delivering low latency and high throughput while maximizing utilization is the most important performance requirement for deploying inference reliably. NVIDIA Tensor Cores offer a full range of precisions—TF32, BFLOAT16, FP16, INT8, and INT4—to provide unmatched versatility and performance.
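The lower precisions in that range trade numeric range for throughput. As a rough illustration of the idea behind INT8 inference, here is a minimal sketch of symmetric per-tensor quantization in plain Python — an illustrative scheme only, not NVIDIA's actual implementation or calibration method:

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers in [-127, 127] using one
    shared scale factor (symmetric, per-tensor quantization)."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return [x * scale for x in q]

weights = [0.02, -1.5, 0.73, 3.0, -0.004]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each reconstructed value differs from the original by at most
# one quantization step (the scale), which is why INT8 inference
# can preserve accuracy while cutting memory and compute cost.
```

In practice, deployed INT8 pipelines add per-channel scales and calibration over representative data, but the core trade-off — fewer bits per value in exchange for higher arithmetic density — is the same.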
Tensor Cores enabled NVIDIA to win MLPerf Inference 0.5, the first industry-wide benchmark for AI inference.