AI models continue to explode in complexity as they take on next-level challenges such as accurate conversational AI and deep recommender systems. Conversational AI models like Megatron are hundreds of times larger and more complex than image classification models like ResNet-50. Training these massive models in FP32 precision can take days or even weeks. Tensor Cores in NVIDIA GPUs provide an order-of-magnitude higher performance with reduced precisions like TF32 and FP16. And with direct support in native frameworks via NVIDIA CUDA-X™ libraries, implementation is automatic, which dramatically slashes training-to-convergence times while maintaining accuracy.
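To make the precision trade-off concrete, here is a minimal sketch of what TF32 means numerically: it keeps FP32's 8-bit exponent (same dynamic range) but only 10 mantissa bits, which is what lets Tensor Cores run matrix math much faster while remaining a drop-in for FP32. This is an illustrative pure-Python simulation, not an NVIDIA API; the function name `tf32_round` is our own.

```python
import struct

def tf32_round(x: float) -> float:
    """Simulate TF32 by truncating a float32 mantissa to 10 bits.

    TF32 retains FP32's 8-bit exponent (so the representable range
    is unchanged) but keeps only 10 of the 23 mantissa bits, giving
    FP16-level precision at FP32-level range.
    """
    # Reinterpret the float32 as its raw 32-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Zero the low 13 mantissa bits, leaving 10 bits of precision.
    bits &= ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Values exactly representable in 10 mantissa bits pass through unchanged;
# others lose only their lowest-order digits.
print(tf32_round(1.0))         # exact: 1.0
print(tf32_round(3.14159265))  # pi to roughly 3 decimal digits
```

In practice you never do this by hand: with Tensor Cores, frameworks apply TF32 (or FP16 with loss scaling) inside their math libraries, so existing FP32 training scripts pick it up automatically.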
Tensor Cores enabled NVIDIA to win MLPerf 0.6, the first industry-wide AI benchmark for training.