Structured Sparsity in the NVIDIA Ampere Architecture and its Applications in Tencent WeChat Search
AI Developer Technology Engineer, NVIDIA
Algorithm Researcher, Tencent
We'll introduce the structured sparsity feature of the NVIDIA Ampere and Hopper architectures. The Sparse Tensor Core, with its 2:4 structured sparsity pattern, can greatly accelerate many deep learning workloads. We'll cover common training recipes and how to accelerate inference with TensorRT. In addition, the Machine Learning Platform Department of Tencent explored new techniques to simplify training and obtain better accuracy. Because the traditional sparse pre-training recipe is too costly in hardware resources and time, they introduced sparse fine-tuning, sparse quantization-aware training (QAT), progressive sparsity, and related methods to further advance structured sparsity. With these sparsity techniques, Tencent obtained 1.3-1.8x speedups in its offline services.
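To make the 2:4 pattern concrete: in every group of four consecutive weights, at most two may be nonzero, which is what lets the Sparse Tensor Core skip half the multiplications. Below is a minimal NumPy sketch of magnitude-based 2:4 pruning; `prune_2_4` is a hypothetical helper for illustration (production workflows would typically use NVIDIA's tooling, e.g. the ASP library, rather than hand-rolled pruning), and it assumes the weight count is divisible by four.

```python
import numpy as np

def prune_2_4(weights):
    """Magnitude-based 2:4 pruning sketch (hypothetical helper):
    zero the 2 smallest-magnitude values in every group of 4,
    leaving at most 2 nonzeros per group."""
    w = np.asarray(weights, dtype=float).copy()
    groups = w.reshape(-1, 4)  # assumes size is a multiple of 4
    # indices of the 2 smallest magnitudes within each group of 4
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

w = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.02, 0.6]
print(prune_2_4(w))  # [ 0.9  0.   0.4  0.  -0.7  0.   0.   0.6]
```

Each group of four keeps only its two largest-magnitude entries, so the resulting tensor satisfies the 2:4 constraint that the hardware exploits.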