Accelerating Backward Data Gradient by Increasing Tensor Core Utilization in CUTLASS
Senior Software Engineer, NVIDIA
Learn about improving backward data gradient (Dgrad) performance by increasing Tensor Core utilization for strided problems, i.e., stride >= 2. Many machine-learning tasks require an efficient implementation of convolutions, and training additionally needs efficient implementations of the forward pass, the backward data gradient (Dgrad), and the backward weight gradient (Wgrad). Implicit GEMM convolution is one way to implement convolutions efficiently on a GPU. However, a naive implicit GEMM implementation of Dgrad underutilizes Tensor Cores for strided problem sizes (stride >= 2, "Strided Dgrad"). This results in suboptimal performance and increased training times for popular workloads such as ResNet-50, ResNeXt, and Mask R-CNN. In this talk, we explore techniques that improve the performance of Strided Dgrad by up to 4x.
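
For context, the source of the underutilization can be seen in a plain reference formulation of Dgrad. The sketch below is a minimal host-side C++ reference, assuming NHWC activation layout, KRSC filter layout, and unit dilation; all names (dgrad_reference, parameter names) are illustrative and are not part of the CUTLASS API. With stride >= 2, only the filter taps whose offsets divide evenly by the stride contribute to a given input-gradient element, which is one way to see why a naive implicit GEMM mapping of strided Dgrad leaves much of the dense Tensor Core tile work wasted.

// Host-side reference for Dgrad, assuming NHWC activations and KRSC filters.
// For each input-gradient element dx[n][h][w][c]:
//   dx[n][h][w][c] = sum over (k, r, s) of dy[n][p][q][k] * filter[k][r][s][c]
// where p = (h + pad_h - r) / stride_h and q = (w + pad_w - s) / stride_w,
// and a (r, s) tap contributes only when those divisions are exact.
#include <cstdio>
#include <vector>

void dgrad_reference(const std::vector<float>& dy,      // N x P x Q x K
                     const std::vector<float>& filter,  // K x R x S x C
                     std::vector<float>& dx,            // N x H x W x C
                     int N, int H, int W, int C,
                     int K, int R, int S, int P, int Q,
                     int stride_h, int stride_w, int pad_h, int pad_w) {
  for (int n = 0; n < N; ++n)
  for (int h = 0; h < H; ++h)
  for (int w = 0; w < W; ++w)
  for (int c = 0; c < C; ++c) {
    float acc = 0.f;
    for (int r = 0; r < R; ++r)
    for (int s = 0; s < S; ++s) {
      int ph = h + pad_h - r;
      int qw = w + pad_w - s;
      // With stride >= 2, only taps where ph and qw are exact multiples of the
      // stride contribute; the remaining taps carry no useful work, which is
      // what a naive implicit GEMM mapping wastes Tensor Core throughput on.
      if (ph % stride_h != 0 || qw % stride_w != 0) continue;
      int p = ph / stride_h, q = qw / stride_w;
      if (p < 0 || p >= P || q < 0 || q >= Q) continue;
      for (int k = 0; k < K; ++k) {
        acc += dy[((n * P + p) * Q + q) * K + k] *
               filter[((k * R + r) * S + s) * C + c];
      }
    }
    dx[((n * H + h) * W + w) * C + c] = acc;
  }
}

int main() {
  // Toy strided problem: N=1, H=W=4, C=1, K=1, R=S=3, stride=2, pad=1 -> P=Q=2.
  int N = 1, H = 4, W = 4, C = 1, K = 1, R = 3, S = 3, P = 2, Q = 2;
  std::vector<float> dy(N * P * Q * K, 1.f);
  std::vector<float> filter(K * R * S * C, 1.f);
  std::vector<float> dx(N * H * W * C, 0.f);
  dgrad_reference(dy, filter, dx, N, H, W, C, K, R, S, P, Q, 2, 2, 1, 1);
  std::printf("dx[0] = %f\n", dx[0]);
  return 0;
}

In this toy example only one of the nine filter taps contributes to dx[0]; grouping the contributing taps together (rather than iterating the full R*S extent per element) is the kind of reorganization that lets strided Dgrad keep Tensor Cores busy.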