Just-In-Time Link-Time Optimization Adoption in cuSPARSE/cuFFT: Use Case Overview
NVIDIA
Optimizing kernels in the CUDA math libraries often involves specializing parts of a kernel to exploit particulars of the problem or new features of the GPU. These specialized functions can either be part of the precompiled kernels shipped with the library or be provided at runtime by the user. Adding specialized precompiled kernels increases the binary size of the library; conversely, calling user functions indirectly from inside a kernel often hurts performance because of the function-call overhead. Just-in-time link-time optimization (JIT LTO), introduced in CUDA 11.4, lets developers tackle both issues by linking specialized kernel functions at runtime with no call overhead. We'll present the use cases of JIT LTO in cuSPARSE, as a way for users to add custom operators without function-call overhead, and in cuFFT, as a way to support user callbacks with no call overhead and to add optimized kernels without increasing the binary size.