Learn how NVIDIA's profiling tools, Nsight Systems and Nsight Compute, can help you accelerate modern compute workloads. In the first half of this lab, you'll get hands-on experience optimizing CUDA applications with Nsight Systems, a system-wide performance analysis tool that helps you optimize and scale a CUDA application wherever it runs, whether on a simple workstation, an embedded device, or a cluster in the cloud. In the second half, you'll shift focus to individual workloads on the GPU and use Nsight Compute to dive deep into CUDA details for applications written in Python, the language of AI. You'll learn how to measure and optimize hardware pipeline utilization, memory accesses, and more, from the source level down to the assembly level.
Prerequisite(s):
Attendees should have basic knowledge of Python and CUDA programming.
Experience with profiling tools and/or computer vision frameworks is a plus, but is not required to follow the lab.