More Data, Faster: GPU Memory Management Best Practices in Python and C++
Distinguished Engineer, NVIDIA
Learn how to efficiently manage heterogeneous GPU memory in complex CUDA Python and C++ workflows. Modern AI and analytics workflows combine many libraries to process massive data, and careful memory management throughout the workflow is essential to keep those libraries from competing for scarce GPU memory. The open-source RMM memory management library provides allocator interfaces and containers that make it easy to customize GPU memory management in Python and C++ code. We'll demonstrate how the composability of the memory resource interface enables layering allocation strategies with diagnostics, logging, leak detection, profiling, and more. Learn why every library should expose an external allocator interface, enabling both sharing of scarce memory and whole-workflow memory diagnostics. Explore strategies that combine custom pool allocators with unified memory and custom spilling to handle data larger than GPU memory. Finally, learn how NVIDIA RAPIDS uses these techniques to scale high performance to massive data sizes.
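The "layering" the abstract describes can be sketched in plain Python. The classes below are a toy model of the adaptor pattern, not RMM's real API: in RMM itself, classes such as rmm.mr.PoolMemoryResource and rmm.mr.LoggingResourceAdaptor play analogous roles, and allocations come from the GPU rather than a Python dict.

```python
# Toy sketch of composable memory resources (illustrative only, not RMM's API).
# Each adaptor wraps an upstream resource and adds one concern, so diagnostics,
# logging, and leak detection can be stacked without changing the base allocator.

class MemoryResource:
    """Base interface: allocate/deallocate blocks by opaque handle."""
    def allocate(self, nbytes):
        raise NotImplementedError
    def deallocate(self, ptr):
        raise NotImplementedError

class HostMemoryResource(MemoryResource):
    """Terminal resource: hands out bytearrays keyed by an integer handle
    (a stand-in for a real device allocator such as cudaMalloc)."""
    def __init__(self):
        self._live = {}
        self._next = 0
    def allocate(self, nbytes):
        ptr = self._next
        self._next += 1
        self._live[ptr] = bytearray(nbytes)
        return ptr
    def deallocate(self, ptr):
        del self._live[ptr]

class LoggingAdaptor(MemoryResource):
    """Records every allocation and free made through it, then forwards
    the call to the upstream resource."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.log = []
    def allocate(self, nbytes):
        ptr = self.upstream.allocate(nbytes)
        self.log.append(("alloc", ptr, nbytes))
        return ptr
    def deallocate(self, ptr):
        self.log.append(("free", ptr))
        self.upstream.deallocate(ptr)

class LeakCheckAdaptor(MemoryResource):
    """Tracks outstanding allocations so leaks can be reported at shutdown."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.outstanding = set()
    def allocate(self, nbytes):
        ptr = self.upstream.allocate(nbytes)
        self.outstanding.add(ptr)
        return ptr
    def deallocate(self, ptr):
        self.outstanding.discard(ptr)
        self.upstream.deallocate(ptr)

# Layer leak detection over logging over the base resource:
mr = LeakCheckAdaptor(LoggingAdaptor(HostMemoryResource()))
a = mr.allocate(64)
b = mr.allocate(128)
mr.deallocate(a)
print(len(mr.outstanding))    # one allocation still live (a potential leak)
print(mr.upstream.log)        # full alloc/free history for diagnostics
```

Because every layer speaks the same small interface, any combination can be stacked in any order, which is the property that lets a whole workflow share one configurable allocator.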