Architecturally, a CPU is composed of just a few cores with lots of cache memory and can handle only a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.
NumPy has become the de facto standard for exchanging multi-dimensional data in Python. However, its implementation is not optimal for many-core GPUs. For this reason, newer libraries optimized for GPUs implement or interoperate with the NumPy array.
NVIDIA® CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs. The CUDA array interface is a standard format that describes a GPU array (tensor), allowing GPU arrays to be shared between different libraries without copying or converting the data. The CUDA array interface is supported by Numba, CuPy, MXNet, and PyTorch; a short interop sketch follows the list below.
- CuPy is a library that implements NumPy arrays on NVIDIA GPUs by leveraging CUDA Toolkit libraries.
- Numba is a Python compiler that can compile Python code for execution on CUDA-capable GPUs. NumPy arrays are directly supported in Numba.
- Apache MXNet is a flexible and efficient library for deep learning. Its NDArray is used to represent and manipulate the inputs and outputs of a model as multi-dimensional arrays. NDArray is similar to NumPy’s ndarray, but it can run on GPUs to accelerate computing.
- PyTorch is an open-source deep learning framework known for its flexibility and ease of use. PyTorch tensors are similar to NumPy’s ndarrays, except they can run on GPUs to accelerate computing.
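As a minimal illustration of this interoperability, the sketch below allocates an array on the GPU with CuPy and passes it directly to a Numba CUDA kernel. Numba consumes the array through the CUDA array interface, so no copy is made. It assumes a machine with an NVIDIA GPU and the cupy and numba packages installed.

```python
import cupy as cp
from numba import cuda

@cuda.jit
def double(arr):
    # One thread per element; guard against out-of-range threads.
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= 2

# Array allocated in GPU memory by CuPy.
a = cp.arange(8, dtype=cp.float32)

# Numba reads a.__cuda_array_interface__ and operates on the same
# GPU buffer in place; the data never leaves the device.
threads_per_block = 32
blocks = (a.size + threads_per_block - 1) // threads_per_block
double[blocks, threads_per_block](a)

print(a)  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```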
NVIDIA GPU-Accelerated, End-to-End Data Science
The NVIDIA RAPIDS™ suite of open-source software libraries, built on CUDA, provides the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
With the RAPIDS GPU DataFrame, data can be loaded onto GPUs using a Pandas-like interface and then used in various connected machine learning and graph analytics algorithms without ever leaving the GPU. This level of interoperability is made possible through libraries like Apache Arrow. You can create a GPU dataframe from NumPy arrays, Pandas DataFrames, and PyArrow tables with a single line of code. Other projects can exchange data using the CUDA array interface. This allows acceleration of end-to-end pipelines, from data prep to machine learning to deep learning.
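For example, the sketch below builds a GPU dataframe with cuDF, the RAPIDS DataFrame library. It assumes a RAPIDS installation; the column names and values are purely illustrative.

```python
import pandas as pd
import cudf

pdf = pd.DataFrame({"key": ["a", "b", "a"], "value": [1.0, 2.0, 3.0]})

# One line to move the data onto the GPU...
gdf = cudf.DataFrame.from_pandas(pdf)

# ...after which Pandas-like operations execute on the GPU.
print(gdf.groupby("key")["value"].mean())
```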
RAPIDS supports device memory sharing between many popular data science libraries. This keeps data on the GPU and avoids costly copying back and forth to host memory.
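A minimal sketch of that sharing, assuming cuDF and CuPy are installed: a cuDF column is viewed as a CuPy array through the CUDA array interface, with no round trip through host memory.

```python
import cudf
import cupy as cp

gdf = cudf.DataFrame({"x": [1.0, 4.0, 9.0]})

# cp.asarray consumes the column's __cuda_array_interface__, so the
# CuPy array refers to the same device buffer rather than a host copy.
x = cp.asarray(gdf["x"])
print(cp.sqrt(x))  # [1. 2. 3.]
```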