The RAPIDS™ suite of open-source software libraries, built on CUDA, lets you execute end-to-end data science and analytics pipelines entirely on GPUs while still using familiar Python interfaces, such as pandas- and scikit-learn-like APIs.
RAPIDS’s cuML machine learning algorithms and mathematical primitives follow a familiar scikit-learn-style API. Popular algorithms such as K-means and XGBoost, among many others, are supported for both single-GPU use and large data center deployments. On large datasets, these GPU-based implementations can complete 10-50X faster than their CPU equivalents.
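As a minimal sketch of what that scikit-learn-style API looks like on a single GPU (the synthetic data and parameter choices below are illustrative, not taken from the RAPIDS docs):

```python
# Single-GPU K-means with cuML, using a scikit-learn-style API.
# Illustrative sketch: the synthetic data and parameters are arbitrary.
import cupy as cp
from cuml.cluster import KMeans

# Generate 100,000 random 2-D points directly in GPU memory.
X = cp.random.random((100_000, 2), dtype=cp.float32)

# fit/predict mirror scikit-learn's KMeans.
kmeans = KMeans(n_clusters=8, random_state=0)
kmeans.fit(X)
labels = kmeans.predict(X)

print(kmeans.cluster_centers_)
```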
cuML-MG combines the ease of use of the Dask framework with the highly optimized OpenUCX and NCCL libraries for performance-critical data transfers. k-Means clustering is the first algorithm built on this library. It provides a familiar, scikit-learn-style API, but scales nearly linearly across GPUs and nodes using Ethernet or InfiniBand. Across a pair of DGX-1 servers, k-Means-MG can cut the run time for a large clustering problem from 630 seconds on CPU to 7.1 seconds on GPU.
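A hedged sketch of how the multi-GPU path looks from Python, assuming a dask_cuda cluster and a dask_cudf input (the file path, cluster setup, and parameters are placeholders, and deployment details vary):

```python
# Multi-GPU K-means via Dask + cuML. Sketch only: the CSV path and
# parameters are placeholders, and cluster setup varies by deployment.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf
from cuml.dask.cluster import KMeans

# One Dask worker per local GPU; cuML's comms layer handles the
# performance-critical transfers between workers.
cluster = LocalCUDACluster()
client = Client(cluster)

# Read the data as a distributed GPU DataFrame.
ddf = dask_cudf.read_csv("features.csv")  # hypothetical file

# Same scikit-learn-style API, now spanning all workers' GPUs.
kmeans = KMeans(n_clusters=8)
kmeans.fit(ddf)
labels = kmeans.predict(ddf)
```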
With the RAPIDS GPU DataFrame, data can be loaded onto GPUs using a pandas-like interface, then fed into connected machine learning and graph analytics algorithms without ever leaving the GPU. This level of interoperability is made possible through libraries like Apache Arrow, and it allows acceleration of end-to-end pipelines, from data prep to machine learning to deep learning.
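For instance, here is a sketch of an end-to-end flow that stays on the GPU, assuming a hypothetical CSV input (the file name, column names, and the PCA step are illustrative):

```python
# Load with a pandas-like interface, transform, and feed the result to
# cuML without the data leaving GPU memory. Column names are hypothetical.
import cudf
from cuml.decomposition import PCA

df = cudf.read_csv("trips.csv")                 # hypothetical file
df["duration_min"] = df["duration_sec"] / 60.0  # pandas-style feature prep

features = df[["distance_km", "duration_min", "fare"]].astype("float32")

# The cuDF DataFrame is passed directly to the estimator; no host copy.
pca = PCA(n_components=2)
components = pca.fit_transform(features)
```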
RAPIDS supports device-memory sharing among many popular data science libraries. This keeps data on the GPU and avoids costly copying back and forth to host memory.
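A small sketch of that zero-copy hand-off, assuming cuDF and CuPy are installed (the data is arbitrary); the sharing here relies on the __cuda_array_interface__ protocol rather than a round-trip through host memory:

```python
# Share a cuDF column with CuPy without copying through host memory.
# Illustrative sketch; the values are arbitrary.
import cudf
import cupy as cp

s = cudf.Series([1.0, 2.0, 3.0, 4.0])

# CuPy consumes the column via __cuda_array_interface__, so the buffer
# stays in device memory.
arr = cp.asarray(s)
result = cp.sqrt(arr)  # CuPy kernel runs directly on GPU data

# Wrap the CuPy result back into a cuDF Series, still on the GPU.
out = cudf.Series(result)
print(out)
```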