Architecturally, the CPU is composed of just a few cores with lots of cache memory that can handle a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.
The NVIDIA RAPIDS™ suite of open-source software libraries, built on CUDA-X AI, provides the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
With the RAPIDS GPU DataFrame (provided by the cuDF library), data can be loaded onto GPUs using a pandas-like interface and then used by various connected machine learning and graph analytics algorithms without ever leaving the GPU. This level of interoperability is made possible through libraries like Apache Arrow and allows acceleration of end-to-end pipelines, from data prep to machine learning to deep learning.
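As a minimal sketch of that pandas-like workflow, the example below reads a CSV straight into GPU memory and runs a groupby aggregation there; the file name and column names ("sales.csv", "region", "revenue") are placeholders, not part of any RAPIDS dataset.

```python
# Minimal cuDF sketch: load data onto the GPU and aggregate it there.
# File and column names are hypothetical, for illustration only.
import cudf

# Read the CSV directly into GPU memory.
gdf = cudf.read_csv("sales.csv")

# Familiar pandas-style operations execute on the GPU.
summary = gdf.groupby("region")["revenue"].mean()
print(summary)
```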
RAPIDS machine learning algorithms and mathematical primitives follow a familiar scikit-learn-like API. Popular algorithms such as XGBoost gradient boosting and random forests, among many others, are supported for both single-GPU workstations and large data center deployments. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents.
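To illustrate the scikit-learn-like API, the sketch below trains a cuML random forest on synthetic data kept entirely on the GPU; the dataset shape and hyperparameter values are arbitrary choices for the example, not recommendations.

```python
# Sketch of cuML's scikit-learn-like API on synthetic GPU data.
import cupy as cp
from cuml.ensemble import RandomForestClassifier

# Synthetic data generated directly on the GPU; cuML prefers float32
# features and int32 labels.
X = cp.random.rand(10000, 20).astype(cp.float32)
y = (cp.random.rand(10000) > 0.5).astype(cp.int32)

clf = RandomForestClassifier(n_estimators=100, max_depth=16)
clf.fit(X, y)
preds = clf.predict(X)
```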
cuML now has support for multi-node, multi-GPU random forest building. With multi-GPU support, a single NVIDIA DGX-1 can train forests 56x faster than a dual-CPU, 40-core node.
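A hedged sketch of the multi-GPU path is shown below. It uses Dask to spread the forest across the local GPUs of a single machine; the cluster setup, partition count, and synthetic data are illustrative assumptions, and a real multi-node deployment would point the Dask client at a distributed scheduler instead.

```python
# Sketch of multi-GPU random forest training with cuML's Dask API.
# Assumes one machine with multiple local GPUs; data is synthetic.
import numpy as np
import cudf
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cuml.dask.ensemble import RandomForestClassifier

cluster = LocalCUDACluster()   # one Dask worker per local GPU
client = Client(cluster)

# Build a small synthetic cuDF frame and partition it across the GPUs.
df = cudf.DataFrame({
    "f0": np.random.rand(100_000).astype("float32"),
    "f1": np.random.rand(100_000).astype("float32"),
    "label": np.random.randint(0, 2, 100_000).astype("int32"),
})
ddf = dask_cudf.from_cudf(df, npartitions=4)

X = ddf[["f0", "f1"]]
y = ddf["label"]

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)
```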
The NVIDIA RAPIDS team works closely with the DMLC XGBoost organization, and XGBoost now includes seamless, drop-in GPU acceleration that significantly speeds up model training and improves accuracy. Tests of an XGBoost script running on a system with an NVIDIA P100 accelerator and 32 Intel Xeon E5-2698 CPU cores showed more than a four-fold speed improvement over the same test run on a non-GPU system, with the same output quality. This is particularly important because data scientists typically run XGBoost many times in order to tune parameters and find the best accuracy.
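Switching XGBoost training onto the GPU is typically a one-parameter change, selecting the GPU histogram tree method. The sketch below shows the idea on synthetic data; the shapes and parameter values are illustrative only and are not the benchmark configuration described above.

```python
# Sketch of GPU-accelerated XGBoost training via the gpu_hist tree method.
# Synthetic data and untuned parameters, for illustration only.
import numpy as np
import xgboost as xgb

X = np.random.rand(100_000, 50)
y = np.random.randint(0, 2, 100_000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "gpu_hist",      # run histogram building on the GPU
    "objective": "binary:logistic",
    "max_depth": 8,
}
model = xgb.train(params, dtrain, num_boost_round=100)
```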
To really scale data science on GPUs, applications need to be accelerated end-to-end. cuML now brings the next evolution of support for tree-based models on GPUs, including the new Forest Inference Library (FIL). FIL is a lightweight, GPU-accelerated engine that performs inference on tree-based models, including gradient-boosted decision trees and random forests. With a single V100 GPU and two lines of Python code, users can load a saved XGBoost or LightGBM model and perform inference on new data up to 36x faster than on a dual 20-core CPU node. Building on the open-source Treelite package, the next version of FIL will add support for scikit-learn and cuML random forest models as well.
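A minimal sketch of those "two lines" of FIL inference might look like the following; the model file name and the synthetic test data are placeholders, and the example assumes a previously saved XGBoost binary model.

```python
# Sketch of Forest Inference Library (FIL) usage: load a saved XGBoost
# model and run GPU inference. Model path and test data are placeholders.
import cupy as cp
from cuml import ForestInference

# Load the saved XGBoost model into a GPU inference engine.
fil_model = ForestInference.load("xgb_model.model",
                                 model_type="xgboost",
                                 output_class=True)

X_test = cp.random.rand(10_000, 50).astype(cp.float32)
preds = fil_model.predict(X_test)
```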