Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference
, AI Developer Technology Engineer, NVIDIA
Merlin HugeCTR is a recommender-system-specific framework that accelerates the training and deployment of complex deep learning models on GPUs at scale. Since its public release in early 2020, we've added many performance and usability enhancements. We'll introduce some of them, including the inference architecture, the Sparse Operation Kit (SOK), the training embedding cache, unified embedding, and the MLPerf optimizations. We'd like to especially highlight two features that have greatly extended HugeCTR's applicability:
• The HugeCTR inference architecture, which dramatically accelerates inference through a well-designed parameter server and GPU embedding cache built on top of NVIDIA Triton Inference Server.
• The Sparse Operation Kit, which makes HugeCTR's optimized embeddings available in common deep learning frameworks such as TensorFlow, including Horovod-based distributed training (see the sketch below).
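To give a flavor of the SOK integration mentioned above, here is a minimal TensorFlow sketch of sharding an embedding table with SOK and looking up embeddings for sparse feature IDs under Horovod. The names and signatures used here (`sok.init`, `sok.DynamicVariable`, `sok.lookup_sparse`) reflect our reading of recent SOK releases and may differ between versions; treat this as an illustrative sketch, not reference code.

```python
# Illustrative sketch only: SOK API names and signatures are assumed from
# recent releases and may differ in your installed version.
import tensorflow as tf
import horovod.tensorflow as hvd
import sparse_operation_kit as sok

# One GPU per process, coordinated by Horovod.
hvd.init()
gpus = tf.config.list_physical_devices("GPU")
tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Initialize SOK so its embedding storage is sharded across the GPUs.
sok.init()

# A dynamically growing embedding table with 16-dimensional vectors.
table = sok.DynamicVariable(dimension=16)

# Sparse feature IDs for one batch; each sample may have a different
# number of IDs, so a ragged tensor is used.
ids = tf.ragged.constant([[0, 1, 2], [3, 4]], dtype=tf.int64)

# Look up and pool ("sum") the embeddings for each sample; the
# inter-GPU exchange is handled inside SOK.
outputs = sok.lookup_sparse([table], [ids], combiners=["sum"])
print(outputs)
```

In a full training loop, the pooled embedding vectors would feed the dense part of the model, with gradients flowing back into the SOK table just as with an ordinary TensorFlow variable.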