Simplify and Scale Model Serving with NVIDIA Triton Inference Server on Google Cloud Vertex AI Prediction (Presented by Google)
, Solutions Architecture, NVIDIA
, Solutions Architect, Machine Learning, Google Cloud
, Software Engineer, Vertex AI Prediction, Google Cloud
NVIDIA Triton Inference Server (Triton) is open-source inference serving software that maximizes performance and simplifies model deployment at scale. Triton supports multiple frameworks (TensorRT, TensorFlow, ONNX, PyTorch, and more), along with custom CUDA and Python backends, on GPU- and CPU-based infrastructure in cloud, data center, and edge environments. Google Cloud and NVIDIA collaborated to add Triton as a backend on Vertex AI Prediction, Google Cloud's fully managed model serving platform.
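As a rough sketch of what serving a model with a Triton container on Vertex AI Prediction might look like, the snippet below uses the `google-cloud-aiplatform` SDK to upload a model backed by a Triton serving image and deploy it to an endpoint. All names here (project, bucket, image URI, machine and accelerator types) are illustrative placeholders, not values from this session; consult the Vertex AI and Triton documentation for the exact container configuration.

```python
# Hypothetical sketch: deploying a Triton-served model to Vertex AI Prediction.
# Every identifier below (project, bucket, image URI) is a placeholder.

def triton_upload_kwargs(display_name, artifact_uri, triton_image):
    """Build keyword arguments for aiplatform.Model.upload() when the
    serving container is NVIDIA Triton. Triton's HTTP endpoint listens
    on port 8000 by default."""
    return {
        "display_name": display_name,
        # GCS path to a Triton model repository (models + config.pbtxt files)
        "artifact_uri": artifact_uri,
        "serving_container_image_uri": triton_image,
        "serving_container_ports": [8000],  # Triton's default HTTP port
    }

if __name__ == "__main__":
    from google.cloud import aiplatform  # pip install google-cloud-aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    model = aiplatform.Model.upload(
        **triton_upload_kwargs(
            display_name="triton-demo-model",
            artifact_uri="gs://my-bucket/triton-model-repository",
            triton_image="us-docker.pkg.dev/my-project/triton/tritonserver:latest",
        )
    )

    # Deploy to a GPU-backed endpoint; Triton also runs on CPU-only machines.
    endpoint = model.deploy(
        machine_type="n1-standard-4",
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
    )
```

Once deployed, the endpoint accepts prediction requests through the standard Vertex AI Prediction interface while Triton handles batching and multi-framework execution inside the container.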