Taking AI Models to Production: Accelerated Inference with Triton Inference Server
Product Marketing Manager, NVIDIA
Machine learning and deep learning inference is expected to grow faster than training. But the complexity that teams must manage to deploy, run, and scale models in production is enormous: multiple frameworks, evolving model architectures, high query volumes, diverse computing platforms, and AI that spans from the cloud to the edge. There's a need to standardize and streamline inference without sacrificing model performance. We'll look at recent additions to the open-source inference serving software Triton Inference Server in three broad categories: support for new frameworks and workloads, inference workflow optimization tools, and scaling. We'll also cover integrations with other deployment platforms and tools, and look at how customers are approaching this complexity with Triton and achieving the key performance indicators they've set for their businesses.
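For context, sending an inference request to a model served by Triton typically looks like the following. This is a minimal sketch using the Python tritonclient package; the model name ("resnet50") and the tensor names ("input", "output") are placeholders that depend on how the model is configured in the server's model repository.

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a locally running Triton server (default HTTP port is 8000).
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Build the request; "resnet50", "input", and "output" are placeholder
    # names that must match the model's config.pbtxt in the model repository.
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("input", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)

    response = client.infer("resnet50", inputs=[infer_input])
    print(response.as_numpy("output").shape)

The same client call works regardless of which framework backend (TensorFlow, PyTorch, ONNX Runtime, TensorRT, and others) runs the model, which is part of how Triton standardizes serving across frameworks.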