Challenges and Best Practices for Inference in Production (Presented by Run:AI)
Adam Tetelman, Product Architect, NVIDIA
Ronen Dar, CTO, Run:AI
AI and machine learning operations (MLOps) engineers often struggle to deploy models on GPUs. AI models bring requirements that aren't always aligned with traditional best practices for application deployment, and orchestrating GPUs in production introduces further challenges that call for new tooling and workflows. Ronen Dar, CTO and co-founder of Run:AI, together with Adam Tetelman, senior product architect at NVIDIA, will detail some of the challenges of orchestrating inference workloads on GPUs, and how ML teams overcome these hurdles with tools like NVIDIA Triton Inference Server to build a best-in-class, optimized, and scalable model-serving platform.
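For context on what serving a model through Triton looks like in practice, here is a minimal client-side sketch using the official tritonclient Python package. It assumes a Triton server already running at localhost:8000 with a hypothetical model named "resnet50" whose input and output tensors are named "input" and "output"; all of these names and shapes are illustrative, not taken from the session itself.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton Inference Server (assumed at localhost:8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Prepare a dummy batch-of-one image tensor matching the assumed model signature.
infer_input = httpclient.InferInput("input", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(
    np.random.rand(1, 3, 224, 224).astype(np.float32)
)

# Request the assumed output tensor and run inference against the model.
requested_output = httpclient.InferRequestedOutput("output")
result = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[requested_output],
)

# Retrieve the result as a NumPy array.
print(result.as_numpy("output").shape)
```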