NVIDIA Riva is a GPU-accelerated conversational AI framework that includes automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) capabilities for building expressive conversational AI agents. We'll share best practices for deploying Riva for conversational AI and for using Kubernetes to autoscale the number of Riva servers based on inference requests from clients. The approach applies to conversational AI in production in the cloud, such as on AWS, as well as on-premises.
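
To make the autoscaling idea concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler. The text does not say which scaling mechanism is used, so this is an assumption. The function name, the per-server request metric, and the replica bounds are hypothetical values chosen for illustration, not Riva defaults.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=8, tolerance=0.1):
    """Sketch of the HPA-style rule:
    desired = ceil(current_replicas * current_metric / target_metric).

    current_metric and target_metric are assumed to be average in-flight
    inference requests per server (a hypothetical metric for illustration).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Skip scaling when the ratio is close to 1 (the HPA default tolerance is 0.1).
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured bounds (hypothetical values here).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two servers, each seeing 30 requests against a target of 10 per server.
    print(desired_replicas(2, 30, 10))
```

With two servers each handling three times the target load, the rule asks for six servers. When the load falls to half the target, it scales four servers down to two.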