Speech-to-text is often regarded as a “solved problem,” but out-of-the-box solutions are rarely useful in real life without meaningful customization. Contact centers are a prime example of an enterprise setting where the topics and vocabulary differ substantially from general English. When tasked with creating the first voice-powered AI experiences for T-Mobile’s contact center, we identified three large problems: (1) How do we build and deploy the automatic speech recognition (ASR) models? (2) How do we get audio from cell phones to our models in the cloud? and (3) How do we make sure our ASR models work equitably across all the varieties of speech that our experts and customers use? We'll walk through our model development with NVIDIA NeMo, cloud deployment with NVIDIA Riva, our efforts to identify and remove bias in our models, and the future of speech-to-text at T-Mobile.
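
For readers unfamiliar with NeMo, the sketch below shows what running a pretrained NeMo ASR checkpoint on a single recording looks like; the checkpoint name and audio path are illustrative assumptions, not our production configuration.

```python
# Minimal sketch: transcribe one recording with a pretrained NeMo ASR model.
# The checkpoint name and the audio path are placeholders for illustration.
import nemo.collections.asr as nemo_asr

# Load a pretrained English ASR checkpoint from NGC (example model name).
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_conformer_ctc_large"
)

# Transcribe a local 16 kHz mono WAV file (hypothetical path).
transcripts = asr_model.transcribe(["sample_call.wav"])
print(transcripts[0])
```

In practice, a checkpoint like this is the starting point for fine-tuning on domain-specific audio and vocabulary before being packaged for serving with Riva.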