Microsoft Teams enables highly accurate live meeting captioning and transcription services in 28 languages.
Microsoft Teams
Microsoft Azure
Real-time multi-language meeting captioning and transcription
Microsoft Azure Cognitive Services, NVIDIA GPUs on Azure, NVIDIA Triton Inference Server
Microsoft Teams is a collaboration app with nearly 250 million monthly active users. To better accommodate non-native speakers and meeting attendees who are deaf or hard of hearing, Microsoft relies on AI-generated live captions and real-time transcription.
For optimal live captioning and transcription in multiple languages, the Microsoft Teams app uses Microsoft Azure Cognitive Services and NVIDIA Triton™ Inference Server. This combination lets Teams leverage advanced language models that recognize jargon, names, and other meeting context to deliver highly accurate, personalized speech-to-text results in real time, with very low latency.
Triton Inference Server, running within Azure Cognitive Services, seamlessly enables live transcription and captioning with state-of-the-art speech models in 28 languages. Triton Inference Server delivers low-latency, real-time inference of the speech recognition models and ensures that models use GPUs to their full potential. This reduces cost to customers by delivering higher throughput with fewer computational resources.
Results
Microsoft Teams is a collaboration app built for hybrid work that enables teams to stay informed, organized, and connected, all in one place. Customers use Teams to communicate, collaborate, and co-author content across work, life, and learning, every day.
“AI models like these are incredibly complex, requiring tens of millions of neural network parameters to deliver accurate results across dozens of different languages. But the bigger a model is, the harder it is to run cost-effectively in real time.”
Principal PM Manager for Teams Calling, Meetings, and Devices
Microsoft