Accurate Transcriptions Enhance “Work from Anywhere” Collaboration
With hundreds of millions of online meetings held daily, video conferencing has become an essential tool for enterprises. Video conferencing applications use real-time transcription to offer features such as live captioning and meeting summaries. RingCentral, a leading provider of unified-communications-as-a-service (UCaaS) solutions, transcribes over a billion minutes of meetings for 200,000 concurrent users on its platform. The company was looking for a transcription solution that could handle multiple accents, domain-specific jargon, and noisy environments accurately and in real time.
NVIDIA Solution
RingCentral fine-tuned NVIDIA’s state-of-the-art, pretrained speech recognition models on proprietary custom data with NVIDIA NeMo—an open-source framework for building conversational AI models. The models were deployed in production using NVIDIA Riva—a GPU-accelerated SDK for deploying world-class AI-based speech applications.
RingCentral Results
Word error rate reduced by more than 10 percent
Better quality of tasks downstream of transcription
With NVIDIA speech AI, the RingCentral team achieved high accuracy for customers with a wide range of accents and domain-specific vocabularies, reducing the word error rate (WER) by more than 10 percent. Customers have reported marked improvements in the quality of tasks downstream of transcription, such as meeting summarization and sentiment analysis for video conferencing and call center sessions.
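For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into its reference, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; it is not RingCentral's or NVIDIA's evaluation code. The function names are hypothetical, and because the source does not say whether the 10 percent figure is a relative or an absolute reduction, the relative form shown here is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, tuned_wer: float) -> float:
    """Fraction by which WER dropped relative to the baseline (assumed form)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - tuned_wer) / baseline_wer
```

Under this definition, a transcript that misses one word and gets another wrong in a four-word reference has a WER of 0.5, and moving from a WER of 0.20 to 0.18 would count as a 10 percent relative reduction.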
“Using NVIDIA® Riva speech-to-text, we’re able to transcribe meeting audio in real time with high accuracy while concurrently running thousands of streams, which translates to more engaging meeting experiences for millions of RingCentral users.”