Speaking in Every Language: A Quick-Start Guide to TTS Models for Accented, Multilingual Communication
Senior Data Scientist, NVIDIA
Research Scientist, NVIDIA
Recent text-to-speech (TTS) models have demonstrated an ability to make speakers speak other languages from a single sample. This impressive ability, however, comes with several costs that limit the usability and deployment of such models. First, they're extremely slow, limiting their use to offline scenarios. Second, they don't let users control or retain their accents, and can end up replacing a speaker's accent entirely. Last, they offer no control over speech attributes such as speaking rate and pitch range.
We'll help you set up RAD-MMM: an extremely fast TTS model that lets users speak other languages while retaining or controlling their accent, with fine-grained control over speech attributes. Specifically, we'll show how we pre-processed the audio dataset and trained RAD-MMM (and its variants). The session includes links to the code and Jupyter notebooks so you can experiment with the examples.
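The exact pre-processing steps are covered in the linked code and notebooks. As a rough orientation, the sketch below shows the kind of audio preparation typically done before TTS training: converting clips to mono, resampling them to a single sample rate, and peak-normalizing them. The 22,050 Hz target rate and the directory layout are assumptions for illustration, not RAD-MMM's documented pipeline.

```python
# Minimal sketch of typical TTS audio pre-processing (assumed, not RAD-MMM's
# official pipeline): mono mixdown, resampling to one rate, peak normalization.
from pathlib import Path

import torchaudio

TARGET_SR = 22_050              # assumed target sample rate
IN_DIR = Path("raw_audio")      # assumed input directory of .wav files
OUT_DIR = Path("processed_audio")
OUT_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(IN_DIR.glob("*.wav")):
    waveform, sr = torchaudio.load(wav_path)       # (channels, samples)
    if waveform.size(0) > 1:                       # mix down to mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != TARGET_SR:                            # resample if needed
        waveform = torchaudio.transforms.Resample(sr, TARGET_SR)(waveform)
    peak = waveform.abs().max()
    if peak > 0:                                   # peak-normalize to ~0.95
        waveform = 0.95 * waveform / peak
    torchaudio.save(str(OUT_DIR / wav_path.name), waveform, TARGET_SR)
```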
Date:
Topic:
Industry:
Level:
NVIDIA technology: Cloud / Data Center GPU, DGX, NeMo