Building a Text-to-Speech Service that Sounds like You, Using NVIDIA NGC and NVIDIA A100 GPUs
, NVIDIA
, NVIDIA
Conversational AI is a complex, multi-modal workflow built from automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) modules, each of which uses AI models to power solutions such as smart assistants and chatbots. These models transcribe an audio query and search the web for the most likely answer to it. Finally, the answer is delivered to the user through text-to-speech synthesis. In this session, we will build a TTS model for style transfer and expressive speech using pre-trained models developed by NVIDIA Research and NVIDIA AI software from NVIDIA NGC. Leveraging the power of NVIDIA A100 GPUs, we will fine-tune the pre-trained model with speech samples, customize the variability in speech, and perform style transfer from other speakers. After this session, you will be able to retain the model you create with your voice and style and make the TTS service sound just like you!
Join us for this session as we walk through each step to build this TTS model for style transfer and expressive speech.