TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis
NVIDIA
We propose TalkNet, a non-autoregressive convolutional neural model for speech synthesis that consists of two feed-forward convolutional networks. The model is compact, with 10.8M parameters, almost 3x fewer than current state-of-the-art text-to-speech models. The non-autoregressive architecture allows for fast training and inference: 328x faster than real time.
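The title credits the model's compactness to depth-wise separable convolutions. As a rough illustration, the sketch below counts the weights of a standard 1D convolution and of a depth-wise separable one, which is a per-channel k-tap filter followed by a 1x1 point-wise convolution that mixes channels. This is not TalkNet's code. The helper names, the channel count of 256, and the kernel width of 5 are hypothetical values chosen for illustration, not the paper's configuration.

```python
def standard_conv1d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Weights of a standard 1D convolution: every output channel sees every input channel."""
    return c_in * c_out * k + (c_out if bias else 0)


def separable_conv1d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Depth-wise conv (one k-tap filter per input channel) plus a 1x1 point-wise conv."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, used only to compare the two parameter counts.
    channels, kernel = 256, 5
    std = standard_conv1d_params(channels, channels, kernel)
    sep = separable_conv1d_params(channels, channels, kernel)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For this hypothetical layer, the separable version needs about 4.9x fewer weights. The saving grows with kernel width because the k-tap filtering no longer scales with `c_in * c_out`. This per-layer ratio is not meant to reproduce the model-level comparison quoted in the abstract.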