Transforming Intelligent Call Center Operations in Consumer Finance
The capabilities of AI have attracted a plethora of banking and financial institutions in Vietnam, particularly Home Credit Vietnam. As one of Vietnam's leading digital finance companies, Home Credit has always prioritized customer experience. Recognizing the importance of process automation and operational efficiency, Home Credit partnered with FPT Smart Cloud to deploy the FPT AI Engage solution in 2019, when AI was still a relatively new concept in Vietnam, demonstrating the company's strategic vision.
After its first year of operation, in 2020, FPT.AI Virtual Agent for Call Center was helping Home Credit Vietnam make more than 5,000,000 calls per month. Powered by NVIDIA, this was scaled up to 12,000,000 calls during peak hours, cutting operating costs by 50 percent and achieving a 98 percent call success rate. In addition, Home Credit Vietnam's “virtual agent” earned an average customer satisfaction rating of 4.5/5.
The deployment is optimized with NVIDIA® TensorRT™ and served on NVIDIA Triton™ Inference Server with dynamic batching, which uses up to 20 percent fewer high-performance computing resources for the same quality of model output.
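To make the idea of dynamic batching concrete, here is a minimal conceptual sketch in plain Python. It is not Triton's API or configuration: the function `form_batches`, its parameters `max_batch_size` and `max_delay_ms`, and the arrival times are all hypothetical and chosen only for illustration. The sketch groups requests that arrive close together so the model runs once per batch instead of once per request.

```python
from typing import List


def form_batches(
    arrival_ms: List[float], max_batch_size: int, max_delay_ms: float
) -> List[List[int]]:
    """Group request indices into batches (offline simulation).

    A batch is closed when it already holds max_batch_size requests, or
    when the next request arrives more than max_delay_ms after the first
    request in the batch. arrival_ms must be sorted in ascending order.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    window_start = 0.0
    for idx, t in enumerate(arrival_ms):
        batch_is_full = len(current) == max_batch_size
        window_expired = bool(current) and (t - window_start > max_delay_ms)
        if current and (batch_is_full or window_expired):
            batches.append(current)
            current = []
        if not current:
            window_start = t  # first request opens a new batching window
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Hypothetical arrival times (milliseconds) for nine inference requests.
arrivals = [0.0, 1.0, 2.0, 9.0, 10.0, 10.5, 11.0, 11.5, 30.0]
batches = form_batches(arrivals, max_batch_size=4, max_delay_ms=5.0)
print(batches)  # [[0, 1, 2], [3, 4, 5, 6], [7], [8]]
```

In this toy run, nine requests are served with four model executions instead of nine. A real inference server makes this decision online, with timers and scheduling policies, so the sketch should be read only as an illustration of why batching can reduce compute for the same output.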
So far, Home Credit Vietnam has put more than 100 use cases into applications, including information inquiries, self-service to lock or activate cards, automated customer surveys, and debt collection. As a result, these mundane tasks are offloaded to AI, and human agents have more time to handle critical customer issues.
One novel application of virtual assistants, a process called service to sales, can help convert a “cost center,” such as a customer service department, into a “profit center.” FPT AI Engage doubled Home Credit Vietnam’s service-to-sales volume.
These innovations have revealed the need for more human-like customer engagement in virtual assistants to boost the digital experience. This has turned the focus to the quality of AI-generated voices.
Synthetic voices are developed in tandem with virtual assistants, with a variety of tones, accents, and sentiments available. The type of voice deployed depends on the application. For instance, virtual assistants for general inquiries use friendlier, more informative tones, while AI voices for telesales call for more flexibility and emotion to better persuade prospects.
Advancing Speech Synthesis Models to Upgrade Conversational Quality
Recognizing the growing demand for more human-sounding, emotionally expressive virtual assistants, FPT Smart Cloud aims to develop speech synthesis models that can produce new voices from a few minutes of audio samples. The generated voices need to be high-quality, indistinguishable from a human voice, and able to communicate in multiple languages, even if the training inputs are in Vietnamese.
Given the nuances and emotional undertones of human language, the speech synthesis model often requires a vast amount of training data and long processing time to optimize accuracy and expressiveness.
Model training was previously run on the NVIDIA A100 Tensor Core GPU. A typical training run required three servers with a processing capacity of 100 hours of voice data per day and lasted 20 days. An upgrade to the NVIDIA H100 Tensor Core GPU was expected to handle more complicated model requirements and to reduce the processing time for 2,000 hours of audio data by at least 3X, to about 7 days.
With the H100, the whole process requires only one dedicated server. The speech synthesis model is ready in five days, a 4X efficiency improvement over the NVIDIA A100, because the H100 server processes about 400 hours of voice samples each day. The output model increased 100X in dimension and aptitude, generating a human-sounding voice that can seamlessly transition into 18 languages, including Vietnamese, English, and Indonesian.
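The training figures quoted above can be checked with a short calculation. The sketch below assumes that the 100 hours per day on the A100 is the total across all three servers and that both setups process the same 2,000 hours of audio; the function and variable names are illustrative only.

```python
def training_days(total_hours: float, hours_per_day: float) -> float:
    """Days needed to process total_hours of audio at a given daily rate."""
    return total_hours / hours_per_day


TOTAL_HOURS = 2000  # audio data, as quoted in the text

# Assumption: the daily rates are totals for the whole setup.
a100_days = training_days(TOTAL_HOURS, hours_per_day=100)  # three servers
h100_days = training_days(TOTAL_HOURS, hours_per_day=400)  # one server

speedup = a100_days / h100_days

# Server-days account for the number of machines tied up during training.
a100_server_days = 3 * a100_days
h100_server_days = 1 * h100_days

print(a100_days, h100_days, speedup)       # 20.0 5.0 4.0
print(a100_server_days, h100_server_days)  # 60.0 5.0
```

Under these assumptions, the wall-clock time drops from 20 days to 5 days (4X), and because the H100 setup uses one server rather than three, the hardware commitment falls from 60 server-days to 5.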
Compared with the A100, the H100 is a step up in AI voice modeling, transcending language barriers and facilitating global communication on an unprecedented scale. Taking the leap forward in AI empowerment, FPT Smart Cloud is also utilizing the NVIDIA H100 for large language models to generate more accurate and flexible responses for virtual assistants.