The new combination of computing infrastructure and software solutions enabled the IRS to quickly and easily implement AI and machine learning at scale. With Cloudera running on NVIDIA GPUs, workloads immediately ran up to 5X faster with no code changes. But there was still room for improvement.
Cloudera called on a team of NVIDIA data scientists to examine the IRS code. They determined that a few tasks with particularly complex data structures were still running on CPUs. NVIDIA wrote new code to handle those jobs and inserted it into Spark’s software interface for NVIDIA RAPIDS™, the open library for running data analytics on GPUs.
When the IRS team ran the new code on GPUs in a distributed Spark cluster, they saw a remarkable 20X speedup.
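For readers wondering what "no code changes" can look like in practice, here is a minimal sketch of how the RAPIDS Accelerator for Apache Spark is typically switched on through job configuration rather than application code. It is not the IRS or Cloudera setup. The helper `spark_submit_args`, the jar path, and the script name `job.py` are hypothetical placeholders; the three configuration keys are settings documented for the accelerator, and `spark.rapids.sql.explain=NOT_ON_GPU` asks it to log the operations that stay on the CPU, which is the kind of information needed to find tasks still running there.

```python
# Sketch only: builds a spark-submit command line that enables the
# RAPIDS Accelerator through configuration. The jar path and script
# name used below are hypothetical placeholders.

RAPIDS_CONF = {
    # Load the RAPIDS Accelerator plugin into Spark.
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # Allow supported SQL/DataFrame operations to run on the GPU.
    "spark.rapids.sql.enabled": "true",
    # Log the operations that could not be placed on the GPU.
    "spark.rapids.sql.explain": "NOT_ON_GPU",
}


def spark_submit_args(app_path, jar_path, conf=None):
    """Return spark-submit arguments for an unmodified Spark application."""
    conf = RAPIDS_CONF if conf is None else conf
    args = ["spark-submit", "--jars", jar_path]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)  # the application script itself is unchanged
    return args


if __name__ == "__main__":
    print(" ".join(spark_submit_args("job.py", "/path/to/rapids-4-spark.jar")))
```

The application script is passed through untouched; only the launch configuration differs between a CPU run and a GPU run.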
Using Apache Spark and graph analysis, engineering teams built immense graphs of nodes and edges. With AI bots and machine learning algorithms analyzing those graphs, investigators could connect individuals to institutions and, in turn, to larger entities spanning years and decades. These insights helped to quickly expose patterns that indicated fraud.
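The idea of linking individuals to institutions and then to larger entities can be pictured as a graph traversal. The sketch below is a toy, single-machine illustration in plain Python, not the Spark-based pipeline the IRS built. The function names and every node label in the usage note are invented for illustration.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_entities(graph, start):
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}
```

With invented edges such as `("person:A", "bank:X")`, `("bank:X", "holding:Z")`, and `("person:B", "bank:Y")`, calling `connected_entities(build_graph(edges), "person:A")` returns the bank and the holding entity but not the unrelated person or bank. At the scale described here, the same question would be answered by a distributed graph engine rather than an in-memory search.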
The same datasets that used to take weeks or months to stitch together and process now take only hours or minutes. Testing revealed a 10X improvement in engineering and data science workflows with a 50 percent reduction in infrastructure costs.