Internal Revenue Service
Cloudera
Data Science
NVIDIA AI Enterprise
NVIDIA RAPIDS
As in every other industry, data throughput requirements in government have grown exponentially. Compounding the challenge of managing expanding data needs, government agencies must carry out their work while efficiently rooting out waste, fraud, and abuse to ensure the ethical use of taxpayer dollars.
The Government Accountability Office (GAO) recently identified 36 operations that need to be transformed to keep up with data management requirements, including high-risk areas that affect the nation’s commerce, economy, and security.
Without adequate IT infrastructure, government agencies have struggled to efficiently explore and parse large bodies of data, making frequent human intervention necessary. This makes it difficult for agencies to effectively execute the data-driven operations necessary to maintain public trust.
To overcome these challenges, the IRS is leveraging AI tools, machine learning, and fraud detection applications accelerated by NVIDIA infrastructure.
To combat tax fraud and uncover bad actors, IRS investigators must analyze decades’ worth of data, link individuals to suspicious transactions, and trace transactions through multiple steps and multiple hops on a graph.
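Tracing a transaction through multiple hops amounts to a bounded graph traversal. The following is a minimal single-machine sketch of that idea using breadth-first search. The function name `trace_hops`, the account labels, and the sample transfers are hypothetical and chosen only for illustration; this is not the IRS's code, and a production system would run this kind of traversal on a distributed engine.

```python
from collections import deque


def trace_hops(edges, source, max_hops):
    """Return {account: hop_count} for accounts reachable from `source`
    within `max_hops` directed transfers (breadth-first search)."""
    # Build an adjacency list: sender -> list of receivers.
    outgoing = {}
    for sender, receiver in edges:
        outgoing.setdefault(sender, []).append(receiver)

    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        # Do not expand accounts that already sit at the hop limit.
        if hops[current] == max_hops:
            continue
        for nxt in outgoing.get(current, []):
            if nxt not in hops:  # first visit is the shortest path in BFS
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops


# Hypothetical transfers between made-up accounts.
transfers = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("X", "Y")]
reachable = trace_hops(transfers, "A", max_hops=2)
print(reachable)
```

With a limit of two hops, account `D` is left out because it sits three transfers away from `A`, and the unrelated pair `X` and `Y` is never visited.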
With this mission, one IRS data scientist was tasked with combing through a 3+ terabyte dataset and identifying patterns to expose fraud. Unfortunately, the available compute power was insufficient. The job ran all night on a large bank of CPUs and still failed to complete. The team attempted to break down the datasets, server by server, but was forced to manually stitch the data subsets together to make the solution work. Even with all of that careful manual effort, it wasn’t possible to achieve full visibility into real-time fraud detection.
To improve data-centric tasks like this, the IRS is implementing high-powered AI tools, machine learning, and applications capable of swiftly exposing fraud and identity theft.
The new combination of computing infrastructure and software solutions enabled the IRS to quickly and easily implement AI and machine learning at scale. With Cloudera running on NVIDIA GPUs, workloads immediately ran up to 5X faster with no code changes. But there was still room for improvement.
Cloudera called on a team of NVIDIA data scientists to examine the IRS code. They determined that a few tasks with particularly complex data structures were still running on CPUs. NVIDIA wrote new code to handle those jobs and inserted it into Spark’s software interface for NVIDIA RAPIDS™, the open library for running data analytics on GPUs.
When the IRS team ran the new code on GPUs in a distributed Spark cluster, they experienced a remarkable speedup of 20X.
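The "no code changes" speedup described above is consistent with how the RAPIDS Accelerator for Apache Spark is generally enabled: through Spark configuration rather than edits to the application. The sketch below assembles such a configuration as plain Python data. The `spark.plugins` and `spark.rapids.sql.enabled` settings and the GPU resource amounts are standard Spark and RAPIDS Accelerator keys, but the helper names, the values, and the `job.py` path are illustrative assumptions, not the IRS's or Cloudera's actual settings. The plugin jar must also be available on the cluster, which is not shown here.

```python
def rapids_spark_conf(gpus_per_executor=1, tasks_per_gpu=4):
    """Build Spark settings that turn on the RAPIDS Accelerator plugin.

    The defaults are illustrative assumptions, not tuned values.
    """
    return {
        # Load the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operations to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Standard Spark GPU scheduling: GPUs per executor, GPU share per task.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def spark_submit_args(conf, app_path):
    """Turn the settings into a spark-submit argument list."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


conf = rapids_spark_conf()
# "job.py" stands in for an existing, unmodified Spark application.
args = spark_submit_args(conf, "job.py")
print(" ".join(args))
```

Because the acceleration is switched on at submit time, the same application file can run with or without GPUs, which is what makes a before-and-after comparison straightforward.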
By developing workloads that use Apache Spark and graph analysis, engineering teams created immense graphs with nodes and edges. With AI bots and machine learning algorithms analyzing graphs, investigators were able to connect individuals to institutions and, subsequently, to larger entities spanning years and decades. These insights helped to quickly expose patterns that indicated fraud.
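Connecting individuals to institutions, and institutions to larger entities, can be viewed as grouping nodes that share any chain of links. As a rough single-machine illustration, the sketch below groups linked entities with a union-find structure. The function name `group_linked_entities` and the entity labels are hypothetical, and the technique is a stand-in chosen for brevity: the article says only that the teams used Apache Spark and graph analysis, not which algorithm they applied.

```python
def group_linked_entities(links):
    """Group entities connected by any chain of links (union-find)."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Sort members and groups so the output is deterministic.
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


# Hypothetical links between made-up individuals and institutions.
links = [
    ("person:1", "bank:A"),
    ("person:2", "bank:A"),
    ("bank:A", "holding:Z"),
    ("person:3", "bank:B"),
]
linked_groups = group_linked_entities(links)
print(linked_groups)
```

Here `person:1` and `person:2` end up in the same group as `holding:Z` even though neither is linked to it directly, which mirrors the individual-to-institution-to-larger-entity chain described above.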
The same datasets that used to take weeks or months to stitch together and process now take only hours or minutes. Testing revealed a 10X improvement in engineering and data science workflows with a 50 percent reduction in infrastructure costs.
With improved computing infrastructure and AI implementation, the IRS is cutting costs and better protecting taxpayers by preventing fraud and identity theft.
Building on their success in data preparation and data analytics, the IRS plans to accelerate AI inference jobs and use Spark-GPU infrastructure to tackle natural language processing and other analytics jobs.
Across government, there are innumerable opportunities to improve performance with AI and accelerated computing. Other government agencies that track transactions to mitigate waste, theft, and fraud can follow the IRS’s example and modernize infrastructure and software to attain a higher standard of operational efficiency and public service.
“The Cloudera and NVIDIA integration will empower us to use data-driven insights to power mission-critical use cases. We’re currently implementing this integration, and are already seeing over 20x speed improvements at half the cost for our data engineering and data science workflows.”
Joe Asaldi
Technical Branch Chief of Research and Applied Analytics and Statistics, IRS