Healthcare and Life Sciences

Boosting the Accuracy and Speed of Long-Read Sequencing

Objective

Increasing throughput and accuracy for next-generation instruments while keeping within necessary power, footprint, and cost constraints.

Customer

PacBio

Use Case

Edge Computing

Products

NVIDIA Parabricks
NVIDIA-Certified Systems

Improving Instrument Speed and Accuracy With NVIDIA Accelerated Computing

The sequencing of long DNA fragments, known as long-read sequencing, was featured as the method of the year in the January 2023 issue of Nature Methods, and PacBio was highlighted as a prominent leader in the space. Since their founding in 2004, PacBio has achieved a significant global footprint with their advanced sequencing systems deployed in over 40 countries, totaling more than 1,000 units sold. The firm's intellectual property portfolio includes over 400 issued U.S. patents, and their influence and relevance in the scientific community are underscored by over 9,000 citations across various publications.

PacBio builds advanced sequencing solutions to help scientists and clinical researchers resolve genetically complex problems across human germline sequencing, plant and animal sciences, infectious diseases, oncology, and other emerging applications. Their proprietary technology for long-read sequencing generates reads up to 20 kilobases in length, dramatically outpacing the typical read length of less than 300 bases produced by short-read sequencing methods. This enables more complete and accurate mapping of complex regions of the genome that may be overlooked by short-read sequencing, which aids in advancing research in various fields, including disease genetics and evolutionary biology.

Highlights

  • PacBio incorporated the NVIDIA A100 Tensor Core GPU into their Revio system to improve the speed and accuracy of long-read sequencing, while minimizing costs.
  • Compute power: Revio with NVIDIA A100 GPUs offers compute power 20X greater than PacBio’s Sequel IIe.
  • Deep learning: PacBio incorporated GPUs for base calling, increased throughput with circular consensus sequencing (CCS), and improved accuracy using the DeepConsensus model.
  • Ease-of-use: Revio offers a 50 percent reduction in consumables alongside a load-in-advance capability.
  • Affordability: Revio sequences a human HiFi genome for less than $1,000, loads instruments in under one minute, and decreases file size by over 50 percent.
  • High throughput: Revio can sequence 1,300 human whole genomes annually at 30X coverage.

Image courtesy of PacBio.
PacBio’s Revio long-read sequencing system.

PacBio’s Revio System: GPU-Accelerated Long-Read Sequencing

A cornerstone of PacBio's long-read technology is its high accuracy, quality, and coverage of genomes. This manifests within its high-fidelity (HiFi) long-read sequencing, a powerful tool used to investigate large genomic or transcriptomic features at a single DNA or RNA molecule level. An essential aspect of generating long-read data is the process of base calling, which is crucial for determining nucleotide sequences of complex, long DNA molecules. However, this requires substantial computational resources, given the need to generate a consensus sequence for each molecule—a process that’s then executed across millions of molecules. 

PacBio’s Sequel IIe long-read sequencer was designed with CPU-based computation. While functional, it reached a performance threshold that limited throughput and, therefore, its utility for commercial customers. To address this limitation, PacBio introduced the Revio System featuring NVIDIA A100 GPUs. This advancement allowed for a significant increase in computational power within the same device footprint. As a result of this transition to NVIDIA GPUs, coupled with NVIDIA® CUDA® for code optimization, PacBio was able to accelerate base calling, improving the overall throughput and efficiency of the sequencing process.

These technologies also significantly accelerated circular consensus sequencing (CCS) on the Revio System. The repeated sequencing of circularized DNA molecules to generate high-accuracy reads required substantial processing power and time, limiting the overall throughput and efficiency of the sequencer. With Revio using NVIDIA GPUs, PacBio was able to reduce the CCS process from over 15 hours to 2.5 hours, translating to time savings, enhanced productivity, and increased commercial viability of the Revio sequencer for customers.
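To illustrate why repeated passes over the same circularized molecule raise accuracy, here is a minimal Python sketch of a consensus step. It is a toy under stated assumptions, not PacBio's CCS algorithm: it assumes the passes are already aligned and of equal length, and it uses a simple per-position majority vote. The function name `majority_consensus` and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across pre-aligned, equal-length passes.

    Toy illustration only: production consensus callers align the passes and
    use statistical models rather than a plain vote.
    """
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")

    consensus = []
    # zip(*passes) yields one column (one base per pass) for each position.
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule, three with one error each.
passes = [
    "ACGTTGCA",
    "ACGTTGCA",
    "ACCTTGCA",  # error at position 2
    "ACGTTGAA",  # error at position 6
    "ACGTAGCA",  # error at position 4
]
print(majority_consensus(passes))  # prints ACGTTGCA
```

Because each error appears in only one pass, the vote at every position recovers the original base. Repeating a computation of this kind across millions of molecules is what makes consensus generation compute-intensive.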

Adding a Deep Learning Model to Improve Accuracy

After GPU optimization of CCS, the analysis was sufficiently fast to incorporate additional workflows while maintaining the instrument’s throughput. This gave PacBio the opportunity to implement GPU-accelerated workflows to further enhance long-read accuracy, without additional hardware investment. 

The DeepConsensus model, an encoder-only transformer, was implemented and optimized on the A100 GPUs, creating a robust in-instrument solution. This achieved a shorter time to high-accuracy HiFi reads—from 30 hours on the CPU-based Sequel IIe to 24 hours on the Revio. As a result of GPU acceleration and workflows added to the instrument, PacBio achieves 99.9 percent accuracy with HiFi sequencing in Revio and can scale up to 1,300 human genomes per year. Revio is PacBio’s first sequencer to feature NVIDIA GPUs, providing a 20X increase in computing power compared to the Sequel IIe.

Customers using Revio can further use the NVIDIA Parabricks® suite of GPU-accelerated industry-standard and deep learning genomic analysis tools for alignment and variant calling. DeepVariant has been accelerated on GPUs as part of Parabricks and offers highly accurate variant calling for HiFi reads. A 35X-coverage HiFi whole-genome sequencing (WGS) sample takes 313 minutes on a CPU server and only eight minutes with DeepVariant in Parabricks on a server with NVIDIA A100 GPUs.
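The runtimes quoted in this case study can be turned into speedup factors with simple arithmetic. The Python sketch below uses only figures stated in the text (CCS from over 15 hours to 2.5 hours, and DeepVariant from 313 minutes to eight minutes). The helper name `speedup` is hypothetical, and because the CCS baseline is given as "over 15 hours," its result is a lower bound.

```python
def speedup(baseline, accelerated):
    """Return how many times faster the accelerated run is than the baseline.

    Both durations must be positive and expressed in the same unit.
    """
    if baseline <= 0 or accelerated <= 0:
        raise ValueError("durations must be positive")
    return baseline / accelerated


# CCS on the instrument: over 15 hours reduced to 2.5 hours (lower bound).
ccs_speedup = speedup(15.0, 2.5)

# DeepVariant on a 35X HiFi WGS sample: 313 minutes on CPU vs. 8 minutes on GPU.
variant_calling_speedup = speedup(313.0, 8.0)

print(f"CCS: at least {ccs_speedup:.1f}x faster")                  # 6.0x
print(f"Variant calling: about {variant_calling_speedup:.1f}x faster")  # 39.1x
```

These ratios compare end-to-end wall-clock times as reported. They say nothing about cost or energy per run, which would need separate figures.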

“Our customers have transformed genomics with the power of HiFi sequencing. Revio, using NVIDIA technologies, further unleashes that power by adding high throughput and affordability. Combined with significant advances in compute, Revio will deliver short run times and a 15X increase in HiFi data.”

Christian Henry
CEO and President, PacBio

Image courtesy of PacBio.

Timeline of improvements to post-primary analysis of PacBio SMRT cell data. Target processing time for CCS to keep up with the instrument throughput was 10 hours. As additional steps such as polishing, mapping, and marshaling were optimized on the GPU, enough time was gained to add the transformer-based DeepConsensus analysis to improve HiFi read accuracy. The final result of the GPU-optimized analysis pipeline not only beats the throughput requirements for the system but has improved overall accuracy.

Revio Advances Genomics Worldwide With NVIDIA Technologies

The improved throughput and accuracy achieved with NVIDIA technologies have proven valuable, as shown by the wide adoption of Revio. Since its launch in October 2022, Revio systems have been installed around the world, including at Dubai’s Mohammed Bin Rashid University of Medicine and Health Sciences (MBRU) to propel genomic medicine discovery in rare disease and cancer, the Wellcome Sanger Institute in the UK to ramp up the Darwin Tree of Life project and increase long reads in human applications, and Radboud University Medical Center (UMC) to increase their sequencing to thousands of genomes.

Through the transition from CPU- to GPU-based workflows, PacBio developed a higher-throughput instrument that offers cost-effectiveness, enhanced compute power, and AI-driven accuracy improvements. These advancements are critical for building their next generation of genomic sequencers that can efficiently scale for customer demands across research and medical applications.

Ready to Learn More?

To learn more about NVIDIA solutions for healthcare and life sciences, contact us.