Healthcare & Life Sciences

Saving Nine Years of Processing Time With NVIDIA Parabricks

Lung cancer cells. Anne Weston, Francis Crick Institute

Objective

The Francis Crick Institute is a leader in cutting-edge biomedical research, working to improve the understanding of human health and disease, including lung cancer. As the leading cause of cancer mortality worldwide, with over 1.8 million deaths in 2020, lung cancer highlights the critical need to understand the process of metastasis. Late diagnoses compound the problem. These challenges were the catalyst for critical research funded by Cancer Research UK, including the TRACERx and TRACERx EVO studies.

Customer

The Francis Crick Institute

Use Case

Accelerated Computing Tools & Techniques

Products

NVIDIA Parabricks
NVIDIA A100
NVIDIA L40

Overview of the TRACERx Study

TRACERx, short for TRAcking Cancer Evolution through therapy (Rx), is a study that aims to understand tumor evolution in non-small cell lung cancer. It follows participants from diagnosis through surgical resection, which is intended to cure the disease or prevent its recurrence.

The study consists of surgical resection of the primary tumor and nearby lymph nodes in participants at specific lung cancer stages. Multiple samples are taken from each removed tumor and sent for whole exome sequencing with paired RNA sequencing. Tissue microarrays and circulating tumor DNA (ctDNA) samples may also be collected and analyzed, followed by genomic copy number analysis and reconstruction of phylogenetic trees to characterize cancer evolution. Lastly, metastatic lesions are sequenced when available.

The TRACERx 421 cohort represents the halfway point of the total study. Of the 421 patients, 233 are men and 188 are women, with the following smoking statuses:

 
  • Never smoked: 30 
  • Former smoker: 211
  • Current or recent smoker: 180
 

Metadata such as age, number of packs per year, disease stage, and whether any therapy was received is also considered. Sequencing data is then analyzed by a series of complex pipelines, resulting in a detailed breakdown of mutational heterogeneity and copy number heterogeneity across tumor regions. This focus on genomic heterogeneity is important because several studies have shown it to be a prognostic marker in non-small cell lung cancer. In the 421 cohort, high somatic copy number aberration heterogeneity was associated with reduced disease-free survival.

TRACERx EVO Research: A Shift to Whole Genome Sequencing

TRACERx EVO is a prospective, observational study that builds on the TRACERx work highlighted in the 421 cohort. The most noteworthy difference in the TRACERx EVO study is the shift to whole genome sequencing instead of whole exome sequencing.  

Mark S. Hill, Principal Research Fellow at the Francis Crick Institute, explains, “Whole genome sequencing enables a much more accurate identification of copy number aberrations and explores structural variants and deep classified mutation signatures associated with the disease.”

Additionally, deep whole genome sequencing is critical in identifying subclonal mutations. These subclones (<40% tumor proportion) were prominent in the TRACERx study and are key to understanding tumor development.

“With Parabricks, we saw huge speed ups for whole genome sequencing for the TRACERx EVO project alone. This will save nearly nine years of processing time based on our current HPC [high-performance computing] service offering.”
(from Accelerating Large-Scale Genomics Research webinar)

James Clements, Director of IT Operations & Deputy CIO at the Francis Crick Institute

Solving Computational Challenges With NVIDIA

Although the number of samples for the TRACERx EVO study is comparable to the TRACERx 421 cohort, the storage requirement is significantly higher with over 1.3 petabytes of data just for primary alignments. Additionally, the estimated CPU hours for the TRACERx EVO study increased drastically with the addition of whole genome sequencing data. 

Number of Samples, Estimated Storage and Estimated CPU Hours—Primary Alignments


As a result, new compute infrastructure was needed to conduct a study of this magnitude. “Through the advent of NVIDIA Parabricks [GPU-accelerated] tooling, we can really speed up critical parts of this pipeline,” Hill explains. “We essentially have an automated system that performs the various points of quality control throughout the pipeline and have accelerated alignment and variant calling processes embedded within these pipelines.”  

In preparation for the TRACERx EVO study, the Crick team benchmarked primary alignment to compare traditional CPUs with GPU-accelerated NVIDIA® Parabricks®. Testing was conducted on a multipart Nextflow workflow, comparing x86 compute with 16 cores and 64GB of RAM against NVIDIA V100 GPUs. Using 250x whole genome sequencing data from tumors that had already been analyzed, the team measured a 26X speed-up with no difference in quality metric outputs.
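To make the 26X figure concrete, the short sketch below converts a speed-up factor into wall-clock time and the share of runtime eliminated. The 52-hour CPU baseline is a hypothetical number chosen only for illustration; the case study does not report absolute runtimes, and the function names are likewise illustrative.

```python
def accelerated_hours(cpu_hours: float, speedup: float) -> float:
    """Return the runtime after applying a speed-up factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_hours / speedup


def fraction_saved(speedup: float) -> float:
    """Return the fraction of the original runtime that is eliminated."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    cpu_hours = 52.0  # hypothetical CPU baseline per sample, not from the study
    speedup = 26.0    # speed-up reported by the Crick team
    gpu_hours = accelerated_hours(cpu_hours, speedup)
    print(f"{cpu_hours:.0f} h on CPU -> {gpu_hours:.1f} h at {speedup:.0f}X")
    print(f"Runtime eliminated: {fraction_saved(speedup):.1%}")
```

Under these assumptions, a 26X speed-up removes roughly 96% of the alignment runtime, whatever the real baseline turns out to be.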

The Crick’s Hardware Investment With NVIDIA: A Business Case That Writes Itself

The Crick underwent a full HPC replacement that included replacing storage, networking, and CPU compute, as well as a GPU refresh. James Clements, Director of IT Operations & Deputy CIO at the Francis Crick Institute, looked across the 120 labs and 15 science and technology platforms to understand plans, desires, and what was or wasn’t working.  

In the TRACERx EVO project alone, the team saw significant speed-ups for whole genome sequencing when testing Parabricks, including FASTQ alignment and DeepVariant calling. “This will save nearly nine years of processing time relative to our current HPC service offering,” Clements explains.
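The two Parabricks stages mentioned here, alignment from FASTQ files and DeepVariant calling, can be sketched as a pair of `pbrun` invocations. This is a minimal illustration and not the Crick's actual Nextflow pipeline: the sample and reference file names are hypothetical, the helper functions are invented for this example, and only the basic input and output options of `pbrun fq2bam` and `pbrun deepvariant` are shown. The script only builds and prints the commands, so it runs without Parabricks installed.

```python
import shlex
from typing import List


def fq2bam_command(ref: str, fastq_r1: str, fastq_r2: str, out_bam: str) -> List[str]:
    """Build a `pbrun fq2bam` command: paired-end FASTQ in, aligned BAM out."""
    return [
        "pbrun", "fq2bam",
        "--ref", ref,
        "--in-fq", fastq_r1, fastq_r2,
        "--out-bam", out_bam,
    ]


def deepvariant_command(ref: str, in_bam: str, out_vcf: str) -> List[str]:
    """Build a `pbrun deepvariant` command: aligned BAM in, variant calls out."""
    return [
        "pbrun", "deepvariant",
        "--ref", ref,
        "--in-bam", in_bam,
        "--out-variants", out_vcf,
    ]


def build_pipeline(sample: str, ref: str) -> List[List[str]]:
    """Chain alignment and variant calling for one sample (hypothetical file naming)."""
    bam = f"{sample}.bam"
    return [
        fq2bam_command(ref, f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz", bam),
        deepvariant_command(ref, bam, f"{sample}.vcf"),
    ]


if __name__ == "__main__":
    # Hypothetical sample and reference names, for illustration only.
    for cmd in build_pipeline("tumor_region_1", "GRCh38.fa"):
        print(shlex.join(cmd))
```

On a machine with Parabricks installed, each command list could be passed to `subprocess.run(cmd, check=True)`; in a workflow manager such as Nextflow, each would typically become its own process.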

In addition to the impressive time savings, the Crick team appreciated NVIDIA’s hands-on approach and the ability to provide feedback. As Clements states, “We’ve been able to work directly with the product team to test in-development functionality and contribute ideas for future development.”

The Crick’s resulting implementation consists of three clusters, all connected through an NDR InfiniBand network:

     
  • NVIDIA A100 for a cost-effective and space-efficient general-purpose cluster, used for unoptimized workloads.
  • NVIDIA L40 for structural biology and cryo-electron microscopy workloads that suit lower-cost GPUs.
  • NVIDIA H100 for specific workloads, including optimized solutions like Parabricks.
 

Both the A100 and H100 clusters run on Dell servers with 80GB SXM GPUs.

Clements summarizes that NVIDIA’s impact will “benefit the Crick with tens of thousands of hours of saved wait time every single year. It will also provide a hardware platform for future innovation.”

Ready to Get Started?

To learn more about NVIDIA solutions for genomics, visit: nvidia.com/parabricks

To learn more about the Francis Crick Institute, visit:  https://www.crick.ac.uk/

“[Comparing Parabricks with CPUs], we saw around a 26X speed up and that’s with no difference in the kind of quality metric outputs when we inspect these [primary] alignments back-to-back.” (from Accelerating Large-Scale Genomics Research webinar)

Mark S. Hill, Principal Research Fellow at the Francis Crick Institute
