Reducing Single-Cell and Spatial Analysis From Hours to Minutes

Human lung samples run on the 10x Genomics Xenium Analyzer and processed with NVIDIA RAPIDS. Image provided by TGen.

Objective

The Translational Genomics Research Institute (TGen) is a nonprofit institute that focuses on a variety of diseases, from cancer genomics to the basic genomics of complex diseases. The growing volume of data from multi-omics sequencing created new computational challenges. Using NVIDIA RAPIDS™, TGen cut the analysis time for 4-million-cell datasets from 10 hours to three minutes.

Customer

TGen

Use Case

Accelerated Computing Tools & Techniques
Data Science

Products

NVIDIA RAPIDS
NVIDIA Parabricks
NVIDIA DGX

About TGen

Founded in 2002, TGen, part of the City of Hope research center, focuses on a variety of diseases ranging from neuro and cancer genomics to basic genomics of complex diseases. As a standalone nonprofit institute, TGen’s goal is to impact patient care and conduct research that accelerates translational solutions using genomics.

Nicholas Banovich, PhD, an associate professor in the Integrated Cancer Genomics Division at TGen, runs a research lab focused on the molecular changes that drive disease outcomes, including disease initiation, progression, treatment, and response. His team’s work focuses primarily on oncology and pulmonary fibrosis, a noncancerous lung disease.

Beyond his lab, Banovich also directs TGen’s center for single-cell and spatial multi-omics. As he explains, “My role is to bring in these new technologies that allow us to push forward single-cell and spatial multi-omics as well as work with partners like NVIDIA on computational approaches to analyzing data that could then be deployed more broadly outside of just my lab.”

Gaining More Insight With Single-Cell Approaches

Historically, TGen would grind up tissues, extract molecular information from all of the cells within them, and analyze that information in aggregate. This approach had limitations. “Every tissue, whether you’re talking about the lungs, heart, or cancers, is not made up of a monolith. They’re really complex and made up of different cell types,” Banovich explains. “These cell types are doing different things in relation to driving the disease progression, outcomes, and treatment response.” Before adopting single-cell approaches, Banovich’s team ran bulk assays comparing disease and control samples. However, bulk assays couldn’t resolve what was happening in individual cells; they only reported an average of everything happening in the tissue.

Banovich explains, “When we started using single-cell approaches, we could really compare apples to apples, and you could go down the list of every single cell type and say what’s happening in the disease and what’s happening in control.” Single-cell approaches shed light on the molecular underpinnings of disease, but another approach could provide even more insight: spatial omics.

An Explosion of Data With Spatial Omics

“Moving out of single cell into spatial, one of the biggest, immediate impacts is that you’re generating immense amounts of data,” Banovich explains. To put that increase in context, Banovich’s team ran single-cell RNA sequencing in the lung for approximately seven years, collecting samples from more than 200 people and generating data from roughly 2.5 million cells in total. For broader context, the entire Human Lung Cell Atlas comprises 4 million cells.

TGen uses leading commercial spatial platforms, including the Vizgen MERSCOPE and 10x Genomics Xenium Analyzer. With these spatial instruments, TGen captures 30,000–50,000 cells per sample, and a single run can generate data from over 2 million cells. “In two runs on the Xenium platform, we’re basically generating data on more cells than the entirety of the Human Lung Cell Atlas Project, which was a 40-investigator, 10-country effort,” Banovich explains. “It’s really, really immense amounts of data.”

“We built the Xenium Analyzer to help cutting-edge researchers like TGen rapidly move from instrument to insight with our powerful onboard analysis, enabled by NVIDIA GPUs. The combination of Xenium with NVIDIA RAPIDS further accelerates our best-in-class workflows and enables more precise analysis so researchers can go from run to result and data to discovery even faster. TGen's work is pushing on the boundaries of science and transforming our understanding of health and disease. The world can't afford to wait for these discoveries,” explains Adrian Benjamin, global spatial marketing lead at 10x Genomics.

The 10x Genomics Xenium Analyzer. Image provided by 10x Genomics.

Computational Challenges From Spatial Omics

From relational data that shows researchers where cells sit in relation to one another, to imaging data that can be overlaid with molecular data, spatial multi-omics unlocks new opportunities for deeper understanding. However, these new capabilities also bring new computational challenges. For TGen, it was crucial not only to address these challenges but also to make the most of the samples received from clinical studies.

The standard workflows for processing single-cell data had been manageable because the team worked with large datasets only infrequently. Once the team shifted to spatial, they quickly realized the challenge was much bigger: the first few runs from spatial omics instruments produced data on up to 10 million cells. The Xenium Analyzer, powered by NVIDIA, accelerates time to results by performing on-board analysis and outputting common file formats for use in third-party tools. However, the standard workflows used for tertiary analysis, including principal component analysis and clustering, required 10–14 hours.

Making matters worse, these pipelines aren’t run just once. Data is run through the pipeline, and the results are assessed to determine whether the clustering algorithm performed as expected. If not, parameters are tweaked and the process is repeated. As Banovich explains, “This starts to become really, really prohibitive if each of those iterations is a 10-hour process. We found ourselves, even at 3 or 4 million cells, taking too long.”
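The tweak-and-rerun cycle can be sketched as a simple loop. This is a minimal illustration, not TGen’s pipeline: `tune_until_acceptable`, `run_pipeline`, and `is_acceptable` are hypothetical names, and the toy pipeline that maps a clustering resolution to a cluster count exists purely for demonstration. The point it shows is that total wall-clock time is the number of iterations multiplied by the per-run time, so a 10-hour pipeline turns even a few rounds of tuning into days.

```python
from typing import Callable, Iterable, Optional, Tuple


def tune_until_acceptable(
    candidates: Iterable[float],
    run_pipeline: Callable[[float], int],
    is_acceptable: Callable[[int], bool],
) -> Tuple[Optional[float], int]:
    """Re-run the pipeline with each candidate parameter until a result passes review.

    Returns the accepted parameter (or None if none passed) and the number of runs.
    """
    runs = 0
    for param in candidates:
        runs += 1
        result = run_pipeline(param)
        if is_acceptable(result):
            return param, runs
    return None, runs


def total_minutes(runs: int, minutes_per_run: float) -> float:
    """Wall-clock cost of the tuning loop: every iteration pays the full pipeline runtime."""
    return runs * minutes_per_run


# Hypothetical stand-ins: a "pipeline" that turns a clustering resolution into a
# cluster count, and a review step that accepts between 15 and 25 clusters.
def toy_pipeline(resolution: float) -> int:
    return round(resolution * 20)


def toy_review(n_clusters: int) -> bool:
    return 15 <= n_clusters <= 25


chosen, runs = tune_until_acceptable([0.2, 0.4, 0.8, 1.0], toy_pipeline, toy_review)
print(f"accepted resolution {chosen} after {runs} runs")
print(f"at 10 hours per run: {total_minutes(runs, 10 * 60) / 60:.0f} hours")
print(f"at 3 minutes per run: {total_minutes(runs, 3):.0f} minutes")
```

With three iterations, the same tuning session costs 30 hours at 10 hours per run but only 9 minutes at 3 minutes per run.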

Partnering With NVIDIA

To address these challenges, TGen turned to NVIDIA RAPIDS, an open-source suite of GPU-accelerated data science and AI libraries that improves performance across data pipelines. “We decided to look at the RAPIDS implementation of Scanpy. Our very first run with RAPIDS, without any optimization at all, took us from 10 hours to 10 minutes,” explains Evan Mee, bioinformatician at TGen. “With a little additional tuning, we were down to three minutes to process this data.”
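The reported times translate directly into speedup factors. The sketch below only does that arithmetic on the figures quoted above (a 10-hour baseline, 10 minutes on the first RAPIDS run, and 3 minutes after tuning); the function name is illustrative, and the ratios describe this one dataset rather than a general benchmark.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the old runtime to the new runtime (e.g. 60.0 means 60x faster)."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("runtimes must be positive")
    return before_minutes / after_minutes


BASELINE_MINUTES = 10 * 60  # the reported 10-hour baseline for this dataset

first_run = speedup(BASELINE_MINUTES, 10)  # first RAPIDS run, no optimization
tuned = speedup(BASELINE_MINUTES, 3)       # after additional tuning

print(f"first RAPIDS run: {first_run:.0f}x faster")  # 60x
print(f"after tuning: {tuned:.0f}x faster")          # 200x
```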

Human lung samples run on the 10x Genomics Xenium Analyzer. Image provided by TGen.

The time savings also translate into more impactful research. Instead of waiting on quality control and long gaps between basic analyses, Banovich’s team members can focus on more fulfilling work.

RAPIDS has changed the way Banovich and his team perform analysis and, ultimately, arrive at conclusions. Being able to iterate quickly opens up possibilities for future research, and studying large datasets gives a clearer picture for translational research. For instance, researchers need to observe how cells interact within their local environments. For rare cell types, this requires probing an enormous number of cells, which wouldn’t have been feasible without these spatial platforms and RAPIDS analytics.

In addition to understanding rare cell types, building large atlases across three dimensions is now possible. Not only can researchers understand how cells interact at a local level, but they can also understand disease within the larger architecture of the tissue and see how it progresses through the system, providing a much more granular view of the disease.

Banovich summarizes NVIDIA’s impact on this next chapter: “As we look forward, we’re talking about generating datasets with tens of millions or maybe even hundreds of millions of cells. Scalability across datasets of that size is only possible because we’ve been able to use this RAPIDS implementation.”
