NVIDIA Clara Parabricks Pipelines: Quickstart Guide

Get NVIDIA Clara Parabricks Pipelines up and running on a server and start using it in 10 minutes.


Step 1: Make sure installation requirements are met

The following are required to install Parabricks:

  • Access to the internet
  • An NVIDIA driver that supports CUDA 9.0 or higher
  • An NVIDIA driver that supports CUDA 10.0 or higher if you want to run deepvariant or cnnscorevariants
  • nvidia-docker or singularity version 2.6.1 or higher
  • Python 3
  • curl (most Linux systems already have this installed)
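
The software list above can be checked quickly from a terminal. The following is a minimal sketch, not part of Parabricks: the helper names have_cmd, version_ge, and report_prereqs are made up for illustration, and finding nvidia-smi on the PATH only suggests that a driver is installed; it does not confirm which CUDA version the driver supports. version_ge relies on the version sort (`sort -V`) found in GNU coreutils.

```shell
# have_cmd NAME: succeeds if NAME resolves to a command on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# version_ge A B: succeeds if dotted version A is greater than or equal to B.
# Assumes a `sort` that supports -V (GNU coreutils does).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# report_prereqs: print one status line per prerequisite; never aborts.
report_prereqs() {
    for tool in python3 curl nvidia-smi; do
        if have_cmd "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
    # Either container runtime is acceptable, so only one needs to be present.
    if have_cmd nvidia-docker || have_cmd singularity; then
        echo "found:   container runtime"
    else
        echo "missing: nvidia-docker or singularity"
    fi
}

report_prereqs
```

To compare an installed version against the 2.6.1 minimum, pass the version string you read from your container runtime, for example `version_ge 3.5.2 2.6.1`. Extracting that string is left out here because the output format differs between releases.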

The following are the hardware requirements:

  • Any GPU that supports CUDA architecture 60, 61, 70, or 75 and has 12GB or more of GPU RAM. Parabricks has been tested on NVIDIA P100, NVIDIA V100, and NVIDIA T4 GPUs.
    • A 1-GPU server should have 64GB CPU RAM and at least 16 CPU threads
    • A 2-GPU server should have 100GB CPU RAM and at least 24 CPU threads
    • A 4-GPU server should have 196GB CPU RAM and at least 32 CPU threads
    • An 8-GPU server should have 392GB CPU RAM and at least 48 CPU threads
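
The sizing figures above can be encoded as a small lookup so that a host can be checked before installation. This is a sketch under stated assumptions: the function names min_host_resources and host_meets are hypothetical, and only the four GPU counts listed above are covered.

```shell
# min_host_resources GPUS: print "<min CPU RAM in GB> <min CPU threads>"
# for the GPU counts listed in this guide; fail for any other count.
min_host_resources() {
    case "$1" in
        1) echo "64 16" ;;
        2) echo "100 24" ;;
        4) echo "196 32" ;;
        8) echo "392 48" ;;
        *) echo "no documented requirement for $1 GPUs" >&2; return 1 ;;
    esac
}

# host_meets GPUS RAM_GB THREADS: succeeds if the given host values
# satisfy the documented minimum for that GPU count.
host_meets() {
    need=$(min_host_resources "$1") || return 1
    need_ram=${need% *}       # text before the space
    need_threads=${need#* }   # text after the space
    [ "$2" -ge "$need_ram" ] && [ "$3" -ge "$need_threads" ]
}

# Example with made-up host values: 4 GPUs, 256GB RAM, 64 threads.
if host_meets 4 256 64; then
    echo "host meets the 4-GPU requirements"
else
    echo "host is below the 4-GPU requirements"
fi
```

On Linux, `nproc` reports the CPU threads available and the MemTotal line in /proc/meminfo reports total memory in kB (usually a little below the installed capacity). `nvidia-smi --query-gpu=name,memory.total --format=csv` lists each GPU with its memory.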

Step 2: Download the installation package

Request NVIDIA Parabricks access from developer.nvidia.com/clara-parabricks to get an installation package for your GPU server.


Step 3: Install the Parabricks suite

Install the Parabricks package on your system:

# Step 1: Extract the package.
$ tar -xzf parabricks.tar.gz

# Step 2: Run the installer.
$ sudo ./parabricks/installer.py

# Step 3: Verify your installation.
# This should display the Parabricks version number:
$ pbrun version

After installation, pbrun is the executable that starts any tool in the Parabricks software suite. During installation, you can choose to create a link at /usr/bin/pbrun to make it available system-wide. Otherwise, you can run pbrun from your local installation directory (default: /opt/parabricks/pbrun).
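
The two ways of reaching pbrun described above can be wrapped in one lookup. This is a minimal sketch: find_pbrun is a hypothetical helper, and it assumes the default installation directory named in this guide unless you pass a different one.

```shell
# find_pbrun [INSTALL_DIR]: print the path of a usable pbrun, or fail.
# INSTALL_DIR defaults to /opt/parabricks, the default named in this guide.
find_pbrun() {
    install_dir="${1:-/opt/parabricks}"
    if command -v pbrun >/dev/null 2>&1; then
        # pbrun is on the PATH, for example through the /usr/bin/pbrun link.
        command -v pbrun
    elif [ -x "$install_dir/pbrun" ]; then
        echo "$install_dir/pbrun"
    else
        echo "pbrun not found on PATH or in $install_dir" >&2
        return 1
    fi
}

# Resolve pbrun once, then call it by the path that was found.
if PBRUN=$(find_pbrun); then
    "$PBRUN" version
else
    echo "Install Parabricks first, or pass your installation directory to find_pbrun."
fi
```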


Step 4: Example run

# Run the fq2bam tool, which aligns, coordinate-sorts, and marks duplicates
# in paired-end FASTQ files. Ref.fa is the bwa-indexed reference file.

$ pbrun fq2bam --ref Ref.fa --in-fq sample_1.fq.gz sample_2.fq.gz --out-bam output.bam
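
Because the command above expects a bwa-indexed reference, it can help to confirm that the index files sit next to the FASTA before starting a run. The sketch below relies on the fact that `bwa index` writes five files with the suffixes .amb, .ann, .bwt, .pac, and .sa; the helper name missing_bwa_index is hypothetical, and whether Parabricks needs any further files is not checked here.

```shell
# missing_bwa_index REF: print each bwa index file that is absent for REF.
# Prints nothing when all five files written by `bwa index` are present.
missing_bwa_index() {
    for ext in amb ann bwt pac sa; do
        [ -e "$1.$ext" ] || echo "$1.$ext"
    done
}

# Check the reference used in the example command.
missing=$(missing_bwa_index Ref.fa)
if [ -z "$missing" ]; then
    echo "Ref.fa has all five bwa index files"
else
    echo "missing bwa index files:"
    echo "$missing"
fi
```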

You can download a sample dataset using the following command:

$ wget -O parabricks_sample.tar.gz \
"https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz?Expires=1613069864&Signature=WxLeyitbvR%2B0rO4MX%2B0GohDw89g%3D&AWSAccessKeyId=AKIAJGDUNN2G2ZAH3Q3A"

To run the sample dataset:

$ tar -xvzf parabricks_sample.tar.gz

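# If pbrun is not linked at /usr/bin/pbrun, call it from your installation
# directory instead (default: /opt/parabricks/pbrun).
$ pbrun fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam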

The above test should take under 250 seconds on a system with four V100 GPUs.
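
To compare your own run against that figure, the command can be wrapped in a simple wall-clock timer. This is a sketch only: timed is a hypothetical helper, it measures whole seconds with `date +%s`, and it includes any time spent reading input from disk.

```shell
# timed CMD [ARGS...]: run a command, report elapsed wall-clock seconds
# on stderr, and return the command's own exit status.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" >&2
    return "$status"
}

# Harmless demonstration; replace `true` with the fq2bam command to time it.
timed true
```

For the sample dataset, put `timed` in front of the fq2bam command shown above and read the elapsed time when it finishes.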