Get started today with this GPU Ready Apps Guide.
NAMD (NAnoscale Molecular Dynamics) is a production-quality molecular dynamics application designed for high-performance simulation of large biomolecular systems. Developed by the University of Illinois at Urbana-Champaign (UIUC), NAMD is distributed free of charge with both binaries and source code.
The latest version, NAMD 2.11, typically runs 7x faster on NVIDIA GPUs than on CPU-only systems*, enabling users to run molecular dynamics simulations in hours instead of days. It is also up to 2x faster than NAMD 2.10, which helps users save on hardware costs while significantly accelerating their scientific discoveries.
*Dual CPU server, Intel E5-2698 v3@2.3GHz, NVIDIA Tesla K80 with ECC off, Autoboost On; STMV dataset
NAMD should be portable to any parallel platform with a modern C++ compiler. Precompiled GPU-accelerated NAMD 2.11 binaries are available for download for Windows, Mac OS X, and Linux. NAMD uses both the GPU and the CPU, so we recommend a relatively modern CPU to achieve the best NAMD application performance.
NAMD developers provide complete, optimized binaries for all common platforms to which NAMD has been ported. The binaries can be obtained from the NAMD download page.
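If the precompiled binaries meet your needs, installation is just a matter of unpacking the archive and running namd2 from the resulting directory. A minimal sketch, assuming the Linux multicore CUDA package (pick the package matching your platform from the download page; the input file name is a placeholder):
tar xzf NAMD_2.11_Linux-x86_64-multicore-CUDA.tar.gz
cd NAMD_2.11_Linux-x86_64-multicore-CUDA
./namd2 +p8 +devices 0 your_input.namd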
It should not be necessary to compile NAMD unless you want to add or modify features, improve performance by building a Charm++ version that takes advantage of special networking hardware, or customize NAMD for your environment. Complete directions for compiling NAMD are contained in the release notes, which are available on the NAMD web site and are included in all distributions.
BELOW IS A BRIEF SUMMARY OF THE COMPILATION PROCEDURE
1. Download the NAMD source code.
2. Extract the package and Charm++:
tar xf NAMD_2.11_Source.tar.gz
cd NAMD_2.11_Source
tar xf charm-6.7.0.tar
3. Download and install TCL and FFTW.
Please see the release notes for the appropriate version of these libraries.
wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-x86_64.tar.gz
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
tar xzf fftw-linux-x86_64.tar.gz
mv linux-x86_64 fftw
tar xzf tcl*-linux-x86_64.tar.gz
tar xzf tcl*-linux-x86_64-threaded.tar.gz
mv tcl*-linux-x86_64 tcl
mv tcl*-linux-x86_64-threaded tcl-threaded
4. Build Charm++
The build script has an interactive mode; you can also specify options on the command line as explained below.
cd charm-6.7.0
./build charm++ {arch} {C compiler} {Fortran compiler} {other options}
For {arch}, use one of the following options: 'verbs-linux-x86_64' for multi-node configurations, or 'multicore-linux64' for a single-node system. The remaining options are not required but are recommended for best performance: 'smp' to build an SMP version for multi-node runs (see the Charm++ documentation for more details), and '--with-production' to enable production-level optimizations.
Example 1. Charm++ build for multi-node configuration using Intel compilers and SMP with all production optimizations:
./build charm++ verbs-linux-x86_64 icc smp --with-production
Example 2. Charm++ build for single-node configuration using Intel compilers with all production optimizations.
./build charm++ multicore-linux64 icc --with-production
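Optionally, you can sanity-check the Charm++ build before moving on by compiling and running the bundled megatest program. A sketch for the single-node multicore build from Example 2, assuming the standard layout of the charm-6.7.0 test tree (for a verbs build, launch with './charmrun ++local +p4 ./pgm' instead):
cd multicore-linux64-icc/tests/charm++/megatest
make pgm
./pgm +p4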
5. Build NAMD.
Make sure you're in the main NAMD directory, then configure and run make:
./config {namd-arch} --charm-arch {charm-arch} {opts}
cd {namd-arch}
make -j4
For {namd-arch} we recommend 'Linux-x86_64-icc'. {charm-arch} must be the same architecture you built Charm++ with, and {opts} should be set to '--with-cuda' to enable GPU acceleration, or to '--without-cuda' for a CPU-only build. The build directory name can be extended with .[comment], e.g., Linux-x86_64.cuda, which is helpful for managing different builds of NAMD. Note that the extension does not have any meaning by itself: './config Linux-x86_64.cuda' won't produce a CUDA build unless you also specify '--with-cuda'.
Optional flags for the CUDA build:
a. --cuda-prefix sets the path to your CUDA toolkit;
b. --cuda-gencode adds a GPU compute architecture (gencode) to compile for.
Example 1. NAMD build for x86_64 architecture using Intel compilers, with GPU acceleration and specified CUDA path, for single node:
./config Linux-x86_64-icc --charm-arch multicore-linux64-icc --with-cuda --cuda-prefix /usr/local/cuda
Example 2. NAMD build for x86_64 architecture using Intel compilers, with GPU acceleration and specified CUDA path, for multiple nodes:
./config Linux-x86_64-icc --charm-arch verbs-linux-x86_64-smp-icc --with-cuda --cuda-prefix /usr/local/cuda
Example 3. NAMD build for x86_64 architecture using Intel compilers, without GPU acceleration, for multiple nodes:
./config Linux-x86_64-icc --charm-arch verbs-linux-x86_64-smp-icc
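For reference, here is a single-node configuration that also passes an explicit gencode; the compute_35/sm_35 values below are illustrative for Kepler-class GPUs such as the Tesla K80, so match them to your GPU and CUDA toolkit:
./config Linux-x86_64-icc --charm-arch multicore-linux64-icc --with-cuda --cuda-prefix /usr/local/cuda --cuda-gencode arch=compute_35,code=sm_35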
Before running a GPU-accelerated version of NAMD, install the latest NVIDIA display driver for your GPU. To run NAMD, you need the namd2 executable; for a multi-node run you also need the charmrun executable. Make sure to specify the CPU affinity options as explained below.
General command line to run NAMD on a single-node system:
namd2 {namdOpts} {inputFile}
On a multi-node system NAMD has to be run with charmrun as specified below:
charmrun {charmOpts} namd2 {namdOpts} {inputFile}
{charmOpts}: charmrun options, e.g., ++p (total number of PEs), ++ppn (PEs per process), ++nodelist (file listing the nodes to run on), and ++verbose (detailed startup output).
{namdOpts}: namd2 options, e.g., +p (number of PEs on a single node), +devices (comma-separated list of GPU device IDs), and +setcpuaffinity with +pemap/+commap (thread-to-core mapping).
We recommend always checking the startup messages in NAMD to make sure the options are set correctly. Additionally, the ++verbose option can provide more detailed output for runs that use charmrun. Running top or other system tools can help you verify that you're getting the requested thread mapping.
Example 1. Run ApoA1 on a single node with 2xGPU and 20 CPU cores:
./namd2 +p 20 +devices 0,1 apoa1.namd
Example 2. Run STMV on 2 nodes, each node with 2xGPU and 2xCPU (20 cores) and SMP NAMD build (note that we launch 4 processes, each controlling 1 GPU):
charmrun ++p 36 ++ppn 9 ./namd2 ++nodelist $NAMD_NODELIST +setcpuaffinity +pemap 1-9,11-19 +commap 0,10 +devices 0,1 stmv.namd
Note that by default, the "rsh" command is used to start namd2 on each node specified in the nodelist file. You can change this via the CONV_RSH environment variable, i.e., to use ssh instead of rsh run "export CONV_RSH=ssh" (see NAMD release notes for details).
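The nodelist file referenced by ++nodelist is a plain text file. A minimal sketch, where the host names are placeholders for your own nodes:
group main
host node1
host node2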
This section demonstrates GPU acceleration for selected datasets. Note that you may need to update the input files to reduce the energy/timing output frequency. The benchmarks are listed in order of increasing atom count. It is recommended to run large test cases like STMV on systems with multiple GPUs.
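For example, the output frequency is controlled by the outputEnergies and outputTiming parameters in the *.namd input file; the value of 600 below is illustrative and prints energies and timing every 600 steps:
outputEnergies 600
outputTiming 600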
Apolipoprotein A1 (ApoA1) is the major protein component of high-density lipoprotein (HDL) in the bloodstream and plays a specific role in lipid metabolism. The ApoA1 benchmark consists of 92,224 atoms and has been a standard NAMD cross-platform benchmark for years.
ATP synthase is an enzyme that synthesizes adenosine triphosphate (ATP), the common molecular energy unit in cells. The F1-ATPase benchmark is a model of the F1 subunit of ATP synthase.
Satellite Tobacco Mosaic Virus (STMV) is a small, icosahedral plant virus that worsens the symptoms of infection by Tobacco Mosaic Virus (TMV). STMV is an excellent candidate for research in molecular dynamics because it is relatively small for a virus and is on the medium to high end of what is feasible to simulate using traditional molecular dynamics on a workstation or a small server.
The figure of merit is "ns/day" (the higher the better), but it is printed in the log as "days/ns" (the lower the better); most molecular dynamics users focus on "ns/day", the inverse of "days/ns". It is recommended to take the average of all occurrences of this value for benchmarking purposes; the frequency of the output can be controlled in the corresponding *.namd input file. Due to initial load balancing, the "Initial time:" outputs should be neglected in favor of the later "Benchmark time:" outputs.
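A quick way to collect and average these values is a shell one-liner over the log file. This is a sketch: it assumes the standard "Benchmark time: ... days/ns ..." line format and a log file named namd.log:
grep "Benchmark time:" namd.log | awk '{for (i=1; i<=NF; i++) if ($i == "days/ns") {sum += $(i-1); n++}} END {if (n) printf "average: %.4f days/ns = %.2f ns/day\n", sum/n, n/sum}'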
See the reference results below for different system configurations with dual-socket CPU and NVIDIA Tesla boards for 1 to 4 nodes connected with 4xEDR InfiniBand.