The growth of AI is driving demand for substantial compute infrastructure in data centers to train and deploy models. The right cluster management tools are critical for managing this infrastructure at scale and keeping it well utilized. This lab introduces the NVIDIA Base Command Manager software and describes best practices for managing AI infrastructure. You’ll gain hands-on experience provisioning cluster nodes, managing software images, creating users and groups, deploying Kubernetes, running a containerized workload, building a custom monitoring script, and configuring nodes with GPUs. We’ll cover both Base Command Manager, which is included in the NVIDIA DGX software stack, and Base Command Manager Essentials, which is included in NVIDIA AI Enterprise.
Prerequisite(s):
Basic system administration skills