NVIDIA-Certified Professional: AI Operations (NCP-AIO)

About This Certification

The NCP-AIO certification is an intermediate-level credential that validates a candidate’s ability to monitor, troubleshoot, and optimize NVIDIA AI infrastructure. The exam is delivered online with remote proctoring, consists of 50 questions, and has a 90-minute time limit.

Please carefully review our certification FAQs and exam policies before scheduling your exam.

If you have any questions, please contact us.

Certification Exam Details

Duration: 90 minutes  

Price: $400 

Certification level: Professional  

Subject: AI Operations 

Number of questions: 50

Prerequisites: Two to three years of operational experience working in a data center with NVIDIA hardware solutions. The candidate should be able to monitor and manage all parts of the data center infrastructure that support AI workloads.

Language: English 

Validity: This certification is valid for two years from issuance. Recertification may be achieved by retaking the exam.

Credentials: Upon passing the exam, participants will receive a digital badge and optional certificate indicating the certification level and topic.

Exam Preparation

Topics Covered in the Exam

The exam covers the following topics:

  • Base Command Manager for configuration, management, and troubleshooting
  • Slurm cluster administration
  • Kubernetes cluster administration
  • System management tools for troubleshooting and performance optimization

Candidate Audiences

  • MLOps engineers
  • DevOps engineers
  • Solution architects
  • System architects
  • AI Infrastructure engineers

Recommended Training

AI Infrastructure & Operations Fundamentals

A self-paced course that covers essential components of AI infrastructure, including compute platforms, networking, and storage solutions. The course also addresses AI operations, focusing on infrastructure management and cluster orchestration.

AI Operations Professional Workshop

A multi-day workshop where participants will gain hands-on experience with cutting-edge technologies, including NVIDIA's DCGM, InfiniBand networking, NVIDIA BlueField™ DPUs, and GPU virtualization, while learning to leverage tools for infrastructure provisioning, workload scheduling, and cluster orchestration.

Coming soon

Exam Blueprint

The table below provides an overview of the topic areas covered in the certification exam and the percentage of the exam devoted to each.

Topic Area, % of Exam, and Topics Covered
Administration 36%
  • Administer Fleet Command
  • Administer Slurm cluster
  • Understand data center architecture for AI workloads
  • Administer Base Command Manager (BCM) and cluster provisioning
  • Administer Run:ai (potentially part of ACM)
  • Configure MIG (for AI and HPC)
Workload Management 16%
  • Administer Kubernetes cluster
  • Use system management tools to troubleshoot issues
Installation and Deployment 26%
  • Install and configure BCM
  • Install and initialize Kubernetes on NVIDIA hosts using BCM
  • Deploy containers from NGC
  • Deploy cloud VMI containers
  • Understand storage requirements for AI data centers
  • Deploy DOCA services on DPU Arm
Troubleshooting and Optimization 20%
  • Troubleshoot Docker
  • Troubleshoot the Fabric Manager service for NVIDIA NVLink™/NVSwitch™ systems
  • Troubleshoot BCM 
  • Troubleshoot Magnum IO components
  • Troubleshoot storage performance

Contact Us

NVIDIA offers training and certification for professionals looking to enhance their skills and knowledge in the fields of AI, accelerated computing, data science, advanced networking, graphics, simulation, and more.

Contact us to learn how we can help you achieve your goals.
