A Deep Dive on Supporting Multi-Instance GPUs in Containers and Kubernetes
NVIDIA
Google
MIG (short for Multi-Instance GPU) is a mode of operation in Ampere, NVIDIA's newest generation of GPUs. It lets you partition a single GPU into a set of "GPU Instances", each of which appears to the software consuming it as a mini-GPU with its own fixed partition of memory and compute resources.
We'll dive deep into the details of how we built support for MIG into containers and Kubernetes. Learn how MIG devices are made available to containers, what challenges we faced building MIG support for Kubernetes, and how you can take advantage of what we have built today. We'll also discuss best practices around how to distribute MIG devices throughout a Kubernetes cluster, including how to handle the life cycle of MIG devices on an individual node. We'll conclude with a demo of MIG on Google's GKE, showing off its ability to auto-scale when demand for a particular MIG device type increases.
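As a rough illustration of what consuming a MIG device from Kubernetes can look like, the sketch below builds a Pod manifest that requests one MIG device as an extended resource. It assumes the cluster advertises MIG devices under names of the form `nvidia.com/mig-<profile>` (the convention used by the "mixed" strategy of NVIDIA's Kubernetes device plugin) and that a `1g.5gb` profile exists on the node. The helper name `mig_pod_manifest`, the pod name, and the container image are hypothetical and not taken from the talk.

```python
import json


def mig_pod_manifest(name, image, profile="1g.5gb", count=1):
    """Build a Pod manifest (as a plain dict) that requests MIG devices.

    Assumption: MIG devices are exposed as extended resources named
    "nvidia.com/mig-<profile>". Other setups may expose them differently.
    """
    resource = f"nvidia.com/mig-{profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested under "limits".
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration.
    manifest = mig_pod_manifest("mig-demo", "example.com/cuda-app:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be passed to `kubectl apply -f -`. Clusters that expose MIG devices in another way (for example, through node labels combined with the generic `nvidia.com/gpu` resource) would need a different request.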