This post updates previous posts by Zvonko Kaiser about using the nvidia-container-runtime-hook to access GPUs in containers on bare metal using Podman and in OpenShift. This post will show how to access NVIDIA GPUs in containers run with Podman, on a host running RHEL 7 or RHEL 8.
Using GPUs in containers is easier than you think. We start by setting up the host with the necessary NVIDIA drivers and CUDA software, a container runtime hook (nvidia-container-toolkit) and a custom SELinux policy. Then we show an example of how to run a container with Podman so that the GPUs are accessible inside the container.
Shell scripts for the driver installation, the container hook setup, and the verification tests are provided on GitHub.
First, install the necessary container tools to run containers on the host.
# yum -y install podman
NVIDIA Driver Installation
NVIDIA drivers for RHEL must be installed on the host as a prerequisite for using GPUs in containers with podman. Let’s prepare the host by installing NVIDIA drivers and NVIDIA container enablement. See the install guide here.
NVIDIA drivers need to be compiled for the kernel in use. The build process requires the kernel-devel package to be installed.
# yum -y install kernel-devel-`uname -r` kernel-headers-`uname -r`
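Because the driver is built against the running kernel, the package versions have to match the output of uname -r exactly. A minimal sketch of how that could be checked before starting the driver installation; the function names are hypothetical and not part of any NVIDIA or Red Hat tooling:

```shell
# Hypothetical helper: print the kernel-devel and kernel-headers package
# names for a kernel release (defaults to the running kernel).
kernel_build_pkgs() {
    set -- "${1:-$(uname -r)}"
    printf 'kernel-devel-%s kernel-headers-%s\n' "$1" "$1"
}

# Hypothetical helper: succeed only if both packages are installed.
# Relies on rpm, so it is only meaningful on an RPM-based host.
kernel_build_pkgs_installed() {
    # Word splitting is intentional: one argument per package name.
    rpm -q $(kernel_build_pkgs "$@") >/dev/null 2>&1
}
```

Calling kernel_build_pkgs_installed without arguments checks the running kernel. If it fails after a kernel update, the host has most likely not been rebooted into the new kernel yet.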
The NVIDIA driver installation requires the DKMS package. DKMS is not supported or packaged by Red Hat, but it can be installed from the EPEL repository. Work is underway to improve the packaging of NVIDIA drivers for Red Hat Enterprise Linux.
First, install the EPEL repository. To install EPEL with DKMS on RHEL 7:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum -y install dkms
To install EPEL with DKMS on RHEL 8:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
# yum -y install dkms
The newest NVIDIA drivers are located in the following repository. To install the CUDA 10.2 repository on RHEL 7:
# yum -y install http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.2.89-1.x86_64.rpm
To install the CUDA 10.2 repository on RHEL 8:
# yum -y install http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-repo-rhel8-10.2.89-1.x86_64.rpm
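The RHEL 7 and RHEL 8 commands differ only in the major version number, so a provisioning script can derive both repository URLs from it. The sketch below uses a hypothetical function name; the URLs are the ones listed in this post:

```shell
# Hypothetical helper: print the EPEL and CUDA 10.2 repository package URLs
# used in this post for a given RHEL major version (7 or 8).
repo_rpm_urls() {
    case "$1" in
        7|8)
            printf '%s\n' \
                "https://dl.fedoraproject.org/pub/epel/epel-release-latest-$1.noarch.rpm" \
                "http://developer.download.nvidia.com/compute/cuda/repos/rhel$1/x86_64/cuda-repo-rhel$1-10.2.89-1.x86_64.rpm"
            ;;
        *)
            echo "unsupported RHEL major version: $1" >&2
            return 1
            ;;
    esac
}
```

On a RHEL 8 host, running yum -y install $(repo_rpm_urls 8) followed by yum -y install dkms should have the same effect as the separate commands above.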
Remove the nouveau kernel module; otherwise the nvidia kernel module will not load. The installation of the NVIDIA driver package blacklists the nouveau driver on the kernel command line (nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off), so that it will not be loaded on subsequent reboots.
# modprobe -r nouveau
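To confirm that nouveau is really gone (and, later, that the nvidia modules are present), we can look at the list of loaded modules. A minimal sketch, assuming the usual /proc/modules layout where the first field of each line is the module name; the function name is hypothetical:

```shell
# Hypothetical helper: succeed if the named kernel module is loaded.
# $1 = module name, $2 = module list to read (defaults to /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

For example, module_loaded nouveau && echo "nouveau is still loaded" can serve as a guard before continuing. Note that the kernel reports module names with underscores, so the unified memory module would be checked as nvidia_uvm.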
There are many CUDA tools and libraries. You can install the entire CUDA stack on the bare metal system:
# yum -y install cuda
Or, you can be more selective and install only the necessary device drivers.
# yum -y install xorg-x11-drv-nvidia xorg-x11-drv-nvidia-devel kmod-nvidia-latest-dkms
Load the NVIDIA and the unified memory kernel modules.
# nvidia-modprobe && nvidia-modprobe -u
Verify that the installation and the drivers are working on the host system.
# nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'
Tesla-V100-SXM2-16GB
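The sed expression in the verification command only replaces spaces with dashes, which turns the GPU name into a single token that is easy to compare in scripts. A small sketch of the same pipeline as reusable functions; the function names are hypothetical, and the raw name "Tesla V100-SXM2-16GB" is an assumption inferred from the output shown above:

```shell
# Hypothetical helper: turn GPU names read from stdin into single tokens
# by replacing every space with a dash (same sed expression as above).
gpu_name_label() {
    sed -e 's/ /-/g'
}

# Hypothetical helper: print the label of GPU 0, or fail with a message
# when nvidia-smi is not installed on this host.
first_gpu_label() {
    command -v nvidia-smi >/dev/null 2>&1 || {
        echo "nvidia-smi not found; is the driver installed?" >&2
        return 1
    }
    nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | gpu_name_label
}
```

Running printf 'Tesla V100-SXM2-16GB\n' | gpu_name_label shows the transformation without needing a GPU.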
Adding the nvidia-container-runtime-hook
Podman includes support for OCI runtime hooks for configuring custom actions related to the lifecycle of the container. OCI hooks allow users to specify programs to run at various stages in the container lifecycle. Because of this, we only need to install the nvidia-container-toolkit package. See NVIDIA’s documentation for more information.
The next step is to add the libnvidia-container and nvidia-container-runtime repositories:
# distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | tee /etc/yum.repos.d/nvidia-docker.repo
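The distribution variable is simply the ID and VERSION_ID fields of /etc/os-release joined together, for example rhel8.1. A sketch that builds the repository URL from an os-release style file, which makes it easy to inspect the URL before downloading anything; the function name is hypothetical:

```shell
# Hypothetical helper: print the nvidia-docker repo URL for the distribution
# described by an os-release style file (defaults to /etc/os-release).
nvidia_repo_url() {
    # The subshell keeps the sourced variables out of the caller's environment.
    (
        . "${1:-/etc/os-release}" || exit 1
        printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.repo\n' "$ID$VERSION_ID"
    )
}
```

If curl returns an error page instead of a repo file, printing the URL this way is a quick check of whether a repository is published for that exact distribution string.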
The next step installs an OCI prestart hook. The prestart hook is responsible for making NVIDIA libraries and binaries available in a container (by bind-mounting them in from the host). Without the hook, users would have to include the libraries and binaries in each and every container image that might use a GPU. Hooks minimize container size and simplify management of container images by ensuring that only a single copy of the libraries and binaries is required. The prestart hook is triggered by the presence of certain environment variables in the container, for example NVIDIA_DRIVER_CAPABILITIES=compute,utility.
# yum -y install nvidia-container-toolkit
Installing the toolkit sets up the hook and installs the packages libnvidia-container1 and libnvidia-container-tools.
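To see whether the hook was registered, we can look for a hook definition that mentions the toolkit. This is a sketch under two assumptions: that the hook definitions live as JSON files in /usr/share/containers/oci/hooks.d (the directory used with --hooks-dir later in this post), and that the NVIDIA definition refers to the nvidia-container-toolkit binary by name. The function name is hypothetical:

```shell
# Hypothetical helper: succeed if any hook definition in the directory
# mentions nvidia-container-toolkit.
# $1 = hooks directory (defaults to the system-wide location).
nvidia_hook_present() {
    grep -l -s 'nvidia-container-toolkit' \
        "${1:-/usr/share/containers/oci/hooks.d}"/*.json >/dev/null 2>&1
}
```

A failing nvidia_hook_present after installing the package would suggest that the hook file landed somewhere else, which is worth checking before debugging the container itself.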
Adding the SELinux policy module
NVIDIA provides a custom SELinux policy to make it easier to access GPUs from within containers, while still maintaining isolation. To run NVIDIA containers contained and not privileged, we have to install an SELinux policy tailored for running CUDA GPU workloads. The policy creates a new SELinux type (nvidia_container_t) with which the container will be running.
Furthermore, we can drop all capabilities and prevent privilege escalation. See the invocation below for a glimpse of how to start an NVIDIA container.
First install the SELinux policy module.
# wget https://raw.githubusercontent.com/NVIDIA/dgx-selinux/master/bin/RHEL7/nvidia-container.pp
# semodule -i nvidia-container.pp
Check and restore the labels
The SELinux policy heavily relies on the correct labeling of the host. Therefore we have to make sure that the files that are needed have the correct SELinux label.
Run restorecon on all files that the prestart hook will need:
# nvidia-container-cli -k list | restorecon -v -f -
Run restorecon on all accessed devices:
# restorecon -Rv /dev
Verify functionality of SELinux and prestart hook
To verify that the drivers and container tools are configured correctly, try running a cuda-vector-add container. We can run the container with docker or podman.
# podman run --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL \
    --security-opt label=type:nvidia_container_t \
    docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
If the test passes, the drivers, hooks and the container runtime are functioning correctly.
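The same three hardening options (no new privileges, all capabilities dropped, and the nvidia_container_t SELinux type) are needed for every confined GPU container, so it can be convenient to keep them in one place. A minimal sketch of such a wrapper; the function name and the PODMAN override variable are hypothetical conveniences, not podman features:

```shell
# Hypothetical wrapper: run a container with the confinement options used
# in this post. Set PODMAN=echo to print the arguments instead of running.
podman_gpu_run() {
    "${PODMAN:-podman}" run \
        --security-opt=no-new-privileges \
        --cap-drop=ALL \
        --security-opt label=type:nvidia_container_t \
        "$@"
}
```

With this in place, the test above becomes podman_gpu_run --user 1000:1000 docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1, and PODMAN=echo podman_gpu_run ... shows what would be executed.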
Try it out with GPU accelerated PyTorch
An interesting application of GPUs is accelerated machine learning training. We can use the PyTorch framework to train a neural network model to recognize handwritten digits from the MNIST dataset, taking advantage of GPU parallelization to accelerate the computation.
Download the python code into a directory
# mkdir pytorch_mnist_ex && cd pytorch_mnist_ex
# wget https://raw.githubusercontent.com/pytorch/examples/master/mnist/main.py
As written, this example tries to find a GPU and falls back to the CPU if it does not find one. We want it to fail with a useful error if it cannot access a GPU, so we make the following modification to the file with sed:
# sed -i '98 s/("cuda.*$/("cuda")/' main.py
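The sed command above addresses line 98, which only works as long as the upstream file does not change. A more tolerant sketch applies the same substitution to whichever line calls torch.device. It assumes that the line in question looks like device = torch.device("cuda" if use_cuda else "cpu"); the function name is hypothetical:

```shell
# Hypothetical helper: force the device selection in the given file to
# "cuda", matching the line by content instead of by line number.
force_cuda_device() {
    sed '/torch\.device(/ s/("cuda.*$/("cuda")/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

After running force_cuda_device main.py, grep -n 'torch.device' main.py should show the rewritten line. If it shows nothing, the upstream file has changed and the edit needs to be revisited by hand.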
Then run the training in a container:
# podman run --rm --net=host -v $(pwd):/workspace:Z \
    --security-opt=no-new-privileges \
    --cap-drop=ALL --security-opt label=type:nvidia_container_t \
    docker.io/pytorch/pytorch:latest \
    python3 main.py --epochs=3
If everything is configured correctly, the output starts with some lines about downloading and extracting the dataset, followed by the training progress for 3 epochs, which should train the model to about 99% accuracy.
9920512it [00:00, 15186754.95it/s]
32768it [00:00, 286982.84it/s]
1654784it [00:00, 3721114.36it/s]
8192it [00:00, 56946.85it/s]
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw
...
...
Done!
Train Epoch: 1 [0/60000 (0%)] Loss: 2.333409
Train Epoch: 1 [640/60000 (1%)] Loss: 1.268057
Train Epoch: 1 [1280/60000 (2%)] Loss: 0.859086
Train Epoch: 1 [1920/60000 (3%)] Loss: 0.609263
Train Epoch: 1 [2560/60000 (4%)] Loss: 0.389265
Train Epoch: 1 [3200/60000 (5%)] Loss: 0.426565
...
If the training runs, the drivers, hooks and the container runtime are functioning correctly. Otherwise, there will be an error about no CUDA-capable devices detected.
In RHEL 8.1 and later, you can run containers rootless with podman. To use GPUs in rootless containers, modify /etc/nvidia-container-runtime/config.toml and change these values:
[nvidia-container-cli]
#no-cgroups = false
no-cgroups = true
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
debug = "~/.local/nvidia-container-runtime.log"
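The two changes can also be scripted. The sketch below prints a modified copy of the configuration to standard output instead of editing the file in place, so the result can be reviewed first. It assumes a config file in which each of the two settings appears exactly once, either commented out or active; the function name is hypothetical:

```shell
# Hypothetical helper: print the config with the two rootless settings
# applied. $1 = config file (defaults to the system-wide location).
rootless_config() {
    sed \
        -e 's|^#\{0,1\}no-cgroups *=.*|no-cgroups = true|' \
        -e 's|^#\{0,1\}debug *= *"/var/log/nvidia-container-runtime\.log"|debug = "~/.local/nvidia-container-runtime.log"|' \
        "${1:-/etc/nvidia-container-runtime/config.toml}"
}
```

One way to use it is rootless_config > /tmp/config.toml, then compare the result with the original using diff before copying it into place as root. Because the original is left untouched, it is also easy to go back to the root-only settings later.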
As a non-root user, the system hooks are not used by default, so you need to set the --hooks-dir option in the podman run command. The following should allow you to run nvidia-smi in a rootless podman container:
$ podman run --security-opt=no-new-privileges --cap-drop=ALL --security-opt \
    label=type:nvidia_container_t --hooks-dir=/usr/share/containers/oci/hooks.d/ \
    docker.io/nvidia/cuda:10.2-base nvidia-smi
Please note that there is ongoing work with GPU access in rootless containers. The above steps should work, but may need to be reverted to use GPUs in containers run as root.
Relevant GitHub issues can be found at: