
This post updates previous posts by Zvonko Kaiser about using the nvidia-container-runtime-hook to access GPUs in containers on bare metal using Podman and in OpenShift. This post will show how to access NVIDIA GPUs in containers run with Podman, on a host running RHEL 7 or RHEL 8.

Using GPUs in containers is easier than you think. We start by setting up the host with the necessary NVIDIA drivers and CUDA software, a container runtime hook (nvidia-container-toolkit) and a custom SELinux policy. Then we show an example of how to run a container with Podman so that the GPUs are accessible inside the container.

Shell scripts for the driver installation, container hook setup, as well as verification tests are provided on GitHub.

Host Preparation

First, install the necessary container tools to run containers on the host.

# yum -y install podman

NVIDIA Driver Installation

NVIDIA drivers for RHEL must be installed on the host as a prerequisite for using GPUs in containers with podman. Let’s prepare the host by installing NVIDIA drivers and NVIDIA container enablement. See the install guide here.

NVIDIA drivers need to be compiled for the kernel in use. The build process requires the kernel-devel and kernel-headers packages that match the running kernel.

# yum -y install kernel-devel-`uname -r` kernel-headers-`uname -r`
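
The package names must match the running kernel exactly, or the module build will target the wrong kernel. A minimal sketch of how those names are derived, assuming a hypothetical helper called kernel_build_pkgs (it is not part of any NVIDIA or Red Hat tooling):

```shell
# Hypothetical helper: print the package names that match a given kernel release.
# With no argument it uses the running kernel, as reported by `uname -r`.
kernel_build_pkgs() {
    release="${1:-$(uname -r)}"
    printf 'kernel-devel-%s kernel-headers-%s\n' "$release" "$release"
}

# Example (as root): install the packages only if they are missing.
#   rpm -q $(kernel_build_pkgs) || yum -y install $(kernel_build_pkgs)
```

Passing a release string explicitly is useful when preparing packages for a kernel you plan to boot into later.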

The NVIDIA driver installation requires the DKMS package. DKMS is not supported or packaged by Red Hat. Work is underway to improve the packaging of NVIDIA drivers for Red Hat Enterprise Linux. DKMS can be installed from the EPEL repository.

First install the EPEL repository. To install EPEL with DKMS on RHEL 7:

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum -y install dkms

To install EPEL with DKMS on RHEL 8:

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
# yum -y install dkms

The newest NVIDIA drivers are located in the following repository. To install the CUDA 10.2 repository on RHEL 7:

# yum -y install http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.2.89-1.x86_64.rpm

To install the CUDA 10.2 repository on RHEL 8:

# yum -y install http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-repo-rhel8-10.2.89-1.x86_64.rpm
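
The RHEL 7 and RHEL 8 commands differ only in the major version number embedded in the URLs. A rough sketch that selects the right pair, assuming a hypothetical helper called repo_rpm_urls and using only the URLs shown in this post:

```shell
# Hypothetical helper: print the EPEL and CUDA 10.2 repository package URLs
# for a RHEL major version (7 or 8). The URLs are the ones used in this post.
repo_rpm_urls() {
    case "$1" in
        7|8)
            printf '%s\n' \
                "https://dl.fedoraproject.org/pub/epel/epel-release-latest-$1.noarch.rpm" \
                "http://developer.download.nvidia.com/compute/cuda/repos/rhel$1/x86_64/cuda-repo-rhel$1-10.2.89-1.x86_64.rpm"
            ;;
        *)
            echo "unsupported RHEL major version: $1" >&2
            return 1
            ;;
    esac
}

# Example (as root), deriving the major version from /etc/os-release:
#   major=$(. /etc/os-release; echo "${VERSION_ID%%.*}")
#   yum -y install $(repo_rpm_urls "$major") && yum -y install dkms
```

The helper rejects any other version rather than guessing a URL that may not exist.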

Remove the nouveau kernel module; otherwise the nvidia kernel module will not load. Installing the NVIDIA driver package blacklists the nouveau driver on the kernel command line (nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off), so that nouveau is not loaded on subsequent reboots.

# modprobe -r nouveau
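
On hosts where nouveau was never loaded, it can be useful to check before unloading. A minimal sketch, assuming a hypothetical helper called module_loaded that reads /proc/modules, where the module name is the first field of each line:

```shell
# Hypothetical helper: succeed if a kernel module appears in a modules listing.
# The listing defaults to /proc/modules; a different file can be passed for testing.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example (as root): unload nouveau only if it is currently loaded.
#   if module_loaded nouveau; then modprobe -r nouveau; fi
```

The same check can be reused later to confirm that the nvidia module is present after the driver installation.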

There are many CUDA tools and libraries. You can install the entire CUDA stack on the bare metal system:

# yum -y install cuda

Or, you can be more selective and install only the necessary device drivers.

# yum -y install xorg-x11-drv-nvidia xorg-x11-drv-nvidia-devel kmod-nvidia-latest-dkms

Load the NVIDIA and the unified memory kernel modules.

# nvidia-modprobe && nvidia-modprobe -u

Verify that the installation and the drivers are working on the host system.

# nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'
Tesla-V100-SXM2-16GB
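
The sed expression in the verification command only replaces spaces with hyphens, which yields a name that is safe to use as a label or in a file name. A small sketch of that step in isolation, with hyphenate as a hypothetical function name:

```shell
# Replace every space in each input line with a hyphen, as the sed
# expression in the verification command does.
hyphenate() {
    sed -e 's/ /-/g'
}

# Example on a host with working drivers:
#   nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | hyphenate
```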

Adding the nvidia-container-runtime-hook

Podman includes support for OCI runtime hooks for configuring custom actions related to the lifecycle of the container. OCI hooks allow users to specify programs to run at various stages in the container lifecycle. Because of this, we only need to install the nvidia-container-toolkit package. See NVIDIA’s documentation for more information.

The next step is to add the libnvidia-container and nvidia-container-runtime repositories:

# distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | tee /etc/yum.repos.d/nvidia-docker.repo
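
The distribution variable is built by sourcing /etc/os-release and concatenating its ID and VERSION_ID fields. A minimal sketch of that evaluation, assuming a hypothetical helper called distribution_of that accepts an alternative file so the result can be inspected without touching the system file:

```shell
# Hypothetical helper: compute the $ID$VERSION_ID string from an os-release file.
# The file is sourced in a subshell so its variables do not leak into the caller.
distribution_of() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Example: print the value that is substituted into the repository URL.
#   distribution_of
```

Printing the value before running curl makes it easier to spot a distribution string for which no repository file is published.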

The next step will install an OCI prestart hook. The prestart hook is responsible for making NVIDIA libraries and binaries available in a container (by bind-mounting them in from the host). Without the hook, users would have to include the libraries and binaries in each and every container image that might use a GPU. Hooks minimize container size and simplify management of container images by ensuring that only a single copy of the libraries and binaries is required. The prestart hook is triggered by the presence of certain environment variables in the container, for example NVIDIA_DRIVER_CAPABILITIES=compute,utility.

# yum -y install nvidia-container-toolkit

Installing the toolkit sets up the hook and installs the packages libnvidia-container1 and libnvidia-container-tools.
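
To confirm that a hook definition was registered, you can look for JSON files in the hooks directory that mention the prestart stage. This is a rough sketch: prestart_hooks is a hypothetical helper, the default directory is the one passed to --hooks-dir later in this post, and the assumption that the hook file contains the string "prestart" is not confirmed by the package documentation.

```shell
# Hypothetical helper: list hook definitions in a directory that mention the
# prestart stage. The default directory is the one used later in this post.
prestart_hooks() {
    dir="${1:-/usr/share/containers/oci/hooks.d}"
    grep -l '"prestart"' "$dir"/*.json 2>/dev/null
}

# Example: print the matching hook files, or a message if none are found.
#   prestart_hooks || echo "no prestart hook definitions found"
```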

Adding the SELinux policy module

NVIDIA provides a custom SELinux policy to make it easier to access GPUs from within containers, while still maintaining isolation. To run NVIDIA containers contained and not privileged, we have to install an SELinux policy tailored for running CUDA GPU workloads. The policy creates a new SELinux type (nvidia_container_t) with which the container will be running.

Furthermore, we can drop all capabilities and prevent privilege escalation. See the invocation below for an example of how to start an NVIDIA container.

First install the SELinux policy module.

# wget https://raw.githubusercontent.com/NVIDIA/dgx-selinux/master/bin/RHEL7/nvidia-container.pp
# semodule -i nvidia-container.pp

Check and restore the labels

The SELinux policy heavily relies on the correct labeling of the host. Therefore we have to make sure that the files that are needed have the correct SELinux label.

  1. Restorecon all files that the prestart hook will need

# nvidia-container-cli -k list | restorecon -v -f -

  2. Restorecon all accessed devices

# restorecon -Rv /dev

Everything is now set up for running a GPU-enabled container on this host.

Verify functionality of SELinux and prestart hook

To verify that the drivers and container tools are configured correctly, try running a cuda-vector-add container. We can run the container with docker or podman.

# podman run --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL \
--security-opt label=type:nvidia_container_t  \
docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1

If the test passes, the drivers, hooks and the container runtime are functioning correctly.
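
The same three hardening options appear in every GPU container invocation in this post. A small sketch that keeps them in one place, with gpu_security_flags as a hypothetical function name:

```shell
# Hypothetical helper: print the hardening flags shared by the GPU container
# invocations in this post, one word per line.
gpu_security_flags() {
    printf '%s\n' \
        --security-opt=no-new-privileges \
        --cap-drop=ALL \
        --security-opt label=type:nvidia_container_t
}

# Example: the unquoted substitution is split into separate arguments.
#   podman run --user 1000:1000 $(gpu_security_flags) \
#       docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
```

Leaving the substitution unquoted is deliberate here, because each flag must reach podman as its own argument.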

Try it out with GPU accelerated PyTorch

An interesting application of GPUs is accelerated machine learning training. We can use the PyTorch framework to train a neural network model to recognize handwritten digits from the MNIST dataset, taking advantage of GPU parallelization to accelerate the computation.

Download the python code into a directory

# mkdir pytorch_mnist_ex && cd pytorch_mnist_ex
# wget https://raw.githubusercontent.com/pytorch/examples/master/mnist/main.py

As written, this example tries to find a GPU and, if it does not find one, falls back to the CPU. We want it to fail with a useful error if it cannot access a GPU, so we make the following modification to the file with sed:

# sed -i '98 s/("cuda.*$/("cuda")/' main.py
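
The line address 98 depends on the version of main.py that was downloaded, so it is worth seeing what the substitution does to a single line. In this sketch, force_cuda is a hypothetical helper, and the sample line in the usage comment is an assumption about what the upstream file contains:

```shell
# Apply the same substitution as the sed command above, but to a single
# line passed as an argument instead of line 98 of main.py.
force_cuda() {
    printf '%s\n' "$1" | sed 's/("cuda.*$/("cuda")/'
}

# Example with an assumed device-selection line:
#   force_cuda 'device = torch.device("cuda" if use_cuda else "cpu")'
```

If the device selection has moved to a different line in your copy of main.py, grep -n 'torch.device' main.py shows the line number to use in place of 98.
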

Run the training

# podman run --rm --net=host -v $(pwd):/workspace:Z \
--security-opt=no-new-privileges \
--cap-drop=ALL --security-opt label=type:nvidia_container_t \
docker.io/pytorch/pytorch:latest \
python3 main.py --epochs=3

The expected output if everything is configured correctly is first some lines about downloading and extracting the dataset, and then output of how the training is progressing for 3 epochs, which should train the model to about 99% accuracy.

9920512it [00:00, 15186754.95it/s]
32768it [00:00, 286982.84it/s]
1654784it [00:00, 3721114.36it/s]
8192it [00:00, 56946.85it/s]    
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw
...
...
Done!

Train Epoch: 1 [0/60000 (0%)]  Loss: 2.333409
Train Epoch: 1 [640/60000 (1%)]  Loss: 1.268057
Train Epoch: 1 [1280/60000 (2%)]  Loss: 0.859086
Train Epoch: 1 [1920/60000 (3%)]  Loss: 0.609263
Train Epoch: 1 [2560/60000 (4%)]  Loss: 0.389265
Train Epoch: 1 [3200/60000 (5%)]  Loss: 0.426565
...

If the training runs, the drivers, hooks and the container runtime are functioning correctly. Otherwise, there will be an error about no CUDA-capable devices detected.

Running rootless

In RHEL 8.1 and later, you can run containers rootless with Podman. To use GPUs in rootless containers you need to modify /etc/nvidia-container-runtime/config.toml and change these values:

[nvidia-container-cli]
#no-cgroups = false
no-cgroups = true

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
debug = "~/.local/nvidia-container-runtime.log"
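
The no-cgroups change can also be scripted. This is a minimal sketch, assuming a hypothetical helper called set_no_cgroups and a config file that contains a single no-cgroups line (commented out or not); it keeps a .bak copy so the change can be reverted for containers run as root:

```shell
# Hypothetical helper: set no-cgroups = true in an nvidia-container-runtime
# config file, whether or not the existing line is commented out.
# A copy of the original is kept next to the file with a .bak suffix.
set_no_cgroups() {
    file="$1"
    cp -p "$file" "$file.bak" &&
        sed 's/^#*no-cgroups *=.*/no-cgroups = true/' "$file.bak" > "$file"
}

# Example (as root):
#   set_no_cgroups /etc/nvidia-container-runtime/config.toml
```

Writing back into the existing file, rather than moving a new file over it, leaves the file's ownership and permissions untouched.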

As a non-root user the system hooks are not used by default, so you need to set the --hooks-dir option in the podman run command. The following should allow you to run nvidia-smi in a rootless podman container:

$ podman run --security-opt=no-new-privileges --cap-drop=ALL --security-opt \
label=type:nvidia_container_t --hooks-dir=/usr/share/containers/oci/hooks.d/ \
docker.io/nvidia/cuda:10.2-base nvidia-smi

Please note that there is ongoing work with GPU access in rootless containers. The above steps should work, but may need to be reverted to use GPUs in containers run as root.

Relevant GitHub issues can be found at: