
One of the more liberating features of virtual machine infrastructure is the ability to control the resources consumed by virtual servers without having to perform hardware maintenance or change server load-outs. A great example of this is the use of volume snapshots to capture a point-in-time backup of a VM without having to run a backup client in its OS. Have a worrisome update you need to perform but would like the ability to roll back in case things go poorly? Take a snapshot first and you have a rollback procedure. Have a legacy server in production on which you would like to test a configuration change before committing to it? Take a snapshot, clone the VM to a development network segment, and test with impunity.

OpenShift provides robust snapshot capabilities for virtual machines running in OpenShift Virtualization by extending the base OpenShift snapshot features to include guest OS coordination and multi-disk management.

Snapshots in OpenShift

The Container Storage Interface (CSI) in OpenShift provides snapshot functionality for persistent volumes by defining an interface through which third-party storage drivers can expose and manage their native snapshot capabilities. This keeps the implementation of snapshots within the respective storage system and lets any of an ecosystem of back-end storage providers plug into the OpenShift Container Platform. To support snapshots, there must be a VolumeSnapshotClass whose driver matches the provisioner of the StorageClass used for virtual machine disks.

In this blog, I will be using Trident, NetApp's CSI driver for Kubernetes, with the following StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-svm
mountOptions:
  - nfsvers=4.1
parameters:
  backendType: ontap-nas
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

along with the following VolumeSnapshotClass:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete

Once the appropriate StorageClass and VolumeSnapshotClass are set up, a snapshot of a PVC may be requested by creating a VolumeSnapshot that points to the source PVC. For example, a snapshot of the volume test01 would look like:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test01-snapshot
spec:
  source:
    persistentVolumeClaimName: test01
  volumeSnapshotClassName: csi-snapclass
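Before relying on a snapshot, it helps to confirm that the storage back end has finished creating it. The following is a minimal sketch, assuming the oc client is already logged in to the cluster. The function name snapshot_ready and the OC override variable are hypothetical helpers introduced here for illustration; status.readyToUse is the standard status field on a VolumeSnapshot.

```shell
#!/bin/bash
# OC is a hypothetical override so the client binary can be swapped (e.g. for kubectl).
OC="${OC:-oc}"

# Succeeds (exit 0) only when the named VolumeSnapshot reports readyToUse=true.
snapshot_ready() {
    local name="$1"
    local ready
    ready="$("$OC" get volumesnapshot "$name" -o jsonpath='{.status.readyToUse}')"
    [ "$ready" = "true" ]
}
```

For the example above, this could be called as: snapshot_ready test01-snapshot && echo "snapshot is ready"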

OpenShift Virtualization makes use of VolumeSnapshots to provide snapshots of virtual machines. The custom resource VirtualMachineSnapshot creates corresponding VolumeSnapshots for all supported volumes in the VirtualMachine, and may be taken with the guest in either the stopped or running state. Here is an example of a VirtualMachineSnapshot against a Microsoft Windows VM called win2k19-profound-gecko:

apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: win2k19-profound-gecko-2022-2-4
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: win2k19-profound-gecko
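Since a VirtualMachineSnapshot only varies by the VM name and the snapshot name, the manifest is easy to generate from a script. The sketch below is one way to do that, not the only one. The function render_vm_snapshot and its date-suffix naming scheme are assumptions made for illustration, and the apiVersion is copied from the example above, so it may need adjusting on other releases.

```shell
#!/bin/bash
# Hypothetical helper: prints a VirtualMachineSnapshot manifest for a VM.
#   $1 = VirtualMachine name
#   $2 = optional name suffix (defaults to today's date, YYYY-MM-DD)
render_vm_snapshot() {
    local vm_name="$1"
    local stamp="${2:-$(date +%Y-%m-%d)}"
    cat <<EOF
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: ${vm_name}-${stamp}
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm_name}
EOF
}
```

The output can be piped straight to the cluster, for example: render_vm_snapshot win2k19-profound-gecko | oc apply -f -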

Requirements

Live VM snapshots require the guest OS to have the QEMU guest agent installed and running in order to freeze writes to the file system during the snapshot process. For I/O-intensive applications like databases, special subsystems exist to ensure writes are not attempted during snapshots. In Linux, the fsfreeze command can be invoked by the QEMU guest agent to halt activity on file systems, while under Microsoft Windows, the Volume Shadow Copy Service (VSS) is informed by the QEMU Guest Agent VSS Provider when a snapshot occurs. If the guest agent is not running, the guest OS will not be given a chance to freeze operation, and corruption could result if that snapshot is used during a restore.

For RPM-based Linux distributions, the qemu-guest-agent package provides the necessary guest agent. This package is installed by default in cloud images for Red Hat Enterprise Linux (RHEL), CentOS, and Fedora. For Microsoft Windows VMs, the guest agent may be installed using the Windows driver disk available through the OpenShift Virtualization web console. See the OpenShift Virtualization documentation for more details on installing the QEMU guest agent on either set of operating systems.
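Because a live snapshot taken without a working guest agent is not file-system consistent, it can be worth checking from the cluster side that the agent is connected before taking one. The sketch below assumes the running VM's VirtualMachineInstance exposes an AgentConnected condition, which is how KubeVirt reports this to my knowledge. The function name guest_agent_connected and the OC variable are hypothetical helpers.

```shell
#!/bin/bash
# OC is a hypothetical override so the client binary can be swapped (e.g. for kubectl).
OC="${OC:-oc}"

# Succeeds only when the VirtualMachineInstance reports AgentConnected=True.
guest_agent_connected() {
    local vmi="$1"
    local status
    status="$("$OC" get vmi "$vmi" \
        -o jsonpath='{.status.conditions[?(@.type=="AgentConnected")].status}')"
    [ "$status" = "True" ]
}
```

Inside a Linux guest, systemctl is-active qemu-guest-agent gives a similar answer from the other side.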

Finally, if the VM has hot-plugged volumes, live snapshots will not be possible.

Management of Snapshots from the OpenShift Web Console

In the OpenShift console, each VirtualMachine under Workloads -> Virtualization has a Snapshots tab. From here, it is possible to take snapshots, clean up old snapshots by deleting them, or restore from a snapshot. See the OpenShift Virtualization documentation on virtual machine snapshots for details.

To take a snapshot, click the Take Snapshot button:

[Screenshot: take_snapshot]

Once the snapshot triggers and completes, it will be displayed in the list of snapshots for the VM:

[Screenshot: win_vm_with_snapshot]
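The restore action in the console also has a manifest-based equivalent, the VirtualMachineRestore resource. The sketch below renders one. The function render_vm_restore and the "-restore" naming are hypothetical, and the apiVersion simply mirrors the VirtualMachineSnapshot example earlier in this post. In the releases I am aware of, the VM has to be powered off before a restore will proceed, so check the documentation for your version.

```shell
#!/bin/bash
# Hypothetical helper: prints a VirtualMachineRestore manifest.
#   $1 = VirtualMachine name to restore
#   $2 = name of the VirtualMachineSnapshot to restore from
render_vm_restore() {
    local vm_name="$1"
    local snapshot_name="$2"
    cat <<EOF
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: ${vm_name}-restore
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm_name}
  virtualMachineSnapshotName: ${snapshot_name}
EOF
}
```

For example: render_vm_restore win2k19-profound-gecko win2k19-profound-gecko-2022-2-4 | oc apply -f -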

Verifying the fsfreeze in Linux

In Linux, a snapshot manifests in the guest OS as a set of system logs generated by the qemu-ga daemon. For example:

Feb 11 00:12:34 rhel8-brainy-bear qemu-ga[824]: info: guest-fsfreeze called
Feb 11 00:12:34 rhel8-brainy-bear qemu-ga[824]: info: executing fsfreeze hook with arg 'freeze'
Feb 11 00:12:35 rhel8-brainy-bear qemu-ga[824]: info: executing fsfreeze hook with arg 'thaw'
Feb 11 00:12:35 rhel8-brainy-bear qemu-ga[824]: info: executing fsfreeze hook with arg 'thaw'
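To check for these messages without reading the log by eye, a small filter can confirm that both a freeze and a thaw were recorded. This is a sketch that matches only the message text shown above. The function name saw_freeze_and_thaw is hypothetical, and other qemu-ga versions may word their messages differently.

```shell
#!/bin/bash
# Reads log text on stdin; succeeds only if both a freeze hook and a
# thaw hook message appear somewhere in the input.
saw_freeze_and_thaw() {
    local log
    log="$(cat)"
    printf '%s\n' "$log" | grep -q "fsfreeze hook with arg 'freeze'" &&
        printf '%s\n' "$log" | grep -q "fsfreeze hook with arg 'thaw'"
}
```

On a guest that logs to journald, one way to feed it is: journalctl -t qemu-ga --since "10 minutes ago" | saw_freeze_and_thaw && echo "freeze/thaw cycle seen"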

In case an application needs better integration with the fsfreeze functionality, there is a way to call hooks during freeze and thaw events. Any executable script placed in /etc/qemu-ga/fsfreeze-hook.d will be executed with the command-line argument freeze or thaw. Just for illustration purposes, the following script will write a message to all logged in users when a freeze or thaw operation happens:

/etc/qemu-ga/fsfreeze-hook.d/alert.sh

#!/bin/bash
# Broadcast the hook argument (freeze or thaw) to all logged-in users.
echo "$@" | wall

Note: The freeze and thaw functionality is implemented in an idempotent manner, so a hook may be invoked more than once for the same event. Care should be taken to be sure any additional hooks do not produce side effects if they are run more than once.
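One way to make a hook safe against repeated calls is to record the current state in a marker file and skip the work when the state has not changed. The following is a sketch of that pattern only. STATE_FILE, handle_event, on_freeze, and on_thaw are all hypothetical names, and the two placeholder functions stand in for whatever the application needs to do to quiesce and resume.

```shell
#!/bin/bash
# Hypothetical marker file recording that a freeze has already been handled.
STATE_FILE="${STATE_FILE:-/run/fsfreeze-hook.state}"

# Placeholders for the application-specific work.
on_freeze() { echo "quiescing application"; }
on_thaw()   { echo "resuming application"; }

# Runs the freeze or thaw action at most once per state change.
handle_event() {
    case "$1" in
        freeze)
            if [ ! -e "$STATE_FILE" ]; then
                on_freeze
                touch "$STATE_FILE"
            fi
            ;;
        thaw)
            if [ -e "$STATE_FILE" ]; then
                on_thaw
                rm -f "$STATE_FILE"
            fi
            ;;
        *)
            echo "usage: $0 freeze|thaw" >&2
            return 1
            ;;
    esac
}

# When installed as a hook, qemu-ga passes freeze or thaw as the first argument.
if [ "$#" -gt 0 ]; then
    handle_event "$1"
fi
```

With this in place, a second thaw like the one in the log output above becomes a no-op instead of resuming the application twice.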

Verifying VSS in Windows

For a Microsoft Windows VM with the QEMU Guest Agent running, it is possible to track the progress of the VSS in the System event log:

[Screenshot: VSS_running]

The QEMU Guest Agent VSS Provider also records an event as it starts:

[Screenshot: QEMU_VSS_provider_running]

The Volume Shadow Copy Service provides an API to ensure that I/O performed by applications in Windows does not coincide with snapshots. For a comprehensive list of VSS-aware applications included with Microsoft Windows, see Microsoft's list of In-Box VSS Writers. Not included in this list is Microsoft SQL Server, which also supports VSS.

Conclusion

Whether you're running Linux or Windows VMs, OpenShift Virtualization provides a way to keep your data safe. This blog has only scratched the surface of what is possible with VM snapshots.