What is CSI migration, and why does it matter?

Historically, Kubernetes storage drivers have been delivered as “in-tree” plugins, where the drivers lived in and were shipped as part of the Kubernetes core payload.

Over the past few years, the in-tree approach has given way to a new standard, the Container Storage Interface (CSI), under which the driver’s code lives outside of Kubernetes. New features and improvements aside, this gives storage vendors the flexibility and freedom to fix, improve, and release their drivers independently of the Kubernetes release cycle.

The in-tree drivers are progressively being deprecated upstream and will ultimately be removed. To provide a transition path for clusters that currently use in-tree persistent volumes (PVs), a CSI migration path is offered. The process is transparent to end users and applications: the volumes still appear to be in-tree, but the internal calls are transparently translated to CSI.
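In practice, a migrated volume keeps its original in-tree spec, and the control plane marks it with the upstream pv.kubernetes.io/migrated-to annotation so that attach and mount calls are routed to the CSI driver. A minimal sketch over sample output (the PV name and datastore path are hypothetical; on a real cluster you would inspect oc get pv <name> -o yaml directly):

```shell
# Sample of "oc get pv pv-example -o yaml" on a cluster where CSI
# migration is active (hypothetical PV name and datastore path):
pv_yaml='apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
  annotations:
    pv.kubernetes.io/migrated-to: csi.vsphere.vmware.com
spec:
  vsphereVolume:
    volumePath: "[datastore1] kube/volume-1.vmdk"'

# The spec still shows the in-tree vsphereVolume source, but the
# migrated-to annotation shows which CSI driver now handles the calls:
printf '%s\n' "$pv_yaml" | sed -n 's/.*migrated-to: //p'
# prints: csi.vsphere.vmware.com
```

The annotation is managed by the control plane; administrators never set it themselves.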

OpenShift's approach to CSI migration

The OpenShift approach is to automatically enable CSI migration when the feature becomes generally available (GA) upstream. No action is required from administrators or end users, and the process is transparent.

Red Hat OpenShift supports CSI migration for the in-tree plugins it ships that have a supported CSI equivalent. Starting with OCP 4.14, CSI migration is supported and enabled for all drivers that OpenShift supports, which includes:

  • AWS EBS
  • Azure Disk
  • Azure File
  • GCP PD
  • RH-OSP Cinder

What about vSphere?

Long story short, vSphere CSI migration is supported and automatically enabled in OCP 4.14.

However, during OCP 4.13 pre-release testing, we discovered three important issues that could strongly impact customers using in-tree PVs. These issues are not always reproducible, but they are serious enough for Red Hat to take proactive measures.

One of the issues is in the Kubernetes kube-controller-manager (KCM), and the other two are in vSphere itself. These issues can result in volumes becoming unattachable after they have been migrated, impacting the workloads that rely on persistent storage.

In OCP 4.13, we took the decision to enable CSI migration only for new clusters and keep it off for upgraded clusters. Administrators can explicitly opt in and enable migration.

In OCP 4.14, the Kubernetes bug is fixed; in the meantime, we collaborated with VMware, who fixed the two other bugs in vSphere 7.0u3L+ and 8.0u2+.

These two versions are the ones officially recommended by VMware for CSI migration.

OpenShift's approach to vSphere CSI migration

How do we approach CSI migration?

As mentioned before, in OCP 4.13 we took the decision to enable migration only for new environments; this avoids any impact, as new clusters don’t have any existing in-tree PVs.

In OCP 4.14, CSI migration is enabled by default for all environments. While new environments are safe, existing clusters upgrading from either 4.12 or 4.13 can be at risk if they run on top of an unfixed vSphere version.

Since OCP does not control the underlying vSphere version, and in order to preserve workload stability, we added upgrade checks that prevent administrators from upgrading to 4.14 if the cluster meets all of the following conditions:

  • CSI migration is NOT already enabled
    AND
  • OCP is NOT running on vSphere 7.0u3L+ or 8.0u2+
    AND
  • We detect the presence of vSphere in-tree PVs

As such, clusters falling into this situation will be set to Upgradeable=False, and the following message will be reported by oc adm upgrade:

vSphere CSI migration will be enabled in OpenShift-4.14. Your cluster appears to be using in-tree vSphere volumes and is on a vSphere version that has CSI migration related bugs. See - https://access.redhat.com/node/7011683 for more information, before upgrading to 4.14.
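Before planning an upgrade, you can check whether a cluster is affected by filtering the PV list for the in-tree vSphere volume source. A minimal sketch over sample output (the PV names and datastore path are hypothetical; on a real cluster you would pipe oc get pv -o json instead):

```shell
# Sample of "oc get pv -o json" output: one in-tree vSphere PV and one
# volume already provisioned by the CSI driver (hypothetical names).
pvs='{"items":[
 {"metadata":{"name":"pv-legacy"},"spec":{"vsphereVolume":{"volumePath":"[ds1] kube/vol1.vmdk"}}},
 {"metadata":{"name":"pv-csi"},"spec":{"csi":{"driver":"csi.vsphere.vmware.com"}}}
]}'

# Count PVs that use the in-tree vsphereVolume source; a non-zero count
# means the upgrade check applies if the other conditions are also met.
printf '%s\n' "$pvs" | grep -c '"vsphereVolume"'
# prints: 1
```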

We understand that updating vSphere may not be as easy as it seems; other VMware workloads can run in parallel and OpenShift administrators don’t necessarily have control over the vSphere update policy.

For these reasons, we added an option to override the vSphere version check with an administrator acknowledgement and proceed with the upgrade. While this process is fully supported, we strongly recommend carefully reviewing the risks before going this route.

The following sections cover the details for each upgrade path.

It’s worth noting that new OCP 4.14 clusters don’t require vSphere 7.0u3L+ or 8.0u2+ because fresh environments, by definition, don’t have any in-tree PVs.

OpenShift 4.13 to 4.14 upgrades

OpenShift 4.13 to 4.14 upgrades are blocked for clusters that use in-tree PVs and don’t already have CSI migration enabled.

The safest way to unlock the upgrade is to update vSphere to the versions recommended by VMware (7.0u3L+ or 8.0u2+).

Alternatively, if updating vSphere is not possible, an administrator can provide an acknowledgement and proceed with the upgrade:

oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-127-vsphere-migration-in-4.14":"true"}}' --type=merge


While this process is fully supported, we strongly recommend carefully reviewing the risks before going this route.

OpenShift 4.12 to 4.14 upgrades

OpenShift 4.12 to 4.14 upgrades, also called EUS-to-EUS upgrades, are blocked for clusters using in-tree PVs.

The safest way to unlock the upgrade is to update vSphere to the versions recommended by VMware (7.0u3L+ or 8.0u2+).

Alternatively, if updating vSphere is not possible, an administrator can provide an acknowledgement and proceed with the upgrade. In this specific case, two separate acks are required:

oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.12-kube-126-vsphere-migration-in-4.14":"true"}}' --type=merge

oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-127-vsphere-migration-in-4.14":"true"}}' --type=merge


While this process is fully supported, we strongly recommend carefully reviewing the risks before going this route.

OpenShift 4.12 to 4.13 upgrades

OpenShift 4.12 to 4.13 upgrades are blocked for clusters using in-tree PVs and not running on top of vSphere 7.0u3L+ or 8.0u2+.

This may sound counterintuitive, because CSI migration is not enabled by default in 4.13; it happens because the mechanism introduced to block 4.12 to 4.14 upgrades also blocks 4.12 to 4.13 upgrades.

Since CSI migration is not enabled in 4.13 for upgraded clusters, it is safe to provide an administrator acknowledgement and proceed with the upgrade:

oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.12-kube-126-vsphere-migration-in-4.14":"true"}}' --type=merge


While it is safe to provide an admin-ack for this upgrade path, we recommend that you start planning an update of your vSphere environment for a future update to 4.14.

Is it safe to enable migration in OCP 4.13?

If you’re deploying a new OCP 4.13 environment, CSI migration is enabled by default; however, if you’re upgrading from 4.12, it will be enabled only after you opt in.

At the time 4.13 went GA, we had three issues at hand, one in KCM and two in vSphere.

The KCM bugfix has been backported to OCP 4.13.10 and VMware fixed the two other issues in vSphere 7.0u3L+ or 8.0u2+.

It is therefore safe to enable CSI migration if you’re running OCP 4.13.10+ on top of vSphere 7.0u3L+ or 8.0u2+.
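On an upgraded 4.13 cluster, the opt-in is done by setting the vSphereStorageDriver field on the cluster-scoped Storage resource. A sketch of the command (requires cluster-admin; review the OCP 4.13 documentation before running it, as opting in to migration is not meant to be reverted):

```shell
# Opt in to vSphere CSI migration on an upgraded OCP 4.13 cluster.
# Cluster-admin only; verify OCP 4.13.10+ and vSphere 7.0u3L+ or 8.0u2+
# first, as recommended above.
oc patch storage.operator.openshift.io cluster --type=merge \
  -p '{"spec":{"vsphereStorageDriver":"CSIWithMigrationDriver"}}'
```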

Conclusion

vSphere CSI migration has been treated differently from other drivers because of a number of serious issues that can impact workloads.

Because Red Hat's mission is to provide open, stable, and production-ready software, we had to introduce safeguards that provide transparent communication of the risks as well as a safe approach to upgrades.

For any additional questions or concerns, Red Hat support is here to help and assist you in your OpenShift journey!


About the author

Gregory Charot is a Principal Technical Product Manager at Red Hat covering OpenStack Storage, Ceph integration, Edge computing as well as OpenShift on OpenStack storage. His primary mission is to define product strategy and design features based on customers and market demands, as well as driving the overall productization to deliver production-ready solutions to the market. OpenSource and Linux-passionate since 1998, Charot worked as a production engineer and system architect for eight years before joining Red Hat—first as an Architect, then as a Field Product Manager prior to his current role as a Technical Product Manager for the Cloud Platform Business Unit.
