Our previous blog post discussed the persistent volume challenges with peer-pods and introduced the CSI wrapper as a potential solution to them.
This post dives deeper into the various components that make up the persistent volume solution in peer-pods.
Intercepting the CSI plugins in peer-pods
To use persistent volumes in peer-pods, the CSI Wrapper approach intercepts the CSI Plugins in the control plane (CSI Controller Plugin) and on the worker node (CSI Node Plugin). With the CSI Wrapper injected into the CSI Plugins, volumes can be attached, detached, mounted, and unmounted for containers in peer-pods without any code changes to the original CSI Plugins.
One key aspect of this solution is intercepting the CSI Plugins.
Intercepting the CSI Controller Plugin
As discussed in the previous blog, CSI volume drivers implement the CSI interface. This implementation includes a cluster-level statefulset or deployment that facilitates communication with the Kubernetes controllers. This component is the CSI Controller Plugin. We inject a container into the CSI Controller Plugin to handle peer-pods persistent volumes.
The key component for intercepting the CSI Controller Plugin is:
- CSI Controller Plugin Wrapper – A container injected in the CSI Controller Plugin to hook and filter the CSI API calls.
Figure 1: CSI Controller Plugin intercepting concept
Note the following:
- The CSI Driver Container and CSI Sidecars are originally linked through the Unix Domain Socket (UDS) /csi/csi.sock.
- The CSI Controller Plugin Wrapper container is between the CSI Driver Container and CSI Sidecars in the CSI Controller Plugin pod.
- All the API calls to the CSI Driver Container will be hooked and filtered by the CSI Controller Plugin Wrapper container.
- The CSI Controller Plugin Wrapper will revise the request or response on demand specifically for peer-pods.
Figure 2 shows how the CSI Controller Plugin Wrapper hooks and filters the API calls by adding an extra UDS /csi/csi-controller-wrapper.sock:
Figure 2: CSI Controller Plugin intercepting with UDS
Note the following:
- The CSI Sidecars in the CSI Controller Plugin communicate with the Kubernetes api-server via HTTP.
- The CSI Controller Plugin Wrapper container communicates with the CSI Sidecars via the UDS /csi/csi-controller-wrapper.sock.
- The CSI Driver Container communicates with the CSI Controller Plugin Wrapper via the UDS /csi/csi.sock.
The following is a list of CSI APIs implemented in the CSI Controller Plugin:
- CreateVolume – Creates a persistent volume through the infrastructure API.
- DeleteVolume – Deletes a persistent volume through the infrastructure API.
- ControllerPublishVolume – Attaches a persistent volume to a worker node through the infrastructure API.
- ControllerUnpublishVolume – Detaches a persistent volume from a worker node through the infrastructure API.
All are hooked and filtered in the CSI Controller Plugin Wrapper container.
Figure 3 shows an example of how the CSI API is hooked when creating and attaching a persistent volume in peer-pods:
Figure 3: APIs intercepted in CSI Controller Plugin
Note the following:
- The CreateVolume API – Goes from the CSI Sidecars to the CSI Controller Plugin Wrapper, and then from the CSI Controller Plugin Wrapper to the CSI Driver Container without change. It can leverage the same function in the CSI Driver Container to create a persistent volume.
- The ControllerPublishVolume API – The CSI Controller Plugin Wrapper will hook it and it will not be passed to the CSI Driver Container because we don’t want to call AttachVolume to attach the persistent volume to the worker node. Instead, we will attach it to the peer-pods by altering the algorithm. This approach leverages Cache and Replay, which we explain in later sections.
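The hook-and-filter behavior described above can be sketched in a few lines. This is only an illustrative model, not the wrapper's actual code: `FakeDriver`, `ControllerWrapper`, `INTERCEPTED`, and the `handled_by` field are hypothetical names, and plain dictionaries stand in for the real CSI request and response messages.

```python
from typing import Any, Dict, List

# Hypothetical set of CSI methods the wrapper answers itself (assumption for this sketch).
INTERCEPTED = frozenset({"ControllerPublishVolume"})


class FakeDriver:
    """Hypothetical stand-in for the CSI Driver Container; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def call(self, method: str, request: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append(method)
        return {"method": method, "handled_by": "driver"}


class ControllerWrapper:
    """Sits between the sidecars and the driver, forwarding or intercepting calls."""

    def __init__(self, driver: FakeDriver) -> None:
        self.driver = driver
        # Requests held back from the driver so they can be replayed later.
        self.cached: List[Dict[str, Any]] = []

    def handle(self, method: str, request: Dict[str, Any]) -> Dict[str, Any]:
        if method in INTERCEPTED:
            # Do not forward: cache a copy of the request and answer on the driver's behalf.
            self.cached.append({"method": method, "request": dict(request)})
            return {"method": method, "handled_by": "wrapper"}
        # Everything else (for example CreateVolume) passes through unchanged.
        return self.driver.call(method, request)
```

In this model, calling `handle("CreateVolume", ...)` reaches the driver, while `handle("ControllerPublishVolume", ...)` never does and instead leaves an entry in `cached`.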
Intercepting the CSI Node Plugin
CSI volume drivers also have a node-level daemonset named the CSI Node Plugin to facilitate communication with every Kubelet instance. We inject a container into the CSI Node Plugin to handle peer-pods persistent volumes.
The key component for intercepting the CSI Node Plugin is:
- CSI Node Plugin Wrapper – A container injected in the CSI Node Plugin to hook and filter the CSI API calls.
Figure 4 shows the concept of the CSI Node Plugin:
Figure 4: CSI Node Plugin intercepting concept
Note the following:
- The CSI Driver Container in the CSI Node Plugin pod and Kubelet communicate via the UDS /csi/csi.sock.
- The CSI Node Plugin Wrapper container is added between the CSI Driver Container in the CSI Node Plugin pod and Kubelet.
- All the API calls to the CSI Driver Container will be hooked and filtered by the CSI Node Plugin Wrapper container.
- The CSI Node Plugin Wrapper will revise the request or response on demand specifically for peer-pods.
Figure 5-1 shows how the CSI Node Plugin Wrapper hooks and filters the API calls via UDS /csi/csi.sock and /csi/csi-controller-wrapper.sock:
Figure 5-1: CSI Node Plugin intercepting via UDS
Note that this is similar to hooking in the CSI Controller Plugin.
The following is a list of CSI APIs implemented in the CSI Node Plugin:
- NodeStageVolume – Mounts a persistent volume to a global path on a worker node.
- NodeUnstageVolume – Unmounts a persistent volume from the global path on a worker node.
- NodePublishVolume – Mounts a persistent volume from the global path on a worker node to a pod.
- NodeUnpublishVolume – Unmounts a persistent volume from a pod.
These CSI APIs will be hooked and filtered in the CSI Node Plugin Wrapper container. CSI Actions will be Cached and then Replayed.
Besides the CSI Driver Container, the CSI Node Plugin Wrapper also talks with the cloud-api-adaptor through the UDS /run/peerpod/hypervisor.sock.
Figure 5-2 shows how the CSI Node Plugin Wrapper communicates with the cloud-api-adaptor to read necessary peer-pods information, such as the peer-pod's ID, and caches it:
Figure 5-2: The CSI Node Plugin communicates with cloud-api-adaptor via UDS
Cache and replay the CSI actions
Since persistent volume attaching and mounting happens on the pod VM rather than the worker node VM for peer-pods, we’ll cache some of the CSI Actions in the CSI Controller Plugin and CSI Node Plugin and replay them on the pod VM later.
The caching mechanism is achieved via the following components (illustrated in Figure 6):
- PeerpodVolume – A custom resource (defined through a custom resource definition, or CRD) that stores the cached information, such as the peer-pod ID.
- PeerpodVolume Controller – A Kubernetes controller to watch for PeerpodVolume status changes and take corresponding steps to replay the CSI actions.
Cache
Figure 6 shows what CSI Actions will be cached in PeerpodVolume and their order.
Figure 6: CSI Actions caching
The list of actions includes the following:
- CreateVolume – A PeerpodVolume object will be created in the CSI Action CreateVolume operation.
- ControllerPublishVolume – The PeerpodVolume object is updated during ControllerPublishVolume to cache the AttachVolume CSI Action.
- NodeStageVolume – The PeerpodVolume object is updated during NodeStageVolume to cache the CSI Action that mounts the volume on the peer-pods VM.
- NodePublishVolume – The PeerpodVolume object is updated during NodePublishVolume to cache the CSI Action that mounts the volume to the pod.
- PeerPod Instance running – The PeerpodVolume object is updated to the PeerPod Instance running status when the peer-pods VM is created.
- PeerPod Instance ready – A PeerpodVolume Controller in the worker node reads the peer-pods ID, sets it in the PeerpodVolume object, and updates the PeerpodVolume to indicate the PeerPod Instance is ready.
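One way to picture the caching order is as a small state progression on the PeerpodVolume object. The sketch below is an assumption-laden model: the `CacheStep` names, the `PeerpodVolume` fields, and the strict one-step-at-a-time check are all hypothetical and are not taken from the actual CRD schema.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class CacheStep(IntEnum):
    # Ordered as in the list above; names are illustrative, not the CRD's real values.
    CREATED = 1
    CONTROLLER_PUBLISH_CACHED = 2
    NODE_STAGE_CACHED = 3
    NODE_PUBLISH_CACHED = 4
    PEERPOD_RUNNING = 5
    PEERPOD_READY = 6


@dataclass
class PeerpodVolume:
    """Hypothetical in-memory model of a PeerpodVolume object."""

    volume_id: str
    step: CacheStep = CacheStep.CREATED
    cached_requests: Dict[str, dict] = field(default_factory=dict)
    peerpod_id: Optional[str] = None

    def advance(self, next_step: CacheStep, request: Optional[dict] = None) -> None:
        # Assumption of this sketch: steps happen strictly in the listed order.
        if next_step != self.step + 1:
            raise ValueError(f"cannot move from {self.step.name} to {next_step.name}")
        if request is not None:
            # Keep the request so the CSI Action can be replayed later.
            self.cached_requests[next_step.name] = request
        self.step = next_step

    def mark_ready(self, peerpod_id: str) -> None:
        # Record the peer-pod ID at the moment the instance is marked ready.
        self.advance(CacheStep.PEERPOD_READY)
        self.peerpod_id = peerpod_id
```

Rejecting out-of-order transitions is a design choice for the sketch; it makes the ordering explicit but says nothing about how the real controller handles retries or reordering.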
Replay
After the PeerPod instance is in running status, the CSI Node Plugin running on the peer-pods VM will monitor the PeerpodVolume object via the PeerpodVolume Controller and perform the replay procedure when the PeerPod instance is ready.
Figure 7 shows the flow:
Figure 7: CSI Actions replaying
Note the following:
- ControllerPublishVolume – The Replay ControllerPublishVolume CSI Action will attach the created persistent volume to the peer-pods VM instead of the worker node VM.
- NodeStageVolume – The Replay NodeStageVolume CSI Action will mount the persistent volume to a global path on the peer-pods VM.
- NodePublishVolume – The Replay NodePublishVolume CSI Action will mount the persistent volume from the global path on the peer-pods VM to the pod.
- StartContainer – The components on the peer-pods VM will start the container when the persistent volume is mounted to the pod.
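The replay order above can be expressed as a short routine. This is a minimal sketch under stated assumptions: `replay`, `REPLAY_ORDER`, and the `invoke` and `start_container` callbacks are hypothetical, and the cached actions are modeled as a plain dictionary keyed by CSI method name.

```python
from typing import Callable, Dict, List

# Order in which cached CSI Actions are replayed on the peer-pods VM.
REPLAY_ORDER = ("ControllerPublishVolume", "NodeStageVolume", "NodePublishVolume")


def replay(
    cached: Dict[str, dict],
    invoke: Callable[[str, dict], None],
    start_container: Callable[[], None],
) -> List[str]:
    """Replay cached actions in order, then start the container."""
    # Refuse to start if any action was never cached, so nothing is half-replayed.
    missing = [m for m in REPLAY_ORDER if m not in cached]
    if missing:
        raise KeyError(f"cannot replay, missing cached actions: {missing}")

    done: List[str] = []
    for method in REPLAY_ORDER:
        invoke(method, cached[method])  # hypothetical call into the CSI driver
        done.append(method)

    # The container starts only after the volume is attached and mounted.
    start_container()
    done.append("StartContainer")
    return done
```

The order comes from `REPLAY_ORDER`, not from the order in which the entries were cached, which keeps attach before stage and stage before publish.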
The next task is to examine the detailed flow and compare the persistent volume workflow in standard pods versus peer-pods.
Use case example
When using a persistent volume in a pod, the PersistentVolume is pre-defined and a PersistentVolumeClaim is referenced in the pod descriptor, as in the Persistent Volume Example. Define the PersistentVolume (PV) and PersistentVolumeClaim (PVC) as shown below. Note that the storageClassName is defined by the CSI volume driver, so its value varies depending on the driver in use.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv-volume
  labels:
    type: local
spec:
  storageClassName: my-csi-volume-driver-name
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pv-claim
spec:
  storageClassName: my-csi-volume-driver-name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
Use the PVC in a pod, as seen below:
apiVersion: v1
kind: Pod
metadata:
  name: my-pv-pod
spec:
  volumes:
    - name: my-pv-storage
      persistentVolumeClaim:
        claimName: my-pv-claim
  containers:
    - name: my-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: my-pv-storage
The next section covers the flow in standard pods.
Workflow in standard pods
Creating a volume
Figure 8 shows the workflow for creating a persistent volume:
Figure 8: Workflow when creating a volume
The flow consists of the following steps:
- The user provides a persistent volume claim (PVC) descriptor and creates it in the API server.
- The external-provisioner sidecar container in the CSI Controller Plugin watches the PVC and issues a CreateVolumeRequest call to the CSI socket.
- The csi-driver-container in the CSI Controller Plugin listens on the CSI socket, calls CreateVolume, and informs the CSI about its creation.
- The external-provisioner sidecar container in the CSI Controller Plugin creates a persistent volume (PV) and updates the PVC to be bound. The VolumeAttachment object is created by controller-manager.
Next is the flow for attaching volumes.
Attaching a volume
Figure 9 shows the workflow when attaching the persistent volume:
Figure 9: Workflow when attaching a volume
The flow consists of the following steps:
- The external-attacher sidecar container in the CSI Controller Plugin watches for the VolumeAttachments and submits a ControllerPublishVolume RPC call to the csi-driver-container.
- The csi-driver-container gets the ControllerPublishVolume and calls AttachVolume.
- The external-attacher updates the VolumeAttachment status.
Next is the volume mounting flow.
Mounting a volume
Figure 10 shows the workflow when mounting a persistent volume:
Figure 10: Workflow when mounting a volume
The flow consists of the following:
- Kubelet waits for the volume to be attached and submits the NodeStageVolume (formats and mounts the volume to the node at the staging directory) to the CSI-Node-Plugin.
- The CSI-Node-Plugin gets the NodeStageVolume call and mounts it to path /var/lib/kubelet/plugins/kubernetes.io/csi/pv/<pv-name>/globalmount, then responds to Kubelet.
- Kubelet calls the NodePublishVolume (to mount the volume to the pod's directory).
- The CSI-Node-Plugin performs the NodePublishVolume action and mounts the volume to /var/lib/kubelet/pods/<pod-uuid>/volumes/kubernetes.io~csi/<pvc-name>/mount.
- Kubelet starts the container of the pod with the provisioned volume.
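The two mount locations in the steps above follow a fixed template, which a short helper makes easy to see. The function names `staging_path` and `publish_path` are hypothetical, and the layout simply mirrors the paths as written in this post; the exact layout may differ across Kubernetes versions.

```python
from pathlib import PurePosixPath

# Kubelet's root directory as it appears in the paths quoted above.
KUBELET_ROOT = PurePosixPath("/var/lib/kubelet")


def staging_path(pv_name: str) -> str:
    """Global (staging) mount path used by NodeStageVolume, per the text above."""
    return str(KUBELET_ROOT / "plugins" / "kubernetes.io" / "csi" / "pv" / pv_name / "globalmount")


def publish_path(pod_uuid: str, pvc_name: str) -> str:
    """Per-pod mount path used by NodePublishVolume, per the text above."""
    return str(KUBELET_ROOT / "pods" / pod_uuid / "volumes" / "kubernetes.io~csi" / pvc_name / "mount")
```

NodeStageVolume mounts the volume once per node at the staging path, and NodePublishVolume then exposes it at the pod-specific path, which is why the second path includes the pod UUID.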
Bringing the workflow together
Figure 11 shows the end-to-end workflow when creating, attaching, and mounting the persistent volume to a pod.
Figure 11: The full process for creating, attaching, and mounting persistent volumes
The flow consists of the following:
- The user provides a persistent volume claim (PVC) descriptor and creates it on the API server.
- An external-provisioner sidecar container in the CSI Controller Plugin watches the PVC and issues a CreateVolumeRequest call to the CSI socket.
- A csi-driver-container in the CSI Controller Plugin listens on the CSI socket, calls CreateVolume, and informs CSI about its creation.
- An external-provisioner sidecar container in the CSI Controller Plugin creates a persistent volume (PV) and updates the PVC to be bound. The VolumeAttachment object is created by the controller-manager.
- An external-attacher sidecar container in the CSI Controller Plugin watches for VolumeAttachments and submits a ControllerPublishVolume RPC call to the csi-driver-container.
- A csi-driver-container gets ControllerPublishVolume and calls AttachVolume.
- An external-attacher updates the VolumeAttachment status.
- Kubelet waits for the volume to be attached and submits NodeStageVolume (formats and mounts the volume to the node at the staging directory) to the CSI-Node-Plugin.
- The CSI-Node-Plugin gets the NodeStageVolume call and mounts it to path /var/lib/kubelet/plugins/kubernetes.io/csi/pv/<pv-name>/globalmount, then responds to Kubelet.
- Kubelet calls NodePublishVolume (mounts volume to the pod’s directory).
- The CSI-Node-Plugin performs NodePublishVolume and mounts the volume to /var/lib/kubelet/pods/<pod-uuid>/volumes/kubernetes.io~csi/<pvc-name>/mount.
- Kubelet starts the container of the pod with the provisioned volume.
The next topic is the workflow in peer-pods.
Workflow in peer-pods
Creating a persistent volume with peer-pods
The CSI Controller Plugin Wrapper (orange in the diagram) is injected between the CSI Sidecars (external provisioner, external attacher) and CSI Driver Container. It hooks the CreateVolumeRequest in the CreateVolume CSI Interface and performs the following actions:
- The CSI Controller Plugin Wrapper passes through the CreateVolumeRequest to the CSI Driver Container and calls CreateVolume in the CSI Driver Container to create the persistent volume.
- The CSI Controller Plugin Wrapper creates the PeerpodVolume CR object and saves it to the api-server.
- The CSI Controller Plugin Wrapper returns the VolumeCreated response to the CSI Sidecar (external provisioner).
Figure 12 below illustrates this process:
Figure 12: Creating a persistent volume in the CSI Controller Plugin
Cache-attaching volume actions to peer-pods
When attaching a persistent volume, the CSI Controller Plugin Wrapper (orange in the diagram) hooks the ControllerPublishVolume in the CSI Interface and performs the following actions:
- The CSI Controller Plugin Wrapper does not pass through the ControllerPublishVolume and will not call the AttachVolume in the CSI Driver Container.
- The CSI Controller Plugin Wrapper updates the PeerpodVolume CR object by adding the original node ID (it will be updated to the peer-pods ID in a later phase). It could be used later when replaying this CSI Action to attach the persistent volume to the corresponding peer-pods VM.
- The CSI Controller Plugin Wrapper returns a fake VolumeAttached response to the CSI Sidecar (external attacher).
- The CSI Controller Plugin Wrapper embeds a synchronized PeerpodVolume Controller, which monitors changes to the PeerpodVolume CR object and calls ControllerPublishVolume to attach the persistent volume to the peer-pods VM instance. This is a replay action.
Figure 13 illustrates this process:
Figure 13: Attaching persistent volume in CSI Controller Plugin
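The key detail in the replayed attach is that the cached request keeps everything except the target node, which is swapped for the peer-pods VM. A minimal sketch of that substitution follows; `build_replay_request` is a hypothetical helper, and the request is modeled as a dictionary using the `volume_id` and `node_id` field names from the CSI specification's ControllerPublishVolumeRequest.

```python
import copy
from typing import Any, Dict


def build_replay_request(cached_request: Dict[str, Any], peerpod_vm_id: str) -> Dict[str, Any]:
    """Return a copy of the cached attach request that targets the peer-pods VM."""
    if not peerpod_vm_id:
        # The peer-pod ID is only known once the instance is reported as ready.
        raise ValueError("peer-pod VM ID is not known yet")
    # Copy first so the cached original (with the worker node ID) stays untouched.
    request = copy.deepcopy(cached_request)
    request["node_id"] = peerpod_vm_id
    return request
```

Keeping the cached request unmodified is a choice made for this sketch; it leaves the original worker node ID available for inspection after the replay.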
Cache-mounting volume actions to peer-pods
The CSI Node Plugin Wrapper (orange in the diagram) is injected between the CSI Node Plugin and Kubelet on the worker node. It hooks the NodeStageVolume and NodePublishVolume CSI Interface, communicates with cloud-api-adaptor, and performs the following actions:
- The CSI Node Plugin Wrapper hooks the NodeStageVolume and does not pass the API call to the CSI Node Plugin, so the persistent volume is not mounted to the worker node VM.
- The CSI Node Plugin Wrapper updates the PeerpodVolume CR object by adding the NodeStageVolume CSI Actions.
- The CSI Node Plugin Wrapper returns a fake response for NodeStageVolume to Kubelet and the api-server.
- The CSI Node Plugin Wrapper hooks the NodePublishVolume and does not pass the API call to the CSI Node Plugin, so the persistent volume is not mounted to the pod on the worker node.
- The CSI Node Plugin Wrapper updates the PeerpodVolume CR object by adding the NodePublishVolume CSI Actions.
- The CSI Node Plugin Wrapper returns a fake response for NodePublishVolume to Kubelet and the api-server.
- The CSI Node Plugin Wrapper watches the PeerpodVolume CR object and identifies the running peer-pods instance. It retrieves the peer-pod's ID and sets it in the PeerpodVolume CR, then sets the peer-pods instance to ready.
This is illustrated in Figure 14:
Figure 14: Mounting a persistent volume in the CSI Node Plugin on the worker node
Mounting persistent volumes to containers in peer-pods
The CSI Node Plugin Wrapper (orange in the diagram) is injected between the CSI Node Plugin and Kubelet on the peer-pods VM. It embeds the PeerpodVolume Controller, watches for status changes on PeerpodVolume CR objects, and performs the corresponding actions according to the status of the PeerpodVolume CR:
- The CSI Node Plugin Wrapper watches the PeerpodVolume CR, performs a NodeStageVolume CSI Action, and calls the corresponding API in the CSI Node Plugin to mount the persistent volume to a global path on the peer-pods VM.
- The CSI Node Plugin Wrapper watches the PeerpodVolume CR, performs a NodePublishVolume CSI Action, and calls the corresponding API in the CSI Node Plugin to mount the persistent volume to the pod on the peer-pods VM.
This is illustrated in Figure 15:
Figure 15: Mounting a persistent volume in the CSI Node Plugin on peer-pods
Bringing it all together and end-to-end architecture
Figure 16 shows the overall flow for the caching and replaying procedure:
Figure 16: Cache and replay procedures
Here is a review of the detailed steps:
- CreateVolume – A PeerpodVolume object will be created with the CSI Action CreateVolume operation.
- Cache ControllerPublishVolume – A PeerpodVolume object will be updated when performing ControllerPublishVolume to cache the AttachVolume CSI Action.
- Cache NodeStageVolume – A PeerpodVolume object will be updated when performing NodeStageVolume to cache the CSI Action that mounts the volume on the peer-pods VM.
- Cache NodePublishVolume – A PeerpodVolume object will be updated when performing NodePublishVolume to cache the CSI Action that mounts the volume to the pod.
- Cache PeerPod Instance running status – A PeerpodVolume object will be updated to PeerPod Instance running when the peer-pods VM is created.
- Cache PeerPod Instance ready status – A PeerpodVolume Controller on the worker node reads the peer-pods ID, sets it in the PeerpodVolume object, and updates the PeerpodVolume to indicate the PeerPod Instance is ready.
- Replay ControllerPublishVolume – A Replay ControllerPublishVolume CSI Action attaches the created persistent volume to the peer-pods VM instead of the worker node VM.
- Replay NodeStageVolume – A Replay NodeStageVolume CSI Action mounts the persistent volume to a global path on the peer-pods VM.
- Replay NodePublishVolume – A Replay NodePublishVolume CSI Action mounts the persistent volume from the global path on the peer-pods VM to the pod.
- StartContainer – The kata-agent component on the peer-pods VM starts the container when the persistent volume is mounted to the pod.
Peer-pods and CSI Plugins
This blog post took a deeper dive into the technical implementation of persistent storage in a peer-pods solution. It covered the CSI Plugins in the control plane and on the worker node. It also discussed the new CRD that helps cache and replay CSI actions.
In addition, it explained the workflow when creating, attaching, and mounting a persistent volume to a container in peer-pods.
The next blog post will provide hands-on instructions for deploying and running the persistent volume in peer-pods.
About the authors
Qi Feng is an architect of cloud-native infrastructure and Confidential Computing in IBM Cloud and Systems. He is the maintainer of Cloud API Adapter of Confidential Container. He is a big fan of open source and has contributed to various CNCF communities in addition to CoCo.
Da Li is working in the area of confidential containers, is one maintainer of the CNCF confidential containers cloud-api-adaptor project, and focuses on csi-wrapper, podvm image build and e2e test pipelines.
Yohei is working on enhancements of performance and security of software stacks for IBM Z. He has contributed to various open source projects related to cloud and security, and is recently contributing to the Confidential Container project, a CNCF sandbox project.
Lei Li is a Senior Software Engineer at IBM Systems, working on Confidential Computing in IBM Cloud and focusing on the implementation of Confidential computing technology.