This article describes a solution to an issue you may have faced when running Red Hat OpenShift Data Foundation in a cloud environment: how to address the demand for more resources, more nodes, and more Object Storage Devices (OSDs) as an OpenShift Data Foundation deployment matures.

This article provides a step-by-step procedure to migrate the data from the existing OSDs to new, larger ones, so that you can manage more data with the same resources. The procedure avoids data loss, and OpenShift Data Foundation migrates all the data for you in two logical steps:

  • Add a new StorageDeviceSet to the StorageCluster
  • Remove the old OSDs one by one, then remove the old StorageDeviceSet

Important: Before implementing this procedure, especially in a production environment, it is strongly recommended that you open a support case so that the support team is aware of the activity and can check the environment, allowing you to proceed more safely.

Let’s go through the details.

  1. Back up the StorageCluster CR

$ oc project openshift-storage
Already on project "openshift-storage" on server "https://api.ocpcluster.example.com:6443".
$ oc get storagecluster ocs-storagecluster -o yaml > ocs-storagecluster.yaml
  2. Edit the StorageCluster CR

Add a new storageDeviceSet entry containing the new disk configuration (2Ti in this case) to the storageDeviceSets list under the spec property, alongside the existing entry. Here is the configuration to add:

 storageDeviceSets:
 - config: {}
   count: 1
   dataPVCTemplate:
     metadata: {}
     spec:
       accessModes:
       - ReadWriteOnce
       resources:
         requests:
           storage: 2Ti
       storageClassName: managed-csi-azkey
       volumeMode: Block
     status: {}
   name: ocs-deviceset-large
   placement: {}
   preparePlacement: {}
   replica: 3
   resources:
     limits:
       cpu: "2"
       memory: 5Gi
     requests:
       cpu: "2"
       memory: 5Gi
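
As a quick sanity check of the sizing, the example above yields three new 2Ti PVCs, which suggests that a storageDeviceSet produces count × replica OSDs. The helper below is a minimal sketch under that assumption; the function name deviceset_capacity is hypothetical, and sizes are assumed to be whole TiB.

```shell
# Hypothetical helper (not part of oc or ODF): OSD count and raw capacity
# added by one storageDeviceSet, assuming OSDs = count * replica.
# Usage: deviceset_capacity <count> <replica> <size_in_TiB>
deviceset_capacity() (
  count=$1
  replica=$2
  size_tib=$3
  osds=$((count * replica))
  echo "osds=${osds} raw_tib=$((osds * size_tib))"
)

# Values taken from the ocs-deviceset-large entry above.
deviceset_capacity 1 3 2   # prints "osds=3 raw_tib=6"
```

The 6 TiB of raw capacity matches the root weight that ceph osd tree reports in the final check, once only the large OSDs remain.
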
  3. Wait for the new PVCs to be created

$ oc get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                   Bound    pvc-ebd5ef7a-e802-4973-82be-b26fe7af973c   50Gi       RWO            ocs-storagecluster-ceph-rbd   127d
ocs-deviceset-large-0-data-04b54w   Bound    pvc-ed847deb-d716-4d66-83b0-d4c614ad3f55   2Ti        RWO            managed-csi-azkey             74s
ocs-deviceset-large-1-data-0q6mt5   Bound    pvc-b4af134b-8f7b-4d50-a0c0-b2f68068b313   2Ti        RWO            managed-csi-azkey             74s
ocs-deviceset-large-2-data-0b2p6b   Bound    pvc-1c07d21b-d81e-4755-a05b-22947b3b67e1   2Ti        RWO            managed-csi-azkey             74s
ocs-deviceset-small-0-data-025w8j   Bound    pvc-ef3dfb24-ff39-441a-bf41-3d700efe94d4   500Gi      RWO            managed-csi-azkey             17h
ocs-deviceset-small-1-data-08r9h5   Bound    pvc-5121174f-13d1-42d0-a24a-562978d151b4   500Gi      RWO            managed-csi-azkey             17h
ocs-deviceset-small-2-data-0czk2s   Bound    pvc-6d0e8bb8-b999-4367-a65d-bfde4c1c043b   500Gi      RWO            managed-csi-azkey             17h
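
Rather than reading the table by eye, the check can be scripted. The following is a minimal sketch that counts the Bound PVCs of a device set from the oc get pvc output; the function name bound_pvcs is hypothetical, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
# Hypothetical helper: reads `oc get pvc` output on stdin and prints how many
# PVCs whose name starts with <prefix> are in the Bound state.
# Usage: oc get pvc -n openshift-storage | bound_pvcs ocs-deviceset-large
bound_pvcs() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 && $2 == "Bound" { n++ }
    END { print n + 0 }
  '
}
```

With the output above, the ocs-deviceset-large prefix should give 3, one PVC per replica.
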
  4. Check that the new OSDs have been created

If you don’t have the rook-ceph-tools pod enabled, you can activate it by following the article: https://access.redhat.com/articles/4628891

$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                                 STATUS  REWEIGHT  PRI-AFF
-1         7.46489  root default
-5         7.46489      region northeurope
-10         2.48830          zone northeurope-1
-9         2.48830              host ocpcluster-kl7ds-ocs-northeurope1-zzhql
 2    hdd  2.00000                  osd.2                                         up   1.00000  1.00000
 4    hdd  0.48830                  osd.4                                         up   1.00000  1.00000
-14         2.48830          zone northeurope-2
-13         2.48830              host ocpcluster-kl7ds-ocs-northeurope2-4b6wx
 0    hdd  2.00000                  osd.0                                         up   1.00000  1.00000
 5    hdd  0.48830                  osd.5                                         up   1.00000  1.00000
-4         2.48830          zone northeurope-3
-3         2.48830              host ocpcluster-kl7ds-ocs-northeurope3-4gzb5
 1    hdd  2.00000                  osd.1                                         up   1.00000  1.00000
 3    hdd  0.48830                  osd.3                                         up   1.00000  1.00000
  5. Wait for the data rebalance to complete

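The ceph status command must report HEALTH_OK, and all placement groups (PGs) must be in the active+clean state. Both conditions are required: in the "Before" output below, the health is already HEALTH_OK while most PGs are still being backfilled.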

Before:

$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph status
 cluster:
   id:     e2c7dfa8-fa8b-4ba7-a3f6-b22e2d4d410f
   health: HEALTH_OK
 services:
   mon: 3 daemons, quorum b,c,d (age 2d)
   mgr: a(active, since 2d)
   mds: 1/1 daemons up, 1 hot standby
   osd: 6 osds: 6 up (since 2m), 6 in (since 3m); 125 remapped pgs
 data:
   volumes: 1/1 healthy
   pools:   4 pools, 193 pgs
   objects: 86.97k objects, 332 GiB
   usage:   999 GiB used, 6.5 TiB / 7.5 TiB avail
   pgs:     206865/260913 objects misplaced (79.285%)
            124 active+remapped+backfill_wait
            68  active+clean
            1   active+remapped+backfilling
 io:
   client:   1.9 KiB/s rd, 299 KiB/s wr, 2 op/s rd, 8 op/s wr
   recovery: 23 MiB/s, 5 objects/s

After:

$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph status
 cluster:
   id:     e2c7dfa8-fa8b-4ba7-a3f6-b22e2d4d410f
   health: HEALTH_OK
 services:
   mon: 3 daemons, quorum b,c,d (age 2d)
   mgr: a(active, since 2d)
   mds: 1/1 daemons up, 1 hot standby
   osd: 6 osds: 6 up (since 69m), 6 in (since 70m)
 data:
   volumes: 1/1 healthy
   pools:   4 pools, 193 pgs
   objects: 87.17k objects, 333 GiB
   usage:   1021 GiB used, 6.5 TiB / 7.5 TiB avail
   pgs:     193 active+clean
 io:
   client:   2.2 KiB/s rd, 488 KiB/s wr, 2 op/s rd, 6 op/s wr
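
Both conditions can be checked together from the plain-text ceph status output. The function below is a minimal sketch; the name ceph_is_clean is hypothetical, and it assumes the text layout shown above (a "health:" line, a "pools: N pools, M pgs" line, and a PG state line ending in "active+clean").

```shell
# Hypothetical helper: reads `ceph status` text output on stdin and succeeds
# only if health is HEALTH_OK and every PG is reported as active+clean.
ceph_is_clean() {
  awk '
    $1 == "health:"       { health = $2 }        # "health: HEALTH_OK"
    $1 == "pools:"        { total = $4 }         # "pools:   4 pools, 193 pgs"
    $NF == "active+clean" { clean = $(NF - 1) }  # "pgs: 193 active+clean"
    END { exit !(health == "HEALTH_OK" && total > 0 && clean == total) }
  '
}
```

It could be fed from the tools pod, for example with ceph status run non-interactively through oc rsh and piped into ceph_is_clean. The "Before" output above makes it fail (68 of 193 PGs clean), and the "After" output makes it succeed.
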
  6. Remove the old OSDs

This step is based on the solution https://access.redhat.com/solutions/5015451

  6.1. Scale the ocs-operator and rook-ceph-operator deployments to zero

$ oc scale deploy ocs-operator --replicas 0
deployment.apps/ocs-operator scaled
$ oc scale deploy rook-ceph-operator --replicas 0
deployment.apps/rook-ceph-operator scaled
$ oc get deploy ocs-operator rook-ceph-operator
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
ocs-operator         0/0     0            0           128d
rook-ceph-operator   0/0     0            0           128d
  6.2. Get the osd.id of each OSD to be removed

In this case osd.3, osd.4, and osd.5, which are the OSDs with the smaller weight (0.48830):

$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                                 STATUS  REWEIGHT  PRI-AFF
-1         7.46489  root default
-5         7.46489      region northeurope
-10         2.48830          zone northeurope-1
-9         2.48830              host ocpcluster-kl7ds-ocs-northeurope1-zzhql
 2    hdd  2.00000                  osd.2                                         up   1.00000  1.00000
 4    hdd  0.48830                  osd.4                                         up   1.00000  1.00000
-14         2.48830          zone northeurope-2
-13         2.48830              host ocpcluster-kl7ds-ocs-northeurope2-4b6wx
 0    hdd  2.00000                  osd.0                                         up   1.00000  1.00000
 5    hdd  0.48830                  osd.5                                         up   1.00000  1.00000
-4         2.48830          zone northeurope-3
-3         2.48830              host ocpcluster-kl7ds-ocs-northeurope3-4gzb5
 1    hdd  2.00000                  osd.1                                         up   1.00000  1.00000
 3    hdd  0.48830                  osd.3                                         up   1.00000  1.00000
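
Since the old OSDs are the ones with the smaller CRUSH weight, their IDs can be extracted from the tree output. This is a minimal sketch; the function name osds_below_weight is hypothetical, and it assumes the column layout shown above, where OSD rows carry a device class (ID, CLASS, WEIGHT, NAME).

```shell
# Hypothetical helper: reads `ceph osd tree` output on stdin and prints the
# IDs of the OSD rows whose weight is below <max_weight>.
# Assumes OSD rows look like: "<id> <class> <weight> osd.<id> ...".
osds_below_weight() {
  awk -v max="$1" '$4 ~ /^osd\.[0-9]+$/ && $3 + 0 < max + 0 { print $1 }'
}
```

With the tree above and a threshold of 1, it should print 4, 5, and 3 (in tree order), matching the OSDs listed for removal.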

Important: Execute the following steps serially, one OSD at a time, and wait for the data rebalance to finish after each OSD removal, in order to avoid potential data loss.

  6.3. Scale the deployment of the OSD to remove to zero

In this case the first one will be osd.3:

$ oc scale deploy rook-ceph-osd-3 --replicas 0
deployment.apps/rook-ceph-osd-3 scaled
$ oc get deploy rook-ceph-osd-3
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
rook-ceph-osd-3   0/0     0            0           19h
  6.4. Remove the OSD

The failed_osd_id variable must contain the ID of the OSD to remove; in this case 3, then 4, then 5, one per iteration:

$ failed_osd_id=3
$ oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -
job.batch/ocs-osd-removal-job created
  6.5. Wait for the job to complete

Check the log of the newly created pod and look for the "completed removal" message:

$ oc get jobs
NAME                                                     COMPLETIONS   DURATION   AGE
ocs-osd-removal-job                                      1/1           13s        45s
rook-ceph-osd-prepare-2abe011277f790a287a5a129e960558c   1/1           32s        85m
rook-ceph-osd-prepare-a604505c4d1ba7640d40e4553f495658   1/1           29s        85m
rook-ceph-osd-prepare-dac4a35f2d709d73b7af34935b4fd19b   1/1           30s        85m
rook-ceph-osd-prepare-e0b2c88b9729e8cccd0f64c3bfa09dbb   1/1           31s        19h
rook-ceph-osd-prepare-e81129ea7423d35d417a8675f58f8d1c   1/1           30s        19h
$ oc get pod | grep ocs-osd-removal-job
ocs-osd-removal-job-mswng                                         0/1     Completed   0          56s
$ oc logs ocs-osd-removal-job-mswng | tail -2
2023-12-14 10:53:15.403183 I | cephosd: no ceph crash to silence
2023-12-14 10:53:15.403231 I | cephosd: completed removal of OSD 3
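
The log check can also be scripted. The function below is a minimal sketch that looks for the completion message for one specific OSD id; the name removal_completed is hypothetical, and the message text is assumed to be exactly the one shown in the log above.

```shell
# Hypothetical helper: reads the removal job pod log on stdin and succeeds
# only if it contains "completed removal of OSD <id>" for the given id.
# The trailing group prevents id 3 from matching "OSD 30".
# Usage: oc logs <removal-job-pod> | removal_completed 3
removal_completed() {
  grep -Eq "completed removal of OSD $1([^0-9]|\$)"
}
```
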
  6.6. Remove the job

$ oc delete job ocs-osd-removal-job
job.batch "ocs-osd-removal-job" deleted
  6.7. Wait for the data rebalance to complete

The ceph status command must report HEALTH_OK, and all placement groups (PGs) must be in the active+clean state.

Before:

$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph status
 cluster:
   id:     e2c7dfa8-fa8b-4ba7-a3f6-b22e2d4d410f
   health: HEALTH_WARN
           Degraded data redundancy: 12672/261621 objects degraded (4.844%), 24 pgs degraded, 24 pgs undersized
 services:
   mon: 3 daemons, quorum b,c,d (age 2d)
   mgr: a(active, since 2d)
   mds: 1/1 daemons up, 1 hot standby
   osd: 5 osds: 5 up (since 3m), 5 in (since 2m); 28 remapped pgs
 data:
   volumes: 1/1 healthy
   pools:   4 pools, 193 pgs
   objects: 87.21k objects, 333 GiB
   usage:   958 GiB used, 6.0 TiB / 7.0 TiB avail
   pgs:     12672/261621 objects degraded (4.844%)
            2081/261621 objects misplaced (0.795%)
            165 active+clean
            24  active+undersized+degraded+remapped+backfilling
            4   active+remapped+backfilling
 io:
   client:   852 B/s rd, 99 KiB/s wr, 1 op/s rd, 9 op/s wr
   recovery: 147 MiB/s, 37 objects/s

After:

$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph status
 cluster:
   id:     e2c7dfa8-fa8b-4ba7-a3f6-b22e2d4d410f
   health: HEALTH_OK
 services:
   mon: 3 daemons, quorum b,c,d (age 2d)
   mgr: a(active, since 2d)
   mds: 1/1 daemons up, 1 hot standby
   osd: 5 osds: 5 up (since 19m), 5 in (since 17m)
 data:
   volumes: 1/1 healthy
   pools:   4 pools, 193 pgs
   objects: 87.56k objects, 334 GiB
   usage:   1004 GiB used, 6.0 TiB / 7.0 TiB avail
   pgs:     193 active+clean
 io:
   client:   852 B/s rd, 71 KiB/s wr, 1 op/s rd, 6 op/s wr

NOTE: Repeat steps 6.3 to 6.7 for each OSD to remove.
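
To keep the repetition in order, it can help to print the per-OSD sequence before running anything. The following is a minimal dry-run sketch: the function name print_removal_plan is hypothetical, it only echoes the commands already shown in this article, and it executes nothing against the cluster.

```shell
# Hypothetical dry-run helper: for each OSD id given as an argument, prints
# the commands of steps 6.3 to 6.7 in order. It does not run them.
print_removal_plan() {
  for id in "$@"; do
    echo "# --- OSD ${id} ---"
    echo "oc scale deploy rook-ceph-osd-${id} --replicas 0"
    echo "oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=${id} | oc create -f -"
    echo "# wait for \"completed removal of OSD ${id}\" in the job pod log"
    echo "oc delete job ocs-osd-removal-job"
    echo "# wait for HEALTH_OK and all pgs active+clean before the next OSD"
  done
}

print_removal_plan 3 4 5
```

Printing rather than executing is a deliberate choice here: each wait is a manual gate, and the next OSD should not be touched until the previous rebalance has finished.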

  7. Remove the old storageDeviceSet from the StorageCluster CR

Edit the StorageCluster CR with the command oc edit storagecluster ocs-storagecluster and remove the section for the old storageDeviceSet; in this case, the one with the 500Gi disk size:

- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 500Gi
      storageClassName: managed-csi-azkey
      volumeMode: Block
    status: {}
  name: ocs-deviceset-small
  placement: {}
  preparePlacement: {}
  replica: 3
  resources:
    limits:
      cpu: "2"
      memory: 5Gi
    requests:
      cpu: "2"
      memory: 5Gi
  8. Scale the ocs-operator deployment back to 1 replica

$ oc scale deploy ocs-operator --replicas 1
deployment.apps/ocs-operator scaled
$ oc get deploy
NAME                                                        READY   UP-TO-DATE   AVAILABLE   AGE
csi-addons-controller-manager                               1/1     1            1           128d
csi-cephfsplugin-provisioner                                2/2     2            2           128d
csi-rbdplugin-provisioner                                   2/2     2            2           128d
noobaa-endpoint                                             1/1     1            1           128d
noobaa-operator                                             1/1     1            1           128d
ocs-metrics-exporter                                        1/1     1            1           128d
ocs-operator                                                1/1     1            1           128d
odf-console                                                 1/1     1            1           128d
odf-operator-controller-manager                             1/1     1            1           128d
rook-ceph-crashcollector-2ab761a21d224ffa17656fcbf9ca40b7   1/1     1            1           19d
rook-ceph-crashcollector-58b62ca45efa9920a18db0e7f340975a   1/1     1            1           19d
rook-ceph-crashcollector-812474f5d99299c4d9485f0394522c7c   1/1     1            1           19d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a           1/1     1            1           128d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b           1/1     1            1           128d
rook-ceph-mgr-a                                             1/1     1            1           128d
rook-ceph-mon-b                                             1/1     1            1           128d
rook-ceph-mon-c                                             1/1     1            1           128d
rook-ceph-mon-d                                             1/1     1            1           78d
rook-ceph-operator                                          1/1     1            1           128d
rook-ceph-osd-0                                             1/1     1            1           153m
rook-ceph-osd-1                                             1/1     1            1           153m
rook-ceph-osd-2                                             1/1     1            1           153m
rook-ceph-tools                                             1/1     1            1           114d
  9. Final check

Check that the old OSDs and PVCs have been removed:

$ oc get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                   Bound    pvc-ebd5ef7a-e802-4973-82be-b26fe7af973c   50Gi       RWO            ocs-storagecluster-ceph-rbd   128d
ocs-deviceset-large-0-data-04b54w   Bound    pvc-ed847deb-d716-4d66-83b0-d4c614ad3f55   2Ti        RWO            managed-csi-azkey             154m
ocs-deviceset-large-1-data-0q6mt5   Bound    pvc-b4af134b-8f7b-4d50-a0c0-b2f68068b313   2Ti        RWO            managed-csi-azkey             154m
ocs-deviceset-large-2-data-0b2p6b   Bound    pvc-1c07d21b-d81e-4755-a05b-22947b3b67e1   2Ti        RWO            managed-csi-azkey             154m
$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                                 STATUS  REWEIGHT  PRI-AFF
-1         6.00000  root default
-5         6.00000      region northeurope
-10         2.00000          zone northeurope-1
-9         2.00000              host ocpcluster-kl7ds-ocs-northeurope1-zzhql
 2    hdd  2.00000                  osd.2                                         up   1.00000  1.00000
-14         2.00000          zone northeurope-2
-13         2.00000              host ocpcluster-kl7ds-ocs-northeurope2-4b6wx
 0    hdd  2.00000                  osd.0                                         up   1.00000  1.00000
-4         2.00000          zone northeurope-3
-3         2.00000              host ocpcluster-kl7ds-ocs-northeurope3-4gzb5
 1    hdd  2.00000                  osd.1                                         up   1.00000  1.00000
$ oc -n openshift-storage rsh $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
sh-4.4$ ceph status
 cluster:
   id:     e2c7dfa8-fa8b-4ba7-a3f6-b22e2d4d410f
   health: HEALTH_OK
 services:
   mon: 3 daemons, quorum b,c,d (age 2d)
   mgr: a(active, since 2d)
   mds: 1/1 daemons up, 1 hot standby
   osd: 3 osds: 3 up (since 28m), 3 in (since 27m)
 data:
   volumes: 1/1 healthy
   pools:   4 pools, 193 pgs
   objects: 87.72k objects, 335 GiB
   usage:   1004 GiB used, 5.0 TiB / 6 TiB avail
   pgs:     193 active+clean
 io:
   client:   852 B/s rd, 71 KiB/s wr, 1 op/s rd, 5 op/s wr
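
The OSD count in the final status can be verified with a small parser. This is a minimal sketch; the function name osds_all_up_in is hypothetical, and it assumes the "osd: N osds: N up (...), N in (...)" line format shown above.

```shell
# Hypothetical helper: reads `ceph status` text output on stdin and succeeds
# only if the "osd:" line reports exactly <expected> OSDs, all up and in.
osds_all_up_in() {
  awk -v want="$1" '
    $1 == "osd:" {
      found = 1
      total = $2
      # The counts precede the words "up" and "in" on the same line.
      for (i = 3; i <= NF; i++) {
        if ($i == "up") up = $(i - 1)
        if ($i == "in") inn = $(i - 1)
      }
    }
    END { exit !(found && total == want && up == want && inn == want) }
  '
}
```

For this example the expected value is 3, one 2Ti OSD per zone.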

About the author

Luca Busetti was born and raised in the province of Bergamo in northern Italy. He spent more than 10 years as a consultant at a system integrator, where he worked for customers in industries such as banking, insurance, energy, and the public sector.
