Targeted for v1.7
Custom Resources in Rook-Ceph can act as data/storage "providers" for other Resources in the Kubernetes cluster which may be "dependent" on the data/storage provided to them. This dependency relationship is not codified in Rook, and it is possible for Custom Resources which have dependents to be deleted, removing the underlying data that dependents need. This leaves those resources orphaned and could destroy valuable end-user data.
An example of this is a user needing NFS storage from Rook-Ceph. The user needs to create a CephCluster. They also need a CephFilesystem. Finally, they need a CephNFS which references the CephFilesystem. In this example, the CephNFS is dependent on the data pool created by the CephFilesystem which is in turn dependent on the underlying CephCluster.
It should not be possible for an administrator to delete the CephFilesystem until the CephNFS resource is deleted because a user may be using the CephNFS resource. Similarly, it should not be possible for an administrator to delete the CephCluster while the CephFilesystem exists because the filesystem could be in use. It is up to the administrator to ensure users no longer need storage before they delete resources like CephNFS or CephFilesystem.
The goal of this design is to add safety measures to Rook that help prevent accidental destruction of end-user data. This design proposes to do so by tracking resource dependency relationships and blocking deletion of any Rook-Ceph Custom Resource while it is referenced by another resource in the Kubernetes cluster.
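As a rough illustration of the blocking mechanism (a minimal sketch only; `reconcileDelete` and the `listDependents` callback are hypothetical names, not existing Rook code), each controller could gate finalizer removal on a dependent check, written here with controller-runtime:

```go
package dependents

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// reconcileDelete is called when the provider resource has a deletion
// timestamp. listDependents returns the names of resources that still
// reference the provider, according to the per-resource rules below.
func reconcileDelete(ctx context.Context, c client.Client, provider client.Object, finalizer string,
	listDependents func(context.Context) ([]string, error)) (ctrl.Result, error) {

	dependents, err := listDependents(ctx)
	if err != nil {
		return ctrl.Result{}, err
	}
	if len(dependents) > 0 {
		// Dependents still exist: keep the finalizer so Kubernetes cannot
		// complete the delete, report why (see the status/event section
		// below), and check again later.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}

	// No dependents remain: remove the finalizer and let deletion proceed.
	controllerutil.RemoveFinalizer(provider, finalizer)
	return ctrl.Result{}, c.Update(ctx, provider)
}
```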
All Rook-Ceph CRDs:
Resources for which Rook-Ceph acts as a driver/provisioner:
Kubernetes resources which can depend on Rook:
A graph of proposed dependency relationships is shown below with more detail to follow.
CephCluster
A CephCluster does not create pools itself, but the Ceph cluster it represents houses pools, and users can manually create pools using Ceph tooling outside of Kubernetes manifests. It is useful but not critical to understand which resources interact with pools and in what ways.
Dependents which can create/delete pools:
Dependents which can consume arbitrary pools including user-created pools:
Dependents that do not interact with pools:
It is safest if the CephCluster treats all possible Rook-Ceph CRs besides itself as simple dependents: if a dependent exists in the same namespace, block deletion. In this way, the CephCluster is the most protected provider resource, and it also acts as a root for preserving deletion ordering.
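For example (a minimal sketch, assuming the Rook API types in `github.com/rook/rook/pkg/apis/ceph.rook.io/v1` and a controller-runtime client; the real check would enumerate every Rook-Ceph CRD), the check could look like:

```go
package dependents

import (
	"context"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
	"k8s.io/apimachinery/pkg/api/meta"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cephClusterHasDependents reports whether any other Rook-Ceph custom
// resources exist in the CephCluster's namespace. Only a few representative
// list types are shown here.
func cephClusterHasDependents(ctx context.Context, c client.Client, namespace string) (bool, error) {
	lists := []client.ObjectList{
		&cephv1.CephBlockPoolList{},
		&cephv1.CephFilesystemList{},
		&cephv1.CephObjectStoreList{},
		&cephv1.CephNFSList{},
		// ...and so on for the remaining Rook-Ceph CRDs.
	}
	for _, list := range lists {
		if err := c.List(ctx, list, client.InNamespace(namespace)); err != nil {
			return false, err
		}
		if meta.LenList(list) > 0 {
			return true, nil
		}
	}
	return false, nil
}
```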
CephBlockPool
Dependents which can consume this provider's pools:
- `spec.pool == <a provider pool>`
- a `spec.caps` value with the string `pool=<a provider pool>`
Dependents via CSI (see the matching sketch after this list):
- `provisioner == <operator namespace>.rbd.csi.ceph.com`
  AND `parameters.clusterID == <ceph cluster namespace>`
  AND (`parameters.pool` OR `parameters.dataPool` references a CephBlockPool pool)
- `spec.CSI.Driver == <operator namespace>.rbd.csi.ceph.com`
  AND `spec.CSI.VolumeAttributes["clusterID"] == <ceph cluster namespace>`
  AND (`spec.CSI.VolumeAttributes["pool"]` OR `spec.CSI.VolumeAttributes["journalPool"]` references a CephBlockPool pool)
- `spec.storageClassName == <name of StorageClass which references the CephBlockPool>`
The CSI dependents above do not block deletion when the CephCluster sets `spec.cleanupPolicy.allowUninstallWithVolumes == true`.
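The first rule above matches StorageClasses (the `provisioner`/`parameters` fields), and the same matching is needed to resolve the `spec.storageClassName` rule for PersistentVolumeClaims. A minimal sketch of that matching, assuming the standard client-go clientset and the Ceph-CSI parameter names listed above (the function name is illustrative):

```go
package dependents

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storageClassesForBlockPool returns the names of StorageClasses that match
// the CSI rules above for a given CephBlockPool.
func storageClassesForBlockPool(ctx context.Context, clientset kubernetes.Interface,
	operatorNamespace, clusterNamespace, poolName string) ([]string, error) {

	provisioner := fmt.Sprintf("%s.rbd.csi.ceph.com", operatorNamespace)

	scs, err := clientset.StorageV1().StorageClasses().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	var matches []string
	for _, sc := range scs.Items {
		if sc.Provisioner != provisioner {
			continue
		}
		if sc.Parameters["clusterID"] != clusterNamespace {
			continue
		}
		if sc.Parameters["pool"] == poolName || sc.Parameters["dataPool"] == poolName {
			matches = append(matches, sc.Name)
		}
	}
	return matches, nil
}
```

PersistentVolumeClaims whose `spec.storageClassName` is one of the returned names would then be treated as dependents. (Whether StorageClasses themselves should block deletion is discussed later in this document.)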
CephFilesystem
Dependents which can consume this provider's pools:
- `spec.pool == <a provider pool>`
- a `spec.caps` value with the string `pool=<a provider pool>`
Dependents via CSI:
- `provisioner == <operator namespace>.cephfs.csi.ceph.com`
  AND `parameters.clusterID == <ceph cluster namespace>`
  AND (`parameters.pool` OR `parameters.dataPool` references a CephFilesystem pool)
- `spec.CSI.Driver == <operator namespace>.cephfs.csi.ceph.com`
  AND `spec.CSI.VolumeAttributes["clusterID"] == <ceph cluster namespace>`
  AND (`spec.CSI.VolumeAttributes["pool"]` OR `spec.CSI.VolumeAttributes["journalPool"]` references a CephFilesystem pool)
- `spec.storageClassName == <name of StorageClass which references the CephFilesystem>`
The CSI dependents above do not block deletion when the CephCluster sets `spec.cleanupPolicy.allowUninstallWithVolumes == true`.
CephObjectStore
Dependents which can consume this provider's pools:
- `spec.pool == <a provider pool>`
- a `spec.caps` value with the string `pool=<a provider pool>`
Dependents which reference this provider by name:
- `spec.store == <CephObjectStore.metadata.name>`
Dependents via lib-bucket-provisioner:
- `spec.endpoint.bucketHost == <provider's service name>.<ceph cluster namespace>.svc`
  AND `spec.endpoint.bucketName == <a provider bucket>`
Dependents via COSI:
CephObjectZone, CephObjectZoneGroup, and CephObjectRealm
These resources are all part of Rook-Ceph's multi-site object storage strategy.
CephObjectZone creates pools. A zone can be effectively thought of as the "object store" itself.
Dependents which can consume this provider's pools:
- `spec.zone.name == CephObjectZone.metadata.name`
- `spec.pool == <a provider pool>`
- a `spec.caps` value with the string `pool=<a provider pool>`
CephObjectRealm has dependents:
- `spec.realm == CephObjectRealm.metadata.name`

CephObjectZoneGroup has dependents:
- `spec.zoneGroup == CephObjectZoneGroup.metadata.name`
CephRBDMirror has dependents:
- `spec.mirroring.enabled == true`

CephFilesystemMirror has dependents:
- `spec.mirroring.enabled == true`
We can identify some common criteria for determining whether a resource is a dependent of a given "provider" resource. Not all of these criteria apply to every resource, but each of them appears more than once, so it should be possible to design reusable patterns/methods that share this logic across controllers. One possible shape for such a helper is sketched below.
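A minimal sketch of a reusable helper, assuming a controller-runtime client (the names are illustrative, not existing Rook code):

```go
package dependents

import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// dependentNames lists objects of one kind in a namespace and returns the
// names of those that satisfy isDependent, which encodes one of the matching
// rules above (pool reference, name reference, caps string, and so on).
func dependentNames(ctx context.Context, c client.Client, namespace string,
	list client.ObjectList, isDependent func(client.Object) bool) ([]string, error) {

	if err := c.List(ctx, list, client.InNamespace(namespace)); err != nil {
		return nil, err
	}

	var names []string
	err := meta.EachListItem(list, func(o runtime.Object) error {
		if obj, ok := o.(client.Object); ok && isDependent(obj) {
			names = append(names, obj.GetName())
		}
		return nil
	})
	return names, err
}
```

A CephObjectStore controller could, for example, pass a `CephObjectStoreUserList` with a predicate that compares `spec.store` to the store's name, matching the rule listed earlier.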
It will be important for the user to understand why resources are not being deleted if Rook is blocking the deletion. This design proposes that the Rook operator report to the user when it is blocking deletion of a resource due to dependents in two ways:
Using both of these methods will maximize user visibility.
Status: Reported statuses will be modified as follows:
Event: Reported events will have the following content:
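As a rough illustration of how a controller might surface both channels (a sketch only; the condition type, reason, and message shown are placeholders, not the exact content defined by this design):

```go
package dependents

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/record"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reportBlockedDeletion surfaces a blocked deletion both on the resource's
// status and as a Kubernetes event. The caller persists the status update.
func reportBlockedDeletion(recorder record.EventRecorder, provider client.Object,
	conditions *[]metav1.Condition, dependents []string) {

	msg := fmt.Sprintf("deletion is blocked: dependents still exist: %v", dependents)

	// Status: record a condition so `kubectl get -o yaml` shows the reason.
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "DeletionIsBlocked",
		Status:  metav1.ConditionTrue,
		Reason:  "ObjectHasDependents",
		Message: msg,
	})

	// Event: emit a warning so `kubectl describe` and event watchers see it.
	recorder.Event(provider, corev1.EventTypeWarning, "ReconcileFailed", msg)
}
```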
Rook currently inspects Kubernetes PersistentVolume (PV) resources when deleting CephClusters. This provides protection from deleting the backing Ceph cluster when user applications are using it.
With the changes proposed here, it would be more precise to block deletion of a CephBlockPool when there are PVs referencing that specific CephBlockPool, and likewise to block deletion of a CephFilesystem when there are PVs referencing that specific CephFilesystem.
Kubernetes APIs are quite stable, and it is unlikely that the methods Rook uses to inspect PersistentVolumes will require changes with any regularity. If changes are necessary, Kubernetes will likely give well over a year of time to migrate away from deprecated API elements.
Removing a Ceph cluster that is hosting PVs in use by Kubernetes applications could be disastrous for users. Therefore, this document proposes to continue checking for PersistentVolumes that depend on Ceph storage. The document further proposes to increase protection of user data by detecting dependencies for specific CephBlockPools and CephFilesystems when they are deleted rather than checking when the CephCluster is deleted. Detecting the existence of PVs when the CephCluster is deleted then becomes redundant and should be removed.
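A minimal sketch of the narrowed check for a CephBlockPool, assuming the standard client-go clientset (the same shape applies to a CephFilesystem with the `cephfs.csi.ceph.com` driver suffix and its volume attributes; the function name is illustrative):

```go
package dependents

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// persistentVolumesForBlockPool returns the names of PVs provisioned from the
// given CephBlockPool, following the CSI matching rules listed earlier.
func persistentVolumesForBlockPool(ctx context.Context, clientset kubernetes.Interface,
	operatorNamespace, clusterNamespace, poolName string) ([]string, error) {

	driver := fmt.Sprintf("%s.rbd.csi.ceph.com", operatorNamespace)

	pvs, err := clientset.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	var matches []string
	for _, pv := range pvs.Items {
		csi := pv.Spec.CSI
		if csi == nil || csi.Driver != driver {
			continue
		}
		if csi.VolumeAttributes["clusterID"] != clusterNamespace {
			continue
		}
		if csi.VolumeAttributes["pool"] == poolName || csi.VolumeAttributes["journalPool"] == poolName {
			matches = append(matches, pv.Name)
		}
	}
	return matches, nil
}
```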
As a note, StorageClasses (SCs) are only used during the initial creation of a PersistentVolume.
It may not be a good idea to block deletion when there are StorageClasses that reference Rook-Ceph block/file storage. An admin may at times wish to leave StorageClasses (a more user-focused API point) in place and replace the Rook-Ceph resources providing the storage behind the SC without disturbing users' ability to reference the SC. Any user who tried to use the StorageClass while the cluster was down would merely see provisioning failures until a replacement cluster came online.
This document outlines the steps needed to treat StorageClasses as a dependent but proposes not to implement the dependency at this time. If we get more information in the future that provides a compelling use-case for treating StorageClasses as dependencies, this functionality can be implemented at that time.
Detecting lib-bucket-provisioner ObjectBuckets that are dependent on a given CephObjectStore requires inspecting ObjectBuckets for a reference to a bucket found in the object store as well as a reference to the address of the Kubernetes Service created for access to RGWs (`<service name>.<namespace>.svc`). Detecting COSI Buckets will be similar.
This detection does require accessing external APIs (lib-bucket-provisioner and COSI). This is non-ideal for COSI, whose APIs will progress from v1alpha1, through beta stages, and then into v1 in the coming months/years. Rook can merely check for the existence of buckets in order to support lib-bucket-provisioner and COSI simultaneously with the same code. This would also remove any need for Rook to update its COSI API for dependency checking, though it will still need to update its API to continue operating as a COSI driver. This would allow Rook to block deletion of CephObjectStores without regard for who has "claimed" a bucket.
Given that a Kubernetes cluster might have many hundreds of ObjectBuckets, or COSI Buckets, having a simpler way of querying for dependents can lighten the load on the Kubernetes API server and reduce Rook's resource usage.
Since OBs and COSI Buckets both result in buckets being created in a CephObjectStore, Rook can query the object store and block deletion if buckets exist.
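A minimal sketch of that check, assuming the `go-ceph` RGW admin client (the endpoint and credential handling here are simplified placeholders; in practice Rook would derive them from the object store's admin user):

```go
package dependents

import (
	"context"
	"net/http"

	"github.com/ceph/go-ceph/rgw/admin"
)

// objectStoreHasBuckets returns true if any buckets exist in the object
// store, whether they were created via lib-bucket-provisioner, COSI, or
// directly by a user.
func objectStoreHasBuckets(ctx context.Context, endpoint, accessKey, secretKey string) (bool, error) {
	api, err := admin.New(endpoint, accessKey, secretKey, &http.Client{})
	if err != nil {
		return false, err
	}

	buckets, err := api.ListBuckets(ctx)
	if err != nil {
		return false, err
	}
	return len(buckets) > 0, nil
}
```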
This is an elegant solution for blocking when buckets are claimed by these outside resources. It does mean that CephObjectStores that have had buckets created in them directly (by admins or users) will block until the buckets are manually deleted. An admin may need to request that users remove unneeded buckets or may instead remove the buckets themselves.
If the admin wishes to preserve the CephObjectStore's pools on deletion along with their data, the admin may merely remove the finalizer on the CephObjectStore.
While this requires more steps to delete the CephObjectStore, it provides additional safety for user data by requiring users or admins to inspect and clean up unneeded buckets.
The main downside to this approach is that Rook will not be able to report if there are specific OBs or COSI Buckets consuming storage from the CephObjectStore. An admin would need to examine the resources in their cluster to determine if there are claims to the storage manually. A diligent admin will likely have done this work beforehand.
This document proposes to implement this strategy to avoid reliance on external APIs. Rook developers should revisit the decision at a later date to discuss whether the strategy is continuing to adequately meet users' needs and whether the drawbacks noted are causing any issues.
This design will result in changes to every major Rook-Ceph controller. However, the work can be broken into stages so that each change is easier to implement and review.
Stages by priority:
Immediate feedback is more helpful to users, so it should be provided wherever possible.