Under extenuating circumstances, steps may be necessary to recover the cluster health. There are several types of recovery addressed in this document.
Under extenuating circumstances, the mons may lose quorum. If the mons cannot form quorum again, there is a manual procedure to get the quorum going again. The only requirement is that at least one mon is still healthy. The following steps will remove the unhealthy mons from quorum and allow you to form a quorum again with a single mon, then grow the quorum back to the original size.
The Rook kubectl Plugin has a `restore-quorum` command that will walk you through the automated mon quorum restoration process.

If the name of the healthy mon is `c`, you would run the command:

```console
kubectl rook-ceph mons restore-quorum c
```

See the `restore-quorum` documentation for more details.
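Once quorum has been restored, you can confirm it from the toolbox (a sketch, assuming the `rook-ceph-tools` deployment is running):

```console
# Both commands should show the restored mon(s) in quorum.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mon stat
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph quorum_status --format json-pretty
```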
When Rook CRs are deleted, the Rook operator will respond to the deletion event and attempt to clean up the cluster resources. If any data appears to be present in the cluster, Rook will not allow the resources to be deleted: the operator will not remove the finalizer on the CRs until the underlying data is deleted. For more details, see the dependency design doc.

While it is good that the CRs will not be deleted and the underlying Ceph data and daemons continue to be available, the CRs will be stuck indefinitely in a `Deleting` state, in which the operator will not continue to ensure cluster health. Upgrades will be blocked, further updates to the CRs are prevented, and so on.

Since Kubernetes does not allow undeleting resources, the following procedure will allow you to restore the CRs to their prior state, without necessarily even suffering cluster downtime.
!!! note
    In the following commands, the affected `CephCluster` resource is called `rook-ceph`. If yours is named differently,
    the commands will need to be adjusted.
Scale down the operator.

```console
kubectl -n rook-ceph scale --replicas=0 deploy/rook-ceph-operator
```
Backup all Rook CRs and critical metadata.

```console
# Store the `CephCluster` CR settings. Also, save other Rook CRs that are in a terminating state.
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml > cluster.yaml

# Backup critical secrets and configmaps in case something goes wrong later in the procedure.
kubectl -n rook-ceph get secret -o yaml > secrets.yaml
kubectl -n rook-ceph get configmap -o yaml > configmaps.yaml
```
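For example (a sketch; adjust the list to the CR types your cluster actually uses), other Rook CRs that are stuck terminating can be exported the same way:

```console
# Save any other Rook CRs that are in a terminating state alongside the CephCluster.
kubectl -n rook-ceph get cephblockpools,cephfilesystems,cephobjectstores -o yaml > other-crs.yaml
```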
Remove the owner references from all critical Rook resources that were referencing the `CephCluster` CR. Programmatically determine all such resources with this command:

```console
# Determine the `CephCluster` UID.
ROOK_UID=$(kubectl -n rook-ceph get cephcluster rook-ceph -o 'jsonpath={.metadata.uid}')

# List all secrets, configmaps, services, deployments, and PVCs with that ownership UID.
RESOURCES=$(kubectl -n rook-ceph get secret,configmap,service,deployment,pvc -o jsonpath='{range .items[?(@.metadata.ownerReferences[*].uid=="'"$ROOK_UID"'")]}{.kind}{"/"}{.metadata.name}{"\n"}{end}')

# Show the collected resources.
kubectl -n rook-ceph get $RESOURCES
```
Verify that all critical resources are shown in the output. The critical resources are these:

* Secrets: `rook-ceph-admin-keyring`, `rook-ceph-config`, `rook-ceph-mon`, `rook-ceph-mons-keyring`
* ConfigMap: `rook-ceph-mon-endpoints`
* Services: `rook-ceph-mon-*`, `rook-ceph-mgr-*`
* Deployments: `rook-ceph-mon-*`, `rook-ceph-osd-*`, `rook-ceph-mgr-*`
* PVCs (if applicable): `rook-ceph-mon-*` and the OSD PVCs (named `<deviceset>-*`, for example `set1-data-*`)

For each listed resource, remove the `ownerReferences` metadata field, in order to unlink it from the deleting `CephCluster` CR.
To do so programmatically, use the command:

```console
for resource in $(kubectl -n rook-ceph get $RESOURCES -o name); do
    kubectl -n rook-ceph patch $resource -p '{"metadata": {"ownerReferences":null}}'
done
```
For a manual alternative, issue `kubectl edit` on each resource, and remove the block matching:

```yaml
ownerReferences:
- apiVersion: ceph.rook.io/v1
  blockOwnerDeletion: true
  controller: true
  kind: CephCluster
  name: rook-ceph
  uid: <uid>
```
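Whichever method you use, it is worth re-running the ownership query from above (a sketch; it relies on the `$ROOK_UID` variable set earlier). Once every owner reference has been removed, it should print nothing:

```console
# Should produce no output when all ownerReferences to the CephCluster have been removed.
kubectl -n rook-ceph get secret,configmap,service,deployment,pvc -o jsonpath='{range .items[?(@.metadata.ownerReferences[*].uid=="'"$ROOK_UID"'")]}{.kind}{"/"}{.metadata.name}{"\n"}{end}'
```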
Before completing this step, validate these things; failing to do so could result in data loss:

* `cluster.yaml` contains the `CephCluster` CR.
* All critical resources listed above have had the `ownerReference` to the `CephCluster` CR removed.

Remove the finalizer from the `CephCluster` resource. This will cause the resource to be immediately deleted by Kubernetes.

```console
kubectl -n rook-ceph patch cephcluster/rook-ceph --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
```
After the finalizer is removed, the `CephCluster` will be immediately deleted. If all owner references were properly removed, all Ceph daemons will continue running and there will be no downtime.
Create the `CephCluster` CR with the same settings as previously.

```console
# Use the same cluster settings as exported to cluster.yaml in the backup step above.
kubectl create -f cluster.yaml
```
If there are other CRs in a terminating state, such as CephBlockPools, CephObjectStores, or CephFilesystems, follow the above steps for those CRs as well.
Scale up the operator.

```console
kubectl -n rook-ceph scale --replicas=1 deploy/rook-ceph-operator
```
Watch the operator log to confirm that the reconcile completes successfully.

```console
kubectl -n rook-ceph logs -f deployment/rook-ceph-operator
```
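As a final sanity check (the exact output columns vary by Rook version), confirm the `CephCluster` is no longer stuck in a `Deleting` state:

```console
# The cluster should report a healthy phase (e.g. Ready) rather than Deleting.
kubectl -n rook-ceph get cephcluster rook-ceph
```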
Situations this section can help resolve:

* The Kubernetes environment underlying a running Rook Ceph cluster failed catastrophically, and a new Kubernetes environment must be used to recover the previous Rook Ceph cluster.
* An existing Rook Ceph cluster needs to be migrated to a new Kubernetes environment, and downtime can be tolerated.

The procedure requires that at least one Ceph mon db from the old cluster is intact and that a sufficient number of Ceph OSDs were `up` and `in` before the disaster. At a high level, it brings up a new, clean Rook Ceph cluster from the old `CephCluster`, `CephBlockPool`, `CephFilesystem`, `CephNFS`, and `CephObjectStore` descriptors, then replaces the new mon data and the `fsid` in `secrets/rook-ceph-mon` with those of the old cluster, repairs the monmap and auth keys, and restarts the cluster.

The steps below assume `dataDirHostPath` is `/var/lib/rook` and the `CephCluster` being adopted is named `rook-ceph`.
Backup `/var/lib/rook` on all the Rook Ceph nodes to a different directory. The backups will be used later.

Pick a `/var/lib/rook/rook-ceph/rook-ceph.config` from any previous Rook Ceph node and save the old cluster `fsid` from its content.
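For example (a sketch; run it on one of the old Rook Ceph nodes, or against your backup copy of the directory):

```console
# Print the old cluster fsid from rook-ceph.config.
grep fsid /var/lib/rook/rook-ceph/rook-ceph.config
```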
Remove `/var/lib/rook` from all the Rook Ceph nodes.

Add an identical `CephCluster` descriptor to the new Kubernetes cluster, especially identical `spec.storage.config` and `spec.storage.nodes`, except `mon.count`, which should be set to `1`.

Add identical `CephFilesystem`, `CephBlockPool`, `CephNFS`, and `CephObjectStore` descriptors (if any) to the new Kubernetes cluster.

Watch the operator logs with `kubectl -n rook-ceph logs -f rook-ceph-operator-xxxxxxx`, and wait until the orchestration has settled.

STATE: Now the cluster will have `rook-ceph-mon-a`, `rook-ceph-mgr-a`, and all the auxiliary pods up and running, and zero (hopefully) `rook-ceph-osd-ID-xxxxxx` running. `ceph -s` output should report 1 mon and 1 mgr running, all of the OSDs down, and all PGs in `unknown` state. Rook should not start any OSD daemon, since all devices belong to the old cluster (which has a different `fsid`).

Run `kubectl -n rook-ceph exec -it rook-ceph-mon-a-xxxxxxxx bash` to enter the `rook-ceph-mon-a` pod:

```console
mon-a# cat /etc/ceph/keyring-store/keyring   # save this keyring content for later use
mon-a# exit
```
Stop the Rook operator by running `kubectl -n rook-ceph edit deploy/rook-ceph-operator` and setting `replicas` to `0`.

Stop the cluster daemons by running `kubectl -n rook-ceph delete deploy/X` where X is every deployment in the namespace `rook-ceph` except `rook-ceph-operator` and `rook-ceph-tools`.
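One possible way to do this (a sketch, not part of the original procedure) is to delete every deployment except the operator and the toolbox in one pass:

```console
kubectl -n rook-ceph get deploy -o name \
  | grep -vE 'rook-ceph-operator|rook-ceph-tools' \
  | xargs kubectl -n rook-ceph delete
```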
Save the `rook-ceph-mon-a` address with `kubectl -n rook-ceph get cm/rook-ceph-mon-endpoints -o yaml` in the new Kubernetes cluster for later use.

SSH to the host where `rook-ceph-mon-a` in the new Kubernetes cluster resides, and do the following:

* Remove `/var/lib/rook/mon-a`.
* Pick a healthy `rook-ceph-mon-ID` directory (`/var/lib/rook/mon-ID`) from the previous backup and copy it to `/var/lib/rook/mon-a`. `ID` is any healthy mon node ID of the old cluster.
* Replace `/var/lib/rook/mon-a/keyring` with the saved keyring, preserving only the `[mon.]` section; remove the `[client.admin]` section.

Still on that host, run `docker run -it --rm -v /var/lib/rook:/var/lib/rook ceph/ceph:v14.2.1-20190430 bash`. The Docker image tag should match the Ceph version used in the Rook cluster. The `/etc/ceph/ceph.conf` file needs to exist for `ceph-mon` to work. Inside the container, run:
```console
touch /etc/ceph/ceph.conf
cd /var/lib/rook
ceph-mon --extract-monmap monmap --mon-data ./mon-a/data   # Extract the monmap from the old ceph-mon db and save it as `monmap`
monmaptool --print monmap                                  # Print the monmap content, which reflects the old cluster's ceph-mon configuration
monmaptool --rm a monmap                                   # Delete `a` from the monmap
monmaptool --rm b monmap                                   # Repeat, and delete `b` from the monmap
monmaptool --rm c monmap                                   # Repeat this pattern until all the old ceph-mons are removed
monmaptool --rm d monmap
monmaptool --rm e monmap
monmaptool --addv a [v2:10.77.2.216:3300,v1:10.77.2.216:6789] monmap   # Replace this with the rook-ceph-mon-a address you saved earlier
ceph-mon --inject-monmap monmap --mon-data ./mon-a/data    # Replace the monmap in the ceph-mon db with the modified version
rm monmap
exit
```
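Before running `exit`, you can optionally double-check the result (a sketch reusing the same tools inside the container):

```console
# Re-extract the monmap and confirm that only mon `a` remains, with the address you injected.
ceph-mon --extract-monmap checkmap --mon-data ./mon-a/data
monmaptool --print checkmap
rm checkmap
```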
Tell Rook to run as the old cluster by running `kubectl -n rook-ceph edit secret/rook-ceph-mon` and changing the `fsid` to the original `fsid`. Note that the `fsid` is base64 encoded and must not contain a trailing carriage return. For example:

```console
echo -n a811f99a-d865-46b7-8f2c-f94c064e4356 | base64   # Replace with the fsid from your old cluster.
```
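To double-check the value actually stored (a sketch; requires `base64` locally), decode the `fsid` field straight from the secret:

```console
kubectl -n rook-ceph get secret rook-ceph-mon -o jsonpath='{.data.fsid}' | base64 --decode; echo
```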
Disable authentication by running `kubectl -n rook-ceph edit cm/rook-config-override` and adding the content below:

```yaml
data:
  config: |
    [global]
    auth cluster required = none
    auth service required = none
    auth client required = none
    auth supported = none
```
Bring the Rook Ceph operator back online by running `kubectl -n rook-ceph edit deploy/rook-ceph-operator` and setting `replicas` to `1`.

Watch the operator logs with `kubectl -n rook-ceph logs -f rook-ceph-operator-xxxxxxx`, and wait until the orchestration has settled.
STATE: Now the new cluster should be up and running with authentication disabled. `ceph -s` should report 1 mon and 1 mgr running, all of the OSDs up and running, and all PGs in either `active` or `degraded` state.
Run `kubectl -n rook-ceph exec -it rook-ceph-tools-XXXXXXX bash` to enter the tools pod:

```console
vi key
# [paste the keyring content saved before, preserving only the `[client.admin]` section]
ceph auth import -i key
rm key
```
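Still inside the tools pod, you can verify that the import took effect (a sketch):

```console
# Should print the client.admin entity with the key you just imported.
ceph auth get client.admin
```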
Re-enable authentication by running `kubectl -n rook-ceph edit cm/rook-config-override` and removing the auth configuration added in the previous steps.

Stop the Rook operator by running `kubectl -n rook-ceph edit deploy/rook-ceph-operator` and setting `replicas` to `0`.
Shut down the entire new cluster by running `kubectl -n rook-ceph delete deploy/X` where X is every deployment in the namespace `rook-ceph` except `rook-ceph-operator` and `rook-ceph-tools`, again. This time OSD daemons are present and should be removed too.
Bring the Rook Ceph operator back online by running `kubectl -n rook-ceph edit deploy/rook-ceph-operator` and setting `replicas` to `1`.

Watch the operator logs with `kubectl -n rook-ceph logs -f rook-ceph-operator-xxxxxxx`, and wait until the orchestration has settled.
STATE: Now the new cluster should be up and running with authentication enabled. `ceph -s` output should not change much compared to the previous steps.
It is possible to migrate/restore a Rook/Ceph cluster from an existing Kubernetes cluster to a new one without resorting to SSH access or Ceph tooling. This allows doing the migration using standard Kubernetes resources only. This guide assumes the following:

* The `CephCluster` uses PVCs to persist the mon and OSD data.
* You have backups of the secrets, configmaps, services, and PVCs in the `rook-ceph` namespace of the old cluster (taken, for example, with a tool such as Velero).

Do the following in the new cluster:

1. Stop the Rook operator by scaling the deployment `rook-ceph-operator` down to zero (`kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 0`) and deleting the other deployments. An example command for the latter is `kubectl -n rook-ceph delete deployment -l operator!=rook`.
2. Restore the Rook PVCs to the new cluster.
3. Copy the keyring and fsid secrets from the old cluster: `rook-ceph-mgr-a-keyring`, `rook-ceph-mon`, `rook-ceph-mons-keyring`, `rook-ceph-osd-0-keyring`, ...
4. Delete the mon services and copy them from the old cluster: `rook-ceph-mon-a`, `rook-ceph-mon-b`, ... Note that simply re-applying the manifests won't work, because the goal here is to restore the `clusterIP` in each service, and this field is immutable in `Service` resources (see the sketch after this list).
5. Copy the endpoints configmap from the old cluster: `rook-ceph-mon-endpoints`.
6. Scale the Rook operator back up: `kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 1`.
7. Wait until the reconciliation is over.
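As an illustration only (the kubectl context names `old` and `new` are assumptions, and exported manifests may need fields such as `metadata.resourceVersion`, `metadata.uid`, and `metadata.ownerReferences` stripped before they can be created cleanly), copying one mon service so that its `clusterIP` is preserved could look like this:

```console
# Export the mon service from the old cluster (spec.clusterIP is part of the manifest).
kubectl --context old -n rook-ceph get service rook-ceph-mon-a -o yaml > rook-ceph-mon-a-svc.yaml

# Remove the service the new cluster created, then recreate it with the old clusterIP.
kubectl --context new -n rook-ceph delete service rook-ceph-mon-a
kubectl --context new -n rook-ceph create -f rook-ceph-mon-a-svc.yaml
```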
When the `rook-ceph` namespace is accidentally deleted, the good news is that the cluster can be restored. With the contents of the `dataDirHostPath` directory and the original OSD disks still intact, the Ceph cluster can be restored with this guide.

You need to manually create a ConfigMap and a Secret to make it work. The information required for the ConfigMap and Secret can be found in the `dataDirHostPath` directory.
The first resource to create is the secret named `rook-ceph-mon`, as seen in this example:

```yaml
apiVersion: v1
data:
  ceph-secret: QVFCZ0h6VmorcVNhSGhBQXVtVktNcjcrczNOWW9Oa2psYkErS0E9PQ==
  ceph-username: Y2xpZW50LmFkbWlu
  fsid: M2YyNzE4NDEtNjE4OC00N2MxLWIzZmQtOTBmZDRmOTc4Yzc2
  mon-secret: QVFCZ0h6VmorcVNhSGhBQXVtVktNcjcrczNOWW9Oa2psYkErS0E9PQ==
kind: Secret
metadata:
  finalizers:
  - ceph.rook.io/disaster-protection
  name: rook-ceph-mon
  namespace: rook-ceph
  ownerReferences: null
type: kubernetes.io/rook
```
The values for the secret can be found in `$dataDirHostPath/rook-ceph/client.admin.keyring` and `$dataDirHostPath/rook-ceph/rook-ceph.config`:

* `ceph-secret` and `mon-secret`: fill with the `client.admin` key from the keyring file.
* `ceph-username`: set to the string `client.admin`.
* `fsid`: set to the original Ceph cluster id.

All the fields in the `data` section need to be encoded in base64. Encoding can be done like this:

```console
echo -n "string to code" | base64 -i -
```
Now save the secret as `rook-ceph-mon.yaml`, to be created later in the restore.
The second resource is the configmap named `rook-ceph-mon-endpoints`, as seen in this example:

```yaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["169.169.241.153:6789","169.169.82.57:6789","169.169.7.81:6789"],"namespace":""}]'
  data: k=169.169.241.153:6789,m=169.169.82.57:6789,o=169.169.7.81:6789
  mapping: '{"node":{"k":{"Name":"10.138.55.111","Hostname":"10.138.55.111","Address":"10.138.55.111"},"m":{"Name":"10.138.55.120","Hostname":"10.138.55.120","Address":"10.138.55.120"},"o":{"Name":"10.138.55.112","Hostname":"10.138.55.112","Address":"10.138.55.112"}}}'
  maxMonId: "15"
kind: ConfigMap
metadata:
  finalizers:
  - ceph.rook.io/disaster-protection
  name: rook-ceph-mon-endpoints
  namespace: rook-ceph
  ownerReferences: null
```
The monitors' service IPs are kept in the monitor data store, so the services must be recreated with their original IPs. Once you create this configmap with the original service IPs, the Rook operator will create the correct services for you, with IPs matching those in the monitor data store. The monitor IDs, their service IPs, and the mapping between them can be found in `dataDirHostPath/rook-ceph/rook-ceph.config`, for example:

```ini
[global]
fsid = 3f271841-6188-47c1-b3fd-90fd4f978c76
mon initial members = m o k
mon host = [v2:169.169.82.57:3300,v1:169.169.82.57:6789],[v2:169.169.7.81:3300,v1:169.169.7.81:6789],[v2:169.169.241.153:3300,v1:169.169.241.153:6789]
```

`mon initial members` and `mon host` hold the monitors' IDs and addresses respectively, listed in the same order, so you can tell which monitor has which service IP. In this example, `m` maps to `169.169.82.57`, `o` to `169.169.7.81`, and `k` to `169.169.241.153`. Modify the `csi-cluster-config-json` and `data` fields of your `rook-ceph-mon-endpoints.yaml` based on this reading of `rook-ceph.config`.
The `mapping` field tells Rook where to schedule the monitors' pods. To rebuild it, search `dataDirHostPath` on all the Ceph cluster hosts for `mon-m`, `mon-o`, and `mon-k`. If you find `mon-m` on host `10.138.55.120`, fill in `10.138.55.120` for `m` in the `mapping` field; the other monitors work the same way.
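For example (a sketch, assuming `dataDirHostPath` is `/var/lib/rook`), you can check each Ceph host for the mon directories it holds:

```console
# Run on each host; the mon-* directories present tell you which monitors lived there.
ls -d /var/lib/rook/mon-*
```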
Update `maxMonId` to be the numeric ID of the highest monitor ID; in the example above, `15` corresponds to mon `o`.
Now save this configmap in the file `rook-ceph-mon-endpoints.yaml`, to be created later in the restore.
Now that you have the info for the secret and the configmap, you are ready to restore the cluster.
Deploy Rook Ceph using the YAML files or Helm, with the same settings you had previously.

```console
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
```
After the operator is running, create the configmap and secret you have just crafted:

```console
kubectl create -f rook-ceph-mon.yaml -f rook-ceph-mon-endpoints.yaml
```
Create your Ceph cluster CR (if possible, with the same settings as existed previously):

```console
kubectl create -f cluster.yaml
```
Now your Rook Ceph cluster should be running again.