NFS-Ganesha is a user-space NFS server that is well integrated with the CephFS and RGW backends. It can export Ceph's filesystem namespaces and Object Gateway namespaces over the NFSv4 protocol.
Rook already orchestrates the Ceph filesystem and Object Store (RGW) on Kubernetes (k8s). It can be extended to orchestrate NFS-Ganesha server daemons as highly available and scalable NFS gateway pods for the Ceph filesystem and Object Store. This will allow NFS client applications to use the Ceph filesystem and object store set up by Rook.
This feature mainly differs from the feature to add NFS as another storage backend for Rook (the general NFS solution) in the following ways:
It will use Rook's Ceph operator, not a separate NFS operator, to deploy the NFS server pods.
The NFS server pods will be directly configured with the CephFS or RGW backend set up by Rook, and will not require CephFS or RGW to be mounted in the NFS server pod with a PVC.
The NFS-Ganesha server settings will be exposed to Rook as a Custom Resource Definition (CRD). Creating the nfs-ganesha CRD will launch a cluster of NFS-Ganesha server pods that will be configured with no exports.
The NFS client recovery data will be stored in a Ceph RADOS pool; and the servers will have stable IP addresses by using k8s Service. Export management will be done by updating a per-pod config file object in RADOS by external tools and issuing dbus commands to the server to reread the configuration.
This allows the NFS-Ganesha server cluster to be scalable and highly available.
A running rook Ceph filesystem or object store, whose namespaces will be exported by the NFS-Ganesha server cluster. e.g.,
kubectl create -f deploy/examples/filesystem.yaml
An existing RADOS pool (e.g., CephFS's data pool) or a pool created with a Ceph Pool CRD to store NFS client recovery data.
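For illustration, such a recovery pool could be created with a Ceph Pool CRD manifest along these lines; the pool name and replication size are placeholders, not values this design prescribes:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: nfs-ganesha-recovery   # illustrative name
  namespace: rook-ceph
spec:
  replicated:
    size: 3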
Older versions of SSSD cannot use /etc/sssd/conf.d/* files; they must use /etc/sssd/sssd.conf. Newer versions support either method. Rook will support only the /etc/sssd/sssd.conf method. This may reduce some configurability, but it is much simpler to document for users. For an option that is already complex, the simplicity here is a value. Additionally, sssd.conf can be mounted via a VolumeMount subPath, which is how Rook will mount the file into the SSSD sidecar.
Below is an example NFS-Ganesha CRD, nfs-ganesha.yaml:
apiVersion: ceph.rook.io/v1
kind: CephNFS
metadata:
  # The name of the Ganesha server cluster to create. It will be reflected in
  # the name(s) of the ganesha server pod(s)
  name: mynfs
  # The namespace of the Rook cluster where the Ganesha server cluster is
  # created.
  namespace: rook-ceph
spec:
  # Settings for the ganesha server
  server:
    # the number of active ganesha servers
    active: 3
    # where to run the nfs ganesha server
    placement:
    #  nodeAffinity:
    #    requiredDuringSchedulingIgnoredDuringExecution:
    #      nodeSelectorTerms:
    #      - matchExpressions:
    #        - key: role
    #          operator: In
    #          values:
    #          - mds-node
    #  tolerations:
    #  - key: mds-node
    #    operator: Exists
    #  podAffinity:
    #  podAntiAffinity:
    # The requests and limits set here allow the ganesha pod(s) to use three
    # CPU cores and 8 gigabytes of memory
    resources:
    #  limits:
    #    cpu: "3"
    #    memory: "8Gi"
    #  requests:
    #    cpu: "3"
    #    memory: "8Gi"
    # the priority class to set to influence the scheduler's pod preemption
    priorityClassName:
  security:
    sssd:
      sidecar:
        image: registry.access.redhat.com/rhel7/sssd:latest
        sssdConfigFile:
          volumeSource: # any standard kubernetes volume source
            # example
            configMap:
              name: rook-ceph-nfs-organization-sssd-config
              defaultMode: 0600 # mode must be 0600
        additionalFiles:
          - subPath: some-dir
            volumeSource:
              # example
              configMap:
                name: rook-ceph-nfs-organization-sssd-ca-bundle
                defaultMode: 0600 # mode must be 0600 for CA certs
        resources:
          # requests:
          #   cpu: "2"
          #   memory: "1024Mi"
          # limits:
          #   cpu: "2"
          #   memory: "1024Mi"
    kerberos:
      principalName: nfs
      configFiles:
        volumeSource:
          configMap:
            name: rook-ceph-nfs-organization-krb-conf
      keytabFile:
        volumeSource:
          secret:
            secretName: rook-ceph-nfs-organization-keytab
            defaultMode: 0600 # required
When nfs-ganesha.yaml is created, the following will happen:
Rook's Ceph operator sees the creation of the NFS-Ganesha CRD.
The operator creates as many k8s Deployments as the number of active Ganesha servers specified in the CRD. Each Deployment brings up a Ganesha server pod via a ReplicaSet of size 1.
The ganesha servers, each running in a separate pod, use a mostly-identical ganesha config (ganesha.conf) with no EXPORT definitions. The end of the file will have it do a %url include on a pod-specific RADOS object from which it reads the rest of its config.
The operator creates a k8s Service for each of the ganesha server pods to give each of them a stable IP address.
The ganesha server pods constitute an active-active high availability NFS
server cluster. If one of the active Ganesha server pods goes down, k8s brings
up a replacement ganesha server pod with the same configuration and IP address.
The NFS server cluster can be scaled up or down by updating the number of active Ganesha servers in the CRD (using kubectl edit, or by modifying the original CRD manifest and running kubectl apply -f <CRD yaml file>).
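For illustration, the per-server k8s Service created by the operator might look roughly like the following; the name, labels, and selector keys are assumptions of this sketch, not the operator's exact output:
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-nfs-mynfs-a   # hypothetical name for the first server instance
  namespace: rook-ceph
spec:
  selector:
    app: rook-ceph-nfs          # assumed labels applied to the ganesha server pod
    ceph_nfs: mynfs
    instance: a
  ports:
    - name: nfs
      port: 2049
      protocol: TCP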
After loading the basic ganesha config from inside the container, each server pod will read the rest of its config from an object in RADOS. This allows external tools to generate EXPORT definitions for ganesha.
The object will be named conf-<metadata.name>.<index>, where metadata.name is taken from the CRD and the index is internally generated. It will be stored in the pool and namespace given by rados.pool and rados.namespace in the above CRD.
An external consumer will fetch the ganesha server IPs by querying the k8s Services of the Ganesha server pods. It should have network access to the Ganesha pods to manually mount the shares using an NFS client. Later, support will be added to allow user pods to easily consume the NFS shares via PVCs.
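As one hedged example of manual consumption from within Kubernetes, a consumer with network access could point a standard NFS PersistentVolume at a ganesha Service address; the server IP and export path below are placeholders:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mynfs-share             # illustrative name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.96.0.100         # ClusterIP of one ganesha server Service (placeholder)
    path: /test                 # the export's pseudo path (placeholder)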
The NFS shares exported by rook's ganesha server pods can be consumed by an OpenStack cloud's user VMs. To do this, OpenStack's shared file system service, Manila, will provision NFS shares backed by CephFS using rook. Manila's CephFS driver will create NFS-Ganesha CRDs to launch ganesha server pods, and will dynamically add or remove the pods' exports based on OpenStack users' requests. The OpenStack user VMs will have network connectivity to the ganesha server pods and will manually mount the shares using NFS clients.
NFS-Ganesha requires DBus. Run DBus as a sidecar container so that it can be restarted if the
process fails. The /run/dbus
directory must be shared between Ganesha and DBus.
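A minimal sketch of that sharing, assuming an emptyDir volume and illustrative container names and images:
spec:
  volumes:
    - name: run-dbus
      emptyDir: {}
  containers:
    - name: nfs-ganesha
      image: quay.io/ceph/ceph:v17.2.4   # assumed Ceph container image
      volumeMounts:
        - name: run-dbus
          mountPath: /run/dbus
    - name: dbus
      image: quay.io/ceph/ceph:v17.2.4   # assumed; any image providing dbus-daemon
      command: ["dbus-daemon", "--system", "--nofork", "--nopidfile"]
      volumeMounts:
        - name: run-dbus
          mountPath: /run/dbus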
SSSD is able to provide user ID mapping to NFS-Ganesha. It can integrate with LDAP, Active Directory, and FreeIPA.
Prototype information detailed on Rook blog: https://blog.rook.io/prototyping-an-nfs-connection-to-ldap-using-sssd-7c27f624f1a4
NFS-Ganesha (via libraries within its container) is the client to SSSD. As of Ceph v17.2.3, the Ceph container image does not have the sssd-client package installed, which is required for SSSD support. It is available starting with Ceph v17.2.4.
The following directories must be shared between SSSD and the NFS-Ganesha container:
/var/lib/sss/pipes: this directory holds the sockets used to communicate between the client and SSSD
/var/lib/sss/mc: this is a memory-mapped "L0" cache shared between the client and SSSD
The following directories should not be shared between SSSD and other containers:
/var/lib/sss/db: this is a memory-mapped "L1" cache that is intended to survive reboots
/run/dbus: using the DBus instance from the sidecar caused SSSD errors in testing. SSSD only uses DBus for internal communications and creates its own socket as needed.
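A minimal sketch of sharing those directories between the SSSD sidecar and the NFS-Ganesha container, using illustrative emptyDir volume names; /var/lib/sss/db and /run/dbus are deliberately left out:
volumes:
  - name: sssd-sockets
    emptyDir: {}
  - name: sssd-mmap-cache
    emptyDir: {}
# mounted at the same paths in both the nfs-ganesha and sssd containers
volumeMounts:
  - name: sssd-sockets
    mountPath: /var/lib/sss/pipes
  - name: sssd-mmap-cache
    mountPath: /var/lib/sss/mc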
Kerberos is the authentication mechanism natively supported by NFS-Ganesha.
The Kerberos service principal used by NFS-Ganesha to authenticate with the Kerberos server is built up from 3 components:
a configured value (spec.security.kerberos.principalName in Rook) that acts as the service name
the hostname of the NFS server (as returned by getaddrinfo())
the Kerberos realm
The full service principal name is constructed as <principalName>/<hostname>@<realm>.
Users must add this service principal to their Kerberos server configuration. Therefore, this principal must be static, and for the benefit of users it should be deterministic.
The hostname of Kubernetes pods is the partly-random name of the pod by default. In order to give
NFS-Ganesha server pods a deterministic hostname, the hostname
field of the pod spec will be set
to the namespace plus name of the CephNFS resource. This also means that all servers will be able to
use the same service principal, which will be valuable for auto-scaling NFS servers in the future.
The principal then becomes easy to construct from known CephNFS fields as
<principalName>/<namespace>-<name>@<realm>
.
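A minimal sketch of pinning the hostname on the server pod spec, assuming a CephNFS named mynfs in the rook-ceph namespace and a hyphen-joined value:
spec:
  hostname: rook-ceph-mynfs   # "<namespace>-<name>" of the CephNFS resource (assumed format)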
Additionally, getaddrinfo()
doesn't return the hostname by default because the default
resolv.conf
in the pod does not check localhost. The pod's DNS config must be updated as shown.
dnsConfig:
  searches:
    - localhost
Volumes that should be mounted into the nfs-ganesha container to support Kerberos:
keytabFile volume: use subPath on the mount to add the krb5.keytab file to /etc/krb5.keytab
configFiles volume: mount (without subPath) to /etc/krb5.conf.rook/ to allow all files to be mounted (e.g., if a ConfigMap has multiple data items or a hostPath has multiple conf.d files)
Rook should add the NFS_KRB5 configuration explicitly. The NFS-Ganesha docs say Active_krb5 defaults to true if Kerberos support is compiled in, but most examples have this set explicitly.
Default PrincipalName is "nfs".
Default for keytab path is reportedly empty. Rook can use /etc/krb5.keytab
.
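A minimal sketch of those mounts in the nfs-ganesha container, reusing the Secret and ConfigMap names from the CRD example above; the volume names and the assumption that the Secret's data item is named krb5.keytab are illustrative:
volumes:
  - name: krb5-keytab
    secret:
      secretName: rook-ceph-nfs-organization-keytab   # from spec.security.kerberos.keytabFile
      defaultMode: 0600
  - name: krb5-conf-d
    configMap:
      name: rook-ceph-nfs-organization-krb-conf       # from spec.security.kerberos.configFiles
volumeMounts:
  - name: krb5-keytab
    mountPath: /etc/krb5.keytab
    subPath: krb5.keytab              # assumes the Secret has a "krb5.keytab" data item
  - name: krb5-conf-d
    mountPath: /etc/krb5.conf.rook/   # no subPath, so every data item is mounted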
Create a new RADOS object named kerberos
to configure Kerberos.
NFS_KRB5
{
    PrincipalName = nfs ; # <-- set from spec.security.kerberos.principalName (or "nfs" if unset)
    KeytabPath = /etc/krb5.keytab ;
    Active_krb5 = YES ;
}
Add the following line to the config object (conf-nfs.${nfs-name}) to reference the new kerberos RADOS object. Remove this line from the config object if Kerberos is disabled.
%url "rados://.nfs/${nfs-name}/kerberos"
These steps can be done from the Rook operator.
Rook should take steps to remove any default configuration. This means that it should create its own minimal krb5.conf and ensure that any imported directories are empty. While it might be nice for some users to include the default configurations present in the container image, in practice it is extremely difficult to adequately document the interactions between defaults that may change from time to time in the container image (or between different distros), and users would have to expend mental effort to understand how their configurations override those defaults. Upstream documentation for krb5.conf is unclear about file ordering and override behavior. Therefore, Rook will rely on users to specify nearly all of the configuration, which ensures users can easily supply the configuration they require.
The minimal /etc/krb5.conf file Rook will create is as follows:
includedir /etc/krb5.conf.rook/ # include all user-defined config files
[logging]
default = STDERR # only log to stderr by default
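For example, a user-supplied ConfigMap referenced by spec.security.kerberos.configFiles could carry the realm configuration that ends up under /etc/krb5.conf.rook/; the realm and KDC values below are placeholders:
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-nfs-organization-krb-conf   # referenced from the CephNFS spec
  namespace: rook-ceph
data:
  example.conf: |
    [libdefaults]
      default_realm = EXAMPLE.COM
    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
      }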
Currently, the ceph nfs ... CLI tool is unable to create exports with Kerberos security enabled. Users must add the sectype setting manually by modifying the raw RADOS export object. This should be documented.
Example export (with sectype
manually added):
EXPORT {
    FSAL {
        name = "CEPH";
        user_id = "nfs.my-nfs.1";
        filesystem = "myfs";
        secret_access_key = "AQBsPf1iNXTRKBAAtw+D5VzFeAMV4iqbfI0IBA==";
    }
    export_id = 1;
    path = "/";
    pseudo = "/test";
    access_type = "RW";
    squash = "none";
    attr_expiration_time = 0;
    security_label = true;
    protocols = 4;
    transports = "TCP";
    sectype = krb5,krb5i,krb5p; # <-- not included in ceph nfs exports by default
}