FUSE

Hadoop contains a native FUSE driver/application, enabling you to mount an existing HDFS filesystem into a Linux environment.

Stackable images of Apache Hadoop contain the necessary binaries and libraries to use the HDFS FUSE driver. To use the driver, you can either copy the required files out of the image and run them on a host outside of Kubernetes, or you can run the driver in a Pod. That Pod, however, needs some extra capabilities.
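For the out-of-cluster approach, the required files can be copied out of the image with standard container tooling. This is a rough sketch, assuming Docker is available; the /stackable path is an assumption about the image layout and may differ between versions:

# Create a stopped container from the image so files can be copied out
docker create --name hdfs-tmp docker.stackable.tech/stackable/hadoop:<version>
# Copy the Hadoop installation (including the FUSE binaries and libraries)
# to the host; /stackable is an assumed path inside the image
docker cp hdfs-tmp:/stackable ./stackable
# Remove the temporary container again
docker rm hdfs-tmp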

This is an example Pod that works as long as the host system running the kubelet supports FUSE:

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-fuse
spec:
  containers:
    - name: hdfs-fuse
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/conf/hdfs
      image: docker.stackable.tech/stackable/hadoop:<version> (1)
      imagePullPolicy: Always
      securityContext:
        privileged: true
      command:
        - tail
        - -f
        - /dev/null
      volumeMounts:
        - mountPath: /stackable/conf/hdfs
          name: hdfs-config
  volumes:
    - name: hdfs-config
      configMap:
        name: <your hdfs here> (2)
1 Ideally use the same version your HDFS is using. Stackable HDFS images contain the FUSE driver since 23.11.
2 This needs to be a reference to a discovery ConfigMap as written by the HDFS operator.
Privileged Pods

Instead of privileged, it might suffice to only add the SYS_ADMIN capability. Our tests showed that this sometimes works and sometimes doesn’t, depending on the environment.

Example
securityContext:
  capabilities:
    add:
      - SYS_ADMIN

Unfortunately, there is no way around some extra privileges. In Kubernetes, Pods usually share the kernel with the host running the kubelet, which means a Pod wanting to use FUSE needs access to the underlying kernel modules.

Inside this Pod you can get a shell (e.g. using kubectl exec --stdin --tty hdfs-fuse -- /bin/bash) to get access to a script called fuse_dfs_wrapper (it is in the PATH of the Stackable Hadoop images).

To see the available options, call the script without any parameters.

To mount HDFS, call the script like this:

fuse_dfs_wrapper dfs://<your hdfs> <target> (1) (2)

# This runs in debug mode and stays in the foreground
fuse_dfs_wrapper -odebug dfs://<your hdfs> <target>

# Example:
mkdir simple-hdfs
fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
cd simple-hdfs
# Any operation in this directory now happens in HDFS
1 Again, use the name of the HDFS service as above
2 target is the directory in which HDFS is mounted; it must exist, otherwise this command fails
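To detach the mount again, the standard FUSE unmount tool can be used. A short sketch, using the simple-hdfs mount directory from the example above:

# Leave the mounted directory first, otherwise the mount point is busy
cd ..
# Unmount via the FUSE userspace tool (or "umount simple-hdfs" as root)
fusermount -u simple-hdfs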