FUSE
HDFS contains a native FUSE driver/application, enabling you to mount an existing HDFS filesystem into a Linux environment.
Stackable images of Apache Hadoop contain the necessary binaries and libraries to use the HDFS FUSE driver. To use the FUSE driver, you can either copy the required files out of the image and run the driver on a host outside of Kubernetes, or you can run it in a Pod. This Pod, however, needs some extra capabilities.
This is an example Pod that works as long as the host system running the kubelet supports FUSE:
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-fuse
spec:
  containers:
    - name: hdfs-fuse
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/conf/hdfs
      image: docker.stackable.tech/stackable/hadoop:<version> (1)
      imagePullPolicy: Always
      securityContext:
        privileged: true
      command:
        - tail
        - -f
        - /dev/null
      volumeMounts:
        - mountPath: /stackable/conf/hdfs
          name: hdfs-config
  volumes:
    - name: hdfs-config
      configMap:
        name: <your hdfs here> (2)
1 | Ideally, use the same version as your HDFS cluster. Stackable HDFS images have included the FUSE driver since release 23.11. |
2 | This needs to be a reference to a discovery ConfigMap as written by the HDFS operator. |
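One way to start the example Pod is sketched below. It assumes the manifest above was saved as hdfs-fuse.yaml (a hypothetical file name) and that kubectl already points at the right cluster and namespace. The helper name start_fuse_pod and the 120 second timeout are illustrative choices, not part of the Stackable tooling.

```shell
# Apply the Pod manifest and block until the Pod reports Ready.
# Assumption: the manifest file name is hypothetical; pass your own as $1.
start_fuse_pod() {
  # $1: path to the manifest, defaults to hdfs-fuse.yaml
  kubectl apply -f "${1:-hdfs-fuse.yaml}" &&
    kubectl wait --for=condition=Ready pod/hdfs-fuse --timeout=120s
}
```

Calling start_fuse_pod without arguments uses the default file name. The function returns a non-zero status if either the apply or the wait step fails.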
Privileged Pods
Unfortunately, there is no way around some extra privileges. In Kubernetes, Pods usually share the kernel with the host running the kubelet, which means a Pod that wants to use FUSE needs access to the underlying kernel modules.
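A quick way to check whether FUSE is reachable is to look for the FUSE character device, either on the host or from a shell inside the privileged Pod. The sketch below is only a heuristic. It assumes the device lives at the conventional path /dev/fuse, and the helper name fuse_available is made up for this example.

```shell
# Succeeds (exit status 0) if the given path is a character device.
# Assumption: /dev/fuse is the conventional device node of the FUSE kernel module.
fuse_available() {
  # $1: device path to check, defaults to /dev/fuse
  [ -c "${1:-/dev/fuse}" ]
}
```

For example, `fuse_available && echo "FUSE device found"` prints the message only when the device node is present. A missing device usually means the host kernel does not provide FUSE or the Pod is not running privileged.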
Inside this Pod you can get a shell (e.g. using kubectl exec --stdin --tty hdfs-fuse -- /bin/bash) to get access to a script called fuse_dfs_wrapper (it is in the PATH of our Hadoop images).
To see the available options, call the script without any parameters.
To mount HDFS, call the script like this:
fuse_dfs_wrapper dfs://<your hdfs> <target> (1) (2)
# This runs in debug mode and stays in the foreground
fuse_dfs_wrapper -odebug dfs://<your hdfs> <target>
# Example:
mkdir simple-hdfs
fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
cd simple-hdfs
# Any operation in this directory now happens in HDFS
1 | Again, use the name of the HDFS service as above |
2 | target is the directory in which HDFS is mounted; it must exist, otherwise this command fails |
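Because the mount fails when the target directory is missing, a small helper can create it first. This is a minimal sketch: mount_hdfs is a hypothetical function name, and it only relies on the fuse_dfs_wrapper call form shown above.

```shell
# Create the target directory if needed, then mount HDFS into it.
# Assumption: mount_hdfs is a hypothetical helper, not shipped in the image.
mount_hdfs() {
  # $1: name of the HDFS service, $2: target directory
  if [ "$#" -ne 2 ]; then
    echo "usage: mount_hdfs <your hdfs> <target>" >&2
    return 2
  fi
  mkdir -p "$2" || return 1
  fuse_dfs_wrapper "dfs://$1" "$2"
}
```

With this in place, `mount_hdfs simple-hdfs simple-hdfs` behaves like the example above without the separate mkdir step.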