In this article, I will show how to connect a Jetson Nano development board to a Kubernetes cluster as a GPU node. I will cover the NVIDIA Docker setup required to run containers with GPU access, and how to connect the Jetson to a Kubernetes cluster. After successfully connecting the node to the cluster, I will also show how to run a simple TensorFlow 2 training session using the GPU on the Jetson Nano.
K3s or K8s?
K3s is a lightweight Kubernetes distribution, no more than 100MB in size. In my opinion, it is the ideal choice for single-board computers because it requires significantly fewer resources. You can check out our previous articles to learn more about k3s and its ecosystem. One open source tool in the k3s ecosystem that has to be mentioned is k3sup, developed by Alex Ellis to simplify the installation of k3s clusters. You can visit GitHub to learn about this tool:
https://github.com/alexellis/k3sup
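If you don't have k3sup on your machine yet, the project's README describes a one-line installation. At the time of writing it looks like the following; verify against the repository above before running it:

```bash
# Download the k3sup binary and move it onto the PATH
curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin/
```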
What do we need to prepare?
- A K3s Cluster - only one correctly configured master node is required
- An NVIDIA Jetson Nano development board with the developer kit installed
If you want to know how to install the developer kit on the board, you can check the following documentation:
https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit#write
- K3sup
- 15 minutes
The plan
- Set up NVIDIA Docker
- Add Jetson Nano to K3s cluster
- Run a simple MNIST example to show the use of GPU in Kubernetes pod
Set up NVIDIA Docker
Before we configure Docker to use NVIDIA Docker as the default runtime, I need to explain why. By default, when a user runs a container on the Jetson Nano, it runs the same way as on any other hardware device: you cannot access the GPU from the container, at least not without some hacking. If you want to test this yourself, you can run the following command and you should see similar results:
```
root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:10:23.370761: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:23.370859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-05-14 00:10:25.946896: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947219: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
If you try to run the same command now, but add the **--runtime=nvidia** parameter to the docker command, you should see something similar to the following:
```
root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run --runtime=nvidia -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:12:16.767624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 00:12:19.386354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 00:12:19.388700: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
NVIDIA Docker is now configured, but it is not enabled by default. To make Docker use the NVIDIA runtime by default, you need to add **"default-runtime": "nvidia"** to the **/etc/docker/daemon.json** configuration file, as follows:
{ "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } }, "default-runtime": "nvidia" }
Now you can skip the **--runtime=nvidia** parameter in the docker run command, and the GPU will be initialized by default. With this in place, K3s will use Docker with the NVIDIA runtime, so Pods can use the GPU without any special configuration.
Connect Jetson as a K8s node
Using k3sup to connect the Jetson as a Kubernetes node requires only one command. However, in order to connect the Jetson and the master node successfully, we need passwordless SSH access to both machines, along with either passwordless sudo or the ability to connect as the root user.
If you need to generate SSH keys and copy them over, run the following commands:
```bash
ssh-keygen -t rsa -b 4096 -f ~/.ssh/rpi -P ""
ssh-copy-id -i .ssh/rpi user@host
```
By default, an Ubuntu installation requires users to enter a password when using sudo, so the simpler way is to use k3sup with the root account. For this to work, you need to copy your **~/.ssh/authorized_keys** file into the **/root/.ssh/** directory.
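A minimal sketch of that copy step, assuming the default Ubuntu paths, could look like this:

```bash
# Allow root login with the same keys as the regular user
sudo mkdir -p /root/.ssh
sudo cp ~/.ssh/authorized_keys /root/.ssh/authorized_keys
```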
Before connecting to Jetson, let's take a look at the cluster we want to connect to:
```
upgrade@ZeroOne:~$ kubectl get node -o wide
NAME      STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
nexus     Ready    master   32d   v1.17.2+k3s1   192.168.0.12   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   containerd://1.3.3-k3s1
rpi3-32   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.30   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
rpi3-64   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.32   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
```
You may notice that the master node is the nexus host with IP 192.168.0.12, and that it is running containerd. By default, k3s uses containerd as its runtime, but this can be changed. Since we set up NVIDIA Docker to work with Docker, we need to switch from containerd to Docker. Don't worry: we just need to pass one additional parameter to the k3sup command. So you can connect the Jetson to the cluster by running the following command:
```bash
k3sup join --ssh-key ~/.ssh/rpi --server-ip 192.168.0.12 --ip 192.168.0.40 --k3s-extra-args '--docker'
```
IP 192.168.0.40 is my Jetson Nano. As you can see, we passed the **--k3s-extra-args '--docker'** flag, which hands the **--docker** flag to the k3s agent during installation. Thanks to this, the node uses the Docker runtime we configured with NVIDIA Docker instead of containerd.
To check whether the node connected correctly, we can run **kubectl get node -o wide**:
```
upgrade@ZeroOne:~$ kubectl get node -o wide
NAME      STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
nexus     Ready    master   32d   v1.17.2+k3s1   192.168.0.12   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   containerd://1.3.3-k3s1
rpi3-32   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.30   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
rpi3-64   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.32   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
jetson    Ready    <none>   11s   v1.17.2+k3s1   192.168.0.40   <none>        Ubuntu 18.04.4 LTS   4.9.140-tegra       docker://19.3.6
```
Simple verification
We can now run a Pod with the same Docker image and command to check that we get the same results as when we ran Docker directly on the Jetson Nano at the beginning of this article. To do this, we can apply this Pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: gpu-test
    command:
    - "/bin/bash"
    - "-c"
    - "echo 'import tensorflow' | python3"
  restartPolicy: Never
```
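Assuming you save the spec above as gpu-test.yaml (the filename is my own choice, any name works), you can create the Pod with:

```bash
kubectl apply -f gpu-test.yaml
```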
Wait for the Docker image to be pulled, then view the logs by running the following command:
```
upgrade@ZeroOne:~$ kubectl logs gpu-test
2020-05-14 10:01:51.341661: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 10:01:53.996300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 10:01:53.998563: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
As you can see, the log output is the same as when we ran Docker directly on the Jetson earlier.
Run MNIST training
We have a running node with GPU support, so now we can test the "Hello world" of machine learning and run a TensorFlow 2 example model on the MNIST dataset.
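For reference, this kind of "Hello world" is the classic Keras MNIST example. The sketch below is a minimal TensorFlow 2 version of such a script; it is my own illustration and not necessarily identical to the mnist.py pulled in by the manifest that follows:

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected network, enough to exercise the GPU
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```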
To run a simple training session that demonstrates GPU usage, apply the following manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mnist-training
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  initContainers:
  - name: git-clone
    image: iceci/utils
    command:
    - "git"
    - "clone"
    - "https://github.com/IceCI/example-mnist-training.git"
    - "/workspace"
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: mnist
    command:
    - "python3"
    - "/workspace/mnist.py"
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  restartPolicy: Never
  volumes:
  - name: workspace
    emptyDir: {}
```
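As before, you can save this manifest under a name of your choosing (say, mnist-training.yaml), apply it, and then follow the training output. With a single main container, kubectl targets it by default once the git-clone init container has finished:

```bash
kubectl apply -f mnist-training.yaml
# Stream the training logs from the Pod
kubectl logs -f mnist-training
```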
As can be seen from the following logs, the GPU is being used:
```
...
2020-05-14 11:30:02.846289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-14 11:30:02.846434: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
...
```
If you are on the node itself, you can check CPU and GPU usage by running the **tegrastats** command:
```
upgrade@jetson:~$ tegrastats --interval 5000
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,41%@1479,43%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@25C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1355/1355
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [53%@1479,42%@1479,45%@1479,35%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@24.75C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1353/1354
RAM 2461/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,38%@1479,43%@1479,33%@1479] EMC_FREQ 0% GR3D_FREQ 10% PLL@24C CPU@26C PMIC@100C GPU@24C AO@29C thermal@25.25C POM_5V_IN 3410/3410 POM_5V_GPU 493/465 POM_5V_CPU 1314/1340
```
Summary
As you can see, connecting the Jetson Nano to the Kubernetes cluster is a very simple process. In just a few minutes, you can use Kubernetes to run machine learning workloads - and take advantage of the power of NVIDIA's pocket GPU. You will be able to run any GPU container designed for the Jetson Nano on Kubernetes, which will simplify your development and testing.
Author: Jakub Czapliński, Icetek editor
Original link:
https://medium.com/icetek/how-to-connect-jetson-nano-to-kubernetes-using-k3s-and-k3sup-c715cf2bf212