In this article, I will show how to connect a Jetson Nano development board to a Kubernetes cluster as a GPU node. I will cover the NVIDIA Docker setup required to run containers with GPU access, and how to connect the Jetson to a Kubernetes cluster. After successfully connecting the node to the cluster, I will also show how to run a simple TensorFlow 2 training session using the GPU on the Jetson Nano.
K3s or K8s?
K3s is a lightweight Kubernetes distribution, no more than 100MB in size. In my opinion, it is the ideal choice for single-board computers because it requires significantly fewer resources. You can check out our previous articles to learn more about k3s and its ecosystem. One open source tool in the k3s ecosystem that has to be mentioned is k3sup, developed by Alex Ellis to simplify the installation of k3s clusters. You can visit GitHub to learn about this tool:
https://github.com/alexellis/k3sup
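If you don't have k3sup on your machine yet, the project's README describes a one-line installation. At the time of writing it looks like the following; verify against the repository above before running it:

```bash
# Download the k3sup binary and move it onto the PATH
curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin/
```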
What do we need to prepare?
- A K3s Cluster - only one correctly configured master node is required
- An NVIDIA Jetson Nano development board with the developer kit installed
If you want to know how to install the developer kit on the board, you can check the following documentation:
https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit#write
- K3sup
- 15 minutes
The plan
- Set up NVIDIA Docker
- Add Jetson Nano to K3s cluster
- Run a simple MNIST example to show the use of GPU in Kubernetes pod
Set up NVIDIA Docker
Before we configure Docker to use NVIDIA Docker as the default runtime, I need to explain why. By default, when a user runs a container on the Jetson Nano, it runs the same way as on any other hardware device: you cannot access the GPU from the container, at least not without some hacking. If you want to test this yourself, you can run the following command and you should see similar results:
```
root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:10:23.370761: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:23.370859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-05-14 00:10:25.946896: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947219: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
If you try to run the same command now, but add the **--runtime=nvidia** parameter to the docker command, you should see something similar to the following:
```
root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run --runtime=nvidia -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:12:16.767624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 00:12:19.386354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 00:12:19.388700: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
NVIDIA Docker is now configured, but it is not enabled by default. To make Docker use the NVIDIA runtime by default, you need to add **"default-runtime": "nvidia"** to the **/etc/docker/daemon.json** configuration file, as follows:
{ "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } }, "default-runtime": "nvidia" }
Now you can skip the **--runtime=nvidia** parameter in the docker run command, and the GPU will be initialized by default. With this in place, K3s will use Docker with the NVIDIA runtime, so Pods can use the GPU without any special configuration.
Connect Jetson as a K8s node
Using k3sup to connect the Jetson as a Kubernetes node requires only one command. However, in order to connect the Jetson and the master node successfully, we need passwordless SSH access to both machines, along with either passwordless sudo or the ability to connect as the root user.
If you need to generate SSH keys and copy them over, run the following commands:
```bash
ssh-keygen -t rsa -b 4096 -f ~/.ssh/rpi -P ""
ssh-copy-id -i .ssh/rpi user@host
```
By default, an Ubuntu installation requires users to enter a password when using sudo, so the simpler way is to use k3sup with the root account. For this to work, you need to copy your **~/.ssh/authorized_keys** file into the **/root/.ssh/** directory.
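A minimal sketch of that copy step, assuming the default Ubuntu paths, could look like this:

```bash
# Allow root login with the same keys as the regular user
sudo mkdir -p /root/.ssh
sudo cp ~/.ssh/authorized_keys /root/.ssh/authorized_keys
```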
Before connecting to Jetson, let's take a look at the cluster we want to connect to:
```
upgrade@ZeroOne:~$ kubectl get node -o wide
NAME      STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
nexus     Ready    master   32d   v1.17.2+k3s1   192.168.0.12   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   containerd://1.3.3-k3s1
rpi3-32   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.30   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
rpi3-64   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.32   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
```
You may notice that the master node is the nexus host with IP 192.168.0.12, and that it is running containerd. By default, k3s uses containerd as its runtime, but this can be changed. Since we set up NVIDIA Docker to work with Docker, we need to switch from containerd to Docker. Don't worry: we just need to pass one additional parameter to the k3sup command. So you can connect the Jetson to the cluster by running the following command:
```bash
k3sup join --ssh-key ~/.ssh/rpi --server-ip 192.168.0.12 --ip 192.168.0.40 --k3s-extra-args '--docker'
```
IP 192.168.0.40 is my Jetson Nano. As you can see, we passed the **--k3s-extra-args '--docker'** flag, which hands the **--docker** flag to the k3s agent during installation. Thanks to this, the node uses the Docker runtime we configured with NVIDIA Docker instead of containerd.
To check whether the node connected correctly, we can run **kubectl get node -o wide**:
```
upgrade@ZeroOne:~$ kubectl get node -o wide
NAME      STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
nexus     Ready    master   32d   v1.17.2+k3s1   192.168.0.12   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   containerd://1.3.3-k3s1
rpi3-32   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.30   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
rpi3-64   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.32   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
jetson    Ready    <none>   11s   v1.17.2+k3s1   192.168.0.40   <none>        Ubuntu 18.04.4 LTS   4.9.140-tegra       docker://19.3.6
```
Simple verification
We can now run a Pod with the same Docker image and command to check that we get the same results as when we ran Docker directly on the Jetson Nano at the beginning of this article. To do this, we can apply this Pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: gpu-test
    command:
    - "/bin/bash"
    - "-c"
    - "echo 'import tensorflow' | python3"
  restartPolicy: Never
```
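Assuming you save the spec above as gpu-test.yaml (the filename is my own choice, any name works), you can create the Pod with:

```bash
kubectl apply -f gpu-test.yaml
```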
Wait for the Docker image to be pulled, then view the logs by running the following command:
```
upgrade@ZeroOne:~$ kubectl logs gpu-test
2020-05-14 10:01:51.341661: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 10:01:53.996300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 10:01:53.998563: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
As you can see, the log output is the same as when we ran Docker directly on the Jetson earlier.
Run MNIST training
We have a running node with GPU support, so now we can test the "Hello world" of machine learning and run a TensorFlow 2 example model on the MNIST dataset.
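For reference, this kind of "Hello world" is the classic Keras MNIST example. The sketch below is a minimal TensorFlow 2 version of such a script; it is my own illustration and not necessarily identical to the mnist.py pulled in by the manifest that follows:

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected network, enough to exercise the GPU
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```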
To run a simple training session that demonstrates GPU usage, apply the following manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mnist-training
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  initContainers:
  - name: git-clone
    image: iceci/utils
    command:
    - "git"
    - "clone"
    - "https://github.com/IceCI/example-mnist-training.git"
    - "/workspace"
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: mnist
    command:
    - "python3"
    - "/workspace/mnist.py"
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  restartPolicy: Never
  volumes:
  - name: workspace
    emptyDir: {}
```
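As before, you can save this manifest under a name of your choosing (say, mnist-training.yaml), apply it, and then follow the training output. With a single main container, kubectl targets it by default once the git-clone init container has finished:

```bash
kubectl apply -f mnist-training.yaml
# Stream the training logs from the Pod
kubectl logs -f mnist-training
```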
As can be seen from the following logs, the GPU is being used:
```
...
2020-05-14 11:30:02.846289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-14 11:30:02.846434: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
...
```
If you are on the node itself, you can check CPU and GPU usage by running the **tegrastats** command:
```
upgrade@jetson:~$ tegrastats --interval 5000
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,41%@1479,43%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@25C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1355/1355
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [53%@1479,42%@1479,45%@1479,35%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@24.75C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1353/1354
RAM 2461/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,38%@1479,43%@1479,33%@1479] EMC_FREQ 0% GR3D_FREQ 10% PLL@24C CPU@26C PMIC@100C GPU@24C AO@29C thermal@25.25C POM_5V_IN 3410/3410 POM_5V_GPU 493/465 POM_5V_CPU 1314/1340
```
Summary
As you can see, connecting the Jetson Nano to the Kubernetes cluster is a very simple process. In just a few minutes, you can use Kubernetes to run machine learning workloads - and take advantage of the power of NVIDIA's pocket GPU. You will be able to run any GPU container designed for the Jetson Nano on Kubernetes, which will simplify your development and testing.
Author: Jakub Czapliński, Icetek editor
Original link:
https://medium.com/icetek/how-to-connect-jetson-nano-to-kubernetes-using-k3s-and-k3sup-c715cf2bf212