Understanding Linux network namespaces

Posted by toivo on Wed, 05 Jan 2022 08:15:46 +0100

Network namespaces are an important feature of the Linux kernel for implementing network virtualization. They let you create multiple isolated network environments, each with its own firewall rules, network interfaces, routing table, neighbor table and protocol stack. A virtual machine or container that runs in its own network namespace behaves, from a networking point of view, like a separate host.

The examples below illustrate network namespaces in practice. They use the ip command from the iproute2 toolkit, so install it first and run the commands as root.

On CentOS, install it with the following command (the package is named iproute there):

yum install iproute

Verify that the installation is complete:

[root@worker3 ~]# ip help
Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }
       ip [ -force ] -batch filename
where  OBJECT := { link | address | addrlabel | route | rule | neigh | ntable |
                   tunnel | tuntap | maddress | mroute | mrule | monitor | xfrm |
                   netns | l2tp | fou | macsec | tcp_metrics | token | netconf | ila |
                   vrf }
       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |
                    -h[uman-readable] | -iec |
                    -f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |
                    -4 | -6 | -I | -D | -B | -0 |
                    -l[oops] { maximum-addr-flush-attempts } | -br[ief] |
                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch] [filename] |
                    -rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}

Create network namespace

The ip subcommand for working with network namespaces is ip netns. Use help to see the available commands:

[root@worker3 ~]# ip netns help
Usage: ip netns list
       ip netns add NAME
       ip netns set NAME NETNSID
       ip [-all] netns delete [NAME]
       ip netns identify [PID]
       ip netns pids NAME
       ip [-all] netns exec [NAME] cmd ...
       ip netns monitor
       ip netns list-id

The most common operations are adding, deleting and listing. First, create a network namespace named ns1:

ip netns add ns1

View all current network namespaces

ip netns list
ns1(id:4)

You may wonder: several Docker containers are running on this host, and each container should run in its own network namespace, so why are they not listed here? This is explained further below.

Let's first see what an independent set of network interfaces and an independent routing table look like. To run a command inside ns1, iproute2 provides ip netns exec ns1; whatever follows it on the command line is executed in that network namespace.

First check the network card and routing table of the host

[root@worker3 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:bb:ab:df brd ff:ff:ff:ff:ff:ff

[root@worker3 ~]# ip route
default via 10.57.4.1 dev eth0
10.1.2.0/24 dev br0 proto kernel scope link src 10.1.2.1

Take another look at the network card and routing table in ns1

[root@worker3 ~]# ip netns exec ns1 ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    
[root@worker3 ~]# ip netns exec ns1 ip route
[root@worker3 ~]#
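The transcript above shows lo in state DOWN inside ns1, which means that even 127.0.0.1 is unreachable there until the interface is brought up. The following is a minimal sketch of how that could be done, not part of the original walkthrough. The function name enable_loopback is hypothetical, and the sketch assumes that the namespace already exists, that ping is installed on the host, and that it runs as root. It uses the -n option listed in the ip help output as a shorthand for ip netns exec NAME ip.

```shell
# Bring up the loopback interface inside a named network namespace,
# then send one ping to 127.0.0.1 from inside it as a check.
# Usage: enable_loopback NAME   (requires root and an existing namespace)
enable_loopback() {
    # "ip -n NAME ..." runs the ip subcommand inside namespace NAME
    ip -n "$1" link set lo up &&
    ip netns exec "$1" ping -c 1 127.0.0.1
}
```

For example, enable_loopback ns1 would operate on the namespace created earlier.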

Prefixing every command like this is tedious. A simpler way is to start a shell inside the namespace:

[root@worker3 ~]# ip netns exec ns1 bash
// All commands from here on are executed in ns1
[root@worker3 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@worker3 ~]# ip route
[root@worker3 ~]#
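Because the prompt looks the same inside and outside the namespace, it is easy to lose track of where a shell is running. The ip netns identify subcommand from the help listing can answer that. The sketch below wraps it in a hypothetical helper named current_netns; it assumes that ip netns identify prints nothing when the process is not in any namespace registered under /var/run/netns.

```shell
# Print the name of the named network namespace the current shell is in.
# Falls back to a placeholder when ip netns identify prints nothing,
# which is assumed to mean the host default or an unnamed namespace.
current_netns() {
    name=$(ip netns identify $$)
    echo "${name:-(default or unnamed)}"
}
```

Inside the shell started by ip netns exec ns1 bash, current_netns should print ns1.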

Use exit to return to the default namespace of the host

[root@worker3 ~]# exit
[root@worker3 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:bb:ab:df brd ff:ff:ff:ff:ff:ff

How ip netns add works

When we execute ip netns add ns1 on the host, what actually happens is that a file named ns1 is created under /var/run/netns

[root@worker3 ~]# ls /var/run/netns
ns1

The following commands simulate ip netns add ns2 && ip netns exec ns2 bash

[root@worker3 ~]# touch /var/run/netns/ns2
[root@worker3 ~]# unshare --net bash
[root@worker3 ~]# mount --bind /proc/self/ns/net /var/run/netns/ns2
// The steps above are equivalent to ip netns add ns2 && ip netns exec ns2 bash

[root@worker3 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

[root@worker3 ~]# exit
// Exit and return to the host default namespace. ip netns list now shows ns2
[root@worker3 ~]# ip netns list
ns2
ns1
//If you want to enter ns2 again, there is another way:
[root@worker3 ~]# nsenter --net=/var/run/netns/ns2

As the example shows, creating a named network namespace comes down to creating a file under /var/run/netns and then bind-mounting onto it the /proc/self/ns/net file of a process that lives in the new namespace. The bind mount keeps the namespace alive even after that process exits.
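The same reasoning can be run in reverse. The sketch below is an illustration under assumptions rather than the actual iproute2 implementation. The helper names same_netns and manual_netns_delete are hypothetical. The first checks whether a namespace file and a process refer to the same namespace by comparing inode numbers (this assumes GNU stat). The second roughly mirrors what ip netns delete has to undo: the bind mount and the placeholder file.

```shell
# Succeed when FILE and process PID refer to the same network namespace.
# Both paths resolve to an inode on the kernel's nsfs; equal inode
# numbers mean the same namespace. -L follows symlinks such as /proc/PID/ns/net.
# Usage: same_netns FILE PID
same_netns() {
    [ "$(stat -L -c %i "$1")" = "$(stat -L -c %i "/proc/$2/ns/net")" ]
}

# Rough manual counterpart of "ip netns delete NAME": detach the bind
# mount, then remove the now-plain placeholder file. Requires root.
manual_netns_delete() {
    umount "/var/run/netns/$1" &&
    rm "/var/run/netns/$1"
}
```

For example, from a shell entered with nsenter --net=/var/run/netns/ns2, the command same_netns /var/run/netns/ns2 $$ should succeed, while on the host it should fail.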

View the network namespace of the container

Now to answer the question left open above: why doesn't ip netns list on the host show the network namespaces of Docker containers? Because ip netns list only displays the files under /var/run/netns, while Docker creates its files under /var/run/docker/netns by default. We can therefore use ls /var/run/docker/netns to list the network namespaces of all current containers, and enter one of them with nsenter --net=/var/run/docker/netns/xxx

[root@worker3 ~]# ls /var/run/docker/netns
5bbd5f99d403  a2eabf9acccb  b63ec59b3d9e  d6e4ff961713  default
[root@worker3 ~]# nsenter --net=/var/run/docker/netns/b63ec59b3d9e
[root@worker3 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
4: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether fa:a7:8d:05:03:a6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.0.11/32 scope global eth0
       valid_lft forever preferred_lft forever
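Entering each namespace by hand becomes tedious when there are many containers. As a small sketch built only from the commands already shown (nsenter --net=FILE and the -brief option from the ip help output), the hypothetical helper show_netns_addrs below prints the addresses in every namespace file of a directory. The default directory is the Docker path mentioned above; root privileges are assumed.

```shell
# Print a brief address listing for every namespace file in a directory.
# Usage: show_netns_addrs [DIR]   (DIR defaults to /var/run/docker/netns)
show_netns_addrs() {
    dir=${1:-/var/run/docker/netns}
    for f in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty
        [ -e "$f" ] || continue
        echo "== $f"
        # Run "ip -brief addr" inside the namespace that file refers to
        nsenter --net="$f" ip -brief addr
    done
}
```

Calling show_netns_addrs /var/run/netns does the same for namespaces created with ip netns add.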

If you want to view the file corresponding to a docker container, you can use:

docker inspect $CONTAINER_ID | grep SandboxKey
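As an alternative to grep, docker inspect accepts a Go template through --format, which returns just the path and is easier to use in scripts. The following sketch assumes the SandboxKey field sits under NetworkSettings in the inspect output; the function name container_netns_path is hypothetical.

```shell
# Print only the SandboxKey path of a container (no JSON, no grep).
# Usage: container_netns_path CONTAINER
container_netns_path() {
    docker inspect --format '{{.NetworkSettings.SandboxKey}}' "$1"
}
```

The result can be passed straight to nsenter, for example nsenter --net="$(container_netns_path $CONTAINER_ID)" ip addr.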

Note that if the container was started by Kubernetes, you should inspect the pause container of a pod that does not use hostNetwork=true. With hostNetwork=true, the value is /var/run/docker/netns/default, which is the default network namespace of the host. For a container that is not the pause container, the value is empty, because only the pause container creates a new network namespace; the other containers in the pod simply join it. (Remember SandboxKey; it will come up again later when writing the CNI component.)

[root@worker3 ~]# docker inspect ebd6855901ef|grep SandboxKey
            "SandboxKey": "/var/run/docker/netns/b63ec59b3d9e",

There is another way:

[root@worker3 ~]# docker inspect nginx|grep Pid
            "Pid": 31817,
            "PidMode": "",
            "PidsLimit": null,
[root@worker3 ~]# mkdir -p /var/run/netns/
[root@worker3 ~]# ln -s /proc/31817/ns/net /var/run/netns/ns100
[root@worker3 ~]# ip netns ls
ns100
ns2
ns1
[root@worker3 ~]# ip netns exec ns100 bash
[root@worker3 ~]# // The shell is now in the container's network namespace, which is more convenient than nsenter

This little trick is very useful when debugging pod networking. Pods often ship with very few tools: no curl, no telnet. In that case, use this trick to enter the container's network namespace first and then run the command. Because only the network namespace is switched and everything else still belongs to the host, all of the host's tools are available.
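For one-off commands, the same idea can be packed into a single helper that needs no symlink under /var/run/netns. This is a minimal sketch, assuming Docker and the nsenter tool from util-linux are available and the caller is root; the name in_container_net is hypothetical. It looks up the container's PID with a Go template and lets nsenter join only the network namespace of that process.

```shell
# Run a host command inside the network namespace of a container.
# Usage: in_container_net CONTAINER COMMAND [ARGS...]
in_container_net() {
    container=$1
    shift
    # PID of the container's main process, as seen from the host
    pid=$(docker inspect --format '{{.State.Pid}}' "$container") || return 1
    # --net without a file argument joins the network namespace of --target
    nsenter --target "$pid" --net "$@"
}
```

For example, in_container_net nginx ip addr lists the container's interfaces using the host's ip binary. Compared with the symlink approach, nothing is left behind that could dangle once the container exits.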

Topics: Linux Kubernetes network