When the kernel receives a network packet

Posted by jhbalaji on Tue, 15 Feb 2022 00:49:34 +0100

From network cable to network card

Now, there is a packet, which will enter the network card from the network cable.

This data packet was sent from another computer far away. After many difficulties and obstacles, it has reached the network card of this computer. That journey is covered in the article "If you design the network". Anyway, the packet has now arrived.

When the data packet arrives, it is just a pile of electrical signals. It is still a long way from being processed by kernel code. First it has to go through the network card's hardware.

Let's enlarge the network card in the figure above.

First comes the familiar network cable socket. The signal then passes through the signal conversion module (PHY), then the MAC module, and finally lands in a buffer on the network card. Note that this buffer lives in the network card hardware. At this point, nothing has touched kernel code yet.

In short, this step converts the high and low voltage levels on the network cable into data stored in a buffer on the network card.

From network card to memory

In the previous step, the data reached the buffer of the network card. Now we need to get it to the buffer in memory. This is a simple picture.

Moreover, this process does not involve the CPU at all. The DMA hardware and the network card hardware complete it by themselves.

Of course, this only works if the network card driver has first allocated a buffer in memory called sk_buffer (struct sk_buff in the kernel source) and told the network card the address of this sk_buffer, so that DMA knows where in memory to copy the data when something arrives in the network card's buffer.
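To make the handshake between driver and card more concrete, here is a minimal user-space sketch of a receive ring. It is not the e1000 code: the names rx_desc, rx_ring, ring_init, nic_dma_write and ring_receive, as well as the sizes, are all made up for illustration, and a plain memcpy stands in for the DMA engine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4      /* number of receive descriptors (hypothetical) */
#define BUF_SIZE  2048   /* bytes per receive buffer (hypothetical) */

/* One receive descriptor: where the "card" may write, and what it wrote. */
struct rx_desc {
    uint8_t *buf;    /* buffer address the driver handed to the card */
    uint16_t len;    /* filled in by the card: number of bytes written */
    uint8_t  done;   /* filled in by the card: 1 once the copy finished */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    uint8_t storage[RING_SIZE][BUF_SIZE]; /* stands in for allocated buffers */
    unsigned next_to_fill;   /* next descriptor the card will write to */
    unsigned next_to_clean;  /* next descriptor the driver will read */
};

/* Driver side: prepare the buffers and publish their addresses in the ring. */
void ring_init(struct rx_ring *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    for (unsigned i = 0; i < RING_SIZE; i++)
        r->desc[i].buf = r->storage[i];
}

/* "Hardware" side: copy a frame into the next free buffer, as DMA would.
 * Returns 0 on success, -1 if the frame is too big or the ring is full. */
int nic_dma_write(struct rx_ring *r, const void *frame, uint16_t len)
{
    struct rx_desc *d = &r->desc[r->next_to_fill];

    if (len > BUF_SIZE || d->done)
        return -1;
    memcpy(d->buf, frame, len);
    d->len = len;
    d->done = 1;
    r->next_to_fill = (r->next_to_fill + 1) % RING_SIZE;
    return 0;
}

/* Driver side: collect one completed buffer.
 * Returns the frame length, or -1 if nothing is ready or out is too small. */
int ring_receive(struct rx_ring *r, void *out, uint16_t out_size)
{
    struct rx_desc *d = &r->desc[r->next_to_clean];
    int len;

    if (!d->done || d->len > out_size)
        return -1;
    len = d->len;
    memcpy(out, d->buf, d->len);
    d->done = 0;  /* hand the descriptor back to the card */
    r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
    return len;
}
```

The point to notice is that the driver only prepares memory in advance and reads results afterwards. The copy itself, nic_dma_write here, happens without the driver being involved.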

The specific process is as follows.

Register hard interrupt handler

The previous part did not show any code, because it is cumbersome and does not help with understanding the main flow. In short, the packet has now been copied via DMA from the buffer in the network card into the sk_buffer structure in memory.

Since this process is carried out entirely by hardware, the last thing the network card has to do is notify the kernel that there is data to process.

How does it notify the kernel? With an interrupt.

The network card sends an interrupt signal to the CPU. The CPU interrupts the current program, finds the interrupt handler according to the interrupt number and starts executing it.

Let's look at what the interrupt handler for packet reception is, and how it gets registered in the interrupt vector table.

Since every type of network card has its own driver, we take e1000 as an example here. In e1000_main.c we find this line of code.

request_irq(netdev->irq, &e1000_intr, ...);

This line means that once the data packet has been transferred from the network card's buffer to the sk_buffer in memory, the card raises an interrupt, and the interrupt handler e1000_intr is executed.

What did the hard interrupt handler e1000_intr do
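The idea of "register a handler under an interrupt number, then look it up when the interrupt fires" can be sketched as a table of function pointers. This is a toy model under our own assumptions, not the kernel's implementation: toy_request_irq, toy_do_irq, toy_nic_intr and NR_IRQS are hypothetical names, and unlike the real kernel this sketch does not allow several handlers to share one interrupt number.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 16   /* size of the toy table (hypothetical) */

typedef int (*irq_handler_t)(int irq, void *dev_id);

struct irq_entry {
    irq_handler_t handler;  /* function to run when this interrupt fires */
    void *dev_id;           /* opaque pointer passed back to the handler */
};

static struct irq_entry irq_table[NR_IRQS];

/* Register a handler for an interrupt number.
 * Returns 0 on success, -1 if the number is invalid or already taken. */
int toy_request_irq(int irq, irq_handler_t handler, void *dev_id)
{
    if (irq < 0 || irq >= NR_IRQS || handler == NULL || irq_table[irq].handler)
        return -1;
    irq_table[irq].handler = handler;
    irq_table[irq].dev_id = dev_id;
    return 0;
}

/* What the CPU side does when interrupt `irq` fires: look up and call.
 * Returns the handler's result, or -1 if no handler is registered. */
int toy_do_irq(int irq)
{
    if (irq < 0 || irq >= NR_IRQS || irq_table[irq].handler == NULL)
        return -1;
    return irq_table[irq].handler(irq, irq_table[irq].dev_id);
}

/* Example handler: counts how many times the "card" interrupted us. */
int toy_nic_intr(int irq, void *dev_id)
{
    int *rx_interrupts = dev_id;

    (void)irq;
    assert(rx_interrupts != NULL);
    (*rx_interrupts)++;
    return 1;  /* 1 means "handled" in this sketch */
}
```

Calling toy_request_irq(5, toy_nic_intr, &count) plays the role of the request_irq line above, and toy_do_irq(5) plays the role of the CPU reacting to the interrupt signal.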

drivers\net\e1000\e1000_main.c

 
// Registered hard interrupt handler
static irqreturn_t e1000_intr(int irq, void *data, struct pt_regs *regs) {
    struct net_device *netdev = data;
    ...
    __netif_rx_schedule(netdev);
    ...
    return IRQ_HANDLED;
}

// include\linux\netdevice.h
static inline void __netif_rx_schedule(struct net_device *dev) {
    list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
    //Issue soft interrupt
    __raise_softirq_irqoff(NET_RX_SOFTIRQ);
}

 

Yes, it does almost nothing. It puts the network card device dev into poll_list, immediately raises a soft interrupt, and then it is done.

As explained in "Seriously talk about soft interruption", raising a soft interrupt just sets a "pending" flag bit. A kernel thread keeps polling this group of flag bits to see which ones are 1, looks up the handler for each such bit in the soft interrupt vector table, and executes it.

The purpose is to finish handling the hard interrupt as quickly as possible, so that the computer can handle the next hard interrupt as soon as possible. After all, mouse clicks and keystrokes need a very timely response. Work such as copying and parsing a network packet after it arrives has a lower priority than hard interrupts, so it is enough to raise a soft interrupt and let the kernel thread do the work later.

Register soft interrupt handler

In the code just now, we raised a soft interrupt with the value NET_RX_SOFTIRQ. Which handler function will this soft interrupt execute?

The kernel registered the processing function corresponding to the soft interrupt as early as the initialization of the network subsystem.

net\core\dev.c

static int __init net_dev_init(void) {
    open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
    open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
}

//transmit
static void net_tx_action(struct softirq_action *h) {...}
//receive
static void net_rx_action(struct softirq_action *h) {...}

open_softirq registers a soft interrupt handler. It is very simple: it assigns the function to the action field at the corresponding position in the soft interrupt vector table. The same picture as above still applies.

Two soft interrupts are registered here, one for sending and one for receiving. We are receiving this time, so after the soft interrupt is raised, the function net_rx_action is executed.

What did the soft interrupt handler net_rx_action do

Let's look at the code directly.
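The two halves of the soft interrupt mechanism, registering an action and marking it pending, can be sketched in a few lines. This is only a model of the idea described in this article, not the kernel code: all toy_ names are hypothetical, and toy_do_softirq stands in for whatever kernel context ends up checking the pending bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_NET_TX = 0, TOY_NET_RX = 1, TOY_NR_SOFTIRQS = 2 };

typedef void (*softirq_action_t)(void);

static softirq_action_t softirq_vec[TOY_NR_SOFTIRQS]; /* toy vector table */
static uint32_t pending;                              /* one bit per softirq */

/* Registration: store the handler at its slot in the vector table. */
void toy_open_softirq(int nr, softirq_action_t action)
{
    assert(nr >= 0 && nr < TOY_NR_SOFTIRQS);
    softirq_vec[nr] = action;
}

/* Raising: just set the flag bit. No handler runs here. */
void toy_raise_softirq(int nr)
{
    assert(nr >= 0 && nr < TOY_NR_SOFTIRQS);
    pending |= (uint32_t)1 << nr;
}

/* Processing: snapshot and clear the bits, then run each pending handler.
 * Returns how many handlers were run. */
int toy_do_softirq(void)
{
    uint32_t snapshot = pending;
    int ran = 0;

    pending = 0;
    for (int nr = 0; nr < TOY_NR_SOFTIRQS; nr++) {
        if ((snapshot & ((uint32_t)1 << nr)) && softirq_vec[nr] != NULL) {
            softirq_vec[nr]();
            ran++;
        }
    }
    return ran;
}

/* Example receive action: only counts how often it was run. */
int toy_rx_run_count;
void toy_net_rx_action(void) { toy_rx_run_count++; }
```

Note that raising the same soft interrupt twice before it is processed still runs the handler only once, because there is just one flag bit per soft interrupt. That is why the handler itself has to loop over all the work that has accumulated.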

net\core\dev.c

static void net_rx_action(struct softirq_action *h) {
    struct softnet_data *queue = &__get_cpu_var(softnet_data);   
    while (!list_empty(&queue->poll_list)) {
        struct net_device *dev = list_entry(
            queue->poll_list.next, struct net_device, poll_list);
        dev->poll(dev, &budget);
    }
}

It traverses poll_list, takes out one device dev at a time, and calls its poll function.

Remember the line of code just before we raised the soft interrupt? That is where the network card device dev with incoming packets was put into poll_list. Now it is taken out again.

The poll function that gets called belongs to the driver of that network card. When the e1000 network card is initialized, its poll function is set to the following function address.
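The interplay of "hard interrupt puts the device on the list, soft interrupt takes it off and polls it" can be sketched as follows. This is a simplified model under our own assumptions: toy_netdev, toy_rx_schedule, toy_rx_action and toy_poll are hypothetical names, the list is a plain singly linked list, and a device is put back on the list whenever its poll function reports remaining work, which is a simplification of what the real driver and kernel do.

```c
#include <assert.h>
#include <stddef.h>

struct toy_netdev {
    const char *name;
    int backlog;                       /* packets waiting in this device */
    int (*poll)(struct toy_netdev *dev, int *budget);
    struct toy_netdev *next;           /* link in the poll list */
    int scheduled;                     /* 1 while the device is on the list */
};

static struct toy_netdev *poll_head, *poll_tail;

/* Hard interrupt side: append the device to the poll list, at most once. */
void toy_rx_schedule(struct toy_netdev *dev)
{
    assert(dev != NULL && dev->poll != NULL);
    if (dev->scheduled)
        return;
    dev->scheduled = 1;
    dev->next = NULL;
    if (poll_tail != NULL)
        poll_tail->next = dev;
    else
        poll_head = dev;
    poll_tail = dev;
}

/* Soft interrupt side: poll queued devices until the list is empty or the
 * budget is used up. Returns the number of packets processed. */
int toy_rx_action(int budget)
{
    int start = budget;

    while (poll_head != NULL && budget > 0) {
        struct toy_netdev *dev = poll_head;

        poll_head = dev->next;
        if (poll_head == NULL)
            poll_tail = NULL;
        dev->scheduled = 0;
        dev->next = NULL;
        if (dev->poll(dev, &budget))   /* nonzero: device still has work */
            toy_rx_schedule(dev);
    }
    return start - budget;
}

/* Example poll function: consume up to *budget packets.
 * Returns 1 if packets remain, 0 if the device is drained. */
int toy_poll(struct toy_netdev *dev, int *budget)
{
    int n = dev->backlog < *budget ? dev->backlog : *budget;

    dev->backlog -= n;
    *budget -= n;
    return dev->backlog > 0;
}
```

The budget keeps one busy device from monopolizing the soft interrupt: when it runs out, the remaining devices stay on the list and are handled in the next round.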

netdev->poll = &e1000_clean;

So let's look at this function next. As the name suggests, its job is to clean up the packets on the network card.

drivers\net\e1000\e1000_main.c

static int e1000_clean(struct net_device *netdev, int *budget) {
    struct e1000_adapter *adapter = netdev->priv;    
    e1000_clean_tx_irq(adapter);
    e1000_clean_rx_irq(adapter, &work_done, work_to_do);
}

Since we only look at the process of reading data in this lecture, we'll just look at the rx part.

This function is long, so we follow just one line of it.

// drivers\net\e1000\e1000_main.c
e1000_clean_rx_irq(struct e1000_adapter *adapter) {
    ...
    netif_receive_skb(skb);
    ...
}

// net\core\dev.c
int netif_receive_skb(struct sk_buff *skb) {
    ...
    list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
        ...
        deliver_skb(skb, ptype, 0);
        ...
    }
    ...
}

static __inline__ int deliver_skb(
        struct sk_buff *skb, struct packet_type *pt_prev, int last) {
    ...
    return pt_prev->func(skb, skb->dev, pt_prev);
}

Following the calls all the way down, we end up executing the pt_prev->func function.

What does this function do? Or rather, which concrete function does this pointer refer to? That brings us to the registration of the protocol stack.

Registration of protocol stack

Here is the registration of the IP protocol.

// net\ipv4\ip_output.c
static struct packet_type ip_packet_type = {
    .type = __constant_htons(ETH_P_IP),
    .func = ip_rcv,
};

void __init ip_init(void) {
    dev_add_pack(&ip_packet_type);
}

// net\core\dev.c
void dev_add_pack(struct packet_type *pt) {
    if (pt->type == htons(ETH_P_ALL)) {
        list_add_rcu(&pt->list, &ptype_all);
    } else {
        hash = ntohs(pt->type) & 15;
        list_add_rcu(&pt->list, &ptype_base[hash]);
    }
}

We see that func is assigned ip_rcv, so this is the function that gets executed in the last step above. In other words, ip_rcv is responsible for parsing at the network layer.

While we are here, let's also read the protocol registration of the transport layer. It is not hard to guess that after ip_rcv has done its processing, the packet must be handed over to the transport layer for further processing.
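The registration and lookup by protocol type can be sketched as a small hash table with 16 buckets, mirroring the "& 15" in the kernel code above. This is a toy version under our own assumptions: all toy_ names are hypothetical, the handler signature is simplified, and the type is kept in host byte order so that no byte swapping is needed. 0x0800 is the EtherType of IPv4 and 0x8100 that of 802.1Q VLAN tagging, and both happen to fall into bucket 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_P_IP    0x0800  /* IPv4 */
#define TOY_ETH_P_8021Q 0x8100  /* VLAN tag, same bucket as IPv4 */

struct toy_packet_type {
    uint16_t type;                                  /* host byte order here */
    int (*func)(const uint8_t *data, size_t len);   /* protocol handler */
    struct toy_packet_type *next;                   /* chain inside a bucket */
};

static struct toy_packet_type *toy_ptype_base[16];

/* Registration: hash the type into one of 16 buckets and chain it there. */
void toy_dev_add_pack(struct toy_packet_type *pt)
{
    unsigned hash;

    assert(pt != NULL && pt->func != NULL);
    hash = pt->type & 15;
    pt->next = toy_ptype_base[hash];
    toy_ptype_base[hash] = pt;
}

/* Delivery: walk the bucket and call the handler whose type matches exactly.
 * Returns the handler's result, or -1 if no handler is registered. */
int toy_deliver(uint16_t type, const uint8_t *data, size_t len)
{
    struct toy_packet_type *pt;

    for (pt = toy_ptype_base[type & 15]; pt != NULL; pt = pt->next) {
        if (pt->type == type)
            return pt->func(data, len);
    }
    return -1;
}

/* Example network layer handler: just reports that it was called. */
int toy_ip_rcv(const uint8_t *data, size_t len)
{
    (void)data;
    (void)len;
    return 1;
}
```

The exact comparison inside the loop matters: the hash only picks the bucket, and different protocol types can share a bucket.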

module_init(inet_init);

static struct inet_protocol tcp_protocol = {
    .handler =  tcp_v4_rcv,
    .err_handler =  tcp_v4_err,
    .no_policy =    1,
};

static struct inet_protocol udp_protocol = {
    .handler =  udp_rcv,
    .err_handler =  udp_err,
    .no_policy =    1,
};

static int __init inet_init(void) {
    inet_add_protocol(&udp_protocol, IPPROTO_UDP);
    inet_add_protocol(&tcp_protocol, IPPROTO_TCP);
    ip_init();
    tcp_init();
}

This is intuitive and clear. Remember that the two handlers above are tcp_v4_rcv and udp_rcv.

Now let's go back and continue with the ip_rcv function.

What did the network layer handler ip_rcv do

// net\ipv4\ip_input.c
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt) {
    ...
    return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
               ip_rcv_finish);
}

static inline int ip_rcv_finish(struct sk_buff *skb) {
    ...
    if (skb->dst == NULL) {
        if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev))
            goto drop; 
    }
    ...
    return dst_input(skb);
}

// include\net\dst.h
// rth->u.dst.input= ip_local_deliver;
static inline int dst_input(struct sk_buff *skb) {
    ...
    skb->dst->input(skb);
    ...
}

// net\ipv4\ip_input.c
int ip_local_deliver(struct sk_buff *skb) {
    ...
    return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
               ip_local_deliver_finish);
}

static inline int ip_local_deliver_finish(struct sk_buff *skb) {
    ...
    ipprot = inet_protos[hash];
    ipprot->handler(skb);
    ...
}
 

OK, done! In the end, the handler is executed. Remember the ones registered with the protocol stack in the previous section?

static struct inet_protocol tcp_protocol = {
    .handler =  tcp_v4_rcv,
    .err_handler =  tcp_v4_err,
    .no_policy =    1,
};

static struct inet_protocol udp_protocol = {
    .handler =  udp_rcv,
    .err_handler =  udp_err,
    .no_policy =    1,
};

Since the transport layer protocol parsed out by the network layer is TCP, the handler points to the function that handles the TCP protocol, tcp_v4_rcv!

After that comes the processing flow of the TCP protocol. The parsed data is then received and processed by the application, which is our familiar "socket bind listen read" flow. That is a whole new world that I have not studied yet, so I will stop here.

For the principles of TCP, though, you can read the article "You call this shit TCP" for a vivid explanation.
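The last hop, from the network layer to the transport layer, can be sketched like this: read the protocol field of the IPv4 header, look up the registered handler, and pass it the data that follows the header. This is a toy model and not the kernel code: all toy_ names are hypothetical, the table is indexed directly by protocol number instead of hashing, and no checksum or fragment handling is done. The header layout used (version and header length in byte 0, protocol in byte 9) and the protocol numbers 6 for TCP and 17 for UDP are standard IPv4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPPROTO_TCP 6
#define TOY_IPPROTO_UDP 17

typedef int (*toy_l4_handler)(const uint8_t *segment, size_t len);

static toy_l4_handler toy_protos[256];  /* indexed by IP protocol number */
size_t toy_last_payload_len;            /* set by the example handlers */

/* Registration: one handler per protocol number. Returns -1 if taken. */
int toy_add_protocol(toy_l4_handler handler, uint8_t protocol)
{
    assert(handler != NULL);
    if (toy_protos[protocol] != NULL)
        return -1;
    toy_protos[protocol] = handler;
    return 0;
}

/* Network layer: check the IPv4 header, skip it, call the registered
 * transport handler. Returns the handler's result or -1 on any failure. */
int toy_ip_local_deliver(const uint8_t *pkt, size_t len)
{
    size_t ihl;
    uint8_t protocol;

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;                       /* too short or not IPv4 */
    ihl = (size_t)(pkt[0] & 0x0F) * 4;   /* header length in bytes */
    if (ihl < 20 || ihl > len)
        return -1;
    protocol = pkt[9];
    if (toy_protos[protocol] == NULL)
        return -1;                       /* nobody registered for this */
    return toy_protos[protocol](pkt + ihl, len - ihl);
}

/* Example transport handlers: record the payload size, return their number. */
int toy_tcp_rcv(const uint8_t *segment, size_t len)
{
    (void)segment;
    toy_last_payload_len = len;
    return TOY_IPPROTO_TCP;
}

int toy_udp_rcv(const uint8_t *segment, size_t len)
{
    (void)segment;
    toy_last_payload_len = len;
    return TOY_IPPROTO_UDP;
}
```

Registering toy_tcp_rcv and toy_udp_rcv corresponds to the two inet_add_protocol calls shown earlier. Which one runs is decided purely by the protocol byte in the packet.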

Finally, let's take a picture to help understand the whole process of kernel packet collection.

You see, we often say that the protocol stack keeps stripping off headers and handing the rest to the protocol layer above. At the code level, this simply means that at the end of the network layer parsing function ip_rcv, the transport layer parsing function tcp_v4_rcv is called. That's all.

Linux interrupt handling is divided into a top half and a bottom half. At the code level, this simply means that the hard interrupt handler raises a soft interrupt and then returns. That's all.
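"Stripping the header and handing the rest upward" comes down to moving a pointer forward and calling the next function. Here is a minimal sketch of the whole chain for one Ethernet frame carrying IPv4 and TCP. The function names layer_eth_rcv, layer_ip_rcv and layer_tcp_rcv are made up, direct calls replace the registration tables for brevity, and the only "processing" is returning the TCP destination port. The offsets follow the standard header layouts: a 14-byte Ethernet header with the EtherType at offset 12, the IP protocol at offset 9, and the TCP destination port at offset 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian (network byte order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Transport layer: returns the TCP destination port, or -1 if too short. */
int layer_tcp_rcv(const uint8_t *segment, size_t len)
{
    if (len < 20)
        return -1;
    return be16(segment + 2);
}

/* Network layer: check the IPv4 header, strip it, call the layer above. */
int layer_ip_rcv(const uint8_t *pkt, size_t len)
{
    size_t ihl;

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;
    ihl = (size_t)(pkt[0] & 0x0F) * 4;
    if (ihl < 20 || ihl > len || pkt[9] != 6)  /* 6 = TCP */
        return -1;
    return layer_tcp_rcv(pkt + ihl, len - ihl);
}

/* Link layer: check the EtherType, strip the 14-byte header, call up. */
int layer_eth_rcv(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 14 || be16(frame + 12) != 0x0800)  /* 0x0800 = IPv4 */
        return -1;
    return layer_ip_rcv(frame + 14, len - 14);
}
```

Each layer only looks at its own header and passes a pointer just past it, so no data has to be copied along the way.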
