IO multiplexing

Posted by fri3ndly on Fri, 21 Jan 2022 03:59:12 +0100

Contents

1. Several IO operating modes

(1) Blocking wait

(2) Non-blocking, busy polling

2. IO multiplexing: select/poll/epoll

The first type: select/poll

The second type: epoll

Basic practice of IO multiplexing

3. The select function

select function definition

File descriptor set fd_set operation function

select workflow

select multiplexing code

select advantages and disadvantages

4. The poll function

poll function definition

Advantages and disadvantages of poll

Level trigger & edge trigger

5. The epoll function

epoll_create function

epoll_ctl function

epoll_wait function

Three working modes of epoll

Level-triggered mode LT (default)

Edge-triggered mode ET

Edge-triggered non-blocking mode

6. Exceeding the 1024 file descriptor limit

1. Several IO operating modes

(1) Blocking wait

While blocked, the process does not occupy a CPU time slice (it is in the blocked state); the server is woken up only when a connection request arrives.

If there is only one connection request, this method is acceptable. With multiple connection requests, however, only one can be processed at a time. The usual solution is to let the main process or main thread do nothing but block and wait; once a connection request is accepted, a child process or a separate thread handles that connection (a sketch of this pattern follows below).
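A minimal sketch of this blocking, process-per-connection pattern, assuming a listening socket lfd has already been created, bound and put into listening state, and that handle_client is a hypothetical function that serves one client:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

void handle_client(int cfd); // hypothetical per-connection handler

// The parent only blocks on accept; each accepted connection is handled by a forked child.
void accept_loop(int lfd) {
    while (1) {
        int cfd = accept(lfd, NULL, NULL); // blocks until a client connects
        if (cfd == -1) {
            continue;
        }
        pid_t pid = fork();
        if (pid == 0) {          // child process: serve this one client
            close(lfd);          // the child does not need the listening socket
            handle_client(cfd);
            close(cfd);
            _exit(0);
        }
        close(cfd);              // parent keeps only the listening socket
        while (waitpid(-1, NULL, WNOHANG) > 0) {} // reap any finished children
    }
}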

(2) Non-blocking, busy polling

The advantage is that it can make the program more responsive (execution never stops to wait, whereas the blocking multi-process/multi-thread approach also pays the cost of creating a process or thread for each connection).

The disadvantage is that it occupies more CPU and system resources, since the process keeps spinning even when there is nothing to do.

Busy polling is also inappropriate for processing multiple connection requests, because it can only process one connection at a time.
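A minimal sketch of busy polling, assuming the listening socket lfd has already been created with bind()/listen(); the socket is put into non-blocking mode and then polled in a loop:

#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

void busy_poll(int lfd) {
    // Put the listening socket into non-blocking mode.
    int flag = fcntl(lfd, F_GETFL);
    fcntl(lfd, F_SETFL, flag | O_NONBLOCK);

    while (1) {
        int cfd = accept(lfd, NULL, NULL);   // returns immediately instead of blocking
        if (cfd == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                continue;                    // no pending connection: keep spinning (burns CPU)
            }
            break;                           // a real error occurred
        }
        // ... handle the new connection here ...
        close(cfd);
    }
}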

2. IO multiplexing: select/poll/epoll

IO multiplexing means entrusting the detection work to the kernel: the kernel monitors a set of file descriptors and reports the result back to the process when it is done.

The first type: select/poll

IO multiplexing entrusts the kernel with detecting how many connected clients need to communicate with the server process. In the select/poll mode, the kernel tells the server process how many descriptors are ready for communication, but not which ones, so the process still has to traverse a linear table to find them.

The second type: epoll

The kernel not only tells the server how many clients want to communicate, but also which ones. A red-black tree is used internally in this approach.

Basic practice of IO multiplexing

(1) First, construct a list of file descriptors and add the file descriptors we want the kernel to help detect to the table.

(2) Call a function (select/poll/epoll) that keeps checking the file descriptors in the table. The function does not return until at least one of these file descriptors has an IO operation pending (detecting an IO operation here means checking the server-side read buffer: if there is data in the buffer, the peer wants to communicate; note that each file descriptor has its own associated buffer).

It should be noted that this function is a blocking function, and its detection of file descriptors is completed by the kernel.

(3) When the function returns, it tells the process how many (and which) descriptors have IO operations pending.

3. The select function

select function definition

int select(int nfds, 
           fd_set *readfds,
           fd_set *writefds,
           fd_set *exceptfds,
           struct timeval *timeout);

nfds: the largest file descriptor to be monitored plus 1 (you can simply pass 1024)
readfds: the read set; the kernel checks the read buffer of every file descriptor in this set on the server to see whether it contains data
writefds: the write set; usually NULL is passed, because write operations are initiated actively by the process, so there is normally nothing to monitor
exceptfds: the exception set; some file descriptors may encounter exceptional conditions
timeout: 
    pass NULL: block forever, returning only when a monitored fd changes.
    pass a timeval object: sets the blocking time; if both seconds and microseconds are zero, the call does not block (a short sketch follows after this block).
        struct timeval {
                long tv_sec;  // seconds
                long tv_usec; // microseconds
        };
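A short sketch of using the timeout parameter, assuming reads has already been filled with the descriptors to monitor and maxfd is the largest of them:

struct timeval tv;
tv.tv_sec = 5;   // wait at most 5 seconds
tv.tv_usec = 0;  // and 0 microseconds
int ret = select(maxfd + 1, &reads, NULL, NULL, &tv);
if (ret == 0) {
    // timed out: no monitored descriptor became ready within 5 seconds
} else if (ret > 0) {
    // ret descriptors are ready; check them with FD_ISSET
}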



fd_set is a set of file descriptors. It corresponds to a table of 1024 flag bits, numbered 0 to 1023, matching the file descriptor table of the process. Note that what is stored in the fd_set are the file descriptors the server uses for communication. fd_set is implemented internally as an array (a fixed-size bitmap), so its size is limited to 1024.

File descriptor set fd_set operation function

Clear the set

void FD_ZERO(fd_set *set);

Remove a file descriptor from the set

void FD_CLR(int fd, fd_set *set);

Add a file descriptor to the set

void FD_SET(int fd, fd_set *set);

Check whether a file descriptor is in the set

int FD_ISSET(int fd, fd_set *set);

select workflow

Clients A, B, C, D, E and F are connected to the server; the file descriptors the server uses to communicate with them are 3, 4, 100, 101, 102 and 103. Suppose that while the kernel is monitoring them, the three clients A, B and C send data.

The first step is to create a table of type fd_set.

fd_set reads;

The second step is to add the file descriptors to be monitored to reads.

FD_SET(3, &reads);

FD_SET(4, &reads);

FD_SET(100, &reads);

......

Step 3: call the select function.

select(103+1, &reads, NULL, NULL, NULL);

The call blocks and waits; it returns only after a change in some buffer is detected.

After select is called, the work is delegated to the kernel. The original table reads lives in user space; when select is called, the kernel copies it into kernel space. The kernel does two things: it takes the initial table, and it modifies that table and hands it back. It checks which file descriptors have data in their read buffers; in our example the buffers of clients A, B and C contain data, so the bits for file descriptors 3, 4 and 100 stay at 1, while the other monitored bits are cleared to 0, giving the kernel-modified table. When select returns, the modified table is stored back in reads, so reads is effectively an in-out parameter. This is why the original reads must be backed up before the call: after select it will have been changed.

After we get the modified table reads, we simply use FD_ISSET to traverse each bit of reads; whichever bits are still 1 tell us which clients have requested communication.

select multiplexing code

Pseudo code

int main(){
    int lfd = socket(); // Create the socket; lfd is the listening file descriptor
    bind();             // Bind IP address and port
    listen();           // Start listening

    // Create the fd_set tables. Two are defined because one is needed as a backup:
    // reads keeps the full set, temp is what select actually modifies.
    fd_set reads, temp;
    // Initialize the table
    FD_ZERO(&reads);
    // Add the listening descriptor lfd to the table. When a client requests a connection,
    // a SYN packet arrives in the server's read buffer, so select can detect a pending
    // connection by detecting data on lfd.
    FD_SET(lfd, &reads);
    int maxfd = lfd;

    // Calling select once is not enough; it must be called in a loop.
    while(1){
        // Delegate detection to the kernel.
        temp = reads; // reads is the backup; temp is the in-out parameter
        // When select returns, the contents of temp have changed.
        int ret = select(maxfd + 1, &temp, NULL, NULL, NULL);

        // First check whether a new connection has arrived, i.e. whether there is
        // data in the buffer of the listening descriptor lfd.
        if(FD_ISSET(lfd, &temp)){
            // Data on the listening descriptor means a new connection; accept it.
            int cfd = accept(); // accept will not block here, the kernel already detected the connection
            // cfd is the communication descriptor of the new client; add it to reads
            // so that select also monitors data from this client.
            FD_SET(cfd, &reads);
            // A new descriptor may change the maximum; update maxfd.
            maxfd = maxfd < cfd ? cfd : maxfd;
        }
        // Check whether any client has sent data.
        for(int i = lfd + 1; i <= maxfd; ++i){
            // Traverse the modified temp returned by select to see which clients have data.
            if(FD_ISSET(i, &temp)){
                int len = read();
                if(len == 0){ // read() returning 0 means the client closed the connection (EOF)
                    FD_CLR(i, &reads); // remove the disconnected client's descriptor
                } else {
                    write(); // communicate: write data back to the client
                }
            }
        }
    }
}

select advantages and disadvantages

Advantages: cross platform

Disadvantages:

Every time select is called, the fd_set has to be copied from user space to kernel space; when there are many fds this costs a lot. The frequent switching between user space and kernel space is time-consuming, and the copying cost cannot be ignored.

Every time select is called, the kernel has to traverse all of the fds passed in. When there are many fds, this linear traversal is also a large overhead.

The number of file descriptors supported by select is too small: the default is 1024, because fd_set is implemented as an array at the bottom layer.

4. The poll function

poll function definition

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

fds: pointer to an array of struct pollfd; in practice the array name is passed and decays into a pointer
nfds: the index of the last used element in the array + 1; the kernel polls each of the first nfds file descriptors in the array
timeout: 
    -1: block forever
     0: return immediately after the call
    >0: length of time to wait, in milliseconds
 Return value: the number of file descriptors on which IO events occurred

pollfd structure is defined as:

struct pollfd{
    int fd;        // file descriptor
    short events;  // requested events (what to monitor)
    short revents; // returned events, filled in by the kernel; acts like an out parameter
};

The pollfd structure effectively merges select's three sets (read, write and exception). Each pollfd object has an fd member, and its events member selects whether that fd belongs to the read set, the write set, the exception set, or any combination of the three, with the options joined by |. A minimal poll-based server loop is sketched below.
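A minimal sketch of a poll-based server loop, assuming lfd is a listening socket that has already been created, bound and put into listening state (the array size of 1024 is arbitrary here):

#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

void poll_loop(int lfd) {
    struct pollfd fds[1024];
    // Slot 0 monitors the listening socket for readability (incoming connections).
    fds[0].fd = lfd;
    fds[0].events = POLLIN;
    // Mark the remaining slots as unused; poll ignores entries whose fd is negative.
    for (int i = 1; i < 1024; ++i) fds[i].fd = -1;
    int nfds = 1; // index of the last used element + 1

    while (1) {
        int ret = poll(fds, nfds, -1);   // block until at least one fd is ready
        if (ret <= 0) continue;

        if (fds[0].revents & POLLIN) {   // a new connection is pending
            int cfd = accept(lfd, NULL, NULL);
            for (int i = 1; i < 1024; ++i) {
                if (fds[i].fd == -1) {   // find a free slot for the new client
                    fds[i].fd = cfd;
                    fds[i].events = POLLIN;
                    if (i + 1 > nfds) nfds = i + 1;
                    break;
                }
            }
        }
        for (int i = 1; i < nfds; ++i) { // check the client descriptors
            if (fds[i].fd != -1 && (fds[i].revents & POLLIN)) {
                char buf[1024];
                int len = recv(fds[i].fd, buf, sizeof(buf), 0);
                if (len <= 0) {          // 0: client closed the connection; <0: error
                    close(fds[i].fd);
                    fds[i].fd = -1;      // free the slot
                } else {
                    send(fds[i].fd, buf, len, 0); // echo back as an example
                }
            }
        }
    }
}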

Advantages and disadvantages of poll

advantage:

It is often said that poll stores the file descriptors in a linked list at the bottom layer, so it does not have select's limit on the maximum number of connections. However, the pollfd structure has no pointer members, so it is not obvious how such a linked list would be built.

Disadvantages:

A large number of pollfd entries (array or linked list?) still has to be copied as a whole between user space and the kernel address space, which is expensive.

Traversing the fds is also expensive.

poll is level-triggered: if an fd is reported but not handled, it will be reported again in the next poll.

Level trigger & edge trigger

Level trigger: when a descriptor event is detected as ready and the application is notified, the application does not have to process the event immediately; it will be notified about the event again the next time the detection function is called.

Edge trigger: when a descriptor event is detected as ready and the application is notified, the application must process the event immediately; if it does not, it will not be notified again on the next call. In other words, edge triggering notifies only once, when the state changes from not ready to ready.

5. The epoll function

There are three main functions.

epoll_create function

Creates the underlying red-black tree

int epoll_create(int size);
size: the planned maximum number of file descriptors to monitor; in practice it does not matter if this value is small, since the structure grows automatically
Return value: a file descriptor referring to the new epoll instance

This function creates a special file descriptor for epoll. Because epoll organizes the monitored file descriptors as a red-black tree at the bottom layer, calling epoll_create is equivalent to constructing the root node of that red-black tree: it creates the tree, and nodes can later be inserted into it or deleted from it. A minimal usage sketch follows.
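A minimal sketch of creating an epoll instance (100 is just a hint for the size parameter):

#include <sys/epoll.h>
#include <stdio.h>

int main(void) {
    // Create the epoll instance; this builds the root of the underlying red-black tree.
    int epfd = epoll_create(100);
    if (epfd == -1) {
        perror("epoll_create");
        return 1;
    }
    // ... epoll_ctl and epoll_wait calls would follow here ...
    return 0;
}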

epoll_ctl function

Inserts the node to be monitored into the red-black tree, or deletes it from the tree

epoll_ctl is used to control the events of a file descriptor on an epoll instance: an event can be registered, modified or deleted

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
epfd: the return value of epoll_create
op: one of three macros controlling whether a node is registered, modified or deleted
    EPOLL_CTL_ADD --- register
    EPOLL_CTL_MOD --- modify
    EPOLL_CTL_DEL --- delete
fd: the file descriptor to insert into / delete from / modify on the red-black tree
event: the real node on epoll's underlying red-black tree, an object of type struct epoll_event

The epoll_event structure is defined as follows:
struct epoll_event{
    uint32_t events;
    epoll_data_t data;
};

For the members of the structure:
events: sets the events to be monitored on the file descriptor; the following macros can be combined with | to monitor several events at the same time
    EPOLLIN----read
    EPOLLOUT---write
    EPOLLERR---error
    EPOLLPRI---the descriptor has urgent (out-of-band) data readable
    EPOLLHUP---the descriptor was hung up
    EPOLLET---sets epoll to edge-triggered mode
    EPOLLONESHOT---monitor only one event; after it has been reported, the socket must be added to the epoll tree again if you want to keep monitoring it
data: a union of type epoll_data_t

typedef union epoll_data{
    void *ptr; // use this member if you need to attach more information to a node
    int fd;    // this member is the one usually used
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

Each call to epoll_ctl copies the event argument into the red-black tree as a node; later, when a change on that file descriptor is detected, the node is copied into the events array passed to epoll_wait, which is how we learn which file descriptors have changed. A registration sketch follows.
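A minimal registration sketch, assuming epfd was returned by epoll_create and lfd is an existing listening socket to be monitored for read events:

#include <sys/epoll.h>
#include <stdio.h>

// Register lfd on the epoll tree so the kernel monitors it for readability.
void add_to_epoll(int epfd, int lfd) {
    struct epoll_event ev;
    ev.events = EPOLLIN; // monitor read events (a pending connection shows up as readable data)
    ev.data.fd = lfd;    // store the descriptor so it can be identified after epoll_wait
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev) == -1) {
        perror("epoll_ctl");
    }
}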

epoll_wait function

It plays the same role as the select and poll functions above: it waits for IO events to occur. The epoll_create and epoll_ctl calls above are preparation for this step.

int epoll_wait(
    int epfd,
    struct epoll_event *events,
    int maxevents,
    int timeout
);

epfd: the return value of epoll_create, i.e. the root node of the red-black tree
events: an array of struct epoll_event; when IO events are detected on some file descriptors, their
    epoll_event nodes are copied into this array, so the array is an in-out parameter. It is precisely this array
    that lets epoll report which file descriptors have changed.
maxevents: the size of the events array, used to prevent overflow when elements are copied into it.
timeout: 
    -1: block forever
    0: return immediately
    >0: return after this many milliseconds if no event has occurred
 Return value: how many file descriptors changed state
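Putting the three calls together, here is a minimal sketch of an epoll-based server loop in the default level-triggered mode, assuming lfd is a listening socket that has already been created, bound and put into listening state:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void epoll_loop(int lfd) {
    int epfd = epoll_create(100);             // build the red-black tree
    struct epoll_event ev, evs[1024];
    ev.events = EPOLLIN;                      // default: level-triggered read events
    ev.data.fd = lfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev); // put the listening socket on the tree

    while (1) {
        // Block until at least one monitored descriptor is ready; n nodes are copied into evs.
        int n = epoll_wait(epfd, evs, 1024, -1);
        for (int i = 0; i < n; ++i) {
            int fd = evs[i].data.fd;
            if (fd == lfd) {                  // listening socket readable: new connection
                int cfd = accept(lfd, NULL, NULL);
                ev.events = EPOLLIN;
                ev.data.fd = cfd;
                epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev); // monitor the new client too
            } else {                          // a client sent data (or closed the connection)
                char buf[1024];
                int len = recv(fd, buf, sizeof(buf), 0);
                if (len <= 0) {               // 0: peer closed; <0: error
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL); // take it off the tree
                    close(fd);
                } else {
                    send(fd, buf, len, 0);    // echo back as an example
                }
            }
        }
    }
}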

Three working modes of epoll

Level-triggered mode LT (default)

Suppose the client sends 100 bytes of data each time, while the server reads only 50 bytes at a time. When the client sends data, epoll_wait detects data in the read buffer and returns, and the server reads 50 bytes. On the next iteration of the loop, epoll_wait detects that there is still data in the read buffer of the same file descriptor (the 50 bytes left over from last time), so epoll_wait returns again and the server reads the remaining 50 bytes. This is level-triggered mode.

Level-triggered mode features:

(1) As long as the buffer corresponding to the fd contains data, epoll_wait returns.

(2) The number of returns is not tied to the number of sends: one send may correspond to several epoll_wait returns.

Edge-triggered mode ET

The client sends data to the server:

Each time the client sends data, the server is notified once; that is, the number of times epoll_wait returns is strictly determined by the number of times the client sends.

What if the client sends a lot of data at once and the server does not read it all in one go? The remaining data stays in the buffer. When the client sends data again, that leftover data is read out first; in other words, even though the client has sent new data, the server may only be reading what was left unread in the buffer last time.

So the client sends once and epoll_wait returns once. If the client sends too much data each time, can some data end up never being read? Yes, but edge-triggered mode does not care whether the data has been read or not. Edge triggering exists to improve efficiency: the more epoll_wait calls are made, the more overhead the system incurs, so edge-triggered mode is used to reduce the number of epoll_wait calls.

Edge-triggered non-blocking mode

In the edge-triggered mode above, the file descriptor has the blocking attribute by default.

In the edge-triggered mode just described, the reason data may be left unread is that the server reads too little each time. Could we read everything by reading in a loop, e.g. while(recv())? Not with a blocking descriptor: once the data has been read out, the next recv() call blocks, so the loop cannot continue, and epoll_wait can no longer be called to delegate detection to the kernel.

Therefore a third mode is needed, edge-triggered non-blocking, in which the file descriptor fd has the non-blocking attribute.

Edge-triggered non-blocking mode is the most efficient.

There are two ways to set the corresponding file descriptor to non-blocking in edge-triggered mode.

(1) Through the open() function

Set the flags parameter in the open() function to O_RDWR | O_NONBLOCK

(2) Set via fcntl

Step 1: obtain the current flag: int flag = fcntl(fd, F_GETFL);

Step 2: add non blocking attribute to flag: flag |= O_NONBLOCK;

Step 3: set the flag back: fcntl(fd, F_SETFL, flag);
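A minimal sketch combining the two pieces, assuming epfd is an existing epoll instance and cfd is a connected client socket: the descriptor is made non-blocking before being added to the tree with EPOLLET, and the read handler drains the buffer until recv() reports EAGAIN:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

// Register a connected socket in edge-triggered, non-blocking mode.
void add_client_et(int epfd, int cfd) {
    int flag = fcntl(cfd, F_GETFL);            // step 1: get the current flags
    fcntl(cfd, F_SETFL, flag | O_NONBLOCK);    // steps 2 and 3: add O_NONBLOCK and set the flags back

    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;             // read events, edge-triggered
    ev.data.fd = cfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev);
}

// Called when epoll_wait reports the descriptor: read until the buffer is drained.
void handle_client_et(int epfd, int cfd) {
    char buf[1024];
    while (1) {
        int len = recv(cfd, buf, sizeof(buf), 0);
        if (len > 0) {
            send(cfd, buf, len, 0);            // echo back as an example
        } else if (len == 0) {                 // peer closed the connection
            epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, NULL);
            close(cfd);
            break;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                             // buffer drained: nothing left to read
        } else {
            break;                             // a real error occurred
        }
    }
}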

6. Exceeding the 1024 file descriptor limit

select: the 1024 limit cannot be exceeded, because fd_set is implemented with a fixed-size array; changing it would mean changing the array size and recompiling the kernel.

poll: it can exceed 1024, because it is said to be implemented internally with a linked list.

epoll: it can exceed 1024, because it is implemented internally with a red-black tree.

The maximum number of file descriptors that can actually be supported is also machine-dependent and can be checked in a configuration file (on Linux, for example, /proc/sys/fs/file-max shows the system-wide limit).

In any case, when raising the file descriptor limit in the configuration, the value cannot exceed this number.