I/O multiplexing is a mechanism for monitoring multiple descriptors at once: as soon as one of them becomes ready (typically readable or writable), the program is notified so that it can perform the corresponding read or write operation.
With a plain blocking socket, the server blocks in accept() while waiting for a client connection, and both sides block while sending and receiving data (recv, send, sendall). Such a server can handle only one client request at a time; it cannot communicate with several clients concurrently, so server resources sit idle (the process is stuck on I/O while the CPU does nothing).
If the requirement is that many clients connect to one server, and the server must process requests from all of them, a plain blocking socket clearly cannot meet it. This is where I/O multiplexing comes in: we monitor many file descriptors at the same time and, as soon as any of them becomes ready, the program is notified to read or write on it.
I/O multiplexing in Linux
(1)select
select first appeared in 4.2BSD in 1983. It monitors an array of file descriptors through a single select() system call. When select() returns, the kernel has modified the set to mark which descriptors are ready, so the process can then perform the subsequent read and write operations on them.
select is currently supported on almost every platform, and this good cross-platform support is one of its advantages. In fact, these days it is one of its few remaining advantages.
One disadvantage of select is the hard limit on the number of file descriptors a single process can monitor: 1024 by default on Linux. The limit can be raised by modifying the macro definition or even recompiling the kernel.
In addition, the data structure select() maintains holds a large number of file descriptors, and the cost of copying it grows linearly with their count. At the same time, because of network latency, many TCP connections are inactive at any given moment, yet each call to select() still performs a linear scan over all of the sockets, wasting further overhead.
(2)poll
Poll was born in System V Release 3 in 1986. It is not much different from select in essence, but poll has no limit on the maximum number of file descriptors.
poll shares another disadvantage with select: the array containing a large number of file descriptors is copied wholesale between user space and the kernel's address space on every call. Regardless of whether the descriptors are ready, this overhead grows linearly with their number.
In addition, after select() or poll() reports a set of ready file descriptors to the process, if the process performs no I/O on them, the same descriptors are reported again the next time select() or poll() is called, so ready information is generally never lost. This behavior is called level-triggered.
(3)epoll
Not until Linux 2.6 did the kernel gain a directly supported implementation, epoll, which has essentially all of the advantages mentioned above and is recognized as the best multiplexed I/O readiness-notification mechanism on Linux 2.6.
epoll supports both level-triggered and edge-triggered operation. Edge triggering (Edge Triggered) tells the process only which file descriptors have just become ready, and it says so only once; if we take no action, it does not repeat the notification. In theory, edge-triggered performance is higher, but the code is considerably more complex to implement.
epoll likewise reports only the ready file descriptors. When we call epoll_wait() to obtain them, the return value is not the descriptors themselves but the number of ready ones; the process then reads that many entries from the events array it supplied. Because only ready events are copied back to user space, the per-call cost of copying the entire descriptor set disappears (older descriptions attribute this to memory mapping with mmap, but in practice the kernel simply copies only the ready events).
Another essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only when the process calls one of those functions; with epoll, the process registers each file descriptor in advance via epoll_ctl(). Once a descriptor becomes ready, the kernel uses a callback mechanism to activate it quickly, and the process is notified when it calls epoll_wait().
Summary:
select
Several disadvantages of select:
(1) Every call to select copies the fd set from user space into the kernel; this overhead becomes large when there are many fds.
(2) Every call to select also makes the kernel traverse all of the fds passed in; again, the overhead is large when there are many fds.
(3) The number of file descriptors select supports is small: 1024 by default.
poll
The mechanism of poll is similar to that of select; essentially there is little difference. It also manages multiple descriptors by polling and processes each according to its state, but poll has no limit on the maximum number of file descriptors. poll shares select's other disadvantage: the array containing a large number of file descriptors is copied wholesale between user space and the kernel's address space on every call, and the overhead grows linearly with the number of descriptors, whether or not they are ready.
epoll
epoll, introduced in the 2.6 kernel, is an enhanced version of select and poll. Compared with them it is more flexible and has no descriptor limit. epoll uses a single file descriptor to manage many others: the events the user cares about on each descriptor are stored in an event table inside the kernel, so each descriptor need be copied between user space and kernel space only once.
Finally, epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout) is called to wait for events to arrive. Its return value is the number of events to handle, and the events array receives the set of events to process.
One-sentence summary
(1) select and poll must repeatedly poll the entire fd set until some device is ready, possibly alternating between sleep and wake-up many times. epoll also calls epoll_wait repeatedly to poll the ready list, and may likewise alternate between sleep and wake-up many times, but when a device becomes ready it invokes a callback function that puts the ready fd onto the ready list and wakes the process sleeping in epoll_wait. So although both alternate between sleeping and waking, select and poll must traverse the whole fd set each time they are awake, while epoll only has to check whether the ready list is empty. This saves a great deal of CPU time and is the performance gain brought by the callback mechanism.
(2) select and poll copy the fd set from user space to the kernel on every call, and also put the current process on each device's wait queue on every call. epoll needs only one copy, and it puts current on a wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but one defined inside epoll). This also saves considerable overhead.
How to use epoll
The epoll interface is very simple: just three functions in total.
1. epoll_create
/*
 * size: in recent Linux kernel versions this parameter is ignored.
 * Return value: a file descriptor, used as the first argument of the
 * following two functions.
 */
int epoll_create(int size);
This function creates an event table inside the kernel; the returned descriptor is then used to manage that table.
2. epoll_ctl
/*
 * epfd:  descriptor of the kernel event table to operate on
 *        (the return value of epoll_create)
 * op:    how to operate on the kernel event table:
 *        EPOLL_CTL_ADD  add a file descriptor to the table (registration)
 *        EPOLL_CTL_MOD  modify an event already in the table
 *        EPOLL_CTL_DEL  delete an event from the table
 * fd:    the file descriptor to operate on
 * event: pointer to a struct epoll_event
 */
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
This is epoll's event-registration function: epoll_ctl adds, modifies, or deletes an event of interest in the epoll object. A return value of 0 indicates success; otherwise -1 is returned, and the error type must be determined from the errno error code.
event structure
struct epoll_event {
    /*
     * events holds what the user is interested in plus the ready events;
     * it is a combination of the following macros:
     * EPOLLIN  : the descriptor is readable (including an orderly shutdown
     *            of the peer socket)
     * EPOLLOUT : the descriptor is writable
     * EPOLLPRI : the descriptor has urgent data to read (i.e. out-of-band
     *            data has arrived)
     * EPOLLERR : an error occurred on the descriptor
     * EPOLLHUP : the descriptor was hung up
     * EPOLLET  : use edge-triggered mode, as opposed to level-triggered
     * EPOLLONESHOT : report the event only once; to keep listening on this
     *            socket afterwards, it must be added to the epoll queue again
     */
    uint32_t events;
    epoll_data_t data;  /* a union; its most important member is fd,
                           the file descriptor being operated on */
};

typedef union epoll_data {
    void     *ptr;
    int       fd;
    uint32_t  u32;
    uint64_t  u64;
} epoll_data_t;
3. epoll_wait
/*
 * epfd:      same as above
 * events:    array that receives the ready events returned by the kernel
 * maxevents: the maximum number of events the caller can handle
 * timeout:   how long to wait for I/O, in ms (set to -1 later in this
 *            article, meaning never time out)
 * Return value: the number of ready events
 */
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
This waits for events to occur, similar to a select() call. The events parameter receives the set of ready events from the kernel, and maxevents tells the kernel how large that array is; maxevents must not exceed the size passed to epoll_create(). The timeout parameter is in milliseconds: 0 returns immediately, and -1 blocks indefinitely (some say it blocks permanently). The function returns the number of events to handle; 0 means the call timed out, and -1 indicates an error, whose type must be checked via the errno error code.
The following illustrates the use of epoll through a simple echo server/client example.
Server event poll
int epollFd;
struct epoll_event events[MAX_EVENTS];
int ret;
char buf[MAXSIZE];
memset(buf, 0, MAXSIZE);
// Create an epoll descriptor; multiple descriptors are managed through it
epollFd = epoll_create(FDSIZE);
// Register the listening descriptor for read events
add_event(epollFd, listenFd, EPOLLIN);
while (1) {
    // Get the ready descriptor events; blocks
    ret = epoll_wait(epollFd, events, MAX_EVENTS, -1);
    // Handle the events; ret is the number of events that occurred
    handle_events(epollFd, events, ret, listenFd, buf);
}
close(epollFd);
Client event poll
int sockfd;
struct sockaddr_in servaddr;

sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
bzero(&servaddr, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons(SERV_PORT);
servaddr.sin_addr.s_addr = inet_addr(IPADDRESS);
printf("start\n");
if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(struct sockaddr_in)) < 0) {
    perror("connect err: ");
    return 0;
} else {
    printf("connect succ\n");
}
// Handle the connection
handle_connection(sockfd);
close(sockfd);
return 0;
Program running results
client
./cli
start
connect succ
cli hello
epollfd 4, rdfd 0, sockfd 3, read 10
epollfd 4, wrfd 3, sockfd 3, write 10
epollfd 4, rdfd 3, sockfd 3, read 10
cli hello
epollfd 4, wrfd 1, sockfd 3, write 10
cli over
epollfd 4, rdfd 0, sockfd 3, read 9
epollfd 4, wrfd 3, sockfd 3, write 9
epollfd 4, rdfd 3, sockfd 3, read 9
cli over
epollfd 4, wrfd 1, sockfd 3, write 9
^C
Server
./srv
accept a new client: 127.0.0.1:37098, fd = 5
read fd=5, num read=10
read message is : cli hello
write fd=5, num write=10
read fd=5, num read=9
read message is : cli over
write fd=5, num write=9
read fd=5, num read=0
client close.
^C
This article has briefly summarized how select, poll, and epoll are used, along with their advantages and disadvantages, and has walked through an epoll demo for reference.
The full source code and a detailed discussion of the operating mechanism can be found in the article "select, poll, epoll and the way of use" on the official account xutopia77.