Python multiplexing: selectors module

Posted by PHP-Nut on Wed, 10 Nov 2021 20:42:36 +0100

1. IO multiplexing

IO multiplexing uses an intermediary that watches several potentially blocking IO objects at the same time. When one or more of the watched objects becomes ready, for example because a message has arrived, the intermediary returns exactly those objects so the program can handle their messages.

The advantage of IO multiplexing is that a single thread can wait on many blocking IO operations at the same time. Compared with the traditional multi-thread / multi-process model, IO multiplexing has lower system overhead: the system does not need to create new processes or threads or keep them running, which saves system resources.
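As a rough illustration of one thread waiting on several IO objects, the sketch below simulates three connections with socket.socketpair() and lets a single selector report whichever ones have data. The helper name serve_ready and the labels "a", "b" and "c" are made up for this example; a real server would register network sockets instead.

```python
import selectors
import socket

def serve_ready(pairs, expected):
    """Wait on several sockets in one thread; return received messages by name."""
    sel = selectors.DefaultSelector()
    for name, (reader, _writer) in pairs.items():
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=name)
    received = {}
    while len(received) < expected:
        events = sel.select(timeout=1)
        if not events:
            break  # nothing became ready in time; stop waiting
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)
    sel.close()
    return received

# Three hypothetical "connections", simulated with local socket pairs.
pairs = {name: socket.socketpair() for name in ("a", "b", "c")}
pairs["a"][1].send(b"hello from a")
pairs["c"][1].send(b"hello from c")

messages = serve_ready(pairs, expected=2)
print(messages)

for reader, writer in pairs.values():
    reader.close()
    writer.close()
```

No extra thread or process is created here: connection "b" stays idle and costs nothing beyond its registration.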

Python provides the selectors module to implement IO multiplexing. The kind of intermediary differs between operating systems; the common ones are epoll, kqueue, devpoll, poll and select. kqueue (BSD, macOS) and devpoll (Solaris) serve basically the same purpose as epoll. epoll has been available since the Linux 2.5 kernels (it was introduced in 2.5.44), while on Windows only select is available.
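To see which of these mechanisms Python exposes on a given machine, a small probe like the following can help. It only checks which selector classes exist in the selectors module and which one DefaultSelector resolves to; the output depends on the operating system, so none is shown here.

```python
import selectors

# Selector classes documented for the selectors module. Not every class
# exists on every platform, so probe with hasattr instead of assuming.
candidates = ["EpollSelector", "KqueueSelector", "DevpollSelector",
              "PollSelector", "SelectSelector"]
available = [name for name in candidates if hasattr(selectors, name)]

sel = selectors.DefaultSelector()
chosen = type(sel).__name__  # DefaultSelector is an alias for one of the classes
sel.close()

print("available:", available)
print("default:  ", chosen)
```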

1.1. Comparison of epoll, poll and select

select and poll both rely on polling: on every call, all monitored IO objects are scanned to see which ones have data. This is a time-consuming operation with low efficiency. The advantage of poll over select is that select limits the number of monitored IO objects (typically to 1024), which is obviously not enough for servers that hold a large number of network connections, while poll has no such limit. Removing the limit does not remove the underlying problem, though: the more IO objects are monitored, the longer each scan takes and the lower the efficiency, and polling itself cannot solve this.

epoll was created to solve this problem. It has no fixed limit on the number of monitored IO objects, and it does not scan all of them on every call. Instead, it uses an event notification mechanism with callbacks to collect the IO objects that have messages: only "active" IO objects trigger the callback, so they can be processed directly without polling the rest.
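From Python, the visible effect is that a select() call returns only the active objects, however many are registered. The sketch below registers 50 simulated connections (socket pairs, chosen only to keep the example self-contained) and activates two of them. It shows the interface behaviour, not the kernel internals: on Linux the default selector is backed by epoll, while on a select-based platform the same result is produced by scanning.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
pairs = []
for index in range(50):
    reader, writer = socket.socketpair()
    sel.register(reader, selectors.EVENT_READ, data=index)
    pairs.append((reader, writer))

# Only connections 7 and 31 become "active".
for index in (7, 31):
    pairs[index][1].send(b"ping")

ready = sel.select(timeout=1)
active = sorted(key.data for key, _mask in ready)
print(active)

sel.close()
for reader, writer in pairs:
    reader.close()
    writer.close()
```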

2. Basic use of the selectors module

import selectors
import socket

# Create a listening socket; once it is listening, it can accept connection requests
sock = socket.socket()
sock.bind(("127.0.0.1", 80))
sock.listen()

slt = selectors.DefaultSelector()  # Use the system default selector: select on Windows, epoll on Linux
# Register the listening socket with the selector so that it is monitored
slt.register(fileobj=sock, events=selectors.EVENT_READ, data=None)

# Loop processing messages
while True:
    # select(): block until at least one registered IO object is ready, then return the ready ones
    ready_events = slt.select(timeout=None)
    print(ready_events)     # Ready IO objects
    break

ready_events is a list of every registered IO object that is currently ready (here, every one with data to receive). Each item in the list is a (key, mask) tuple:

  • SelectorKey object:
    • fileobj: the registered socket object
    • fd: the underlying file descriptor
    • events: the events that were registered for this object
    • data: whatever we passed in at registration; it can be any value and is stored on the key for later use
  • mask value:
    • EVENT_READ: readable; its value is 1
    • EVENT_WRITE: writable; its value is 2
    • or a combination of the two

For example:

[(SelectorKey(fileobj=<socket.socket fd=456, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 80)>, fd=456, events=1, data=None),
    1)]
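Because the mask is a bit field, it is usually tested with the & operator rather than compared for equality. The following sketch, built on a local socket pair instead of a real server, unpacks each (key, mask) tuple and decodes the mask; describe_mask is a helper name invented for this example.

```python
import selectors
import socket

def describe_mask(mask):
    """Translate an event mask into a list of readable names."""
    names = []
    if mask & selectors.EVENT_READ:
        names.append("readable")
    if mask & selectors.EVENT_WRITE:
        names.append("writable")
    return names

left, right = socket.socketpair()
right.send(b"x")  # give 'left' something to read

sel = selectors.DefaultSelector()
sel.register(left, selectors.EVENT_READ | selectors.EVENT_WRITE, data="demo")

results = []
for key, mask in sel.select(timeout=1):
    # key.fileobj is the socket, key.fd its descriptor, key.data our own value
    results.append((key.fd == left.fileno(), key.data, describe_mask(mask)))
print(results)

sel.close()
left.close()
right.close()
```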

To process a request, use the corresponding method of the ready socket. The socket registered above is a listening socket, so a readable event means a connection request is waiting, and calling its accept method processes that request.

Accepting the request produces a new client socket, which we also register with the selector so that both are monitored together. When a message comes in, a connection request is handled by the handle_request() function and a client message by the handle_client_msg() function.
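One way to picture this is a loop that checks which socket became ready and branches on it. The sketch below is only an illustration under a few assumptions: it binds to port 0 so the OS picks a free port, creates a throwaway client in the same process so that it can run on its own, and uses a bounded loop instead of while True. The names listener, client and received are hypothetical.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

# A client in the same process, only so the sketch is self-contained.
client = socket.create_connection(listener.getsockname())
client.sendall(b"hello")

received = []
for _ in range(10):  # bounded loop instead of "while True"
    if received:
        break
    for key, _mask in sel.select(timeout=1):
        if key.fileobj is listener:
            # The listening socket is ready: a connection request is waiting.
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            # A client socket is ready: read its message, then drop it.
            conn = key.fileobj
            data = conn.recv(1024)
            if data:
                received.append(data)
            sel.unregister(conn)
            conn.close()

print(received)

client.close()
sel.unregister(listener)
listener.close()
sel.close()
```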

There are now two kinds of socket in the selector, so we have to work out which kind was activated and then call the matching function. If the selector holds many kinds of socket, checking them one by one becomes unworkable. The solution is to bind the processing function to the corresponding SelectorKey object, using the data parameter of register().

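import selectors
import socket

def handle_request(sock: socket.socket, mask):    # Process new connections
    conn, addr = sock.accept()
    conn.setblocking(False)  # Set non-blocking
    slt.register(conn, selectors.EVENT_READ, data=handle_client_msg)

def handle_client_msg(sock: socket.socket, mask):  # Process messages
    data = sock.recv(1024)  # recv() requires a buffer size
    if data:
        print(data.decode())
    else:
        # An empty read means the client closed the connection
        slt.unregister(sock)
        sock.close()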

sock = socket.socket()
sock.bind(("127.0.0.1", 80))
sock.listen()

slt = selectors.DefaultSelector()
slt.register(fileobj=sock, events=selectors.EVENT_READ, data=handle_request)

while True:
    ready_events = slt.select(timeout=None)
    for event, mask in ready_events:
        event.data(event.fileobj, mask)
        # Each socket carries its own handler in data. Calling that handler with the socket itself dispatches to the right function for every kind of socket.

Using data solves the problem neatly. Note, however, that every function (or callable object) bound to the data attribute ends up being called as event.data(event.fileobj, mask), so all of them must accept the same parameters.
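If a handler needs more context than the socket and the mask, one possible approach (an assumption of this sketch, not something the selectors module requires) is to pre-fill the extra arguments with functools.partial, so that the dispatch loop can keep its uniform two-argument call. The handler name handle_message and its label parameter are hypothetical.

```python
import functools
import selectors
import socket

sel = selectors.DefaultSelector()
log = []

def handle_message(sock, mask, label):
    # 'label' is extra context that the dispatch loop knows nothing about.
    log.append((label, sock.recv(1024)))

first_reader, first_writer = socket.socketpair()
second_reader, second_writer = socket.socketpair()

# partial() fixes 'label' up front, leaving a callable that takes (sock, mask).
sel.register(first_reader, selectors.EVENT_READ,
             data=functools.partial(handle_message, label="first"))
sel.register(second_reader, selectors.EVENT_READ,
             data=functools.partial(handle_message, label="second"))

first_writer.send(b"one")
second_writer.send(b"two")

for key, mask in sel.select(timeout=1):
    key.data(key.fileobj, mask)  # same two-argument call for every handler

print(sorted(log))

sel.close()
for s in (first_reader, first_writer, second_reader, second_writer):
    s.close()
```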