Event driven framework in Redis

Posted by ianr on Wed, 05 Jan 2022 06:47:03 +0100

In the previous article, we learned the basic working mechanism of the Redis event driven framework. I introduced the Reactor model that the framework is based on, and used the client connection event, one kind of IO event, to walk through the basic flow of the framework: during server initialization, the aeCreateFileEvent function is called to register the events to listen for; after initialization completes, the aeMain function is called, and aeMain calls the aeProcessEvents function in a loop to capture and process the events triggered by client requests.

However, in the last article, we mainly focused on the basic process of the framework, so you may still have some questions here, such as:

  • Are there any other IO events monitored by Redis event driven framework other than the client connection described in the previous lesson?
  • Besides IO events, will the framework listen for other events?
  • What specific operations in Redis source code do the creation and processing of these events correspond to?

In today's lesson, I'll introduce you to two types of events in Redis event driven framework: IO events and time events, as well as their corresponding processing mechanisms.

In fact, understanding and learning this part can help us more comprehensively grasp how Redis event driven framework handles the request operations and multiple tasks faced by the server in the form of events. For example, what events and functions are used to process normal client read-write requests, and how the background snapshot task is started in time.

Because the event driven framework is the core circular process after the Redis server runs, it is very helpful for us to understand when and what functions it uses to deal with which events.

On the other hand, we can also learn how to handle IO events and time events simultaneously in one framework. We usually develop server-side programs and often need to deal with periodic tasks. Redis's processing and implementation of two types of events gives us a good reference.

OK, in order to have a relatively comprehensive understanding of these two types of events, let's start with the data structure and initialization of the event driven framework cycle process, because it includes the data structure definition and initialization operations for these two types of events.

aeEventLoop structure and initialization

First, let's take a look at the data structure aeEventLoop that corresponds to the loop of the Redis event driven framework. This structure is defined in the framework's header file ae.h. It records the information the framework needs while the loop is running, including the variables that record the two types of events, namely:

  • The pointer events, of type aeFileEvent *, represents IO events. The type is named aeFileEvent because all IO events are identified by file descriptors;
  • The pointer timeEventHead, of type aeTimeEvent *, represents time events, that is, events triggered at a certain time interval.

In addition, the aeEventLoop structure contains a pointer fired of type aeFiredEvent *. This is not a separate event type. It is only used to record the file descriptor information of the events that have been triggered.

The following code shows the structure definition of event loop in Redis. You can have a look.

In the ae.h file:

/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    // IO event array
    aeFileEvent *events; /* Registered events */ 
    // Event array triggered
    aeFiredEvent *fired; /* Fired events */
    // Record time event chain header
    aeTimeEvent *timeEventHead;
    int stop;
    // Data related to API call interface
    void *apidata; /* This is used for polling API specific data */
    // Function called in each loop iteration before blocking to wait for events
    aeBeforeSleepProc *beforesleep;
    // Function called in each loop iteration after the wait for events returns
    aeBeforeSleepProc *aftersleep;
    int flags;
} aeEventLoop;

After understanding the aeEventLoop structure, let's take a look at how this structure is initialized, including the initialization of IO event array and time event linked list.

Initialization of aeCreateEventLoop function

Because the Redis server starts running the loop of the event driven framework after it finishes initializing, the aeEventLoop structure is initialized in the initServer function of server.c by calling the aeCreateEventLoop function. This function has only one parameter, setsize.

The following code shows the call to the aeCreateEventLoop function in the initServer function.

In the server.c file:

initServer() {
...
//Call the aeCreateEventLoop function to create the aeEventLoop structure and assign it to the el variable of the server structure
server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);
...
}

From here, we can see that the value of the setsize parameter is determined jointly by the maxclients variable of the server structure and the macro CONFIG_FDSET_INCR. The value of maxclients can be set in the Redis configuration file redis.conf, and its default value is 10000. The macro CONFIG_FDSET_INCR is equal to the value of the macro CONFIG_MIN_RESERVED_FDS plus 96, as shown below. Both macros are defined in the server.h file.

#define CONFIG_MIN_RESERVED_FDS 32
#define CONFIG_FDSET_INCR (CONFIG_MIN_RESERVED_FDS+96)
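
To make the arithmetic concrete, here is a minimal sketch that reuses the two macros quoted above. The helper event_loop_setsize is a hypothetical name introduced only for illustration; it is not a Redis function.

```c
#include <assert.h>

/* Values copied from the server.h macros quoted above. */
#define CONFIG_MIN_RESERVED_FDS 32
#define CONFIG_FDSET_INCR (CONFIG_MIN_RESERVED_FDS + 96)

/* Hypothetical helper (not part of Redis): the setsize value that
 * initServer would pass to aeCreateEventLoop for a given maxclients. */
static int event_loop_setsize(int maxclients) {
    assert(maxclients >= 0);
    return maxclients + CONFIG_FDSET_INCR;
}
```

For example, with maxclients set to 10000 the helper returns 10128, so the event loop reserves 128 slots beyond the client limit for descriptors that are not client connections.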

Well, here you may have a question: the setsize parameter of the aeCreateEventLoop function is set to the maximum number of clients plus a macro definition value, but what's the use of this parameter? This is related to the initialization operation performed by the aeCreateEventLoop function. Next, let's look at the operations performed by the aeCreateEventLoop function, which can be roughly divided into the following three steps.

  • In the first step, the aeCreateEventLoop function will create a variable eventLoop of aeEventLoop structure type. Then, the function will allocate memory space to the member variable of eventLoop. For example, allocate corresponding memory space to IO event array and triggered event array according to the passed parameter setsize. In addition, the function also assigns an initial value to the member variable of eventLoop.
  • In the second step, the aeCreateEventLoop function calls the aeApiCreate function. The aeApiCreate function wraps the IO multiplexing facility provided by the operating system. Assuming that Redis runs on Linux and the IO multiplexing mechanism is epoll, the aeApiCreate function calls epoll_create to create an epoll instance, and also creates an array of epoll_event structures whose size is equal to the parameter setsize.

Here, you should note that the aeApiCreate function saves the descriptor of the created epoll instance and the epoll_event array in the variable state of the aeApiState structure type, as shown below:

In the ae_epoll.c file:

// aeApiState structure definition
typedef struct aeApiState {
    // Descriptor of epoll instance
    int epfd;
    // Array of epoll_event structures that records the monitored events
    struct epoll_event *events;
} aeApiState;

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));

    if (!state) return -1;
    // Allocate the epoll_event array and save it in the aeApiState variable state
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    // Save the epoll instance descriptor in the aeApiState struct variable state
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    anetCloexec(state->epfd);
    eventLoop->apidata = state;
    return 0;
}

Next, the aeApiCreate function assigns the state variable to apidata in eventLoop. In this way, the eventLoop structure holds the epoll instance and the epoll_event array, which can then be used to create and process events based on epoll. I'll give you more details later.

eventLoop->apidata = state;

  • In the third step, the aeCreateEventLoop function initializes the mask of the file descriptor corresponding to every network IO event to AE_NONE, indicating that no event is being monitored on it for the time being.

I put the main part of the code of the aeCreateEventLoop function here. You can have a look.

In the ae.c file:

aeEventLoop *aeCreateEventLoop(int setsize) {
    aeEventLoop *eventLoop;
    int i;

    monotonicInit();    /* just in case the calling app didn't initialize */

    // Allocate memory space to the eventLoop variable
    if ((eventLoop = zmalloc(sizeof(*eventLoop))) == NULL) goto err;
    // Allocate memory space for IO events and triggered events
    eventLoop->events = zmalloc(sizeof(aeFileEvent)*setsize);
    eventLoop->fired = zmalloc(sizeof(aeFiredEvent)*setsize);
    if (eventLoop->events == NULL || eventLoop->fired == NULL) goto err;
    eventLoop->setsize = setsize;
    // Set the linked list header of time event to NULL
    eventLoop->timeEventHead = NULL;
    eventLoop->timeEventNextId = 0;
    eventLoop->stop = 0;
    eventLoop->maxfd = -1;

    eventLoop->beforesleep = NULL;
    eventLoop->aftersleep = NULL;
    eventLoop->flags = 0;
    // Call the aeApiCreate function to actually call the IO multiplexing function provided by the operating system
    if (aeApiCreate(eventLoop) == -1) goto err;
    /* Events with mask == AE_NONE are not set. So let's initialize the
     * vector with it. */
    // Set the mask of the file descriptor corresponding to all network IO events to AE_NONE
    for (i = 0; i < setsize; i++)
        eventLoop->events[i].mask = AE_NONE;
    return eventLoop;
//Processing logic after initialization failure
err:
    if (eventLoop) {
        zfree(eventLoop->events);
        zfree(eventLoop->fired);
        zfree(eventLoop);
    }
    return NULL;
}

OK, from the execution process of aeCreateEventLoop function, we can actually see the following two key points:

  • The size of the IO event array monitored by the event driven framework is equal to the parameter setsize, which bounds the number of clients that can connect to the Redis server. Therefore, when a client gets the error "max number of clients reached" while connecting to Redis, you can modify the maxclients configuration item in the redis.conf file to increase the number of clients that the framework can listen to.
  • When the epoll mechanism of Linux is used, the initialization of the framework loop creates the epoll_event structure array through the aeApiCreate function and calls the epoll_create function to create an epoll instance. Both are required preparation for using the epoll mechanism.
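
The epoll preparation described in the second point can be tried outside Redis with plain libc calls. The following is only a sketch under stated assumptions: poll_state, poll_state_create and poll_state_free are hypothetical names, malloc/free stand in for zmalloc/zfree, and epoll_create1(EPOLL_CLOEXEC) is used in place of the epoll_create plus anetCloexec pair shown earlier.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for aeApiState: the epoll descriptor plus the
 * array that a later epoll_wait() call would fill with ready events. */
typedef struct {
    int epfd;
    struct epoll_event *events;
    int setsize;
} poll_state;

/* Allocate the event array first, then create the epoll instance,
 * releasing everything already allocated if a step fails. */
static poll_state *poll_state_create(int setsize) {
    assert(setsize > 0);
    poll_state *st = malloc(sizeof(*st));
    if (st == NULL) return NULL;
    st->events = malloc(sizeof(struct epoll_event) * (size_t)setsize);
    if (st->events == NULL) {
        free(st);
        return NULL;
    }
    st->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (st->epfd == -1) {
        free(st->events);
        free(st);
        return NULL;
    }
    st->setsize = setsize;
    return st;
}

/* Release the epoll descriptor and the event array. */
static void poll_state_free(poll_state *st) {
    if (st == NULL) return;
    close(st->epfd);
    free(st->events);
    free(st);
}
```

The event array is sized by setsize so that a single poll call can report every monitored descriptor at once.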

IO event handling

In fact, Redis IO events mainly include three types: readable events, writable events and barrier events.

Among them, readable and writable events are easier to understand: for a Redis instance, they mean that data can be read from the client or written to the client. The main function of barrier events is to reverse the order in which events are processed. For example, by default Redis returns results to the client first, but if the data needs to be written to disk as soon as possible, Redis uses the barrier event to adjust the order of writing data and replying to the client: it persists the data to disk first, and then replies to the client.

In Redis source code, the data structure of IO events is the aeFileEvent structure, and the creation of IO events is completed through the aeCreateFileEvent function. The following code shows the definition of aeFileEvent structure. You can review it again:

typedef struct aeFileEvent {
    int mask; //Mask tags, including readable events, writable events, and barrier events
    aeFileProc *rfileProc;   //Callback function that handles readable events
    aeFileProc *wfileProc;   //Callback function that handles writable events
    void *clientData;  //Private data
} aeFileEvent;
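
Since mask is a bit field, several event types can be recorded on one descriptor at the same time. The sketch below uses the flag values defined in Redis's ae.h; the three helper functions are hypothetical and exist only to show the bit operations.

```c
#include <assert.h>

/* Flag values as defined in Redis's ae.h. */
#define AE_NONE 0
#define AE_READABLE 1
#define AE_WRITABLE 2
#define AE_BARRIER 4

/* Hypothetical helpers, not part of ae.c. */

/* Add one or more event flags to a mask. */
static int mask_add(int mask, int flags) {
    assert((flags & ~(AE_READABLE | AE_WRITABLE | AE_BARRIER)) == 0);
    return mask | flags;
}

/* Clear one or more event flags from a mask. */
static int mask_remove(int mask, int flags) {
    return mask & ~flags;
}

/* Report whether a given flag is set in a mask. */
static int mask_has(int mask, int flag) {
    return (mask & flag) != 0;
}
```

Starting from AE_NONE, adding AE_READABLE and then AE_WRITABLE | AE_BARRIER yields a mask with all three bits set, and removing AE_WRITABLE leaves the readable and barrier bits in place.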

For the aeCreateFileEvent function, we learned in the last lesson that it completes event registration through the aeApiAddEvent function. Next, let's look at how it is executed from the code level, which can help us more thoroughly understand how the event driven framework monitors IO events based on epoll mechanism.

IO event creation

First, let's look at the prototype definition of aeCreateFileEvent function, as shown below:

In the ae.c file:

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask, aeFileProc *proc, void *clientData)
{
    if (fd >= eventLoop->setsize) {
        errno = ERANGE;
        return AE_ERR;
    }
    aeFileEvent *fe = &eventLoop->events[fd];

    if (aeApiAddEvent(eventLoop, fd, mask) == -1)
        return AE_ERR;
    fe->mask |= mask;
    if (mask & AE_READABLE) fe->rfileProc = proc;
    if (mask & AE_WRITABLE) fe->wfileProc = proc;
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}

This function has five parameters:

  • The loop structure pointer eventLoop
  • The file descriptor fd corresponding to the IO event
  • The event type mask
  • The event handling callback function proc
  • The private data clientData of the event

The loop structure eventLoop contains an IO event array whose elements are of type aeFileEvent, and each array element records the monitored event types and callback functions associated with one file descriptor (such as a socket).
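
The idea of using the file descriptor itself as the array index can be sketched independently of Redis. Everything below (fd_slot, slot_register, count_call and the SLOT_* flags) is hypothetical and simplified; it leaves out the call into the multiplexing layer and only shows the table bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_NONE 0
#define SLOT_READABLE 1
#define SLOT_WRITABLE 2

typedef void handler_fn(int fd, void *data);

/* One table entry per possible file descriptor. */
typedef struct {
    int mask;             /* which event types are registered */
    handler_fn *on_read;  /* callback for readable events */
    handler_fn *on_write; /* callback for writable events */
    void *data;           /* private data handed back to the callback */
} fd_slot;

/* Register fn for the event types in mask; the descriptor is the index. */
static int slot_register(fd_slot *table, int setsize, int fd, int mask,
                         handler_fn *fn, void *data) {
    assert(table != NULL);
    if (fd < 0 || fd >= setsize) return -1; /* descriptor outside the table */
    fd_slot *s = &table[fd];
    s->mask |= mask;
    if (mask & SLOT_READABLE) s->on_read = fn;
    if (mask & SLOT_WRITABLE) s->on_write = fn;
    s->data = data;
    return 0;
}

/* Example callback: counts how many times it was invoked. */
static void count_call(int fd, void *data) {
    (void)fd;
    ++*(int *)data;
}
```

Because lookup is a plain array access, finding the callbacks for a ready descriptor takes constant time, at the cost of sizing the table for the largest descriptor expected.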

The aeCreateFileEvent function first obtains the IO event pointer fe associated with the file descriptor fd from the IO event array of eventLoop, as shown below:

aeFileEvent *fe = &eventLoop->events[fd];

Next, the aeCreateFileEvent function will call the aeApiAddEvent function to add the event to listen to:

if (aeApiAddEvent(eventLoop, fd, mask) == -1)
   return AE_ERR;

The aeApiAddEvent function actually calls the IO multiplexing function provided by the operating system to add the event. We still assume that the Redis instance runs on Linux with the epoll mechanism, so the aeApiAddEvent function calls the epoll_ctl function to add the event to listen for. I introduced the epoll_ctl function in Lecture 9. It takes four parameters:

  • epoll instance;
  • The type of operation to be performed (add or modify);
  • File descriptor to listen on;
  • epoll_event type variable.

So, how does the calling process prepare the parameters that the epoll_ctl function needs?

  • First, the epoll instance is the one created by the aeCreateEventLoop function I just introduced, through its call to the aeApiCreate function. It is saved in the apidata variable of the eventLoop structure, whose type is aeApiState. Therefore, the aeApiAddEvent function first obtains this variable, as shown below:
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    //Get the aeApiState variable from the eventLoop structure, which holds the epoll instance
    aeApiState *state = eventLoop->apidata;
    ...
}
  • Secondly, to decide the operation type, the aeApiAddEvent function looks up the element for the passed-in file descriptor fd in the IO event array of the eventLoop structure. Each element of the IO event array corresponds to one file descriptor, and when the array is initialized, the mask of each element is set to AE_NONE.

Therefore, if the mask recorded in the array for the file descriptor fd is not AE_NONE, the descriptor has already been registered, so the operation type is a modify operation, corresponding to the macro EPOLL_CTL_MOD in the epoll mechanism. Otherwise, the operation type is an add operation, corresponding to the macro EPOLL_CTL_ADD. This part of the code is as follows:

//If the IO event corresponding to the file descriptor fd already exists, the operation type is modify, otherwise it is add
 int op = eventLoop->events[fd].mask == AE_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;

  • Third, the file descriptor to listen on, which the epoll_ctl function requires, is the parameter fd received by the aeApiAddEvent function.

  • Finally, the epoll_ctl function also needs a variable of type epoll_event, so before calling epoll_ctl, the aeApiAddEvent function creates a new epoll_event variable ee. Then, the aeApiAddEvent function sets the event types to listen for and the file descriptor to listen on in the variable ee.

The parameter mask of aeApiAddEvent function indicates the event type mask to listen. Therefore, the aeApiAddEvent function will set whether the event type ee listens to is EPOLLIN or EPOLLOUT according to whether the mask value is a readable (AE_READABLE) or writable (AE_WRITABLE) event. In this way, the read-write events in Redis event driven framework can correspond to the read-write events in epoll mechanism. The following code shows this part of the logic, you can have a look.

...
struct epoll_event ee = {0}; //Create epoll_event type variable
...
//Convert the readable or writable IO event type to the type that epoll listens for, EPOLLIN or EPOLLOUT
if (mask & AE_READABLE) ee.events |= EPOLLIN;
if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
ee.data.fd = fd;  //Assign the file descriptor to listen to ee
...  
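
The mapping in the snippet above can be written as a pair of small pure functions. This is a sketch only: ae_mask_to_epoll and epoll_to_ae_mask are hypothetical names, and the reverse direction (including treating EPOLLERR and EPOLLHUP as both readable and writable) is an assumption about how the poll side would translate results back, since that code is not shown in this article.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Flag values as defined in Redis's ae.h. */
#define AE_READABLE 1
#define AE_WRITABLE 2

/* Framework mask -> epoll event bits (the direction used when registering). */
static uint32_t ae_mask_to_epoll(int mask) {
    assert(mask >= 0);
    uint32_t events = 0;
    if (mask & AE_READABLE) events |= EPOLLIN;
    if (mask & AE_WRITABLE) events |= EPOLLOUT;
    return events;
}

/* epoll event bits -> framework mask (assumed direction used after polling).
 * Errors and hang-ups are reported as both readable and writable so that
 * whichever handler is installed gets a chance to notice the failure. */
static int epoll_to_ae_mask(uint32_t events) {
    int mask = 0;
    if (events & EPOLLIN) mask |= AE_READABLE;
    if (events & EPOLLOUT) mask |= AE_WRITABLE;
    if (events & (EPOLLERR | EPOLLHUP)) mask |= AE_READABLE | AE_WRITABLE;
    return mask;
}
```

Keeping the translation in one place is what lets the rest of the framework work with AE_* flags only, regardless of which multiplexing backend is compiled in.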

Well, at this point, the aeApiAddEvent function has prepared the epoll instance, the operation type, the file descriptor to listen on, and the epoll_event variable. It then calls epoll_ctl to actually register the event to listen for, as shown below:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
...
//Call epoll_ctl to actually register the listening event
if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
return 0;
}

After understanding these codes, we can learn how the event driven framework encapsulates and implements the creation of IO events based on epoll. Then, after the Redis server starts running, the first IO event to listen to is a readable event, which corresponds to the connection request of the client. Specifically, the initServer function calls the aeCreateFileEvent function to create a readable event, and sets the callback function to acceptTcpHandler to handle the client connection.

Next, let's look at how IO events are handled once a client connection request arrives.

Read event handling

When the Redis server receives the connection request from the client, it will use the registered acceptTcpHandler function for processing.

The acceptTcpHandler function is in the networking.c file. It accepts the client connection and creates a connected socket cfd. Then it calls the acceptCommonHandler function (also in networking.c) and passes the connected socket cfd it just created to that function.

In the networking.c file:

void acceptTcpHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
    int cport, cfd, max = MAX_ACCEPTS_PER_CALL;
    char cip[NET_IP_STR_LEN];
    UNUSED(el);
    UNUSED(mask);
    UNUSED(privdata);

    while(max--) {
        cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport);
        if (cfd == ANET_ERR) {
            if (errno != EWOULDBLOCK)
                serverLog(LL_WARNING,
                    "Accepting client connection: %s", server.neterr);
            return;
        }
        anetCloexec(cfd);
        serverLog(LL_VERBOSE,"Accepted %s:%d", cip, cport);
        acceptCommonHandler(connCreateAcceptedSocket(cfd),0,cip);
    }
}

The acceptCommonHandler function calls the createClient function, which in turn calls the aeCreateFileEvent function to create a listening event on the connected socket. The event type is AE_READABLE, and the callback function is readQueryFromClient (in the networking.c file).

Well, at this point, the event driven framework is also listening on a socket connected to a client. Once the client sends a request to the server, the framework calls back the readQueryFromClient function to process the request. In this way, client requests are processed through the event driven framework.

The following code shows how the createClient function calls aeCreateFileEvent:

client *createClient(int fd) {
...
    if (fd != -1) {
            ...
            //Call aeCreateFileEvent to listen for readable events, which correspond to client read/write requests, and use the readQueryFromClient callback function to process them
            if (aeCreateFileEvent(server.el,fd,AE_READABLE,readQueryFromClient, c) == AE_ERR)
            {
                close(fd);
                zfree(c);
                return NULL;
            } 
    }
...
}

To help you follow the event creation process, from listening for client connection requests to listening for regular client read and write requests, I drew the following figure. You can have a look.

After understanding the read event processing in the event driven framework, let's look at the write event processing.

Write event handling

After receiving the client request, the Redis instance will write the data to be returned to the client output buffer after processing the client command. The following figure shows the function call logic of this process:

In the Redis event driven framework, before each cycle enters the event handling function, that is, before the framework's main function aeMain calls aeProcessEvents to handle triggered events or due time events, the beforeSleep function in the server.c file is invoked. It performs several tasks, including calling the handleClientsWithPendingWrites function, which writes the data in the Redis server's client output buffers back to the clients.

The code below is the main function aeMain of the event driven framework. As you can see, the beforeSleep function is called before each call to the aeProcessEvents function.

In the ae.c file:

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        //If the beforeSleep function is not empty, the beforeSleep function is called
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        //After calling the beforeSleep function, process the event
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|AE_CALL_AFTER_SLEEP);
    }
}
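
The shape of this loop, a hook that runs before every round of event processing, can be reproduced in a few lines. The sketch below is a toy model with hypothetical names (mini_loop, mini_main and the two example callbacks); the process callback merely stands in for aeProcessEvents and does no real polling.

```c
#include <assert.h>
#include <stddef.h>

typedef struct mini_loop mini_loop;
typedef void hook_fn(mini_loop *loop);
typedef int process_fn(mini_loop *loop);

struct mini_loop {
    int stop;             /* set to non-zero to leave the loop */
    hook_fn *beforesleep; /* optional hook run before each round */
    process_fn *process;  /* stands in for aeProcessEvents */
    int iterations;       /* completed rounds */
    int hook_calls;       /* how often the hook ran */
};

/* Same shape as the loop above: hook first, then event processing. */
static void mini_main(mini_loop *loop) {
    assert(loop != NULL && loop->process != NULL);
    loop->stop = 0;
    while (!loop->stop) {
        if (loop->beforesleep != NULL) loop->beforesleep(loop);
        loop->process(loop);
        loop->iterations++;
    }
}

/* Example hook: just counts its invocations. */
static void count_hook(mini_loop *loop) {
    loop->hook_calls++;
}

/* Example processing step: asks the loop to stop during the third round. */
static int process_three_times(mini_loop *loop) {
    if (loop->iterations + 1 >= 3) loop->stop = 1;
    return 0;
}
```

Running mini_main with these two callbacks performs three rounds and calls the hook three times, once before each round, which is the guarantee that lets pending replies be flushed before the loop goes to sleep again.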

Here you need to know that the handleClientsWithPendingWrites function, called by beforeSleep, traverses every client that has data waiting to be written back, and calls the writeToClient function to write back the data in that client's output buffer. The following figure shows the process. You can have a look.

However, if the data in the output buffer has not been completely written, the handleClientsWithPendingWrites function creates a writable event (through connSetWriteHandlerWithBarrier in the version of the code shown below) and sets the callback function to sendReplyToClient. The sendReplyToClient function calls the writeToClient function to write back the remaining data.

The following code shows the basic flow of the handleClientsWithPendingWrites function. You can have a look.

In the networking.c file:

/* This function is called just before entering the event loop, in the hope
 * we can just write the replies to the client output buffer without any
 * need to use a syscall in order to install the writable event handler,
 * get it called, and so forth. */
int handleClientsWithPendingWrites(void) {
    listIter li;
    listNode *ln;
    int processed = listLength(server.clients_pending_write);

    // Get the list of clients to write back
    listRewind(server.clients_pending_write,&li);
    // Traverse each client to be written back
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        c->flags &= ~CLIENT_PENDING_WRITE;
        listDelNode(server.clients_pending_write,ln);

        /* If a client is protected, don't do anything,
         * that may trigger write error or recreate handler. */
        if (c->flags & CLIENT_PROTECTED) continue;

        /* Don't write to clients that are going to be closed anyway. */
        if (c->flags & CLIENT_CLOSE_ASAP) continue;

        // Call writeToClient to write the output buffer data of the current client back
        /* Try to write buffers to the client socket. */
        if (writeToClient(c,0) == C_ERR) continue;

        // If there is data to be written back
        /* If after the synchronous writes above we still have data to
         * output to the client, we need to install the writable handler. */
        if (clientHasPendingReplies(c)) {
            int ae_barrier = 0;
            /* For the fsync=always policy, we want that a given FD is never
             * served for reading and writing in the same event loop iteration,
             * so that in the middle of receiving the query, and serving it
             * to the client, we'll call beforeSleep() that will do the
             * actual fsync of AOF to disk. the write barrier ensures that. */
            // Create a listener for writable events and set callback functions
            if (server.aof_state == AOF_ON &&
                server.aof_fsync == AOF_FSYNC_ALWAYS)
            {
                ae_barrier = 1;
            }
            if (connSetWriteHandlerWithBarrier(c->conn, sendReplyToClient, ae_barrier) == C_ERR) {
                freeClientAsync(c);
            }
        }
    }
    return processed;
}
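
The pattern in this function, write synchronously first and install a writable handler only if data is left over, can be modeled without sockets. The sketch below is a simulation with hypothetical names: out_client, try_write and flush_before_sleep are not Redis identifiers, and the capacity argument is an assumption standing in for how many bytes the socket would accept right now.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t pending;        /* bytes still waiting in the output buffer */
    int write_handler_set; /* 1 once a writable event has been installed */
} out_client;

/* Stand-in for writeToClient: the socket takes at most capacity bytes. */
static size_t try_write(out_client *c, size_t capacity) {
    size_t n = c->pending < capacity ? c->pending : capacity;
    c->pending -= n;
    return n;
}

/* Try the cheap path first; fall back to a writable event only if needed. */
static void flush_before_sleep(out_client *c, size_t capacity) {
    assert(c != NULL);
    (void)try_write(c, capacity);
    if (c->pending > 0) c->write_handler_set = 1;
}
```

A client with 100 pending bytes and room for 4096 is flushed completely and never gets a write handler, while a client with 10000 pending bytes is left with 5904 bytes and a handler that will finish the job when the socket becomes writable. This is the saving the original comment describes: in the common case no extra system call is needed to register a writable event.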

Well, what we just learned is the callback handler corresponding to the read-write event. In fact, in order to handle these events in time, the aeMain function of Redis event driven framework will also call the aeProcessEvents function repeatedly to detect triggered events and call the corresponding callback function for processing.

From the code of the aeProcessEvents function, we can see that the function will call the aeApiPoll function to query which of the monitored file descriptors are ready. Once the descriptor is ready, the aeProcessEvents function will call the corresponding callback function for processing according to the readable or writable type of the event. The basic flow of aeProcessEvents function call is as follows:

In the ae.c file:

/* Process every pending time event, then every pending file event
 * (that may be registered by time event callbacks just processed).
 * Without special flags the function sleeps until some file event
 * fires, or when the next time event occurs (if any).
 *
 * If flags is 0, the function does nothing and returns.
 * if flags has AE_ALL_EVENTS set, all the kind of events are processed.
 * if flags has AE_FILE_EVENTS set, file events are processed.
 * if flags has AE_TIME_EVENTS set, time events are processed.
 * if flags has AE_DONT_WAIT set the function returns ASAP until all
 * the events that's possible to process without to wait are processed.
 * if flags has AE_CALL_AFTER_SLEEP set, the aftersleep callback is called.
 * if flags has AE_CALL_BEFORE_SLEEP set, the beforesleep callback is called.
 *
 * The function returns the number of events processed. */
int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;

    /* If there is no event handling, return immediately*/
    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;

    /* Note that we want to call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    /* Enter the polling branch if any file descriptor is registered, or if
     * time events should be processed and the caller allows waiting. */
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        struct timeval tv, *tvp;
        int64_t usUntilTimer = -1;

        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            usUntilTimer = usUntilEarliestTimer(eventLoop);

        if (usUntilTimer >= 0) {
            tv.tv_sec = usUntilTimer / 1000000;
            tv.tv_usec = usUntilTimer % 1000000;
            tvp = &tv;
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }

        if (eventLoop->flags & AE_DONT_WAIT) {
            tv.tv_sec = tv.tv_usec = 0;
            tvp = &tv;
        }

        if (eventLoop->beforesleep != NULL && flags & AE_CALL_BEFORE_SLEEP)
            eventLoop->beforesleep(eventLoop);

        /* Call the multiplexing API, will return only on timeout or when
         * some event fires. */
        // Call aeApiPoll to get the ready descriptor
        numevents = aeApiPoll(eventLoop, tvp);

        /* After sleep callback. */
        if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)
            eventLoop->aftersleep(eventLoop);

        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int fired = 0; /* Number of events fired for current fd. */

            /* Normally we execute the readable event first, and the writable
             * event later. This is useful as sometimes we may be able
             * to serve the reply of a query immediately after processing the
             * query.
             *
             * However if AE_BARRIER is set in the mask, our application is
             * asking us to do the reverse: never fire the writable event
             * after the readable. In such a case, we invert the calls.
             * This is useful when, for instance, we want to do things
             * in the beforeSleep() hook, like fsyncing a file to disk,
             * before replying to a client. */
            int invert = fe->mask & AE_BARRIER;

            /* Note the "fe->mask & mask & ..." code: maybe an already
             * processed event removed an element that fired and we still
             * didn't processed, so we check if the event is still valid.
             *
             * Fire the readable event if the call sequence is not
             * inverted. */
            // If a readable event is triggered, call the read event callback processing function set during event registration
            if (!invert && fe->mask & mask & AE_READABLE) {
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                fired++;
                fe = &eventLoop->events[fd]; /* Refresh in case of resize. */
            }

            // If a writable event is triggered, call the write event callback processing function set during event registration
            /* Fire the writable event. */
            if (fe->mask & mask & AE_WRITABLE) {
                if (!fired || fe->wfileProc != fe->rfileProc) {
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
            }

            /* If we have to invert the call, fire the readable event now
             * after the writable one. */
            if (invert) {
                fe = &eventLoop->events[fd]; /* Refresh in case of resize. */
                if ((fe->mask & mask & AE_READABLE) &&
                    (!fired || fe->wfileProc != fe->rfileProc))
                {
                    fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
            }

            processed++;
        }
    }
    /* Check time events */
    /* If the caller asked for time events (AE_TIME_EVENTS flag), process them with processTimeEvents */
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);
    return processed; /* return the number of processed file/time events */
}
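
The read/write ordering explained in the comments above can be condensed into a small sketch. It is a simplified illustration, not Redis code: the EV_* constants and the dispatch_order function are hypothetical names, and the check that skips a handler already fired for the same fd (the wfileProc != rfileProc test) is left out. It only records the order in which the read ('R') and write ('W') handlers would run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch (stand-ins for the AE_* masks). */
#define EV_READABLE 1
#define EV_WRITABLE 2
#define EV_BARRIER  4

/* Record the order in which the handlers for one fd would run.
 * registered: mask the fd was registered with (may include EV_BARRIER).
 * fired:      mask reported by the multiplexing layer for this fd.
 * out:        receives "R", "W", "RW", "WR" or "" (NUL-terminated). */
void dispatch_order(int registered, int fired, char out[3]) {
    assert(out != NULL);
    int n = 0;
    int invert = registered & EV_BARRIER;
    int ready = registered & fired; /* only events that are still registered */

    /* Normal case: the read handler runs first. */
    if (!invert && (ready & EV_READABLE)) out[n++] = 'R';
    /* The write handler runs in both cases. */
    if (ready & EV_WRITABLE) out[n++] = 'W';
    /* Barrier case: the read handler runs only after the write handler. */
    if (invert && (ready & EV_READABLE)) out[n++] = 'R';
    out[n] = '\0';
}
```

With both events ready, a plain registration yields the order read then write, while a registration that includes the barrier flag yields write then read.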

So far, we have looked at aeCreateFileEvent, the function that creates IO events, as well as the read and write events involved in handling client requests and their handler functions. Next, let's look at how time events are created and processed in the event driven framework.

Time event handling

In fact, time events are simpler to process than IO events. IO events come in readable, writable and barrier types, and different types of IO events have different callback functions, whereas a time event has a single trigger time and a single callback. Next, let's look at the definition, creation, callback function and trigger processing of time events in turn.

Time event definition

First, let's look at the structure definition of time events. The code, from ae.h, is as follows:

/* Time event structure */
typedef struct aeTimeEvent {
    // Time event ID
    long long id; /* time event identifier. */
    // Trigger time of the event: a monotonic timestamp in microseconds
    monotime when;
    // Callback invoked when the time event fires
    aeTimeProc *timeProc;
    // Finalizer invoked when the time event is deleted
    aeEventFinalizerProc *finalizerProc;
    // Private data passed to the callbacks
    void *clientData;
    // Pointer to the previous node in the time event linked list
    struct aeTimeEvent *prev;
    // Pointer to the next node in the time event linked list
    struct aeTimeEvent *next;
    int refcount; /* refcount to prevent timer events from being
         * freed in recursive time event calls. */
} aeTimeEvent;

The main fields of the time event structure are when, the timestamp in microseconds at which the event should fire, and timeProc, the callback function that is invoked once it fires. The structure also contains the pointers prev and next, which shows that time events are organized as a doubly linked list.

After understanding the definition of time event structure, let's take a look at how time events are created.

Time event creation

Similar to aeCreateFileEvent, which creates IO events, time events are created by the aeCreateTimeEvent function. Its definition in ae.c is as follows:

long long aeCreateTimeEvent(aeEventLoop *eventLoop, long long milliseconds,
        aeTimeProc *proc, void *clientData,
        aeEventFinalizerProc *finalizerProc)
{
    long long id = eventLoop->timeEventNextId++;
    aeTimeEvent *te;

    te = zmalloc(sizeof(*te));
    if (te == NULL) return AE_ERR;
    te->id = id;
    te->when = getMonotonicUs() + milliseconds * 1000;
    te->timeProc = proc;
    te->finalizerProc = finalizerProc;
    te->clientData = clientData;
    te->prev = NULL;
    te->next = eventLoop->timeEventHead;
    te->refcount = 0;
    if (te->next)
        te->next->prev = te;
    eventLoop->timeEventHead = te;
    return id;
}

Two of its parameters deserve attention, because they explain how time events are processed. One is milliseconds, the delay in milliseconds between the current time and the moment the new time event should fire. The other is proc, the callback function that is invoked once the time event fires.

The execution logic of the aeCreateTimeEvent function is not complicated. It creates a time event variable te, initializes it, and inserts it at the head of the time event linked list kept in eventLoop, the framework's event loop structure. Along the way, it computes the trigger timestamp from the milliseconds parameter as getMonotonicUs() + milliseconds * 1000 and assigns it to te->when. (Older Redis versions call the aeAddMillisecondsToNow function for this calculation instead.)
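
The two steps described above, computing the trigger time and inserting at the head of the list, can be tried out in isolation with a minimal sketch. The types TimerEvent and TimerList and the functions timer_list_add and timer_list_free are hypothetical simplifications made up for illustration, and the current time is passed in as a parameter instead of being read from a monotonic clock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for aeTimeEvent and aeEventLoop. */
typedef struct TimerEvent {
    long long id;
    long long when_us;          /* absolute trigger time in microseconds */
    struct TimerEvent *prev;
    struct TimerEvent *next;
} TimerEvent;

typedef struct TimerList {
    long long next_id;
    TimerEvent *head;
} TimerList;

/* Create an event due `milliseconds` after `now_us` and insert it at the
 * head of the list. Returns the new id, or -1 if allocation fails. */
long long timer_list_add(TimerList *list, long long now_us, long long milliseconds) {
    assert(list != NULL);
    TimerEvent *te = malloc(sizeof(*te));
    if (te == NULL) return -1;
    te->id = list->next_id++;
    te->when_us = now_us + milliseconds * 1000; /* ms -> us */
    te->prev = NULL;
    te->next = list->head;
    if (te->next) te->next->prev = te;
    list->head = te;
    return te->id;
}

/* Free every event in the list. */
void timer_list_free(TimerList *list) {
    assert(list != NULL);
    while (list->head) {
        TimerEvent *next = list->head->next;
        free(list->head);
        list->head = next;
    }
}
```

Because new events always go to the head, the list is ordered by creation, not by trigger time, which is why the processing side has to walk the whole list on each pass.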

In fact, during the initialization of Redis server, in addition to creating monitored IO events, it will also call aeCreateTimeEvent function to create time events. The following code shows the call of initServer function to aeCreateTimeEvent function:

initServer() {
    ...
    // Create a time event that first fires 1 ms from now, with serverCron as its callback
    if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
        ... // Error message
    }
}

From the code, we can see that the callback function after the time event is triggered is serverCron. So next, let's look at the serverCron function.

Time event callback function

The serverCron function is implemented in the server.c file.

  • On the one hand, once the time event fires, it calls a series of functions in turn to carry out background tasks. For example, serverCron checks whether a process termination signal has been received and, if so, shuts the server down. It also calls the databasesCron function to handle expired keys and rehashing. You can refer to the code given below:
...
//If the process end signal is received, execute the server shutdown operation
 if (server.shutdown_asap) {
        if (prepareForShutdown(SHUTDOWN_NOFLAGS) == C_OK) exit(0);
        ...
 }
...
clientsCron();  //Perform asynchronous operations on the clients
databasesCron(); //Perform background operations on the databases
...
  • On the other hand, the serverCron function also performs some tasks periodically at different frequencies by executing the macro run_with_period.

The run_with_period macro is defined in server.h as shown below. It uses the hz value configured in the Redis instance's configuration file, redis.conf, to decide whether a task with a period of _ms_ milliseconds is due on the current run of serverCron. When it is, serverCron performs the corresponding task.

/* Using the following macro you can run code inside serverCron() with the
 * specified period, specified in milliseconds.
 * The actual resolution depends on server.hz. */
#define run_with_period(_ms_) if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))

For example, once every second the serverCron function checks whether the last AOF write failed. If it did, serverCron calls the flushAppendOnlyFile function to retry writing the buffered AOF data to disk. The following code shows this periodic task:

serverCron() {
   ...
   //Execute once every 1 second to check whether there is a write error in AOF
   run_with_period(1000) {
        if (server.aof_last_write_status == C_ERR)
            flushAppendOnlyFile(0);
    }
   ...
}
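
To make the arithmetic in run_with_period concrete, here is a small sketch that evaluates the same condition outside of Redis. The functions period_is_due and count_runs are hypothetical helpers written for illustration, with hz and cronloops passed in as parameters in place of server.hz and server.cronloops. It assumes serverCron runs hz times per second, so that consecutive runs are 1000/hz milliseconds apart.

```c
#include <assert.h>

/* Mirror of the condition inside run_with_period: returns 1 if a task with
 * period `ms` should run on the given serverCron iteration. */
int period_is_due(int ms, int hz, int cronloops) {
    assert(ms > 0 && hz > 0 && hz <= 1000);
    int tick_ms = 1000 / hz;            /* time between two serverCron runs */
    if (ms <= tick_ms) return 1;        /* period shorter than a tick: always run */
    return cronloops % (ms / tick_ms) == 0;
}

/* Count how often the task runs during the first `ticks` iterations. */
int count_runs(int ms, int hz, int ticks) {
    int runs = 0;
    for (int loop = 0; loop < ticks; loop++)
        runs += period_is_due(ms, hz, loop);
    return runs;
}
```

With hz set to 10, one tick is 100 ms, so run_with_period(1000) is due on every 10th iteration: over 100 iterations (about 10 seconds) the task runs 10 times. A period at or below 100 ms runs on every iteration, which is what the macro comment means by the resolution depending on server.hz.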

If you want to know more about the periodic tasks, you can read through the code blocks in the serverCron function that are wrapped in the run_with_period macro.

Well, after understanding the callback function serverCron after the time event is triggered, let's finally look at how the time event is triggered.

Trigger processing of time events

In fact, the detection and triggering of time events is relatively simple. The aeMain function of the event driven framework will call the aeProcessEvents function repeatedly to process various events. At the end of the process, the aeProcessEvents function will call the processTimeEvents function to process the corresponding task.

aeProcessEvents(){
    ...
    //Process time events if the caller requested them
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);
    ...
}

As for the processTimeEvents function, its basic process is to walk the time event linked list one event at a time and compare each event's trigger timestamp with the current time. If the trigger time has been reached, the callback function of the event is called. If the callback returns a value other than AE_NOMORE, that value is used as the number of milliseconds until the event fires again; otherwise the event is marked for deletion. Because aeProcessEvents is executed repeatedly, periodic tasks keep being run in this way.

The following code, from ae.c, shows the basic flow of the processTimeEvents function. You can take another look.

/* Process time events */
static int processTimeEvents(aeEventLoop *eventLoop) {
    int processed = 0;
    aeTimeEvent *te;
    long long maxId;

    // Extract events from the time event list
    te = eventLoop->timeEventHead;
    maxId = eventLoop->timeEventNextId-1;
    // Get current time
    monotime now = getMonotonicUs();
    while(te) {
        long long id;

        /* Remove events scheduled for deletion. */
        // Remove events that have been marked for deletion
        if (te->id == AE_DELETED_EVENT_ID) {
            aeTimeEvent *next = te->next;
            /* If a reference exists for this timer event,
             * don't free it. This is currently incremented
             * for recursive timerProc calls */
            // If this timer event is still referenced, do not free it yet. The refcount is raised while the event's timeProc is running, which covers recursive calls
            if (te->refcount) {
                te = next;
                continue;
            }
            if (te->prev)
                te->prev->next = te->next;
            else
                eventLoop->timeEventHead = te->next;
            if (te->next)
                te->next->prev = te->prev;
            if (te->finalizerProc) {
                te->finalizerProc(eventLoop, te->clientData);
                now = getMonotonicUs();
            }
            zfree(te);
            te = next;
            continue;
        }

        /* Make sure we don't process time events created by time events in
         * this iteration. Note that this check is currently useless: we always
         * add new timers on the head, however if we change the implementation
         * detail, this check may be useful again: we keep it here for future
         * defense. */
        if (te->id > maxId) {
            te = te->next;
            continue;
        }

        // The event's trigger time has been reached
        if (te->when <= now) {
            int retval;

            id = te->id;
            te->refcount++;
            retval = te->timeProc(eventLoop, id, te->clientData);
            te->refcount--;
            processed++;
            now = getMonotonicUs();
            if (retval != AE_NOMORE) {
                te->when = now + retval * 1000;
            } else {
                te->id = AE_DELETED_EVENT_ID;
            }
        }
        // Get next time event
        te = te->next;
    }
    return processed;
}
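
The rescheduling rule at the end of the loop is what turns a one-shot timer into a periodic task, and it can be seen in isolation in the sketch below. It is a deliberately reduced model, not the Redis implementation: Timer, fire_due_timers and the two callbacks are hypothetical names, the current time is passed in as a parameter, and deleted timers are only marked, never unlinked or freed.

```c
#include <assert.h>
#include <stddef.h>

#define TIMER_NOMORE  (-1) /* hypothetical stand-in for AE_NOMORE */
#define TIMER_DELETED (-1) /* hypothetical stand-in for AE_DELETED_EVENT_ID */

/* A callback returns the delay in ms until its next run, or TIMER_NOMORE. */
typedef int timer_proc(void *data);

typedef struct Timer {
    long long id;
    long long when_us;      /* absolute trigger time in microseconds */
    timer_proc *proc;
    void *data;
    struct Timer *next;
} Timer;

/* Fire every timer that is due at now_us and return how many were fired. */
int fire_due_timers(Timer *head, long long now_us) {
    int processed = 0;
    for (Timer *t = head; t != NULL; t = t->next) {
        if (t->id == TIMER_DELETED || t->when_us > now_us) continue;
        assert(t->proc != NULL);
        int retval = t->proc(t->data);
        processed++;
        if (retval != TIMER_NOMORE)
            t->when_us = now_us + (long long)retval * 1000; /* reschedule */
        else
            t->id = TIMER_DELETED;                          /* one-shot: retire */
    }
    return processed;
}

/* Example callbacks: each increments a counter passed through data. */
int every_100ms(void *data) { (*(int *)data)++; return 100; }
int only_once(void *data)   { (*(int *)data)++; return TIMER_NOMORE; }
```

A callback that keeps returning its interval, as every_100ms does here, is run again each time that interval elapses, while only_once is retired after its first run.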

In this lesson, I introduced the two types of events in the Redis event driven framework: IO events and time events.

IO events can be further divided into readable, writable and barrier events. Because readable and writable events are widely used in Redis for client communication and request processing, today we focused on these two. After the Redis server creates a listening socket, it registers a readable event on it and uses the acceptTcpHandler callback function to process client connection requests.

After the connection between the server and the client is established, the server listens for readable events on the connected socket and uses the readQueryFromClient function to process the client's requests. Note that regardless of whether the client sends a read or a write operation, the server first has to read the request from the socket and parse it, which is why it registers a readable event on the client's connected socket. When the instance needs to send data back to the client, it registers a writable event with the event driven framework and uses sendReplyToClient as the callback function to write the data in the buffer back to the client.

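I have summarized the correspondence between IO events, sockets and callback functions in the table below so that you can review it.

Socket              IO event    Callback function
Listening socket    Readable    acceptTcpHandler
Connected socket    Readable    readQueryFromClient
Connected socket    Writable    sendReplyToClient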

Time events, in turn, are mainly used to register periodically executed tasks with the event driven framework so that the Redis server can do its background processing. The callback function of the time event is serverCron; you can read it further to learn about the specific tasks.

When Redis calls aeApiCreate and aeApiAddEvent, what conditions are used to determine which file IO multiplexing function to call?

In ae.c, the wrapped IO multiplexing implementation to include is chosen at compile time according to the platform. The aeApiCreate and aeApiAddEvent functions are defined in each platform's file, so at run time the logic of whichever file was included is executed.

/* Include the best multiplexing layer supported by this system.
 * The following should be ordered by performances, descending. */
// The following shall be arranged in descending order according to performance
#ifdef HAVE_EVPORT
#include "ae_evport.c" // Solaris
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c" // Linux
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c" // macOS and BSD
        #else
        #include "ae_select.c" // fallback for all other platforms
        #endif
    #endif
#endif
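
The same compile-time selection pattern can be reproduced in a few lines. This is only a sketch under stated assumptions: the DEMO_HAVE_* macros and the demo_backend_* functions are hypothetical, and the DEMO_HAVE_* macros are derived here directly from compiler-defined platform macros, whereas the article does not show how Redis decides to define HAVE_EPOLL and the others.

```c
#include <assert.h>
#include <string.h>

/* Assumed platform detection for this sketch only. */
#if defined(__linux__)
#define DEMO_HAVE_EPOLL 1
#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__)
#define DEMO_HAVE_KQUEUE 1
#endif

/* Return the name of the backend picked by the preprocessor, using the same
 * priority order as the #ifdef chain above: evport, epoll, kqueue, select. */
const char *demo_backend_name(void) {
#if defined(DEMO_HAVE_EVPORT)
    return "evport";
#elif defined(DEMO_HAVE_EPOLL)
    return "epoll";
#elif defined(DEMO_HAVE_KQUEUE)
    return "kqueue";
#else
    return "select";
#endif
}

/* Return 1 if the selected backend has the given name, 0 otherwise. */
int demo_backend_is(const char *expected) {
    assert(expected != NULL);
    return strcmp(demo_backend_name(), expected) == 0;
}
```

Only one branch survives preprocessing, so there is no run-time check at all: callers use the same function names on every platform, and the backend behind them is fixed when the program is compiled.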
