[Special Topic on Redis source code analysis] Getting to the root cause: why is the data you wrote to Redis missing?

Posted by mtucker6784 on Fri, 12 Nov 2021 21:03:21 +0100

Introduction to Redis database

As a mature data storage middleware, Redis provides comprehensive data management features, such as the data expiration mechanism mentioned earlier and the data eviction strategy that is the topic of this article.

Data locality principle

The principle of locality, which runs through computer science, tells us that cached data shows two kinds of locality:

  1. The more recently data has been accessed, the more likely it is to be accessed again.
  2. The more often data has been accessed, the more likely it is to be accessed again. Here we can simply assume that the higher the probability of being accessed, the more valuable the data.

Based on these two phenomena, we can define two strategies:

  1. LRU (Least Recently Used): evict the data that has gone unaccessed for the longest time.
  2. LFU (Least Frequently Used): evict the data that is accessed least often.

In addition to LRU and LFU, data can also be evicted at random. This treats all data equally: a part of it is selected at random and evicted. Redis implements all three strategies, and you can configure whichever one fits your data.

Besides these three strategies, Redis also provides a TTL-based eviction strategy for keys that have an expiration time, which evicts the keys with the smallest remaining TTL first. Also note that the eviction policies of Redis can be applied either to all keys or only to the keys that have an expiration time set.

Redis problem background

We sometimes run into situations like this: we write some data to Redis, query it again later, and find that the data is missing. What happened? Or the opposite: the data has obviously expired, so why does it still occupy memory?

  • We know that Redis relies mainly on memory to deliver high-performance, highly concurrent reads and writes.

  • However, memory is limited. Suppose Redis can only use 10G: what happens if you write 20G of data into it? Of course, 10G of data will be dropped and 10G will be retained. Which data is dropped and which is retained? That is decided by the eviction mechanism of Redis. And when data has obviously expired but still occupies memory, that is determined by the expiration policy of Redis.

Expiration policy of Redis

Redis uses two expiration strategies together, periodic deletion and lazy deletion:

  • Periodic deletion: by default, every 100ms Redis randomly samples some keys that have an expiration time set and checks whether they have expired. Expired keys are deleted. (Because only a sample is checked, this cannot remove every expired key.)

  • Lazy deletion: when a key is accessed, Redis first checks whether it has expired. If it has, the key is deleted. (Keys that are never accessed again are never removed this way, so this cannot remove every expired key either.)
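
The interplay of the two strategies can be sketched with a toy table. This is an illustrative model and not Redis code: toy_entry, toy_table, toy_get, toy_expire_cycle and toy_count_stale are hypothetical names, and a rotating cursor stands in for Redis's random sampling so that the example stays deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical toy entry, not a Redis structure. expire_at == 0 means no TTL. */
typedef struct {
    int in_use;
    int value;
    long long expire_at;   /* absolute expiration time in ms */
} toy_entry;

typedef struct {
    toy_entry slots[TOY_TABLE_SIZE];
    size_t cursor;         /* where the next periodic scan starts */
} toy_table;

static int is_expired(const toy_entry *e, long long now) {
    return e->in_use && e->expire_at != 0 && e->expire_at <= now;
}

/* Lazy deletion: expiry is checked only when the slot is read.
 * Returns 1 and stores the value in *out on a hit, 0 otherwise. */
int toy_get(toy_table *t, size_t slot, long long now, int *out) {
    assert(t != NULL && out != NULL && slot < TOY_TABLE_SIZE);
    toy_entry *e = &t->slots[slot];
    if (!e->in_use) return 0;
    if (is_expired(e, now)) { e->in_use = 0; return 0; }
    *out = e->value;
    return 1;
}

/* Periodic deletion: inspect only `samples` slots per call and
 * return how many expired entries were removed. */
int toy_expire_cycle(toy_table *t, size_t samples, long long now) {
    assert(t != NULL);
    int removed = 0;
    for (size_t i = 0; i < samples; i++) {
        toy_entry *e = &t->slots[t->cursor];
        t->cursor = (t->cursor + 1) % TOY_TABLE_SIZE;
        if (is_expired(e, now)) { e->in_use = 0; removed++; }
    }
    return removed;
}

/* Count entries that have expired but still occupy a slot. */
size_t toy_count_stale(const toy_table *t, long long now) {
    assert(t != NULL);
    size_t n = 0;
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
        if (is_expired(&t->slots[i], now)) n++;
    return n;
}
```

With eight expired entries, one cycle that samples three slots leaves five stale entries behind, and a single read of another expired slot removes only that one. Everything that is neither sampled nor read stays in memory.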

Redis's memory deletion strategy

Periodic deletion and lazy deletion both have gaps. If periodic deletion misses many expired keys, and those keys are not accessed again either, lazy deletion never gets a chance to remove them. A large number of expired keys may then accumulate in memory until Redis runs out of memory.

Redis memory elimination mechanism

Data eviction in Redis means cleaning up some data to free memory when memory is running low. Redis does have an expiration policy, but that only cleans up data that is past its expiration time.

What if there is still not enough space after the expired data has been cleaned up? Is it acceptable to delete some more data? In a cache scenario the answer is yes, because even if the data cannot be found in Redis, it can still be fetched from the underlying data source.

Therefore, in some specific business scenarios, we can discard some old data in Redis to make room for new data.

As we all know, Redis is an in-memory database, and all key-value pairs are stored in memory. As the amount of data grows, some key-value pairs have to be evicted because memory is limited, so that there is enough room to store new ones. In Redis, memory usage is limited by setting server.maxmemory (a value of 0 means memory is not limited). When usage reaches server.maxmemory, the eviction mechanism is triggered.

The memory eviction mechanism offers the following policies:

  • noeviction: when memory is insufficient to hold newly written data, new write operations return an error. This is generally not used.
  • allkeys-lru: when memory is insufficient to hold newly written data, remove the least recently used key from the whole key space (this is the most commonly used policy).
  • allkeys-random: when memory is insufficient to hold newly written data, remove a random key from the whole key space (rarely used).
  • volatile-lru: when memory is insufficient to hold newly written data, remove the least recently used key among the keys that have an expiration time set (rarely used).
  • volatile-random: when memory is insufficient to hold newly written data, remove a random key among the keys that have an expiration time set.
  • volatile-ttl: when memory is insufficient to hold newly written data, remove the keys with the earliest expiration time first, among the keys that have an expiration time set.

Since Redis 4.0 there are also the LFU variants allkeys-lfu and volatile-lfu, which evict the least frequently used key instead of the least recently used one.

How to set the memory elimination mechanism:

In redis.conf:

  • maxmemory 100mb: the maximum memory limit; 0 means no limit;
  • maxmemory-policy allkeys-lru: the eviction policy to apply.

Redis checks whether the memory used exceeds server.maxmemory every time it executes a command from the client. If it exceeds server.maxmemory, it will eliminate the data.

int processCommand(client *c) {
    ......
    /* server.maxmemory == 0 means there is no memory limit */
    if (server.maxmemory) {
        /* Check the memory usage and evict data if needed */
        int retval = freeMemoryIfNeeded();
        ......
    }
    ......
}

When does evict execute

Every time Redis processes a command, it checks the memory usage and tries to perform eviction. In some cases eviction must be skipped; these cases are listed in isSafeToPerformEvictions.

static int isSafeToPerformEvictions(void) {
    /* Do not evict while a Lua script has timed out or while the dataset is still loading */
    if (server.lua_timedout || server.loading) return 0;

    /* A replica that ignores maxmemory does not evict; only the master needs to */
    if (server.masterhost && server.repl_slave_ignore_maxmemory) return 0;

    /* While clients are paused no eviction is performed, because the dataset must stay unchanged */
    if (checkClientPauseTimeoutAndReturnIfPaused()) return 0;

    return 1;
}

Performing the eviction

int performEvictions(void) {
    if (!isSafeToPerformEvictions()) return EVICT_OK;

    int keys_freed = 0;
    size_t mem_reported, mem_tofree;
    long long mem_freed; /* May be negative */
    mstime_t latency, eviction_latency;
    long long delta;
    int slaves = listLength(server.slaves);
    int result = EVICT_FAIL;

    if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
        return EVICT_OK;

    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
        return EVICT_FAIL;  /* We need to free memory, but policy forbids. */

    unsigned long eviction_time_limit_us = evictionTimeLimitUs();

    mem_freed = 0;

    latencyStartMonitor(latency);

    monotime evictionTimer;
    elapsedStart(&evictionTimer);

    while (mem_freed < (long long)mem_tofree) {
        int j, k, i;
        static unsigned int next_db = 0;
        sds bestkey = NULL;
        int bestdbid;
        redisDb *db;
        dict *dict;
        dictEntry *de;

        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
        {
            struct evictionPoolEntry *pool = EvictionPoolLRU;

            while(bestkey == NULL) {
                unsigned long total_keys = 0, keys;

                /* We don't want to make local-db choices when expiring keys,
                 * so to start populate the eviction pool sampling keys from
                 * every DB. 
                 * First, sample the key from dict and put it into the pool */
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) {
                        evictionPoolPopulate(i, dict, db->dict, pool);
                        total_keys += keys;
                    }
                }
                if (!total_keys) break; /* No keys to evict. */

                /* Select the most suitable key from the pool */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;
                    bestdbid = pool[k].dbid;

                    if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                        de = dictFind(server.db[pool[k].dbid].dict,
                            pool[k].key);
                    } else {
                        de = dictFind(server.db[pool[k].dbid].expires,
                            pool[k].key);
                    }

                    /* Remove the entry from the eviction pool */
                    if (pool[k].key != pool[k].cached)
                        sdsfree(pool[k].key);
                    pool[k].key = NULL;
                    pool[k].idle = 0;

                    /* If the key exists, is our pick. Otherwise it is
                     * a ghost and we need to try the next element. */
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            }
        }

        /* volatile-random and allkeys-random strategy */
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {
            /* For random eviction, the static variable next_db remembers which db was visited last, so that successive evictions rotate across all dbs */
            for (i = 0; i < server.dbnum; i++) {
                j = (++next_db) % server.dbnum;
                db = server.db+j;
                dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                        db->dict : db->expires;
                if (dictSize(dict) != 0) {
                    de = dictGetRandomKey(dict);
                    bestkey = dictGetKey(de);
                    bestdbid = j;
                    break;
                }
            }
        }

        /* Remove the selected key from dict */
        if (bestkey) {
            db = server.db+bestdbid;
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
            /* We compute the amount of memory freed by db*Delete() alone.
             * The memory needed to propagate the DEL to the AOF and the
             * replicas may actually be greater than what we free by
             * deleting the key, but we cannot account for that, otherwise
             * we would never exit the loop. The same holds for the CSC
             * invalidation messages generated by signalModifiedKey.
             * AOF and output buffer memory will be freed eventually, so we
             * only care about the memory used by the key space. */
            delta = (long long) zmalloc_used_memory();
            latencyStartMonitor(eviction_latency);
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);
            else
                dbSyncDelete(db,keyobj);
            latencyEndMonitor(eviction_latency);
            latencyAddSampleIfNeeded("eviction-del",eviction_latency);
            delta -= (long long) zmalloc_used_memory();
            mem_freed += delta;
            server.stat_evictedkeys++;
            signalModifiedKey(NULL,db,keyobj);
            notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                keyobj, db->id);
            decrRefCount(keyobj);
            keys_freed++;

            if (keys_freed % 16 == 0) {
                /* When the amount of memory to free is large, we may spend
                 * so much time here that data cannot be delivered to the
                 * replicas fast enough, so we force the transmission
                 * inside the loop. */
                if (slaves) flushSlavesOutputBuffers();

                /* Normally our stop condition is having released a fixed,
                 * pre-computed amount of memory. However, when objects are
                 * deleted in another thread, it is better to check from time
                 * to time whether the target memory has already been reached,
                 * because mem_freed is only computed across the
                 * dbAsyncDelete() call, while the thread can release memory
                 * at any time. */
                if (server.lazyfree_lazy_eviction) {
                    if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                        break;
                    }
                }

                /* After some time, exit the loop early, even if the memory
                 * limit has not been reached yet. If we suddenly need to
                 * free a lot of memory, we do not want to spend too much
                 * time here. */
                if (elapsedUs(evictionTimer) > eviction_time_limit_us) {
                    // We still need to free memory - start eviction timer proc
                    if (!isEvictionProcRunning) {
                        isEvictionProcRunning = 1;
                        aeCreateTimeEvent(server.el, 0,
                                evictionTimeProc, NULL, NULL);
                    }
                    break;
                }
            }
        } else {
            goto cant_free; /* nothing to free... */
        }
    }
    /* at this point, the memory is OK, or we have reached the time limit */
    result = (isEvictionProcRunning) ? EVICT_RUNNING : EVICT_OK;

cant_free:
    if (result == EVICT_FAIL) {
        /* At this point, we have run out of evictable items.  It's possible
         * that some items are being freed in the lazyfree thread.  Perform a
         * short wait here if such jobs exist, but don't wait long.  */
        if (bioPendingJobsOfType(BIO_LAZY_FREE)) {
            usleep(eviction_time_limit_us);
            if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                result = EVICT_OK;
            }
        }
    }

    latencyEndMonitor(latency);
    latencyAddSampleIfNeeded("eviction-cycle",latency);
    return result;
}
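
The policy tests in the loop above check individual bits of server.maxmemory_policy, which only works if the policy constants carry flag bits. The sketch below shows one way such an encoding can look. It is modeled on the MAXMEMORY_* names used above, but the TOY_* constants, their numeric values and the two helper functions are assumptions made for illustration, not a copy of the Redis headers.

```c
#include <assert.h>

/* Flag bits shared by several policies (illustrative values). */
enum {
    TOY_FLAG_LRU     = 1 << 0,
    TOY_FLAG_LFU     = 1 << 1,
    TOY_FLAG_ALLKEYS = 1 << 2
};

/* Each policy has a unique id in the high bits plus its flag bits. */
enum toy_policy {
    TOY_VOLATILE_LRU    = (0 << 8) | TOY_FLAG_LRU,
    TOY_VOLATILE_LFU    = (1 << 8) | TOY_FLAG_LFU,
    TOY_VOLATILE_TTL    = (2 << 8),
    TOY_VOLATILE_RANDOM = (3 << 8),
    TOY_ALLKEYS_LRU     = (4 << 8) | TOY_FLAG_LRU | TOY_FLAG_ALLKEYS,
    TOY_ALLKEYS_LFU     = (5 << 8) | TOY_FLAG_LFU | TOY_FLAG_ALLKEYS,
    TOY_ALLKEYS_RANDOM  = (6 << 8) | TOY_FLAG_ALLKEYS,
    TOY_NO_EVICTION     = (7 << 8)
};

/* 1 if the policy ranks sampled keys through the eviction pool
 * (LRU, LFU and TTL), 0 for the random policies and noeviction. */
int toy_uses_pool(enum toy_policy p) {
    return (p & (TOY_FLAG_LRU | TOY_FLAG_LFU)) || p == TOY_VOLATILE_TTL;
}

/* 1 if candidates come from the whole key space, 0 if they come only
 * from the keys that have an expiration time set. */
int toy_samples_all_keys(enum toy_policy p) {
    assert(p != TOY_NO_EVICTION);   /* noeviction never samples keys */
    return (p & TOY_FLAG_ALLKEYS) != 0;
}
```

With this layout a single bit test answers "sample db->dict or db->expires?", while an equality test is still needed for policies that carry no flag, such as the TTL policy.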

Release resources if needed

In older Redis versions the same logic lives in freeMemoryIfNeeded, shown here in abridged form:

int freeMemoryIfNeeded(void) {
    //Get redis memory usage
    mem_reported = zmalloc_used_memory();
    if (mem_reported <= server.maxmemory) return C_OK; 
    mem_used = mem_reported;
    if (slaves) {
        listRewind(server.slaves,&li);
        //Subtract the output buffer of the slave
        while((ln = listNext(&li))) {
            ......
        }
    }
    /* Subtract the memory used by the AOF buffers */
    if (server.aof_state != AOF_OFF) {
        mem_used -= sdslen(server.aof_buf);
        mem_used -= aofRewriteBufferSize();
    }
    /* Check if we are still over the memory limit. */
    if (mem_used <= server.maxmemory) return C_OK;
    /* Compute how much memory we need to free. */
    mem_tofree = mem_used - server.maxmemory;
    mem_freed = 0;
    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
        goto cant_free; /* Memory must be freed, but the policy forbids eviction */
    /* Evict keys until enough memory has been freed */
    while (mem_freed < mem_tofree) {
        ......
    	sds bestkey = NULL;
        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
        {   /* LRU, LFU or TTL policy: pick the best key from the eviction pool */
            struct evictionPoolEntry *pool = EvictionPoolLRU;
            while(bestkey == NULL) {
                unsigned long total_keys = 0, keys;
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) {
                        /* Fill the eviction pool with sampled keys, ranked according to the policy */
                        evictionPoolPopulate(i, dict, db->dict, pool);
                    }
                }
                /* Walk the evictionPool from back to front and pick the first key that still exists in the database */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;
                    bestdbid = pool[k].dbid;
                    if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                        de = dictFind(server.db[pool[k].dbid].dict,
                            pool[k].key);
                    } else {
                        de = dictFind(server.db[pool[k].dbid].expires,
                            pool[k].key);
                    }
                    ......
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            }
        }
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {   /* Pick a random key from db->dict or db->expires to evict */
            for (i = 0; i < server.dbnum; i++) {
                j = (++next_db) % server.dbnum;
                db = server.db+j;
                dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                        db->dict : db->expires;
                if (dictSize(dict) != 0) {
                    de = dictGetRandomKey(dict);
                    bestkey = dictGetKey(de);
                    bestdbid = j;
                    break;
                }
            }
        }
        /* Evict the selected key-value pair */
        if (bestkey) {
            db = server.db+bestdbid;
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
            delta = (long long) zmalloc_used_memory();
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);
            else
                dbSyncDelete(db,keyobj);
            delta -= (long long) zmalloc_used_memory();
            mem_freed += delta;
            server.stat_evictedkeys++;
            decrRefCount(keyobj);
            keys_freed++;
            if (slaves) flushSlavesOutputBuffers();
        }
 
    }
    return C_OK;
cant_free:
    /* Nothing could be evicted: wait for the lazy-free thread while it still has pending jobs */
    while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
        if (((mem_reported - zmalloc_used_memory()) + mem_freed) >= mem_tofree)
            break;
        usleep(1000);
    }
    return C_ERR;
}
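
The first half of freeMemoryIfNeeded is plain arithmetic: take the allocator's total, subtract the replica output buffers and the AOF buffers, and compare the result with the limit. A minimal sketch of that calculation follows; toy_mem_to_free and its parameters are hypothetical names, and the buffer sizes are passed in as numbers rather than read from a server structure.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes must be freed, or 0 if usage is within the limit.
 * maxmemory == 0 means "no limit", as in the text above. */
size_t toy_mem_to_free(size_t used, size_t replica_buffers,
                       size_t aof_buffers, size_t maxmemory) {
    if (maxmemory == 0) return 0;

    size_t overhead = replica_buffers + aof_buffers;
    assert(overhead >= replica_buffers);   /* guard against size_t overflow */

    /* Buffers are not counted against the limit. */
    size_t counted = used > overhead ? used - overhead : 0;
    return counted > maxmemory ? counted - maxmemory : 0;
}
```

For example, with 120 bytes in use, 15 of them in buffers and a limit of 100, only 5 bytes need to be freed. If the buffers were counted as well, evicting keys would enlarge the replication stream, which could in turn trigger further eviction.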

Depending on the eviction policy, candidates are chosen from a random sample of key-value pairs to build the evictionPool:

  • 1) LRU eviction: randomly sample several key-value pairs from the data set and put those with the longest idle time (the least recently used ones) into the evictionPool.

The essence of LRU is to evict the data that has gone unaccessed for the longest time. One way to implement it is a linked list: whenever data is accessed, move it to the head of the list, so the tail always holds the data that has gone unaccessed the longest. However, a lookup in a linked list is O(n), so a hash table is usually added to speed up lookups, as in Java's LinkedHashMap. Redis does not take this approach. It simply records the most recent access timestamp on each key and picks the key with the oldest timestamp. Sorting all the data would of course be too slow, so Redis samples a batch of keys each time and applies the eviction policy within that batch. The advantage is high performance; the disadvantage is that the result is only a local optimum within the sample, not necessarily the global optimum.
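
The difference between the local and the global optimum is easy to see in a small sketch. The code below is an illustration only: toy_key and pick_lru_from_sample are hypothetical names, the timestamps are arbitrary logical clock values, and the sample is passed in explicitly instead of being drawn at random.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key record: a name plus the time of its last access. */
typedef struct {
    const char *name;
    unsigned long last_access;
} toy_key;

/* Return the index of the least recently accessed key, looking only at
 * the indices listed in `sample`. Keys outside the sample are ignored. */
size_t pick_lru_from_sample(const toy_key *keys,
                            const size_t *sample, size_t nsample) {
    assert(keys != NULL && sample != NULL && nsample > 0);
    size_t best = sample[0];
    for (size_t i = 1; i < nsample; i++) {
        if (keys[sample[i]].last_access < keys[best].last_access)
            best = sample[i];
    }
    return best;
}
```

If the keys have last-access times 50, 10, 40, 30 and 20 and the sample contains indices 0, 2 and 3, the function picks index 3 (time 30), although index 1 (time 10) is the oldest key overall. A larger sample makes such misses less likely at the cost of more work per eviction.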

The redisObject has a 24-bit lru field that stores the timestamp (in seconds) of the last access. 24 bits cannot hold a full Unix timestamp, so the value wraps around roughly every 194 days (2^24 seconds). For eviction purposes this is sufficient.
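
To make the wraparound concrete, here is a minimal sketch of a 24-bit clock and an idle-time calculation that tolerates a single wrap. The TOY_* macros and toy_* functions are hypothetical; the wrap handling follows the same general idea as Redis's estimateObjectIdleTime but is not a copy of it.

```c
#include <assert.h>

#define TOY_LRU_BITS 24
#define TOY_LRU_CLOCK_MAX ((1UL << TOY_LRU_BITS) - 1)

/* Reduce a Unix time in seconds to the 24-bit clock value. */
unsigned long toy_lru_clock(unsigned long long unix_seconds) {
    return (unsigned long)(unix_seconds & TOY_LRU_CLOCK_MAX);
}

/* Approximate idle time in seconds between a stored stamp and the
 * current clock value, assuming the clock wrapped at most once. */
unsigned long toy_idle_seconds(unsigned long now_clock, unsigned long stamp) {
    assert(now_clock <= TOY_LRU_CLOCK_MAX && stamp <= TOY_LRU_CLOCK_MAX);
    if (now_clock >= stamp) return now_clock - stamp;
    /* The clock wrapped after the stamp was taken. */
    return now_clock + (TOY_LRU_CLOCK_MAX - stamp);
}

/* Number of days until the 24-bit clock wraps around. */
double toy_wrap_days(void) {
    return (double)(TOY_LRU_CLOCK_MAX + 1) / 86400.0;
}
```

toy_wrap_days() evaluates to about 194.2. A key that stays untouched for longer than one full cycle would look younger than it is, which is an accepted inaccuracy of the compact field.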

robj *lookupKey(redisDb *db, robj *key, int flags) {
    dictEntry *de = dictFind(db->dict,key->ptr);
    if (de) {
        robj *val = dictGetVal(de);
        if (!hasActiveChildProcess() && !(flags & LOOKUP_NOTOUCH)){
            if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
                updateLFU(val);
            } else {
                val->lru = LRU_CLOCK();  // The LRU timestamp is updated here  
            }
        }
        return val;
    } else {
        return NULL;
    }
}

  • 2) LFU eviction: randomly sample several key-value pairs from the data set and put those with the lowest access frequency into the evictionPool.

The lru field is also used by LFU, which is why lookupKey above updates it through updateLFU when an LFU policy is in use. LFU arrived later, in Redis 4.0, so it reuses the existing lru field. The basic way to implement LFU is to record how many times each key is accessed. There is one problem to consider, though. LFU evicts data according to access frequency, but new data may arrive in Redis at any time, and old data has had more time to accumulate accesses. A low access count on new data right now does not mean it will be accessed rarely in the future. If this is ignored, new data may be evicted as soon as it arrives, which is obviously unreasonable.

To solve this problem, Redis splits the 24 bits into two parts: a 16-bit timestamp (minute resolution) in the high bits and an 8-bit counter in the low bits. The counter of each new key starts at a nonzero initial value, so that the key survives long enough to prove itself, and the counter then decays over time, so that old data that is no longer in use can be evicted. Let's take a look at the implementation.

The counter has only 8 bits, so it can count to 255 at most. Is that enough? Redis does not count exactly; it counts approximately. The counter grows probabilistically: the larger its value, the slower it grows. The growth logic is as follows:

/* Update the LFU counter. The counter is not exact: it is incremented with a probability that shrinks as the counter grows, so growth slows down.
 * It only reflects popularity within a time window, not the exact number of accesses. */
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL; // LFU_INIT_VAL is 5
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);  // server.lfu_log_factor can be configured. The default value is 10 
    if (r < p) counter++;
    return counter;
}
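
How slow the growth becomes can be worked out from the formula in LFULogIncr. Each access increments the counter with probability p = 1/(baseval*lfu_log_factor+1), so on average 1/p accesses are needed per step. The helpers below are a sketch with hypothetical names (lfu_incr_probability, lfu_expected_hits); they reproduce the formula without the random draw and assume the initial value of 5 stated in the comment above.

```c
#include <assert.h>

#define TOY_LFU_INIT_VAL 5

/* Probability that a single access increments the counter. */
double lfu_incr_probability(unsigned counter, int log_factor) {
    assert(counter <= 255 && log_factor >= 0);
    if (counter == 255) return 0.0;             /* saturated */
    double baseval = (double)counter - TOY_LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    return 1.0 / (baseval * log_factor + 1);
}

/* Expected number of accesses needed to move the counter from `from`
 * to `to`: the sum of 1/p over every intermediate counter value. */
double lfu_expected_hits(unsigned from, unsigned to, int log_factor) {
    assert(from <= to && to <= 255);
    double hits = 0.0;
    for (unsigned c = from; c < to; c++)
        hits += 1.0 / lfu_incr_probability(c, log_factor);
    return hits;
}
```

With a factor of 10, a fresh key at counter 5 is incremented on its next access with certainty, a key at counter 15 has a chance of only 1/101 per access, and moving from 5 to 10 takes about 105 accesses on average (1 + 11 + 21 + 31 + 41).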

LFU counter decay: if the counter only ever grew, then however slowly it grows, the counters of many keys would eventually reach the maximum of 255, and the counter could no longer tell hot data from cold. A decay strategy is therefore needed: the counter decreases as time passes. The code is as follows:

/* LFU counter decay. lfu_decay_time is the number of minutes that must elapse for the counter to lose 1; for example, lfu_decay_time == 10
 * means the counter decays by 1 every 10 minutes. When lfu_decay_time is 0 the counter does not decay. */
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
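
The bit layout and the decay can be tried out on plain integers. The sketch below packs a minute timestamp and a counter into one 24-bit value and applies the same decay rule as LFUDecrAndReturn. All function names are hypothetical, and the current time is passed in as a parameter instead of being read from a clock.

```c
#include <assert.h>

/* Pack a minute timestamp (16 bits) and a counter (8 bits) into 24 bits. */
unsigned long lfu_pack(unsigned long minutes, unsigned long counter) {
    assert(counter <= 255);
    return ((minutes & 65535UL) << 8) | counter;
}

unsigned long lfu_minutes(unsigned long field) { return (field >> 8) & 65535UL; }
unsigned long lfu_counter(unsigned long field) { return field & 255UL; }

/* Minutes elapsed since `last_minutes`, tolerating one 16-bit wrap. */
unsigned long lfu_elapsed(unsigned long now_minutes, unsigned long last_minutes) {
    now_minutes &= 65535UL;
    if (now_minutes >= last_minutes) return now_minutes - last_minutes;
    return 65535UL - last_minutes + now_minutes;
}

/* Counter after decay: lose 1 per full `decay_time` minutes elapsed,
 * never going below 0. decay_time == 0 disables decay. */
unsigned long lfu_decayed_counter(unsigned long field,
                                  unsigned long now_minutes,
                                  unsigned long decay_time) {
    unsigned long counter = lfu_counter(field);
    unsigned long periods = decay_time
        ? lfu_elapsed(now_minutes, lfu_minutes(field)) / decay_time
        : 0;
    if (periods)
        counter = (periods > counter) ? 0 : counter - periods;
    return counter;
}
```

For a key stamped at minute 1000 with counter 20, reading it at minute 1035 with a decay time of 10 gives three full periods and a counter of 17. With a decay time of 0 the counter stays at 20.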

  • 3) TTL eviction: randomly sample several key-value pairs from the keys that have an expiration time set and put those that expire soonest into the evictionPool.

void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    dictEntry *samples[server.maxmemory_samples];
    /* Randomly sample key-value pairs from the sampledict data set */
    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de);
        }
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
            idle = estimateObjectIdleTime(o); /* LRU: idle time since the last access */
        } else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            idle = 255-LFUDecrAndReturn(o); /* LFU: invert the frequency so that rarely used keys score higher */
        } else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            idle = ULLONG_MAX - (long)dictGetVal(de); /* TTL: the sooner the key expires, the higher the score */
        }
        k = 0;
        /* Insert the entry into the pool in ascending order of idle (insertion sort),
         * keeping only the EVPOOL_SIZE entries with the largest idle values */
        while (k < EVPOOL_SIZE && pool[k].key && pool[k].idle < idle)
            k++;
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            continue;
        } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            if (pool[EVPOOL_SIZE-1].key == NULL) {
                sds cached = pool[EVPOOL_SIZE-1].cached;
                memmove(pool+k+1,pool+k,sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached;
            } else {
                k--;
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
                pool[k].cached = cached;
            }
        }
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) {
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1);
            sdssetlen(pool[k].cached,klen);
            pool[k].key = pool[k].cached;
        }
        pool[k].idle = idle;
        pool[k].dbid = dbid;
    }
}
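
The insertion logic is easier to follow without the SDS string caching. The sketch below keeps the same three cases (reject, shift right into free space, drop the smallest and shift left) on a pool of four entries. toy_pool_entry and toy_pool_insert are hypothetical names, and an integer id stands in for the key name.

```c
#include <assert.h>
#include <string.h>

#define TOY_POOL_SIZE 4

typedef struct {
    int used;                  /* 0 means the slot is empty */
    unsigned long long idle;   /* score: higher means better eviction candidate */
    int key_id;                /* stand-in for the key name */
} toy_pool_entry;

/* Insert a candidate into a pool kept in ascending order of idle, with
 * empty slots on the right. Returns 1 if stored, 0 if rejected. */
int toy_pool_insert(toy_pool_entry *pool, int key_id, unsigned long long idle) {
    assert(pool != NULL);
    int k = 0;
    while (k < TOY_POOL_SIZE && pool[k].used && pool[k].idle < idle) k++;

    if (k == 0 && pool[TOY_POOL_SIZE - 1].used) {
        return 0;   /* pool is full and the candidate beats no entry */
    } else if (k < TOY_POOL_SIZE && !pool[k].used) {
        /* Empty slot at position k: nothing to shift. */
    } else if (!pool[TOY_POOL_SIZE - 1].used) {
        /* Free space on the right: shift entries k.. one slot right. */
        memmove(pool + k + 1, pool + k,
                sizeof(pool[0]) * (TOY_POOL_SIZE - k - 1));
    } else {
        /* Pool is full: drop the smallest entry and shift left. */
        k--;
        memmove(pool, pool + 1, sizeof(pool[0]) * k);
    }
    pool[k].used = 1;
    pool[k].idle = idle;
    pool[k].key_id = key_id;
    return 1;
}
```

Inserting scores 30, 10, 50 and 20 yields the order 10, 20, 30, 50. A later candidate with score 5 is rejected, while one with score 40 pushes out the entry with score 10. The eviction loop then takes its victim from the right end, which always holds the best candidate seen so far.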
