Redis source code reading: dictionary dict implementation

Posted by robpoe on Sun, 12 May 2019 10:49:44 +0200


Code version: Branch 5.0

Main Data Structures of Dictionaries

Overall, Redis implements its dictionary on top of hash tables. A dictionary contains hash tables, each hash table contains hash table nodes, and each node stores one key-value pair. The structures are defined as follows, from bottom to top:

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

typedef struct dictIterator {
    dict *d;
    long index;
    int table, safe;
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;
} dictIterator;
  • dictEntry is the node that stores one key-value pair. key is a void pointer, so it can point to a key of any type. The v union holds the value, which can be a pointer, an unsigned or signed 64-bit integer, or a double-precision floating-point number. next points to another hash table node: when two keys hash to the same bucket, their nodes are chained into a linked list.

  • dictType holds the type-specific behaviour of a particular dictionary, i.e. the function pointers used to hash, duplicate, compare and destroy keys and values.

  • dictht defines the structure of a hash table

    • table: an array of pointers to hash nodes (the buckets)
    • size: the number of buckets in the hash table
    • sizemask: a mask, always equal to size-1, that is ANDed with the hash value to compute the bucket index of a key
    • used: the number of nodes currently stored in the table
  • dict is the dictionary structure

    • type: a pointer to the dictType whose functions are used to process keys and values
    • privdata: private data passed to the functions in type
    • ht[2]: each dictionary contains two hash tables. Normally only ht[0] is used; ht[1] is the new table that entries are moved into during rehashing.
    • rehashidx: the index of the next bucket of ht[0] to migrate, which shows the rehash progress; -1 means no rehash is in progress
    • iterators: the number of iterators currently running
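
To make the role of sizemask concrete, here is a minimal sketch that is not taken from the Redis source. The helper name bucket_index is hypothetical, and it assumes the table size is a power of two, which is what makes hash & sizemask give the same result as hash % size.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: map a hash value to a bucket
 * index the way the sizemask field is used. Assumes size is a power of two,
 * so that (hash & (size - 1)) equals (hash % size). */
unsigned long bucket_index(uint64_t hash, unsigned long size)
{
    unsigned long sizemask = size - 1;
    return (unsigned long)(hash & sizemask);
}
```

With a table of 8 buckets, a hash of 13 (binary 1101) lands in bucket 5 (binary 101), because only the low three bits survive the mask.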

Creating dictionaries and hashing algorithms

Create dictionary

The functions related to dictionary creation are shown below: dictCreate allocates space for the dict and takes the dictType and privDataPtr parameters, and _dictInit then initializes the fields of the structure in turn.

/* Create a new hash table */
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type,privDataPtr);
    return d;
}

/* Initialize the hash table */
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;
    d->iterators = 0;
    return DICT_OK;
}

/* Reset a hash table already initialized with ht_init().
 * NOTE: This function should only be called by ht_destroy(). */
static void _dictReset(dictht *ht)
{
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

Hash algorithm

When a key-value pair is added to the dictionary, Redis computes a hash value from the key; ANDed with the sizemask in dictht, this value determines the bucket where the pair is stored. It is worth noting that the current version of Redis uses the SipHash algorithm. Compared with the MurmurHash2 algorithm used in earlier versions, its main advantage is resistance to hash-flooding (HashDoS) attacks rather than raw speed. For details, see: Siphash Wiki. The implementation of SipHash in Redis is in the siphash.c file; here we only care about how it is used, so this article does not study the algorithm itself in depth.

/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

/* Low level add or find:
 * This function adds the entry but instead of setting a value returns the
 * dictEntry structure to the user, that will make sure to fill the value
 * field as he wishes.
 *
 * This function is also directly exposed to the user API to be called
 * mainly in order to store non-pointers inside the hash value, example:
 *
 * entry = dictAddRaw(dict,mykey,NULL);
 * if (entry != NULL) dictSetSignedIntegerVal(entry,1000);
 *
 * Return values:
 *
 * If key already exists NULL is returned, and "*existing" is populated
 * with the existing entry if existing is not NULL.
 *
 * If key was added, the hash entry is returned to be manipulated by the caller.
 */
dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

#define dictSetVal(d, entry, _val_) do { \
    if ((d)->type->valDup) \
        (entry)->v.val = (d)->type->valDup((d)->privdata, _val_); \
    else \
        (entry)->v.val = (_val_); \
} while(0)

#define dictSetKey(d, entry, _key_) do { \
    if ((d)->type->keyDup) \
        (entry)->key = (d)->type->keyDup((d)->privdata, _key_); \
    else \
        (entry)->key = (_key_); \
} while(0)

/* Returns the index of a free slot that can be populated with
 * a hash entry for the given 'key'.
 * If the key already exists, -1 is returned
 * and the optional output parameter may be filled.
 *
 * Note that if we are in the process of rehashing the hash table, the
 * index is always returned in the context of the second (new) hash table. */
static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing)
{
    unsigned long idx, table;
    dictEntry *he;
    if (existing) *existing = NULL;

    /* Expand the hash table if needed */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    for (table = 0; table <= 1; table++) {
        idx = hash & d->ht[table].sizemask;
        /* Search if this slot does not already contain the given key */
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                if (existing) *existing = he;
                return -1;
            }
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return idx;
}

#define dictHashKey(d, key) (d)->type->hashFunction(key)

In the _dictKeyIndex function in dict.c, idx = hash & d->ht[table].sizemask yields the index of the bucket where the key should be stored. Once the index is known, the element can be added to the dictionary. While computing it, the function checks whether the key already exists, searching both tables if a rehash is in progress. If the dictionary is rehashing, the new entry is added to the second dictht (ht[1]); otherwise it is added to the first (ht[0]). Index collisions, that is hash collisions, are resolved by separate chaining: the dictEntry nodes with the same index form a linked list, and each new element is inserted at the head of the list, so the insertion itself is O(1).

    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;
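
The head insertion above can be tried out in isolation with a toy table. The following is only a sketch under simplifying assumptions: the toyEntry and toyTable types and the toyInit and toyAdd functions are hypothetical names, not Redis APIs, keys are plain integers, and the key itself stands in for its hash value.

```c
#include <stdlib.h>
#include <stdint.h>

/* Toy types for illustration; these are not the Redis structures. */
typedef struct toyEntry {
    uint64_t key;
    long val;
    struct toyEntry *next;
} toyEntry;

typedef struct toyTable {
    toyEntry **table;       /* array of bucket heads */
    unsigned long size;     /* number of buckets, a power of two */
    unsigned long sizemask; /* size - 1 */
    unsigned long used;     /* number of stored entries */
} toyTable;

/* Allocate an empty table; returns 0 on success, -1 on allocation failure. */
int toyInit(toyTable *t, unsigned long size)
{
    t->table = calloc(size, sizeof(*t->table));
    if (!t->table) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Insert at the head of the bucket's chain. The key stands in for its hash.
 * Returns NULL if the key already exists or the allocation fails. */
toyEntry *toyAdd(toyTable *t, uint64_t key, long val)
{
    unsigned long idx = (unsigned long)(key & t->sizemask);
    toyEntry *he;

    /* The duplicate check walks the chain, as _dictKeyIndex does. */
    for (he = t->table[idx]; he; he = he->next)
        if (he->key == key) return NULL;

    toyEntry *entry = malloc(sizeof(*entry));
    if (!entry) return NULL;
    entry->key = key;
    entry->val = val;
    entry->next = t->table[idx];  /* new entry points at the old head */
    t->table[idx] = entry;        /* and becomes the new head */
    t->used++;
    return entry;
}
```

In a table of 4 buckets, adding key 1 and then key 5 puts both in bucket 1, with 5 at the head of the chain and 1 behind it; adding key 1 a second time returns NULL.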

Rehash process

The rehash process is fairly involved and can be divided into several main parts:

  • Conditions for triggering rehash: Because the number of buckets (size) is fixed, once the hash table holds enough elements many hash collisions are inevitable, and with separate chaining long chains reduce the efficiency of key-value access. The load factor quantifies the degree of collision and is defined as: load_factor = ht[0].used / ht[0].size. According to the description in Redis Design and Implementation, the dictionary is expanded or shrunk when any of the following conditions is met:

    • Expansion: the server is not executing BGSAVE or BGREWRITEAOF, and the load factor of the hash table is greater than or equal to 1
    • Expansion: the server is executing BGSAVE or BGREWRITEAOF, and the load factor of the hash table is greater than or equal to 5
    • Shrinking: the load factor is less than 0.1
  • Progressive rehash: As mentioned earlier, the dict structure contains two dictht structures. Before a rehash starts, ht[0] holds all the data and ht[1] is empty. To avoid blocking the server for a long time, Redis rehashes progressively, moving n buckets from ht[0] to ht[1] at a time until ht[0] is empty. Once every key-value pair has been moved to ht[1], ht[1] becomes the new ht[0], ht[1] is reset to an empty table, and the rehashidx of the dict is set back to -1.

  • The effect of rehash on other dictionary operations: while a rehash is in progress, new key-value pairs are added only to ht[1], while deletion, lookup and update have to check both tables.
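
The progressive migration can be sketched with a toy dictionary. This is a simplified illustration, not the Redis implementation: the rh-prefixed types and functions are hypothetical names, keys are integers that double as their own hash, and the sketch leaves out details of the real code such as any bound on how many empty buckets are skipped per step.

```c
#include <stdlib.h>
#include <stdint.h>

/* Toy structures for illustration, simplified from the layout shown earlier. */
typedef struct rhEntry {
    uint64_t key;
    struct rhEntry *next;
} rhEntry;

typedef struct rhTable {
    rhEntry **table;
    unsigned long size, sizemask, used;
} rhTable;

typedef struct rhDict {
    rhTable ht[2];
    long rehashidx;  /* -1 when no rehash is in progress */
} rhDict;

/* Allocate an empty table with `size` buckets (size must be a power of two). */
int rhTableInit(rhTable *t, unsigned long size)
{
    t->table = calloc(size, sizeof(*t->table));
    if (!t->table) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Head-insert a key into one table; the key doubles as its own hash. */
int rhInsert(rhTable *t, uint64_t key)
{
    rhEntry *e = malloc(sizeof(*e));
    if (!e) return -1;
    unsigned long idx = (unsigned long)(key & t->sizemask);
    e->key = key;
    e->next = t->table[idx];
    t->table[idx] = e;
    t->used++;
    return 0;
}

/* Move up to n non-empty buckets from ht[0] to ht[1].
 * Returns 1 if there is more work to do, 0 once the rehash is finished. */
int rhStep(rhDict *d, int n)
{
    if (d->rehashidx == -1) return 0;

    while (n-- && d->ht[0].used != 0) {
        /* Skip empty buckets; used != 0 means a non-empty one remains. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        rhEntry *de = d->ht[0].table[d->rehashidx];
        while (de) {
            rhEntry *next = de->next;
            /* Recompute the index with the mask of the new table. */
            unsigned long idx = (unsigned long)(de->key & d->ht[1].sizemask);
            de->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) {
        /* Everything has moved: the new table becomes ht[0]. */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL;
        d->ht[1].size = d->ht[1].sizemask = d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Starting from a 4-bucket ht[0] holding keys 1, 5 and 2 and an empty 8-bucket ht[1], one call to rhStep(&d, 1) moves the chain in bucket 1 (keys 5 and 1), and a second call moves key 2, swaps the tables and sets rehashidx back to -1.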

Traversal of Dictionaries

One more function is worth discussing here: dictScan. Every dict operation has to take a possible rehash into account, and traversing the hash table safely and completely while a rehash is in progress takes some thought. Here is the Redis implementation:

unsigned long dictScan(dict *d,
                       unsigned long v,
                       dictScanFunction *fn,
                       dictScanBucketFunction* bucketfn,
                       void *privdata)
{
    dictht *t0, *t1;
    const dictEntry *de, *next;
    unsigned long m0, m1;

    if (dictSize(d) == 0) return 0;

    if (!dictIsRehashing(d)) {
        t0 = &(d->ht[0]);
        m0 = t0->sizemask;

        /* Emit entries at cursor */
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        /* Set unmasked bits so incrementing the reversed cursor
         * operates on the masked bits */
        v |= ~m0;

        /* Increment the reverse cursor */
        v = rev(v);
        v++;
        v = rev(v);

    } else {
        t0 = &d->ht[0];
        t1 = &d->ht[1];

        /* Make sure t0 is the smaller and t1 is the bigger table */
        if (t0->size > t1->size) {
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }

        m0 = t0->sizemask;
        m1 = t1->sizemask;

        /* Emit entries at cursor */
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        /* Iterate over indices in larger table that are the expansion
         * of the index pointed to by the cursor in the smaller table */
        do {
            /* Emit entries at cursor */
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);
            de = t1->table[v & m1];
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }

            /* Increment the reverse cursor not covered by the smaller mask.*/
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);

            /* Continue while bits covered by mask difference is non-zero */
        } while (v & (m0 ^ m1));
    }

    return v;
}

The source code carries dozens of lines of comments for this function alone, and a detailed analysis would fill a blog post of its own. A good analysis I have seen is: Traversal of Redis Dictionary. Simply put, instead of traversing the buckets in sequential order, Redis uses reverse binary iteration: the cursor is advanced by adding one to its bit-reversed value. I will not attempt a full explanation in this article. In short, this method guarantees that elements present for the whole traversal are returned even if the table is resized in between, while keeping the number of elements returned more than once small.
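
The cursor update used in dictScan (set the unmasked bits, reverse, increment, reverse) can be isolated into a small sketch. The names rev64 and next_cursor are hypothetical, the cursor is fixed at 64 bits here rather than unsigned long, and the bit reversal is a plain loop chosen for readability rather than the routine used in the Redis source.

```c
#include <stdint.h>

/* Reverse the bits of a 64-bit value with a simple loop. */
uint64_t rev64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/* Advance a scan cursor for a table whose mask is `mask` (size - 1):
 * set the bits outside the mask, then add one to the bit-reversed cursor
 * so that the carry travels from the high masked bit toward the low one. */
uint64_t next_cursor(uint64_t v, uint64_t mask)
{
    v |= ~mask;
    v = rev64(v);
    v++;
    v = rev64(v);
    return v;
}
```

For a table of 8 buckets (mask 7), starting from cursor 0 and calling next_cursor repeatedly visits the buckets in the order 0, 4, 2, 6, 1, 5, 3, 7 and then returns 0, which signals the end of the scan.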

Other API functions

The dictionary has other common API functions as well; they are easy to understand, so I will not go through them here.
