Redis from mastery to entry: a detailed walk through the source code behind the Zset data type

Posted by jarosciak on Sun, 06 Feb 2022 07:04:19 +0100

Introduction to Zset

A Redis ordered set (Zset), like a Set, is a collection of string members in which duplicates are not allowed.
The difference is that each member is associated with a score of type double, and Redis uses the scores to sort the members from smallest to largest.
Members of an ordered set are unique, but scores may repeat. Members that share the same score are ordered lexicographically by member, not by insertion time.

This follows from how Zset inserts: it walks forward past every entry whose score is smaller, or whose score is equal and whose member compares smaller, and places the new entry immediately before the first entry that does not, so the new entry becomes that entry's predecessor.
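
The ordering rule can be sketched as a comparison function. In Redis, entries are ordered by score first, and entries with equal scores are ordered by comparing the members themselves. The `entry` type and `entry_cmp` name below are hypothetical, and `strcmp` stands in for the binary-safe comparison Redis performs on its own string type.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical entry type for illustration; not a Redis struct. */
typedef struct {
    double score;
    const char *member;
} entry;

/* Returns <0, 0 or >0 when a sorts before, equal to, or after b:
 * score first, then byte-wise member comparison as the tie-breaker. */
int entry_cmp(const entry *a, const entry *b) {
    assert(a != NULL && b != NULL);
    if (a->score < b->score) return -1;
    if (a->score > b->score) return 1;
    return strcmp(a->member, b->member);
}
```

With this rule, two members that both have score 1 are ordered by name, so "a" sorts before "b" regardless of which was added first.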

Zset common operations

  • ZADD key score member [score member ...]: adds members with their scores to the ordered set key
  • ZREM key member [member ...]: removes members from the ordered set key
  • ZSCORE key member: returns the score of member in the ordered set key
  • ZINCRBY key increment member: adds increment to the score of member in the ordered set key
  • ZCARD key: returns the number of members in the ordered set key
  • ZRANGE key start stop [WITHSCORES]: returns the members of the ordered set key from index start to index stop, in ascending score order
  • ZREVRANGE key start stop [WITHSCORES]: returns the members of the ordered set key from index start to index stop, in descending score order
  • ZUNIONSTORE destkey numkeys key [key ...]: computes the union of the given ordered sets and stores the result in destkey
  • ZINTERSTORE destkey numkeys key [key ...]: computes the intersection of the given ordered sets and stores the result in destkey

Application scenario

  • Various real-time or historical rankings

Zset implementation

ZSet is an ordered set data type that automatically removes duplicates. Its underlying data structure is a dictionary (dict) plus a zskiplist. When the set holds little data, it is stored in the ziplist encoding instead.

  • zset-max-ziplist-entries: the maximum number of elements allowed under the ziplist encoding, 128 by default. Once exceeded, the dictionary (dict) + zskiplist encoding is used;
  • zset-max-ziplist-value: the maximum size in bytes of a single element under the ziplist encoding, 64 by default. Once exceeded, the dictionary (dict) + zskiplist encoding is used;
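
The two thresholds can be read as a single decision made whenever a member is added. The following is a minimal sketch, not Redis code: `zset_encoding`, `encoding_after_add` and the two macros are hypothetical names, and the limits are hard-coded to the defaults quoted above rather than read from configuration. It models only the add path, where a skiplist-encoded zset stays skiplist-encoded.

```c
#include <assert.h>
#include <stddef.h>

/* Defaults quoted in the text above; in Redis these are configurable. */
#define MAX_ZIPLIST_ENTRIES 128
#define MAX_ZIPLIST_VALUE    64

typedef enum { ENC_ZIPLIST, ENC_SKIPLIST } zset_encoding;

/* Hypothetical helper: the encoding a zset would have after adding one new
 * member of member_len bytes to a zset that currently holds count members. */
zset_encoding encoding_after_add(zset_encoding current, size_t count,
                                 size_t member_len) {
    if (current == ENC_SKIPLIST) return ENC_SKIPLIST;  /* add path never converts back */
    if (count + 1 > MAX_ZIPLIST_ENTRIES) return ENC_SKIPLIST;
    if (member_len > MAX_ZIPLIST_VALUE) return ENC_SKIPLIST;
    return ENC_ZIPLIST;
}
```

Under these assumptions, the 128th short member still fits in the ziplist, while the 129th member, or any member longer than 64 bytes, switches the whole zset to dict + zskiplist.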

Source code reading

/* ZADD: check whether the zset already exists and, if not, decide which encoding to create it with */
zobj = lookupKeyWrite(c->db,key);
// Look the key up in the current db. If it does not exist, create it
if (zobj == NULL) {
    if (xx) goto reply_to_client; /* No key + XX option: nothing to do. */

    /*
     * If zset_max_ziplist_entries == 0, or the first member is longer than zset_max_ziplist_value,
     * 		create the skiplist data structure directly.
     * 		Otherwise, create a ziplist (compressed list) data structure.
     */ 
    if (server.zset_max_ziplist_entries == 0 ||
        server.zset_max_ziplist_value < sdslen(c->argv[scoreidx+1]->ptr))
    {
        zobj = createZsetObject();
    } else {
        zobj = createZsetZiplistObject();
    }
    // Add the new object to the db under this key
    dbAdd(c->db,key,zobj);
}  else {
    if (zobj->type != OBJ_ZSET) {
        addReply(c,shared.wrongtypeerr);
        goto cleanup;
    }
}
// Process all elements 
for (j = 0; j < elements; j++) {
    double newscore;
    // Score 
    score = scores[j];

    int retflags = flags;
    // element 
    ele = c->argv[scoreidx+1+j*2]->ptr;

    /*
     * Add the element to zobj.
     * zsetAdd checks whether the current encoding of the ordered set is ziplist or skiplist.
     * 		For a ziplist, it also checks whether the number of elements or the element size exceeds the threshold. If so, the zset is converted to the skiplist encoding.
     */
    int retval = zsetAdd(zobj, score, ele, &retflags, &newscore);
    // ... subsequent code omitted
}

Zset - ziplist implementation

When a Zset is ziplist-encoded, the ziplist always holds an even number of entries, laid out as member, score, member, score, and so on, and every insertion keeps the pairs in sorted order. A query simply walks the ziplist entry by entry, comparing scores (or members) until it reaches the pair it wants. That is a linear scan, which is acceptable only because this encoding is limited to small sets.

There is no need to read the ziplist source code here. For the details of how ziplist itself is implemented, see the companion article: Detailed explanation of data type List implementation source code

Illustration Zset ziplist

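Suppose the zset contains many elements, and the two with the smallest scores were added with:
ZADD zset-ziplist 101 "Li Si"
ZADD zset-ziplist 99 "Zhang San"
The ziplist then begins with the entries "Zhang San", 99, "Li Si", 101.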
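
The ordered, flat layout can be imitated with a small array. This is only a sketch of the idea: the real ziplist is a packed byte buffer of alternating member and score entries, whereas the hypothetical `flat_zset` below uses two parallel arrays, a fixed capacity, and assumes the member being inserted is not already present.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAIRS 128

/* Hypothetical stand-in for a ziplist-encoded zset: pairs kept in
 * (score, member) order inside one flat structure. */
typedef struct {
    const char *member[MAX_PAIRS];
    double score[MAX_PAIRS];
    int len;
} flat_zset;

/* Insert while keeping order: score ascending, ties broken by member. */
void flat_insert(flat_zset *z, double score, const char *member) {
    assert(z->len < MAX_PAIRS);
    int pos = 0;
    /* Skip every pair that sorts before the new one. */
    while (pos < z->len &&
           (z->score[pos] < score ||
            (z->score[pos] == score && strcmp(z->member[pos], member) < 0)))
        pos++;
    /* Shift the tail one slot to the right to open a gap at pos. */
    for (int i = z->len; i > pos; i--) {
        z->member[i] = z->member[i - 1];
        z->score[i] = z->score[i - 1];
    }
    z->member[pos] = member;
    z->score[pos] = score;
    z->len++;
}

/* Linear scan for a member. Returns 1 and stores its score on a hit, else 0. */
int flat_score(const flat_zset *z, const char *member, double *out) {
    for (int i = 0; i < z->len; i++) {
        if (strcmp(z->member[i], member) == 0) {
            *out = z->score[i];
            return 1;
        }
    }
    return 0;
}
```

Inserting "Li Si" with 101 and then "Zhang San" with 99 leaves "Zhang San" in slot 0 and "Li Si" in slot 1, matching the example above. Both insertion and lookup cost time proportional to the number of pairs, which is why the encoding is reserved for small sets.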

Zset -- implementation of Dictionary (dict) + zskiplist

A Zset can be thought of as a hash whose entries are kept sorted by value (the score), so the dictionary (dict) used by Zset is essentially the same as the one behind the hash type. The dict maps each member to its score, which makes it fast to fetch a score by member and to check whether a member exists. Ordering, however, requires maintaining an additional zskiplist.

An ordered set also has to be searched efficiently, and a plain linked list becomes slow to traverse once it grows long. Redis therefore trades space for time: the zskiplist adds layers of index pointers on top of the list, which brings search close to the efficiency of binary search (O(log N) on average).
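
How many index layers a node gets is normally decided at random when the node is inserted, which is what keeps the higher layers sparse. The sketch below shows the general technique and is not copied from Redis: `random_level`, `MAXLEVEL` and `LEVEL_P` are hypothetical names, and the promotion probability of 0.25 is an assumption of this example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXLEVEL 32    /* same cap as the ZSKIPLIST_MAXLEVEL quoted in this article */
#define LEVEL_P  0.25  /* assumed chance of promoting a node one more level */

/* Pick a level in [1, MAXLEVEL]. Each extra level is granted with
 * probability LEVEL_P, so level k occurs with probability about LEVEL_P^(k-1). */
int random_level(void) {
    int level = 1;
    while (level < MAXLEVEL &&
           (double)rand() / ((double)RAND_MAX + 1.0) < LEVEL_P)
        level++;
    assert(level >= 1 && level <= MAXLEVEL);
    return level;
}
```

With these numbers, roughly three quarters of the nodes exist only on the bottom layer, and each layer above holds about a quarter of the nodes of the layer below it. That geometric thinning is the space spent to obtain logarithmic search.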

Source code reading

// Data structure of zset dict + zskiplist
typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

// zskiplist: the skip list data structure
typedef struct zskiplist {
    struct zskiplistNode *header, *tail;	//Head and tail pointers, so the list can be traversed in forward or reverse order
    unsigned long length;		//Total number of elements
    /*
     * The highest level among all nodes currently in the skip list,
     * not counting the header node, whose level is always the maximum: ZSKIPLIST_MAXLEVEL = 32.
     * This value is adjusted as nodes are inserted into and deleted from the skip list.
     */ 
    int level;
} zskiplist;

/* 
 * ZSET uses its own specialized version of the skip list.
 * Below is the data structure of each zskiplist node.
 * 	Note that level[] is a flexible array member: all levels of a node live inside that one node.
 * 	So moving from a higher index level to a lower one just means using a smaller index into level[] of the same node.
 * 	There is no separate "down" pointer to a node on the next level, even though many blog diagrams draw one.
 */
typedef struct zskiplistNode {
    sds ele;			//The member
    double score;		//Score, used for sorting; may repeat across members
    struct zskiplistNode *backward;		//Pointer to the predecessor node, used for reverse traversal
    struct zskiplistLevel {
    	//Forward pointer: points to the next node on this same level (this is the core index of the skip list)
        struct zskiplistNode *forward;	
        unsigned long span;		//Number of nodes crossed when following forward on this level; used to compute ranks
    } level[];			//Index array, which is a flexible array member
} zskiplistNode;
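
To make the role of level[] and span concrete, here is a self-contained sketch modelled on the structures above. It is a simplified illustration, not the Redis source: the names `node`, `skiplist`, `skiplist_insert` and `skiplist_rank` are hypothetical, members are plain C strings instead of sds, and the caller passes the node level explicitly so the result is deterministic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEVEL 32

typedef struct node {
    const char *ele;
    double score;
    struct node *backward;
    struct node_level {
        struct node *forward;
        unsigned long span;
    } level[];                       /* one slot per level, inside the node itself */
} node;

typedef struct { node *header, *tail; unsigned long length; int level; } skiplist;

/* Allocate the node and its level slots as a single block. */
node *node_create(int level, double score, const char *ele) {
    assert(level >= 1 && level <= MAXLEVEL);
    node *n = malloc(sizeof(*n) + (size_t)level * sizeof(struct node_level));
    if (n == NULL) abort();
    n->ele = ele; n->score = score; n->backward = NULL;
    for (int i = 0; i < level; i++) { n->level[i].forward = NULL; n->level[i].span = 0; }
    return n;
}

skiplist *skiplist_create(void) {
    skiplist *sl = malloc(sizeof(*sl));
    if (sl == NULL) abort();
    sl->header = node_create(MAXLEVEL, 0, NULL);   /* header always has every level */
    sl->tail = NULL; sl->length = 0; sl->level = 1;
    return sl;
}

static int sorts_before(const node *n, double score, const char *ele) {
    return n->score < score || (n->score == score && strcmp(n->ele, ele) < 0);
}

static int is_same(const node *n, double score, const char *ele) {
    return n->score == score && strcmp(n->ele, ele) == 0;
}

/* Insert a new (score, ele) pair as a node with the given number of levels. */
void skiplist_insert(skiplist *sl, int level, double score, const char *ele) {
    node *update[MAXLEVEL], *x = sl->header;
    unsigned long rank[MAXLEVEL];
    for (int i = sl->level - 1; i >= 0; i--) {     /* descend: same node, smaller index */
        rank[i] = (i == sl->level - 1) ? 0 : rank[i + 1];
        while (x->level[i].forward && sorts_before(x->level[i].forward, score, ele)) {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;                             /* last node before the gap on level i */
    }
    for (int i = sl->level; i < level; i++) {      /* levels not used until now */
        rank[i] = 0; update[i] = sl->header;
        update[i]->level[i].span = sl->length;
    }
    if (level > sl->level) sl->level = level;
    x = node_create(level, score, ele);
    for (int i = 0; i < level; i++) {              /* splice in and split the old span */
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }
    for (int i = level; i < sl->level; i++) update[i]->level[i].span++;
    x->backward = (update[0] == sl->header) ? NULL : update[0];
    if (x->level[0].forward) x->level[0].forward->backward = x; else sl->tail = x;
    sl->length++;
}

/* 1-based rank of (score, ele), or 0 if it is not in the list. */
unsigned long skiplist_rank(const skiplist *sl, double score, const char *ele) {
    const node *x = sl->header;
    unsigned long rank = 0;
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->level[i].forward &&
               (sorts_before(x->level[i].forward, score, ele) ||
                is_same(x->level[i].forward, score, ele))) {
            rank += x->level[i].span;              /* spans add up to the position */
            x = x->level[i].forward;
        }
        if (x != sl->header && is_same(x, score, ele)) return rank;
    }
    return 0;
}
```

Inserting ("a", 1), ("b", 2), ("c", 3) and ("d", 2.5) in that order yields ranks 1, 2, 4 and 3 respectively. Note that the search never follows a pointer downward: it only decrements i and keeps using the same node, and the accumulated span values give the rank without counting nodes one by one.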

Graphic zskiplist


Topics: Redis data structure