ziplist source code analysis of Redis

Posted by gazolinia on Sun, 19 Apr 2020 14:58:37 +0200

1, ziplist introduction

From the previous analysis, we know that quicklist uses ziplist (compressed list) for its underlying storage. Because ziplist has a lot to cover on its own, it gets a separate article. Before reading the source code, let's look at the main characteristics of ziplist:

ziplist is a specially encoded doubly linked list designed to save storage space.
ziplist can store both strings and integers, and integers are encoded as actual integers instead of as character sequences (which saves space).
ziplist supports push and pop at both the head and the tail in O(1) time. However, every operation requires a memory reallocation, so the actual cost depends on how much memory the ziplist occupies. Head operations in particular move a large block of memory, which makes them more expensive.

These characteristics will show up in the following code analysis (ziplist.h and ziplist.c).

2, ziplist data structure

Let's take a look at the structure of ziplist:

A ziplist consists of a header, a sequence of entries, and an end marker. Because the lengths of the ziplist and of each entry are variable, the code does not define structs for either of them. Here is a schematic structure definition for easier understanding:

struct ziplist {
    unsigned int zlbytes;        // Total length of the ziplist in bytes, including the header, all entries, and zlend
    unsigned int zloffset;       // Offset from the start of the ziplist to the last entry, for fast reverse traversal
    unsigned short int zllength; // Number of entries
    T[] entry;                   // The entries
    unsigned char zlend;         // ziplist terminator, fixed to 0xFF
};

struct entry {
    char[var] prevlen;  // Length in bytes of the previous entry
    char[var] encoding; // Encoding type of the element
    char[] content;     // Element content
};
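To make the header layout concrete, here is a minimal sketch that decodes the three header fields from a raw byte buffer. The names `zl_header`, `read_le` and `zl_read_header` are hypothetical helpers for illustration, not Redis functions. The sketch assumes the fields are stored little-endian, which is what the `intrev32ifbe` calls in the source imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not part of Redis): a decoded copy of the
 * three fixed header fields. */
typedef struct {
    uint32_t zlbytes;   /* total size of the ziplist in bytes */
    uint32_t zltail;    /* offset of the last entry from the start */
    uint16_t zllength;  /* number of entries */
} zl_header;

/* Read a little-endian unsigned integer of n bytes (n <= 4). */
static uint32_t read_le(const unsigned char *p, size_t n) {
    uint32_t v = 0;
    assert(n <= 4);
    for (size_t i = 0; i < n; i++) v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Decode the 10-byte header at the start of a ziplist blob. */
static zl_header zl_read_header(const unsigned char *zl) {
    zl_header h;
    h.zlbytes  = read_le(zl, 4);                /* bytes 0..3 */
    h.zltail   = read_le(zl + 4, 4);            /* bytes 4..7 */
    h.zllength = (uint16_t)read_le(zl + 8, 2);  /* bytes 8..9 */
    return h;
}
```

For a ziplist holding the single string "hi", the blob is 15 bytes long: 10 bytes of header, a 4-byte entry (prevlen 0, encoding 0x02, then the two characters), and the 0xFF terminator.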

The code reads and writes these ziplist header fields through macros:

#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)

The structure of an entry is a little more complicated. prevlen and encoding are both specially encoded to save space, and this is where the essence of ziplist lies.

First, look at the source code that writes prevlen:

/* Encode the length of the previous entry and write it to "p". Return the
 * number of bytes needed to encode this length if "p" is NULL.
 *
 * Writes the encoded length of the previous entry into the prevlen field of the current entry. Encoding rules:
 * 1. If len < 254, prevlen takes one byte, written to the first byte of the current entry.
 * 2. If len >= 254, prevlen takes five bytes: the first byte is set to 254, and the second to fifth bytes hold the actual length.
 */

static unsigned int zipPrevEncodeLength(unsigned char *p, unsigned int len) {

if (p == NULL) { // At this time, only the storage length required by len is calculated
    return (len < ZIP_BIGLEN) ? 1 : sizeof(len)+1;
} else {
    if (len < ZIP_BIGLEN) {
        p[0] = len;
        return 1;
    } else {
        p[0] = ZIP_BIGLEN;
        memcpy(p+1,&len,sizeof(len));
        memrev32ifbe(p+1);
        return 1+sizeof(len);
    }
}

}
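The reverse direction can be sketched as follows. `prevlen_decode` is a hypothetical helper written for this article, not the macro Redis uses; it assumes the same 254 threshold as ZIP_BIGLEN and a little-endian 4-byte length after the marker byte, matching the `memrev32ifbe` call above.

```c
#include <assert.h>
#include <stdint.h>

#define BIGLEN 254  /* same threshold as ZIP_BIGLEN in the listing above */

/* Hypothetical helper: decode the prevlen field at p.
 * Stores the decoded length in *prevlen and returns how many bytes
 * the field itself occupies (1 or 5). The caller is expected to have
 * checked that p does not point at the 0xFF end marker. */
static unsigned int prevlen_decode(const unsigned char *p, uint32_t *prevlen) {
    assert(p != 0 && prevlen != 0);
    if (p[0] < BIGLEN) {
        *prevlen = p[0];            /* short form: the byte is the length */
        return 1;
    }
    /* long form: marker byte followed by a 4-byte little-endian length */
    *prevlen = (uint32_t)p[1]
             | (uint32_t)p[2] << 8
             | (uint32_t)p[3] << 16
             | (uint32_t)p[4] << 24;
    return 5;
}
```

A previous entry of 200 bytes is stored as the single byte 200, while a previous entry of 300 bytes is stored as 254 followed by 0x2C 0x01 0x00 0x00.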

Next, we will analyze the encoding field. Redis chooses a different encoding depending on the stored value: an integer (handled as long long) or a string. The integer encoding also exists to save space, and it is selected in the zipTryEncoding function:

/* Check if string pointed to by 'entry' can be encoded as an integer.
 * Stores the integer value in 'v' and its encoding in 'encoding'.
 *
 * When the content can be converted to long long, encoding takes one byte whose first two bits are fixed to 11. The remaining six bits depend on the size of the value:
 * 1. 11000000 means the content is an int16, 2 bytes long.
 * 2. 11010000 means the content is an int32, 4 bytes long.
 * 3. 11100000 means the content is an int64, 8 bytes long.
 * 4. 11110000 means the content is an int24, 3 bytes long.
 * 5. 11111110 means the content is an int8, 1 byte long.
 * 6. 11111111 marks the end of the ziplist.
 * 7. 1111xxxx represents a very small number and stores a value from 0 to 12 in the encoding byte itself. Since 0000 and 1111 cannot be used (and 1110 is taken by int8), xxxx ranges from 0001 to 1101, that is, 1 to 13. After reading the four bits, the program subtracts 1 to get the real value. For example, if the last four bits are 0001 = 1, the value returned is 1 - 1 = 0.
 */

static int zipTryEncoding(unsigned char *entry, unsigned int entrylen, long long *v, unsigned char *encoding) {

long long value;

if (entrylen >= 32 || entrylen == 0) return 0;
if (string2ll((char*)entry,entrylen,&value)) {
    /* Great, the string can be encoded. Check what's the smallest
     * of our encoding types that can hold this value. */
    if (value >= 0 && value <= 12) {
        *encoding = ZIP_INT_IMM_MIN+value;
    } else if (value >= INT8_MIN && value <= INT8_MAX) {
        *encoding = ZIP_INT_8B;
    } else if (value >= INT16_MIN && value <= INT16_MAX) {
        *encoding = ZIP_INT_16B;
    } else if (value >= INT24_MIN && value <= INT24_MAX) {
        *encoding = ZIP_INT_24B;
    } else if (value >= INT32_MIN && value <= INT32_MAX) {
        *encoding = ZIP_INT_32B;
    } else {
        *encoding = ZIP_INT_64B;
    }
    *v = value;
    return 1;
}
return 0;

}
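The selection chain above can be tried in isolation. The sketch below copies the encoding byte values from ziplist.c (0xc0, 0xd0, 0xe0, 0xf0, 0xfe, and 0xf1 for the smallest immediate) under `ENC_`-prefixed names; the function names `int_encoding_for` and `int_payload_size` are hypothetical and only illustrate the mapping from a value to its encoding byte and payload size.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding byte values as defined in ziplist.c, renamed for this sketch. */
#define ENC_INT_16B  0xc0
#define ENC_INT_32B  0xd0
#define ENC_INT_64B  0xe0
#define ENC_INT_24B  0xf0
#define ENC_INT_8B   0xfe
#define ENC_IMM_MIN  0xf1            /* 11110001 encodes the value 0 */
#define INT24_MAX_   0x7fffff
#define INT24_MIN_   (-INT24_MAX_ - 1)

/* Hypothetical helper: pick the smallest integer encoding for value. */
static unsigned char int_encoding_for(long long value) {
    if (value >= 0 && value <= 12)                    return (unsigned char)(ENC_IMM_MIN + value);
    if (value >= INT8_MIN && value <= INT8_MAX)       return ENC_INT_8B;
    if (value >= INT16_MIN && value <= INT16_MAX)     return ENC_INT_16B;
    if (value >= INT24_MIN_ && value <= INT24_MAX_)   return ENC_INT_24B;
    if (value >= INT32_MIN && value <= INT32_MAX)     return ENC_INT_32B;
    return ENC_INT_64B;
}

/* Hypothetical helper: number of content bytes that follow the encoding
 * byte. Immediate values (0 to 12) live in the encoding byte itself. */
static unsigned int int_payload_size(unsigned char enc) {
    assert(enc >= 0xc0);             /* only integer encodings are valid here */
    switch (enc) {
    case ENC_INT_8B:  return 1;
    case ENC_INT_16B: return 2;
    case ENC_INT_24B: return 3;
    case ENC_INT_32B: return 4;
    case ENC_INT_64B: return 8;
    default:          return 0;      /* immediate value, no payload */
    }
}
```

Note how the values 0 to 12 need no content bytes at all, so such an entry occupies only the prevlen field plus one encoding byte.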

The function above decides whether the content can be encoded as an integer (long long). If it cannot, the content is encoded as a string. The encoding field itself is written by the zipEncodeLength function:

/* Encode the length 'rawlen' writing it in 'p'. If p is NULL it just returns
 * the amount of bytes required to encode such a length.
 *
 * This function writes the encoding field. For string content (integer content is detected earlier, in zipTryEncoding), the encoding depends on the length of the string:
 * 1. 00xxxxxx: the first two bits 00 mark a string of at most 63 bytes. The last six bits hold the actual length, and encoding takes 1 byte.
 * 2. 01xxxxxx xxxxxxxx: the first two bits 01 mark a medium-length string (longer than 63 but at most 16383 bytes). The last 14 bits hold the actual length, and encoding takes 2 bytes.
 * 3. 10000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: an extra-large string. The first byte is fixed to 128 (0x80), the following four bytes hold the actual length, and encoding takes 5 bytes.
 */

static unsigned int zipEncodeLength(unsigned char *p, unsigned char encoding, unsigned int rawlen) {

unsigned char len = 1, buf[5];

if (ZIP_IS_STR(encoding)) {
    /* Although encoding is given it may not be set for strings,
     * so we determine it here using the raw length. */
    if (rawlen <= 0x3f) {
        if (!p) return len;
        buf[0] = ZIP_STR_06B | rawlen;
    } else if (rawlen <= 0x3fff) {
        len += 1;
        if (!p) return len;
        buf[0] = ZIP_STR_14B | ((rawlen >> 8) & 0x3f);
        buf[1] = rawlen & 0xff;
    } else {
        len += 4;
        if (!p) return len;
        buf[0] = ZIP_STR_32B;
        buf[1] = (rawlen >> 24) & 0xff;
        buf[2] = (rawlen >> 16) & 0xff;
        buf[3] = (rawlen >> 8) & 0xff;
        buf[4] = rawlen & 0xff;
    }
} else {
    /* Implies integer encoding, so length is always 1. */
    if (!p) return len;
    buf[0] = encoding;
}

/* Store this length at p */
memcpy(p,buf,len);
return len;

}
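Reading a string encoding back is the mirror image of the function above. The following is a minimal sketch with a hypothetical name, `str_len_decode`; it is not the macro Redis uses. It relies only on the byte order visible in zipEncodeLength, where the most significant length byte is written first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a string encoding field at p.
 * Stores the string length in *len and returns the size of the
 * encoding field (1, 2 or 5), or 0 if p holds an integer encoding. */
static unsigned int str_len_decode(const unsigned char *p, uint32_t *len) {
    assert(p != 0 && len != 0);
    switch (p[0] & 0xc0) {                      /* inspect the top two bits */
    case 0x00:                                  /* 00xxxxxx: 6-bit length */
        *len = p[0] & 0x3f;
        return 1;
    case 0x40:                                  /* 01xxxxxx xxxxxxxx: 14-bit length */
        *len = ((uint32_t)(p[0] & 0x3f) << 8) | p[1];
        return 2;
    case 0x80:                                  /* 10000000 + 4 length bytes */
        *len = (uint32_t)p[1] << 24 | (uint32_t)p[2] << 16
             | (uint32_t)p[3] << 8  | (uint32_t)p[4];
        return 5;
    default:                                    /* 11xxxxxx: integer encoding */
        return 0;
    }
}
```

For example, a 300-byte string is encoded as 0x41 0x2C: the top two bits 01 select the 14-bit form, and the remaining bits 0x012C equal 300.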

3, ziplist addition, deletion, modification and query

Create ziplist

When executing the lpush command, if the current quicklistNode is new, you need to create a new ziplist:

/* Add new entry to head node of quicklist.
 *
 * Returns 0 if used existing head.
 * Returns 1 if new head created.
 *
 * Adds a new element at the head of the quicklist. Returns 0 if the element
 * was stored in the existing head node, or 1 if a new head node was created.
 */
int quicklistPushHead(quicklist *quicklist, void *value, size_t sz) {

quicklistNode *orig_head = quicklist->head;
// If the head is not empty and the space size meets the storage requirements of the new element, the new element is added to the head, otherwise a quicklistNode is added
if (likely(
        _quicklistNodeAllowInsert(quicklist->head, quicklist->fill, sz))) {
    quicklist->head->zl =
        ziplistPush(quicklist->head->zl, value, sz, ZIPLIST_HEAD);
    quicklistNodeUpdateSz(quicklist->head);
} else {
    // Create a new quicklistNode
    quicklistNode *node = quicklistCreateNode();
    // Add new elements to the new ziplist
    node->zl = ziplistPush(ziplistNew(), value, sz, ZIPLIST_HEAD);
    // Update the length of ziplist to sz field of quicklistNode
    quicklistNodeUpdateSz(node);
    // Add the new node to the quicklist, that is, before the original head
    _quicklistInsertNodeBefore(quicklist, quicklist->head, node);
}
quicklist->count++;
quicklist->head->count++;
return (orig_head != quicklist->head);

}

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {

unsigned int bytes = ZIPLIST_HEADER_SIZE+1;
unsigned char *zl = zmalloc(bytes);
ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
ZIPLIST_LENGTH(zl) = 0;
zl[bytes-1] = ZIP_END;
return zl;

}
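The result of ziplistNew is an 11-byte blob. Here is a minimal sketch that builds the same bytes with plain `malloc` instead of `zmalloc` and with explicit little-endian writes instead of the header macros. `empty_ziplist_sketch` and `write_le32` are hypothetical names for illustration, not Redis functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HEADER_SIZE 10     /* 4 (zlbytes) + 4 (tail offset) + 2 (length) */
#define END_MARK    0xFF   /* value of ZIP_END */

/* Write a 32-bit value in little-endian byte order. */
static void write_le32(unsigned char *p, uint32_t v) {
    assert(p != NULL);
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Hypothetical stand-in for ziplistNew: returns a heap-allocated empty
 * ziplist, or NULL if allocation fails. The caller frees it with free(). */
static unsigned char *empty_ziplist_sketch(void) {
    uint32_t bytes = HEADER_SIZE + 1;           /* header plus end marker */
    unsigned char *zl = malloc(bytes);
    if (zl == NULL) return NULL;
    write_le32(zl, bytes);                      /* zlbytes = 11 */
    write_le32(zl + 4, HEADER_SIZE);            /* tail offset = 10 */
    zl[8] = 0;                                  /* entry count = 0 */
    zl[9] = 0;
    zl[bytes - 1] = END_MARK;                   /* terminator */
    return zl;
}
```

In an empty ziplist the tail offset points at the end marker itself, which is why the insertion code has to check whether the tail entry is ZIP_END before reading its length.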

Add entry

The code to add the entry is in the ziplistPush method:

unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where) {

unsigned char *p;
p = (where == ZIPLIST_HEAD) ? ZIPLIST_ENTRY_HEAD(zl) : ZIPLIST_ENTRY_END(zl);
return __ziplistInsert(zl,p,s,slen);

}

/* Insert item at "p". */
static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {

size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
unsigned int prevlensize, prevlen = 0;
size_t offset;
int nextdiff = 0;
unsigned char encoding = 0;
long long value = 123456789; /* initialized to avoid warning. Using a value
                                that is easy to see if for some reason
                                we use it uninitialized. */
zlentry tail;

/* Find out prevlen for the entry that is inserted. */
if (p[0] != ZIP_END) {
    ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
} else {
    // p points at the end marker, so the new entry is appended. If the ziplist is not empty, ZIPLIST_ENTRY_TAIL points at a real entry (ptail[0] != ZIP_END), and its raw length becomes prevlen
    unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
    if (ptail[0] != ZIP_END) {
        prevlen = zipRawEntryLength(ptail);
    }
}

/* See if the entry can be encoded */
// Check whether the value of entry can be encoded as long long type, and if so, save the value in value,
// And save the required minimum byte length in encoding
if (zipTryEncoding(s,slen,&value,&encoding)) {
    /* 'encoding' is set to the appropriate integer encoding */
    reqlen = zipIntSize(encoding);
} else {
    /* 'encoding' is untouched, however zipEncodeLength will use the
     * string length to figure out how to encode it. */
    reqlen = slen;
}
/* We need space for both the length of the previous entry and
 * the length of the payload. */
reqlen += zipPrevEncodeLength(NULL,prevlen);
reqlen += zipEncodeLength(NULL,encoding,slen);

/* When the insert position is not equal to the tail, we need to
 * make sure that the next entry can hold this entry's length in
 * its prevlen field. */
nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;

// reqlen is the total size required by the new entry; nextdiff is the difference between the space the next entry's prevlen field needs after the insertion and the space it currently uses.
/* Store offset because a realloc may change the address of zl. */
offset = p-zl;
zl = ziplistResize(zl,curlen+reqlen+nextdiff);
p = zl+offset;

/* Apply memory move when necessary and update tail offset. */
if (p[0] != ZIP_END) {
    /* Subtract one because of the ZIP_END bytes */
    // Move the original data backward to make room for writing a new zlentry
    memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

    /* Encode this entry's raw length in the next entry. */
    // Write the length of the new entry into the prevlen field of the next entry
    zipPrevEncodeLength(p+reqlen,reqlen);

    /* Update offset for tail */
    // Update ZIPLIST_TAIL_OFFSET so that it still points to the original tail entry, which has moved back by reqlen bytes
    ZIPLIST_TAIL_OFFSET(zl) =
        intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

    /* When the tail contains more than one entry, we need to take
     * "nextdiff" in account as well. Otherwise, a change in the
     * size of prevlen doesn't have an effect on the *tail* offset. */
    zipEntry(p+reqlen, &tail);
    // If the entry at the insertion position is not the tail entry, ZIPLIST_TAIL_OFFSET must also be increased by nextdiff
    if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
    }
} else {
    /* This element will be the new tail. */
    // ZIPLIST_TAIL_OFFSET now points to the newly added entry, which becomes the tail element
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
}

/* When nextdiff != 0, the raw length of the next entry has changed, so
 * we need to cascade the update throughout the ziplist */
if (nextdiff != 0) {
    // If nextdiff is not 0, the prevlen fields of the following entries must be updated in a loop. In the worst case, every entry is updated once
    offset = p-zl;
    zl = __ziplistCascadeUpdate(zl,p+reqlen);
    p = zl+offset;
}

/* Write the entry */
// Assign value to new entry
p += zipPrevEncodeLength(p,prevlen);
p += zipEncodeLength(p,encoding,slen);
if (ZIP_IS_STR(encoding)) {
    memcpy(p,s,slen);
} else {
    zipSaveInteger(p,value,encoding);
}
ZIPLIST_INCR_LENGTH(zl,1);
return zl;

}

From the code above, we can see that adding an element at the head requires a memmove call that shifts every existing entry in the ziplist back by the size of the new entry. The cost therefore grows with the size of the ziplist.
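To put numbers on this, the sketch below models two quantities from __ziplistInsert: the change in size of the next entry's prevlen field (nextdiff) and the number of bytes passed to memmove. All three function names are hypothetical. `prevlen_size_diff` is a simplification that assumes the existing field uses the smallest size for its old value, whereas the real zipPrevLenByteDiff reads the field's current size from the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a prevlen field that stores len (1 or 5). */
static int prevlen_field_size(uint32_t len) {
    return len < 254 ? 1 : 5;
}

/* Simplified model of nextdiff: how much the next entry's prevlen field
 * changes when the length it must store changes from old_prev_len to
 * new_prev_len. The result is 0, +4 or -4. */
static int prevlen_size_diff(uint32_t old_prev_len, uint32_t new_prev_len) {
    return prevlen_field_size(new_prev_len) - prevlen_field_size(old_prev_len);
}

/* Bytes moved by the memmove in __ziplistInsert, using the same
 * expression as the source: curlen - offset - 1 + nextdiff.
 * curlen is the old ziplist size and offset is the insertion point. */
static size_t memmove_bytes(size_t curlen, size_t offset, int nextdiff) {
    assert(offset < curlen);
    return (size_t)((long long)(curlen - offset - 1) + nextdiff);
}
```

For a 1000-byte ziplist, a head insertion (offset 10, right after the header) moves 989 bytes when nextdiff is 0. A tail insertion has offset 999, the position of the end marker, and the memmove branch is skipped entirely.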

Delete entry

Deletion is implemented in the ziplistDelete function. Its logic is the reverse of insertion, so it is not covered in detail here.

This concludes the analysis of the main ziplist code. The implementation is carefully designed to save as much storage space as possible. However, head operations shift a lot of memory and are expensive, while tail operations do not need to shift any existing entries and are much more efficient.

This article draws on Qian Wenpin's Redis in depth Adventure: core principles and application practice. Thank you!

https://github.com/tomliugen
Original address https://www.cnblogs.com/xinghebuluo/p/12727840.html
