Redis string

Posted by somethingorothe on Tue, 11 Jan 2022 19:31:46 +0100

character string

In C, a string is conventionally terminated by a NUL byte ('\0', 0x00 in hexadecimal). To obtain its length you call the standard library function strlen, which has O(n) complexity because it must traverse the whole string. Redis can't afford such a slow operation.

Therefore, a string in Redis is stored in memory as a byte array that carries its own length information, called SDS (Simple Dynamic String).

struct SDS<T> {
    T capacity; // Array capacity
    T len; // Array length
    byte flags; // Special flag bit
    byte[] content; // Array contents
}
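To make the O(1) length lookup concrete, here is a minimal sketch in C. The names simple_sds, simple_sds_new and simple_sds_len are hypothetical and the layout is simplified (fixed-width size_t fields, no flags byte); it is not Redis's actual header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified length-prefixed string (not Redis's real layout). */
typedef struct {
    size_t capacity;  /* bytes available in content, excluding the terminator */
    size_t len;       /* bytes currently in use */
    char content[];   /* flexible array member holding the data */
} simple_sds;

/* Build a string from a buffer of known length. Returns NULL if malloc fails. */
simple_sds *simple_sds_new(const void *init, size_t len) {
    simple_sds *s = malloc(sizeof(simple_sds) + len + 1);
    if (s == NULL) return NULL;
    s->capacity = len;
    s->len = len;
    if (len > 0) memcpy(s->content, init, len);
    s->content[len] = '\0';  /* keep it usable with C string functions */
    return s;
}

/* O(1): read the stored length instead of scanning for '\0'. */
size_t simple_sds_len(const simple_sds *s) {
    assert(s != NULL);
    return s->len;
}
```

Because the length is stored rather than derived from a terminator, such a string can also contain embedded '\0' bytes, which strlen would treat as the end of the string.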

The structure of SDS is much like ArrayList in Java. The content array stores the actual data, and it generally has more space allocated than the content currently needs.

capacity is the allocated length of the array, and len is the actual string length. Because strings can be modified in Redis, SDS needs to support the append operation.

If the array has no spare space, an append must allocate a new array and copy the contents of the old one into it, which is a relatively large overhead.

Generic T is used in SDS instead of int directly. When the string is short, len and capacity can be represented by byte and short to save space. Therefore, strings of different lengths will be represented by different structures.
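The space saving can be sketched as a function that picks the narrowest field width able to hold a given length. The helper names sds_field_width and sds_header_size are hypothetical, and the cut-off points are simply the limits of 1, 2, 4 and 8 byte unsigned integers; the thresholds Redis actually uses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: width in bytes of the len and capacity fields
 * needed to describe a string of `len` bytes. */
size_t sds_field_width(uint64_t len) {
    if (len <= UINT8_MAX)  return 1;
    if (len <= UINT16_MAX) return 2;
    if (len <= UINT32_MAX) return 4;
    return 8;
}

/* Header overhead under the layout above: capacity + len + one flags byte. */
size_t sds_header_size(uint64_t len) {
    size_t width = sds_field_width(len);
    assert(width >= 1 && width <= 8);
    return 2 * width + 1;
}
```

Under these assumptions a 44-byte string pays 3 bytes of header, while a string just over 64KB pays 9.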

Code to append string:

sds sdscatlen(sds s, const void *t, size_t len) {
    // Original string length
    size_t curlen = sdslen(s);
    // Adjust the space as needed. If the capacity is not enough to accommodate the additional contents, the byte array will be reallocated and the contents of the original string will be copied to the new array
    s = sdsMakeRoomFor(s,len);
    // insufficient memory
    if (s == NULL) return NULL;
    // Append the contents of the target string to the byte array
    memcpy(s+curlen, t, len);
    // Set the appended length value
    sdssetlen(s, curlen+len);
    // Terminate the string with \0
    s[curlen+len] = '\0';
    return s;
}

Redis stipulates that the maximum string length cannot exceed 512MB. When a string is created, len and capacity are equal and no spare space is allocated, because most strings are never appended to.

embstr and raw

Redis strings can be stored in two ways. When the string is very short, it is stored in the embstr form. When its length exceeds 44 bytes, it is stored in the raw form.

44 characters:

127.0.0.1:6379> set hello abcdefghijklmnopqrstuvwxyz012345678912345678

OK

127.0.0.1:6379> debug object hello

Value at:0xffff9f531e40 refcount:1 encoding:embstr serializedlength:45 lru:14436129 lru_seconds_idle:5

45 characters:

127.0.0.1:6379> set hello abcdefghijklmnopqrstuvwxyz0123456789123456789

OK

127.0.0.1:6379> debug object hello

Value at:0xffff9f18d2e0 refcount:1 encoding:raw serializedlength:46 lru:14436171 lru_seconds_idle:2

In the two outputs above, encoding shows the storage form. You can see that the 44-byte string is embstr and the 45-byte string is raw.

Object header

Each Redis object has a header structure:

struct RedisObject { 
    int4 type ;         // 4bits
    int4 encoding ;     // 4bits
    int24 lru ;         // 24bits
    int32 refcount ;    // 4bytes 
    void *ptr;          // 8bytes 
} robj ;

Each object records its type, its storage form (encoding), and its lru information.

In addition, refcount is the reference count of the object. When it drops to 0, the object is destroyed and its memory reclaimed. The ptr pointer points to where the object's content (body) is stored.

The object header occupies 16 bytes of storage space.
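The 16-byte figure can be checked with a rough C equivalent of the header. This is a sketch: example_robj and example_release are hypothetical names, bit-field packing is implementation-defined, and the 16-byte total assumes a typical 64-bit build with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the header described above, using C bit-fields. */
typedef struct example_robj {
    unsigned type : 4;      /* 4 bits */
    unsigned encoding : 4;  /* 4 bits */
    unsigned lru : 24;      /* 24 bits; together with the two fields above, 4 bytes */
    int refcount;           /* 4 bytes */
    void *ptr;              /* 8 bytes on a 64-bit build */
} example_robj;

/* Drop one reference. Returns 1 when the count reaches 0,
 * meaning the caller should now free the object. */
int example_release(example_robj *o) {
    assert(o != NULL && o->refcount > 0);
    o->refcount--;
    return o->refcount == 0;
}
```

On common compilers the three bit-fields share one 4-byte unit, so sizeof(example_robj) is 8 bytes plus the pointer size, which gives 16 on 64-bit platforms.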

When the string content is small, the memory occupation of SDS structure is as follows:

struct SDS {
    int8 capacity; //1byte
    int8 len; //1byte
    int8 flags; //1byte
    byte[] content; //Content array with length of capacity
}

That is, the 16-byte Redis object header plus the 3-byte SDS header means a string occupies at least 19 bytes.

An embstr string is stored contiguously with its object header: a single malloc call allocates the object header and the SDS together. A raw string is different: it needs two malloc calls, and its object header and content are not contiguous in memory.
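The difference between the two layouts can be sketched as follows. demo_obj and both constructor names are hypothetical, and the header is reduced to a reference count and a pointer for brevity; Redis's own creation code is more involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced object header. */
typedef struct {
    int refcount;
    char *ptr;
} demo_obj;

/* embstr-style: one malloc, content sits directly after the header. */
demo_obj *demo_new_embedded(const char *data, size_t len) {
    assert(data != NULL);
    demo_obj *o = malloc(sizeof(demo_obj) + len + 1);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->ptr = (char *)(o + 1);  /* first byte past the header */
    memcpy(o->ptr, data, len);
    o->ptr[len] = '\0';
    return o;
}

/* raw-style: two mallocs, header and content live in separate blocks. */
demo_obj *demo_new_separate(const char *data, size_t len) {
    assert(data != NULL);
    demo_obj *o = malloc(sizeof(demo_obj));
    if (o == NULL) return NULL;
    o->ptr = malloc(len + 1);
    if (o->ptr == NULL) { free(o); return NULL; }
    o->refcount = 1;
    memcpy(o->ptr, data, len);
    o->ptr[len] = '\0';
    return o;
}
```

The embedded form is released with a single free, while the separate form needs one free for the content and one for the header.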

Because the memory allocators jemalloc and tcmalloc hand out memory in sizes of 2 / 4 / 8 / 16 / 32 / 64 bytes, jemalloc generally allocates 32 bytes to hold an embstr string, or 64 bytes if the string is a little longer.

If the whole string object exceeds 64 bytes, Redis considers it unsuitable for embstr storage and uses raw instead.

The Redis object header takes up 16 bytes and the three SDS attributes take another 3 bytes, so 64 - 19 = 45 bytes are left. The string must also end with a NUL terminator, which takes one more byte. In the end, only 44 bytes remain for the string content. This is why strings longer than 44 bytes are encoded as raw.
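The same arithmetic, written out as a small C sketch. The constant and function names are hypothetical; the values are the ones quoted in the text above.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ALLOC_CLASS_BYTES   = 64,  /* allocation size assumed in the text */
    OBJECT_HEADER_BYTES = 16,  /* Redis object header */
    SDS_HEADER_BYTES    = 3,   /* capacity + len + flags, 1 byte each */
    TERMINATOR_BYTES    = 1    /* trailing '\0' */
};

/* Largest content length that still fits in one 64-byte allocation. */
size_t embstr_max_len(void) {
    assert(ALLOC_CLASS_BYTES >
           OBJECT_HEADER_BYTES + SDS_HEADER_BYTES + TERMINATOR_BYTES);
    return ALLOC_CLASS_BYTES - OBJECT_HEADER_BYTES
         - SDS_HEADER_BYTES - TERMINATOR_BYTES;
}

/* Returns 1 if a string of `len` bytes fits the embstr layout, 0 if it needs raw. */
int fits_embstr(size_t len) {
    return len <= embstr_max_len();
}
```

This matches the redis-cli session earlier, where the 44-character value was reported as embstr and the 45-character value as raw.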

Capacity expansion strategy

While a string is shorter than 1MB, each expansion doubles its space. Beyond 1MB, to avoid waste, each expansion adds only 1MB of spare space.
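One way to read this policy is as a function from the required length to the capacity to allocate. The sketch below is an interpretation of the rule as stated, with hypothetical names (PREALLOC_LIMIT, grown_capacity); it assumes the rule is applied to the length needed after the append.

```c
#include <assert.h>
#include <stddef.h>

#define PREALLOC_LIMIT (1024 * 1024)  /* the 1MB threshold from the text */

/* Capacity to allocate so that `addlen` more bytes fit after `curlen`. */
size_t grown_capacity(size_t curlen, size_t addlen) {
    size_t needed = curlen + addlen;
    assert(needed >= curlen);          /* guard against size_t overflow */
    if (needed < PREALLOC_LIMIT)
        return needed * 2;             /* small strings: double */
    return needed + PREALLOC_LIMIT;    /* large strings: add 1MB of slack */
}
```

For example, appending 5 bytes to a 10-byte string yields a capacity of 30, while appending 1 byte to a 2MB string yields the needed length plus 1MB.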
