Prometheus source code analysis: compression, writing and reading of t/v data

Posted by blueguitar on Thu, 06 Jan 2022 14:25:47 +0100

In Prometheus, the timestamp/value (t/v) data of a metric is stored under block/chunks, and the label data is stored under block/index.

For t/v data, Prometheus adopts the compression scheme described in Facebook's Gorilla paper:

Timestamp: the timestamp of each timing point is compressed with delta-of-delta encoding;
Value: the value of each timing point is compressed with XOR encoding;

With this scheme, the Gorilla paper reports that a 16-byte timing point (8-byte timestamp plus 8-byte value) compresses to 1.37 bytes on average, roughly a 12x reduction.

Compression of t/v timing points

1) Timestamp compression

In a time series, the timestamp difference between two adjacent points is usually constant. If a target is scraped every 60s, the difference is usually 60s, for example:

p1: 10:00:00, p2: 10:01:00, p3: 10:01:59, p4: 10:03:00, p5: 10:04:00, p6: 10:05:00
The timestamp differences are: 60s, 59s, 61s, 60s, 60s;

The Gorilla paper compresses timestamps with delta-of-delta (dod) encoding. The bit widths listed here are the ones Prometheus uses; they are wider than in the paper because Prometheus timestamps have millisecond rather than second resolution:

The timestamp t0 of the first timing point is stored in full;
The timestamp t1 of the second timing point is stored as delta = t1 - t0;
For each subsequent timestamp tn, first calculate the dod value: dod = (tn - tn-1) - (tn-1 - tn-2);
- If dod = 0, store the single bit "0";
- If dod is in [-8191, 8192], first store "10" as the prefix, then store dod in 14 bits;
- If dod is in [-65535, 65536], first store "110" as the prefix, then store dod in 17 bits;
- If dod is in [-524287, 524288], first store "1110" as the prefix, then store dod in 20 bits;
- Otherwise, first store "1111" as the prefix, then store dod in 64 bits;

In practice, about 95% of timestamps fall into the dod = 0 case and take a single bit.
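The bucket rules above can be turned into a small cost function. This is only a sketch: `timestampBits` is a hypothetical name that does not exist in Prometheus, and it counts bits instead of writing them.

```go
package main

import "fmt"

// timestampBits returns how many bits one delta-of-delta value costs under
// the rules listed above: control prefix plus payload.
// Hypothetical helper for illustration; not part of Prometheus.
func timestampBits(dod int64) int {
	switch {
	case dod == 0:
		return 1 // "0"
	case dod >= -8191 && dod <= 8192:
		return 2 + 14 // "10" + 14-bit payload
	case dod >= -65535 && dod <= 65536:
		return 3 + 17 // "110" + 17-bit payload
	case dod >= -524287 && dod <= 524288:
		return 4 + 20 // "1110" + 20-bit payload
	default:
		return 4 + 64 // "1111" + 64-bit payload
	}
}

func main() {
	for _, dod := range []int64{0, -1000, 8192, 8193, -70000, 1 << 30} {
		fmt.Printf("dod=%d -> %d bits\n", dod, timestampBits(dod))
	}
}
```

With millisecond timestamps, a scrape that arrives one second early or late gives a dod of about 1000, which still fits the 14-bit bucket.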

2) Value compression

The Gorilla paper compresses the values of timing points based on two observations:

The values of adjacent timing points usually do not change significantly;
Most values are floating-point numbers. When two values are close, their sign bit, exponent bits and the first few mantissa bits are identical;
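To see the second observation in practice, the following sketch prints the bit patterns of two nearby values and their XOR. The helper names `xorBits` and `sharedLeadingBits` are made up for this illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorBits returns the XOR of the IEEE 754 bit patterns of a and b.
// Hypothetical helper, used only in this example.
func xorBits(a, b float64) uint64 {
	return math.Float64bits(a) ^ math.Float64bits(b)
}

// sharedLeadingBits counts how many leading bits a and b have in common,
// which is the number of leading zeros of their XOR.
func sharedLeadingBits(a, b float64) int {
	return bits.LeadingZeros64(xorBits(a, b))
}

func main() {
	fmt.Printf("3.1 = %064b\n", math.Float64bits(3.1))
	fmt.Printf("3.2 = %064b\n", math.Float64bits(3.2))
	fmt.Printf("xor = %064b\n", xorBits(3.1, 3.2))
	fmt.Println("shared leading bits:", sharedLeadingBits(3.1, 3.2))
}
```

Because 3.1 and 3.2 have the same sign and the same exponent, at least the first 12 bits of the XOR are zero; only the remaining bits need to be stored.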

Compression algorithm for values:

The value of the first timing point is stored uncompressed;
Starting from the second point, XOR each value with the previous value;
- If the XOR result is 0, the two values are identical, so only a single "0" bit is stored;
- Otherwise, store the control bit "1", and then:
  - If the non-zero part of the XOR result falls inside the non-zero window of the previous XOR result (at least as many leading zeros and at least as many trailing zeros), write the bit "0" and then store only the bits inside that window;
  - Otherwise, write the bit "1", use 5 bits to store the number of leading zeros of the XOR result, 6 bits to store the length of the non-zero middle part, and finally store the middle part itself;

According to the data in the Gorilla paper, about 60% of values are stored in a single bit ("0"), about 30% fall into the "10" case, and the remaining 10% fall into the "11" case.
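The three cases can be sketched as a function that reports how many bits one value costs. The `window` type and `valueBits` function are hypothetical names for this illustration; the sketch only counts bits and writes nothing.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// window remembers the leading/trailing zero counts recorded by the last
// "11" case. valid stays false until such a case has occurred.
type window struct {
	leading, trailing int
	valid             bool
}

// valueBits returns the bit cost of storing cur after prev, and the window
// to use for the next value. Hypothetical helper, not Prometheus code.
func valueBits(prev, cur float64, w window) (int, window) {
	x := math.Float64bits(prev) ^ math.Float64bits(cur)
	if x == 0 {
		return 1, w // "0": value unchanged
	}
	leading, trailing := bits.LeadingZeros64(x), bits.TrailingZeros64(x)
	if leading > 31 {
		leading = 31 // the count has to fit into 5 bits
	}
	if w.valid && leading >= w.leading && trailing >= w.trailing {
		// "10": reuse the previous window and store only its bits.
		return 2 + 64 - w.leading - w.trailing, w
	}
	// "11": 5 bits leading-zero count, 6 bits length, then the bits.
	return 2 + 5 + 6 + 64 - leading - trailing, window{leading, trailing, true}
}

func main() {
	vals := []float64{3.1, 3.1, 3.2, 3.0, 3.2}
	var w window
	for i := 1; i < len(vals); i++ {
		var n int
		n, w = valueBits(vals[i-1], vals[i], w)
		fmt.Printf("%v -> %v: %d bits\n", vals[i-1], vals[i], n)
	}
}
```

Reusing a window saves the 11 bits of header (5 + 6) at the price of possibly storing a few extra zero bits.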

3) Compression example

Input sequence:

10:00:00    3.1
10:01:01    3.2
10:02:00    3.0
10:02:59    3.2
10:04:00    3.1

What is stored:

10:00:00     3.1
61           3.2 xor 3.1
-2 (59-61)   3.0 xor 3.2
0 (59-59)    3.2 xor 3.0
2 (61-59)    3.1 xor 3.2
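The timestamp column of the stored form can be reproduced with a few lines of Go. This is a sketch: `storedTimestamps` is a hypothetical name, and the input is given as seconds since 10:00:00, chosen so that the deltas are 61, 59, 59, 61 as in the stored column.

```go
package main

import "fmt"

// storedTimestamps converts raw timestamps into what the scheme stores:
// the first timestamp, then the first delta, then delta-of-delta values.
// Hypothetical helper for illustration only.
func storedTimestamps(ts []int64) []int64 {
	out := make([]int64, 0, len(ts))
	var prevDelta int64
	for i, t := range ts {
		switch i {
		case 0:
			out = append(out, t) // first point: full timestamp
		case 1:
			prevDelta = t - ts[0]
			out = append(out, prevDelta) // second point: delta
		default:
			delta := t - ts[i-1]
			out = append(out, delta-prevDelta) // later points: dod
			prevDelta = delta
		}
	}
	return out
}

func main() {
	ts := []int64{0, 61, 120, 179, 240}
	fmt.Println(storedTimestamps(ts)) // [0 61 -2 0 2]
}
```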

Source code analysis of writing t/v

xorAppender is responsible for writing t/v pairs, where t is an int64 and v is a float64:

// tsdb/chunkenc/xor.go
func (a *xorAppender) Append(t int64, v float64) {
    var tDelta uint64
    num := binary.BigEndian.Uint16(a.b.bytes()) // number of points already in the chunk (first 2 bytes)

    // First point: t1 and v1 are stored in full
    if num == 0 {
        buf := make([]byte, binary.MaxVarintLen64)
        for _, b := range buf[:binary.PutVarint(buf, t)] {
            a.b.writeByte(b) // write t1 as a varint
        }
        a.b.writeBits(math.Float64bits(v), 64) // write the raw 64 bits of v1
    } else if num == 1 { // second point
        tDelta = uint64(t - a.t)
        buf := make([]byte, binary.MaxVarintLen64)
        for _, b := range buf[:binary.PutUvarint(buf, tDelta)] {
            a.b.writeByte(b) // write tDelta = t2 - t1 as an unsigned varint
        }
        a.writeVDelta(v) // write v2 ^ v1
    } else { // third and subsequent points
        tDelta = uint64(t - a.t)
        dod := int64(tDelta - a.tDelta) // delta-of-delta

        // Gorilla has a max resolution of seconds, Prometheus milliseconds.
        // Thus we use higher value range steps with larger bit size.
        switch {
        case dod == 0:
            a.b.writeBit(zero) // single '0' bit
        case bitRange(dod, 14): // dod in [-8191, 8192]: prefix '10', then dod in 14 bits
            a.b.writeBits(0x02, 2) // '10'
            a.b.writeBits(uint64(dod), 14)
        case bitRange(dod, 17): // dod in [-65535, 65536]: prefix '110', then dod in 17 bits
            a.b.writeBits(0x06, 3) // '110'
            a.b.writeBits(uint64(dod), 17)
        case bitRange(dod, 20): // dod in [-524287, 524288]: prefix '1110', then dod in 20 bits
            a.b.writeBits(0x0e, 4) // '1110'
            a.b.writeBits(uint64(dod), 20)
        default: // any other dod: prefix '1111', then dod in 64 bits
            a.b.writeBits(0x0f, 4) // '1111'
            a.b.writeBits(uint64(dod), 64)
        }
        a.writeVDelta(v) // write vn ^ vn-1
    }
    a.t = t // last t written
    a.v = v // last v written
    binary.BigEndian.PutUint16(a.b.bytes(), num+1) // update the point count at the start of the chunk
    a.tDelta = tDelta // last tDelta written
}
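Append relies on a helper, bitRange, that is not shown in the excerpt. The sketch below is reconstructed from the ranges in the comments and should be treated as an assumption; compare it with tsdb/chunkenc/xor.go in the Prometheus version you are reading.

```go
package main

import "fmt"

// bitRange reports whether x fits into nbits bits under the asymmetric
// convention used by the appender: -(2^(nbits-1) - 1) <= x <= 2^(nbits-1).
// Reconstructed for illustration; verify against the Prometheus source.
func bitRange(x int64, nbits uint8) bool {
	return -((1<<(nbits-1))-1) <= x && x <= 1<<(nbits-1)
}

func main() {
	for _, n := range []uint8{14, 17, 20} {
		lo := -((int64(1) << (n - 1)) - 1)
		hi := int64(1) << (n - 1)
		fmt.Printf("%d bits: [%d, %d] -> %v %v\n",
			n, lo, hi, bitRange(lo, n), bitRange(hi, n))
	}
}
```

The range is shifted by one compared with ordinary two's complement (which would be [-8192, 8191] for 14 bits). Since dod = 0 has its own case, the decoder can use this convention without ambiguity.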

Next, the source code that writes the value delta (vDelta) using XOR:

// tsdb/chunkenc/xor.go
func (a *xorAppender) writeVDelta(v float64) {
    vDelta := math.Float64bits(v) ^ math.Float64bits(a.v) // XOR the current value with the previous value

    if vDelta == 0 { // XOR is 0: store a single '0' bit
        a.b.writeBit(zero)
        return
    }
    a.b.writeBit(one) // store the control bit '1' first

    leading := uint8(bits.LeadingZeros64(vDelta))   // number of leading zeros of vDelta
    trailing := uint8(bits.TrailingZeros64(vDelta)) // number of trailing zeros of vDelta

    // Clamp number of leading zeros to avoid overflow when encoding.
    if leading >= 32 {
        leading = 31
    }

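    // a.leading == 0xff means that no window has been recorded yet.
    if a.leading != 0xff && leading >= a.leading && trailing >= a.trailing {
        // The non-zero bits fit into the previous window: write '0', then only the bits of that window.
        a.b.writeBit(zero)
        a.b.writeBits(vDelta>>a.trailing, 64-int(a.leading)-int(a.trailing))
    } else {
        // New window: write '1', then the leading-zero count in 5 bits.
        a.leading, a.trailing = leading, trailing

        a.b.writeBit(one)
        a.b.writeBits(uint64(leading), 5)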

        // Note that if leading == trailing == 0, then sigbits == 64.  But that value doesn't actually fit into the 6 bits we have.
        // Luckily, we never need to encode 0 significant bits, since that would put us in the other case (vdelta == 0).
        // So instead we write out a 0 and adjust it back to 64 on unpacking.
        sigbits := 64 - leading - trailing
        a.b.writeBits(uint64(sigbits), 6)
        a.b.writeBits(vDelta>>trailing, int(sigbits))
    }
}
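The comment about 64 significant bits not fitting into 6 bits can be checked in isolation. In this sketch, `packSigbits` and `unpackSigbits` are hypothetical helpers that mimic what writing to and reading from a 6-bit field does to the count.

```go
package main

import "fmt"

// packSigbits mimics writing a significant-bit count into a 6-bit field:
// only the low 6 bits survive, so 64 (binary 1000000) is stored as 0.
func packSigbits(sigbits uint8) uint8 {
	return sigbits & 0x3f
}

// unpackSigbits reverses packSigbits. A stored 0 can only mean 64, because
// an XOR with no significant bits is covered by the single '0' control bit.
func unpackSigbits(field uint8) uint8 {
	if field == 0 {
		return 64
	}
	return field
}

func main() {
	for _, n := range []uint8{1, 49, 63, 64} {
		stored := packSigbits(n)
		fmt.Printf("sigbits=%d stored=%d decoded=%d\n", n, stored, unpackSigbits(stored))
	}
}
```

The same reasoning explains the clamp on leading: the count is written in 5 bits, so it is limited to 31, and any extra leading zeros are simply stored as part of the significant bits.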

Source code analysis of reading t/v

xorIterator is responsible for reading t/v data. It is essentially the reverse of the write process:

// tsdb/chunkenc/xor.go
func (it *xorIterator) Next() bool {
    if it.err != nil || it.numRead == it.numTotal {
        return false
    }
    // Read the first point
    if it.numRead == 0 {
        t, err := binary.ReadVarint(&it.br) // read the full timestamp (varint)
        if err != nil {
            it.err = err
            return false
        }
        v, err := it.br.readBits(64) // read the raw 64 bits of the value
        if err != nil {
            it.err = err
            return false
        }
        it.t = t
        it.val = math.Float64frombits(v)

        it.numRead++ // one more point read
        return true
    }
    // Read the second point
    if it.numRead == 1 {
        tDelta, err := binary.ReadUvarint(&it.br) // read tDelta
        if err != nil {
            it.err = err
            return false
        }
        it.tDelta = tDelta
        it.t = it.t + int64(it.tDelta) // t2 = t1 + tDelta

        return it.readValue() // read the XOR and recover the original value
    }
    // Read the third and subsequent points
    var d byte
    // Read the delta-of-delta prefix, at most 4 bits
    for i := 0; i < 4; i++ {
        d <<= 1
        bit, err := it.br.readBit()
        if err != nil {
            it.err = err
            return false
        }
        if bit == zero {
            break
        }
        d |= 1
    }
    var sz uint8
    var dod int64
    switch d {
    case 0x00: // prefix '0': dod == 0
    case 0x02: // prefix '10': dod stored in 14 bits
        sz = 14
    case 0x06: // prefix '110': dod stored in 17 bits
        sz = 17
    case 0x0e: // prefix '1110': dod stored in 20 bits
        sz = 20
    case 0x0f: // prefix '1111': dod stored in 64 bits
        bits, err := it.br.readBits(64)
        if err != nil {
            it.err = err
            return false
        }
        dod = int64(bits)
    }

    if sz != 0 {
        bits, err := it.br.readBits(int(sz))
        if err != nil {
            it.err = err
            return false
        }
        if bits > (1 << (sz - 1)) {
            // Values above 2^(sz-1) encode negative numbers: subtract 2^sz to restore the sign.
            bits = bits - (1 << sz)
        }
        dod = int64(bits) // the decoded dod
    }

    it.tDelta = uint64(int64(it.tDelta) + dod) // tDelta = previous tDelta + dod
    it.t = it.t + int64(it.tDelta)             // t = previous t + tDelta

    return it.readValue() // read the XOR-encoded value
}
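The sign recovery step in Next() is easier to follow in isolation. In the sketch below, `packDod` and `unpackDod` are hypothetical helpers: the first keeps the low sz bits of dod, as writing uint64(dod) into an sz-bit field does, and the second repeats the iterator's comparison and subtraction.

```go
package main

import "fmt"

// packDod keeps the low sz bits of dod, which is what writing uint64(dod)
// into an sz-bit field amounts to. Hypothetical helper for this sketch.
func packDod(dod int64, sz uint8) uint64 {
	return uint64(dod) & (1<<sz - 1)
}

// unpackDod applies the same sign recovery as the iterator: fields larger
// than 2^(sz-1) represent negative numbers.
func unpackDod(bits uint64, sz uint8) int64 {
	if bits > (1 << (sz - 1)) {
		bits = bits - (1 << sz) // wraps around; int64() turns it negative
	}
	return int64(bits)
}

func main() {
	for _, dod := range []int64{-8191, -1, 1, 8192, -8192} {
		field := packDod(dod, 14)
		fmt.Printf("dod=%d field=%d decoded=%d\n", dod, field, unpackDod(field, 14))
	}
}
```

The last input, -8192, does not round-trip: its 14-bit field is identical to that of +8192. This is why the 14-bit case only accepts [-8191, 8192].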

Next, the process of reading a value: the previous value is XORed with the stored XOR result to recover the current value:

// tsdb/chunkenc/xor.go
func (it *xorIterator) readValue() bool {
    bit, err := it.br.readBit() // read the first control bit
    if err != nil {
        it.err = err
        return false
    }

    if bit == zero { // first bit is 0: the value is unchanged, nothing to update
        // it.val = it.val
    } else {
        bit, err := it.br.readBit()
        if err != nil {
            it.err = err
            return false
        }
        if bit == zero {
            // reuse leading/trailing zero bits
            // it.leading, it.trailing = it.leading, it.trailing
        } else {
            bits, err := it.br.readBits(5)
            if err != nil {
                it.err = err
                return false
            }
            it.leading = uint8(bits)

            bits, err = it.br.readBits(6)
            if err != nil {
                it.err = err
                return false
            }
            mbits := uint8(bits)
            // 0 significant bits here means we overflowed and we actually need 64; see comment in encoder
            if mbits == 0 {
                mbits = 64
            }
            it.trailing = 64 - it.leading - mbits
        }

        mbits := int(64 - it.leading - it.trailing)
        bits, err := it.br.readBits(mbits)
        if err != nil {
            it.err = err
            return false
        }
        vbits := math.Float64bits(it.val)    // bits of the previous value
        vbits ^= (bits << it.trailing)       // XOR with the stored delta to recover the current value
        it.val = math.Float64frombits(vbits) // since v1^v2 = xor, v2 = v1^xor
    }

    it.numRead++
    return true
}
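The identity used by readValue, v2 = v1 ^ xor, can be verified with a small round trip. `encodeXOR` and `decodeXOR` are hypothetical helpers; for simplicity they always use the current XOR's own trailing-zero count instead of a remembered window.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// encodeXOR returns the non-zero part of prev^cur shifted down by the
// number of trailing zeros, plus that trailing-zero count.
// Hypothetical helper for illustration only.
func encodeXOR(prev, cur float64) (payload uint64, trailing uint8) {
	x := math.Float64bits(prev) ^ math.Float64bits(cur)
	if x == 0 {
		return 0, 0
	}
	trailing = uint8(bits.TrailingZeros64(x))
	return x >> trailing, trailing
}

// decodeXOR rebuilds cur from prev and the stored payload: v2 = v1 ^ xor.
func decodeXOR(prev float64, payload uint64, trailing uint8) float64 {
	return math.Float64frombits(math.Float64bits(prev) ^ (payload << trailing))
}

func main() {
	prev, cur := 3.1, 3.2
	payload, trailing := encodeXOR(prev, cur)
	fmt.Println(decodeXOR(prev, payload, trailing) == cur) // true
}
```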
