Prometheus stores the metric sample data (t/v, i.e. timestamp/value pairs) in block/chunks and the label data in block/index.
For the t/v data, Prometheus adopts the compression scheme from Facebook's Gorilla paper:
- Timestamp: each sample's timestamp is compressed with delta-of-delta encoding;
- Value: each sample's value is compressed with XOR encoding;
With this scheme, a 16-byte sample (8-byte timestamp plus 8-byte value) can be compressed to about 1.37 bytes on average, a very high compression ratio.
Compression of time series samples (t/v)
1) Timestamp compression
Within a time series, the interval between two adjacent timestamps is usually close to constant. If samples are scraped every 60s, the difference is generally about 60s, for example:
- p1: 10:00:00, p2: 10:01:00, p3: 10:01:59, p4: 10:03:00, p5: 10:04:00, p6: 10:05:00
- The timestamp differences are: 60s, 59s, 61s, 60s, 60s;
Gorilla's paper compresses timestamps with delta-of-delta encoding:
- The timestamp t0 of the first sample is stored in full;
- For the second sample, only delta = t1 - t0 is stored;
- For each subsequent timestamp t[n], first calculate the delta of delta: dod = (t[n] - t[n-1]) - (t[n-1] - t[n-2]), then store it according to the range it falls into. The ranges below are the ones Prometheus uses: Gorilla assumes second resolution while Prometheus timestamps are in milliseconds, so Prometheus uses wider ranges with more bits than the paper;
- If dod = 0, store a single bit "0";
- If dod is in [-8191, 8192], store the prefix "10", then the dod value in 14 bits;
- If dod is in [-65535, 65536], store the prefix "110", then the dod value in 17 bits;
- If dod is in [-524287, 524288], store the prefix "1110", then the dod value in 20 bits;
- Otherwise (dod outside [-524287, 524288]), store the prefix "1111", then the dod value in 64 bits;
In practice, about 95% of timestamps fall into the dod = 0 case and take only a single bit.
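The bucket selection can be sketched in Go as follows. `bitRange` mirrors the helper of the same name that the Append code later in this article calls; `dodEncoding` is a hypothetical helper that only reports the prefix and payload width instead of writing any bits.

```go
package main

import "fmt"

// bitRange reports whether x lies in [-(2^(nbits-1)-1), 2^(nbits-1)],
// the signed range accepted for an nbits-wide dod field.
func bitRange(x int64, nbits uint8) bool {
	return -((1<<(nbits-1))-1) <= x && x <= 1<<(nbits-1)
}

// dodEncoding returns the control prefix and the payload width in bits
// chosen for a delta-of-delta (hypothetical helper; no bits are written).
func dodEncoding(dod int64) (prefix string, payloadBits int) {
	switch {
	case dod == 0:
		return "0", 0
	case bitRange(dod, 14):
		return "10", 14
	case bitRange(dod, 17):
		return "110", 17
	case bitRange(dod, 20):
		return "1110", 20
	default:
		return "1111", 64
	}
}

func main() {
	for _, dod := range []int64{0, -2, 8192, -8192, 65536, 524289} {
		p, n := dodEncoding(dod)
		fmt.Printf("dod=%d -> prefix %s + %d payload bits\n", dod, p, n)
	}
}
```

Note that the ranges are asymmetric: +8192 fits the 14-bit bucket, while -8192 does not and falls through to the 17-bit one.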
2) Value compression
Gorilla's value compression relies on two observations:
- The values of adjacent samples usually do not change much;
- Values are float64 numbers; when two values are close, their sign bit, exponent bits and leading mantissa bits are identical, so XORing the two values yields a result with many leading zeros;
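To see why XOR works well on close values, this sketch XORs the IEEE 754 bit patterns of 3.1 and 3.2 (values reused from the compression example below) and counts leading and trailing zeros with the standard `math/bits` package; `xorStats` is an illustrative name.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorStats XORs the IEEE 754 bit patterns of two float64 values and returns
// the result with its leading and trailing zero counts.
func xorStats(a, b float64) (x uint64, leading, trailing int) {
	x = math.Float64bits(a) ^ math.Float64bits(b)
	return x, bits.LeadingZeros64(x), bits.TrailingZeros64(x)
}

func main() {
	x, lead, trail := xorStats(3.2, 3.1)
	fmt.Printf("%064b\n", x)
	fmt.Printf("leading=%d trailing=%d\n", lead, trail) // leading=15 trailing=0
}
```

Both values lie in [2, 4), so they share the sign bit and all 11 exponent bits, and their mantissas also agree on the first three bits. The XOR is 0x0001555555555557: 15 leading zeros, 0 trailing zeros, leaving 49 meaningful bits.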
Compression algorithm for values:
- The value of the first sample is not compressed; its 64 bits are stored directly;
- Starting from the second sample, XOR the value with the previous value;
- If the XOR result is 0, the two values are identical, and only a single bit "0" is stored;
- Otherwise, store a bit "1", followed by a second control bit;
- If the meaningful (non-zero) bits of the XOR fall inside the window recorded the last time the "11" case was used, that is, the XOR has at least as many leading zeros and at least as many trailing zeros as that window, write "0" (prefix "10") and then only the bits inside that window;
- Otherwise, write "1" (prefix "11"), then 5 bits holding the number of leading zeros of the XOR, 6 bits holding the length of the meaningful bits, and finally the meaningful bits themselves, which become the new window;
Reported measurements show that about 60% of values are stored in a single bit, about 30% fall into the "10" case, and the remaining 10% or so fall into the "11" case.
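The control-bit decision can be modelled as a small cost function. This is a simplified sketch, not the Prometheus code: the `window` type and `encodeCost` are hypothetical, a `valid` flag stands in for the 0xff sentinel the real encoder uses, and only bit counts are computed, no bits are written.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// window holds the leading/trailing zero counts recorded by the last "11"
// case; valid is false until that case has been used once.
type window struct {
	leading, trailing int
	valid             bool
}

// encodeCost returns the control bits and the total number of bits spent on v
// given the previous value, updating w the way the "11" case does.
func encodeCost(prev, v float64, w *window) (control string, total int) {
	x := math.Float64bits(v) ^ math.Float64bits(prev)
	if x == 0 {
		return "0", 1 // identical value: a single '0' bit
	}
	leading := bits.LeadingZeros64(x)
	trailing := bits.TrailingZeros64(x)
	if leading >= 32 { // the leading-zero count must fit in 5 bits
		leading = 31
	}
	if w.valid && leading >= w.leading && trailing >= w.trailing {
		// '10': reuse the previous window, write only the bits inside it.
		return "10", 2 + (64 - w.leading - w.trailing)
	}
	w.leading, w.trailing, w.valid = leading, trailing, true
	// '11': 5 bits of leading zeros, 6 bits of length, then the meaningful bits.
	return "11", 2 + 5 + 6 + (64 - leading - trailing)
}

func main() {
	values := []float64{3.1, 3.2, 3.0, 3.2, 3.1}
	var w window
	for i := 1; i < len(values); i++ {
		c, n := encodeCost(values[i-1], values[i], &w)
		fmt.Printf("%v xor %v: case %q, %d bits\n", values[i], values[i-1], c, n)
	}
}
```

For these example values only the first XOR pays for the "11" header (62 bits in total); the next three fit inside its window and cost 51 bits each.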
3) Compression example
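Input sequence (timestamp, value):
- 10:00:00 3.1
- 10:01:01 3.2
- 10:02:00 3.0
- 10:02:59 3.2
- 10:04:00 3.1
Stored as:
- Sample 1: timestamp 10:00:00 and value 3.1, both stored in full;
- Sample 2: delta = 61, value 3.2 xor 3.1;
- Sample 3: dod = -2 (59 - 61), value 3.0 xor 3.2;
- Sample 4: dod = 0 (59 - 59), value 3.2 xor 3.0;
- Sample 5: dod = 2 (61 - 59), value 3.1 xor 3.2;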
Source code analysis: writing t/v
xorAppender writes the t/v samples, where t is an int64 timestamp (in milliseconds) and v is a float64 value:
```go
// tsdb/chunkenc/xor.go
func (a *xorAppender) Append(t int64, v float64) {
	var tDelta uint64
	num := binary.BigEndian.Uint16(a.b.bytes())

	// At the first point, the values of t1 and v1 are fully recorded
	if num == 0 {
		buf := make([]byte, binary.MaxVarintLen64)
		for _, b := range buf[:binary.PutVarint(buf, t)] {
			a.b.writeByte(b) // Write the value of t1
		}
		a.b.writeBits(math.Float64bits(v), 64) // Write the value of v1

	} else if num == 1 {
		// Second point
		tDelta = uint64(t - a.t)

		buf := make([]byte, binary.MaxVarintLen64)
		for _, b := range buf[:binary.PutUvarint(buf, tDelta)] {
			a.b.writeByte(b) // Write tDelta = t2 - t1
		}

		a.writeVDelta(v) // Write v2 ^ v1

	} else {
		// Third and subsequent points
		tDelta = uint64(t - a.t)
		dod := int64(tDelta - a.tDelta) // Calculate dod

		// Gorilla has a max resolution of seconds, Prometheus milliseconds.
		// Thus we use higher value range steps with larger bit size.
		switch {
		case dod == 0:
			a.b.writeBit(zero) // Write '0'
		case bitRange(dod, 14):
			// dod in [-8191, 8192]: store '10' first, then dod in 14 bits
			a.b.writeBits(0x02, 2) // '10'
			a.b.writeBits(uint64(dod), 14)
		case bitRange(dod, 17):
			// dod in [-65535, 65536]: store '110' first, then dod in 17 bits
			a.b.writeBits(0x06, 3) // '110'
			a.b.writeBits(uint64(dod), 17)
		case bitRange(dod, 20):
			// dod in [-524287, 524288]: store '1110' first, then dod in 20 bits
			a.b.writeBits(0x0e, 4) // '1110'
			a.b.writeBits(uint64(dod), 20)
		default:
			// dod outside [-524287, 524288]: store '1111' first, then dod in 64 bits
			a.b.writeBits(0x0f, 4) // '1111'
			a.b.writeBits(uint64(dod), 64)
		}

		a.writeVDelta(v) // Write vn ^ vn-1
	}

	a.t = t // Last t written
	a.v = v // Last v written
	binary.BigEndian.PutUint16(a.b.bytes(), num+1)
	a.tDelta = tDelta // Last tDelta written
}
```
Next, look at writeVDelta, which writes the value as the XOR of the current value and the previous one:
```go
// tsdb/chunkenc/xor.go
func (a *xorAppender) writeVDelta(v float64) {
	vDelta := math.Float64bits(v) ^ math.Float64bits(a.v) // XOR the current value with the previous value

	if vDelta == 0 { // XOR == 0: store a single '0' bit
		a.b.writeBit(zero)
		return
	}
	a.b.writeBit(one) // Store control bit '1' first

	leading := uint8(bits.LeadingZeros64(vDelta))   // Number of leading zeros of vDelta
	trailing := uint8(bits.TrailingZeros64(vDelta)) // Number of trailing zeros of vDelta

	// Clamp number of leading zeros to avoid overflow when encoding.
	if leading >= 32 {
		leading = 31
	}

	if a.leading != 0xff && leading >= a.leading && trailing >= a.trailing {
		// '10': the meaningful bits fit in the previous window, reuse its leading/trailing counts
		a.b.writeBit(zero)
		a.b.writeBits(vDelta>>a.trailing, 64-int(a.leading)-int(a.trailing))
	} else {
		// '11': record a new window (0xff above means no window has been recorded yet)
		a.leading, a.trailing = leading, trailing

		a.b.writeBit(one)
		a.b.writeBits(uint64(leading), 5)

		// Note that if leading == trailing == 0, then sigbits == 64. But that value doesn't actually fit into the 6 bits we have.
		// Luckily, we never need to encode 0 significant bits, since that would put us in the other case (vdelta == 0).
		// So instead we write out a 0 and adjust it back to 64 on unpacking.
		sigbits := 64 - leading - trailing
		a.b.writeBits(uint64(sigbits), 6)
		a.b.writeBits(vDelta>>trailing, int(sigbits))
	}
}
```
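The 5-bit leading-zero field and the 6-bit length field can be illustrated in isolation. `packHeader` and `unpackHeader` are hypothetical helpers that pack the two fields into one integer instead of a bit stream; masking to the low 6 bits reproduces what writing 64 into a 6-bit field does, which is why the reader maps a stored 0 back to 64, as the sigbits comment in writeVDelta explains.

```go
package main

import "fmt"

// packHeader packs a 5-bit leading-zero count and a 6-bit significant-bit
// count into the low 11 bits of a uint16 (leading first), keeping only the
// low bits of each field, as writing into fixed-width fields would.
func packHeader(leading, sigbits uint8) uint16 {
	return uint16(leading&0x1f)<<6 | uint16(sigbits&0x3f)
}

// unpackHeader reverses packHeader; a stored length of 0 is read back as 64,
// because a zero XOR never reaches the "11" case.
func unpackHeader(h uint16) (leading, sigbits uint8) {
	leading = uint8(h>>6) & 0x1f
	sigbits = uint8(h) & 0x3f
	if sigbits == 0 {
		sigbits = 64
	}
	return leading, sigbits
}

func main() {
	for _, c := range [][2]uint8{{15, 49}, {0, 64}} {
		h := packHeader(c[0], c[1])
		l, s := unpackHeader(h)
		fmt.Printf("in: leading=%d sigbits=%d  header=%011b  out: leading=%d sigbits=%d\n", c[0], c[1], h, l, s)
	}
}
```

The clamp of `leading` to 31 follows the same reasoning: the 5-bit field cannot hold 32 or more, and under-reporting the leading zeros only means a few extra zero bits are written as part of the meaningful bits, so nothing is lost.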
Source code analysis: reading t/v
xorIterator reads the t/v data; its logic is essentially the write process in reverse:
```go
// tsdb/chunkenc/xor.go
func (it *xorIterator) Next() bool {
	if it.err != nil || it.numRead == it.numTotal {
		return false
	}

	// Read the first point
	if it.numRead == 0 {
		t, err := binary.ReadVarint(&it.br) // Read the raw timestamp
		if err != nil {
			it.err = err
			return false
		}
		v, err := it.br.readBits(64) // Read the raw value
		if err != nil {
			it.err = err
			return false
		}
		it.t = t
		it.val = math.Float64frombits(v)

		it.numRead++ // Number of points read + 1
		return true
	}
	// Read the second point
	if it.numRead == 1 {
		tDelta, err := binary.ReadUvarint(&it.br) // Read tDelta
		if err != nil {
			it.err = err
			return false
		}
		it.tDelta = tDelta
		it.t = it.t + int64(it.tDelta) // Calculate the timestamp

		return it.readValue() // Read the XOR and recover the original value
	}

	// Read the third and subsequent points
	var d byte // Control prefix, up to 4 bits
	// read delta-of-delta
	for i := 0; i < 4; i++ {
		d <<= 1
		bit, err := it.br.readBit()
		if err != nil {
			it.err = err
			return false
		}
		if bit == zero {
			break
		}
		d |= 1
	}
	var sz uint8
	var dod int64
	switch d {
	case 0x00:
		// prefix '0': dod == 0
	case 0x02:
		sz = 14 // prefix '10': dod stored in 14 bits
	case 0x06:
		sz = 17 // prefix '110': dod stored in 17 bits
	case 0x0e:
		sz = 20 // prefix '1110': dod stored in 20 bits
	case 0x0f:
		// prefix '1111': dod stored in 64 bits
		bits, err := it.br.readBits(64)
		if err != nil {
			it.err = err
			return false
		}
		dod = int64(bits)
	}

	if sz != 0 {
		bits, err := it.br.readBits(int(sz))
		if err != nil {
			it.err = err
			return false
		}
		if bits > (1 << (sz - 1)) {
			// Values above 2^(sz-1) encode negative dods (sz-bit two's complement)
			bits = bits - (1 << sz)
		}
		dod = int64(bits) // Read and calculate the value of dod
	}

	it.tDelta = uint64(int64(it.tDelta) + dod) // Calculate tDelta
	it.t = it.t + int64(it.tDelta)             // Calculate the timestamp

	return it.readValue() // Read the value from its XOR
}
```
Finally, readValue decodes a value: it XORs the stored XOR bits with the previous value to recover the current value:
```go
// tsdb/chunkenc/xor.go
func (it *xorIterator) readValue() bool {
	bit, err := it.br.readBit() // Read the first control bit
	if err != nil {
		it.err = err
		return false
	}

	if bit == zero {
		// First bit = 0: the value is unchanged, so it does not need to be updated
		// it.val = it.val
	} else {
		bit, err := it.br.readBit()
		if err != nil {
			it.err = err
			return false
		}
		if bit == zero {
			// reuse leading/trailing zero bits
			// it.leading, it.trailing = it.leading, it.trailing
		} else {
			bits, err := it.br.readBits(5)
			if err != nil {
				it.err = err
				return false
			}
			it.leading = uint8(bits)

			bits, err = it.br.readBits(6)
			if err != nil {
				it.err = err
				return false
			}
			mbits := uint8(bits)
			// 0 significant bits here means we overflowed and we actually need 64; see comment in encoder
			if mbits == 0 {
				mbits = 64
			}
			it.trailing = 64 - it.leading - mbits
		}

		mbits := int(64 - it.leading - it.trailing)
		bits, err := it.br.readBits(mbits)
		if err != nil {
			it.err = err
			return false
		}
		vbits := math.Float64bits(it.val)    // Bits of the previous value
		vbits ^= (bits << it.trailing)       // XOR with the realigned XOR bits to recover the current value
		it.val = math.Float64frombits(vbits) // v1 ^ v2 = xor, so v2 = v1 ^ xor
	}

	it.numRead++
	return true
}
```