Prometheus source code analysis: head block

Posted by oeb on Tue, 04 Jan 2022 05:33:04 +0100

What is a Head block?

Before v2.19, all sample data of the last 2 hours was stored in memory.
In v2.19, the head block was changed so that only the latest samples stay in memory. When a head chunk is full, its data is written to disk and referenced through mmap.
The head block consists of several chunks. The head chunk is a memChunk that receives incoming time-series writes.

A write of time-series data returns success once the sample has been written to the head chunk and the WAL.

What is mmap?

Ordinary file reads and writes:

  • The file is first read into kernel space;
  • The file content is then copied to user space;
  • The user process operates on the copied content.

File reading and writing in mmap mode:

  • After the file is mapped into the process's address space, user space can read and write it directly;
  • Compared with ordinary file reads and writes, this saves one system call and one data copy;
  • When multiple processes share the same file read-only, a lot of memory is saved.

Life cycle of Head block

1) Initial state

After time-series data has been written to the head chunk and the WAL, the write returns success.

2) head chunk is full

The head chunk holds up to 120 of the latest samples for each series:

const samplesPerChunk = 120

If the scrape interval is 15s, a head chunk holds 30 minutes of samples (120 × 15s).
When the head chunk is full, a new head chunk is created to accept incoming samples, as shown in the following figure:

Meanwhile, the original head chunk is flushed to disk and referenced through mmap:

3) The mmapped chunks are full

When the data in the mmapped chunks spans 3/2 of the chunk range (2 hours), that is 3 hours, as shown in the following figure:

The oldest chunk range (2 hours) of data in the mmapped chunks is persisted to a block; at the same time, a checkpoint is created and the WAL is cleaned up.
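
The 3/2 rule above can be written down as a small check. This is a sketch under assumptions: the helper names headCompactable and firstBlockRange are made up for this example, all times are in milliseconds, and the block is assumed to end at the next chunk-range boundary after the head's minimum time.

```go
package main

import "fmt"

// headCompactable reports whether the data held by the head spans more than
// 3/2 of chunkRange. Hypothetical helper; all values are in milliseconds.
func headCompactable(minTime, maxTime, chunkRange int64) bool {
	return maxTime-minTime > chunkRange/2*3
}

// firstBlockRange returns the [start, end) time range that would be persisted
// as a block: from the head's minimum time up to the next chunkRange boundary.
// Assumes non-negative timestamps.
func firstBlockRange(minTime, chunkRange int64) (start, end int64) {
	return minTime, (minTime/chunkRange)*chunkRange + chunkRange
}

func main() {
	const hour = int64(60 * 60 * 1000)
	chunkRange := 2 * hour

	fmt.Println(headCompactable(0, 3*hour, chunkRange))   // exactly 3h: not yet
	fmt.Println(headCompactable(0, 3*hour+1, chunkRange)) // just over 3h
	start, end := firstBlockRange(0, chunkRange)
	fmt.Println(start, end)
}
```

With a 2-hour chunk range, this keeps between 1 and 3 hours of data in the head after each compaction.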

Source code analysis of Head block

Each memSeries struct contains a headChunk, which holds the in-memory data of one series:

// prometheus/tsdb/head.go
// memSeries is the in-memory representation of a series.
type memSeries struct {
    ...
    ref           uint64
    lset          labels.Labels
    ...
    headChunk     *memChunk
}

type memChunk struct {
    chunk            chunkenc.Chunk
    minTime, maxTime int64
}

To add metric data to memSeries:

// prometheus/tsdb/head.go
// append adds the sample (t, v) to the series.
func (s *memSeries) append(t int64, v float64, appendID uint64, chunkDiskMapper *chunks.ChunkDiskMapper) (sampleInOrder, chunkCreated bool) {
    // A chunk holds up to 120 samples.
    const samplesPerChunk = 120
    c := s.head() // the current headChunk
    ...
    numSamples := c.chunk.NumSamples()
    // If we reach 25% of a chunk's desired sample count, set a definitive time
    // at which to start the next chunk.
    // That is, at 1/4 full (30 samples), recompute nextAt as the estimated
    // time at which the chunk should end.
    if numSamples == samplesPerChunk/4 {
        s.nextAt = computeChunkEndTime(c.minTime, c.maxTime, s.nextAt)
    }
    // Once nextAt is reached, cut a new headChunk.
    if t >= s.nextAt {
        c = s.cutNewHeadChunk(t, chunkDiskMapper)
        chunkCreated = true
    }
    // Append the (t, v) sample to the headChunk.
    s.app.Append(t, v)
    ...
}
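
How the end time might be estimated at the 25% mark is not shown in the excerpt. The following is a minimal sketch, assuming the estimate extrapolates from the time the first quarter took and then splits the remaining range into equally sized chunks. The name estimateChunkEnd is hypothetical; it is not presented as the body of computeChunkEndTime.

```go
package main

import "fmt"

// estimateChunkEnd guesses when a chunk that is 1/4 full should end.
// start is the chunk's first timestamp, cur its latest timestamp, and max the
// upper bound up to which the chunk may grow (the current nextAt).
// Hypothetical helper for illustration.
func estimateChunkEnd(start, cur, max int64) int64 {
	// Number of chunks of the projected full size that fit into [start, max).
	n := (max - start) / ((cur - start + 1) * 4)
	if n == 0 {
		// A full chunk would not fit before max, so keep max as the end.
		return max
	}
	// Split the range evenly so that all chunks end up similar in size.
	return start + (max-start)/n
}

func main() {
	const ms = int64(1000)
	// 30 samples at a 15s scrape interval: the 30th sample arrives at 29*15s.
	end := estimateChunkEnd(0, 29*15*ms, 2*60*60*ms)
	fmt.Println("estimated chunk end (minutes):", end/(60*ms))
}
```

With a 15s interval and a 2-hour range, the estimate lands at 30 minutes, which matches the 120 × 15s figure from earlier.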

When nextAt is reached, the old headChunk is written to disk (to be mmapped) and a new headChunk is created:

// prometheus/tsdb/head.go
func (s *memSeries) cutNewHeadChunk(mint int64, chunkDiskMapper *chunks.ChunkDiskMapper) *memChunk {
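    // Write the current headChunk to disk so it can be mmapped.
    s.mmapCurrentHeadChunk(chunkDiskMapper)

    // Create the new headChunk.
    s.headChunk = &memChunk{
        chunk:   chunkenc.NewXORChunk(),
        minTime: mint,
        maxTime: math.MinInt64,
    }
    // Initial upper bound for the new chunk; refined later in append.
    s.nextAt = rangeForTimestamp(mint, s.chunkRange)
    app, err := s.headChunk.chunk.Appender()
    if err != nil {
        panic(err)
    }
    s.app = app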
    return s.headChunk
}
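
The initial nextAt comes from rangeForTimestamp, whose body is not part of the excerpt. A minimal sketch, assuming it returns the end of the chunkRange-aligned window that contains the timestamp (the name nextRangeBoundary is made up, and timestamps are assumed to be non-negative milliseconds):

```go
package main

import "fmt"

// nextRangeBoundary returns the first multiple of width that is strictly
// greater than t. Hypothetical helper; assumes t >= 0 and width > 0.
func nextRangeBoundary(t, width int64) int64 {
	return (t/width)*width + width
}

func main() {
	const hour = int64(60 * 60 * 1000)
	chunkRange := 2 * hour

	// A chunk cut at 00:30 may grow until 02:00 at the latest.
	fmt.Println(nextRangeBoundary(hour/2, chunkRange) / hour)
	// A chunk cut exactly on a boundary gets the whole next window.
	fmt.Println(nextRangeBoundary(chunkRange, chunkRange) / hour)
}
```

Aligning on fixed boundaries means that no chunk straddles two chunk ranges, which keeps the later cut into 2-hour blocks simple.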

Writing the headChunk to disk and recording its mmap reference:

// prometheus/tsdb/head.go
func (s *memSeries) mmapCurrentHeadChunk(chunkDiskMapper *chunks.ChunkDiskMapper) {
    chunkRef, err := chunkDiskMapper.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk.chunk)
    ... // error handling elided
    s.mmappedChunks = append(s.mmappedChunks, &mmappedChunk{
        ref:        chunkRef,
        numSamples: uint16(s.headChunk.chunk.NumSamples()),
        minTime:    s.headChunk.minTime,
        maxTime:    s.headChunk.maxTime,
    })
}

// prometheus/tsdb/chunks/head_chunks.go
// WriteChunk writes the chunk to the disk.
func (cdm *ChunkDiskMapper) WriteChunk(seriesRef uint64, mint, maxt int64, chk chunkenc.Chunk) (chkRef uint64, err error) {
    ...
    // Write header information
    if err := cdm.writeAndAppendToCRC32(cdm.byteBuf[:bytesWritten]); err != nil {
        return 0, err
    }
    // Write chunk data
    if err := cdm.writeAndAppendToCRC32(chk.Bytes()); err != nil {
        return 0, err
    }
    if err := cdm.writeCRC32(); err != nil {
        return 0, err
    }
    // writeBufferSize is 4 MiB.
    // If the chunk does not fit into the write buffer, flush directly to disk.
    if len(chk.Bytes())+MaxHeadChunkMetaSize >= writeBufferSize {
        if err := cdm.flushBuffer(); err != nil {
            return 0, err
        }
    }

    return chkRef, nil
}
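
The header, data and CRC32 sequence in WriteChunk can be illustrated with a standalone encoder. This is a sketch only: the field order and sizes (series ref, min time, max time, encoding byte, uvarint length) and the choice of the Castagnoli polynomial are assumptions made for this example, and encodeChunkRecord and verifyChunkRecord are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// encodeChunkRecord lays out a header, the chunk data and a trailing CRC32.
// The layout is an assumption made for this sketch.
func encodeChunkRecord(seriesRef uint64, mint, maxt int64, encoding byte, data []byte) []byte {
	header := make([]byte, 25+binary.MaxVarintLen32)
	binary.BigEndian.PutUint64(header[0:], seriesRef)
	binary.BigEndian.PutUint64(header[8:], uint64(mint))
	binary.BigEndian.PutUint64(header[16:], uint64(maxt))
	header[24] = encoding
	n := binary.PutUvarint(header[25:], uint64(len(data)))

	rec := append(header[:25+n], data...)

	// The checksum covers both header and data, like the two
	// writeAndAppendToCRC32 calls followed by writeCRC32.
	var sum [4]byte
	binary.BigEndian.PutUint32(sum[:], crc32.Checksum(rec, castagnoli))
	return append(rec, sum[:]...)
}

// verifyChunkRecord recomputes the checksum and compares it to the trailer.
func verifyChunkRecord(rec []byte) bool {
	if len(rec) < 4 {
		return false
	}
	body, trailer := rec[:len(rec)-4], rec[len(rec)-4:]
	return crc32.Checksum(body, castagnoli) == binary.BigEndian.Uint32(trailer)
}

func main() {
	rec := encodeChunkRecord(1, 0, 1000, 1, []byte("abc"))
	fmt.Println("record length:", len(rec), "checksum ok:", verifyChunkRecord(rec))

	rec[30] ^= 0xff // corrupt one byte of the record
	fmt.Println("checksum ok after corruption:", verifyChunkRecord(rec))
}
```

The trailing checksum is what lets a reader detect a torn or corrupted chunk when the file is mapped again after a restart.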

Benefits of Head block

The release notes of Prometheus v2.19.0 mention this change. The benefits of the head block are:

  • Reduced user-space memory usage:

    • Previously, all chunks of the last 2 hours were held in memory;
    • Since v2.19, full head chunks are referenced through mmap and no longer occupy the process's heap.
  • Faster data recovery when a Prometheus instance restarts:

    • Previously, recovery had to replay the entire WAL into memory;
    • Now recovery only needs to load the mmapped chunks and then replay the WAL data that they do not cover.
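
The restart benefit can be sketched as a filter over WAL records. This is a toy model, not the Prometheus replay code: walSample and replay are hypothetical, and it assumes that for each series we know the highest timestamp already present in its mmapped chunks.

```go
package main

import "fmt"

// walSample is a hypothetical, simplified WAL record.
type walSample struct {
	series uint64
	t      int64
	v      float64
}

// replay returns the WAL samples that still have to be rebuilt in memory.
// mmappedMaxTime maps a series ref to the highest timestamp found in that
// series' mmapped chunks; samples at or before it are already on disk.
func replay(wal []walSample, mmappedMaxTime map[uint64]int64) []walSample {
	var pending []walSample
	for _, s := range wal {
		if maxT, ok := mmappedMaxTime[s.series]; ok && s.t <= maxT {
			continue // covered by an mmapped chunk, nothing to rebuild
		}
		pending = append(pending, s)
	}
	return pending
}

func main() {
	wal := []walSample{
		{series: 1, t: 10, v: 1.0},
		{series: 1, t: 20, v: 2.0},
		{series: 1, t: 30, v: 3.0},
		{series: 2, t: 10, v: 9.0},
	}
	pending := replay(wal, map[uint64]int64{1: 20})
	fmt.Println("samples to rebuild:", len(pending))
}
```

Only the samples of the still-open head chunks need to be re-encoded in memory, which is where the time saving comes from.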

References:

1. https://ganeshvernekar.com/bl...
2. https://ganeshvernekar.com/bl...
