Prometheus source code analysis: the inverted index

Posted by tkj on Sat, 01 Jan 2022 23:46:19 +0100

Inverted index

The Prometheus TSDB organizes its indexes as inverted indexes:

  • Each series is assigned an ID (seriesId);

    • Looking up a series by its seriesId is the forward index, with O(1) lookup time;
  • An index is also built over the labels:

    • If the series with seriesId={2,5,10,29} all carry the label app="nginx",
    • then {2,5,10,29} is the inverted index (postings list) of app="nginx".

For example, for seriesId=5:

// seriesId=5
{
    __name__ = "request_total",
    pod="nginx-1",
    path="/api/v1/status",
    status="200",
    method="GET"
}

Then:

  • status="200": its inverted index = {1,2,5,...}
  • method="GET": its inverted index = {2,3,4,5,6,9,...}

Overall structure of the source code

For the in-memory (head) block, headIndexReader exposes the labels held in memory as inverted indexes;
For disk blocks, blockIndexReader reads the index file in the block directory and exposes its labels as inverted indexes;
headIndexReader and blockIndexReader both implement the IndexReader interface, which provides:

  • LabelNames(): returns all label names (keys);
  • LabelValues(name): returns the values of a given label name;
  • Postings(): returns the []seriesId for a label name/value pair;

blockQuerier obtains a different IndexReader for each kind of block to read its label index. It calls Postings() to get the []seriesId, and then uses a ChunkReader to read the actual samples (t/v).

Inverted index in memory

Data structure:

// tsdb/index/postings.go
type MemPostings struct {
    mtx     sync.RWMutex

    // map[labelName]map[labelValue]postingsList
    // labelName --> labelValue --> []posting (sorted seriesIds)
    m       map[string]map[string][]uint64
    ordered bool
}

// tsdb/head.go
// Head handles reads and writes of time series data within a time window.
type Head struct {
    ......
    // label key --> set of label values
    values   map[string]stringset // Label names to possible values.
    ......
    postings *index.MemPostings // Postings lists for terms.
}

1 - Inserting into the in-memory inverted index

The entry point is appending a sample:

  • If the series for lset already exists, getOrCreate returns it directly;
  • Otherwise, a new seriesId is allocated, and then:

    • the label names/values are inserted into h.values;
    • the label names/values and the seriesId are inserted into h.postings (a nested map);

// tsdb/head.go
func (a *headAppender) Add(lset labels.Labels, t int64, v float64) (uint64, error) {
    ......
    s, created, err := a.head.getOrCreate(lset.Hash(), lset)
    ......
}

func (h *Head) getOrCreate(hash uint64, lset labels.Labels) (*memSeries, bool, error) {
    s := h.series.getByHash(hash, lset)
    // Series already exists: return it directly
    if s != nil {
        return s, false, nil
    }
    id := atomic.AddUint64(&h.lastSeriesID, 1)
    return h.getOrCreateWithID(id, hash, lset)
}

getOrCreateWithID() then inserts into h.values and h.postings:

// tsdb/head.go
func (h *Head) getOrCreateWithID(id, hash uint64, lset labels.Labels) (*memSeries, bool, error) {
    s := newMemSeries(lset, id, h.chunkRange, &h.memChunkPool)
    ......
    // Insert label key/value into h.values
    for _, l := range lset {
        valset, ok := h.values[l.Name]
        if !ok {
            valset = stringset{}
            h.values[l.Name] = valset
        }
        // Add the value to this label name's set
        valset.set(l.Value)
        ......
    }
    // id=seriesId
    // Insert key/value/seriesId into h.postings
    h.postings.Add(id, lset)
    return s, true, nil
}
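The get-or-create flow can be sketched in Go under simplifying assumptions: head is a hypothetical stand-in for tsdb.Head, series are deduplicated by a canonical string key instead of lset.Hash() plus a collision check, and a single mutex replaces the real series map's locking.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"sync/atomic"
)

// head is a hypothetical, heavily simplified stand-in for tsdb.Head.
type head struct {
	mtx          sync.Mutex
	lastSeriesID uint64                         // last allocated seriesId
	series       map[string]uint64              // canonical label set -> seriesId
	values       map[string]map[string]struct{} // label name -> set of values
}

func newHead() *head {
	return &head{
		series: map[string]uint64{},
		values: map[string]map[string]struct{}{},
	}
}

// key builds a canonical string for a label set (assumes names and values
// contain no ',' or '='). The real code uses lset.Hash() instead.
func key(lset map[string]string) string {
	parts := make([]string, 0, len(lset))
	for name, value := range lset {
		parts = append(parts, name+"="+value)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// getOrCreate returns the existing seriesId for lset, or allocates a new one
// and records its label values; created reports whether a new ID was assigned.
func (h *head) getOrCreate(lset map[string]string) (id uint64, created bool) {
	h.mtx.Lock()
	defer h.mtx.Unlock()
	k := key(lset)
	if existing, ok := h.series[k]; ok {
		return existing, false // already known: return it directly
	}
	// atomic.AddUint64 mirrors the original; under h.mtx a plain ++ would do.
	id = atomic.AddUint64(&h.lastSeriesID, 1)
	h.series[k] = id
	for name, value := range lset {
		valset, ok := h.values[name]
		if !ok {
			valset = map[string]struct{}{}
			h.values[name] = valset
		}
		valset[value] = struct{}{}
	}
	return id, true
}

func main() {
	h := newHead()
	id1, _ := h.getOrCreate(map[string]string{"job": "api", "pod": "p1"})
	id2, created := h.getOrCreate(map[string]string{"pod": "p1", "job": "api"})
	fmt.Println(id1, id2, created) // 1 1 false
}
```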

2 - Querying the in-memory inverted index

Queries go through headIndexReader:

  • LabelNames() returns all label names;
  • LabelValues(name) returns the values of a given label name;
  • Postings() returns the []seriesId for a label name/value pair; the seriesIds are then used with a chunkReader to read the actual samples (t/v);

Both LabelNames() and LabelValues(name) read h.head.values:

// tsdb/head.go
func (h *headIndexReader) LabelNames() ([]string, error) {
    labelNames := make([]string, 0, len(h.head.values))
    // Read h.head.values
    for name := range h.head.values {
        if name == "" {
            continue
        }
        labelNames = append(labelNames, name)
    }
    sort.Strings(labelNames)
    return labelNames, nil
}

// tsdb/head.go
func (h *headIndexReader) LabelValues(name string) ([]string, error) {
    sl := make([]string, 0, len(h.head.values[name]))
    // Read h.head.values
    for s := range h.head.values[name] {
        sl = append(sl, s)
    }
    return sl, nil
}

Postings() returns the []seriesId for a label name and one or more values, merging the per-value lists into one:

// tsdb/head.go
// Postings returns the postings list iterator for the label pairs.
func (h *headIndexReader) Postings(name string, values ...string) (index.Postings, error) {
    res := make([]index.Postings, 0, len(values))
    for _, value := range values {
        res = append(res, h.head.postings.Get(name, value))
    }
    return index.Merge(res...), nil
}

Inverted index in a disk block

Data structure:

// tsdb/index/index.go
type Reader struct {
    ......
    // labelName --> sampled (labelValue, offset) entries of the postings offset table
    postings map[string][]postingOffset
    ......
}

The block's inverted index is read from the index file in the block directory on disk.
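Because r.postings holds only a sample of the values for each label name (the capacity hint len(e)*symbolFactor in LabelValues() reflects this), a lookup binary-searches the samples and then scans the offset table forward; this is the job of the sort.Search call in Reader.Postings(). The Go sketch below models that under assumptions: the offset table is an in-memory slice rather than bytes on disk, offsets are row indexes, and tableEntry, sample, buildSamples and lookup are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// tableEntry is one row of a hypothetical in-memory postings offset table
// for a single label name: a value and where its postings list lives.
type tableEntry struct {
	value       string
	postingsOff uint64
}

// sample is a sparse pointer into the table; off is a row index here,
// not a byte offset into the index file.
type sample struct {
	value string
	off   int
}

// buildSamples keeps every n-th row plus the last row.
func buildSamples(table []tableEntry, n int) []sample {
	var s []sample
	for i := 0; i < len(table); i += n {
		s = append(s, sample{table[i].value, i})
	}
	if last := len(table) - 1; last >= 0 && last%n != 0 {
		s = append(s, sample{table[last].value, last})
	}
	return s
}

// lookup binary-searches the samples, then scans the table forward.
func lookup(table []tableEntry, samples []sample, value string) (uint64, bool) {
	i := sort.Search(len(samples), func(i int) bool { return samples[i].value >= value })
	if i == len(samples) {
		return 0, false // value sorts after every value of this label name
	}
	if i > 0 && samples[i].value != value {
		i-- // if present, value lies between samples[i-1] and samples[i]
	}
	for row := samples[i].off; row < len(table) && table[row].value <= value; row++ {
		if table[row].value == value {
			return table[row].postingsOff, true
		}
	}
	return 0, false
}

func main() {
	table := []tableEntry{{"200", 100}, {"204", 140}, {"301", 180}, {"404", 220}, {"500", 260}}
	samples := buildSamples(table, 2)
	off, ok := lookup(table, samples, "404")
	fmt.Println(off, ok) // 220 true
}
```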

Query LabelNames():

  • The actual reading is implemented by Reader.LabelNames();
  • blockIndexReader.LabelNames() ultimately calls Reader.LabelNames();

// tsdb/index/index.go
// LabelNames returns all the unique label names present in the index.
func (r *Reader) LabelNames() ([]string, error) {
    labelNames := make([]string, 0, len(r.postings))
    // Read r.postings
    for name := range r.postings {
        ......
        labelNames = append(labelNames, name)
    }
    sort.Strings(labelNames)
    return labelNames, nil
}

// tsdb/block.go
func (r blockIndexReader) LabelNames() ([]string, error) {
    return r.b.LabelNames()
}

Query LabelValues(name):

  • The actual reading is implemented by Reader.LabelValues(), which decodes the postings offset table located through the TOC (r.toc.PostingsTable);
  • blockIndexReader.LabelValues() ultimately calls Reader.LabelValues();

// tsdb/index/index.go
// LabelValues returns value tuples that exist for the given label name.
func (r *Reader) LabelValues(name string) ([]string, error) {
    ......
    e, ok := r.postings[name]
    values := make([]string, 0, len(e)*symbolFactor)
    // Read TOC PostingsTable
    d := encoding.NewDecbufAt(r.b, int(r.toc.PostingsTable), nil)
    d.Skip(e[0].off)
    lastVal := e[len(e)-1].value

    for d.Err() == nil {
        ......
        s := yoloString(d.UvarintBytes()) //Label value.
        values = append(values, s)
    }
    return values, nil
}

// tsdb/block.go
func (r blockIndexReader) LabelValues(name string) ([]string, error) {
    st, err := r.ir.LabelValues(name)
    return st, errors.Wrapf(err, "block: %s", r.b.Meta().ULID)
}

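Query Postings():

  • The actual reading is implemented by Reader.Postings(), which locates each value in the postings offset table (found through r.toc.PostingsTable) and decodes the postings list stored at the recorded offset;
  • blockIndexReader.Postings() ultimately calls Reader.Postings();
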
// tsdb/index/index.go
func (r *Reader) Postings(name string, values ...string) (Postings, error) {
    .....
    e, ok := r.postings[name]
    res := make([]Postings, 0, len(values))
    for valueIndex < len(values) && values[valueIndex] < e[0].value {
        // Discard values before the start.
        valueIndex++
    }
    for valueIndex < len(values) {
        value := values[valueIndex]
        i := sort.Search(len(e), func(i int) bool { return e[i].value >= value })
        d := encoding.NewDecbufAt(r.b, int(r.toc.PostingsTable), nil)
        d.Skip(e[i].off)
        for d.Err() == nil {
            ......
            d2 := encoding.NewDecbufAt(r.b, int(postingsOff), castagnoliTable)
            _, p, err := r.dec.Postings(d2.Get())
            res = append(res, p)
        }
    }
    return Merge(res...), nil
}

// tsdb/block.go
func (r blockIndexReader) Postings(name string, values ...string) (index.Postings, error) {
    p, err := r.ir.Postings(name, values...)
    if err != nil {
        return p, errors.Wrapf(err, "block: %s", r.b.Meta().ULID)
    }
    return p, nil
}

How Postings() is used during a query

The query path is the same for the in-memory block and for disk blocks; only the IndexReader implementation differs.

1) Query entry point: collect the in-memory block and the overlapping disk blocks, then build a blockQuerier for each

// tsdb/db.go
func (db *DB) Querier(_ context.Context, mint, maxt int64) (storage.Querier, error) {
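    var blocks []BlockReader
    var blockMetas []BlockMeta
    // Disk blocks that overlap [mint, maxt]
    for _, b := range db.blocks {
        if b.OverlapsClosedInterval(mint, maxt) {
            blocks = append(blocks, b)
            blockMetas = append(blockMetas, b.Meta())
        }
    }
    // Memory block (head), if the query range reaches into it
    if maxt >= db.head.MinTime() {
        blocks = append(blocks, &RangeHead{
            head: db.head,
            mint: mint,
            maxt: maxt,
        })
    }
    blockQueriers := make([]storage.Querier, 0, len(blocks))
    for _, b := range blocks {
        q, err := NewBlockQuerier(b, mint, maxt)
        if err == nil {
            blockQueriers = append(blockQueriers, q)
            continue
        }
        // If any querier fails to open, close the ones already opened
        for _, q := range blockQueriers {
            _ = q.Close()
        }
        return nil, errors.Wrapf(err, "open querier for block %s", b)
    }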
    return &querier{
        blocks: blockQueriers,
    }, nil
}

Note that:

  • the in-memory block is wrapped in a RangeHead;
  • each disk block is a *Block.

2) Constructing the blockQuerier

// tsdb/querier.go
// NewBlockQuerier returns a querier against the reader.
func NewBlockQuerier(b BlockReader, mint, maxt int64) (storage.Querier, error) {
    // headIndexReader for a RangeHead, blockIndexReader for a *Block
    indexr, err := b.Index()
    if err != nil {
        return nil, errors.Wrapf(err, "open index reader")
    }
    chunkr, err := b.Chunks()
    .....
    return &blockQuerier{
        mint:       mint,
        maxt:       maxt,
        index:      indexr,
        chunks:     chunkr,
        tombstones: tombsr,
    }, nil
}

The key line in the code above is:

// A different IndexReader is constructed depending on the block type
indexr, err := b.Index()

For the in-memory block (RangeHead), the result is a headIndexReader:

// tsdb/head.go
func (h *RangeHead) Index() (IndexReader, error) {
    return h.head.indexRange(h.mint, h.maxt), nil
}

func (h *Head) indexRange(mint, maxt int64) *headIndexReader {
    if hmin := h.MinTime(); hmin > mint {
        mint = hmin
    }
    return &headIndexReader{head: h, mint: mint, maxt: maxt}
}

For a disk block (Block), the result is a blockIndexReader:

// tsdb/block.go
// Index returns a new IndexReader against the block data.
func (pb *Block) Index() (IndexReader, error) {
    if err := pb.startRead(); err != nil {
        return nil, err
    }
    return blockIndexReader{ir: pb.indexr, b: pb}, nil
}

3) blockQuerier uses the IndexReader to query postings

Select() looks up the series set:

func (q *blockQuerier) Select(sortSeries bool, hints *storage.SelectHints, ms ...*labels.Matcher) storage.SeriesSet {
    ......
    if sortSeries {
        base, err = LookupChunkSeriesSorted(q.index, q.tombstones, ms...)
    } else {
        base, err = LookupChunkSeries(q.index, q.tombstones, ms...)
    }
    ......
}

// LookupChunkSeries and LookupChunkSeriesSorted both delegate to lookupChunkSeries(sorted, ...)
func lookupChunkSeries(sorted bool, ir IndexReader, tr tombstones.Reader, ms ...*labels.Matcher) (storage.DeprecatedChunkSeriesSet, error) {
    ......
    // Eventually calls IndexReader.Postings()
    p, err := PostingsForMatchers(ir, ms...)
    ......
}

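PostingsForMatchers() ultimately calls IndexReader.Postings(): the postings lists of the individual matchers are intersected, and those of negative matchers are subtracted, to yield the final []seriesId.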

