Golang source code reading: how VictoriaMetrics handles goroutine priority

Posted by dinsdale on Thu, 27 Jan 2022 21:36:10 +0100

While reading the VictoriaMetrics source code, I came across an ordinary-looking piece of code:

// AddRows adds the given mrs to s.
func (s *Storage) AddRows(mrs []MetricRow, precisionBits uint8) error {
	if len(mrs) == 0 {
		return nil
	}

	// Limit the number of concurrent goroutines that may add rows to the storage.
	// This should prevent from out of memory errors and CPU trashing when too many
	// goroutines call AddRows.
	select {
	case addRowsConcurrencyCh <- struct{}{}:
	default:
		// Sleep for a while until giving up
		atomic.AddUint64(&s.addRowsConcurrencyLimitReached, 1)
		t := timerpool.Get(addRowsTimeout)

		// Prioritize data ingestion over concurrent searches.
		storagepacelimiter.Search.Inc()

		select {
		case addRowsConcurrencyCh <- struct{}{}:
			timerpool.Put(t)
			storagepacelimiter.Search.Dec()
		case <-t.C:
			timerpool.Put(t)
			storagepacelimiter.Search.Dec()
			atomic.AddUint64(&s.addRowsConcurrencyLimitTimeout, 1)
			atomic.AddUint64(&s.addRowsConcurrencyDroppedRows, uint64(len(mrs)))
			return fmt.Errorf("cannot add %d rows to storage in %s, since it is overloaded with %d concurrent writers; add more CPUs or reduce load",
				len(mrs), addRowsTimeout, cap(addRowsConcurrencyCh))
		}
	}
	// ... (the rest of the function is analyzed below)
}

After a careful look, it turns out to be quite clever. Here is how it works.

1. Background

In VM, the storage component acts as the storage node and is responsible for both data writes and data queries. Writing data is clearly the critical work, so queries have a lower priority than writes.

My first reaction to this problem was: can't you just start more write goroutines than query goroutines, making the priority as high as you like?

How naive!

  1. Physical cores are the real performance limit. No matter how many goroutines you create, with N cores at most N goroutines can actually be executing at any moment.
  2. Goroutine scheduling is not free. The more goroutines there are, the more CPU time is spent scheduling them. For CPU-intensive services, any compute goroutines beyond the number of physical cores are simply wasted.
  3. Suppose there are twice as many write goroutines as read goroutines; then, probabilistically, writes get scheduled twice as often as reads. But reads and writes do not cost the same amount of computation. If a query touches a lot of data, the read goroutines will still consume more total CPU time than the write goroutines, which can eventually make writes time out. The right approach is to give the read goroutines a mechanism to yield CPU resources voluntarily.

So here I will first summarize how VM storage approaches goroutine control, and then analyze the source code section by section:

  1. Distinguish between IO goroutines and compute goroutines.

After receiving data, the IO goroutines hand it over to the compute goroutines through a channel.

  2. The number of compute goroutines is tied to the number of CPU cores.
  • The number of goroutines handling insert operations equals the number of CPU cores, and the channel that receives the tasks has the same capacity.
  • The number of goroutines handling query operations such as query_range is twice the number of CPU cores. Presumably this is because some read operations can trigger page faults on mmap'ed memory and then block on IO. Either way, the goroutine count is still very restrained.
  3. Before an insert goroutine runs its business logic, it writes a struct{} into a queueing channel whose capacity equals the number of CPU cores. A successful write proves that fewer write operations than cores are in flight, so the write is allowed to proceed (a standalone sketch of this idiom appears right after this overview).

If the write to the queue fails, it proves that some insert goroutine has not been scheduled in time, and the select goroutines must be notified to give up CPU resources voluntarily.
  4. Whenever an insert operation is blocked, a counter is incremented atomically. The counter represents how many insert operations are currently waiting.

Once a blocked insert manages to enter the queue, the counter is decremented by one. When it reaches 0, Broadcast() is called on a condition variable to wake up the waiting select operations.
  5. In the select goroutines, once every 4096 blocks scanned the code checks whether any insert operation is waiting. If so, it calls cond.Wait() on the condition variable, entering the wait state and yielding the scheduler to the inserts.

(the source code is located at: https://github.com/VictoriaMetrics/VictoriaMetrics)
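
Point 3 above is the standard "buffered channel as counting semaphore" idiom. Below is a minimal, self-contained sketch of just that idiom; the names sem and tryWork are illustrative and not taken from the repository:

package main

import (
	"fmt"
	"runtime"
)

// Simplified sketch of the channel-as-semaphore idiom (not VictoriaMetrics code).
// sem is a counting semaphore: a buffered channel whose capacity equals the CPU count.
var sem = make(chan struct{}, runtime.GOMAXPROCS(0))

// tryWork runs fn only if a slot is free. It returns false when all slots are busy,
// which is exactly the signal AddRows uses to enter its slow path.
func tryWork(fn func()) bool {
	select {
	case sem <- struct{}{}: // non-blocking acquire succeeded
		defer func() { <-sem }() // release the slot when tryWork returns
		fn()
		return true
	default: // all GOMAXPROCS slots are taken
		return false
	}
}

func main() {
	ok := tryWork(func() { fmt.Println("got a slot") })
	fmt.Println("acquired:", ok)
}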

2. Analysis of insert operation source code

2.1 Creating the worker goroutines

lib/protoparser/common/unmarshal_work.go:24

// StartUnmarshalWorkers starts unmarshal workers.
func StartUnmarshalWorkers() {
	if unmarshalWorkCh != nil {
		logger.Panicf("BUG: it looks like startUnmarshalWorkers() has been alread called without stopUnmarshalWorkers()")
	}
	gomaxprocs := cgroup.AvailableCPUs()   //Get the number of physical cores
	unmarshalWorkCh = make(chan UnmarshalWork, gomaxprocs)  //Create a channel with the length equal to the number of cores
	unmarshalWorkersWG.Add(gomaxprocs)
	for i := 0; i < gomaxprocs; i++ {
		go func() {  // Start one worker goroutine per core
			defer unmarshalWorkersWG.Done()
			for uw := range unmarshalWorkCh {
				uw.Unmarshal()  // Call the specific business processing function here
			}
		}()
	}
}


After an IO goroutine receives the data, it throws the request into unmarshalWorkCh:

// ScheduleUnmarshalWork schedules uw to run in the worker pool.
//
// It is expected that StartUnmarshalWorkers is already called.
func ScheduleUnmarshalWork(uw UnmarshalWork) {
	unmarshalWorkCh <- uw
}
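
The UnmarshalWork type itself is not quoted above; judging from the uw.Unmarshal() call, it is presumably a one-method interface defined next to StartUnmarshalWorkers. A hypothetical parser could plug into the pool roughly like this (the promWriteWork type and its fields are illustrative, not taken from the repository):

package example

import (
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
)

// promWriteWork is a hypothetical UnmarshalWork implementation. It carries the raw
// request body received by the IO goroutine and parses it on a worker goroutine.
type promWriteWork struct {
	reqBuf   []byte
	callback func(err error) // invoked once parsing/ingestion has finished
}

// Unmarshal runs on one of the gomaxprocs worker goroutines started by StartUnmarshalWorkers.
func (w *promWriteWork) Unmarshal() {
	// ... parse w.reqBuf, convert it to MetricRows and push them towards storage ...
	w.callback(nil)
}

// handleRequest is what the IO goroutine (e.g. an HTTP handler) would do:
// no parsing here, just schedule the CPU-bound work onto the worker pool.
func handleRequest(reqBuf []byte, callback func(err error)) {
	common.ScheduleUnmarshalWork(&promWriteWork{reqBuf: reqBuf, callback: callback})
}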

2.2 Concurrency check on the insert path

lib/storage/storage.go:1617

First, a channel for managing write concurrency is created:

var (
	// Limit the concurrency for data ingestion to GOMAXPROCS, since this operation
	// is CPU bound, so there is no sense in running more than GOMAXPROCS concurrent
	// goroutines on data ingestion path.
	addRowsConcurrencyCh = make(chan struct{}, cgroup.AvailableCPUs())
	addRowsTimeout       = 30 * time.Second
)

The channel capacity is the number of CPU cores. With 10 cores, at most 10 write operations can run concurrently.

The write-concurrency handling is as follows: lib/storage/storage.go:1529

// AddRows adds the given mrs to s.
func (s *Storage) AddRows(mrs []MetricRow, precisionBits uint8) error {
	if len(mrs) == 0 {
		return nil
	}

	// Limit the number of concurrent goroutines that may add rows to the storage.
	// This should prevent from out of memory errors and CPU trashing when too many
	// goroutines call AddRows.
	select {
	case addRowsConcurrencyCh <- struct{}{}: // Writing to the channel succeeded, so concurrency is below the core count; proceed to the insert logic.
	default: // The channel is full, which means some insert goroutine is blocked; the select goroutines must be told to yield.
		// Sleep for a while until giving up
		atomic.AddUint64(&s.addRowsConcurrencyLimitReached, 1)
		t := timerpool.Get(addRowsTimeout)

		// Prioritize data ingestion over concurrent searches.
		storagepacelimiter.Search.Inc() // The pace limiter keeps an atomic counter of how many insert operations are waiting

		select {
		case addRowsConcurrencyCh <- struct{}{}:  //Wait for a successful queue entry event within the timeout period.
			timerpool.Put(t)  //Put the timer back into the object pool to reduce GC
			storagepacelimiter.Search.Dec()  // The insert operation can be scheduled smoothly. The number of waiting atoms is reduced by one.
            // When the waiting quantity is 0, call cond Broadcast() to notify the select process to start working.
		case <-t.C:  //Wait 30 seconds
			timerpool.Put(t)
			storagepacelimiter.Search.Dec()
			atomic.AddUint64(&s.addRowsConcurrencyLimitTimeout, 1)
			atomic.AddUint64(&s.addRowsConcurrencyDroppedRows, uint64(len(mrs)))
			return fmt.Errorf("cannot add %d rows to storage in %s, since it is overloaded with %d concurrent writers; add more CPUs or reduce load",
				len(mrs), addRowsTimeout, cap(addRowsConcurrencyCh))
			// After waiting 30 seconds there is still no free slot, so the only option is to report an error
		}
	}
	// The actual insert logic goes here (elided)
	<-addRowsConcurrencyCh // Release the concurrency slot once the insert logic has finished

	return firstErr
}
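
The timerpool package is VictoriaMetrics' small helper for reusing timers on hot paths, so that an overloaded AddRows call doesn't allocate a fresh time.Timer every time. Its exact implementation isn't quoted in this post; here is a sketch of what such a Get/Put pair typically looks like, built on sync.Pool (an assumption, not the actual lib/timerpool code):

package timerpool

import (
	"sync"
	"time"
)

// Sketch of a timer pool similar to lib/timerpool (an assumption, not the actual implementation).
var pool sync.Pool

// Get returns a timer that fires after d, reusing a pooled timer when possible.
func Get(d time.Duration) *time.Timer {
	if v := pool.Get(); v != nil {
		t := v.(*time.Timer)
		t.Reset(d)
		return t
	}
	return time.NewTimer(d)
}

// Put stops the timer and returns it to the pool. The caller must not touch t afterwards.
func Put(t *time.Timer) {
	if !t.Stop() {
		// Drain the channel if the timer already fired, so a reused timer
		// does not deliver a stale tick.
		select {
		case <-t.C:
		default:
		}
	}
	pool.Put(t)
}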

3. select operation source code analysis

The select (query) path does not distinguish between IO goroutines and compute goroutines, because query requests are usually few and their payloads are small.

3.1 The channel limiting query concurrency

lib/storage/storage.go:1097

var (
	// Limit the concurrency for TSID searches to GOMAXPROCS*2, since this operation
	// is CPU bound and sometimes disk IO bound, so there is no sense in running more
	// than GOMAXPROCS*2 concurrent goroutines for TSID searches.
	searchTSIDsConcurrencyCh = make(chan struct{}, cgroup.AvailableCPUs()*2)
)

The number of concurrent queries is limited to twice the number of CPU cores.

The query-limiting code is as follows: lib/storage/storage.go:1056

// searchTSIDs returns sorted TSIDs for the given tfss and the given tr.
func (s *Storage) searchTSIDs(tfss []*TagFilters, tr TimeRange, maxMetrics int, deadline uint64) ([]TSID, error) {
	// Do not cache tfss -> tsids here, since the caching is performed
	// on idb level.

	// Limit the number of concurrent goroutines that may search TSIDS in the storage.
	// This should prevent from out of memory errors and CPU trashing when too many
	// goroutines call searchTSIDs.
	select {
	case searchTSIDsConcurrencyCh <- struct{}{}:  // Same idea as the insert concurrency limit: only after successfully entering the queue may the query logic proceed
	default:
		// Sleep for a while until giving up
		atomic.AddUint64(&s.searchTSIDsConcurrencyLimitReached, 1)
		currentTime := fasttime.UnixTimestamp()
		timeoutSecs := uint64(0)
		if currentTime < deadline {
			timeoutSecs = deadline - currentTime  // Unlike the insert path's fixed timeout, each query may carry its own deadline
		}
		timeout := time.Second * time.Duration(timeoutSecs)
		t := timerpool.Get(timeout)
		select {
		case searchTSIDsConcurrencyCh <- struct{}{}:
			timerpool.Put(t)
		case <-t.C:
			timerpool.Put(t)
			atomic.AddUint64(&s.searchTSIDsConcurrencyLimitTimeout, 1)
			return nil, fmt.Errorf("cannot search for tsids, since more than %d concurrent searches are performed during %.3f secs; add more CPUs or reduce query load",
				cap(searchTSIDsConcurrencyCh), timeout.Seconds())
		}
	}
	// ... (the TSID search logic follows; once it finishes, the slot is released back to searchTSIDsConcurrencyCh)
}

3.2 How the select goroutines yield voluntarily

lib/storage/search.go:188

// NextMetricBlock proceeds to the next MetricBlockRef.
func (s *Search) NextMetricBlock() bool {
	if s.err != nil {
		return false
	}
	for s.ts.NextBlock() {
		if s.loops&paceLimiterSlowIterationsMask == 0 {  // Once every 4096 iterations, check whether any insert goroutine is waiting
			if err := checkSearchDeadlineAndPace(s.deadline); err != nil {
				// If an insert goroutine is waiting, WaitIfNeeded() blocks this goroutine on the condition variable via cond.Wait()
				s.err = err
				return false
			}
		}
		s.loops++
        //...
    }
    //...
}
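
checkSearchDeadlineAndPace() is the glue between the scan loop and the pace limiter. It isn't quoted in this post, but based on the comments above it presumably does two things: fail the search once its deadline has passed, and otherwise yield to any waiting inserts. Roughly (a sketch consistent with the calls above, not the verbatim source):

// Sketch: not the verbatim source.
func checkSearchDeadlineAndPace(deadline uint64) error {
	if fasttime.UnixTimestamp() > deadline {
		return fmt.Errorf("search deadline exceeded")
	}
	// Blocks (via cond.Wait inside the pace limiter) while inserts are still
	// waiting for a free slot in addRowsConcurrencyCh.
	storagepacelimiter.Search.WaitIfNeeded()
	return nil
}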


Implementation details of WaitIfNeeded() method: lib/pacelimiter/pacelimiter.go:43

// WaitIfNeeded blocks while the number of Inc calls is bigger than the number of Dec calls.
func (pl *PaceLimiter) WaitIfNeeded() {
	if atomic.LoadInt32(&pl.n) <= 0 {
		// Fast path - there is no need in lock.
		return
	}
	// Slow path - wait until Dec is called.
	pl.mu.Lock()
	for atomic.LoadInt32(&pl.n) > 0 {  // n is the number of high-priority (insert) operations currently waiting
		pl.delaysTotal++
		pl.cond.Wait()   // Blocks until Dec() drives n down to 0 and calls pl.cond.Broadcast(), rescheduling the low-priority goroutines
	}
	pl.mu.Unlock()
}
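
Inc() and Dec() aren't quoted above. Matching the WaitIfNeeded() implementation, they presumably look roughly like this (a sketch, not copied from the repository):

// Sketch of the presumed Inc/Dec, consistent with WaitIfNeeded above.

// Inc is called by high-priority work (a starved insert) before it starts waiting for a slot.
func (pl *PaceLimiter) Inc() {
	atomic.AddInt32(&pl.n, 1)
}

// Dec is called once that work has obtained its slot (or timed out). When the last
// waiter is gone (n drops back to 0), every goroutine blocked in WaitIfNeeded()
// is woken up via Broadcast().
func (pl *PaceLimiter) Dec() {
	if atomic.AddInt32(&pl.n, -1) == 0 {
		pl.mu.Lock()
		pl.cond.Broadcast()
		pl.mu.Unlock()
	}
}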

4. Summary

  1. The number of key compute goroutines revolves around the number of available CPU cores. If the number of goroutines exceeds the number of physical cores, the excess CPU time is simply wasted on the goroutine scheduler.
  2. Distinguish between high-priority and low-priority goroutines; low-priority goroutines must be able to yield voluntarily.
  3. A queue (buffered channel) represents how many key goroutines are currently scheduled. When the queue is full, it proves that some key goroutine has not been scheduled, and the corresponding yielding mechanism must be triggered. In effect this layers a simple priority scheme on top of the golang scheduler.

Anyway, thanks to the great valyala; later we can import this code directly and crib from it.
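
For example, a project that wants to reuse this pattern could wire up the pace limiter roughly like this, assuming the lib/pacelimiter package exposes a New() constructor as the storage code suggests:

package example

// Assumes pacelimiter.New() exists, as suggested by the walkthrough above.
import "github.com/VictoriaMetrics/VictoriaMetrics/lib/pacelimiter"

// One limiter per low-priority workload that should yield to writes.
var searchPace = pacelimiter.New()

// High-priority path: announce that a write is starved before blocking,
// and clear the flag once the slot has been obtained (or the wait timed out).
func onWriteBlocked(waitForSlot func()) {
	searchPace.Inc()
	defer searchPace.Dec()
	waitForSlot()
}

// Low-priority path: call this periodically inside long scan loops.
func maybeYield() {
	searchPace.WaitIfNeeded()
}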

Topics: Go, Back-end, computer