Go WaitGroup source code analysis

Posted by ladokha on Sat, 08 Jan 2022 20:20:45 +0100

WaitGroup is a concurrency control primitive often used in development. Its source code is in src/sync/waitgroup.go, which defines one struct and four methods:

  • WaitGroup{}: the struct.
  • state(): an internal method called by Add() and Wait().
  • Add(): adds to the number of tasks.
  • Done(): marks one task as complete; it is actually Add(-1).
  • Wait(): blocks until all tasks are complete.

The following source code is based on Go 1.17.5 and has been abridged.

$ go version
go version go1.17.5 darwin/amd64

Before reading on, it helps to understand a few concepts:

  • For struct alignment, refer to the previous notes.
  • There are two semaphore functions:
    runtime_Semacquire blocks the current goroutine until the semaphore is greater than 0 and then decrements it. Used in Wait().
    runtime_Semrelease increments the semaphore and wakes up one of the goroutines waiting on sema. Used in Add().
  • unsafe.Pointer is used to convert between pointers of different types;
    uintptr is a built-in Go type. It is an unsigned integer type large enough to hold the bits of a pointer, and it can be converted to and from unsafe.Pointer.
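
As a minimal sketch of the conversions just described (the describe helper and the words array are hypothetical names used only for illustration), the following program reads a uint32 through unsafe.Pointer and turns the same address into a uintptr so that it can be tested with ordinary integer arithmetic:

```go
package main

import (
	"fmt"
	"unsafe"
)

// describe is a hypothetical helper: it reads the first element through an
// unsafe.Pointer and reports whether the array's address is 4-byte aligned.
func describe(words *[3]uint32) (first uint32, aligned4 bool) {
	p := unsafe.Pointer(words) // *[3]uint32 -> unsafe.Pointer
	first = *(*uint32)(p)      // unsafe.Pointer -> *uint32
	addr := uintptr(p)         // the address as a plain unsigned integer
	return first, addr%4 == 0  // uint32 values are 4-byte aligned
}

func main() {
	words := [3]uint32{7, 8, 9}
	first, aligned4 := describe(&words)
	fmt.Println(first, aligned4) // 7 true
}
```

The same address-modulo test, with 8 instead of 4, is what state() relies on later in this article.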

1, Structure

1.1 composition of state1 array

type WaitGroup struct {
    // Indicates that a WaitGroup must not be copied; pass it by pointer to ensure global uniqueness.
    noCopy noCopy
    // state1 = state (accessed as *uint64) + sema (accessed as *uint32)
    // state = counter + waiter
    state1 [3]uint32
}

state1 is a uint32 array that holds the counter, the number of waiters, and the semaphore, where:

  • counter: the count of child goroutines set through Add().
  • waiter: the number of goroutines blocked in Wait().
  • sema: the semaphore.

1.2 location of state and sema

In fact, counter and waiter are used together as one 64-bit integer, so the state1 array can be regarded as a state accessed through *uint64 plus a sema accessed through *uint32, that is:

state1 = state + sema,
where state = counter + waiter.
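
The packing can be sketched with two small helpers. The names pack and unpack are hypothetical and do not appear in the Go source; the bit layout (counter in the upper 32 bits, waiter in the lower 32 bits) is the one Add() and Wait() use when they split state:

```go
package main

import "fmt"

// pack is a hypothetical helper that builds the 64-bit state word:
// counter in the upper 32 bits, waiter in the lower 32 bits.
func pack(counter int32, waiter uint32) uint64 {
	return uint64(uint32(counter))<<32 | uint64(waiter)
}

// unpack splits the state word with the same shifts and conversions
// that appear in Add() and Wait().
func unpack(state uint64) (counter int32, waiter uint32) {
	return int32(state >> 32), uint32(state)
}

func main() {
	state := pack(2, 1)
	fmt.Printf("0x%016x\n", state) // 0x0000000200000001
	counter, waiter := unpack(state)
	fmt.Println(counter, waiter) // 2 1
}
```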

Memory is typically 4-byte aligned on 32-bit systems and 8-byte aligned on 64-bit systems, so the array may or may not start on an 8-byte boundary. The internal method state() below checks which case applies.

The state() method extracts the state stored in the state1 array. The return value statep points to the state, that is, counter and waiter as a whole, and semap points to the semaphore.

func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
    // Determine whether 64 bit alignment
    if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
        return (*uint64)(unsafe.Pointer(&wg.state1)), &wg.state1[2]
    } else {
        return (*uint64)(unsafe.Pointer(&wg.state1[1])), &wg.state1[0]
    }
}

state() converts the address allocated by the runtime to uintptr, takes it modulo 8, and checks whether the result is 0. If it is 0, the allocated address is 64-bit aligned.

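  • If it is 64-bit aligned, the first two elements of the array are state and the last element is sema;
  • If it is not 64-bit aligned, the first element is sema (32 bits) and the last two elements are state.

Alignment   state1[0]   state1[1]   state1[2]
64-bit      waiter      counter     sema
32-bit      sema        waiter      counter

(The order of waiter and counter shown here is for little-endian machines, where the lower 32 bits of state come first in memory.)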

When we initialize a WaitGroup object, its counter, waiter, and sema values are all 0.
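
To see the alignment check in isolation, here is a small sketch that repeats the logic of state() on a stand-in type. The group type and the check helper are hypothetical names, not part of the sync package. Whichever branch is taken on a given machine, the returned statep should be 8-byte aligned and a fresh value should read as zero:

```go
package main

import (
	"fmt"
	"unsafe"
)

// group is a hypothetical stand-in for WaitGroup that keeps only state1.
type group struct {
	state1 [3]uint32
}

// state repeats the alignment check from the article on the stand-in type.
func (g *group) state() (statep *uint64, semap *uint32) {
	if uintptr(unsafe.Pointer(&g.state1))%8 == 0 {
		return (*uint64)(unsafe.Pointer(&g.state1)), &g.state1[2]
	}
	return (*uint64)(unsafe.Pointer(&g.state1[1])), &g.state1[0]
}

// check reports whether statep is 8-byte aligned and whether state and
// sema both read as zero.
func check(g *group) (aligned, zero bool) {
	statep, semap := g.state()
	aligned = uintptr(unsafe.Pointer(statep))%8 == 0
	zero = *statep == 0 && *semap == 0
	return aligned, zero
}

func main() {
	g := new(group)
	aligned, zero := check(g)
	fmt.Println(aligned, zero) // true true
}
```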

1.3 why is the state1 array so designed

Why design counter and waiter as a whole? Because state is operated on with 64-bit atomic operations, such as:

  • Add()
state := atomic.AddUint64(statep, uint64(delta)<<32)
  • Wait()
state := atomic.LoadUint64(statep)
if atomic.CompareAndSwapUint64(statep, state, state+1) {}

To keep operations on state atomic, the 64-bit value must be read and written in a single access, and that in turn requires state to be 64-bit aligned. This is why state() chooses the 8-byte-aligned part of state1 for state.
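
A short sketch of the atomic addition, assuming the layout described above. The addDelta helper is a hypothetical name; it only mirrors the first lines of Add(). It shows that one atomic operation changes the counter half, including for a negative delta, and leaves the waiter half untouched:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// addDelta is a hypothetical helper mirroring the first lines of Add():
// it atomically adds delta to the counter half and returns both halves.
func addDelta(statep *uint64, delta int) (counter int32, waiter uint32) {
	state := atomic.AddUint64(statep, uint64(delta)<<32)
	return int32(state >> 32), uint32(state)
}

func main() {
	var state uint64 = 1 // pretend one waiter is already registered

	c, w := addDelta(&state, 2)
	fmt.Println(c, w) // 2 1

	c, w = addDelta(&state, -1)
	fmt.Println(c, w) // 1 1
}
```

A negative delta works because uint64(delta)<<32 wraps around, so the addition behaves like a subtraction on the upper 32 bits.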

2, Add() function

Add() uses 64-bit atomic addition to add delta to the counter (delta may be negative). When the counter becomes zero, it wakes up the waiting goroutines through the semaphore. Add() is analyzed here in several steps:

  • step 1: get the pointers corresponding to counter, waiter, and sema, and add delta to counter.
// Get statep and semap, that is, pointers to counter+waiter and to sema
statep, semap := wg.state()
// Shift delta left by 32 bits and atomically add it to state, that is, add delta to counter
state := atomic.AddUint64(statep, uint64(delta)<<32)
v := int32(state >> 32) // The upper 32 bits are counter, already increased by delta. Note the conversion to int32
w := uint32(state)      // The lower 32 bits are waiter
  • step 2: counter is not allowed to be negative. Otherwise, panic is reported.
if v < 0 {
    panic("sync: negative WaitGroup counter")
}

counter is the number of active goroutines, which must not be negative. It can become negative in two cases:

The first is calling Add() with a negative delta directly, so that counter drops below 0 after the atomic addition; code is generally not written this way;
The second is calling Done(), that is, Add(-1), after a previous goroutine has already reduced counter to 0: before that call finishes, it is suspended, another Done() arrives, and the logic goes wrong.
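
The negative-counter case can be reproduced deterministically with the real sync.WaitGroup by calling Done() once more than Add() allows. This is a minimal sketch; negativeCounterPanic is a hypothetical helper that recovers the panic and returns its message so the program can continue:

```go
package main

import (
	"fmt"
	"sync"
)

// negativeCounterPanic is a hypothetical helper: it drives the counter
// below zero and returns the recovered panic message.
func negativeCounterPanic() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done() // counter: 1 -> 0
	wg.Done() // counter would become -1, so Add(-1) panics
	return "no panic"
}

func main() {
	// Prints the message passed to panic in the step 2 check.
	fmt.Println(negativeCounterPanic())
}
```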

  • step 3: Wait has already been called, so Add is not allowed at this point.
if w != 0 && delta > 0 && v == int32(delta) {
    panic("sync: WaitGroup misuse: Add called concurrently with Wait")
}

waiter is the number of goroutines waiting. It only ever has 1 added to it or is reset to zero, so it is always greater than or equal to 0. After the first Add(n), counter=n and waiter=0; w != 0 indicates that Wait has been executed.

delta > 0 indicates that this is an addition. v is the counter after delta has been added, so v == int32(delta) means the counter was 0 before this call: either this is the first Add(), or earlier Add(-1) calls had already reduced the counter to 0. Combined with w != 0, this means Wait was called first and then Add.

  • step 4: if counter > 0 or waiter == 0, return directly.
if v > 0 || w == 0 {
    return
}

After the addition, counter >= 0.

If the counter is positive, there is no need to release the semaphore, so return directly;
If waiter is 0, no goroutine is waiting and there is no need to release the semaphore, so return directly.

  • step 5: check whether WaitGroup is abused, that is, Add cannot be called concurrently with Wait.
if *statep != state {
    panic("sync: WaitGroup misuse: Add called concurrently with Wait")    
}

At this point counter == 0 && waiter > 0, which means the last Done has completed, the counter has dropped to zero, and it is time to release the semaphore and wake up all goroutines blocked in Wait. If state has changed in the meantime, some other call has modified it, and Add panics.

This step of judgment is equivalent to a lock to ensure that WaitGroup is not abused.

  • step 6: release all queued waiters.
*statep = 0
for ; w != 0; w-- {
    runtime_Semrelease(semap, false, 0)
}

Execution reaches this point only after an operation with a negative delta. counter == 0 and waiter > 0 indicate that all tasks have completed, there is no active goroutine, and the semaphore needs to be released. Set the whole state to 0 and release all blocked waiters.

3, Wait() function

The main goroutine executing Wait() adds 1 to the waiter value and blocks. It continues with the subsequent code only after the counter reaches 0.

func (wg *WaitGroup) Wait() {
    // Get statep and semap, that is, pointers to counter+waiter and to sema
    statep, semap := wg.state()

    for { // Note that this is an endless loop
        state := atomic.LoadUint64(statep) // Atomic load
        v := int32(state >> 32)            // counter
        w := uint32(state)                 // waiter (used only by race-detector code omitted here)

        // If the counter is 0, all goroutines have finished and there is no need to wait
        if v == 0 {
            return
        }

        // Add 1 to waiter with a CAS operation
        if atomic.CompareAndSwapUint64(statep, state, state+1) {
            // Block the current goroutine until semaphore sema is greater than 0
            runtime_Semacquire(semap)

            // Add() sets counter and waiter to 0 before releasing the semaphore, so *statep must be 0 here. If *statep is not 0, the WaitGroup has been reused, with Add() or Wait() called on it, before this Wait() returned.
            if *statep != 0 {
                panic("sync: WaitGroup is reused before previous Wait has returned")
            }
            return
        }
    }
}
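
Putting sections 2 and 3 together, the following is a simplified sketch of the same idea, not the real implementation. The toyWaitGroup type is hypothetical, an unbuffered channel stands in for the runtime semaphore (an assumption made so the example runs outside the runtime), and the misuse checks other than the negative-counter panic are left out:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyWaitGroup is a hypothetical, simplified model of sync.WaitGroup.
type toyWaitGroup struct {
	state uint64        // upper 32 bits: counter, lower 32 bits: waiter
	sema  chan struct{} // stands in for the runtime semaphore
}

func newToyWaitGroup() *toyWaitGroup {
	return &toyWaitGroup{sema: make(chan struct{})}
}

func (wg *toyWaitGroup) Add(delta int) {
	state := atomic.AddUint64(&wg.state, uint64(delta)<<32)
	v := int32(state >> 32)
	w := uint32(state)
	if v < 0 {
		panic("toyWaitGroup: negative counter")
	}
	if v > 0 || w == 0 {
		return
	}
	// counter == 0 and waiter > 0: reset the state, then wake every waiter.
	atomic.StoreUint64(&wg.state, 0)
	for ; w != 0; w-- {
		wg.sema <- struct{}{}
	}
}

func (wg *toyWaitGroup) Done() { wg.Add(-1) }

func (wg *toyWaitGroup) Wait() {
	for {
		state := atomic.LoadUint64(&wg.state)
		if int32(state>>32) == 0 {
			return // nothing to wait for
		}
		// Register as a waiter; retry if the state changed in between.
		if atomic.CompareAndSwapUint64(&wg.state, state, state+1) {
			<-wg.sema
			return
		}
	}
}

// runWorkers starts n goroutines that add 1..n to a sum and waits for them.
func runWorkers(n int) int64 {
	wg := newToyWaitGroup()
	var sum int64
	wg.Add(n)
	for i := 1; i <= n; i++ {
		go func(i int) {
			defer wg.Done()
			atomic.AddInt64(&sum, int64(i))
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&sum)
}

func main() {
	fmt.Println(runWorkers(10)) // 55
}
```

The channel send in Add() blocks until a waiter receives, which is acceptable for a sketch but differs from runtime_Semrelease, which does not wait for the woken goroutine.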

4, Data race analysis

Add() and Wait() both operate on the state data, so there is a data race between them:

         write                                              read
Add()    Add delta to counter                               Finally, when releasing the semaphore, read waiter and sema
Wait()   CAS operation: add 1 to waiter, then wait on sema  Read counter and return directly if it is 0

To resolve the data race, you could lock the state1 array before each operation and release the lock afterwards. This is certainly safe, but it is inefficient.

The source code resolves the data race without using locks. It handles several cases:

  • Add and Add concurrently

If multiple Add calls run at the same time, each one only adds a number, positive or negative. As long as the counter is greater than 0 afterwards, the call returns directly. Because the addition is atomic, the updates are applied in some order and none of them is lost.

if v > 0 || w == 0 {
    return
}

If the counter is equal to 0 after adding a negative number, the semaphore is about to be released. Other Add calls must not change this data at the same time.

if w != 0 && delta > 0 && v == int32(delta) {
    panic("sync: WaitGroup misuse: Add called concurrently with Wait")
}
  • Add and Wait concurrently
    If the counter is equal to 0 after Add adds a negative number, the semaphore is about to be released, and Wait must not modify the data. If Wait reads state and then changes it at this moment, Add panics.
if *statep != state {
    panic("sync: WaitGroup misuse: Add called concurrently with Wait")
}
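
The claim that concurrent atomic additions are never lost can be checked with a small sketch. The concurrentAdds helper is a hypothetical name; it lets n goroutines each add 1 to the counter half of a packed state word and uses a real sync.WaitGroup only to wait for them to finish:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// concurrentAdds is a hypothetical helper: n goroutines each add 1 to the
// upper 32 bits of a shared state word, and the final counter is returned.
func concurrentAdds(n int) int32 {
	var state uint64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddUint64(&state, 1<<32) // counter++ without a lock
		}()
	}
	wg.Wait()
	return int32(atomic.LoadUint64(&state) >> 32)
}

func main() {
	fmt.Println(concurrentAdds(100)) // 100
}
```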

5, Case analysis

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup // ①

    wg.Add(2) // ②

    go func() {
        fmt.Println(1)
        wg.Done() // ③
    }()

    go func() {
        fmt.Println(2)
        wg.Done() // ④
    }()

    wg.Wait() // ⑤

    fmt.Println("all work done!")
}

After ① and ② are executed, ③, ④ and ⑤ may execute in any order.

  • Assuming that it is executed in the order of [①, ②, ③, ④, ⑤], the values of counter and waiter change as follows:
    ① counter=0, waiter=0 // the default value after initialization is 0
    ② counter=2, waiter=0 // the atomic addition adds 2 to the counter
    ③ counter=1, waiter=0 // one Done completes, subtracting 1 from the counter, which changes from 2 to 1
    ④ counter=0, waiter=0 // another Done completes, subtracting 1 from the counter, which becomes 0. Since w == 0, the condition v > 0 || w == 0 is satisfied, so return directly without sending a signal
    ⑤ counter=0, waiter=0 // because v == 0, return directly without a CAS operation

  • Assuming that it is executed in the order of [①, ②, ⑤, ③, ④], the values of counter and waiter change as follows:
    ① counter=0, waiter=0 // the default value after initialization is 0
    ② counter=2, waiter=0 // the atomic addition adds 2 to the counter
    ⑤ counter=2, waiter=1 // CAS adds 1 to the waiter, so the waiter changes from 0 to 1
    ③ counter=1, waiter=1 // one Done completes, subtracting 1 from the counter, which changes from 2 to 1
    ④ counter=0, waiter=1 // another Done completes, subtracting 1 from the counter, which becomes 0; a signal is sent so the waiter is no longer blocked, and main continues to execute
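
Both orderings can be replayed on a single-threaded model of the state word. This is only a sketch: the model type and the trace function are hypothetical, and the model follows the arithmetic of Add() and Wait() without any real blocking. It reports whether Wait would block and how many waiters Add would wake:

```go
package main

import "fmt"

// model is a hypothetical, single-threaded stand-in for the packed state:
// counter in the upper 32 bits, waiter in the lower 32 bits.
type model struct{ state uint64 }

// add mirrors the arithmetic of Add() and returns the number of waiters woken.
func (m *model) add(delta int) (woken uint32) {
	m.state += uint64(delta) << 32
	v, w := int32(m.state>>32), uint32(m.state)
	if v > 0 || w == 0 {
		return 0
	}
	m.state = 0 // counter == 0 and waiter > 0: reset, then wake every waiter
	return w
}

// wait mirrors Wait() and reports whether the caller would block.
func (m *model) wait() (blocked bool) {
	if int32(m.state>>32) == 0 {
		return false
	}
	m.state++ // register one more waiter
	return true
}

// trace replays ① and ②, then ③, ④ and ⑤ in one of the two orders above.
func trace(waitFirst bool) (blocked bool, woken uint32) {
	var m model // ①
	m.add(2)    // ②
	if waitFirst {
		blocked = m.wait() // ⑤
	}
	woken += m.add(-1) // ③
	woken += m.add(-1) // ④
	if !waitFirst {
		blocked = m.wait() // ⑤
	}
	return blocked, woken
}

func main() {
	b1, w1 := trace(false)
	fmt.Println(b1, w1) // false 0
	b2, w2 := trace(true)
	fmt.Println(b2, w2) // true 1
}
```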
