gRPC source code reading: Balancer

Posted by kamsmartx on Wed, 19 Jan 2022 19:59:54 +0100

Background

Following on from the previous article, "Resolver of gRPC plug-in programming": after gRPC parses the target, the resolver built by resolver.Builder.Build calls the
resolver.ClientConn.UpdateState(State) error method. What does this method do? Let's continue into the source code in this article.
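
To make the starting point concrete, here is a minimal sketch of the resolver side that triggers everything below. The exampleResolver type and the onEndpointsChanged hook are hypothetical names invented for illustration; only resolver.ClientConn.UpdateState is the real API:

import "google.golang.org/grpc/resolver"

// exampleResolver is a hypothetical resolver; cc is the resolver.ClientConn
// handed to it in Builder.Build.
type exampleResolver struct {
    cc resolver.ClientConn
}

// onEndpointsChanged might be fired by a watch on a registry such as etcd.
func (r *exampleResolver) onEndpointsChanged(endpoints []string) {
    addrs := make([]resolver.Address, 0, len(endpoints))
    for _, e := range endpoints {
        addrs = append(addrs, resolver.Address{Addr: e})
    }
    // This is the call whose downstream effects this article traces.
    _ = r.cc.UpdateState(resolver.State{Addresses: addrs})
}

func (r *exampleResolver) ResolveNow(resolver.ResolveNowOptions) {}
func (r *exampleResolver) Close()                                {}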

UpdateState

The call to UpdateState leads to the grpc.ClientConn.updateResolverState method, which mainly does the following:

  • Applies the ServiceConfig
  • Creates the balancer wrapper (ccBalancerWrapper)
  • Calls the wrapper's updateClientConnState method to push the new state into the balancer
func (cc *ClientConn) updateResolverState(s resolver.State, err error) error {
    ...
    cc.maybeApplyDefaultServiceConfig(s.Addresses)
    ...
    cc.applyServiceConfigAndBalancer(sc, configSelector, s.Addresses)
    ...
    // reference: balancer_conn_wrappers.go:164
    // bw.updateClientConnState -> ccBalancerWrapper.updateClientConnState
    bw.updateClientConnState(&balancer.ClientConnState{ResolverState: s, BalancerConfig: balCfg})
    ...
}

Reminder

Here we focus on the main flow of gRPC rather than every detail, such as grpclb handling, error handling, and ServiceConfigSelector handling; consult the source code for those.

The bw.updateClientConnState call is essentially ccBalancerWrapper.updateClientConnState,
and ccBalancerWrapper.updateClientConnState does exactly one thing: it calls the balancer.Balancer.UpdateClientConnState method.

func (ccb *ccBalancerWrapper) updateClientConnState(ccs *balancer.ClientConnState) error {
    ccb.balancerMu.Lock()
    defer ccb.balancerMu.Unlock()
    return ccb.balancer.UpdateClientConnState(*ccs)
}

At this point there are two kinds of balancer logic we could read:

  • A self-implemented balancer.Balancer
  • A balancer provided by gRPC

To read the source code, we first walk through one of the balancers provided by gRPC to understand the flow, and then show how to implement a custom one.

gRPC Balancer

gRPC provides several load balancing policies, as follows:

  • grpclb
  • rls
  • roundrobin
  • weightedroundrobin
  • weightedtarget

For ease of understanding, let's pick a simple load balancer, roundrobin, and continue reading.

Where does the balancer come from? From the source code of the cc.maybeApplyDefaultServiceConfig(s.Addresses) method above, we can see that the balancer.Balancer is created by a balancer.Builder.
Let's take a look at the balancer.Builder interface:

// Builder creates a balancer.
type Builder interface {
    // Build creates a new balancer with the ClientConn.
    Build(cc ClientConn, opts BuildOptions) Balancer
    // Name returns the name of balancers built by this builder.
    // It will be used to pick balancers (for example in service config).
    Name() string
}

roundrobin

roundrobin is a built-in load balancer of gRPC. Like the resolver, it is extended through plug-in programming. From the source code we can see that
roundrobin registers its balancer.Builder in an init function, where baseBuilder is the balancer.Builder implementation.
As noted above, the balancer.Balancer is created by balancer.Builder.Build, and from the baseBuilder.Build method we learn that
the bottom layer of gRPC's balancer is implemented by baseBalancer. Part of the source code follows:

roundrobin.go

// newBuilder creates a new roundrobin balancer builder.
func newBuilder() balancer.Builder {
    return base.NewBalancerBuilder(Name, &rrPickerBuilder{}, base.Config{HealthCheck: true})
}

func init() {
    balancer.Register(newBuilder())
}
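
As a usage aside: since registration happens in init, an application selects roundrobin with a blank import plus the policy name. A sketch of the service-config way to do it (the target URL here is illustrative, and the blank import is harmless even if the package is already linked in):

import (
    "google.golang.org/grpc"
    _ "google.golang.org/grpc/balancer/roundrobin" // registers "round_robin"
)

conn, err := grpc.Dial(
    "dns:///greeter.example.com:443",
    grpc.WithInsecure(),
    grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
)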

base/balancer.go

func (bb *baseBuilder) Build(cc balancer.ClientConn, opt balancer.BuildOptions) balancer.Balancer {
    bal := &baseBalancer{
        cc:            cc,
        pickerBuilder: bb.pickerBuilder,
    
        subConns: resolver.NewAddressMap(),
        scStates: make(map[balancer.SubConn]connectivity.State),
        csEvltr:  &balancer.ConnectivityStateEvaluator{},
        config:   bb.config,
    }
    bal.picker = NewErrPicker(balancer.ErrNoSubConnAvailable)
    return bal
}

Continuing along the UpdateState chain, the last call we read was ccb.balancer.UpdateClientConnState(*ccs). It eventually lands in the
baseBalancer.UpdateClientConnState method; let's check the source code:

func (b *baseBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
    ...
    addrsSet := resolver.NewAddressMap()
    for _, a := range s.ResolverState.Addresses {
        addrsSet.Set(a, nil)
        if _, ok := b.subConns.Get(a); !ok {
            sc, err := b.cc.NewSubConn([]resolver.Address{a}, balancer.NewSubConnOptions{HealthCheckEnabled: b.config.HealthCheck})
            if err != nil {
                logger.Warningf("base.baseBalancer: failed to create new SubConn: %v", err)
                continue
            }
            b.subConns.Set(a, sc)
            b.scStates[sc] = connectivity.Idle
            b.csEvltr.RecordTransition(connectivity.Shutdown, connectivity.Idle)
            sc.Connect()
        }
    }
    for _, a := range b.subConns.Keys() {
        sci, _ := b.subConns.Get(a)
        sc := sci.(balancer.SubConn)
        if _, ok := addrsSet.Get(a); !ok {
            b.cc.RemoveSubConn(sc)
            b.subConns.Delete(a)
        }
    }
    if len(s.ResolverState.Addresses) == 0 {
        b.ResolverError(errors.New("produced zero addresses"))
        return balancer.ErrBadResolverState
    }
    return nil
}

From the source code, this method does the following things:

  • Calls NewSubConn and Connect for each newly added endpoint
  • Removes endpoints that no longer exist, along with their SubConn state

In short, it refreshes the set of connections available to the load balancer.
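
The diffing above relies on resolver.AddressMap, a map keyed by the whole resolver.Address value. A tiny standalone illustration with made-up addresses:

m := resolver.NewAddressMap()
m.Set(resolver.Address{Addr: "localhost:8888"}, nil)
if _, ok := m.Get(resolver.Address{Addr: "localhost:8889"}); !ok {
    // Not seen before: this is the branch where baseBalancer would create
    // and connect a new SubConn.
}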

balancer.ClientConn.NewSubConn

balancer.ClientConn is an interface representing a gRPC ClientConn as seen by the balancer, and ccBalancerWrapper is its implementation. First, the interface declaration:

type ClientConn interface {
    // NewSubConn is called by the balancer to create a new SubConn.
    // It does not block waiting for the connection to be established;
    // the SubConn's behavior can be controlled via NewSubConnOptions.
    NewSubConn([]resolver.Address, NewSubConnOptions) (SubConn, error)

    // RemoveSubConn removes the SubConn from the ClientConn.
    // The SubConn will be shut down.
    RemoveSubConn(SubConn)

    // UpdateAddresses updates the addresses used in the passed-in SubConn.
    // gRPC checks whether the currently connected address is still in the
    // new list; if so, the connection is kept, otherwise it is gracefully
    // closed and a new connection is created.
    // This triggers a state transition for the SubConn.
    UpdateAddresses(SubConn, []resolver.Address)

    // UpdateState notifies gRPC that the balancer's internal state has
    // changed. gRPC will update the connectivity state of the ClientConn
    // and call Pick on the new Picker to choose SubConns.
    UpdateState(State)

    // ResolveNow is called by the balancer to notify gRPC to do a name
    // resolving.
    ResolveNow(resolver.ResolveNowOptions)

    // Target returns the dial target for this ClientConn.
    // Deprecated: use the Target field in BuildOptions instead.
    Target() string
}

Take another look at the creation of ccBalancerWrapper:

func newCCBalancerWrapper(cc *ClientConn, b balancer.Builder, bopts balancer.BuildOptions) *ccBalancerWrapper {
    ccb := &ccBalancerWrapper{
        cc:       cc,
        updateCh: buffer.NewUnbounded(),
        closed:   grpcsync.NewEvent(),
        done:     grpcsync.NewEvent(),
        subConns: make(map[*acBalancerWrapper]struct{}),
    }
    go ccb.watcher()
    ccb.balancer = b.Build(ccb, bopts)
    _, ccb.hasExitIdle = ccb.balancer.(balancer.ExitIdler)
    return ccb
}

Note

Remember the go ccb.watcher() line; we will come back to it later.

In baseBalancer.UpdateClientConnState, newly added endpoints are handled with NewSubConn and Connect. Let's first see what NewSubConn does,
in the ccBalancerWrapper.NewSubConn method:

func (ccb *ccBalancerWrapper) NewSubConn(addrs []resolver.Address, opts balancer.NewSubConnOptions) (balancer.SubConn, error) {
    if len(addrs) <= 0 {
        return nil, fmt.Errorf("grpc: cannot create SubConn with empty address list")
    }
    ccb.mu.Lock()
    defer ccb.mu.Unlock()
    if ccb.subConns == nil {
        return nil, fmt.Errorf("grpc: ClientConn balancer wrapper was closed")
    }
    ac, err := ccb.cc.newAddrConn(addrs, opts)
    if err != nil {
        return nil, err
    }
    acbw := &acBalancerWrapper{ac: ac}
    acbw.ac.mu.Lock()
    ac.acbw = acbw
    acbw.ac.mu.Unlock()
    ccb.subConns[acbw] = struct{}{}
    return acbw, nil
}

From this method we can see that it creates an addrConn object via grpc.ClientConn.newAddrConn, wraps it in acBalancerWrapper
(the balancer.SubConn implementation), and registers it in ccBalancerWrapper.subConns for management.

Explanation

This is how baseBalancer.UpdateClientConnState can tell whether an address after a resolver update is new: it compares each incoming address
against its subConns map, which was populated by previous updates through these calls.

Next, let's see what Connect does. NewSubConn gave us a balancer.SubConn implementation object, acBalancerWrapper; calling Connect
on it brings us to the acBalancerWrapper.Connect() method:

func (acbw *acBalancerWrapper) Connect() {
    acbw.mu.Lock()
    defer acbw.mu.Unlock()
    go acbw.ac.connect()
}

func (ac *addrConn) connect() error {
    ac.mu.Lock()
    if ac.state == connectivity.Shutdown {
        ac.mu.Unlock()
        return errConnClosing
    }
    if ac.state != connectivity.Idle {
        ac.mu.Unlock()
        return nil
    }
    ac.updateConnectivityState(connectivity.Connecting, nil)
    ac.mu.Unlock()
    
    ac.resetTransport()
    return nil
}

ac.updateConnectivityState updates the connection state. ac.resetTransport mainly creates the connection from the resolver's address list, and it also calls ac.updateConnectivityState; you can read that source yourself.
Let's follow ac.updateConnectivityState, which actually calls grpc.ClientConn.handleSubConnStateChange and finally lands back in the ccBalancerWrapper.handleSubConnStateChange method. The call chain is:

ac.updateConnectivityState -> grpc.ClientConn.handleSubConnStateChange -> ccBalancerWrapper.handleSubConnStateChange

Let's look at the last method in the chain, the source of ccBalancerWrapper.handleSubConnStateChange:

func (ccb *ccBalancerWrapper) handleSubConnStateChange(sc balancer.SubConn, s connectivity.State, err error) {
    if sc == nil {
        return
    }
    ccb.updateCh.Put(&scStateUpdate{
        sc:    sc,
        state: s,
        err:   err,
    })
}

This method wraps the balancer.SubConn and connectivity.State in an scStateUpdate and puts it into an unbounded buffer (a backlog slice plus a channel); a separate goroutine consumes the updates from the channel:

// Put adds t to the unbounded buffer.
func (b *Unbounded) Put(t interface{}) {
    b.mu.Lock()
    if len(b.backlog) == 0 {
        // Fast path: nothing is queued, try handing t directly to a reader.
        select {
        case b.c <- t:
            b.mu.Unlock()
            return
        default:
        }
    }
    // The reader is busy or a backlog exists: queue t.
    b.backlog = append(b.backlog, t)
    b.mu.Unlock()
}
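
For reference, the consuming side pairs Get with Load: Get exposes the channel, and after each receive the reader calls Load to move the next backlogged item into the channel. A condensed sketch modeled on gRPC's internal buffer package (simplified, not the verbatim source):

// Get returns the channel on which the next update is delivered.
func (b *Unbounded) Get() <-chan interface{} {
    return b.c
}

// Load moves the earliest backlogged item onto the channel, if any.
// Callers must invoke it after every successful receive from Get().
func (b *Unbounded) Load() {
    b.mu.Lock()
    if len(b.backlog) > 0 {
        select {
        case b.c <- b.backlog[0]:
            b.backlog[0] = nil // allow the sent element to be GC'd
            b.backlog = b.backlog[1:]
        default:
        }
    }
    b.mu.Unlock()
}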

After data is written here, where is it read? This brings us back to the goroutine we were asked to remember earlier: go ccb.watcher().

Let's look at the watcher method. From the above we know that what we wrote is an scStateUpdate object, so the source below keeps only the case that handles that object; code we don't need to care about yet is omitted:

func (ccb *ccBalancerWrapper) watcher() {
    for {
        select {
        case t := <-ccb.updateCh.Get():
            ccb.updateCh.Load()
            if ccb.closed.HasFired() {
                break
            }
            switch u := t.(type) {
            case *scStateUpdate:
                ccb.balancerMu.Lock()
                ccb.balancer.UpdateSubConnState(u.sc, balancer.SubConnState{ConnectivityState: u.state, ConnectionError: u.err})
                ccb.balancerMu.Unlock()
            case ...:
                ...
            default:
                logger.Errorf("ccBalancerWrapper.watcher: unknown update %+v, type %T", t, t)
            }
        case <-ccb.closed.Done():
        }
        ...
    }
}

From the source, it ends up calling the balancer.Balancer.UpdateSubConnState method. Viewed through the roundrobin balancer: as we learned above, the balancer implementation is
baseBalancer, so balancer.Balancer.UpdateSubConnState finally lands in the baseBalancer.UpdateSubConnState method:

func (b *baseBalancer) UpdateSubConnState(sc balancer.SubConn, state balancer.SubConnState) {
    s := state.ConnectivityState
    ...
    oldS, ok := b.scStates[sc]
    if !ok {
        ...
        return
    }
    if oldS == connectivity.TransientFailure &&
        (s == connectivity.Connecting || s == connectivity.Idle) {
        if s == connectivity.Idle {
            sc.Connect()
        }
        return
    }
    b.scStates[sc] = s
    switch s {
    case connectivity.Idle:
        sc.Connect()
    case connectivity.Shutdown:
        // When an address was removed by resolver, b called RemoveSubConn but
        // kept the sc's state in scStates. Remove state for this sc here.
        delete(b.scStates, sc)
    case connectivity.TransientFailure:
        // Save error to be reported via picker.
        b.connErr = state.ConnectionError
    }
    
    b.state = b.csEvltr.RecordTransition(oldS, s)
    ...
    if (s == connectivity.Ready) != (oldS == connectivity.Ready) ||
        b.state == connectivity.TransientFailure {
        b.regeneratePicker()
    }
    b.cc.UpdateState(balancer.State{ConnectivityState: b.state, Picker: b.picker})
}
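
The aggregate state in the last lines comes from balancer.ConnectivityStateEvaluator.RecordTransition. Its rule can be condensed as follows (an assumption-level summary of its documented behavior; newer gRPC versions also account for Idle):

// aggregate is a hypothetical helper expressing the evaluator's rule:
// Ready wins, then Connecting, otherwise TransientFailure.
func aggregate(numReady, numConnecting int) connectivity.State {
    switch {
    case numReady > 0:
        return connectivity.Ready
    case numConnecting > 0:
        return connectivity.Connecting
    default:
        return connectivity.TransientFailure
    }
}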

In this method, only SubConns in the connectivity.Ready state are passed down to the picker; SubConns in other states either re-initiate Connect or are removed.
The last line makes the balancer.ClientConn.UpdateState call; since ccBalancerWrapper is the balancer.ClientConn implementation, we arrive at
ccBalancerWrapper.UpdateState, which does two things:

  • Updates the balancer's Picker
  • Calls the grpc.connectivityStateManager.updateState method, which releases a channel signal to notify a waiting goroutine (a sketch of this signal pattern follows); we will cover that goroutine in a later article.
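
The signal mentioned in the second point is the classic close-a-channel broadcast. A simplified sketch of the pattern (an assumption-level reconstruction, not the verbatim connectivityStateManager source):

type stateManager struct {
    mu         sync.Mutex
    state      connectivity.State
    notifyChan chan struct{}
}

func (m *stateManager) updateState(s connectivity.State) {
    m.mu.Lock()
    defer m.mu.Unlock()
    if m.state == s {
        return
    }
    m.state = s
    if m.notifyChan != nil {
        close(m.notifyChan) // wake every goroutine waiting for a change
        m.notifyChan = nil
    }
}

func (m *stateManager) getNotifyChan() <-chan struct{} {
    m.mu.Lock()
    defer m.mu.Unlock()
    if m.notifyChan == nil {
        m.notifyChan = make(chan struct{})
    }
    return m.notifyChan
}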

So where and when is the load balancing algorithm actually invoked?
As seen above, baseBalancer.UpdateSubConnState regenerates a picker. Where does this picker come from? Looking back at the roundrobin source, the picker is built by the
base.PickerBuilder implementation, rrPickerBuilder, which was passed in when base.NewBalancerBuilder created the builder (a condensed sketch of rrPickerBuilder follows the Pick source below).
From the source of rrPicker, the Pick method is the concrete load balancing logic over SubConns:

func (p *rrPicker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
    p.mu.Lock()
    sc := p.subConns[p.next]
    p.next = (p.next + 1) % len(p.subConns)
    p.mu.Unlock()
    return balancer.PickResult{SubConn: sc}, nil
}
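
For completeness, rrPickerBuilder.Build roughly collects the Ready SubConns into a slice and starts the rotation at a random index, so different pickers don't all hammer the first address. A condensed sketch (details vary by gRPC version; grpcrand is gRPC's internal rand wrapper):

func (*rrPickerBuilder) Build(info base.PickerBuildInfo) balancer.Picker {
    if len(info.ReadySCs) == 0 {
        return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
    }
    scs := make([]balancer.SubConn, 0, len(info.ReadySCs))
    for sc := range info.ReadySCs {
        scs = append(scs, sc)
    }
    return &rrPicker{
        subConns: scs,
        // Start at a random index so connections spread across pickers.
        next: grpcrand.Intn(len(scs)),
    }
}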

So when is Pick called? The short answer: when grpc.ClientConn initiates an Invoke call (i.e. the client sends an RPC), Pick is reached through that call chain. We will analyze that part of the source separately.

Custom load balancer

To implement a custom load balancer, you first need to understand gRPC's plug-in programming model; you can google that part yourself.

Environment

etcd
go

Load balancing objective

Random selection

  1. Implement balancer.Builder
    We won't implement its methods one by one; the heart of a load balancer is the balancing algorithm, i.e. the base.PickerBuilder implementation, so we directly use gRPC's base.NewBalancerBuilder to create the balancer.Builder:

    const Name = "random"
    
    func init() {
     balancer.Register(newBuilder())
    }
    
    func newBuilder() balancer.Builder {
     return base.NewBalancerBuilder(Name, &randomPickerBuilder{}, base.Config{HealthCheck: true})
    }
  2. Implement base.PickerBuilder

    type randomPickerBuilder struct{}
    
    // Conn bundles a SubConn with its SubConnInfo so the picker can log the
    // picked address. Both supporting types were omitted in the original snippet.
    type Conn struct {
        SubConn     balancer.SubConn
        SubConnInfo base.SubConnInfo
    }
    
    type randomPicker struct {
        subConns []Conn
        mu       sync.Mutex
        r        *rand.Rand
    }
    
    func (r *randomPickerBuilder) Build(info base.PickerBuildInfo) balancer.Picker {
        if len(info.ReadySCs) == 0 {
            return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
        }
        readyScs := make([]Conn, 0, len(info.ReadySCs))
        for sc, scInfo := range info.ReadySCs {
            readyScs = append(readyScs, Conn{
                SubConn:     sc,
                SubConnInfo: scInfo,
            })
        }
        return &randomPicker{
            subConns: readyScs,
            r:        rand.New(rand.NewSource(time.Now().UnixNano())),
        }
    }
  3. Implement balancer.Picker
    balancer.Picker is the logic we actually extend: pick an available SubConn from the SubConn list, according to the desired load balancing algorithm, for the connection. Since Pick can be called concurrently, the shared rand.Rand is guarded with the picker's mutex:
func (r *randomPicker) Pick(_ balancer.PickInfo) (balancer.PickResult, error) {
    // rand.Rand is not goroutine-safe, so serialize access to it.
    r.mu.Lock()
    next := r.r.Intn(len(r.subConns))
    r.mu.Unlock()
    sc := r.subConns[next]
    fmt.Printf("picked: %+v\n", sc.SubConnInfo.Address.Addr)
    return balancer.PickResult{
        SubConn: sc.SubConn,
    }, nil
}
  4. Use the custom load balancer

    r := resolverBuilder.NewCustomBuilder(resolverBuilder.Scheme)
    options := []grpc.DialOption{grpc.WithInsecure(), grpc.WithResolvers(r), grpc.WithBalancerName(builder.Name)}
    conn, err := grpc.Dial(resolverBuilder.Format("grpc-server"), options...)
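
    A note on grpc.WithBalancerName: it is deprecated in grpc-go, and the supported way to select a policy is through the service config. Assuming the same hypothetical resolverBuilder package, an equivalent set of dial options might look like this sketch:

    options := []grpc.DialOption{
        grpc.WithInsecure(),
        grpc.WithResolvers(r),
        // Select the "random" balancer registered above by its Name.
        grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"random":{}}]}`),
    }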

Demonstration

  1. Start multiple server instances; I started three here:

    $ go run server.go -addr localhost:8888
    $ go run server.go -addr localhost:8889
    $ go run server.go -addr localhost:8890
  2. Start the client several times and observe the log output of Pick:

    $ go run client.go
    endpoints:  [localhost:8888 localhost:8889 localhost:8888 localhost:8889 localhost:8890]
    picked: localhost:8888
    output:  hi
    $ go run client.go
    endpoints:  [localhost:8888 localhost:8889 localhost:8888 localhost:8889 localhost:8890]
    picked: localhost:8890
    output:  hi
    $ go run client.go
    endpoints:  [localhost:8888 localhost:8889 localhost:8888 localhost:8889 localhost:8890]
    picked: localhost:8889
    output:  hi

...

Summary

gRPC obtains the endpoints of the gRPC server instances through service discovery or direct connection, then notifies the load balancer to update its SubConns: creating SubConns for newly added endpoints and removing abandoned ones.
Ready SubConns are then tracked through state updates. When gRPC calls Invoke (i.e. the client initiates a request), the Picker in the load balancer selects a SubConn according to the balancing algorithm and uses it for the call;
if that succeeds, no other SubConn is tried, otherwise the call is retried under a backoff algorithm until the backoff gives up or a connection succeeds.

The core logic of a custom load balancer lies in its Picker implementation: select one SubConn from the SubConn list, according to the chosen load balancing algorithm, for the connection. Like the Resolver, a custom load balancer plugs in through gRPC's plug-in programming, which is what provides the extensibility.

This source code reading only aims to understand gRPC's call flow. Many details are explained in the comments of the source itself, and they can deepen your understanding of gRPC; so after finishing this introduction, read the source again to consolidate it.

Source code

https://github.com/anqiansong...

Topics: Go source code analysis grpc