
Posted by scheda on Mon, 17 Feb 2020 08:17:57 +0100

Kubernetes (k8s) natively schedules two kinds of node resources: CPU and memory. To schedule and allocate other hardware that users request on demand, such as the GPUs used for machine learning, k8s provides the device plugin framework. This post walks through the key parts of its internal implementation.

1. Basic concepts

1.1 integration mode

1.1.1 DaemonSet and services

To integrate local hardware resources, a DaemonSet runs a gRPC service on each node. The local hardware on that node is reported and allocated through this service.

1.1.2 service registration design

Before a hardware service can communicate with the kubelet, it has to register. Registration relies on two basic Linux mechanisms: the service creates a socket file in a directory that the kubelet watches, and the kubelet notices the new file through the file system's inotify mechanism.
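As a minimal sketch of the plug-in side of this handshake, the following standard-library program shows that starting a listener on a Unix socket is what makes the socket file appear, and that closing the listener removes it again. The temporary directory and the name `example.sock` are assumptions for illustration; a real plug-in would use the directory the kubelet watches and serve gRPC on the listener.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// isSocket reports whether path exists and is a Unix socket file.
func isSocket(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode()&os.ModeSocket != 0
}

// demoSocketLifecycle starts a listener on a socket inside a temporary
// directory (a stand-in for the watched plug-in directory) and reports
// whether the socket file exists while listening and after Close.
func demoSocketLifecycle() (whileListening, afterClose bool, err error) {
	dir, err := os.MkdirTemp("", "plugins")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	// Hypothetical socket name; net.Listen creates the socket file.
	socketPath := filepath.Join(dir, "example.sock")
	l, err := net.Listen("unix", socketPath)
	if err != nil {
		return false, false, err
	}
	whileListening = isSocket(socketPath)

	// Closing a listener created by net.Listen also unlinks the socket file.
	if err := l.Close(); err != nil {
		return whileListening, false, err
	}
	afterClose = isSocket(socketPath)
	return whileListening, afterClose, nil
}

func main() {
	up, down, err := demoSocketLifecycle()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("socket present while listening:", up)
	fmt.Println("socket present after close:", down)
}
```

The appearance and disappearance of the file are exactly the two changes a directory watcher can react to.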

1.2 plug-in service awareness

1.2.1 Watcher

The Watcher detects the plug-in services registered on the current node. When it finds a newly registered plug-in service, it generates the corresponding event so that the plug-in can be registered with the current kubelet.

1.2.2 expected state and actual state

State here mainly means whether a plug-in needs to be registered. The kubelet communicates with the plug-in service over gRPC, so registration can fail when the connection has a problem or the plug-in service misbehaves, even though the service's socket, and therefore the plug-in service itself, still exists.

This gives two states: the expected state and the actual state. Because the socket exists, the expected state says the plug-in should be registered, but for some reason the registration has not completed. The kubelet then keeps adjusting the actual state toward the expected state until the two are consistent.

1.2.3 Coordinator

The coordinator is the core component that moves the actual state toward the expected state. It achieves consistency by calling the callback functions of the corresponding plug-in, which in practice means calling the corresponding gRPC interface.
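The idea can be sketched as a loop that compares two sets and only updates the actual state when a callback succeeds. This is a simplified illustration, not the kubelet's implementation: the `reconcile` function, the socket paths, and the callbacks are all hypothetical, and the timestamp comparison used in the real data structures is left out.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in a stable order.
func sortedKeys(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// reconcile performs one pass: it unregisters plug-ins that are no longer
// desired and registers desired plug-ins that are not yet registered.
// The actual state changes only when the callback succeeds, so a failed
// registration is retried on the next pass.
func reconcile(desired, actual map[string]bool, register, unregister func(string) error) {
	for _, p := range sortedKeys(actual) {
		if !desired[p] && unregister(p) == nil {
			delete(actual, p)
		}
	}
	for _, p := range sortedKeys(desired) {
		if !actual[p] && register(p) == nil {
			actual[p] = true
		}
	}
}

// runPasses runs n reconcile passes over hypothetical sample data and
// returns the registered sockets. Registering b.sock fails exactly once.
func runPasses(n int) []string {
	desired := map[string]bool{"/plugins/a.sock": true, "/plugins/b.sock": true}
	actual := map[string]bool{"/plugins/old.sock": true}
	failedOnce := false
	register := func(p string) error {
		if p == "/plugins/b.sock" && !failedOnce {
			failedOnce = true
			return errors.New("plug-in not ready")
		}
		return nil
	}
	unregister := func(string) error { return nil }
	for i := 0; i < n; i++ {
		reconcile(desired, actual, register, unregister)
	}
	return sortedKeys(actual)
}

func main() {
	fmt.Println("after 1 pass:  ", runPasses(1))
	fmt.Println("after 2 passes:", runPasses(2))
}
```

After the first pass only `a.sock` is registered, because the callback for `b.sock` failed; the second pass picks it up without any new file system event.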

1.2.4 plug-in controller

Each type of plug-in has a corresponding controller, which implements device registration and deregistration and handles the allocation and collection of the underlying resources (ListWatch)

2. Plug-in service discovery

2.1 core data structure

type Watcher struct {
    // Path of socket registered by plug-in aware service
    path                string
    fs                  utilfs.Filesystem
    // inotify monitoring plug-in service socket changes
    fsWatcher           *fsnotify.Watcher
    stopped             chan struct{}
    // Store expected state
    desiredStateOfWorld cache.DesiredStateOfWorld
}

2.2 initialization

Initialization simply makes sure the plug-in directory exists, creating it if necessary

func (w *Watcher) init() error {
    klog.V(4).Infof("Ensuring Plugin directory at %s ", w.path)

    if err := w.fs.MkdirAll(w.path, 0755); err != nil {
        return fmt.Errorf("error (re-)creating root %s: %v", w.path, err)
    }

    return nil
}

2.3 plug-in service discovery core

    go func(fsWatcher *fsnotify.Watcher) {
        defer close(w.stopped)
        for {
            select {
            case event := <-fsWatcher.Events:
                // A file in the watched directory changed: dispatch on the event type
                if event.Op&fsnotify.Create == fsnotify.Create {
                    err := w.handleCreateEvent(event)
                    if err != nil {
                        klog.Errorf("error %v when handling create event: %s", err, event)
                    }
                } else if event.Op&fsnotify.Remove == fsnotify.Remove {
                    w.handleDeleteEvent(event)
                }
                continue
            case err := <-fsWatcher.Errors:
                if err != nil {
                    klog.Errorf("fsWatcher received error: %v", err)
                }
                continue
            case <-stopCh:
                // In case of plugin watcher being stopped by plugin manager, stop
                // probing the creation/deletion of plugin sockets.
                // Also give all pending go routines a chance to complete
                select {
                case <-w.stopped:
                case <-time.After(11 * time.Second):
                    klog.Errorf("timeout on stopping watcher")
                }
                w.fsWatcher.Close()
                return
            }
        }
    }(fsWatcher)
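The dispatch pattern in this loop, a `select` over an event channel and a stop channel with a bitmask test on the operation, can be tried without any file system. The sketch below uses hypothetical `Op` and `Event` types that only imitate the shape of the fsnotify ones; it is not the fsnotify API.

```go
package main

import "fmt"

// Op is a hypothetical bitmask of file operations.
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
)

// Event is a hypothetical file system event.
type Event struct {
	Name string
	Op   Op
}

// dispatch consumes events until the events channel is closed or stop is
// closed, and returns a log of the actions it would have taken.
func dispatch(events <-chan Event, stop <-chan struct{}) []string {
	var actions []string
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return actions
			}
			if ev.Op&Create == Create {
				actions = append(actions, "register "+ev.Name)
			} else if ev.Op&Remove == Remove {
				actions = append(actions, "unregister "+ev.Name)
			}
			// Any other operation (such as Write) is ignored.
		case <-stop:
			return actions
		}
	}
}

// demoDispatch feeds three sample events through dispatch.
func demoDispatch() []string {
	events := make(chan Event, 3)
	events <- Event{Name: "/plugins/a.sock", Op: Create}
	events <- Event{Name: "/plugins/a.sock", Op: Write}
	events <- Event{Name: "/plugins/a.sock", Op: Remove}
	close(events)
	return dispatch(events, make(chan struct{}))
}

func main() {
	for _, a := range demoDispatch() {
		fmt.Println(a)
	}
}
```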

2.4 compensation mechanism

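The compensation mechanism covers a kubelet restart. Sockets that already exist in the plug-in directory produce no new inotify events, so on startup the watcher walks the directory and re-registers every existing socket with the current kubelet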

func (w *Watcher) traversePluginDir(dir string) error {
    return w.fs.Walk(dir, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            if path == dir {
                return fmt.Errorf("error accessing path: %s error: %v", path, err)
            }

            klog.Errorf("error accessing path: %s error: %v", path, err)
            return nil
        }

        switch mode := info.Mode(); {
        case mode.IsDir():
            if err := w.fsWatcher.Add(path); err != nil {
                return fmt.Errorf("failed to watch %s, err: %v", path, err)
            }
        case mode&os.ModeSocket != 0:
            event := fsnotify.Event{
                Name: path,
                Op:   fsnotify.Create,
            }
            //TODO: Handle errors by taking corrective measures
            if err := w.handleCreateEvent(event); err != nil {
                klog.Errorf("error %v when handling create event: %s", err, event)
            }
        default:
            klog.V(5).Infof("Ignoring file %s with mode %v", path, mode)
        }

        return nil
    })
}
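A stripped-down version of the same walk can be written with only the standard library. This is a sketch under assumptions: `findSockets` is a hypothetical helper that only collects socket paths, whereas the function above also adds a watch on every directory and feeds each socket into the create-event handler. The directory layout in the demo is made up.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// findSockets walks root and returns the paths of all Unix socket files.
// Errors below root are skipped, an error on root itself aborts the walk.
func findSockets(root string) ([]string, error) {
	var sockets []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			if path == root {
				return err
			}
			return nil
		}
		if info.Mode()&os.ModeSocket != 0 {
			sockets = append(sockets, path)
		}
		return nil
	})
	return sockets, err
}

// demoFindSockets builds a temporary tree with one regular file and one
// socket in a subdirectory, then returns the base names of the sockets found.
func demoFindSockets() ([]string, error) {
	root, err := os.MkdirTemp("", "plugins")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	sub := filepath.Join(root, "vendor")
	if err := os.Mkdir(sub, 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(filepath.Join(root, "notes.txt"), []byte("x"), 0o644); err != nil {
		return nil, err
	}
	l, err := net.Listen("unix", filepath.Join(sub, "gpu.sock"))
	if err != nil {
		return nil, err
	}
	defer l.Close()

	found, err := findSockets(root)
	if err != nil {
		return nil, err
	}
	// Return base names so the result does not depend on the temp path.
	var names []string
	for _, p := range found {
		names = append(names, filepath.Base(p))
	}
	return names, nil
}

func main() {
	names, err := demoFindSockets()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sockets found:", names)
}
```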

2.5 register event callback

Registration only needs to hand the detected socket file path to the desired state, which manages it from there

func (w *Watcher) handlePluginRegistration(socketPath string) error {
    if runtime.GOOS == "windows" {
        socketPath = util.NormalizePath(socketPath)
    }
    // Call expected state for update
    klog.V(2).Infof("Adding socket path or updating timestamp %s to desired state cache", socketPath)
    err := w.desiredStateOfWorld.AddOrUpdatePlugin(socketPath)
    if err != nil {
        return fmt.Errorf("error adding socket path %s or updating timestamp to desired state cache: %v", socketPath, err)
    }
    return nil
}

2.6 delete event callback

Deletion is the mirror image of registration: the socket file path is simply removed from the desired state

func (w *Watcher) handleDeleteEvent(event fsnotify.Event) {
    klog.V(6).Infof("Handling delete event: %v", event)

    socketPath := event.Name
    klog.V(2).Infof("Removing socket path %s from desired state cache", socketPath)
    w.desiredStateOfWorld.RemovePlugin(socketPath)
}

3. Expected state and actual state

3.1 plug-in information

The plug-in information stores only the path of the corresponding socket and the time of the latest update

type PluginInfo struct {
    SocketPath string
    Timestamp  time.Time
}

3.2 expected state

The expected state and the actual state share the same data structure, because both simply store the current information about each plug-in (its socket path and update time), so the details are not covered here

type desiredStateOfWorld struct {
    socketFileToInfo map[string]PluginInfo
    sync.RWMutex
}
type actualStateOfWorld struct {
    socketFileToInfo map[string]PluginInfo
    sync.RWMutex
}
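To make the role of the embedded `sync.RWMutex` concrete, here is a minimal sketch of a cache with the same shape. The type name `stateCache` and the `PluginExists` helper are assumptions for illustration; only `AddOrUpdatePlugin` and `RemovePlugin` are names that appear in the watcher code above, and their bodies here are a guess at the simplest behavior consistent with it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// PluginInfo mirrors the struct shown in section 3.1.
type PluginInfo struct {
	SocketPath string
	Timestamp  time.Time
}

// stateCache is a hypothetical cache keyed by socket path.
type stateCache struct {
	sync.RWMutex
	socketFileToInfo map[string]PluginInfo
}

func newStateCache() *stateCache {
	return &stateCache{socketFileToInfo: make(map[string]PluginInfo)}
}

// AddOrUpdatePlugin stores the socket path with a fresh timestamp.
func (c *stateCache) AddOrUpdatePlugin(socketPath string) error {
	if socketPath == "" {
		return errors.New("socket path is empty")
	}
	c.Lock()
	defer c.Unlock()
	c.socketFileToInfo[socketPath] = PluginInfo{SocketPath: socketPath, Timestamp: time.Now()}
	return nil
}

// RemovePlugin deletes the entry; removing a missing path is a no-op.
func (c *stateCache) RemovePlugin(socketPath string) {
	c.Lock()
	defer c.Unlock()
	delete(c.socketFileToInfo, socketPath)
}

// PluginExists takes only the read lock, so lookups can run concurrently.
func (c *stateCache) PluginExists(socketPath string) bool {
	c.RLock()
	defer c.RUnlock()
	_, ok := c.socketFileToInfo[socketPath]
	return ok
}

func main() {
	c := newStateCache()
	if err := c.AddOrUpdatePlugin("/plugins/gpu.sock"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("exists after add:", c.PluginExists("/plugins/gpu.sock"))
	c.RemovePlugin("/plugins/gpu.sock")
	fmt.Println("exists after remove:", c.PluginExists("/plugins/gpu.sock"))
}
```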

4. OperationExecutor

k8s currently supports two types of plug-in: DevicePlugin, which is the subject of this article, and CSIPlugin. Each type is handled differently internally, so before performing an operation the kubelet needs to know which type the current plug-in is

This is the main job of the operation executor. It looks up the handler that corresponds to the plug-in type and generates the operation to be executed with that handler
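The lookup by plug-in type can be sketched as a map from type name to handler. Everything here is a simplified assumption: the `PluginHandler` interface is reduced to the two methods used in the registration code below, and `recordingHandler`, `registerByType`, and the sample names are hypothetical.

```go
package main

import "fmt"

// PluginHandler is a reduced, hypothetical version of a per-type handler.
type PluginHandler interface {
	ValidatePlugin(name, endpoint string, versions []string) error
	RegisterPlugin(name, endpoint string, versions []string) error
}

// recordingHandler is a toy handler that remembers what it registered.
type recordingHandler struct {
	registered []string
}

func (h *recordingHandler) ValidatePlugin(name, endpoint string, versions []string) error {
	if len(versions) == 0 {
		return fmt.Errorf("plug-in %s reports no supported versions", name)
	}
	return nil
}

func (h *recordingHandler) RegisterPlugin(name, endpoint string, versions []string) error {
	h.registered = append(h.registered, name)
	return nil
}

// registerByType picks the handler for pluginType, validates the plug-in,
// and then registers it. An unknown type is an error.
func registerByType(handlers map[string]PluginHandler, pluginType, name, endpoint string, versions []string) error {
	handler, ok := handlers[pluginType]
	if !ok {
		return fmt.Errorf("no handler registered for plugin type %q", pluginType)
	}
	if err := handler.ValidatePlugin(name, endpoint, versions); err != nil {
		return fmt.Errorf("validation failed: %w", err)
	}
	return handler.RegisterPlugin(name, endpoint, versions)
}

func main() {
	device := &recordingHandler{}
	handlers := map[string]PluginHandler{"DevicePlugin": device}

	err := registerByType(handlers, "DevicePlugin", "example.com/gpu", "/plugins/gpu.sock", []string{"v1beta1"})
	fmt.Println("known type:", err)

	err = registerByType(handlers, "UnknownPlugin", "example.com/x", "/plugins/x.sock", []string{"v1"})
	fmt.Println("unknown type:", err)

	fmt.Println("registered:", device.registered)
}
```

Keeping the type-specific logic behind one interface means the generated operation never needs to branch on the plug-in type itself.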

4.1 generate callback function of registration plug-in

4.1.1 connect the corresponding plug-in service through socket

    registerPluginFunc := func() error {
        client, conn, err := dial(socketPath, dialTimeoutDuration)
        if err != nil {
            return fmt.Errorf("RegisterPlugin error -- dial failed at socket %s, err: %v", socketPath, err)
        }
        defer conn.Close()

        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()

        infoResp, err := client.GetInfo(ctx, &registerapi.InfoRequest{})
        if err != nil {
            return fmt.Errorf("RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket %s, err: %v", socketPath, err)
        }

4.1.2 verifying services by plug-in type

        handler, ok := pluginHandlers[infoResp.Type]
        if !ok {
            if err := og.notifyPlugin(client, false, fmt.Sprintf("RegisterPlugin error -- no handler registered for plugin type: %s at socket %s", infoResp.Type, socketPath)); err != nil {
                return fmt.Errorf("RegisterPlugin error -- failed to send error at socket %s, err: %v", socketPath, err)
            }
            return fmt.Errorf("RegisterPlugin error -- no handler registered for plugin type: %s at socket %s", infoResp.Type, socketPath)
        }

        if infoResp.Endpoint == "" {
            infoResp.Endpoint = socketPath
        }
        if err := handler.ValidatePlugin(infoResp.Name, infoResp.Endpoint, infoResp.SupportedVersions); err != nil {
            if err = og.notifyPlugin(client, false, fmt.Sprintf("RegisterPlugin error -- plugin validation failed with err: %v", err)); err != nil {
                return fmt.Errorf("RegisterPlugin error -- failed to send error at socket %s, err: %v", socketPath, err)
            }
            return fmt.Errorf("RegisterPlugin error -- pluginHandler.ValidatePluginFunc failed")
        }

4.1.3 register the plug-in to the actual state

        err = actualStateOfWorldUpdater.AddPlugin(cache.PluginInfo{
            SocketPath: socketPath,
            Timestamp:  timestamp,
        })
        if err != nil {
            klog.Errorf("RegisterPlugin error -- failed to add plugin at socket %s, err: %v", socketPath, err)
        }
        // Call the registration callback of the plug-in handler
        if err := handler.RegisterPlugin(infoResp.Name, infoResp.Endpoint, infoResp.SupportedVersions); err != nil {
            return og.notifyPlugin(client, false, fmt.Sprintf("RegisterPlugin error -- plugin registration failed with err: %v", err))
        }

4.1.4 notify corresponding service registration succeeded

        if err := og.notifyPlugin(client, true, ""); err != nil {
            return fmt.Errorf("RegisterPlugin error -- failed to send registration status at socket %s, err: %v", socketPath, err)
        }

        return nil
    }

4.2 build registered client through socket

func dial(unixSocketPath string, timeout time.Duration) (registerapi.RegistrationClient, *grpc.ClientConn, error) {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    c, err := grpc.DialContext(ctx, unixSocketPath, grpc.WithInsecure(), grpc.WithBlock(),
        grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
            return (&net.Dialer{}).DialContext(ctx, "unix", addr)
        }),
    )

    if err != nil {
        return nil, nil, fmt.Errorf("failed to dial socket %s, err: %v", unixSocketPath, err)
    }

    return registerapi.NewRegistrationClient(c), c, nil
}
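The core of this function is the custom dialer, which connects to a Unix socket path under a timeout. The sketch below isolates that part with the standard library only, leaving gRPC out; `dialUnix` and the socket names are hypothetical, and the listener in the demo stands in for a plug-in service.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// dialUnix connects to a Unix socket, giving up after timeout.
// The context only bounds the dial; it does not affect the connection later.
func dialUnix(socketPath string, timeout time.Duration) (net.Conn, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
}

// demoDial dials one socket that has a listener and one that does not exist.
// It returns whether the first dial connected and the error from the second.
func demoDial() (connected bool, missingErr error, err error) {
	dir, err := os.MkdirTemp("", "plugins")
	if err != nil {
		return false, nil, err
	}
	defer os.RemoveAll(dir)

	socketPath := filepath.Join(dir, "example.sock")
	l, err := net.Listen("unix", socketPath)
	if err != nil {
		return false, nil, err
	}
	defer l.Close()

	conn, err := dialUnix(socketPath, time.Second)
	if err != nil {
		return false, nil, err
	}
	conn.Close()

	_, missingErr = dialUnix(filepath.Join(dir, "missing.sock"), time.Second)
	return true, missingErr, nil
}

func main() {
	connected, missingErr, err := demoDial()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("connected to live socket:", connected)
	fmt.Println("dial of missing socket failed:", missingErr != nil)
}
```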

That's all for today. The next chapter will show how the components above are combined and how the default callback management mechanism is implemented. Thank you for reading this far. Sharing this post is free, so please pass it on

k8s source reading e-book address: https://www.yuque.com/baxiaoshi/tyado3

Topics: socket kubelet inotify network