At present, k8s mainly supports two kinds of resources: CPU and memory. To support scheduling and on-demand allocation of other hardware resources, k8s implements the device plugin framework, which integrates other types of hardware. GPUs used for machine learning are a typical example. Today, let's look at the key parts of its internal implementation.
1. Basic concepts
1.1 Integration mode
1.1.1 DaemonSet and services
To integrate local hardware resources, we can run a gRPC service on each node through a DaemonSet. This service reports and allocates the node's local hardware resources.
1.1.2 Service registration design
Before the hardware service can communicate with the kubelet, it must register. Registration relies on two basic mechanisms: a socket file, and the inotify mechanism of the Linux file system, which lets the kubelet notice when that file appears.
1.2 Plug-in service awareness
1.2.1 Watcher
The Watcher is responsible for detecting the services registered on the current node. When it discovers a newly registered plug-in service, it generates the corresponding event and registers the service with the current kubelet.
1.2.2 Expected state and actual state
The state here mainly refers to whether registration is needed. The kubelet talks to the plug-in service over a gRPC connection, so when the connection has a problem or the plug-in service misbehaves, registration may fail even though the service's socket still exists, that is, the plug-in service itself is still present.
This gives two states: the expected (desired) state and the actual state. Because the socket exists, the expected state says that the plug-in service should be registered, but for some reason the registration has not actually completed. From then on, the actual state is continuously adjusted toward the expected state until the two agree.
1.2.3 Coordinator
The coordinator (the reconciler in the kubelet source) is the core component that drives the actual state toward the expected state. It brings the two into agreement by calling the callback functions of the corresponding plug-in, which in practice means calling the corresponding gRPC interface.
1.2.4 Plug-in controller
Each type of plug-in has a corresponding controller, which implements device registration and deregistration and handles the allocation and collection of the underlying resources (ListAndWatch).
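A per-type controller can be pictured as an interface with validate, register and deregister callbacks. The sketch below is an assumption-laden illustration: the interface name pluginHandler, the toy memoryHandler, and the version string v1beta1 are chosen for the example, although the three method names match the handler calls that appear in the registration code in section 4.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginHandler sketches what a per-type controller exposes to the
// registration machinery. The interface name is illustrative.
type pluginHandler interface {
	ValidatePlugin(name, endpoint string, versions []string) error
	RegisterPlugin(name, endpoint string, versions []string) error
	DeRegisterPlugin(name string)
}

// memoryHandler is a toy handler that records registered plug-ins in a map.
type memoryHandler struct {
	supported  string            // the single API version this handler accepts
	registered map[string]string // plug-in name -> endpoint
}

// ValidatePlugin accepts a plug-in only if it has a name and offers the
// version this handler supports.
func (h *memoryHandler) ValidatePlugin(name, endpoint string, versions []string) error {
	if name == "" {
		return errors.New("plugin name is empty")
	}
	for _, v := range versions {
		if v == h.supported {
			return nil
		}
	}
	return fmt.Errorf("plugin %s does not offer version %s", name, h.supported)
}

// RegisterPlugin remembers where the plug-in can be reached.
func (h *memoryHandler) RegisterPlugin(name, endpoint string, versions []string) error {
	h.registered[name] = endpoint
	return nil
}

// DeRegisterPlugin forgets the plug-in.
func (h *memoryHandler) DeRegisterPlugin(name string) {
	delete(h.registered, name)
}

func main() {
	mh := &memoryHandler{supported: "v1beta1", registered: map[string]string{}}
	var h pluginHandler = mh

	versions := []string{"v1beta1"}
	if err := h.ValidatePlugin("example.com/gpu", "/plugins/gpu.sock", versions); err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	if err := h.RegisterPlugin("example.com/gpu", "/plugins/gpu.sock", versions); err != nil {
		fmt.Println("registration failed:", err)
		return
	}
	fmt.Println("registered plug-ins:", len(mh.registered))
	h.DeRegisterPlugin("example.com/gpu")
	fmt.Println("registered plug-ins:", len(mh.registered))
}
```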
2. Plug-in service discovery
2.1 Core data structure
type Watcher struct {
    // Path of the directory watched for plug-in service sockets
    path string
    fs   utilfs.Filesystem
    // inotify-based watcher for plug-in service socket changes
    fsWatcher *fsnotify.Watcher
    stopped   chan struct{}
    // Stores the expected (desired) state
    desiredStateOfWorld cache.DesiredStateOfWorld
}
2.2 Initialization
Initialization simply creates the plug-in directory.
func (w *Watcher) init() error {
    klog.V(4).Infof("Ensuring Plugin directory at %s ", w.path)
    if err := w.fs.MkdirAll(w.path, 0755); err != nil {
        return fmt.Errorf("error (re-)creating root %s: %v", w.path, err)
    }
    return nil
}
2.3 Plug-in service discovery core
go func(fsWatcher *fsnotify.Watcher) {
    defer close(w.stopped)
    for {
        select {
        case event := <-fsWatcher.Events:
            // When a file in the watched directory changes, the corresponding event is triggered
            if event.Op&fsnotify.Create == fsnotify.Create {
                err := w.handleCreateEvent(event)
                if err != nil {
                    klog.Errorf("error %v when handling create event: %s", err, event)
                }
            } else if event.Op&fsnotify.Remove == fsnotify.Remove {
                w.handleDeleteEvent(event)
            }
            continue
        case err := <-fsWatcher.Errors:
            if err != nil {
                klog.Errorf("fsWatcher received error: %v", err)
            }
            continue
        case <-stopCh:
            // In case of plugin watcher being stopped by plugin manager, stop
            // probing the creation/deletion of plugin sockets.
            // Also give all pending go routines a chance to complete
            select {
            case <-w.stopped:
            case <-time.After(11 * time.Second):
                klog.Errorf("timeout on stopping watcher")
            }
            w.fsWatcher.Close()
            return
        }
    }
}(fsWatcher)
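The dispatch pattern in this loop, testing the event's operation bitmask and calling a create or delete handler, can be tried out without fsnotify. In the sketch below the op and event types are local stand-ins defined for the example (they are not the fsnotify types), and run is a hypothetical helper.

```go
package main

import "fmt"

// op is a bitmask of file operations. It is a local stand-in for the
// watcher library's operation type, defined only for this sketch.
type op uint32

const (
	opCreate op = 1 << iota
	opWrite
	opRemove
)

// event pairs a file path with the operations observed on it.
type event struct {
	Name string
	Op   op
}

// run consumes events until the channel is closed and dispatches on the
// bitmask: create events go to onCreate, remove events to onRemove, and
// anything else is ignored.
func run(events <-chan event, onCreate, onRemove func(path string)) {
	for ev := range events {
		switch {
		case ev.Op&opCreate == opCreate:
			onCreate(ev.Name)
		case ev.Op&opRemove == opRemove:
			onRemove(ev.Name)
		}
	}
}

func main() {
	events := make(chan event, 3)
	events <- event{Name: "/plugins/a.sock", Op: opCreate}
	events <- event{Name: "/plugins/a.sock", Op: opWrite}
	events <- event{Name: "/plugins/a.sock", Op: opRemove}
	close(events)

	known := map[string]bool{}
	run(events,
		func(p string) { known[p] = true; fmt.Println("create", p) },
		func(p string) { delete(known, p); fmt.Println("remove", p) },
	)
	fmt.Println("sockets still known:", len(known))
}
```

Testing with a bitwise AND rather than equality matters because a single event may carry several operation bits at once.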
2.4 Compensation mechanism
The compensation mechanism re-registers the sockets that already exist with the current kubelet when the kubelet restarts.
func (w *Watcher) traversePluginDir(dir string) error {
    return w.fs.Walk(dir, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            if path == dir {
                return fmt.Errorf("error accessing path: %s error: %v", path, err)
            }
            klog.Errorf("error accessing path: %s error: %v", path, err)
            return nil
        }
        switch mode := info.Mode(); {
        case mode.IsDir():
            if err := w.fsWatcher.Add(path); err != nil {
                return fmt.Errorf("failed to watch %s, err: %v", path, err)
            }
        case mode&os.ModeSocket != 0:
            event := fsnotify.Event{
                Name: path,
                Op:   fsnotify.Create,
            }
            //TODO: Handle errors by taking corrective measures
            if err := w.handleCreateEvent(event); err != nil {
                klog.Errorf("error %v when handling create event: %s", err, event)
            }
        default:
            klog.V(5).Infof("Ignoring file %s with mode %v", path, mode)
        }
        return nil
    })
}
2.5 Register event callback
Registration only needs to pass the detected socket file path to the desired state for management.
func (w *Watcher) handlePluginRegistration(socketPath string) error {
    if runtime.GOOS == "windows" {
        socketPath = util.NormalizePath(socketPath)
    }
    // Update the expected (desired) state
    klog.V(2).Infof("Adding socket path or updating timestamp %s to desired state cache", socketPath)
    err := w.desiredStateOfWorld.AddOrUpdatePlugin(socketPath)
    if err != nil {
        return fmt.Errorf("error adding socket path %s or updating timestamp to desired state cache: %v", socketPath, err)
    }
    return nil
}
2.6 Delete event callback
Deletion only needs to remove the detected socket file path from the desired state.
func (w *Watcher) handleDeleteEvent(event fsnotify.Event) {
    klog.V(6).Infof("Handling delete event: %v", event)
    socketPath := event.Name
    klog.V(2).Infof("Removing socket path %s from desired state cache", socketPath)
    w.desiredStateOfWorld.RemovePlugin(socketPath)
}
3. Expected state and actual state
3.1 Plug-in information
The plug-in information stores only the path of the corresponding socket and the time of the latest update.
type PluginInfo struct {
    SocketPath string
    Timestamp  time.Time
}
3.2 Expected state
The expected state and the actual state share the same data structure, because in essence both only store the plug-in's current state information, i.e. the update time, so they are not covered in detail here.
type desiredStateOfWorld struct {
    socketFileToInfo map[string]PluginInfo
    sync.RWMutex
}
type actualStateOfWorld struct {
    socketFileToInfo map[string]PluginInfo
    sync.RWMutex
}
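A state cache of this shape, a map guarded by a read-write lock, could be implemented roughly as follows. This is a minimal sketch under stated assumptions: the type names desiredState and pluginInfo and the PluginExists helper are invented for the example, and only the method names AddOrUpdatePlugin and RemovePlugin come from the calls shown in section 2.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// pluginInfo stores a socket path and the time it was last updated.
type pluginInfo struct {
	SocketPath string
	Timestamp  time.Time
}

// desiredState is a concurrency-safe map from socket path to plug-in info.
type desiredState struct {
	mu               sync.RWMutex
	socketFileToInfo map[string]pluginInfo
}

func newDesiredState() *desiredState {
	return &desiredState{socketFileToInfo: make(map[string]pluginInfo)}
}

// AddOrUpdatePlugin inserts the socket path or refreshes its timestamp.
func (d *desiredState) AddOrUpdatePlugin(socketPath string) error {
	if socketPath == "" {
		return errors.New("socket path is empty")
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	d.socketFileToInfo[socketPath] = pluginInfo{SocketPath: socketPath, Timestamp: time.Now()}
	return nil
}

// RemovePlugin drops the socket path; removing an unknown path is a no-op.
func (d *desiredState) RemovePlugin(socketPath string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.socketFileToInfo, socketPath)
}

// PluginExists reports whether the socket path is currently tracked.
func (d *desiredState) PluginExists(socketPath string) bool {
	d.mu.RLock()
	defer d.mu.RUnlock()
	_, ok := d.socketFileToInfo[socketPath]
	return ok
}

func main() {
	d := newDesiredState()
	if err := d.AddOrUpdatePlugin("/plugins/a.sock"); err != nil {
		panic(err)
	}
	fmt.Println("after add:", d.PluginExists("/plugins/a.sock"))
	d.RemovePlugin("/plugins/a.sock")
	fmt.Println("after remove:", d.PluginExists("/plugins/a.sock"))
}
```

The lock is needed because the watcher goroutine writes to the cache while the reconciling side reads it.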
4. OperationExecutor
k8s currently supports two types of plug-in management. One is DevicePlugin, the concept discussed in this article. The other is CSIPlugin. Each type is handled differently internally, so before an operation runs we need to know which type the current plug-in is.
This is the operation executor's main job. It generates the operation to execute according to the plug-in type: it looks up the handler for that type and then builds the operation from it.
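The "look up a handler by type, then build an operation" step can be illustrated with a map of handlers and a closure. Everything in this sketch is hypothetical: generateRegister, the registerFunc type, and the simplified handler signature are assumptions for the example, and the type strings are used purely as map keys.

```go
package main

import "fmt"

// registerFunc is an operation that has been generated but not yet run.
type registerFunc func() error

// generateRegister returns a closure that, when executed, looks up the
// handler for pluginType and uses it to register the named plug-in.
// Handlers are simplified to plain functions for this sketch.
func generateRegister(handlers map[string]func(name string) error, pluginType, name string) registerFunc {
	return func() error {
		h, ok := handlers[pluginType]
		if !ok {
			return fmt.Errorf("no handler registered for plugin type %q", pluginType)
		}
		return h(name)
	}
}

func main() {
	handlers := map[string]func(name string) error{
		"DevicePlugin": func(name string) error {
			fmt.Println("device handler registers", name)
			return nil
		},
		"CSIPlugin": func(name string) error {
			fmt.Println("CSI handler registers", name)
			return nil
		},
	}

	// Generating the operation does nothing yet; executing it does the work.
	operation := generateRegister(handlers, "DevicePlugin", "example.com/gpu")
	if err := operation(); err != nil {
		fmt.Println("error:", err)
	}

	// An unknown type surfaces as an error when the operation runs.
	if err := generateRegister(handlers, "Unknown", "x")(); err != nil {
		fmt.Println("error:", err)
	}
}
```

Separating generation from execution lets the caller decide when, and how often, to run the operation, which suits a loop that retries failed registrations.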
4.1 Generate the callback function for plug-in registration
4.1.1 Connect to the plug-in service through the socket
registerPluginFunc := func() error {
    client, conn, err := dial(socketPath, dialTimeoutDuration)
    if err != nil {
        return fmt.Errorf("RegisterPlugin error -- dial failed at socket %s, err: %v", socketPath, err)
    }
    defer conn.Close()
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()
    infoResp, err := client.GetInfo(ctx, &registerapi.InfoRequest{})
    if err != nil {
        return fmt.Errorf("RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket %s, err: %v", socketPath, err)
    }
4.1.2 Verify the service by plug-in type
    handler, ok := pluginHandlers[infoResp.Type]
    if !ok {
        if err := og.notifyPlugin(client, false, fmt.Sprintf("RegisterPlugin error -- no handler registered for plugin type: %s at socket %s", infoResp.Type, socketPath)); err != nil {
            return fmt.Errorf("RegisterPlugin error -- failed to send error at socket %s, err: %v", socketPath, err)
        }
        return fmt.Errorf("RegisterPlugin error -- no handler registered for plugin type: %s at socket %s", infoResp.Type, socketPath)
    }
    if infoResp.Endpoint == "" {
        infoResp.Endpoint = socketPath
    }
    if err := handler.ValidatePlugin(infoResp.Name, infoResp.Endpoint, infoResp.SupportedVersions); err != nil {
        if err = og.notifyPlugin(client, false, fmt.Sprintf("RegisterPlugin error -- plugin validation failed with err: %v", err)); err != nil {
            return fmt.Errorf("RegisterPlugin error -- failed to send error at socket %s, err: %v", socketPath, err)
        }
        return fmt.Errorf("RegisterPlugin error -- pluginHandler.ValidatePluginFunc failed")
    }
4.1.3 Register the plug-in in the actual state
    err = actualStateOfWorldUpdater.AddPlugin(cache.PluginInfo{
        SocketPath: socketPath,
        Timestamp:  timestamp,
    })
    if err != nil {
        klog.Errorf("RegisterPlugin error -- failed to add plugin at socket %s, err: %v", socketPath, err)
    }
    // Call the plug-in's registration callback function
    if err := handler.RegisterPlugin(infoResp.Name, infoResp.Endpoint, infoResp.SupportedVersions); err != nil {
        return og.notifyPlugin(client, false, fmt.Sprintf("RegisterPlugin error -- plugin registration failed with err: %v", err))
    }
4.1.4 Notify the service that registration succeeded
    if err := og.notifyPlugin(client, true, ""); err != nil {
        return fmt.Errorf("RegisterPlugin error -- failed to send registration status at socket %s, err: %v", socketPath, err)
    }
4.2 Build the registration client through the socket
func dial(unixSocketPath string, timeout time.Duration) (registerapi.RegistrationClient, *grpc.ClientConn, error) {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()
    c, err := grpc.DialContext(ctx, unixSocketPath, grpc.WithInsecure(), grpc.WithBlock(),
        grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
            return (&net.Dialer{}).DialContext(ctx, "unix", addr)
        }),
    )
    if err != nil {
        return nil, nil, fmt.Errorf("failed to dial socket %s, err: %v", unixSocketPath, err)
    }
    return registerapi.NewRegistrationClient(c), c, nil
}
That's all for today. The next article will continue with how the components above are combined and how the default callback management mechanism is implemented. Thank you for reading.
k8s source reading e-book address: https://www.yuque.com/baxiaoshi/tyado3