k8s 008: Scheduling

Posted by pl_towers on Wed, 09 Mar 2022 11:51:31 +0100

Pod scheduling strategies in Kubernetes

  The Scheduler is the Kubernetes component whose main task is to assign the defined Pods to the nodes of the cluster. This sounds simple, but there are many questions to consider:

  • Fairness: how to ensure that every node can be allocated resources
  • Efficient resource utilization: all resources in the cluster should be used to the maximum extent
  • Efficiency: scheduling should perform well and be able to schedule large numbers of Pods as quickly as possible
  • Flexibility: allow users to control the scheduling logic according to their own needs

   The Scheduler runs as a separate program. After startup, it continuously watches the API Server for Pods whose PodSpec.NodeName is empty, and for each such Pod it creates a binding indicating which node the Pod should be placed on.

1, Scheduling process

Scheduling is divided into two phases: filtering and scoring.

First, nodes that do not meet the Pod's requirements are filtered out; this phase is called predicate. Then the nodes that pass are scored and sorted by priority, and the node with the highest priority is selected. If any step returns an error, the error is returned directly.

Ⅰ. Predicate

The predicate phase uses a series of algorithms, including:

  • PodFitsResources: whether the remaining resources on the node are greater than the resources requested by the Pod.
  • PodFitsHost: if NodeName is specified in the Pod, check whether the node name matches NodeName.
  • PodFitsHostPorts: whether the ports already used on the node conflict with the ports requested by the Pod.
  • PodSelectorMatches: filter out nodes that do not match the labels specified by the Pod.
  • NoDiskConflict: the volumes already mounted on the node must not conflict with the volumes specified by the Pod, unless both are read-only.

If no node is suitable after the predicate phase, the Pod remains in the Pending state and scheduling is retried until some node meets the conditions. If multiple nodes meet the conditions after this step, the priorities process continues: the nodes are sorted by priority.
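
As a concrete illustration, PodFitsResources and PodFitsHostPorts look at the Pod's resource requests and host ports. A minimal sketch (the request values and the host port are arbitrary assumptions): only nodes with at least this much free CPU and memory, and with port 8080 not yet in use, pass the filter for this Pod.

apiVersion: v1
kind: Pod
metadata:
  name: predicate-demo
spec:
  containers:
  - name: app
    image: hub.atguigu.com/library/myapp:v1
    resources:
      requests:        # checked by PodFitsResources against the node's free resources
        cpu: "100m"
        memory: "128Mi"
    ports:
    - containerPort: 80
      hostPort: 8080   # checked by PodFitsHostPorts against ports already in use on the node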

Ⅱ. Priority

Priority consists of a series of key-value pairs, where the key is the name of a priority item and the value is its weight (the importance of that item). These priority options include:

  • LeastRequestedPriority: the score is determined by calculating the CPU and Memory utilization of the node; the lower the utilization, the higher the score. In other words, this priority item favors nodes with lower resource utilization.
  • BalancedResourceAllocation: the closer the CPU and Memory utilization on the node are to each other, the higher the score. This should be used together with the item above, not on its own.
  • ImageLocalityPriority: favors nodes that already have the Pod's images; the larger the total size of the already-present images, the higher the score.

The scheduler evaluates all priority items, multiplies each score by its weight, and sums the results to obtain the final score for each node.
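
As a rough sketch of that weighted sum (an illustration, not the scheduler's exact implementation), the final score of a node can be written as:

$$\mathrm{finalScore}(node) = \sum_{i} w_i \cdot \mathrm{score}_i(node)$$

where $w_i$ is the weight of priority item $i$ and $\mathrm{score}_i(node)$ is the score that item assigns to the node; the node with the highest final score is chosen.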

Ⅲ. Custom scheduler

In addition to Kubernetes' built-in scheduler, you can also write your own scheduler. By specifying the scheduler's name through the spec.schedulerName field, you can select which scheduler schedules the Pod.
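
A minimal sketch, assuming a custom scheduler registered under the name my-scheduler (the name is a placeholder and must match whatever your custom scheduler uses):

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-scheduler   ## handled by the custom scheduler instead of the default one
  containers:
  - name: app
    image: hub.atguigu.com/library/myapp:v1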

2, Affinity

Affinity is an attribute of the Pod (a preference or a hard requirement) that makes the Pod attracted to a particular class of nodes.

Ⅰ. Node affinity

pod.spec.nodeAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft policy.
  • requiredDuringSchedulingIgnoredDuringExecution: hard policy.
  • The two strategies can be used together (a combined sketch follows at the end of this subsection).

requiredDuringSchedulingIgnoredDuringExecution

Hard policy: if the conditions are not met, the Pod stays Pending and waits for the conditions to be met.

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    nodeAffinity:     ## node affinity hard strategy
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02

preferredDuringSchedulingIgnoredDuringExecution

Soft policy: if no node meets the conditions, the Pod is scheduled anyway; the preference is simply ignored.

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    nodeAffinity:     ## node affinity soft strategy
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - k8s-node02

  • Key-value operator relationships:
    • In: the value of the label is in a given list
    • NotIn: the value of the label is not in a given list
    • Gt: the value of the label is greater than a certain value
    • Lt: the value of the label is less than a certain value
    • Exists: a label with the given key exists
    • DoesNotExist: a label with the given key does not exist
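
As noted above, the hard and soft policies can be combined in one Pod. A minimal sketch reusing the keys and values from the two examples above: the Pod must not land on k8s-node02, and among the remaining nodes it prefers ones labeled source=k8s-node02.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-combined
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    ## hard policy: never on k8s-node02
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
      preferredDuringSchedulingIgnoredDuringExecution:   ## soft policy: prefer nodes labeled source=k8s-node02
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - k8s-node02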

Ⅱ. Pod affinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft policy
  • requiredDuringSchedulingIgnoredDuringExecution: hard policy
  • The two strategies can also be used together.

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    podAffinity:        #  pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:    #  pod anti affinity
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm: 
          labelSelector:
            matchExpressions: 
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname
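
To observe the effect, this assumes Pods labeled app=pod-1 and app=pod-2 already exist in the cluster; which node each Pod ends up on can be checked with:

# Show the node and labels for every Pod
kubectl get pod -o wide --show-labels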

The comparison of affinity / anti affinity scheduling strategies is as follows:

| Scheduling strategy | Matches labels of | Operators | Topology domain support | Scheduling target |
| --- | --- | --- | --- | --- |
| nodeAffinity | node | In, NotIn, Exists, DoesNotExist, Gt, Lt | No | Schedule the Pod onto a specified host |
| podAffinity | Pod | In, NotIn, Exists, DoesNotExist | Yes | Schedule the Pod into the same topology domain as the specified Pod |
| podAntiAffinity | Pod | In, NotIn, Exists, DoesNotExist | Yes | Schedule the Pod into a different topology domain from the specified Pod |

3, Taint and Toleration

Node affinity is an attribute of the Pod (a preference or a hard requirement) that makes the Pod attracted to a particular class of nodes. A taint does the opposite: it lets a node repel a particular class of Pods.

Scope: the Node.

Taints and tolerations work together to keep Pods off inappropriate nodes. One or more taints can be applied to each node, which means the node will not accept Pods that cannot tolerate those taints. If tolerations are applied to Pods, it means those Pods can (but are not required to) be scheduled onto nodes with matching taints.

Ⅰ. Taint

Composition of a taint

The kubectl taint command can be used to set taints on a Node. Once a taint is set, there is a mutually exclusive relationship between the Node and Pods, which lets the Node refuse to schedule certain Pods and even evict Pods already running on it.

Each taint is composed as follows:

key=value:effect

Each taint has a key and a value as its label, where value can be empty; effect describes what the taint does. Currently, the taint effect supports the following three options:

  • NoSchedule: Kubernetes will not schedule the Pod onto a Node with this taint.
  • PreferNoSchedule: Kubernetes will try to avoid scheduling the Pod onto a Node with this taint.
  • NoExecute: Kubernetes will not schedule the Pod onto a Node with this taint, and Pods already running on the Node will be evicted.

Setting, viewing and removing taints

# Set a taint
kubectl taint nodes node1 key1=value1:NoSchedule
# View taints: in the node description, look for the Taints field
kubectl describe node node1
# Remove a taint
kubectl taint nodes node1 key1:NoSchedule-

Typical use case for taints: a NoExecute taint evicts all Pods from the current node (they are rescheduled elsewhere), so the node's settings can be updated without interrupting the business.
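
A minimal sketch of that workflow, assuming a node named node1 and an arbitrary taint key maintenance (both are placeholders):

# Evict Pods that do not tolerate the taint and keep new Pods away
kubectl taint nodes node1 maintenance=true:NoExecute
# ... update the node ...
# Remove the taint so the node can accept Pods again
kubectl taint nodes node1 maintenance=true:NoExecute-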

Ⅱ. Tolerations

A Node with taints set repels Pods according to the taint effects (NoSchedule, PreferNoSchedule, NoExecute), so to some extent Pods will not be scheduled onto that Node. However, we can set tolerations on a Pod: a Pod with a matching toleration can tolerate the taints and can be scheduled onto the tainted Node.

pod.spec.tolerations

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
  tolerationSeconds: 3600
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

  • The key, value and effect must match the taint set on the Node.
  • If the value of operator is Exists, the value field is ignored.
  • tolerationSeconds describes how long the Pod can keep running on the Node after it is marked for eviction.

1. When the key is not specified, all taint keys are tolerated

tolerations:
- operator: "Exists"

2. When the effect is not specified, all taint effects are tolerated

tolerations:
- key: "key"
  operator: "Exists"

3. When there are multiple master nodes, the following setting can be used to avoid wasting their resources (Pods are scheduled onto the masters only when necessary)

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule

4, Specify scheduling node

Ⅰ. Pod.spec.nodeName

Pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's scheduling policies entirely; the match is forced.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  selector:
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-node01
      containers:
      - name: myweb
        image: hub.atguigu.com/library/myapp:v1
        ports:
        - containerPort: 80
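
Assuming the cluster really has a node named k8s-node01, all 7 replicas land on it regardless of its load; this can be verified with:

# Show which node each replica was placed on
kubectl get pod -o wide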

Ⅱ. Pod.spec.nodeSelector

Pod.spec.nodeSelector: selects nodes through Kubernetes' label-selector mechanism. The scheduler matches node labels as part of its scheduling policy and then schedules the Pod onto the target node; this is a hard constraint.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        type: backEndNode1
      containers:
        - name: myweb
          image: harbor/tomcat:8.5-jre8
          ports:
          - containerPort: 80
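
For this Deployment to be scheduled, at least one node must carry the label type=backEndNode1. Assuming that node is k8s-node01 (the node name is a placeholder), the label can be added with:

# Label the target node so the nodeSelector can match it
kubectl label nodes k8s-node01 type=backEndNode1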