Linux Process Scheduler - Basic

Posted by psych0 on Tue, 04 Jan 2022 08:37:11 +0100

  • Read the fucking source code! -- By Lu Xun
  • A picture is worth a thousand words. -- By Golgi


  1. Kernel version: 4.14
  2. ARM64 processor, Cortex-A53, dual core
  3. Tools used: Source Insight 3.5, Visio

1. General

This article begins a series of studies on the Linux scheduler.
It starts with some basic concepts and data structures to sketch a rough outline; subsequent articles will go progressively deeper.

2. Concept

2.1 process

  • As the textbooks put it, the process is the smallest unit of resource allocation and the thread is the smallest unit of CPU scheduling.
  • A process includes not only the code segment of the executable program but also a set of resources, such as open files, memory, CPU time, semaphores and one or more threads of execution. Threads share the resources of the process they belong to.
  • In the Linux kernel, both processes and threads are described by struct task_struct.
  • The virtual address space of a process is divided into user virtual address space and kernel virtual address space. All processes share the kernel virtual address space. Processes without a user virtual address space are called kernel threads.

The Linux kernel describes a process with the task_struct structure, which contains all of the information and resources of the process, such as process state, open files, address space information and signal resources. task_struct is very complex, so only some of the scheduling-related fields are introduced below.

struct task_struct {
	/* ... */

	/* Process state */
	volatile long			state;

	/* Scheduling priority and policy */
	int				prio;
	int				static_prio;
	int				normal_prio;
	unsigned int			rt_priority;
	unsigned int			policy;

	/* Scheduling class, scheduling entities, task group, etc. */
	const struct sched_class	*sched_class;
	struct sched_entity		se;
	struct sched_rt_entity		rt;
	struct task_group		*sched_task_group;
	struct sched_dl_entity		dl;

	/* Relationships between processes */

	/* Real parent process: */
	struct task_struct __rcu	*real_parent;

	/* Recipient of SIGCHLD, wait4() reports: */
	struct task_struct __rcu	*parent;

	/* Children/sibling form the list of natural children: */
	struct list_head		children;
	struct list_head		sibling;
	struct task_struct		*group_leader;

	/* ... */
};
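
The three priority fields are easier to follow once you see how they are derived from each other. The following is a minimal user-space sketch, assuming the 4.14 rules as we read them: static_prio is nice + 120, a real-time task gets MAX_RT_PRIO - 1 - rt_priority, and a deadline task gets MAX_DL_PRIO - 1. The toy_ names and the toy_policy enum are hypothetical and exist only for this illustration; priority inheritance, which can temporarily change prio, is not modelled.

```c
#include <assert.h>

/* Constants with the values used by the 4.14 kernel headers. */
#define MAX_DL_PRIO   0
#define MAX_RT_PRIO   100
#define DEFAULT_PRIO  120   /* priority of a nice-0 task */

/* Hypothetical policy tags for this sketch; not the kernel's SCHED_* values. */
enum toy_policy { TOY_NORMAL, TOY_RT, TOY_DEADLINE };

/* Hypothetical cut-down stand-in for the priority fields of task_struct. */
struct toy_task {
    enum toy_policy policy;
    int nice;                  /* -20 .. 19, meaningful for TOY_NORMAL */
    unsigned int rt_priority;  /* 1 .. 99, meaningful for TOY_RT */
    int static_prio;
    int normal_prio;
    int prio;
};

/* Same arithmetic as the kernel's NICE_TO_PRIO() macro. */
int toy_nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return nice + DEFAULT_PRIO;
}

/* Fill in the derived fields; a lower number means a higher priority. */
void toy_set_prio(struct toy_task *p)
{
    p->static_prio = toy_nice_to_prio(p->nice);

    if (p->policy == TOY_DEADLINE)
        p->normal_prio = MAX_DL_PRIO - 1;
    else if (p->policy == TOY_RT)
        p->normal_prio = MAX_RT_PRIO - 1 - (int)p->rt_priority;
    else
        p->normal_prio = p->static_prio;

    /* Without a priority-inheritance boost, prio equals normal_prio. */
    p->prio = p->normal_prio;
}
```

Under these assumptions a nice-0 ordinary task ends up with prio 120, a real-time task with rt_priority 50 ends up with prio 49, and a deadline task ends up with prio -1, so deadline tasks sort ahead of real-time tasks, which in turn sort ahead of ordinary tasks.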

2.2 process status

  • In the figure above, the left side shows the classic three-state process model from operating system textbooks, and the right side shows the corresponding state transitions in Linux. Each flag describes the current state of the process, and the states are mutually exclusive;
  • In Linux, both the ready state and the running state correspond to the TASK_RUNNING flag: ready means the process is on the run queue but has not been scheduled yet, while running means the process is currently executing on a CPU;

The main state flags in the kernel are defined as follows:

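/* Used in tsk->state: */
#define TASK_RUNNING			0x0000
#define TASK_INTERRUPTIBLE		0x0001
#define TASK_UNINTERRUPTIBLE		0x0002
#define __TASK_STOPPED			0x0004
#define __TASK_TRACED			0x0008

/* Used in tsk->exit_state: */
#define EXIT_DEAD			0x0010
#define EXIT_ZOMBIE			0x0020

/* Used in tsk->state again: */
#define TASK_PARKED			0x0040
#define TASK_DEAD			0x0080
#define TASK_WAKEKILL			0x0100
#define TASK_WAKING			0x0200
#define TASK_NOLOAD			0x0400
#define TASK_NEW			0x0800
#define TASK_STATE_MAX			0x1000

/* Convenience macros for the sake of set_current_state: */
#define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)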
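
Because the state values are individual bits, the kernel can combine them and test them with masks. The sketch below is a user-space illustration under that assumption: it copies a few values from the 4.14 header and shows how a wake-up that targets a set of states can be matched against a sleeping task's state. The toy_ helpers are hypothetical; the mask test is our reading of how the wake-up path compares states, not a copy of kernel code.

```c
#include <assert.h>

/* Values as in the 4.14 include/linux/sched.h. */
#define TASK_RUNNING          0x0000
#define TASK_INTERRUPTIBLE    0x0001
#define TASK_UNINTERRUPTIBLE  0x0002
#define TASK_PARKED           0x0040
#define TASK_WAKEKILL         0x0100

/* Combined masks, also taken from the same header. */
#define TASK_NORMAL    (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
#define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical helper: ready and running both map to TASK_RUNNING. */
int toy_is_runnable(long state)
{
    return state == TASK_RUNNING;
}

/*
 * Hypothetical helper: does a wake-up aimed at the states in `mask`
 * apply to a task whose current state is `state`?
 */
int toy_wakeup_matches(long state, long mask)
{
    assert(mask != 0);   /* a wake-up must target at least one state */
    return (state & mask) != 0;
}
```

With these definitions, a wake-up aimed at TASK_NORMAL matches both an interruptible sleeper and a TASK_KILLABLE sleeper, because the latter carries the TASK_UNINTERRUPTIBLE bit. It never matches a task that is already TASK_RUNNING, since that state is zero.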


2.3 scheduler

  • Scheduling means selecting a process from the ready queue according to a scheduling algorithm and assigning a CPU to it, mainly in order to coordinate the use of resources such as CPUs. The goal of process scheduling is to maximize CPU utilization.

The kernel provides five schedulers by default. The Linux kernel abstracts a scheduler with struct sched_class:

  1. Stop scheduler, stop_sched_class: the scheduling class with the highest priority; it can preempt all other processes and cannot be preempted by any of them;
  2. Deadline scheduler, dl_sched_class: sorts processes by absolute deadline in a red-black tree and selects the process with the earliest deadline to run;
  3. RT scheduler, rt_sched_class: the real-time scheduler, which maintains one queue per priority;
  4. CFS scheduler, fair_sched_class: the Completely Fair Scheduler, which uses a completely fair scheduling algorithm and introduces the concept of virtual runtime;
  5. Idle task scheduler, idle_sched_class: each CPU has an idle thread, which is scheduled to run when no other process can be scheduled;
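
The idea of a scheduling class can be sketched as a table of function pointers chained in priority order. The following toy model makes that assumption explicit: each class has a pick hook and a link to the next lower class, and the core walks the chain until some class returns a task. Only three classes are modelled (the stop and deadline classes are left out for brevity), and every toy_ name is hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq;

/* Hypothetical miniature of struct sched_class. */
struct toy_sched_class {
    const char *name;
    const struct toy_sched_class *next;   /* next lower-priority class */
    /* Returns a task id to run, or -1 if this class has nothing runnable. */
    int (*pick_next_task)(const struct toy_rq *rq);
};

/* Toy run queue: one pending task id per class, -1 meaning "none". */
struct toy_rq {
    int rt_task;
    int fair_task;
    int idle_task;   /* the idle thread, always present */
};

int toy_pick_rt(const struct toy_rq *rq)   { return rq->rt_task; }
int toy_pick_fair(const struct toy_rq *rq) { return rq->fair_task; }
int toy_pick_idle(const struct toy_rq *rq) { return rq->idle_task; }

/* Classes are chained from highest to lowest priority. */
const struct toy_sched_class toy_idle_class = { "idle", NULL, toy_pick_idle };
const struct toy_sched_class toy_fair_class = { "fair", &toy_idle_class, toy_pick_fair };
const struct toy_sched_class toy_rt_class   = { "rt", &toy_fair_class, toy_pick_rt };

/* Ask each class in priority order; the first one with a task wins. */
int toy_pick_next_task(const struct toy_rq *rq,
                       const struct toy_sched_class **picked_from)
{
    const struct toy_sched_class *class;

    for (class = &toy_rt_class; class != NULL; class = class->next) {
        int task = class->pick_next_task(rq);

        if (task >= 0) {
            if (picked_from)
                *picked_from = class;
            return task;
        }
    }
    assert(!"the idle class must always have a task");
    return -1;
}
```

In this model a runnable real-time task always wins over a CFS task, and the idle thread is chosen only when every other class comes back empty, which matches the priority order of the list above.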

The Linux kernel provides several scheduling policies through which a user program selects a scheduler. The stop scheduler and the idle task scheduler are used only by the kernel and cannot be selected by the user:

  • SCHED_DEADLINE: the deadline scheduling policy; the task is scheduled by the deadline scheduler;
  • SCHED_RR: a real-time scheduling policy with round-robin time slices. When a process uses up its time slice, it is moved to the tail of the run queue for its priority and yields the CPU to other processes of the same priority;
  • SCHED_FIFO: a real-time scheduling policy with first-in, first-out ordering and no time slice. Unless a higher-priority process preempts it, a running process keeps the CPU until it gives it up voluntarily;
  • SCHED_NORMAL: the ordinary scheduling policy; the task is scheduled by the CFS scheduler;
  • SCHED_BATCH: an ordinary scheduling policy for batch processing; the task is scheduled by the CFS scheduler;
  • SCHED_IDLE: an ordinary scheduling policy; the task is scheduled by the CFS scheduler at the lowest priority;
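
From user space the policy is visible through the POSIX sched_getscheduler() call, and the mapping from policy to scheduler described above can be written as a small lookup. The sketch below assumes a Linux system with glibc. The fallback #define values are the ones used by the Linux UAPI headers and are only there in case <sched.h> does not expose them; the toy_class enum and both helper functions are hypothetical names for this illustration.

```c
#include <assert.h>
#include <sched.h>

/* Fallbacks in case <sched.h> hides these without _GNU_SOURCE. */
#ifndef SCHED_BATCH
#define SCHED_BATCH 3
#endif
#ifndef SCHED_IDLE
#define SCHED_IDLE 5
#endif
#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif
#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000
#endif

/* Hypothetical tags for the schedulers a user can select. */
enum toy_class { TOY_CLASS_DL, TOY_CLASS_RT, TOY_CLASS_FAIR, TOY_CLASS_UNKNOWN };

/* Map a scheduling policy to the scheduler that serves it. */
enum toy_class toy_class_for_policy(int policy)
{
    switch (policy) {
    case SCHED_DEADLINE:
        return TOY_CLASS_DL;
    case SCHED_FIFO:
    case SCHED_RR:
        return TOY_CLASS_RT;
    case SCHED_OTHER:   /* named SCHED_NORMAL inside the kernel */
    case SCHED_BATCH:
    case SCHED_IDLE:
        return TOY_CLASS_FAIR;
    default:
        return TOY_CLASS_UNKNOWN;
    }
}

/* Scheduler that currently serves the calling process. */
enum toy_class toy_class_of_self(void)
{
    int policy = sched_getscheduler(0);   /* pid 0 means the caller */

    assert(policy >= 0);
    /* The reset-on-fork flag may be OR-ed into the returned value. */
    return toy_class_for_policy(policy & ~SCHED_RESET_ON_FORK);
}
```

An ordinary shell process would normally report the CFS case here. Switching to a real-time policy with sched_setscheduler() generally requires elevated privileges, which is why the sketch only reads the policy.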

2.4 runqueue (run queue)

  • Each CPU has its own run queue, and every scheduler operates on it;
  • A task assigned to a CPU is added to that CPU's run queue as a scheduling entity;
  • When a task runs for the first time, the kernel tries, if possible, to add it to the run queue of its parent task (staying on the same CPU gives better cache affinity and therefore better performance);

The Linux kernel describes a run queue with struct rq. The key fields are as follows:

/*
 * This is the main, per-CPU runqueue data structure.
 *
 * Locking rule: those places that want to lock multiple runqueues
 * (such as the load balancing or the thread migration code), lock
 * acquire operations must be ordered by ascending &runqueue.
 */
struct rq {
	/* runqueue lock: */
	raw_spinlock_t lock;

	/*
	 * nr_running and cpu_load should be in the same cacheline because
	 * remote CPUs use both these fields when doing load calculation.
	 */
	unsigned int nr_running;

	/* Three scheduling queues: CFS, RT and DL */
	struct cfs_rq cfs;
	struct rt_rq rt;
	struct dl_rq dl;

	/* curr: the running task; idle: the idle kernel thread; stop: the migration kernel thread */
	struct task_struct *curr, *idle, *stop;

	/* ... */
};
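
The locking rule in the comment above (take multiple run queue locks in ascending address order) is what prevents two CPUs from deadlocking when each wants the other's queue. Here is a single-threaded sketch of that rule, assuming a plain flag in place of the raw spinlock and a global counter that records the order in which locks were taken. struct toy_cpu_rq and all toy_ functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rq; `locked` replaces the spinlock. */
struct toy_cpu_rq {
    int locked;
    unsigned long lock_seq;    /* when the lock was taken, for inspection */
    unsigned int nr_running;
};

unsigned long toy_lock_counter;

void toy_rq_lock(struct toy_cpu_rq *rq)
{
    assert(!rq->locked);       /* a real spinlock would spin here */
    rq->locked = 1;
    rq->lock_seq = ++toy_lock_counter;
}

void toy_rq_unlock(struct toy_cpu_rq *rq)
{
    assert(rq->locked);
    rq->locked = 0;
}

/* Lock two run queues, lower address first, whatever the argument order. */
void toy_double_rq_lock(struct toy_cpu_rq *a, struct toy_cpu_rq *b)
{
    if (a == b) {
        toy_rq_lock(a);
    } else if ((uintptr_t)a < (uintptr_t)b) {
        toy_rq_lock(a);
        toy_rq_lock(b);
    } else {
        toy_rq_lock(b);
        toy_rq_lock(a);
    }
}

/* Move one task from src to dst while holding both locks. */
void toy_migrate_one(struct toy_cpu_rq *src, struct toy_cpu_rq *dst)
{
    toy_double_rq_lock(src, dst);
    if (src != dst && src->nr_running > 0) {
        src->nr_running--;
        dst->nr_running++;
    }
    toy_rq_unlock(src);
    if (dst != src)
        toy_rq_unlock(dst);
}
```

Whether a migration goes from queue 0 to queue 1 or the other way round, the queue at the lower address is always locked first, so two concurrent migrations in opposite directions would contend for the same first lock instead of each holding one lock and waiting for the other.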

2.5 task_group (task grouping)

  • With the task grouping mechanism, the CPU utilization of a group of tasks can be set or limited, for example by confining some tasks to a certain range so that they do not affect the execution efficiency of other tasks;
  • Once task_group is introduced, the object handled by the scheduler is no longer only a process. The Linux kernel abstracts sched_entity/sched_rt_entity/sched_dl_entity to describe a scheduling entity, which can be a process or a task_group;
  • A task group is described by the data structure struct task_group. On each CPU, the task group maintains a CFS scheduling entity, a CFS run queue, an RT scheduling entity and an RT run queue;

The key fields of struct task_group are as follows:

/* Task group related information */
struct task_group {
	/* ... */

	/* Each CPU is assigned a CFS scheduling entity and a CFS run queue */
	/* schedulable entities of this group on each cpu */
	struct sched_entity **se;
	/* runqueue "owned" by this group on each cpu */
	struct cfs_rq **cfs_rq;
	unsigned long shares;

	/* Each CPU is assigned an RT scheduling entity and an RT run queue */
	struct sched_rt_entity **rt_se;
	struct rt_rq **rt_rq;

	struct rt_bandwidth rt_bandwidth;

	/* Organizational relationship between task_groups */
	struct rcu_head rcu;
	struct list_head list;

	struct task_group *parent;
	struct list_head siblings;
	struct list_head children;

	/* ... */
};
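
Because a scheduling entity can stand for either a task or a whole group, picking the next CFS task becomes a walk down a hierarchy of run queues. The toy model below sketches that walk under simplifying assumptions: each queue is a small array scanned linearly (the kernel uses a red-black tree), the entity with the smallest vruntime wins at each level, and an entity whose my_q pointer is NULL is a task while any other entity is a group that owns the queue it points to. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cfs_rq;

/* Hypothetical miniature of struct sched_entity. */
struct toy_entity {
    const char *name;
    unsigned long long vruntime;
    struct toy_cfs_rq *my_q;   /* NULL for a task, group's own queue otherwise */
};

#define TOY_MAX_ENTITIES 8

/* Hypothetical miniature of struct cfs_rq. */
struct toy_cfs_rq {
    struct toy_entity *entities[TOY_MAX_ENTITIES];
    size_t nr;
};

void toy_enqueue(struct toy_cfs_rq *q, struct toy_entity *se)
{
    assert(q->nr < TOY_MAX_ENTITIES);
    q->entities[q->nr++] = se;
}

/* Entity with the smallest vruntime on one queue, or NULL if it is empty. */
struct toy_entity *toy_pick_entity(const struct toy_cfs_rq *q)
{
    struct toy_entity *best = NULL;

    for (size_t i = 0; i < q->nr; i++)
        if (best == NULL || q->entities[i]->vruntime < best->vruntime)
            best = q->entities[i];
    return best;
}

/* Descend from the CPU's root queue through group entities to a task. */
struct toy_entity *toy_pick_task(const struct toy_cfs_rq *root)
{
    const struct toy_cfs_rq *q = root;
    struct toy_entity *se = NULL;

    while (q != NULL) {
        se = toy_pick_entity(q);
        if (se == NULL)
            return NULL;       /* empty queue: nothing to run here */
        q = se->my_q;          /* NULL once a task has been reached */
    }
    return se;
}
```

In this model the group competes with its sibling tasks as a single entity. A group with a small vruntime is chosen first and the choice is then repeated inside the group, so the number of tasks inside a group does not by itself increase the group's share of the CPU.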

3. Scheduler

The scheduler relies on several functions to do its work. The key ones are introduced below.

  1. Active scheduling - schedule()
  • The schedule() function is the core function of process scheduling. The general flow is shown in the figure above.
  • Core logic: select another process to replace the currently running one. The process is selected through the pick_next_task function of the scheduler in use, which each scheduler implements in its own way; the replacement itself is carried out by context_switch(). The details will be analyzed in subsequent articles.
  2. Periodic scheduling - scheduler_tick()
  • The clock interrupt handler calls the scheduler_tick() function;
  • The clock interrupt is the pulse of the scheduler. The kernel relies on the periodic clock to keep control of the CPU;
  • The clock interrupt handler checks whether the current process has run for too long. If it has, the rescheduling flag (_TIF_NEED_RESCHED) is set;
  • When the clock interrupt handler returns, if the interrupted process was running in user mode, the rescheduling flag is checked. If it is set, schedule() is called;
  3. High precision clock scheduling - hrtick()
  • High precision clock scheduling is similar to periodic scheduling. The difference is that periodic scheduling has millisecond resolution, while high precision scheduling has nanosecond resolution;
  • High precision clock scheduling requires corresponding hardware support;
  4. Scheduling on process wake - wake_up_process()
  • The wake_up_process() function is called when a process is woken up; the awakened process may preempt the current process;
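
The division of labour between the periodic tick and schedule() can be sketched in a few lines: the tick only accounts time and sets a flag, and the actual switch happens later, on the way back to user mode. The toy model below assumes a fixed time slice of three ticks and a single ready task, which is a simplification (CFS does not use fixed slices). struct toy_cpu, TOY_SLICE_TICKS and the toy_ functions are hypothetical, and toy_scheduler_tick() stands in for the kernel's periodic tick hook, scheduler_tick().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SLICE_TICKS 3      /* arbitrary slice length for this sketch */

/* Hypothetical per-CPU state. */
struct toy_cpu {
    int curr;                  /* id of the running task */
    int next_ready;            /* id of a ready task, -1 if none */
    int slice_left;            /* ticks the current task may still run */
    bool need_resched;         /* stands in for the rescheduling flag */
    int nr_switches;
};

/* Core step: replace the current task by the next ready one, if any. */
void toy_schedule(struct toy_cpu *cpu)
{
    cpu->need_resched = false;
    cpu->slice_left = TOY_SLICE_TICKS;
    if (cpu->next_ready < 0)
        return;                        /* nothing else to run */
    assert(cpu->next_ready != cpu->curr);

    int prev = cpu->curr;
    cpu->curr = cpu->next_ready;       /* the "context switch" */
    cpu->next_ready = prev;            /* previous task becomes ready again */
    cpu->nr_switches++;
}

/* Periodic tick: account time and set the flag, never switch directly. */
void toy_scheduler_tick(struct toy_cpu *cpu)
{
    if (cpu->slice_left > 0)
        cpu->slice_left--;
    if (cpu->slice_left == 0)
        cpu->need_resched = true;
}

/* Return path from the interrupt to user mode: act on the flag. */
void toy_return_to_user(struct toy_cpu *cpu)
{
    if (cpu->need_resched)
        toy_schedule(cpu);
}

/* One timer interrupt as seen by a task running in user mode. */
void toy_timer_interrupt(struct toy_cpu *cpu)
{
    toy_scheduler_tick(cpu);
    toy_return_to_user(cpu);
}
```

Starting with task 1 running and task 2 ready, the first two interrupts leave task 1 on the CPU. The third exhausts its slice, sets the flag, and the return path calls toy_schedule(), which switches to task 2.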

The functions mentioned above are the ones commonly invoked during scheduling. In addition, scheduling points also occur when a new process is created and when kernel preemption takes place.

This article is only a rough introduction. Later articles will analyze individual modules in more depth. Stay tuned.

Author: LoyenWang
Official account: LoyenWang
Copyright: the copyright of this article belongs to the author and the blog park
Reprint: reprinting is welcome, but unless the author agrees otherwise, this statement must be retained and a link to the original must be given in the article; otherwise the author reserves the right to pursue legal liability