Process 0 initializes the scheduler's core data structures, and at the end of this process its scheduling class is switched to the idle_sched_class.
The first C function reached from the assembly code in arch/arm64/kernel/head.S is start_kernel(). The path is start_kernel() -----> sched_init(). Let's look at the implementation of sched_init(); the explanation is inlined as comments in the function:
```c
void __init sched_init(void)
{
    int i, j;
    unsigned long alloc_size = 0, ptr;

#ifdef CONFIG_FAIR_GROUP_SCHED
    /* Reserve space for the root task group's per-cpu cfs_rq and
     * sched_entity pointers; each cpu gets one of each. */
    alloc_size += 2 * nr_cpu_ids * sizeof(void **);
#endif
#ifdef CONFIG_RT_GROUP_SCHED
    /* Likewise reserve space for the per-cpu rt_rq and sched_rt_entity
     * pointers. */
    alloc_size += 2 * nr_cpu_ids * sizeof(void **);
#endif
    if (alloc_size) {
        ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);

#ifdef CONFIG_FAIR_GROUP_SCHED
        /* root_task_group is the root of the task-group hierarchy. Carve
         * the block allocated above (alloc_size bytes starting at ptr)
         * into the per-cpu se and cfs_rq pointer arrays; every cpu gets
         * its own cfs_rq and se. */
        root_task_group.se = (struct sched_entity **)ptr;
        ptr += nr_cpu_ids * sizeof(void **);

        root_task_group.cfs_rq = (struct cfs_rq **)ptr;
        ptr += nr_cpu_ids * sizeof(void **);
#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
        /* Same for the rt side: root_task_group manages both cfs tasks
         * and rt tasks. */
        root_task_group.rt_se = (struct sched_rt_entity **)ptr;
        ptr += nr_cpu_ids * sizeof(void **);

        root_task_group.rt_rq = (struct rt_rq **)ptr;
        ptr += nr_cpu_ids * sizeof(void **);
#endif /* CONFIG_RT_GROUP_SCHED */
    }
#ifdef CONFIG_CPUMASK_OFFSTACK
    for_each_possible_cpu(i) {
        per_cpu(load_balance_mask, i) = (cpumask_var_t)kzalloc_node(
            cpumask_size(), GFP_KERNEL, cpu_to_node(i));
    }
#endif /* CONFIG_CPUMASK_OFFSTACK */

    /* Initialize the default rt bandwidth control, accounted over a 1 s
     * period: if rt tasks run for more than 950 ms inside one period,
     * they are throttled (taken off the rt_rq) until the next 1 s period
     * begins, so rt tasks cannot monopolize the cpu. */
    init_rt_bandwidth(&def_rt_bandwidth,
            global_rt_period(), global_rt_runtime());
    /* Same idea for deadline (dl) tasks, using the same default period
     * and runtime. (The author notes being unsure, so far, which threads
     * actually run as dl tasks.) */
    init_dl_bandwidth(&def_dl_bandwidth,
            global_rt_period(), global_rt_runtime());

#ifdef CONFIG_SMP
    /* Initialize the default root domain, including its max_cpu_capacity
     * structure, which is consulted when cpu capacity is updated. */
    init_defrootdomain();
#endif

#ifdef CONFIG_RT_GROUP_SCHED
    /* Set the bandwidth limit for rt tasks inside the root task group. */
    init_rt_bandwidth(&root_task_group.rt_bandwidth,
            global_rt_period(), global_rt_runtime());
#endif /* CONFIG_RT_GROUP_SCHED */

#ifdef CONFIG_CGROUP_SCHED
    /* Add root_task_group to the global task_groups list and initialize
     * its children and siblings lists. */
    list_add(&root_task_group.list, &task_groups);
    INIT_LIST_HEAD(&root_task_group.children);
    INIT_LIST_HEAD(&root_task_group.siblings);
    autogroup_init(&init_task);
#endif /* CONFIG_CGROUP_SCHED */

    /* Each cpu has its own rq; initialize the rq of every possible cpu. */
    for_each_possible_cpu(i) {
        struct rq *rq;

        rq = cpu_rq(i);    /* per-cpu lookup of this cpu's rq */
        raw_spin_lock_init(&rq->lock);
        /* At init time there are no runnable tasks in the rq yet. */
        rq->nr_running = 0;
        rq->calc_load_active = 0;
        /* Next update time of the load-average calculation. */
        rq->calc_load_update = jiffies + LOAD_FREQ;
        /* Initialize the cfs, rt and dl runqueues embedded in this rq.
         * For the cfs_rq this sets up the rb-tree root and min_vruntime;
         * that rb-tree is central to picking the next task and will be
         * analyzed later. Mostly this just initializes cfs_rq member
         * variables. */
        init_cfs_rq(&rq->cfs);
        init_rt_rq(&rq->rt);
        init_dl_rq(&rq->dl);
#ifdef CONFIG_FAIR_GROUP_SCHED
        root_task_group.shares = ROOT_TASK_GROUP_LOAD;
        INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
        rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
        /*
         * How much cpu bandwidth does root_task_group get?
         *
         * In case of task-groups formed thr' the cgroup filesystem, it
         * gets 100% of the cpu resources in the system. This overall
         * system cpu resource is divided among the tasks of
         * root_task_group and its child task-groups in a fair manner,
         * based on each entity's (task or task-group's) weight
         * (se->load.weight).
         *
         * In other words, if root_task_group has 10 tasks of weight
         * 1024) and two child groups A0 and A1 (of weight 1024 each),
         * then A0's share of the cpu resource is:
         *
         *    A0's bandwidth = 1024 / (10*1024 + 1024 + 1024) = 8.33%
         *
         * We achieve this by letting root_task_group's tasks sit
         * directly in rq->cfs (i.e root_task_group->se[] = NULL).
         */
        /* Initialize the bandwidth limit of cfs tasks; it is more complex
         * than the rt one and will be explained later. */
        init_cfs_bandwidth(&root_task_group.cfs_bandwidth);
        /* As seen in the task_group structure above, the tasks of one
         * group may run on different cpus, and therefore on different
         * cfs_rqs with different scheduling entities; record, per cpu,
         * which cfs_rq and sched_entity the group uses. */
        init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);
#endif /* CONFIG_FAIR_GROUP_SCHED */

        /* Runtime available to rt tasks per period; 950 ms by default. */
        rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
#ifdef CONFIG_RT_GROUP_SCHED
        /* Initialize the rt_rq and rt_se of the root task group. */
        init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
#endif

        for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
            rq->cpu_load[j] = 0;
        /* Record when the cpu load was last updated. */
        rq->last_load_update_tick = jiffies;

#ifdef CONFIG_SMP
        rq->sd = NULL;
        rq->rd = NULL;
        /* Start with full capacity. The capacity value of the rq comes
         * from dts, but rq->cpu_capacity changes as the system runs
         * (the author notes not having found the root cause of that
         * variation yet), while cpu_capacity_orig keeps the original. */
        rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
        /* The next fields relate to load balancing: active_balance is a
         * flag used to force a load balance, and next_balance records
         * when the next periodic (tick-driven) balance is due. */
        rq->balance_callback = NULL;
        rq->active_balance = 0;
        rq->next_balance = jiffies;
        rq->push_cpu = 0;
        rq->push_task = NULL;
        rq->cpu = i;    /* the cpu this rq belongs to */
        rq->online = 0;
        /* Timestamp of when this cpu last went idle. */
        rq->idle_stamp = 0;
        rq->avg_idle = 2*sysctl_sched_migration_cost;
        rq->max_idle_balance_cost = sysctl_sched_migration_cost;
#ifdef CONFIG_SCHED_WALT
        /* WALT accounts the time spent handling each irq and converts it
         * into load. */
        rq->cur_irqload = 0;    /* irq run time in the current window */
        rq->avg_irqload = 0;    /* irqs may span several windows; this is
                                 * the decayed average over those windows */
        rq->irqload_ts = 0;     /* irq enter/exit timestamp */
        /* A flag we added for performance. */
        rq->is_busy = CPU_BUSY_CLR;
#endif
        /* Initialize the list head of runnable cfs tasks. */
        INIT_LIST_HEAD(&rq->cfs_tasks);

        /* Attach the rq to the default root domain; root domains are
         * examined in detail later. */
        rq_attach_root(rq, &def_root_domain);
#ifdef CONFIG_NO_HZ_COMMON
        rq->nohz_flags = 0;
#endif
#ifdef CONFIG_NO_HZ_FULL
        rq->last_sched_tick = 0;
#endif
#endif /* CONFIG_SMP */
        /* Initialize the hrtimer of the rq. */
        init_rq_hrtick(rq);
        /* No tasks are waiting on io yet. */
        atomic_set(&rq->nr_iowait, 0);
#ifdef CONFIG_INTEL_DWS
        init_intel_dws(rq);
#endif
    }
    /* At this point the per-cpu rq initialization is complete. */

    /* Set init_task's load weight; each task is assigned a weight
     * according to its priority. */
    set_load_weight(&init_task);

#ifdef CONFIG_PREEMPT_NOTIFIERS
    /* Initialize the preemption notifier chain. */
    INIT_HLIST_HEAD(&init_task.preempt_notifiers);
#endif

    /*
     * The boot idle thread does lazy MMU switching as well:
     */
    atomic_inc(&init_mm.mm_count);
    enter_lazy_tlb(&init_mm, current);

    /*
     * During early bootup we pretend to be a normal task:
     */
    /* current is the init_task thread; make it a fair-class task for
     * now. */
    current->sched_class = &fair_sched_class;

    /*
     * Make us the idle thread. Technically, schedule() should not be
     * called from this thread, however somewhere below it might be,
     * but because we are the idle thread, we just pick up running again
     * when this runqueue becomes "idle".
     */
    /* Turn the current process into this cpu's idle task. The key step
     * is switching its scheduling class to idle_sched_class. */
    init_idle(current, smp_processor_id());

    /* Time of the next load-average update. */
    calc_load_update = jiffies + LOAD_FREQ;

#ifdef CONFIG_SMP
    zalloc_cpumask_var(&sched_domains_tmpmask, GFP_NOWAIT);
    /* May be allocated at isolcpus cmdline parse time */
    if (cpu_isolated_map == NULL)
        zalloc_cpumask_var(&cpu_isolated_map, GFP_NOWAIT);
    /* Record the current task of the boot cpu as its idle thread. The
     * idle threads of the other cpus are forked later, one per cpu, in
     * idle_threads_init(). */
    idle_thread_set_boot_cpu();
    /* Set rq->age_stamp, the start-of-life timestamp of the rq; it
     * covers both idle and running time, not just running time. */
    set_cpu_rq_start_time();
#endif
    init_sched_fair_class();

#ifdef CONFIG_64BIT_ONLY_CPU
    arch_get_64bit_only_cpus(&b64_only_cpu_mask);
#ifdef CONFIG_SCHED_COMPAT_LIMIT
    /* get cpus that support AArch32 and store in compat_32bit_cpu_mask */
    cpumask_andnot(&compat_32bit_cpu_mask, cpu_present_mask,
            &b64_only_cpu_mask);
#endif
#endif

    /* The scheduler is now up and running. */
    scheduler_running = 1;
}
```
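A quick note on the `rq = cpu_rq(i)` line above: each cpu's rq is a per-cpu variable, and `cpu_rq()` is just a per-cpu lookup. For reference, in kernels of this vintage the declarations in kernel/sched/sched.h look roughly like this (details vary by version):

```c
/* kernel/sched/core.c defines the per-cpu runqueues with
 * DEFINE_PER_CPU_SHARED_ALIGNED; sched.h declares them and provides
 * the accessor macros. */
DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

#define cpu_rq(cpu)    (&per_cpu(runqueues, (cpu)))  /* rq of a given cpu */
#define this_rq()      this_cpu_ptr(&runqueues)      /* rq of the current cpu */
#define task_rq(p)     cpu_rq(task_cpu(p))           /* rq a task is queued on */
#define cpu_curr(cpu)  (cpu_rq(cpu)->curr)           /* task running on a cpu */
```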
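The 1 s period and 950 ms runtime quoted in the comments are not constants baked into init_rt_bandwidth(); they are sysctl defaults (tunable at run time via /proc/sys/kernel/sched_rt_period_us and sched_rt_runtime_us), which global_rt_period() and global_rt_runtime() merely convert from microseconds to nanoseconds. Roughly, from the kernel sources of this era:

```c
/* Defaults: rt tasks may use at most 950 ms of every 1 s period. */
unsigned int sysctl_sched_rt_period = 1000000;  /* us */
int sysctl_sched_rt_runtime = 950000;           /* us; -1 disables throttling */

static inline u64 global_rt_period(void)
{
    return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
}

static inline u64 global_rt_runtime(void)
{
    if (sysctl_sched_rt_runtime < 0)
        return RUNTIME_INF;

    return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}
```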
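As for set_load_weight(&init_task): a task's weight is looked up from its static priority (i.e. its nice value) in the sched_prio_to_weight[] table, where nice 0 maps to weight 1024 and each nice step changes the weight by a factor of about 1.25. A sketch of the 4.x-era implementation:

```c
static void set_load_weight(struct task_struct *p)
{
    /* static_prio is 100..139 for normal tasks; index the table as 0..39 */
    int prio = p->static_prio - MAX_RT_PRIO;
    struct load_weight *load = &p->se.load;

    /* SCHED_IDLE tasks get a minimal weight. */
    if (idle_policy(p->policy)) {
        load->weight = scale_load(WEIGHT_IDLEPRIO);
        load->inv_weight = WMULT_IDLEPRIO;
        return;
    }

    /* e.g. nice -1 -> 1277, nice 0 -> 1024, nice 1 -> 820 */
    load->weight = scale_load(sched_prio_to_weight[prio]);
    load->inv_weight = sched_prio_to_wmult[prio]; /* precomputed 2^32/weight */
}
```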
After the scheduler initialization above completes, normal scheduling can begin. It is driven in two ways (a sketch of both paths follows the list):
- Periodic scheduling: the scheduler tick (scheduler_tick()), which fires once every tick period
- Event-driven scheduling: triggered by process state changes (e.g. a new process being created, a sleeping process being woken up, etc.)
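Concretely, the periodic path enters through the timer interrupt, which ends up calling scheduler_tick(); the event path typically enters through a wakeup, which may flag the current task for preemption. A simplified, hypothetical sketch of the two paths (kernel context assumed; the function names tick_path/wakeup_path are illustrative only, while the real code lives in kernel/sched/core.c):

```c
/* 1. Periodic: invoked from the timer tick once per tick period. */
static void tick_path(void)
{
    scheduler_tick();   /* updates the current task's runtime accounting
                         * and may set TIF_NEED_RESCHED */
}

/* 2. Event driven: waking a sleeping task may preempt the current one. */
static void wakeup_path(struct task_struct *p)
{
    wake_up_process(p); /* enqueues p and runs the preemption check; if p
                         * should run now, TIF_NEED_RESCHED is set and
                         * schedule() runs at the next preemption point */
}
```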
The next chapter will explain how the scheduling algorithms themselves work.