[scheduler] II. Scheduler initialization

Posted by mgrphp on Thu, 21 Nov 2019 19:05:18 +0100

Process 0 (the boot thread) initializes the scheduler's core data structures and, at the end of this path, has its scheduling class switched to idle_sched_class.
On arm64, the assembly code in arch/arm64/kernel/head.S branches to the first C function, start_kernel(), which in turn calls sched_init().
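
As a rough orientation, the call path looks like this (a sketch abridged from init/main.c; many other early-boot calls are omitted):

asmlinkage __visible void __init start_kernel(void)
{
    ...
    sched_init();  /* set up per-cpu runqueues, bandwidth control, the idle task */
    ...
}

Let's now walk through the implementation of sched_init(), annotated inline: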

void __init sched_init(void)  
{  
    int i, j;  
    unsigned long alloc_size = 0, ptr;  
  
#ifdef CONFIG_FAIR_GROUP_SCHED  
    /* Reserve space for per-cpu cfs_rq and sched_entity pointers; every cpu
       has its own cfs_rq and se for the root task group */
    alloc_size += 2 * nr_cpu_ids * sizeof(void **);  
#endif  
#ifdef CONFIG_RT_GROUP_SCHED  
    /* Likewise reserve space for per-cpu rt_rq and sched_rt_entity pointers */
    alloc_size += 2 * nr_cpu_ids * sizeof(void **);  
#endif  
    if (alloc_size) {  
        ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);  
  
#ifdef CONFIG_FAIR_GROUP_SCHED  
        /* root_task_group is the root of the task-group hierarchy. Carve the
           alloc_size block allocated above (starting at ptr) into per-cpu
           arrays: one sched_entity pointer and one cfs_rq pointer per cpu. */
        root_task_group.se = (struct sched_entity **)ptr;  
        ptr += nr_cpu_ids * sizeof(void **);  
  
        root_task_group.cfs_rq = (struct cfs_rq **)ptr;  
        ptr += nr_cpu_ids * sizeof(void **);  
  
#endif /* CONFIG_FAIR_GROUP_SCHED */  
#ifdef CONFIG_RT_GROUP_SCHED  
        /* Same as above for the RT side: root_task_group manages rt tasks as
           well as cfs tasks, so it also gets per-cpu rt_se and rt_rq arrays */
        root_task_group.rt_se = (struct sched_rt_entity **)ptr;  
        ptr += nr_cpu_ids * sizeof(void **);  
  
        root_task_group.rt_rq = (struct rt_rq **)ptr;  
        ptr += nr_cpu_ids * sizeof(void **);  
  
#endif /* CONFIG_RT_GROUP_SCHED */  
    }  
#ifdef CONFIG_CPUMASK_OFFSTACK  
    for_each_possible_cpu(i) {  
        per_cpu(load_balance_mask, i) = (cpumask_var_t)kzalloc_node(  
            cpumask_size(), GFP_KERNEL, cpu_to_node(i));  
    }  
#endif /* CONFIG_CPUMASK_OFFSTACK */  
    /* Initialize the default RT bandwidth control, accounted over a 1 s period:
       if rt tasks on an rt_rq have run for more than 950 ms within the period,
       they are throttled (dequeued from the rt_rq) until the next period starts,
       so realtime tasks cannot monopolize the cpu. See the sketch after the
       listing for where these defaults come from. */
    init_rt_bandwidth(&def_rt_bandwidth,  
            global_rt_period(), global_rt_runtime());  
    /* The same idea for the deadline class (SCHED_DEADLINE tasks): bound how
       much cpu time dl tasks may consume per period */
    init_dl_bandwidth(&def_dl_bandwidth,  
            global_rt_period(), global_rt_runtime());  
  
#ifdef CONFIG_SMP  
    /* Initialize the default root domain, including its max_cpu_capacity
       structure, which is consulted later when per-cpu capacities are updated */
    init_defrootdomain();  
#endif  
  
#ifdef CONFIG_RT_GROUP_SCHED 
    /*Set bandwidth limits for RT tasks within the root task group*/ 
    init_rt_bandwidth(&root_task_group.rt_bandwidth,  
            global_rt_period(), global_rt_runtime());  
#endif /* CONFIG_RT_GROUP_SCHED */  
  
#ifdef CONFIG_CGROUP_SCHED  
    /* Add root_task_group to the global task_groups list and initialize its
       children and siblings lists */
    list_add(&root_task_group.list, &task_groups);  
    INIT_LIST_HEAD(&root_task_group.children);  
    INIT_LIST_HEAD(&root_task_group.siblings);  
    autogroup_init(&init_task);  
  
#endif /* CONFIG_CGROUP_SCHED */  
    /* Every cpu has its own runqueue; initialize the rq of each possible cpu */
    for_each_possible_cpu(i) {  
        struct rq *rq;  
  
        rq = cpu_rq(i); /* look up cpu i's rq through a per-cpu variable */
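        /* For reference, cpu_rq() is roughly (a sketch from
         * kernel/sched/sched.h of this era; check your tree for the exact form):
         *   DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
         *   #define cpu_rq(cpu)  (&per_cpu(runqueues, (cpu)))
         */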
        raw_spin_lock_init(&rq->lock);  
        /* At init time there are no runnable tasks on the rq yet */
        rq->nr_running = 0;  
        rq->calc_load_active = 0;
        /* when this rq's contribution to the global load average is next due */
        rq->calc_load_update = jiffies + LOAD_FREQ;  
        /* Initialize the cfs, rt and dl sub-runqueues embedded in this rq.
           init_cfs_rq() sets up the cfs_rq's rbtree root and min_vruntime; that
           rbtree is how CFS picks the next task to run, analyzed later. */
        init_cfs_rq(&rq->cfs);  
        init_rt_rq(&rq->rt);  
        init_dl_rq(&rq->dl);  
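        /* For reference, init_cfs_rq() is tiny; a sketch abridged from
         * kernel/sched/fair.c of this vintage (newer kernels use
         * RB_ROOT_CACHED for the timeline):
         *
         *   void init_cfs_rq(struct cfs_rq *cfs_rq)
         *   {
         *       cfs_rq->tasks_timeline = RB_ROOT;
         *       cfs_rq->min_vruntime = (u64)(-(1LL << 20));
         *       ...
         *   }
         */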
#ifdef CONFIG_FAIR_GROUP_SCHED  
        root_task_group.shares = ROOT_TASK_GROUP_LOAD;  
        INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);  
        rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;  
        /* 
         * How much cpu bandwidth does root_task_group get? 
         * 
         * In case of task-groups formed thr' the cgroup filesystem, it 
         * gets 100% of the cpu resources in the system. This overall 
         * system cpu resource is divided among the tasks of 
         * root_task_group and its child task-groups in a fair manner, 
         * based on each entity's (task or task-group's) weight 
         * (se->load.weight). 
         * 
         * In other words, if root_task_group has 10 tasks of weight 
         * 1024, and two child groups A0 and A1 (of weight 1024 each), 
         * then A0's share of the cpu resource is: 
         * 
         *  A0's bandwidth = 1024 / (10*1024 + 1024 + 1024) = 8.33% 
         * 
         * We achieve this by letting root_task_group's tasks sit 
         * directly in rq->cfs (i.e root_task_group->se[] = NULL). 
         */
        /* Initialize CFS bandwidth control for root_task_group; it is more
           involved than the rt case and will be explained later */
        init_cfs_bandwidth(&root_task_group.cfs_bandwidth);  
        /* As seen when analyzing struct task_group, the tasks of a group may
           run on different cpus, each with its own cfs_rq and sched_entity;
           init_tg_cfs_entry() records which cfs_rq and se on cpu i belong to
           root_task_group. */
        init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);  
#endif /* CONFIG_FAIR_GROUP_SCHED */  
         /* Initialize how long rt tasks may run per period; default 950 ms */  
        rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;  
#ifdef CONFIG_RT_GROUP_SCHED  
        /* Wire up root_task_group's rt_rq and rt_se on cpu i */
        init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);  
#endif  
  
        for (j = 0; j < CPU_LOAD_IDX_MAX; j++)  
            rq->cpu_load[j] = 0;  
        /* record when the cpu load was last updated */
        rq->last_load_update_tick = jiffies;  
  
#ifdef CONFIG_SMP  
        rq->sd = NULL;  
        rq->rd = NULL;  
        /* Start this rq's cpu capacity at the default SCHED_CAPACITY_SCALE
           (1024); the real per-cpu capacity (e.g. from the device tree on
           asymmetric systems) is filled in later, and rq->cpu_capacity keeps
           changing at runtime. Exactly what drives some of those changes the
           author has not yet tracked down. */
        rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;  
        /* The next fields are used by load balancing: active_balance flags a
           forced load balance, and next_balance records when the next periodic
           balance is due */
        rq->balance_callback = NULL;  
        rq->active_balance = 0;
        rq->next_balance = jiffies;  
        rq->push_cpu = 0;  
        rq->push_task = NULL;  
        /* the cpu this rq belongs to */
        rq->cpu = i;  
        rq->online = 0;
        /* idle_stamp: timestamp of when this cpu last went idle */  
        rq->idle_stamp = 0;  
        rq->avg_idle = 2*sysctl_sched_migration_cost;  
        rq->max_idle_balance_cost = sysctl_sched_migration_cost;  
#ifdef CONFIG_SCHED_WALT 
        /* WALT (Window Assisted Load Tracking) accounts time spent in irq
           handling and folds it into the cpu load */ 
        rq->cur_irqload = 0; /* irq run time within the current window */  
        rq->avg_irqload = 0; /* irq activity may span several windows; a decay
                                is applied across windows to yield avg_irqload */
        /*irq enter/exit time stamp*/ 
        rq->irqload_ts = 0;
        /* a vendor-added flag used for performance tuning */  
        rq->is_busy = CPU_BUSY_CLR;  
#endif  
        /* Initialize the list head linking the cfs tasks on this rq */
        INIT_LIST_HEAD(&rq->cfs_tasks);  
        /* Attach the rq to the default root domain; root domains deserve a
           careful look later! */
        rq_attach_root(rq, &def_root_domain);  
#ifdef CONFIG_NO_HZ_COMMON  
        rq->nohz_flags = 0;  
#endif  
#ifdef CONFIG_NO_HZ_FULL  
        rq->last_sched_tick = 0;  
#endif  
#endif /* CONFIG_SMP */
        /* Initialize the high-resolution tick (hrtick) timer of this rq */
        init_rq_hrtick(rq);
        /*Set rq iowait value to 0*/  
        atomic_set(&rq->nr_iowait, 0);  
  
#ifdef CONFIG_INTEL_DWS  
        init_intel_dws(rq);  
#endif  
    }  /* at this point every per-cpu rq has been initialized */
    /* Set the load weight of init_task; each task's weight is derived from its
       priority (see the sketch in the comment below) */
    set_load_weight(&init_task);  
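    /* For reference, a sketch of set_load_weight(), abridged from
     * kernel/sched/core.c of this era (the tables are prio_to_weight[] and
     * prio_to_wmult[] here, renamed sched_prio_to_weight[]/_wmult[] later):
     *
     *   static void set_load_weight(struct task_struct *p)
     *   {
     *       int prio = p->static_prio - MAX_RT_PRIO;
     *       struct load_weight *load = &p->se.load;
     *
     *       if (p->policy == SCHED_IDLE) {   // SCHED_IDLE: minimal weight
     *           load->weight = scale_load(WEIGHT_IDLEPRIO);
     *           load->inv_weight = WMULT_IDLEPRIO;
     *           return;
     *       }
     *       load->weight = scale_load(prio_to_weight[prio]);
     *       load->inv_weight = prio_to_wmult[prio];
     *   }
     */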
  
#ifdef CONFIG_PREEMPT_NOTIFIERS  
    /*Initialize preemption notification chain*/
    INIT_HLIST_HEAD(&init_task.preempt_notifiers);  
#endif  
  
    /* 
     * The boot idle thread does lazy MMU switching as well: 
     */  
    atomic_inc(&init_mm.mm_count);  
    enter_lazy_tlb(&init_mm, current);  
  
    /* 
     * During early bootup we pretend to be a normal task: 
     */
    /* current here is still the init_task thread; during early boot pretend it
       is a normal fair-class task */  
    current->sched_class = &fair_sched_class;  
  
    /* 
     * Make us the idle thread. Technically, schedule() should not be 
     * called from this thread, however somewhere below it might be, 
     * but because we are the idle thread, we just pick up running again 
     * when this runqueue becomes "idle". 
     */
    /* Turn the current process into this cpu's idle task. The key step is
       switching its scheduling class to idle_sched_class. */  
    init_idle(current, smp_processor_id());  
    /*Time of next load update*/ 
    calc_load_update = jiffies + LOAD_FREQ;  
  
#ifdef CONFIG_SMP  
    zalloc_cpumask_var(&sched_domains_tmpmask, GFP_NOWAIT);  
    /* May be allocated at isolcpus cmdline parse time */  
    if (cpu_isolated_map == NULL)  
        zalloc_cpumask_var(&cpu_isolated_map, GFP_NOWAIT);  
    /* Record the current task as the boot cpu's idle thread. For the other
       cpus, idle_threads_init() later forks one idle thread per cpu. */
    idle_thread_set_boot_cpu();  
    /* Set the rq age_stamp, i.e. the moment this rq's lifetime starts (it
       covers both idle and running time, not just run time) */
    set_cpu_rq_start_time();  
#endif  
    init_sched_fair_class();  
  
#ifdef CONFIG_64BIT_ONLY_CPU  
    arch_get_64bit_only_cpus(&b64_only_cpu_mask);  
#ifdef CONFIG_SCHED_COMPAT_LIMIT  
    /* get cpus that support AArch32 and store in compat_32bit_cpu_mask */  
    cpumask_andnot(&compat_32bit_cpu_mask, cpu_present_mask,  
        &b64_only_cpu_mask);  
#endif  
#endif  
    /* from this point on the scheduler is up and running */
    scheduler_running = 1;  
}  
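
Where do the 1 s period and the 950 ms runtime used above come from? They are derived from two sysctls, tunable at runtime through /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us. A sketch of the helpers, abridged from kernel/sched/core.c of this era (check your tree for the exact form):

/* defaults: period = 1000000 us (1 s), runtime = 950000 us (950 ms) */
int sysctl_sched_rt_period = 1000000;
int sysctl_sched_rt_runtime = 950000;

static inline u64 global_rt_period(void)
{
    return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
}

static inline u64 global_rt_runtime(void)
{
    /* a negative runtime disables RT throttling entirely */
    if (sysctl_sched_rt_runtime < 0)
        return RUNTIME_INF;
    return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}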

Once sched_init() has finished, normal scheduling can begin. Scheduling is then driven in two ways:

  1. Periodically: scheduler_tick() is invoked on every timer tick (every TICK_NSEC nanoseconds); see the abridged sketch after this list.
  2. Event driven: in response to changes in process state, e.g. a new process is created, or a sleeping process is woken up.
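
As a preview of the periodic path, here is a heavily abridged sketch of scheduler_tick() (from kernel/sched/core.c of this era; most details omitted):

void scheduler_tick(void)
{
    int cpu = smp_processor_id();
    struct rq *rq = cpu_rq(cpu);
    struct task_struct *curr = rq->curr;

    raw_spin_lock(&rq->lock);
    update_rq_clock(rq);
    /* let the current task's scheduling class account for the tick */
    curr->sched_class->task_tick(rq, curr, 0);
    ...
    raw_spin_unlock(&rq->lock);

#ifdef CONFIG_SMP
    /* raise SCHED_SOFTIRQ if a periodic load balance is due */
    trigger_load_balance(rq);
#endif
}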

The next chapter will explain how the scheduling algorithms themselves work.
