Preface
Every time I write a Go program, I'm curious about how it starts up. Today, let's dig into it.
Note: I'm on Windows 10, so many examples here are Windows-oriented rather than Linux. This is also a record of my own learning, so there may be mistakes; corrections are welcome!
Also, some of the code in this article has been trimmed so that we can focus on the core flow.
Some familiarity with assembly language will help.
Assembly
Before a Go program can run its own code, it must initialize its runtime, so the real program entry point lives in the runtime package.
The entry file differs from platform to platform. Taking Linux, macOS and Windows 10 on the AMD64 architecture as examples, the entry files are src/runtime/rt0_linux_amd64.s, src/runtime/rt0_darwin_amd64.s and src/runtime/rt0_windows_amd64.s respectively.
You can see similar entry code in all three files.
```asm
// runtime/rt0_windows_amd64.s
// Windows is used as the example; Linux and macOS look the same apart from the symbol name.
TEXT _rt0_amd64_windows(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)
```
JMP is an unconditional jump; here it jumps to the _rt0_amd64 subroutine.
This design is intuitive. Once compiled to machine code, a program depends on the instruction set of a specific CPU architecture, while operating-system differences show up in system-level operations at runtime, such as system calls. So each OS/architecture pair gets a tiny entry stub that jumps into shared per-architecture code. rt0 is short for runtime0, the very first stage of bringing up the runtime; by a similar convention, some follow-up stages carry a 1 suffix, such as mstart1 and newproc1 seen later.
The operating system passes entry parameters to the application according to an agreed convention. When the program has just started, the value at 0(SP) is argc (the number of arguments), and the argv pointer array starts right after it at 8(SP). That is why _rt0_amd64 loads argc with MOVQ but computes argv with LEAQ:
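As a small illustration of the per-platform naming scheme (this is not something the runtime itself does), the hypothetical helper rt0FileName below simply formats the rt0_<GOOS>_<GOARCH>.s pattern from runtime.GOOS and runtime.GOARCH. Not every OS/architecture pair is guaranteed to have a file with exactly this name.

```go
package main

import (
	"fmt"
	"runtime"
)

// rt0FileName is a hypothetical helper that builds the entry-file name
// following the rt0_<GOOS>_<GOARCH>.s pattern seen in src/runtime.
// It only illustrates the naming scheme; it does not check that the file exists.
func rt0FileName(goos, goarch string) string {
	return fmt.Sprintf("src/runtime/rt0_%s_%s.s", goos, goarch)
}

func main() {
	// On windows/amd64 this names the file shown above.
	fmt.Println(rt0FileName(runtime.GOOS, runtime.GOARCH))
}
```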
```asm
// runtime/asm_amd64.s

// _rt0_amd64 is common startup code for most amd64 systems when using
// internal linking. This is the entry point for the program from the
// kernel for an ordinary -buildmode=exe program. The stack holds the
// number of arguments and the C-style argv.
TEXT _rt0_amd64(SB),NOSPLIT,$-8
	MOVQ	0(SP), DI	// argc
	LEAQ	8(SP), SI	// argv
	JMP	runtime·rt0_go(SB)
```
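By the time main.main runs, the runtime has already turned argc/argv into Go strings. On Unix-like systems os.Args is built from them (as far as I know, on Windows the os package instead parses the command line reported by the OS). A minimal sketch to look at that user-level view; describeArgs is a hypothetical helper:

```go
package main

import (
	"fmt"
	"os"
)

// describeArgs is a hypothetical helper that splits an argument list the way
// a C program would see argc/argv: the count, the program path (argv[0])
// and the remaining arguments.
func describeArgs(args []string) (argc int, program string, rest []string) {
	if len(args) == 0 {
		return 0, "", nil
	}
	return len(args), args[0], args[1:]
}

func main() {
	argc, prog, rest := describeArgs(os.Args)
	fmt.Printf("argc=%d program=%q args=%q\n", argc, prog, rest)
}
```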
rt0_go
From there, execution jumps to the rt0_go subroutine.
Let's dig into its logic.
The first part saves the program's entry parameters, sets up the g0 stack, and detects CPU information.
```asm
// runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// Copy the arguments forward onto an even (16-byte aligned) stack
	MOVQ	DI, AX		// argc
	MOVQ	SI, BX		// argv
	SUBQ	$(4*8+7), SP		// 2args 2auto
	ANDQ	$~15, SP
	MOVQ	AX, 16(SP)
	MOVQ	BX, 24(SP)

	// Create an istack out of the given (operating system) stack.
	// _cgo_init may update stackguard.
	// (Initialize the g0 execution stack)
	MOVQ	$runtime·g0(SB), DI
	LEAQ	(-64*1024+104)(SP), BX
	MOVQ	BX, g_stackguard0(DI)
	MOVQ	BX, g_stackguard1(DI)
	MOVQ	BX, (g_stack+stack_lo)(DI)
	MOVQ	SP, (g_stack+stack_hi)(DI)

	// Find out information about the processor we're on
	MOVL	$0, AX
	CPUID
	MOVL	AX, SI
	CMPL	AX, $0
	JE	nocpuinfo

	// Figure out how to serialize RDTSC.
	// On Intel processors LFENCE is enough. AMD requires MFENCE.
	// Don't know about the rest, so let's do MFENCE.
	CMPL	BX, $0x756E6547  // "Genu"
	JNE	notintel
	CMPL	DX, $0x49656E69  // "ineI"
	JNE	notintel
	CMPL	CX, $0x6C65746E  // "ntel"
	JNE	notintel
	MOVB	$1, runtime·isIntel(SB)
	MOVB	$1, runtime·lfenceBeforeRdtsc(SB)
	// ... a large piece of code is omitted
```
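Two bits of arithmetic in this block are easy to check in plain Go. CPUID leaf 0 returns the vendor string in EBX, EDX, ECX, four little-endian ASCII bytes per register, which is where constants like 0x756E6547 ("Genu") come from; and LEAQ (-64*1024+104)(SP) places g0's stack.lo about 64 KiB below the current SP. The helpers cpuVendor and g0Bounds are hypothetical illustrations, not runtime functions, and the SP value in main is made up:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cpuVendor rebuilds the 12-byte vendor string from CPUID leaf 0.
// The registers are concatenated in the order EBX, EDX, ECX, and each one
// holds 4 ASCII bytes in little-endian order.
func cpuVendor(ebx, edx, ecx uint32) string {
	var buf [12]byte
	binary.LittleEndian.PutUint32(buf[0:4], ebx)
	binary.LittleEndian.PutUint32(buf[4:8], edx)
	binary.LittleEndian.PutUint32(buf[8:12], ecx)
	return string(buf[:])
}

// g0Bounds mirrors LEAQ (-64*1024+104)(SP), BX: stack.hi is the current SP,
// stack.lo is SP-64*1024+104, and both stack guards start out equal to stack.lo.
func g0Bounds(sp uint64) (lo, hi, guard uint64) {
	hi = sp
	lo = sp - 64*1024 + 104
	guard = lo
	return lo, hi, guard
}

func main() {
	fmt.Println(cpuVendor(0x756E6547, 0x49656E69, 0x6C65746E)) // GenuineIntel

	lo, hi, guard := g0Bounds(0x7ffe00000000) // made-up SP value
	fmt.Printf("lo=%#x hi=%#x guard=%#x size=%d\n", lo, hi, guard, hi-lo)
}
```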
A very important step for the runtime is setting up thread-local storage (TLS): the runtime uses it to find the g (goroutine) currently running on each thread.
```asm
// runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// ... a large piece of code is omitted
notintel:
#ifdef GOOS_darwin
	// Skip TLS setup on Darwin
	JMP ok
#endif
	LEAQ	runtime·m0+m_tls(SB), DI	// DI = &m0.tls
	CALL	runtime·settls(SB)		// make the TLS base point at DI

	// Store through it, to make sure it works
	get_tls(BX)
	MOVQ	$0x123, g(BX)
	MOVQ	runtime·m0+m_tls(SB), AX
	CMPQ	AX, $0x123	// check whether TLS was set up successfully
	JEQ 2(PC)		// if equal, skip the next instruction
	CALL	runtime·abort(SB)	// otherwise abort (executes INT $3)
ok:
```
Next, the global variables g0 and m0 are wired up: g0 becomes the current g stored in TLS, and m0 and g0 are linked to each other through pointers.
```asm
// runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// ... a large piece of code is omitted
	// Set the per-goroutine and per-mach "registers".
	// The program has just started and is running on the main thread:
	// the current stack and resources are tracked by g0,
	// and the thread itself is represented by m0.
	get_tls(BX)
	LEAQ	runtime·g0(SB), CX
	MOVQ	CX, g(BX)
	LEAQ	runtime·m0(SB), AX

	// m0 and g0 are linked to each other through pointers.
	// save m->g0 = g0
	MOVQ	CX, m_g0(AX)
	// save m0 to g0->m
	MOVQ	AX, g_m(CX)
```
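To make the TLS check and the pointer wiring concrete, here is a heavily simplified model in Go. The types g and m, the helpers settls and bootstrap, and the 6-slot tls array are all assumptions for illustration: the real structures have many more fields, and the real settls changes a CPU register (such as the FS segment base on Linux/amd64), which ordinary Go code cannot do.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical, heavily simplified stand-ins for the runtime's g and m.
type g struct {
	m *m // like g_m: the M this goroutine belongs to
}

type m struct {
	g0  *g         // like m_g0: this thread's scheduling goroutine
	tls [6]uintptr // like m_tls: the block the TLS base points at
}

var (
	m0 m
	g0 g
)

// settls is a hypothetical stand-in for CALL runtime·settls(SB).
// Here it just returns the slot address so we can store through it.
func settls(slot *uintptr) *uintptr { return slot }

// bootstrap replays the TLS sanity check and the m0/g0 wiring of rt0_go.
func bootstrap() bool {
	base := settls(&m0.tls[0])

	// Store 0x123 through the TLS base and read it back through m0.tls;
	// rt0_go calls runtime·abort if the two views disagree.
	*base = 0x123
	if m0.tls[0] != 0x123 {
		return false
	}

	// get_tls(BX); MOVQ CX, g(BX): g0 becomes the current g.
	*base = uintptr(unsafe.Pointer(&g0))

	// m0.g0 = &g0 and g0.m = &m0: the two globals point at each other.
	m0.g0 = &g0
	g0.m = &m0
	return true
}

func main() {
	ok := bootstrap()
	fmt.Println("tls ok:", ok, "linked:", m0.g0 == &g0 && g0.m == &m0)
}
```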
Next come checks and system-level initialization: runtime type checks, fetching the command-line arguments, and initializing the constants that affect memory management and scheduling.
```asm
// runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// ... a large piece of code is omitted
	CLD				// convention is D is always left cleared
	CALL	runtime·check(SB)	// runtime type checks

	MOVL	16(SP), AX		// copy argc
	MOVL	AX, 0(SP)
	MOVQ	24(SP), AX		// copy argv
	MOVQ	AX, 8(SP)
	CALL	runtime·args(SB)	// fetch the command-line arguments
	CALL	runtime·osinit(SB)	// initialize constants that affect memory management
	CALL	runtime·schedinit(SB)	// initialize scheduling-related state
```
Now the program is about to start running!
```asm
// runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// ... a large piece of code is omitted
	// Create a new goroutine to start the program
	MOVQ	$runtime·mainPC(SB), AX	// entry: mainPC is a global holding the runtime·main function value
	PUSHQ	AX			// push fn, the second argument of newproc
	PUSHQ	$0			// arg size: push siz, the first argument
	CALL	runtime·newproc(SB)	// call newproc to create a new G
	POPQ	AX
	POPQ	AX

	// Start this M
	CALL	runtime·mstart(SB)

	CALL	runtime·abort(SB)	// mstart should never return
	RET

	// Prevent dead-code elimination of debugCallV2, which is
	// intended to be called by debuggers.
	MOVQ	$runtime·debugCallV2<ABIInternal>(SB), AX
	RET
```
runtime·mainPC is a read-only global in the data segment whose content, filled in at build time, is the function value of runtime·main. In other words, it holds the entry address of the main goroutine:
```asm
// mainPC is a function value for runtime.main, to be passed to newproc.
// The reference to runtime.main is made via ABIInternal, since the
// actual function (not the ABI0 wrapper) is needed by newproc.
DATA	runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
GLOBL	runtime·mainPC(SB),RODATA,$8
```
During bootstrap, the following core functions are called to perform the checks and system initialization:
- check: runtime type checks
- args: fetch the command-line arguments (argc/argv)
- osinit: initialize OS-dependent constants such as the CPU count and page size
- schedinit: initialize the scheduler, the memory allocator, the garbage collector, and related state
- newproc: create a schedulable G whose entry point is the main goroutine's entry address (runtime.main)
- mstart: start the scheduler's scheduling loop
From this analysis, a Go program starts neither directly at main.main nor directly at runtime.main. Its actual entry is _rt0_amd64_* (for example _rt0_amd64_windows), which jumps to _rt0_amd64 and then on to runtime·rt0_go.
Program bootstrap and initialization is one of the most critical steps of the whole runtime. The call to schedinit completes the initialization of the runtime, including the scheduler, execution stacks, memory allocator, garbage collector and other components. Finally, newproc creates the main goroutine and mstart starts the scheduler, which runs it.
[Figure: Go program startup flow chart]
Core functions
The previous analysis introduced several core functions. Now let's briefly look at what each one actually does; the deeper principles are left for dedicated posts.
The check function is essentially a sanity check of the compiler's work: it verifies the sizes (and some offsets) of basic types at run time.
```go
// runtime/runtime1.go
func check() {
	var (
		a int8
		b uint8
		c int16
		d uint16
		// ...
	)
	type x1t struct {
		x uint8
	}
	type y1t struct {
		x1 x1t
		y  uint8
	}
	var x1 x1t
	var y1 y1t

	// Verify that unsafe.Sizeof(int8) is 1; the checks below are similar
	if unsafe.Sizeof(a) != 1 {
		throw("bad a")
	}
	// ...
}
```
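The same kind of check can be written in user code with unsafe.Sizeof and unsafe.Offsetof. This is a small sample mirroring a few of the runtime's checks (checkSizes is a hypothetical name), not the runtime's full list:

```go
package main

import (
	"fmt"
	"unsafe"
)

// checkSizes repeats a few of runtime.check's assertions in user code:
// basic types must have the expected sizes, and a nested struct of two
// uint8 fields must be laid out without padding. It returns one message
// per failed check.
func checkSizes() []string {
	var (
		a int8
		b uint8
		c int16
		d uint16
		e int32
		f uint32
		h int64
		i uint64
	)
	type x1t struct {
		x uint8
	}
	type y1t struct {
		x1 x1t
		y  uint8
	}
	var y1 y1t

	var failures []string
	expect := func(name string, got, want uintptr) {
		if got != want {
			failures = append(failures, fmt.Sprintf("bad %s: got %d, want %d", name, got, want))
		}
	}
	expect("a", unsafe.Sizeof(a), 1)
	expect("b", unsafe.Sizeof(b), 1)
	expect("c", unsafe.Sizeof(c), 2)
	expect("d", unsafe.Sizeof(d), 2)
	expect("e", unsafe.Sizeof(e), 4)
	expect("f", unsafe.Sizeof(f), 4)
	expect("h", unsafe.Sizeof(h), 8)
	expect("i", unsafe.Sizeof(i), 8)
	expect("offsetof y1.y", unsafe.Offsetof(y1.y), 1)
	expect("y1", unsafe.Sizeof(y1), 2)
	return failures
}

func main() {
	if failures := checkSizes(); len(failures) > 0 {
		for _, msg := range failures {
			fmt.Println(msg)
		}
		return
	}
	fmt.Println("all size checks passed")
}
```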
The args function stores the argc and argv passed by the operating system in global variables:
```go
// runtime/runtime1.go
var (
	argc int32
	argv **byte
)

func args(c int32, v **byte) {
	argc = c
	argv = v
	sysargs(c, v)
}
```

It then calls the platform-specific sysargs function.
```go
// runtime/os_dragonfly.go
func sysargs(argc int32, argv **byte) {
	// Skip over argv (argc pointers plus its NULL terminator)
	n := argc + 1

	// Skip over envp to get to auxv
	for argv_index(argv, n) != nil {
		n++
	}

	// Skip the NULL separator
	n++

	// Now argv+n is auxv
	auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
	sysauxv(auxv[:])
}

func sysauxv(auxv []uintptr) {
	// Read the auxv (key, value) pairs in order
	for i := 0; auxv[i] != _AT_NULL; i += 2 {
		tag, val := auxv[i], auxv[i+1]
		switch tag {
		case _AT_PAGESZ:
			// The size of a memory page
			physPageSize = val
		}
	}
}
```
I got a bit lost here. This touches on how the operating system manages memory pages, which I haven't studied yet, so I won't go deeper for now. 😥
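The walk itself is easy to simulate, though. On ELF systems the initial stack holds the argv pointers, a NULL, the envp pointers, another NULL, and then the auxiliary vector of (tag, value) pairs ending with AT_NULL (0); AT_PAGESZ (6) carries the page size. A sketch over a fake []uintptr, assuming that layout (pageSizeFromStack is hypothetical, and nonzero entries stand in for real pointers):

```go
package main

import "fmt"

// Tag values from the ELF auxiliary-vector convention.
const (
	atNull   = 0 // AT_NULL: end of the vector
	atPagesz = 6 // AT_PAGESZ: system page size
)

// pageSizeFromStack walks a simulated initial stack the way sysargs and
// sysauxv do. words starts at argv[0]: argc argv pointers, a 0, the envp
// pointers, another 0, then auxv (tag, value) pairs ending with AT_NULL.
func pageSizeFromStack(argc int, words []uintptr) uintptr {
	n := argc + 1 // skip argv and its NULL terminator
	for words[n] != 0 {
		n++ // skip envp entries
	}
	n++ // skip the NULL after envp; words[n:] is auxv

	auxv := words[n:]
	var pageSize uintptr
	for i := 0; auxv[i] != atNull; i += 2 {
		tag, val := auxv[i], auxv[i+1]
		switch tag {
		case atPagesz:
			pageSize = val
		}
	}
	return pageSize
}

func main() {
	stack := []uintptr{
		0x1000, 0x1010, 0, // argv (argc = 2) + NULL
		0x2000, 0x2010, 0, // envp + NULL
		25, 0x3000, // an unrelated auxv entry (AT_RANDOM)
		atPagesz, 4096,
		atNull, 0,
	}
	fmt.Println(pageSizeFromStack(2, stack)) // 4096
}
```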
The osinit function obtains the number of CPU cores and, if it is still unknown, the memory page size of the current operating system.
```go
// runtime/os_dragonfly.go
func osinit() {
	// Get the number of CPU cores
	ncpu = getncpu()
	if physPageSize == 0 {
		physPageSize = getPageSize()
	}
}
```
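From user code, the results of this step are visible through runtime.NumCPU, which reports the CPU count the runtime queried at process startup, and os.Getpagesize. A minimal sketch (platformInfo is a hypothetical helper):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// platformInfo returns the user-visible counterparts of what osinit
// computes: the number of usable logical CPUs and the memory page size.
func platformInfo() (cpus, pageSize int) {
	return runtime.NumCPU(), os.Getpagesize()
}

func main() {
	cpus, page := platformInfo()
	fmt.Printf("ncpu=%d pageSize=%d\n", cpus, page)
}
```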
As the name suggests, schedinit initializes the scheduler, but in practice it initializes many core components: stacks, the memory allocator, the GC, threads (Ms), and so on.
The initialization follows a strict order, largely because earlier steps provide data that later ones depend on. The source comments say so explicitly: for example, fastrandinit must run before mcommoninit, and maps must not be used before alginit.
```go
// runtime/proc.go

// The bootstrap sequence is:
//
//	call osinit
//	call schedinit
//	make & queue new G (add a new G to the run queue)
//	call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
	lockInit(&sched.lock, lockRankSched)
	// ... other lockInit calls omitted

	// Get the current g (g0 at this point)
	_g_ := getg()

	sched.maxmcount = 10000 // limit on the number of OS threads (Ms)

	// The world starts stopped.
	worldStopped()

	moduledataverify()
	stackinit()            // initialize the execution stack pools
	mallocinit()           // initialize the memory allocator
	fastrandinit()         // must run before mcommoninit; seeds the random number generator
	mcommoninit(_g_.m, -1) // initialize the current M (m0); pass a pre-allocated id, or -1 to allocate one
	cpuinit()              // must run before alginit; detects CPU features
	alginit()              // maps must not be used before this call; initializes the hash algorithms
	modulesinit()          // provides activeModules; module data used by the GC, among others
	typelinksinit()        // uses maps, activeModules; builds the typemap of each module
	itabsinit()            // uses activeModules; initializes interface tables (itabs)

	sigsave(&_g_.m.sigmask) // save the signal mask of m0
	initSigmask = _g_.m.sigmask

	goargs()         // copy the arguments into argslice
	goenvs()         // copy the environment variables into envs
	parsedebugvars() // initialize debug-related variables (GODEBUG etc.)
	gcinit()         // initialize the garbage collector

	// Lock the scheduler
	lock(&sched.lock)
	sched.lastpoll = uint64(nanotime())

	// Create the Ps. Their number is the CPU count,
	// unless the GOMAXPROCS environment variable overrides it.
	procs := ncpu
	if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
		procs = n
	}
	// procresize grows or shrinks the set of Ps stored in the global allp to
	// procs. Extra Ps are cleaned up and missing ones are created, but no M is
	// started here; when an M starts later, it takes a P from here and binds to it.
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}

	// Unlock the scheduler
	unlock(&sched.lock)

	// ... a large piece of code is omitted
}
```
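The GOMAXPROCS rule near the end of schedinit can be sketched as a pure function. decideProcs is hypothetical and uses strconv.ParseInt as a stand-in for the runtime-internal atoi32, so edge cases such as a leading '+' may be handled differently; newer Go releases may also take other limits into account when choosing the default.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// decideProcs sketches how schedinit picks the number of Ps: start from the
// CPU count and let a positive GOMAXPROCS value override it. Unparsable or
// non-positive values are ignored.
func decideProcs(ncpu int32, env string) int32 {
	procs := ncpu
	if n, err := strconv.ParseInt(env, 10, 32); err == nil && n > 0 {
		procs = int32(n)
	}
	return procs
}

func main() {
	ncpu := int32(runtime.NumCPU())
	fmt.Println("sketch:", decideProcs(ncpu, os.Getenv("GOMAXPROCS")))
	// runtime.GOMAXPROCS(0) queries the current setting without changing it.
	fmt.Println("runtime:", runtime.GOMAXPROCS(0))
}
```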
The newproc function creates a new G on the P of the current M. This G, which will run runtime.main, is not appended to the ordinary run queue at first; it is placed in P's runnext slot so that it becomes the next G to run.
Why put it in runnext rather than in the run queue?
My guess: at this point we are still running on g0, and m0 has only had some of its fields initialized; scheduling has not started yet, so putting the G directly into the run queue doesn't seem appropriate. A simpler observation is that runnext is checked before the local run queue, so placing the G there makes it the first candidate once scheduling starts.
```go
// runtime/proc.go
func newproc(siz int32, fn *funcval) {
	argp := add(unsafe.Pointer(&fn), sys.PtrSize)
	gp := getg()        // pointer to the current goroutine
	pc := getcallerpc() // the caller's PC (pseudo-register), filled in by the compiler
	systemstack(func() {
		// Create a new G (the key function)
		newg := newproc1(fn, argp, siz, gp, pc)

		// Get the current P
		_p_ := getg().m.p.ptr()
		// Put the new G into P's runnext. A previous runnext G is kicked into
		// the local run queue; if that queue is full, part of it moves to the
		// global queue so that other Ps can schedule it.
		runqput(_p_, newg, true)

		// Try to get another P to run G. Called when a G becomes runnable (newproc, ready).
		if mainStarted {
			wakep()
		}
	})
}

func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
	// ...
	_g_ := getg()
	// ...
	_p_ := _g_.m.p.ptr()
	// Take a G from P's free list of dead Gs. If it is empty, a batch is first
	// moved over from the global free list. Finished Gs return to the free
	// list for reuse.
	newg := gfget(_p_)
	if newg == nil { // nothing to reuse this early in startup
		newg = malg(_StackMin)           // allocate a new G with a minimal stack
		casgstatus(newg, _Gidle, _Gdead) // status: idle -> dead
		allgadd(newg)                    // publish with status Gdead so the GC scanner doesn't look at the uninitialized stack
	}
	// ... configuration of newg's fields omitted
	newg.startpc = fn.fn // entry of this goroutine: mainPC, i.e. runtime·main
	if _g_.m.curg != nil {
		newg.labels = _g_.m.curg.labels
	}
	// Count system goroutines: those whose start function has the runtime.
	// prefix, except runtime.main and runtime.handleAsyncEvent
	if isSystemGoroutine(newg, false) {
		atomic.Xadd(&sched.ngsys, +1)
	}
	// Track initial transition?
	newg.trackingSeq = uint8(fastrand())
	if newg.trackingSeq%gTrackingPeriod == 0 {
		newg.tracking = true
	}
	casgstatus(newg, _Gdead, _Grunnable) // status: dead -> runnable
	// ...
	releasem(_g_.m) // give up exclusive use of m
	return newg
}
```
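The runnext behaviour of runqput can be modeled in a few lines. This is a simplified, single-threaded sketch under stated assumptions: the real local queue is a fixed 256-slot ring buffer manipulated with atomics, and the global queue is protected by sched.lock. The types g and p and the helpers below are hypothetical stand-ins that keep only the ordering rules.

```go
package main

import "fmt"

// Hypothetical stand-ins: only what is needed to show the ordering rules.
type g struct{ id int }

type p struct {
	runnext *g
	runq    []*g
	limit   int // capacity of the local queue in this model
}

var globalRunq []*g

// runqput puts gp on pp's queues. With next=true, gp becomes runnext and the
// previous runnext (if any) is kicked to the tail of the local queue. When the
// local queue is full, its first half plus the new G move to the global queue,
// roughly what the runtime's runqputslow does.
func runqput(pp *p, gp *g, next bool) {
	if next {
		old := pp.runnext
		pp.runnext = gp
		if old == nil {
			return
		}
		gp = old
	}
	if len(pp.runq) < pp.limit {
		pp.runq = append(pp.runq, gp)
		return
	}
	half := len(pp.runq) / 2
	globalRunq = append(globalRunq, pp.runq[:half]...)
	globalRunq = append(globalRunq, gp)
	pp.runq = append([]*g(nil), pp.runq[half:]...)
}

// runqget prefers runnext, then the head of the local queue.
func runqget(pp *p) *g {
	if gp := pp.runnext; gp != nil {
		pp.runnext = nil
		return gp
	}
	if len(pp.runq) == 0 {
		return nil
	}
	gp := pp.runq[0]
	pp.runq = pp.runq[1:]
	return gp
}

func main() {
	pp := &p{limit: 4}
	runqput(pp, &g{id: 1}, false)
	runqput(pp, &g{id: 2}, true) // like newproc: the new G goes to runnext
	fmt.Println(runqget(pp).id, runqget(pp).id)
}
```

The G put with next=true comes out first, so this prints 2 1.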
The mstart function starts the M and enters the scheduling loop (we'll cover scheduling in a later post).
```go
// runtime/proc.go

// mstart is the entry-point for new Ms.
// It is written in assembly, uses ABI0, is marked TOPFRAME, and calls mstart0.
func mstart()

func mstart0() {
	_g_ := getg()

	osStack := _g_.stack.lo == 0
	if osStack {
		// Initialize stack bounds from system stack.
		// Cgo may have left stack size in stack.hi.
		// minit may update the stack bounds.
		//
		// Note: these bounds may not be very accurate.
		// We set hi to &size, but there are things above
		// it. The 1024 is supposed to compensate this,
		// but is somewhat arbitrary.
		size := _g_.stack.hi
		if size == 0 {
			size = 8192 * sys.StackGuardMultiplier
		}
		_g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
		_g_.stack.lo = _g_.stack.hi - size + 1024
	}
	// Initialize stack guard so that we can start calling regular
	// Go code.
	_g_.stackguard0 = _g_.stack.lo + _StackGuard
	// This is the g0, so we can also call go:systemstack
	// functions, which check stackguard1.
	_g_.stackguard1 = _g_.stackguard0
	mstart1()

	// Exit this thread.
	if mStackIsSystemAllocated() {
		// Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
		// the stack, but put it in _g_.stack before mstart,
		// so the logic above hasn't set osStack yet.
		osStack = true
	}
	mexit(osStack)
}

func mstart1() {
	_g_ := getg()

	if _g_ != _g_.m.g0 { // must be running on g0
		throw("bad runtime·mstart")
	}

	_g_.sched.g = guintptr(unsafe.Pointer(_g_))
	_g_.sched.pc = getcallerpc() // save pc and sp into g0's sched
	_g_.sched.sp = getcallersp()

	asminit() // architecture-specific (assembly) initialization
	minit()   // per-M (thread) initialization

	// Install signal handlers; after minit so that minit can
	// prepare the thread to be able to handle the signals.
	if _g_.m == &m0 {
		mstartm0() // m0 only: install the signal handlers
	}

	if fn := _g_.m.mstartfn; fn != nil {
		fn()
	}

	if _g_.m != &m0 { // Ms other than m0 bind the P handed to them
		acquirep(_g_.m.nextp.ptr())
		_g_.m.nextp = 0
	}
	schedule() // enter the scheduling loop; schedule never returns
}
```
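The stack-bound arithmetic in mstart0 for an OS-allocated stack is easy to replay with plain numbers. In this sketch, osStackBounds is hypothetical, sys.StackGuardMultiplier is assumed to be 1, and the guard argument stands in for _StackGuard, whose real value is platform dependent (928 below is only an example value). The "address of a local variable" is likewise a made-up number.

```go
package main

import "fmt"

// Assumed values for illustration only.
const (
	fallbackStackSize = 8192 // mstart0's fallback size
	stackGuardMult    = 1    // stand-in for sys.StackGuardMultiplier
)

// osStackBounds mirrors mstart0 when g0 runs on an OS-provided stack:
// the size comes from stack.hi (where cgo may have left it) or a fallback,
// hi becomes the address of a local variable near the top of the stack,
// lo keeps 1024 bytes of slack for what already sits above that local,
// and stackguard0 is placed guard bytes above lo.
func osStackBounds(reportedHi, addrOfLocal, guard uintptr) (lo, hi, stackguard0 uintptr) {
	size := reportedHi
	if size == 0 {
		size = fallbackStackSize * stackGuardMult
	}
	hi = addrOfLocal
	lo = hi - size + 1024
	stackguard0 = lo + guard
	return lo, hi, stackguard0
}

func main() {
	lo, hi, sg := osStackBounds(0, 0x100000, 928)
	fmt.Printf("lo=%#x hi=%#x stackguard0=%#x usable=%d\n", lo, hi, sg, hi-lo)
}
```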
Process summary
- Entry: the assembly function in rt0_windows_amd64.s (_rt0_amd64_windows)
- Initialize m0 and g0 (including TLS)
- check: verify the sizes of basic types
- args: save argc and argv
- osinit: OS-related initialization, such as the CPU count and page size
- schedinit: initialize stacks, the allocator, the GC and all Ps, among other things
- newproc: create a new G (running runtime.main) on the P of the current M (m0) and make it that P's next G to run (runnext)
- mstart: start m0 and enter the scheduling loop, which never returns
- abort: reached only if mstart returns, which should never happen
Tips
After going through the whole process, some parts still feel a little vague to me. Maybe I don't yet know enough about operating systems to understand it all, but don't panic, take your time! Come on, dumpling! Later I'll study operating systems step by step and fill in the relevant details.