[Go basics] The startup process

Posted by eskimo42 on Sat, 11 Dec 2021 03:56:36 +0100

preface

Every time I write a Go program, I'm curious about its startup process. Today, let's dig into it.

Note: my computer runs Windows 10, so much of this walkthrough does not focus on Linux. This is also a record of my own learning process and may contain mistakes. Corrections are very welcome!

Some of the code in this article has been trimmed so that we can focus on the core flow.

Some familiarity with assembly language will help.

assembly

Starting a Go program requires initializing its own runtime first, so the real program entry point lives in the runtime package.

The entry files differ from platform to platform. Taking Linux, macOS, and Windows 10 on the AMD64 architecture as examples, they are src/runtime/rt0_linux_amd64.s, src/runtime/rt0_darwin_amd64.s, and src/runtime/rt0_windows_amd64.s respectively.

You can see similar entry code in all three files.

# runtime/rt0_windows_amd64.s
// Taking Windows as an example; Linux and macOS are the same apart from the symbol name.
TEXT _rt0_amd64_windows(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)

JMP is an unconditional jump; here it jumps to the _rt0_amd64 subroutine.

This design is intuitive. Once compiled to machine code, a program depends on the instruction set of a specific CPU architecture, while operating system differences show up directly in system-level operations at runtime, such as system calls. That is why each OS only needs a tiny stub that jumps into code shared by all amd64 systems. rt0 is short for runtime0, meaning the birth of the runtime; everything created after it carries the suffix 1.

The operating system passes information to the application through a convention on the entry parameters. When the program has just started, the first two values on the stack at SP are argc and argv, which hold the number of arguments and the arguments themselves.

# runtime/asm_amd64.s
# _rt0_amd64 is common startup code for most amd64 systems when using
# internal linking. This is the entry point for the program from the
# kernel for an ordinary -buildmode=exe program. The stack holds the
# number of arguments and the C-style argv.
TEXT _rt0_amd64(SB),NOSPLIT,$-8
	MOVQ	0(SP), DI	// argc
	LEAQ	8(SP), SI	// argv
	JMP	runtime·rt0_go(SB)
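To make the stack convention concrete, here is a small Go model of the entry stack on a system where the kernel passes arguments this way (for example Linux). It is only a sketch under simplifying assumptions: real stack slots hold raw pointers, while here the argument strings sit directly in a slice of `any`, and the names `readEntryArgs` and `entryStack` are hypothetical. Index 0 plays the role of `0(SP)` (argc, loaded by `MOVQ`), and the subslice starting at index 1 plays the role of `8(SP)` (argv, whose address is taken by `LEAQ`).

```go
package main

import "fmt"

// readEntryArgs models what _rt0_amd64 does with the initial stack:
// slot 0 holds argc (MOVQ 0(SP), DI) and argv starts at slot 1 (LEAQ 8(SP), SI).
// A nil entry stands for a NULL pointer.
func readEntryArgs(stack []any) (argc int, argv []any) {
	argc = stack[0].(int)
	argv = stack[1:] // like LEAQ: keep a reference to the argv array, not a copy
	return argc, argv
}

func main() {
	// Hypothetical entry stack: argc, argv[0..argc-1], NULL, envp..., NULL
	entryStack := []any{2, "prog.exe", "-v", nil, "PATH=/bin", nil}

	argc, argv := readEntryArgs(entryStack)
	for i := 0; i < argc; i++ {
		fmt.Printf("argv[%d] = %v\n", i, argv[i])
	}
}
```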

rt0_go

From there, execution jumps to the rt0_go subroutine.

Let's dig into the logic.

The first part saves the program's entry parameters, sets up the g0 execution stack, and determines CPU processor information.

# runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// Copy the arguments forward onto an even (16-byte aligned) stack
	MOVQ	DI, AX		// argc
	MOVQ	SI, BX		// argv
	SUBQ	$(4*8+7), SP		// 2args 2auto
	ANDQ	$~15, SP
	MOVQ	AX, 16(SP)
	MOVQ	BX, 24(SP)

	// Create istack out of the given (operating system) stack. _cgo_init may update stackguard.
	// Initialize the g0 execution stack
	MOVQ	$runtime·g0(SB), DI
	LEAQ	(-64*1024+104)(SP), BX
	MOVQ	BX, g_stackguard0(DI)
	MOVQ	BX, g_stackguard1(DI)
	MOVQ	BX, (g_stack+stack_lo)(DI)
	MOVQ	SP, (g_stack+stack_hi)(DI)

	// Determine CPU processor information
	MOVL	$0, AX
	CPUID
	MOVL	AX, SI
	CMPL	AX, $0
	JE	nocpuinfo
	// Figure out how to serialize RDTSC. On Intel processors LFENCE is enough. AMD requires MFENCE. We don't know about the rest, so use MFENCE.
	CMPL	BX, $0x756E6547  // "Genu"
	JNE	notintel
	CMPL	DX, $0x49656E69  // "ineI"
	JNE	notintel
	CMPL	CX, $0x6C65746E  // "ntel"
	JNE	notintel
	MOVB	$1, runtime·isIntel(SB)
	MOVB	$1, runtime·lfenceBeforeRdtsc(SB)
# A large piece of code is omitted
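The first instructions of rt0_go boil down to simple address arithmetic. The Go sketch below mirrors it, assuming a made-up starting SP value; the function names are hypothetical. `SUBQ $(4*8+7), SP` followed by `ANDQ $~15, SP` reserves at least four 8-byte slots and rounds SP down to a 16-byte boundary. The g0 stack is then described as the range from about 64 KiB below SP (plus 104 bytes) up to SP, and that low bound is also used for both stack guards.

```go
package main

import "fmt"

// alignEntrySP mirrors SUBQ $(4*8+7), SP; ANDQ $~15, SP:
// reserve room for 2 args + 2 autos (plus slack) and round down to 16 bytes.
func alignEntrySP(sp uint64) uint64 {
	return (sp - (4*8 + 7)) &^ 15
}

// g0Bounds mirrors LEAQ (-64*1024+104)(SP), BX and the stores that follow:
// stack_lo = stackguard0 = stackguard1 = SP - 64KiB + 104, stack_hi = SP.
func g0Bounds(sp uint64) (lo, hi uint64) {
	lo = sp - 64*1024 + 104
	hi = sp
	return lo, hi
}

func main() {
	const entrySP = 0x100000 // hypothetical SP at process entry
	sp := alignEntrySP(entrySP)
	lo, hi := g0Bounds(sp)
	fmt.Printf("aligned SP = %#x, g0 stack = [%#x, %#x), size = %d bytes\n", sp, lo, hi, hi-lo)
}
```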

A very important step that affects the whole runtime is setting up thread-local storage (TLS).

# runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
# A large piece of code is omitted

notintel:
#ifdef GOOS_darwin
	// Skip TLS settings on Darwin
	JMP ok
#endif
	LEAQ	runtime·m0+m_tls(SB), DI // DI = &m0.tls
	CALL	runtime·settls(SB) // Set the TLS base address to DI

	// Store a value through TLS to make sure it works
	get_tls(BX)
	MOVQ	$0x123, g(BX)
	MOVQ	runtime·m0+m_tls(SB), AX 
	CMPQ	AX, $0x123 // Check whether TLS was set up successfully
	JEQ 2(PC)  // If equal, jump forward two instructions, skipping the abort
	CALL	runtime·abort(SB) // Otherwise abort (runtime·abort executes an INT instruction)
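The $0x123 check is a write-then-read-back test: after settls points the thread's TLS base at m0.tls, a value stored through the TLS register must show up in m0.tls. The Go model below imitates this with an ordinary pointer standing in for the TLS register. The struct layout (six slots) and all names are assumptions for illustration, not the runtime's real definitions.

```go
package main

import "fmt"

// m is a stand-in for runtime.m; only the tls array matters here.
type m struct {
	tls [6]uintptr // assumed slot count, for illustration only
}

var m0 m

// tlsBase plays the role of the thread's TLS register after settls.
var tlsBase *[6]uintptr

// settls points the "TLS register" at the given storage (like CALL runtime·settls).
func settls(p *[6]uintptr) { tlsBase = p }

// tlsWorks writes a sentinel through the TLS base (MOVQ $0x123, g(BX))
// and checks that it landed in m0.tls (CMPQ AX, $0x123).
func tlsWorks() bool {
	tlsBase[0] = 0x123
	return m0.tls[0] == 0x123
}

func main() {
	settls(&m0.tls)
	if !tlsWorks() {
		panic("abort: TLS setup failed") // the runtime would CALL runtime·abort
	}
	fmt.Println("TLS ok")
}
```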

Next, the global variables g0 and m0 are set up, and m0 and g0 are linked to each other through pointers.

# runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
# A large piece of code is omitted
	// Set the per-goroutine and per-mach "registers"
	// The program has just started and is now in the main thread
	// The current stack and resources are saved in g0
	// The thread is saved in m0
	get_tls(BX)
	LEAQ	runtime·g0(SB), CX
	MOVQ	CX, g(BX)
	LEAQ	runtime·m0(SB), AX
	// m0 and g0 are linked to each other through pointers.
	// save m->g0 = g0
	MOVQ	CX, m_g0(AX)
	// save m0 to g0->m
	MOVQ	AX, g_m(CX)
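The payoff of this linking is that, from anywhere on the thread, the runtime can read the current G out of TLS and follow pointers to its M and back. Here is a minimal Go model of that pointer graph; the simplified `g` and `m` types, the `tlsCur` variable, and this `getg` are hypothetical stand-ins for the runtime's own definitions.

```go
package main

import "fmt"

// Minimal stand-ins for runtime.g and runtime.m (hypothetical, heavily simplified).
type g struct {
	m *m // g_m: the M this G is running on
}

type m struct {
	g0 *g // m_g0: the scheduling goroutine of this M
}

var (
	g0     g
	m0     m
	tlsCur *g // the "g(BX)" TLS slot: the currently running G
)

// getg models the runtime's getg(): read the current G from TLS.
func getg() *g { return tlsCur }

func main() {
	tlsCur = &g0 // MOVQ CX, g(BX)
	m0.g0 = &g0  // MOVQ CX, m_g0(AX)
	g0.m = &m0   // MOVQ AX, g_m(CX)

	// From any point on this thread, the current M is now reachable via getg().m.
	fmt.Println(getg() == &g0, getg().m == &m0, getg().m.g0 == &g0)
}
```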

Next come some checks and system-level initialization: runtime type checks, retrieval of system parameters, and initialization of constants that affect memory management and program scheduling.

# runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
# A large piece of code is omitted

	CLD				// convention is D is always left cleared
	// Runtime type checks
	CALL	runtime·check(SB)

	MOVL	16(SP), AX		// copy argc
	MOVL	AX, 0(SP)
	MOVQ	24(SP), AX		// copy argv
	MOVQ	AX, 8(SP)
	// Save the system parameters (argc, argv)
	CALL	runtime·args(SB)
	// OS-related initialization (CPU count, page size), which affects memory management
	CALL	runtime·osinit(SB)
	// Initialize the scheduler and other core runtime components
	CALL	runtime·schedinit(SB)

The program is about to start running!

# runtime/asm_amd64.s
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
# A large piece of code is omitted

	// Create a new goroutine to start the program
	MOVQ	$runtime·mainPC(SB), AX		// entry: load mainPC (a global holding the runtime·main function value) into AX
	PUSHQ	AX
	PUSHQ	$0			// arg size: push 0 as newproc's first parameter (siz); fn was pushed just before
	CALL	runtime·newproc(SB) // Call newproc to create a new g
	POPQ	AX
	POPQ	AX

	// start this M
	CALL	runtime·mstart(SB)

	CALL	runtime·abort(SB)	// mstart should never return
	RET

	// Prevent dead-code elimination of debugCallV2, which is intended to be called by debuggers.
	MOVQ	$runtime·debugCallV2<ABIInternal>(SB), AX
	RET

The entry address of the main function is generated by the compiler: runtime·mainPC is defined in the read-only data segment and stores the address of runtime·main, the entry point of the main goroutine:

// mainPC is a function value for runtime.main, to be passed to newproc.
// The reference to runtime.main is made via ABIInternal, since the
// actual function (not the ABI0 wrapper) is needed by newproc.
DATA	runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
GLOBL	runtime·mainPC(SB),RODATA,$8

During bootstrap, the Go program calls the following core functions to perform checks and system initialization:

check: runtime type checks
args: retrieve the system parameters (argc, argv)
osinit: OS-related initialization of constants that affect memory management, such as the CPU count and page size
schedinit: initialize the scheduler, plus constants related to the memory allocator and garbage collector
newproc: create the schedulable execution unit G for the main goroutine from its entry address (runtime·main)
mstart: start the scheduler's scheduling loop
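To keep the order of these calls straight, here is a toy Go model of the bootstrap sequence. Every function body is a placeholder that only records its name; `runtimeMain`, the `trace` slice, and the one-slot `runnext` variable are assumptions for illustration. The point is the control flow: newproc only queues runtime.main, and it is mstart's scheduling loop that finally runs it.

```go
package main

import "fmt"

var (
	trace   []string // records the order in which bootstrap steps run
	runnext func()   // one-slot stand-in for P.runnext
)

func check() { trace = append(trace, "check") }

func args() { trace = append(trace, "args") }

func osinit() { trace = append(trace, "osinit") }

func schedinit() { trace = append(trace, "schedinit") }

// runtimeMain stands in for runtime.main, which eventually calls main.main.
func runtimeMain() { trace = append(trace, "runtime.main") }

// newproc only queues the function; it does not run it.
func newproc(fn func()) {
	trace = append(trace, "newproc")
	runnext = fn
}

// mstart enters a toy scheduling loop and runs whatever is queued.
// The real schedule() never returns; this toy stops once nothing is runnable.
func mstart() {
	trace = append(trace, "mstart")
	for runnext != nil {
		fn := runnext
		runnext = nil
		fn()
	}
}

func bootstrap() {
	check()
	args()
	osinit()
	schedinit()
	newproc(runtimeMain)
	mstart()
}

func main() {
	bootstrap()
	fmt.Println(trace)
}
```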

From this analysis, we can see that a Go program starts neither directly at main.main nor directly at runtime.main. Its actual entry point is the runtime's _rt0_amd64_* (for example _rt0_amd64_windows), which then calls runtime·rt0_go.

Program bootstrap and initialization is one of the most critical steps of the whole runtime. The schedinit call completes the initialization of the whole runtime, including the scheduler, execution stacks, memory allocator, and garbage collector. Finally, newproc and mstart hand control to the scheduler, which starts executing the main goroutine.

The startup flow chart is as follows:

Core functions

The analysis above introduced several core functions. Now let's take a brief look at what each one actually does; the principles behind them are left to dedicated chapters.

The check function essentially double-checks the compiler's work, verifying once more the memory size of each type.

// runtime/runtime1.go
func check() {
	var (
		a     int8
		b     uint8
		c     int16
		d     uint16
		// ... (omitted)
	)
	type x1t struct {
		x uint8
	}
	type y1t struct {
		x1 x1t
		y  uint8
	}
	var x1 x1t
	var y1 y1t
	// Verify that sizeof(int8) is 1; the other types are checked the same way
	if unsafe.Sizeof(a) != 1 {
		throw("bad a")
	}
	// ... (omitted)
}
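The same kind of checks can be written in ordinary user code with `unsafe.Sizeof` and `unsafe.Offsetof`. The sketch below is not the runtime's code; `checkSizes` is a hypothetical helper that repeats a few assertions in the same spirit (including the layout of the nested struct `y1t`) and returns an error instead of calling `throw`.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// checkSizes repeats a few runtime.check-style assertions from user code.
func checkSizes() error {
	type x1t struct {
		x uint8
	}
	type y1t struct {
		x1 x1t
		y  uint8
	}
	var (
		a  int8
		b  uint8
		c  int16
		d  uint16
		y1 y1t
	)
	switch {
	case unsafe.Sizeof(a) != 1:
		return errors.New("bad a")
	case unsafe.Sizeof(b) != 1:
		return errors.New("bad b")
	case unsafe.Sizeof(c) != 2:
		return errors.New("bad c")
	case unsafe.Sizeof(d) != 2:
		return errors.New("bad d")
	case unsafe.Sizeof(y1) != 2 || unsafe.Offsetof(y1.y) != 1:
		return errors.New("bad y1t layout")
	}
	return nil
}

func main() {
	if err := checkSizes(); err != nil {
		panic(err)
	}
	fmt.Println("type sizes ok")
}
```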

The args function stores the argc and argv parameters passed by the operating system in global variables.

// runtime/runtime1.go
var (
	argc int32
	argv **byte
)

func args(c int32, v **byte) {
	argc = c 
	argv = v
	sysargs(c, v)
}

It then calls the OS-specific sysargs function (the excerpt below is the DragonFly BSD version):

// runtime/os_dragonfly.go
func sysargs(argc int32, argv **byte) {
	// Skip argv: argc entries plus the NULL that terminates them
	n := argc + 1

	// Skip over envp to get to auxv
	for argv_index(argv, n) != nil {
		n++
	}

	// Skip the NULL separator
	n++
	// Treat the rest of the array as auxv
	auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
	sysauxv(auxv[:])
}

func sysauxv(auxv []uintptr) {
	// Read the auxv key/value pairs in sequence
	for i := 0; auxv[i] != _AT_NULL; i += 2 {
		tag, val := auxv[i], auxv[i+1]
		switch tag {
		case _AT_PAGESZ:
			// Record the memory page size
			physPageSize = val
		}
	}
}

I'm lost here: this involves memory pages at the lowest level of the operating system, which I don't understand well yet, so I won't explain much. 😥
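In short, when the kernel starts an ELF program it places an "auxiliary vector" (auxv) of key/value pairs right after the environment pointers, and one of those pairs (AT_PAGESZ) carries the memory page size. sysargs simply walks past argv and envp to find it. The Go sketch below replays that walk on a simulated argv array; `pageSizeFromArgv` and the pointer values are hypothetical, while the tag values 0 (AT_NULL), 3 (AT_PHDR), and 6 (AT_PAGESZ) are the standard ELF ones.

```go
package main

import "fmt"

// ELF auxiliary-vector tags.
const (
	atNull   = 0 // _AT_NULL: end of auxv
	atPagesz = 6 // _AT_PAGESZ: system page size
)

// pageSizeFromArgv mirrors sysargs + sysauxv on a simulated argv array:
// argc argument "pointers", a 0 (NULL), the envp pointers, another 0,
// and then auxv as (tag, value) pairs ending with atNull.
func pageSizeFromArgv(argc int, argv []uintptr) uintptr {
	n := argc + 1 // skip argv[0..argc-1] and its NULL terminator
	for argv[n] != 0 {
		n++ // skip envp entries
	}
	n++ // skip envp's NULL terminator
	auxv := argv[n:]

	var pageSize uintptr
	for i := 0; auxv[i] != atNull; i += 2 {
		tag, val := auxv[i], auxv[i+1]
		if tag == atPagesz {
			pageSize = val
		}
	}
	return pageSize
}

func main() {
	// Hypothetical memory image: 2 args, 2 env vars, then auxv.
	argv := []uintptr{
		0x1000, 0x1010, 0, // argv pointers + NULL
		0x2000, 0x2010, 0, // envp pointers + NULL
		3, 0x400040, // AT_PHDR (ignored here)
		atPagesz, 4096,
		atNull, 0,
	}
	fmt.Println("page size:", pageSizeFromArgv(2, argv))
}
```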

The osinit function obtains the number of CPU cores and, if sysargs has not already found it, the memory page size of the current operating system.

// runtime/os_dragonfly.go
func osinit() {
	// Get the number of CPU cores
	ncpu = getncpu()
	if physPageSize == 0 {
		physPageSize = getPageSize()
	}
}

As its name suggests, schedinit initializes the scheduler, but in fact it also initializes other core parts, such as stacks, memory, the GC, and threads.

These initializations also follow a certain order, presumably because earlier functions provide data that later ones need; comments in the code such as "must run before mcommoninit" and "uses maps, activeModules" point the same way.

// Bootstrap sequence is:
//	call osinit
//	call schedinit
//	make & queue new G
//	call runtime·mstart
// The new G calls runtime·main.
func schedinit() {
	lockInit(&sched.lock, lockRankSched)
	// ... other lockInit calls omitted

	// Get the current g (g0 at this point)
	_g_ := getg()

	sched.maxmcount = 10000 // Limit the maximum number of system threads

	// The world starts stopped (recorded for lock-rank checking).
	worldStopped()

	moduledataverify()
	stackinit()            // Initialize the execution stack
	mallocinit()           // Initialize the memory allocator
	fastrandinit()         // must run before mcommoninit; initializes the random number generator
	mcommoninit(_g_.m, -1) // Initialize the current M (system thread); -1 means no preallocated ID
	cpuinit()              // must run before alginit; initializes CPU information
	alginit()              // maps must not be used before this call; initializes the hash algorithms
	modulesinit()          // provides activeModules (module data used by the GC, among others)
	typelinksinit()        // uses maps, activeModules; builds the typemap of each active module
	itabsinit()            // uses activeModules; initializes interface tables (itabs)

	sigsave(&_g_.m.sigmask) // Save the signal mask of m
	initSigmask = _g_.m.sigmask

	goargs()         // Copy the command-line arguments into argslice
	goenvs()         // Copy the environment variables into envs
	parsedebugvars() // Initialize a series of debug-related variables
	gcinit()         // Initialize the garbage collector

	// Lock the scheduler
	lock(&sched.lock)
	sched.lastpoll = uint64(nanotime())
	// Create the Ps.
	// The number of Ps is determined by the number of CPU cores and the GOMAXPROCS environment variable
	procs := ncpu // procs defaults to the number of CPUs
	if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 { // If GOMAXPROCS is set, it overrides procs
		procs = n
	}
	// Resize the set of Ps to procs (stored in the global allp): surplus Ps are
	// cleaned up and missing ones are created. No new M is started here; when an
	// M starts later, it takes a P from here and attaches to it.
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}
	// Unlock the scheduler
	unlock(&sched.lock)
	// ... a large piece of code is omitted
}
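The P-count rule at the end of schedinit is easy to replay in user code. Below is a sketch, not the runtime's implementation: the hypothetical `initialProcs` uses `strconv.ParseInt` with a 32-bit limit in place of the runtime's internal `atoi32`, and only a valid, positive GOMAXPROCS value overrides the CPU count. The "predicted" and "actual" values may differ if the program changes the setting or on Go releases that also take container CPU limits into account.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// initialProcs mirrors schedinit's choice of the P count:
// start from ncpu and let a valid, positive GOMAXPROCS value override it.
func initialProcs(ncpu int, gomaxprocs string) int {
	procs := ncpu
	if n, err := strconv.ParseInt(gomaxprocs, 10, 32); err == nil && n > 0 {
		procs = int(n)
	}
	return procs
}

func main() {
	fmt.Println("predicted:", initialProcs(runtime.NumCPU(), os.Getenv("GOMAXPROCS")))
	// GOMAXPROCS(0) queries the current setting without changing it.
	fmt.Println("actual:   ", runtime.GOMAXPROCS(0))
}
```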

The newproc function creates a new G on the P of the current M; this G is the runtime.main we expect. Rather than being appended to the run queue, it is placed in P's runnext slot, so it becomes the next G to run.

Why put it in runnext rather than in the run queue?

My guess: we are still running on g0 at this point, and m has only had some of its properties initialized, so it is not suitable to put G directly into the run queue.
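Whatever the exact reason, the effect of `runqput(_p_, newg, true)` is easy to model: with next=true the G goes into the single runnext slot (kicking any previous occupant to the tail of the local queue), and the scheduler checks runnext before the queue. At bootstrap the queue is empty, so runtime.main simply becomes the first G picked up. The toy below uses hypothetical names and leaves out the real queue's fixed capacity of 256 and its overflow to the global queue.

```go
package main

import "fmt"

// toyP is a simplified P: one runnext slot plus an unbounded FIFO local queue.
type toyP struct {
	runnext string
	runq    []string
}

// runqput follows the runtime's runqput semantics for the local case.
func (p *toyP) runqput(g string, next bool) {
	if next {
		old := p.runnext
		p.runnext = g
		if old == "" {
			return
		}
		g = old // kick the previous runnext to the tail of the queue
	}
	p.runq = append(p.runq, g)
}

// runqget prefers runnext, then falls back to the head of the queue.
func (p *toyP) runqget() (string, bool) {
	if g := p.runnext; g != "" {
		p.runnext = ""
		return g, true
	}
	if len(p.runq) == 0 {
		return "", false
	}
	g := p.runq[0]
	p.runq = p.runq[1:]
	return g, true
}

func main() {
	var p toyP
	p.runqput("a", false)
	p.runqput("b", true)
	p.runqput("runtime.main", true) // b is kicked to the queue tail
	for g, ok := p.runqget(); ok; g, ok = p.runqget() {
		fmt.Println(g)
	}
}
```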

func newproc(siz int32, fn *funcval) {
	argp := add(unsafe.Pointer(&fn), sys.PtrSize)
	gp := getg()        // Get the pointer to the current goroutine
	pc := getcallerpc() // Get the caller's PC; getcallerpc is an intrinsic filled in by the compiler
	systemstack(func() {
		// Create a new G
		newg := newproc1(fn, argp, siz, gp, pc) // Key function
		// Get the pointer to the current P
		_p_ := getg().m.p.ptr()
		// Put the new G into P's runnext slot (next=true). If P's local run queue is
		// full, Gs spill over to the global queue, where other Ps can schedule them.
		runqput(_p_, newg, true)
		// Try to add one more P to execute Gs; called when a G becomes runnable (newproc, ready).
		// mainStarted is still false during bootstrap, so this is skipped here.
		if mainStarted {
			wakep()
		}
	})
}

func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
	(...)
	_g_ := getg()
	_p_ := _g_.m.p.ptr()
	// Try to reuse a G from P's free (dead) G list; if that list is empty, first grab a batch
	// from the global free list. Finished Gs are put back on the free list for reuse.
	newg := gfget(_p_)
	if newg == nil { // At the very beginning of startup there is no free G yet
		newg = malg(_StackMin)           // Create a new g with a minimal stack
		casgstatus(newg, _Gidle, _Gdead) // Change the status of g from idle to dead
		allgadd(newg)                    // Publish with a g->status of Gdead so the GC scanner doesn't look at an uninitialized stack
	}
	(...)

	(...) // Property configuration of newg
	newg.startpc = fn.fn // Use the mainPC function (i.e. runtime·main) as this goroutine's start function
	if _g_.m.curg != nil {
		newg.labels = _g_.m.curg.labels
	}
	// Is it a system goroutine? (Gs whose start function has the runtime. prefix are system goroutines, except runtime.main and runtime.handleAsyncEvent)
	if isSystemGoroutine(newg, false) {
		atomic.Xadd(&sched.ngsys, +1)
	}
	// Track initial transition?
	newg.trackingSeq = uint8(fastrand())
	if newg.trackingSeq%gTrackingPeriod == 0 { // Sample about 1 in gTrackingPeriod goroutines for scheduling-latency tracking
		newg.tracking = true
	}
	casgstatus(newg, _Gdead, _Grunnable) // Change the state of g from dead to runnable

	(...)
	releasem(_g_.m) // Release the M (re-enables preemption)

	return newg
}

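The reuse logic at the top of newproc1 is a two-level free list. The following Go toy captures its shape under stated assumptions: `G`, `gCache`, and its methods are hypothetical, the batch size of 32 is only meant to resemble the runtime's, and real details such as locking and Gs without stacks are left out. During startup both lists are empty, so a fresh G is allocated, exactly as the comment in newproc1 says.

```go
package main

import "fmt"

// G is a hypothetical, heavily simplified goroutine descriptor.
type G struct {
	id     int
	status string // "dead" or "runnable" in this toy
}

// gCache models a P-local free list backed by a global free list.
type gCache struct {
	local, global []*G
	nextID        int
}

// gfget tries the local free list first; if it is empty, it moves a batch
// (up to 32 here) from the global list. Returns nil if no free G exists.
func (c *gCache) gfget() *G {
	if len(c.local) == 0 && len(c.global) > 0 {
		n := len(c.global)
		if n > 32 {
			n = 32
		}
		c.local = append(c.local, c.global[len(c.global)-n:]...)
		c.global = c.global[:len(c.global)-n]
	}
	if len(c.local) == 0 {
		return nil
	}
	g := c.local[len(c.local)-1]
	c.local = c.local[:len(c.local)-1]
	return g
}

// newG follows the shape of newproc1: reuse a dead G if possible, otherwise
// allocate one in the dead state, then mark it runnable.
func (c *gCache) newG() *G {
	g := c.gfget()
	if g == nil {
		c.nextID++
		g = &G{id: c.nextID, status: "dead"}
	}
	g.status = "runnable"
	return g
}

// gfput returns a finished G to the local free list for reuse.
func (c *gCache) gfput(g *G) {
	g.status = "dead"
	c.local = append(c.local, g)
}

func main() {
	var c gCache
	g1 := c.newG() // startup: free lists are empty, so a new G is allocated
	c.gfput(g1)    // g1 finishes and becomes dead
	g2 := c.newG() // g1's descriptor is reused
	fmt.Println(g1.id, g2.id, g1 == g2)
}
```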

The mstart function starts M and enters scheduling (we'll discuss this in detail next time).

// mstart is the entry-point for new Ms. It is written in assembly, uses ABI0,
// is marked TOPFRAME, and calls mstart0.
func mstart()

func mstart0() {
	_g_ := getg()

	osStack := _g_.stack.lo == 0
	if osStack {
		// Initialize stack bounds from the system stack. Cgo may have left the
		// stack size in stack.hi. minit may update the stack bounds.
		// Note: these bounds may not be very accurate. We set hi to &size, but
		// there are things above it. The 1024 is supposed to compensate for
		// this, but is somewhat arbitrary.
		size := _g_.stack.hi
		if size == 0 {
			size = 8192 * sys.StackGuardMultiplier
		}
		_g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
		_g_.stack.lo = _g_.stack.hi - size + 1024
	}
	// Initialize the stack guard so that we can start calling regular
	// Go code.
	_g_.stackguard0 = _g_.stack.lo + _StackGuard
	// This is the g0, so we can also call go:systemstack
	// functions, which check stackguard1.
	_g_.stackguard1 = _g_.stackguard0
	mstart1()

	// Exit this thread.
	if mStackIsSystemAllocated() {
		// Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
		// the stack, but put it in _g_.stack before mstart, so the logic above
		// hasn't set osStack yet.
		osStack = true
	}
	mexit(osStack)
}

func mstart1() {
	_g_ := getg()

	if _g_ != _g_.m.g0 { // Make sure we are running on g0
		throw("bad runtime·mstart")
	}
	_g_.sched.g = guintptr(unsafe.Pointer(_g_))
	_g_.sched.pc = getcallerpc() // Save pc and sp information to g0
	_g_.sched.sp = getcallersp()

	asminit() // Assembly-level initialization
	minit()   // M initialization

	// Install signal handlers; after minit so that minit can
	// prepare the thread to be able to handle the signals.
	if _g_.m == &m0 {
		mstartm0() // Install signal handlers (m0 only)
	}

	if fn := _g_.m.mstartfn; fn != nil {
		fn()
	}

	if _g_.m != &m0 { // If this is not m0, attach the P handed to it
		acquirep(_g_.m.nextp.ptr())
		_g_.m.nextp = 0
	}
	schedule() // Enter the scheduling loop; schedule never returns
}
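The osStack branch of mstart0 is plain arithmetic, so it can be replayed with made-up numbers. In the sketch below, the hypothetical `osStackBounds` takes `addrOfSize` as a stand-in for `&size` (an address near the top of the OS-provided stack), `reported` for the size cgo may have left in stack.hi (0 if unknown), and `multiplier` for sys.StackGuardMultiplier; the real stackguard0 would then add _StackGuard, whose value depends on platform and Go version and is not modeled here.

```go
package main

import "fmt"

// osStackBounds mirrors the osStack branch of mstart0.
func osStackBounds(addrOfSize, reported, multiplier uintptr) (lo, hi uintptr) {
	size := reported
	if size == 0 {
		size = 8192 * multiplier // default guess when the size is unknown
	}
	hi = addrOfSize
	lo = hi - size + 1024 // give up 1024 bytes to cover whatever sits above &size
	return lo, hi
}

func main() {
	lo, hi := osStackBounds(0x10000, 0, 1) // hypothetical values
	fmt.Printf("g0 stack: [%#x, %#x)\n", lo, hi)
}
```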

Process summary

Entry: the _rt0_amd64_windows assembly function in rt0_windows_amd64.s
Initialize m0 and g0
check: verify the memory size occupied by each type
args: save the argc and argv parameters
osinit: OS-related initialization, such as the CPU count and page size
schedinit: initialize all Ps and other core components
newproc: create a new G on the P of the current M (m0) and make it P's next G to run (runnext)
mstart: start m0 and enter the scheduling loop, which never returns
abort: only reached if mstart returns, which should never happen

tips

After going through the whole process, things are still a little fuzzy for me. Maybe I lack the operating system background to fully understand it, but there's no need to panic; take your time! Keep going, dumpling! Later I'll study operating systems step by step and fill in the relevant details.
