Relearn computer (13. Does the program start with main?)

Posted by sundru on Thu, 06 Jan 2022 17:13:12 +0100

This article is a fairly hard-core one. Does a program really start with the main function?

13.1 Does the program start with the main function?

13.1.1 Verbose gcc compilation output

When we learn C, the teacher always says that a C program starts from the main function. When we write code we do start from main, and after compiling and running, the first output also comes from main. So the idea that a C program starts at main is deeply rooted. This time, let's overturn it.

Let's write some code:

#include <stdio.h>

int main(int argc, char **argv)
{
    printf("hello world\n");
    return 0;
}

It's the familiar hello world again. After more than a dozen articles, we seem to be back where we started.

root@ubuntu:~/c_test/13# gcc test.c -o test
root@ubuntu:~/c_test/13# ./test
hello world

It compiles and runs as usual. It still looks like the familiar recipe, with nothing new.

Now let's add gcc's -v option (I forgot to show this earlier when covering compiling and linking. A bit embarrassing, but better late than never). The link step it prints looks like this:

/usr/lib/gcc/x86_64-linux-gnu/5/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/5/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper -plugin-opt=-fresolution=/tmp/ccp83T0j.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -z relro -o test /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/5/../../.. /tmp/ccQmBcOv.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/5/crtend.o /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o

collect2 is a wrapper that invokes the linker ld we mentioned earlier. The excerpt above is the link step. Looking through it, we do find several extra .o files taking part in the link: crt1.o, crti.o, and crtbegin.o (with crtend.o and crtn.o at the end).

But this alone does not prove that code from those .o files runs before the main function.

13.1.2 Linker script

Remember that linking is controlled by a linker script. If we do not specify one, the linker uses its default script. If you have forgotten, you can go back to this article: Relearn computer (v. static link and link control).

Now let's excerpt the useful part of the default script:

root@ubuntu:/usr/lib/ldscripts# ld -verbose
==================================================
/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2015 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
	      "elf64-x86-64")		
OUTPUT_ARCH(i386:x86-64)    /* Output format */
ENTRY(_start)				/* Important: this specifies the entry point of the program */
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); SEARCH_DIR("=/lib/x86_64-linux-gnu"); SEARCH_DIR("=/usr/lib/x86_64-linux-gnu"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64"); SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
/* SEARCH_DIR tells the ld linker which directories to search for libraries; it is equivalent to -Lpath */
SECTIONS	/* The definition of each section starts here. Do the section names below look familiar? */
{
  /* Read-only sections, merged into text segment: */
  /* PROVIDE defines a symbol in the linker script that code can reference */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;  /* This sets the start address of the program; SIZEOF_HEADERS can be left for later */
  .interp         : { *(.interp) }		/* * is a wildcard: the .interp sections of all input files match */
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
    /* When the --gc-sections option is given on the link command line, the linker may discard sections it considers unused. KEEP() forces the linker to retain the named sections. */
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  .init_array     :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array     :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
}

From this script we can see that the program entry is ENTRY(_start), so now we can be sure that execution does not begin at the main function. The script also keeps the .init and .fini sections, which suggests that code runs both before and after the main program text.

Let's modify the above code slightly:

#include <stdio.h>

// static void __attribute__((section(".init"))) init_main(void)
// {
//     printf("init main\n");
// }

static void __attribute__ ((constructor)) before_main(void)   // Before main function
{
    printf("befor main\n");
}

static void __attribute__ ((destructor)) after_main(void)  // After main function
{
    printf("after main\n");
}

// static void __attribute__((section(".fini"))) fini_main(void)
// {
//     printf("fini main\n");
// }

int main(int argc, char **argv)
{
    printf("hello world\n");

    return 0;
}

We use __attribute__ to set attributes on the functions that follow it. Compile and run:

root@ubuntu:~/c_test/13# gcc test.c -o test
root@ubuntu:~/c_test/13# ./test
befor main
hello world
after main
root@ubuntu:~/c_test/13# 

The two functions do run before and after the main function. You may wonder why the .init and .fini versions are commented out. In fact, if those two functions are enabled, the program hits a segmentation fault: the text is printed correctly, but the segfault follows the print. I'll note this problem here and come back to analyze it later. As for __attribute__, this article is quite good: Introduction to several useful GCC attributes

13.1.3 _start function

My system is 64-bit Ubuntu, so the source of _start is in glibc at sysdeps/x86_64/start.S.

Let's copy it and analyze it:

It is all assembly plus English comments, which is a headache at first sight, so the comments below went through translation software.

/* This is the canonical entry point, usually the first thing in the text
   segment.  The SVR4/i386 ABI (pages 3-31, 3-32) says that when the entry
   point runs, most registers' values are unspecified, except for:

   %rdx		Contains a function pointer to register with 'atexit'
		This is how the dynamic linker arranges for the DT_FINI functions of shared libraries that were loaded before this code runs to be called

   %rsp		The stack contains parameters and environment:
		0(%rsp)				argc
		LP_SIZE(%rsp)			argv[0]
		...
		(LP_SIZE*argc)(%rsp)		NULL
		(LP_SIZE*(argc+1))(%rsp)	envp[0]
		...
						NULL
*/
/* The state described above is already set up before _start runs, so all of these values are on the stack at this point. I honestly don't know when they are put there. Is it in the exec function? Answering that means analyzing the kernel, so let's not worry about it for now and come back to it when there is a chance. */
#include <sysdep.h>

ENTRY (_start)
	/* Clearing the frame pointer is not sufficient, so use CFI.  */
	cfi_undefined (rip)
	/* Clear the frame pointer.  */
	/* ebp: the base (frame) pointer register, which normally points to the base of the current stack frame; zeroing it marks the outermost frame */
	xorl %ebp, %ebp

	/* Extract the arguments encoded on the stack and set up the arguments for
	   __libc_start_main (int (*main) (int, char **, char **),
		   int argc, char *argv,
		   void (*init) (void), void (*fini) (void),
		   void (*rtld_fini) (void), void *stack_end).
	   Arguments are passed in registers and on the stack:
	main:		%rdi
	argc:		%rsi
	argv:		%rdx
	init:		%rcx
	fini:		%r8
	rtld_fini:	%r9
	stack_end:	stack.	*/

	mov %RDX_LP, %R9_LP	/* For the definition of RDX_LP, see sysdep.h in the same folder. rdx initially holds the address of the dynamic linker's termination function (rtld_fini); copy it to r9.
	   Presumably the dynamic linker stored that address in rdx before jumping here. */
#ifdef __ILP32__
	mov (%rsp), %esi	/* Simulate popping 4-byte argument count.  */
	add $4, %esp
#else
	popq %rsi		/* Pop argc into rsi  */
#endif
	/* With argc popped, argv is at the top of the stack; make rdx point to argv  */
	mov %RSP_LP, %RDX_LP
	/* Align the stack to a 16 byte boundary to follow ABI  */
	and  $~15, %RSP_LP

	/* Push garbage because we push 8 more bytes.  */
	pushq %rax

	/* Provide the highest stack address to the user code (the stack_end argument, passed on the stack)  */
	pushq %rsp

#ifdef PIC	/* dynamic linking */
	/* Pass address of our own entry points to .fini and .init.  */
	mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP		// These two continue to assign values
	mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP

	mov main@GOTPCREL(%rip), %RDI_LP			// The main function address is also stored in rdi
#else
	/* Pass our own entry address to fini and init.   */
	mov $__libc_csu_fini, %R8_LP
	mov $__libc_csu_init, %RCX_LP

	mov $main, %RDI_LP
#endif

	/* Call the user's main function, and exit with its value.
	   But let the libc call main.  Since __libc_start_main in
	   libc.so is called very early, lazy binding isn't relevant
	   here.  Use indirect branch via GOT to avoid extra branch
	   to PLT slot.  In case of static executable, ld in binutils
	   2.26 or above can convert indirect branch into direct
	   branch.  */
	call *__libc_start_main@GOTPCREL(%rip)

	hlt			/* Crash if 'exit' returns	 */
END (_start)

x86 assembly really is a headache. This is only a rough analysis and I don't fully understand it, but it looks like _start is mostly filling in the arguments for __libc_start_main. You can also take a look at this well-written article: [Reading Notes in Chapter 11 of self cultivation of programmers]

13.1.4 __libc_start_main function

Next, let's follow the call:

__libc_start_main is in csu/libc-start.c:

STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
		 int argc, char **argv,
		 __typeof (main) init,
		 void (*fini) (void),
		 void (*rtld_fini) (void), void *stack_end)
{
  /* Result of the 'main' function.  */
  int result;

  __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;

#ifndef SHARED
  _dl_relocate_static_pie ();

  char **ev = &argv[argc + 1];

  __environ = ev;    // Get environment variable

  /* Store the lowest stack address.  This is done in ld.so if this is
     the code for the DSO.  */
  __libc_stack_end = stack_end;



  /* Initialize libpthread if linked in.  */    // Initialize multithreading
  if (__pthread_initialize_minimal != NULL)
    __pthread_initialize_minimal ();


#endif /* !SHARED  */

  /* Register the destructor of the dynamic linker if there is any.  */
  if (__glibc_likely (rtld_fini != NULL))
    __cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);
    //Register the destructor of the dynamic linker, if any

#ifndef SHARED
  /* Call the initializer of the libc.  This is only needed here if we
     are compiling for the static library in which case we haven't
     run the constructors in `_dl_start_user'.  */
  __libc_init_first (argc, argv, __environ);      // Initialize libc Library

  /* Register the destructor of the program, if any.  */
    //Register the destructor of the program (if any)
  if (fini)
    __cxa_atexit ((void (*) (void *)) fini, NULL, NULL);

  /* Some security at this point.  Prevent starting a SUID binary where
     the standard file descriptors are not opened.  We have to do this
     only for statically linked applications since otherwise the dynamic
     loader did the work already.  */
  if (__builtin_expect (__libc_enable_secure, 0))
    __libc_check_standard_fds ();
#endif

  if (init)
    (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);    // init function execution


  /* Nothing fancy, just call the function.  */
  result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);   // main is finally called here; it is a relief to reach this line

  exit (result);
}

The real function contains a lot more; I have deleted most of it. While my skills are still limited, there is no need to study every line in depth, because it is simply too difficult. So I removed the non-essential parts and kept the key ones.

13.2 exit() function

I introduced the exit function earlier, when looking at how Linux applications get into the kernel. Let's follow it now; if it gets too difficult, I'll retreat immediately.

void
exit (int status)
{
  __run_exit_handlers (status, &__exit_funcs, true, true);
}

It calls __run_exit_handlers.

__run_exit_handlers mainly runs the cleanup functions that were registered earlier (for example with atexit), and when they are done it calls the _exit() function.

void
_exit (int status)
{
  while (1)
    {
#ifdef __NR_exit_group
      INLINE_SYSCALL (exit_group, 1, status);    // exit_group: terminates every thread in the process
#endif
      INLINE_SYSCALL (exit, 1, status);		// exit: the older system call, which ends only the calling thread

      // INLINE_SYSCALL tells us these are system calls, so from here on we are in the kernel
      
#ifdef ABORT_INSTRUCTION
      ABORT_INSTRUCTION;
#endif
    }
}

I won't read the kernel for now; we'll get to it another time. It seems I can't manage analysis that is too hard-core yet.

13.3 Summary

This article is fairly hard-core, but it is only meant to build understanding; knowing the overall flow is enough. I'll analyze the kernel when I get the chance. For now, let's stop here.