[Linux] Source-level analysis of symbols in linking

Posted by adamjblakey on Wed, 22 Dec 2021 09:08:23 +0100

1, Definition of symbols

Linking is essentially the process of fitting several object files together like the pieces of a jigsaw puzzle. For the pieces to fit, the object files must follow a common set of rules.

The rule that matters here is how object files reference each other's addresses, that is, the addresses of variables and functions.

For example, if object file B uses the function foo from object file A, then object file A defines foo and object file B references foo.

In linking, functions and variables are collectively called symbols, and function names and variable names are symbol names.

Every object file has a symbol table that records the name and value of each symbol. For a function or variable, the symbol value is its address.

Taking SimpleSection.c as an example, symbols can be divided into the following types:

int printf( const char* format, ... );
int global_init_var = 84;
int global_uninite_var;
void func1( int i )
{
  printf("%d\n", i);
}
int main()
{
  static int static_var = 85;
  static int static_var2;
  int a = 1;
  int b;
  func1(static_var + static_var2 + a + b);
  return a;                                
}

  • Global symbols defined in this object file: they can be referenced by other object files, such as global_init_var, global_uninite_var, func1, main
  • Global symbols referenced in this object file: they are defined in other object files, such as printf
  • Local symbols: visible only inside this object file, such as static_var and static_var2. The automatic variables a and b live on the stack and do not appear in the symbol table at all
  • Section names: these symbols are generated by the compiler, and the symbol value is the starting address of the section

Use the nm command to view the symbols of the object file. The results are as follows:

[gongruiyang@localhost ws]$ nm SimpleSection.o
0000000000000000 T func1
0000000000000000 D global_init_var
0000000000000004 C global_uninite_var
0000000000000022 T main
                 U printf
0000000000000004 d static_var.1801
0000000000000000 b static_var2.1802
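Meaning of the type letters (an uppercase letter marks a global symbol, a lowercase letter a local one):
  • T: the symbol is in the text (code) section
  • D/d: the symbol is in the initialized data section
  • C: the symbol is a common symbol, that is, uninitialized global data not yet assigned to a section
  • U: the symbol is undefined here and must be found in another object file
  • b: the symbol is in the bss section (uninitialized data)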

2, Symbol structure: Elf32_Sym

In an ELF file the symbol table is a section of the object file named .symtab. Like every section, it is described by an Elf32_Shdr section header. Its contents are an array in which each element is an Elf32_Sym structure (Elf64_Sym in 64-bit files), defined in include/linux/elf.h:

typedef struct elf32_sym{
    Elf32_Word/*unsigned int*/    st_name;
    Elf32_Addr/*unsigned int*/    st_value;
    Elf32_Word/*unsigned int*/    st_size;
    unsigned char                 st_info;
    unsigned char                 st_other;
    Elf32_Half/*unsigned short*/  st_shndx;
} Elf32_Sym;

typedef struct elf64_sym {
    Elf64_Word/*unsigned int*/        st_name;   /* Symbol name, index in string tbl */
    unsigned char                     st_info;  /* Type and binding attributes */
    unsigned char                     st_other; /* No defined meaning, 0 */
    Elf64_Half/*unsigned short*/      st_shndx;    /* Associated section index */
    Elf64_Addr/*unsigned long long*/  st_value;    /* Value of the symbol */
    Elf64_Xword/*unsigned long long*/ st_size;    /* Associated symbol size */
} Elf64_Sym;
  • st_name: symbol name, stored as the offset of the name within the string table
  • st_value: the value of the symbol, usually its address; the exact meaning depends on the kind of symbol (see st_value below)
  • st_size: size of the symbol in bytes, for example 4 for an int variable or the code length of a function; 0 if the symbol has no size
  • st_info: symbol type and binding. The field is a single byte: the low 4 bits hold the symbol type and the high 4 bits hold the binding
  • st_other: the kernel header gives this field no defined meaning (0); in the current ELF specification its low 2 bits hold the symbol visibility
  • st_shndx: index of the section in which the symbol is defined

st_info

Symbol binding information

#define STB_LOCAL  0
#define STB_GLOBAL 1
#define STB_WEAK   2
Constant name   Value   Meaning
STB_LOCAL       0       Local symbol, not visible outside the object file
STB_GLOBAL      1       Global symbol, externally visible
STB_WEAK        2       Weak symbol

Symbol type

#define STT_NOTYPE  0
#define STT_OBJECT  1
#define STT_FUNC    2
#define STT_SECTION 3
#define STT_FILE    4
#define STT_COMMON  5
#define STT_TLS     6
Constant name   Value   Meaning
STT_NOTYPE      0       Unknown type
STT_OBJECT      1       Data object, such as a variable or an array
STT_FUNC        2       Function or executable code
STT_SECTION     3       The symbol is a section
STT_FILE        4       The symbol is the file name
STT_COMMON      5       Common data
STT_TLS         6       Thread-local data

st_shndx

Section index of the symbol

/* special section indexes */
#define SHN_UNDEF		0
#define SHN_LORESERVE	0xff00
#define SHN_LOPROC		0xff00
#define SHN_HIPROC		0xff1f
#define SHN_ABS			0xfff1
#define SHN_COMMON		0xfff2
#define SHN_HIRESERVE	0xffff
Constant name   Value    Meaning
SHN_UNDEF       0        The symbol is referenced in this object file but defined in another object file
SHN_LORESERVE   0xff00   Lower bound of the reserved index range
SHN_LOPROC      0xff00   Lower bound of the index range reserved for processor-specific sections
SHN_HIPROC      0xff1f   Upper bound of the index range reserved for processor-specific sections
SHN_ABS         0xfff1   The symbol holds an absolute value; for example, the symbol that represents the file name is of this kind
SHN_COMMON      0xfff2   The symbol is a "COMMON block" symbol, which is how uninitialized global symbols are represented
SHN_HIRESERVE   0xffff   Upper bound of the reserved index range

st_value

Each symbol has a corresponding value. If the symbol is the definition of a function or variable, its value is the address of that function or variable. More precisely, there are three cases:

  • In an object file, if the symbol is the definition of a function or variable and is not in a "COMMON block" (that is, st_shndx is not SHN_COMMON), st_value is the offset of the symbol within the section given by st_shndx
  • In an object file, if the symbol is the definition of a function or variable and is in a "COMMON block" (that is, st_shndx is SHN_COMMON), st_value is the alignment requirement of the symbol
  • In an executable, st_value is the virtual address of the symbol

Taking SimpleSection.o as an example, let us analyze each symbol:

[gongruiyang@localhost ws]$ readelf -s SimpleSection.o
Symbol table '.symtab' contains 16 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS SimpleSection.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000004     4 OBJECT  LOCAL  DEFAULT    3 static_var.1801
     7: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    4 static_var2.1802
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
    11: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 global_init_var
    12: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM global_uninite_var
    13: 0000000000000000    34 FUNC    GLOBAL DEFAULT    1 func1
    14: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
    15: 0000000000000022    51 FUNC    GLOBAL DEFAULT    1 main
  • Output column description:

The first column Num is the index of the entry in the symbol table

The second column Value corresponds to st_value

The third column Size corresponds to st_size

The fourth column Type and the fifth column Bind are both decoded from st_info

The sixth column Vis is the symbol visibility, decoded from st_other (DEFAULT for every symbol here)

The seventh column Ndx corresponds to st_shndx

The eighth column Name corresponds to st_name

  • Analysis of each symbol:

Symbol name          Section                   Binding      Symbol type
main / func1         1 (.text)                 STB_GLOBAL   STT_FUNC (function or executable code)
printf               UND (defined elsewhere)   STB_GLOBAL   STT_NOTYPE (type not specified)
global_init_var      3 (.data)                 STB_GLOBAL   STT_OBJECT (data object: variable)
global_uninite_var   COM (COMMON block)        STB_GLOBAL   STT_OBJECT (data object: variable, as readelf reports)
static_var.1801      3 (.data)                 STB_LOCAL    STT_OBJECT (data object: variable)
static_var2.1802     4 (.bss)                  STB_LOCAL    STT_OBJECT (data object: variable)
SimpleSection.c      ABS (absolute value)      STB_LOCAL    STT_FILE (file name)