Getting the call stack of an arbitrary iOS thread (4): symbolication in practice

Posted by jamess on Wed, 26 Jun 2019 18:14:00 +0200

From: http://blog.csdn.net/jasonblog/article/details/49909209

1. Related APIs and data structures

The backtrace step above gave us a list of addresses from the thread's call stack, so symbolication takes an address as input and produces a symbol as output. A first interface design looks like this:

- (NSString *)symbolicateAddress:(uintptr_t)addr;

In practice, though, we need to rely on dyld-related functions and data structures:

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;

_dyld_image_count() returns the current number of images mapped in by dyld. Note that using this count to iterate all images is not thread safe, because another thread may be adding or removing images during the iteration.
_dyld_get_image_header() returns a pointer to the mach header of the image indexed by image_index. If image_index is out of range, NULL is returned.
_dyld_get_image_vmaddr_slide() returns the virtual memory address slide amount of the image indexed by image_index. If image_index is out of range, zero is returned.
_dyld_get_image_name() returns the name of the image indexed by image_index. The C-string continues to be owned by dyld and should not be deleted. If image_index is out of range, NULL is returned.

So that the caller can tell whether the lookup succeeded, the interface design evolved into:

bool jdy_symbolicateAddress(const uintptr_t addr, Dl_info *info)

The Dl_info structure is filled in with the result of the lookup.

2. Algorithmic ideas

Symbolicating an address is conceptually straightforward: find the memory image that the address belongs to, locate the symbol table in that image, and then match the target address against the entries of that symbol table.

(Pictures from Apple's official documents)

The following ideas describe a general direction and do not cover specific details, such as ASLR-based offsets:

//ASLR-based offset https://en.wikipedia.org/wiki/Address_space_layout_randomization
/**
 * When the dynamic linker loads an image, 
 * the image must be mapped into the virtual address space of the process at an unoccupied address.
 * The dynamic linker accomplishes this by adding a value "the virtual memory slide amount" to the base address of the image.
*/
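
The slide arithmetic can be made concrete with a minimal sketch. The function names demo_unslide and demo_slide are hypothetical, and the sketch assumes a non-negative slide passed in as a plain parameter (on a device it would come from _dyld_get_image_vmaddr_slide()):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a runtime (slid) address back to the address recorded in the
 * Mach-O file. This is the form in which segment and symbol addresses
 * are stored, so comparisons must happen in this address space. */
uint64_t demo_unslide(uint64_t runtimeAddr, uint64_t slide) {
    /* Assumption of this sketch: the slide is non-negative, so a slid
     * address is never smaller than the slide itself. */
    assert(runtimeAddr >= slide);
    return runtimeAddr - slide;
}

/* The inverse: where an address from the file ends up in the process. */
uint64_t demo_slide(uint64_t fileAddr, uint64_t slide) {
    return fileAddr + slide;
}
```

For example, with a slide of 0x4000, an instruction observed at 0x100008f00 corresponds to the address 0x100004f00 in the file, and that unslid value is what should be compared against vmaddr and n_value.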

2.1 Find a target image containing addresses

Traverse each segment and check whether the target address falls within the segment's range:

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, maybe equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};


/**
 * @brief Determine if a segment_command contains addr, based on the virtual address of the segment and the segment size
 */
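bool jdy_segmentContainsAddress(const struct load_command *cmdPtr, const uintptr_t addr) {
    // addr must already have the image's vmaddr_slide subtracted.
    if (cmdPtr->cmd == LC_SEGMENT) {
        const struct segment_command *segPtr = (const struct segment_command *)cmdPtr;
        if (addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize)) {
            return true;
        }
    }
    // Not a segment command, or the address lies outside this segment.
    return false;
}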

In this way we can find the image that contains the target address.
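
The per-image check can be sketched as a walk over the image's load commands. To keep the example compilable without the Mach-O headers, it uses hypothetical stand-in types (demo_load_command, and a trimmed demo_segment_command_64 that omits the protection, section count and flags fields); only the constant 0x19 for LC_SEGMENT_64 is taken from loader.h. Treat it as an illustration of the traversal, not as the author's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LC_SEGMENT_64 0x19u /* value of LC_SEGMENT_64 in loader.h */

/* Hypothetical stand-in for struct load_command. */
struct demo_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Hypothetical, trimmed stand-in for struct segment_command_64. */
struct demo_segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
};

/* Returns true if addr (a runtime address) falls inside any segment of
 * the image whose load commands start at cmds. */
bool demo_imageContainsAddress(const void *cmds, uint32_t ncmds,
                               uint64_t slide, uint64_t addr) {
    assert(cmds != NULL);
    const uint64_t unslid = addr - slide; /* compare in file address space */
    const uint8_t *cursor = (const uint8_t *)cmds;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct demo_load_command lc;
        memcpy(&lc, cursor, sizeof lc);
        if (lc.cmd == DEMO_LC_SEGMENT_64) {
            struct demo_segment_command_64 seg;
            memcpy(&seg, cursor, sizeof seg);
            if (unslid >= seg.vmaddr && unslid - seg.vmaddr < seg.vmsize) {
                return true;
            }
        }
        cursor += lc.cmdsize; /* load commands are variable-length */
    }
    return false;
}
```

On a device the outer loop would run this check once per image index, taking the command pointer from just past the header returned by _dyld_get_image_header() and the slide from _dyld_get_image_vmaddr_slide().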

2.2 Locating the symbol table of the target image

Symbol collection and symbol table creation span both the compilation and linking phases, so we will not go into them here. It is enough to know that, in addition to the __TEXT code segment and the __DATA data segment, there is a __LINKEDIT segment that contains the symbol table:

The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

So now we need to locate the __LINKEDIT segment. The following is also taken from Apple's official documentation:

Segments and sections are normally accessed by name. Segments, by convention, are named using all uppercase letters preceded by two underscores (for example, __TEXT); sections should be named using all lowercase letters preceded by two underscores (for example, __text). This naming convention is standard, although not required for the tools to operate correctly.

By traversing each segment, we compare the segment name with __LINKEDIT:

// usr/include/mach-o/loader.h
#define SEG_LINKEDIT    "__LINKEDIT"
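
A minimal sketch of that name comparison follows. The names demo_isLinkedit and DEMO_SEG_LINKEDIT are hypothetical; real code would use SEG_LINKEDIT from loader.h. Because segname is a fixed 16-byte field that is not guaranteed to be NUL-terminated when all 16 bytes are used, the sketch uses a bounded comparison:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_SEG_LINKEDIT "__LINKEDIT" /* same string as SEG_LINKEDIT */

/* Returns true if the 16-byte segment name field equals "__LINKEDIT". */
bool demo_isLinkedit(const char segname[16]) {
    assert(segname != NULL);
    /* strncmp stops at the first NUL or after 16 bytes, whichever comes
     * first, so a full-width name is never read past its field. */
    return strncmp(segname, DEMO_SEG_LINKEDIT, 16) == 0;
}
```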

Next, look for the symbol table:

/**
 * From The Mac Hacker's Handbook:
 * The LC_SYMTAB load command describes where to find the string and symbol tables within the __LINKEDIT segment.
 * The offsets given are file offsets, so you subtract the file offset of the __LINKEDIT segment to obtain the virtual memory offset of the string and symbol tables.
 * Adding the virtual memory offset to the virtual-memory address where the __LINKEDIT segment is loaded will give you the in-memory location of the string and symbol tables.
 */

That is, we need to combine the __LINKEDIT segment_command (see the structure above) with the LC_SYMTAB load_command (see the structure below) to locate the symbol table:

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
    uint32_t    cmd;        /* LC_SYMTAB */
    uint32_t    cmdsize;    /* sizeof(struct symtab_command) */
    uint32_t    symoff;     /* symbol table offset */
    uint32_t    nsyms;      /* number of symbol table entries */
    uint32_t    stroff;     /* string table offset */
    uint32_t    strsize;    /* string table size in bytes */
};

As described above, the offsets in LC_SYMTAB and __LINKEDIT are file offsets. To get the in-memory addresses of the symbol and string tables, we first subtract the fileoff of __LINKEDIT from the symoff and stroff of LC_SYMTAB to get offsets relative to the start of the segment, and then add the vmaddr of __LINKEDIT to get the virtual addresses. To get the final, actual memory addresses, we also need to add the ASLR slide.
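
The address arithmetic described here can be written out as a small sketch. The struct demo_symtab_location and the function demo_locateSymtab are hypothetical names; the parameters correspond to the vmaddr and fileoff fields of the __LINKEDIT segment command, the symoff and stroff fields of symtab_command, and the image's slide:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime addresses of the two tables (hypothetical helper type). */
struct demo_symtab_location {
    uint64_t symbolTable;
    uint64_t stringTable;
};

void demo_locateSymtab(uint64_t linkeditVmaddr, uint64_t linkeditFileoff,
                       uint32_t symoff, uint32_t stroff, uint64_t slide,
                       struct demo_symtab_location *out) {
    assert(out != NULL);
    /* Runtime address that file offset 0 would map to if the file were
     * laid out like __LINKEDIT: vmaddr - fileoff, plus the ASLR slide. */
    const uint64_t linkeditBase = linkeditVmaddr - linkeditFileoff + slide;
    out->symbolTable = linkeditBase + symoff; /* (symoff - fileoff) + vmaddr + slide */
    out->stringTable = linkeditBase + stroff; /* (stroff - fileoff) + vmaddr + slide */
}
```

With illustrative values vmaddr = 0x100008000, fileoff = 0x8000, symoff = 0x8100, stroff = 0x8400 and slide = 0x4000, the symbol table ends up at 0x10000c100 and the string table at 0x10000c400.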

2.3 Finding the symbol that best matches the target address

At last we have found the symbol table. The matching logic is simple, so here is the code directly:

/**
 * @brief Find the best-matching symbol for an address in the given symbol table. vmaddr_slide must already have been subtracted from the address.
 */
const JDY_SymbolTableEntry *jdy_findBestMatchSymbolForAddress(uintptr_t addr,
                                                              JDY_SymbolTableEntry *symbolTable,
                                                              uint32_t nsyms) {
    // 1. addr >= symbol.value; because addr is an instruction address in a function, it should be greater than or equal to the entry address of the function, that is, the value of the corresponding symbol;
    // 2. symbol.value is nearest to addr; the function entry address closer to the instruction address addr is the more accurate match;

    const JDY_SymbolTableEntry *nearestSymbol = NULL;
    uintptr_t currentDistance = UINTPTR_MAX; // UINT32_MAX would be too small a starting distance on 64-bit

    for (uint32_t symIndex = 0; symIndex < nsyms; symIndex++) {
        uintptr_t symbolValue = symbolTable[symIndex].n_value;
        if (symbolValue > 0) {
            uintptr_t symbolDistance = addr - symbolValue;
            if (symbolValue <= addr && symbolDistance <= currentDistance) {
                currentDistance = symbolDistance;
                nearestSymbol = symbolTable + symIndex;
            }
        }
    }

    return nearestSymbol;
}

/*
 * This is the symbol table entry structure for 64-bit architectures.
 */
struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};

Once we have found the matching nlist structure, we can use its n_un.n_strx field to locate the corresponding symbol name in the string table.
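
The name lookup is just an index into the string table. A minimal sketch, with the hypothetical helper demo_symbolName and a hand-built string table standing in for the one found through LC_SYMTAB, might look like this. Returning NULL for index 0 is a choice made for this sketch: nlist.h defines index 0 as the empty name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the NUL-terminated name stored at byte offset n_strx in the
 * string table, or NULL if the symbol has no name or the index is out
 * of range. strsize is the strsize field of symtab_command. */
const char *demo_symbolName(const char *strtab, uint32_t strsize,
                            uint32_t n_strx) {
    assert(strtab != NULL);
    if (n_strx == 0 || n_strx >= strsize) {
        return NULL;
    }
    return strtab + n_strx;
}
```

For a string table laid out as "\0_main\0_helper", index 1 yields "_main" and index 7 yields "_helper". The result would then be stored in dli_sname of the Dl_info being filled in.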
