Find the stack overflow vulnerability of strcpy with angr

Posted by phphunger on Tue, 25 Jan 2022 18:27:24 +0100

This article walks through a small vulnerability-discovery example from the angr documentation.

The original text is linked here: https://docs.angr.io/examples#beginner-vulnerability-discovery-example-strcpy_find

This is the first tutorial on using angr to find exploitable conditions in a binary. The example is a very simple program. The angr script finds a path from the program's entry point to strcpy, but only when we can control the parameters of strcpy.

To find the correct path, angr has to solve for a password parameter, which takes only about 2 seconds. The script may look long, but that is mostly because it is heavily commented.

Binary program link: https://github.com/angr/angr-doc/tree/master/examples/strcpy_find/strcpy_test

Script link: https://github.com/angr/angr-doc/tree/master/examples/strcpy_find/solve.py

First, analyze the target binary with IDA and decompile the main function. If the number of command-line parameters is less than 2, main calls func3. Otherwise it compares the password, and if the password is correct it calls func.

Next, look at func3, which appears to do nothing more than print help information.

Then look at func. It calls strcpy, and the source operand of strcpy comes from the function parameter a1. In main, a1 is argv[2], which is input the user controls. If you pass a long enough string, a stack overflow occurs.
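The effect of an unchecked copy can be illustrated with a toy model in Python. This is only a sketch: the 16-byte buffer, the 8-byte slot that follows it, and the flat bytearray standing in for the stack are assumptions made for illustration, not the real stack layout of strcpy_test.

```python
BUF_SIZE = 16   # hypothetical size of the local buffer in func
RET_SLOT = 8    # hypothetical slot that sits right after the buffer


def unsafe_strcpy(memory: bytearray, dst: int, src: bytes) -> None:
    """Copy src and a terminating NUL to memory[dst:], with no bounds check."""
    for offset, byte in enumerate(src + b"\x00"):
        memory[dst + offset] = byte


def make_stack() -> bytearray:
    """Build a fake stack frame: an empty buffer followed by a marker value."""
    return bytearray(BUF_SIZE) + bytearray(b"RETADDR!")


# A short argument fits in the buffer and leaves the neighbouring slot intact.
safe = make_stack()
unsafe_strcpy(safe, 0, b"HAHAHAHA")
safe_slot = bytes(safe[BUF_SIZE:BUF_SIZE + RET_SLOT])

# A 20-byte argument runs past the 16-byte buffer and clobbers the slot.
smashed = make_stack()
unsafe_strcpy(smashed, 0, b"A" * 20)
smashed_slot = bytes(smashed[BUF_SIZE:BUF_SIZE + RET_SLOT])

print(safe_slot)
print(smashed_slot)
```

In the real binary the data after the buffer includes saved registers and the return address, which is why an oversized argv[2] crashes the program.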

The angr script analyzed next finds the input that triggers this vulnerability. Running the script reveals the value of the password.

Then run the target binary with the password obtained from the script as the first parameter and a long string as the second. The program crashes with a segmentation fault.

Next, we will analyze how the script finds an input that reaches the vulnerable code. See the angr-doc repository on GitHub for the complete script: https://github.com/angr/angr-doc/tree/master/examples/strcpy_find/solve.py
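The manual test can also be scripted. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the payload length you need depends on the buffer size in func, which is not stated here. It relies on the documented behavior of subprocess on POSIX systems, where a process killed by a signal reports a negative return code.

```python
import signal
import subprocess
from typing import List


def build_argv(binary: str, password: str, payload_len: int) -> List[str]:
    """argv[1] is the password, argv[2] is the string handed to strcpy."""
    return [binary, password, "A" * payload_len]


def describe_exit(returncode: int) -> str:
    """On POSIX, subprocess reports death by signal as a negative return code."""
    if returncode == -signal.SIGSEGV:
        return "segmentation fault"
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_target(binary: str, password: str, payload_len: int) -> str:
    """Run the target once and summarize how it ended (not called here)."""
    completed = subprocess.run(build_argv(binary, password, payload_len))
    return describe_exit(completed.returncode)
```

With the binary in the current directory, a call such as run_target("./strcpy_test", password, 200) would be expected to report a segmentation fault once the payload is long enough.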

The first piece is a helper function that looks up a function's address by name in the function list recovered by the CFG (cfg.kb.functions).

def getFuncAddress( funcName, plt=None ):
        found = [
            addr for addr,func in cfg.kb.functions.items()
            if funcName == func.name and (plt is None or func.is_plt == plt)
            ]
        if len( found ) > 0:
            print("Found "+funcName+"'s address at "+hex(found[0])+"!")
            return found[0]
        else:
            raise Exception("No address found for function : "+funcName)
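To see what the plt filter does, the same lookup can be tried on a stand-in table without angr. Everything below is hypothetical: the addresses are made up, FakeFunction only mimics the two attributes the helper reads, and the assumption that a name such as strcpy can be listed both as a PLT stub and as a separate external entry is there to show why plt=True is passed.

```python
from collections import namedtuple

# Stand-in for angr's function objects; only the two attributes used here.
FakeFunction = namedtuple("FakeFunction", ["name", "is_plt"])

# Hypothetical addresses: the same name can appear more than once.
functions = {
    0x400520: FakeFunction("strcpy", True),    # PLT stub inside the binary
    0x700010: FakeFunction("strcpy", False),   # placeholder for the external symbol
    0x4006A0: FakeFunction("func3", False),
}


def find_function(funcs, name, plt=None):
    """Return the first address whose function matches name (and plt, if given)."""
    matches = [
        addr for addr, func in funcs.items()
        if func.name == name and (plt is None or func.is_plt == plt)
    ]
    if not matches:
        raise LookupError("No address found for function: " + name)
    return matches[0]


print(hex(find_function(functions, "strcpy", plt=True)))
```

Leaving plt as None accepts either kind of entry, which is how the script looks up func3.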

The get_byte function extracts a single byte from a bit vector. The script does not appear to use it, so it can be ignored.

def get_byte(s, i):
        pos = s.size() // 8 - 1 - i
        return s[pos * 8 + 7 : pos * 8]
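Although the helper is unused, its index arithmetic is easy to check with plain integers. The following sketch is an analogue, not claripy code: it assumes bit 0 is the least significant bit, so byte 0 is the most significant byte, which corresponds to the first character of a string stored in the bit vector.

```python
def get_byte_int(value: int, size_bits: int, i: int) -> int:
    """Return byte i of a size_bits-wide value, counting from the most significant byte."""
    pos = size_bits // 8 - 1 - i          # same index arithmetic as get_byte
    return (value >> (pos * 8)) & 0xFF    # bits pos*8+7 down to pos*8


word = int.from_bytes(b"HAHA", "big")     # a 32-bit value holding four characters
first = chr(get_byte_int(word, 32, 0))
second = chr(get_byte_int(word, 32, 1))
print(first, second)
```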

Load the binary. There is no need to load additional libraries.

project = angr.Project("strcpy_test", load_options={'auto_load_libs':False})

Construct the CFG so that we can look up function addresses by name. Setting the fail_fast option to True minimizes the time this step takes.

cfg = project.analyses.CFG(fail_fast=True)

Get the addresses of strcpy and of the bad function (func3).

addrStrcpy = getFuncAddress('strcpy', plt=True)
addrBadFunc = getFuncAddress('func3')

Create the list of command-line parameters and add the program name to it.

argv = [project.filename]   #argv[0]

Add a symbolic variable for the password buffer, which we will solve for later.

sym_arg_size = 40   #max number of bytes we'll try to solve for

8*sym_arg_size is used as the size of the symbolic variable because the size is given in bits, not bytes.

sym_arg = claripy.BVS('sym_arg', 8*sym_arg_size)
argv.append(sym_arg)    #argv[1]

Add the buffer that will be passed to strcpy. When execution later reaches the strcpy call, this known value lets us confirm that the data being copied is really ours.

argv.append("HAHAHAHA") # argv[2]

Initialize the program's entry_state.

state = project.factory.entry_state(args=argv)

Create a simulation manager from the entry state.

sm = project.factory.simulation_manager(state)

We want a path to strcpy only when we control the src buffer, so a plain target address is not enough. Instead, we write a check function that takes a state as its parameter and pass it as the find= argument.

You may wonder how angr still knows which address we are aiming for, now that find= is a function rather than an address. The answer is that the check function tests the address itself.

It first compares state.ip.args[0] (the current instruction pointer) with the target address, and only then checks the other conditions, so we know we are on the path we want.

Logic of check function:

Check whether the instruction pointer is at the address of strcpy. If so, continue.
Load the bytes that rsi points to from memory. The result is a BV (bit vector) object, so use the solver's eval with cast_to=bytes to convert it into a Python bytes value.
str.encode() converts a Python string into bytes, so argv[2] can be compared with the loaded data.
If the source parameter of strcpy contains our argv[2], we control the data being copied, so the function returns True.

def check(state):
        if (state.ip.args[0] == addrStrcpy):    # Ensure that we're at strcpy
            '''
             By looking at the disassembly, I've found that the pointer to the
             source buffer given to strcpy() is kept in RSI.  Here, we dereference
             the pointer in RSI and grab 8 bytes (len("HAHAHAHA")) from that buffer.
            '''
            BV_strCpySrc = state.memory.load( state.regs.rsi, len(argv[2]) )
            '''
             Now that we have the contents of the source buffer in the form of a bit
             vector, we grab its byte representation using the current state's
             solver engine's function "eval" with cast_to set to bytes so we get a python bytes object.
            '''
            strCpySrc = state.solver.eval( BV_strCpySrc , cast_to=bytes )
            '''
             Now we simply return True (found path) if we've found a path to strcpy
             where we control the source buffer, or False (keep looking for paths) if we
             don't control the source buffer
            '''
            return True if argv[2].encode() in strCpySrc else False
        else:
            '''
             If we aren't in the strcpy function, we need to tell angr to keep looking
             for new paths.
            '''
            return False

Use the explore interface to find a path that satisfies the check function. If you pass a tuple, list, or set as find or avoid, angr treats its elements as addresses to find or avoid. If you pass a function, angr calls it with each state and uses the True or False it returns, as with the check function above.

Here, we tell explore to find a path that satisfies check and to avoid any path that reaches addrBadFunc.

sm = sm.explore(find=check, avoid=(addrBadFunc,))

found = sm.found
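The two forms that find and avoid accept can be mimicked with a toy model in plain Python. This is not how angr is implemented: states are plain dictionaries, the addresses are hypothetical, and sort_states classifies a fixed list instead of stepping through a program. It only illustrates the idea of turning addresses into a predicate and calling a user function on each state.

```python
def to_predicate(condition):
    """Turn an address, a collection of addresses, or a function into a function."""
    if callable(condition):
        return condition
    addresses = {condition} if isinstance(condition, int) else set(condition)
    return lambda state: state["ip"] in addresses


def sort_states(states, find, avoid):
    """Place each toy state in a 'found', 'avoid', or 'active' bucket."""
    is_found, is_avoided = to_predicate(find), to_predicate(avoid)
    buckets = {"found": [], "avoid": [], "active": []}
    for state in states:
        if is_found(state):
            buckets["found"].append(state)
        elif is_avoided(state):
            buckets["avoid"].append(state)
        else:
            buckets["active"].append(state)
    return buckets


ADDR_STRCPY, ADDR_BAD = 0x400520, 0x4006A0   # hypothetical addresses
MARKER = "HAHAHAHA"


def check(state):
    """Match only states at strcpy whose source buffer holds our marker."""
    return state["ip"] == ADDR_STRCPY and MARKER.encode() in state["src"]


states = [
    {"ip": ADDR_STRCPY, "src": b"HAHAHAHA"},   # at strcpy with our data
    {"ip": ADDR_STRCPY, "src": b"fixed!!!"},   # at strcpy, but not our data
    {"ip": ADDR_BAD, "src": b""},              # reached the usage function
]
result = sort_states(states, find=check, avoid=(ADDR_BAD,))
print(len(result["found"]), len(result["avoid"]), len(result["active"]))
```

Passing find=ADDR_STRCPY instead of check would also accept the second state, which is why the script uses a function: reaching strcpy is not enough unless the copied data is ours.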

Extract the concrete value of the password from the found path. If you pass this password as the first parameter of the program, you should be able to strcpy any string into the target buffer. If the string is too long, the program crashes with a segmentation fault. Note that in the complete script this code, like the setup steps above it, sits inside a main() function, which is why it ends with a return.

    if len(found) > 0:    #   Make sure we found a path before giving the solution
        found = sm.found[0]
        result = found.solver.eval(argv[1], cast_to=bytes)
        try:
            # Keep only the bytes before the first NUL terminator
            result = result[:result.index(b'\0')]
        except ValueError:
            pass
    else:   # Aww somehow we didn't find a path.  Time to work on that check() function!
        result = "Couldn't find any paths which satisfied our conditions."
    return result
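The NUL-trimming step can be tried on its own. The sketch below assumes the solver hands back all 40 bytes of the symbolic buffer, with the password, its NUL terminator, and then arbitrary filler. The filler bytes are made up for the example, and split is used as an alternative to the index/slice pair in the script.

```python
def trim_at_nul(raw: bytes) -> bytes:
    """Keep everything before the first NUL byte; return raw unchanged if none."""
    return raw.split(b"\x00", 1)[0]


# 27 password bytes + 1 NUL + 12 filler bytes = 40 bytes (hypothetical solver output)
solved = b"Totally not the password...\x00" + b"\x2a" * 12
password = trim_at_nul(solved)
print(password.decode())
```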

Then comes the test function, which checks that the value returned by main() starts with the expected password.

def test():
    output = main()
    target = b"Totally not the password..."
    assert output[:len(target)] == target

When the script is run directly, it prints the password returned by main().

if __name__ == "__main__":
    print('The password is "%s"' % main())

To summarize the process of this angr script: first, load the binary and build the project object. Generate the CFG from the project object, then use the CFG's function list to find the address of strcpy. This gives the end point of the symbolic execution and the address to avoid.

Next, the parameter we want to solve for is made symbolic, and the symbolic parameter is used to initialize the entry_state. The entry_state is then used to instantiate a simulation manager object, sm.

Once the simulation manager has a start state, a find condition, and an avoid condition, it can explore. Each state it reaches is passed to the check function to decide whether the end point has been reached.

Finally, if sm.found is not empty, the symbolic variable is solved. The resulting value is the input that reaches the vulnerable code.
