angr's Reaching Definition Analysis

Posted by cyronuts on Sun, 26 Dec 2021 22:25:12 +0100

Preface

How do we write a function handler that models the effect of a function on the analysis state?

1, Background

At a high level, static analysis lets us collect data-flow facts about a program's variables without executing the program. To do this, the analysis interprets, to some extent, the effect of each program statement on the state it tracks.

What if such a statement is a function call?

  • The analysis can continue into the statements of the target function, then jump back to the call site when the function returns.

What if the function is external, for example provided by a dynamically linked library?

  • In this case, the statements that make up the body of the target function (its implementation in the binary) are not directly available to the analysis.
  • Note that we do not really want to analyze an external library as part of the process anyway: we want to focus on the binary at hand and avoid spending resources (computing time and memory) tracking what happens "outside" of it.

But most of the time, we "know" what library functions do. Here are some examples of libc functions we "know":

  • printf: use several parameters to form a string and write it to stdout;
  • malloc: allocate a memory block whose size is determined by the first parameter and return a pointer to it;
  • strcpy: copy the contents of the second parameter to the memory area pointed to by the first parameter.

From the perspective of both the program and the analysis, these functions are black boxes: their implementation details are hidden. What we care about is only the effect these functions have on the system state when the program runs or, from the analysis' point of view, their effect on its representation of that state.

In the first case (local function), the function handler should drive the analysis into the called function and back to the caller; in the second case (external function), the function handler should update the analysis state according to the "known" behavior of the function.
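To make the second case concrete, here is a minimal sketch of what "updating the state based on known behavior" can mean. It is a toy model and does not use angr's data structures: the state is assumed to be a plain dictionary mapping a location name to the set of call sites whose definitions reach the current point, and apply_strcpy and apply_malloc are hypothetical helper names.

```python
# Toy model (NOT angr's API): the analysis state maps a location name to the
# set of code addresses whose definitions may reach the current point.
from typing import Dict, FrozenSet

State = Dict[str, FrozenSet[int]]


def apply_strcpy(state: State, dst: str, src: str, call_site: int) -> State:
    """Model strcpy(dst, src): `dst` is redefined at the call site.

    `src` is only read, so its reaching definitions are left untouched.
    """
    new_state = dict(state)
    new_state[dst] = frozenset({call_site})
    return new_state


def apply_malloc(state: State, return_location: str, call_site: int) -> State:
    """Model malloc(size): the return location gets a fresh definition."""
    new_state = dict(state)
    new_state[return_location] = frozenset({call_site})
    return new_state


# Hypothetical addresses, chosen only for the example.
state: State = {"buf": frozenset({0x10}), "user_input": frozenset({0x20})}
state = apply_malloc(state, "rax", call_site=0x30)
state = apply_strcpy(state, "buf", "user_input", call_site=0x40)
```

Neither helper looks at the library's code: each one only encodes what we already know about the function, which is exactly the role of a handler for an external function.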

2, Usage and Description

We use angr's ReachingDefinitionsAnalysis to implement our analysis. As described in its documentation, it accepts an optional function_handler parameter. The function_handler must inherit from FunctionHandler which, as the FunctionHandler documentation shows, means that it must provide the following methods:

  • hook: a way for the handler to get a reference to the analysis and access information about its context (architecture, facts collected in the knowledge base, etc.). In particular, ReachingDefinitionsAnalysis calls it during initialization;

  • handle_local_function: run by the analysis whenever a call to a local function is encountered.

Then, for ReachingDefinitionsAnalysis to handle printf, malloc, or strcpy, we add the corresponding methods to the concrete class inheriting from FunctionHandler: handle_printf, handle_malloc, and handle_strcpy.

For example, instantiating such a concrete class, MyHandlers, yields an instance exposing handle_printf, which the analysis calls whenever it encounters a call to printf in the binary (and likewise handle_malloc and handle_strcpy for calls to malloc and strcpy).

In short:

  • A function handler is a (Python) method that the analysis calls when it encounters a call instruction.
  • FunctionHandler is an abstract base class (ABC) that describes what a concrete class (such as MyHandlers) must implement to work with angr.
  • function_handler is the name of the parameter passed to ReachingDefinitionsAnalysis; its value is an instance of MyHandlers, and therefore of FunctionHandler.
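The handle_<name> naming convention described above can be pictured with a small, self-contained sketch. It does not use angr: ToyFunctionHandler, dispatch_call, and the list of "events" standing in for the analysis state are all assumptions made for illustration, and angr's own dispatch logic may differ.

```python
# Toy sketch of name-based handler dispatch; these are NOT angr's classes.
class ToyFunctionHandler:
    """Stand-in for the abstract base class."""

    def handle_local_function(self, events, callee_name):
        raise NotImplementedError


class MyHandlers(ToyFunctionHandler):
    def handle_local_function(self, events, callee_name):
        return events + [f"analyze body of {callee_name}"]

    def handle_printf(self, events):
        return events + ["printf: format arguments, write to stdout"]

    def handle_malloc(self, events):
        return events + ["malloc: return pointer to a new block"]


def dispatch_call(handler, events, callee_name, is_external):
    """Pick the method to run for a call, following the handle_<name> convention."""
    if not is_external:
        return handler.handle_local_function(events, callee_name)
    method = getattr(handler, f"handle_{callee_name}", None)
    if method is None:
        # No model for this function: leave the state as it is.
        return events
    return method(events)


handler = MyHandlers()
events = dispatch_call(handler, [], "check", is_external=False)
events = dispatch_call(handler, events, "printf", is_external=True)
events = dispatch_call(handler, events, "strlen", is_external=True)  # no model
```

Looking the method up by name keeps the concrete class declarative: supporting one more library function only requires adding one more handle_<name> method.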

3, Example

Binary program for analysis

We will analyze the binary built from command_line_injection.c.

git clone git@github.com:Pamplemousse/bits_of_static_binary_analysis.git
cd bits_of_static_binary_analysis
make

If everything went well, running ./build/command_line_injection ~/ should list the contents of your home directory.

The simplest analysis

The simplest analysis starting with the function main is as follows:

from angr import Project

project = Project('./build/command_line_injection', auto_load_libs=False)
cfg = project.analyses.CFGFast(normalize=True, data_references=True)

main_function = project.kb.functions.function(name='main')
program_rda = project.analyses.ReachingDefinitions(
    subject=main_function,
)

# Do something with `program_rda`
...

However, this analysis is intraprocedural: it only runs on the function main. Helpfully, when we run python analysis.py, angr warns us: "Please implement the local function handler with your own logic." So we know that it encountered a call to a local function.

Processing local functions

We can improve analysis.py to provide ReachingDefinitionsAnalysis with the handle_local_function it needs. This method is triggered while analyzing main, precisely when the call instruction is examined.

from angr import Project
from angr.analyses.reaching_definitions.function_handler import FunctionHandler


class MyHandler(FunctionHandler):
    def __init__(self):
        self._analysis = None

    def hook(self, rda):
        self._analysis = rda
        return self

    def handle_local_function(self, state, function_address, call_stack, maximum_local_call_depth, visited_blocks,
                              dependency_graph, src_ins_addr=None, codeloc=None):
        function = self._analysis.project.kb.functions.function(function_address)

        # Breakpoint so you can explore what is accessible from here
        # (requires the third-party `ipdb` package).
        import ipdb; ipdb.set_trace()

        return True, state, visited_blocks, dependency_graph

project = Project('./build/command_line_injection', auto_load_libs=False)
cfg = project.analyses.CFGFast(normalize=True, data_references=True)

handler = MyHandler()

main_function = project.kb.functions.function(name='main')
program_rda = project.analyses.ReachingDefinitions(
    function_handler=handler,
    observe_all=True,
    subject=main_function
)

# Do something with `program_rda`
...

Run python analysis.py: thanks to the breakpoint in handle_local_function, we now get a shell. From here, I invite you to look around and experiment. Remember that through self._analysis you can access many of the facts collected by angr, such as arch and kb.

Handling external functions

As mentioned earlier, handlers can also be triggered on calls to library functions, and used to model the effects of code that cannot be analyzed directly. In our example, the function check calls the libc function sprintf.
Here is a new analysis.py showing how to make the analysis account for this call: a richer MyHandler that includes a handle_sprintf method.

from angr import Project
from angr.analyses.reaching_definitions.function_handler import FunctionHandler


class MyHandler(FunctionHandler):
    def __init__(self):
        self._analysis = None

    def hook(self, rda):
        self._analysis = rda
        return self

    def handle_local_function(self, state, function_address, call_stack, maximum_local_call_depth, visited_blocks,
                              dependency_graph, src_ins_addr=None, codeloc=None):
        function = self._analysis.project.kb.functions.function(function_address)
        return True, state, visited_blocks, dependency_graph

    def handle_sprintf(self, state, codeloc):
        # Breakpoint so you can explore what is accessible from here
        # (requires the third-party `ipdb` package).
        import ipdb; ipdb.set_trace()

        return True, state

project = Project('./build/command_line_injection', auto_load_libs=False)
cfg = project.analyses.CFGFast(normalize=True, data_references=True)

handler = MyHandler()

sprintf_plt_stub = project.kb.functions.function(name='sprintf', plt=True)
program_rda = project.analyses.ReachingDefinitions(
    function_handler=handler,
    observe_all=True,
    subject=sprintf_plt_stub
)

# Do something with `program_rda`
...

Note that, to keep the example simple, the analysis starts on the sprintf PLT stub recovered by angr. Otherwise, the analysis would first hit handle_local_function, because check contains a call instruction targeting the PLT stub, which is not an external address! In other words, to handle external functions called through the PLT mechanism, you need to start a ReachingDefinitionsAnalysis on the target PLT stub with the appropriate handler.
Ideally, we would like to start from the function check and expect handle_sprintf to be called at some point: the analysis should use handle_local_function to direct itself to the PLT stub, which should eventually trigger handle_sprintf.
Coincidentally, this is a special case of a more general problem: how do we perform interprocedural analysis?

4, Interprocedural analysis

In real-world programs, the answers to an analyst's questions are unlikely to all lie at a shallow depth. Most of the time, we want to start at the binary's entry point and expect the analysis to follow function calls until it reaches the information we are looking for. In our example, this means starting ReachingDefinitionsAnalysis on the main function and expecting it to analyze check and to call handle_sprintf along the way.
Because we want a single analysis to run across several functions, we need an interprocedural analysis. Unfortunately, this has not yet been implemented in the main angr library!
The CSE545 video presents, from a high-level point of view, how to turn angr's ReachingDefinitionsAnalysis into an interprocedural analysis.
The idea is to run it recursively:

  • Each time a call to a local function is encountered, a "child" ReachingDefinitionsAnalysis is started on the target function. Once it completes, the resulting analysis state is copied back to the parent, which resumes from there (at the call instruction).

Its implementation depends on the function handler. In particular, handle_local_function is where recursion occurs:

  • It starts a child ReachingDefinitionsAnalysis on the target function with the appropriate parameters (passing the current kb, initializing the child's state from the parent's via the init_state parameter, and forwarding the function_handler);
  • When the child returns, it updates the parent's observed_results so that the parent knows what was captured during the child's run;
  • It returns the state from which the parent continues (including the current live_definitions), along with the other structures the analysis keeps track of (visited_blocks, dep_graph).
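The three steps above can be sketched on a toy program, without angr. Everything here is an assumption made for illustration: the PROGRAM dictionary, the statement encoding, and the analyze function only mimic the roles of init_state, observed_results, and the recursion performed in handle_local_function; the depth limit loosely mirrors maximum_local_call_depth. Each toy function has a single exit.

```python
# Toy recursive reaching-definitions analysis; NOT angr's implementation.
from typing import Dict, List, Set, Tuple

Statement = Tuple[str, str]    # ("def", variable) or ("call", function name)
Definition = Tuple[str, int]   # (function name, statement index)
State = Dict[str, Set[Definition]]

PROGRAM: Dict[str, List[Statement]] = {
    "main": [("def", "x"), ("call", "check"), ("def", "y")],
    "check": [("def", "x"), ("def", "z")],
}


def copy_state(state: State) -> State:
    return {variable: set(defs) for variable, defs in state.items()}


def analyze(function: str, init_state: State,
            observed_results: Dict[Definition, State],
            depth: int = 0, max_depth: int = 4) -> State:
    """Analyze `function` starting from `init_state`; return its exit state."""
    state = copy_state(init_state)
    for index, (kind, operand) in enumerate(PROGRAM[function]):
        if kind == "def":
            # A new definition kills the previous ones for that variable.
            state[operand] = {(function, index)}
        elif kind == "call" and depth < max_depth:
            # Role of handle_local_function: run a child analysis seeded with
            # the parent's state, sharing observed_results, then resume the
            # parent from the child's exit state.
            state = analyze(operand, state, observed_results,
                            depth + 1, max_depth)
        observed_results[(function, index)] = copy_state(state)
    return state


observed_results: Dict[Definition, State] = {}
final_state = analyze("main", {}, observed_results)
```

After the call to check returns, the definition of x that reaches the rest of main is the one made inside check, which an analysis restricted to main alone would miss.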

Some functions have several exits (for example, multiple return statements in the source file), so from the analysis' point of view there are several output states! In this case, handle_local_function must merge these states into a single state from which the parent analysis can resume.
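One common way to merge states in a "may" analysis such as reaching definitions is to take, for each location, the union of the definitions found at every exit. The sketch below assumes a toy state (a dictionary from location name to a set of definition addresses) and a hypothetical merge_states helper; it does not use angr's own state objects or their merge method.

```python
# Toy merge of several exit states by per-location union; NOT angr's API.
from typing import Dict, Iterable, Set

State = Dict[str, Set[int]]


def merge_states(states: Iterable[State]) -> State:
    """Return a state where each location holds the union of its definitions."""
    merged: State = {}
    for state in states:
        for location, definitions in state.items():
            merged.setdefault(location, set()).update(definitions)
    return merged


# Two hypothetical exits of the same function, each defining rax differently.
exit_a: State = {"rax": {0x401010}, "rbx": {0x401000}}
exit_b: State = {"rax": {0x401020}, "rbx": {0x401000}}
merged = merge_states([exit_a, exit_b])
```

The union is conservative: the parent does not know which return was taken, so it must assume that either definition of rax may reach the code after the call.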

5, Conclusion

Function handlers are a convenient tool for static analysis with ReachingDefinitionsAnalysis: they make it possible to apply the effects of external functions to the state without access to their implementation.
Applied to local functions, the same principle takes us even further: interprocedural analysis is just a customization of how the analysis behaves on call instructions (recursion, state management, and internal bookkeeping).

  • By "tainting" a variable that gets its value from user input, and propagating this taint along its uses, we can find which other variables may be influenced by user input. Tainted variables used in sensitive operations (arguments to exec or system, copies into fixed-size buffers, etc.) indicate potential security vulnerabilities.
  • If you want more details on how the analysis works, and more concrete examples of such analyses, I strongly recommend taking a look at the presentation mentioned above.
  • For those interested in angr's internals, the handler's instance methods are invoked in angr/analyses/reaching_definitions/engine_vex.py.
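The tainting idea from the first note can be sketched on straight-line code. This is a toy illustration, not an angr feature: the (destination, operands) encoding, the variable names, and the propagate_taint helper are all assumptions.

```python
# Toy taint propagation over straight-line assignments; NOT angr's API.
from typing import Iterable, List, Set, Tuple

Assignment = Tuple[str, List[str]]  # (destination, operands it is computed from)


def propagate_taint(assignments: Iterable[Assignment],
                    tainted_sources: Iterable[str]) -> Set[str]:
    """Return the variables tainted after executing the assignments in order."""
    tainted = set(tainted_sources)
    for destination, operands in assignments:
        if tainted.intersection(operands):
            tainted.add(destination)
        else:
            # Overwritten with untainted data: the destination is clean again.
            tainted.discard(destination)
    return tainted


# Hypothetical program: a path taken from argv ends up in a shell command.
assignments: List[Assignment] = [
    ("path", ["argv1"]),
    ("command", ["format_string", "path"]),
    ("count", ["zero"]),
]
tainted = propagate_taint(assignments, {"argv1"})

# Arguments of sensitive calls that carry taint are worth reporting.
sink_arguments = ["command", "count"]
findings = [name for name in sink_arguments if name in tainted]
```

Here command is reported because it is derived from argv1, while count is not because it never depends on user input.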
