Showing posts with label BinNavi. Show all posts
Showing posts with label BinNavi. Show all posts

Tuesday, February 19, 2008

badass debugger + badass toy = geek pr0n

Today I finally got working a hacked-together minimal version of the iPhone debugger client for BinNavi. It's heavily based on Patrick Walton's (with HD's updates) weasel debugger. Once tied to BinNavi debug client framework the whole client-server interaction is trivial.

It feels just right, the best looking debugger together with the slickest device.. recipe for fun.. ;-)



The test application is telnet on the iPhone. On the iPhone's screen is the debug output from BinNavi's debug client. telnet is launched from an ssh session in OSX, where BinNavi is running.



For anybody trying to link Mach's debugging interface with a C++ iPhone application, remember the extern "C" when defining boolean_t exc_server(mach_msg_header_t *in, mach_msg_header_t *out); (which is not defined in the header files, as pointed in weasel's source code). Otherwise you'll get a nasty "Undefined symbols" message when linking.

extern "C" is also needed for catch_exception_raise(...) so exc_server can call it to handle exceptions. Documented here.
(I've used the standard iPhone toolchain on Debian, this is running on the firmware 1.1.3)

Thursday, June 14, 2007

BinNavi: Simplifying code II. The implementation

Following up the ideas presented on a previous post, this one will discuss their implementation.

We wanted to write some code that would help us understand the meaning of certain operations encoded in assembly instructions. We wanted to do that by going through the instructions and building extended expressions representing the relations between the input and output values of a basic block.

In order to implement the desired functionality, we will write some code to replace parts of the operand trees with existing, known values, from the already analyzed code.
  • The code will iterate through the operand tree until it finds leafs (labeled as ’l’)
  • The leafs will then be substituted if their contents are known
  • The new, augmented, operand tree will be returned

The main part of this simple operand expression "builder" is to parse instructions and keep track of the values of the operands for the current instruction. If any values can be augmented by including the operations performed in previous instructions, those will be replaced by the result of such operations.
  • The function will take a dictionary (aka associative array) instance that will map the registers and memory addresses to their known contents
  • It will parse each instruction and update the dictionary according to the semantics of the opcode which in this example won't be totally accurate but will suffice to illustrate the process
  • It will return an updated dictionary with the results of tracking the values

This first function will retrieve an operand tree and return it as a nested list.
(The code presented in this post makes use of BinNavi's Python scripting to access operands, instructions, etc)


# Retrieve an operand as nested lists
#
def get_optree(op, parent=None):
    if not parent:
        return get_optree(op, op.get_roots()[0])[0]
    tree = []
    children = op.get_children(parent)
    for child in children:
        subtree = get_optree(op, child)
        if subtree:
            tree.append( [str(child), subtree] )
        else:
            tree.append( [’l’, str(child)] )
    return tree



And the result of running it on the following instructions will be:

004A4703: lea b4 ecx,b4 ss [ebp+4294967240]
004A4706: push b4 26
004A4708: sub b4 eax,b4 ecx
004A470A: pop b4 ecx
004A470B: add b4 eax,b4 13

[[’l’, ’ecx’],
    [’ss’, [[’[’, [
    [’+’, [[’l’, ’ebp’], [’l’, ’4294967240’]]]]]]]]
[[’l’, ’26’]]
[[’l’, ’eax’], [’l’, ’ecx’]]
[[’l’, ’ecx’]]
[[’l’, ’eax’], [’l’, ’13’]]



The following function is in charge of substituting any values in the tree for which their content is known. For example, if ecx is used as a source operand but in a previous instruction is was assigned a value of 10, we would substitute ecx in the tree with 10.


# Substitute the leafs in the operand tree
# with the last values obtained during the
# analysis
#
def substitute_leafs(tree, operand_trees):
    if not isinstance(tree, list):
        return tree
    if isinstance(tree[0], str) and tree[0]==’l’:
        return operand_trees.get(str(tree), tree)
    else:
        for idx, elm in enumerate(tree):
            tree[idx] = substitute_leafs(elm, operand_trees)
    return tree


The next function will be responsible for parsing the semantics of the instructions (in a merely illustrative way, it's far from implementing strictly the semantics of the Intel instructions used here). Instructions responsible for basic arithmetic and assignment operations will be processed and their results stored in the operand tree that we are processing.


# Parse instructions

def parse_instruction(insn, stack, operand_trees=dict()):
    # Process the MOV instruction
    if insn.mnem == 'mov':
        # Get the operands
        dst, src = map(get_optree, insn.operands)
        # Perform substitutions for any known values
        src = substitute_leafs(src, operand_trees)
        # Assing the source operand to the destination
        operand_trees[str(dst)] = src
    elif insn.mnem == 'add':
        dst, src = map(get_optree, insn.operands)
        dst_expr = operand_trees.get(str(dst), dst)
        src = substitute_leafs(src, operand_trees)
        src_expr = operand_trees.get(str(src), src)
        operand_trees[str(dst)] = ['+', [dst_expr, src_expr]]
    elif insn.mnem == 'sub':
        dst, src = map(get_optree, insn.operands)
        dst_expr = operand_trees.get(str(dst), dst)
        src = substitute_leafs(src, operand_trees)
        src_expr = operand_trees.get(str(src), src)
        operand_trees[str(dst)] = ['+', [dst_expr, ['-', [src_expr]]]]
    elif insn.mnem == 'lea':
        dst, src = map(get_optree, insn.operands)
        src = substitute_leafs(src, operand_trees)
        src_expr = operand_trees.get(str(src), src)
        operand_trees[str(dst)] = src[1][0][1][0]
    elif insn.mnem == 'push':
        dst, src = ['l', 'STACK_TOP'], get_optree(insn.operands[0])
        src = substitute_leafs(src, operand_trees)
        src_expr = operand_trees.get(str(src), src)
        stack.append(src)
    elif insn.mnem == 'pop':
        dst, src = get_optree(insn.operands[0]), ['l', 'STACK_TOP']
        src = stack.pop()
        operand_trees[str(dst)] = src
    elif insn.mnem == 'cdq':
        pass
    elif insn.mnem == 'idiv':
        src = get_optree(insn.operands[0])
        # dst is always eax
        dst = ['l', 'eax']
        src = substitute_leafs(src, operand_trees)
        src_expr = operand_trees.get(str(src), src)
        orig_src = operand_trees.get(str(dst), dst)
        operand_trees[str(dst)] = ['*',
            [orig_src, ['INV', [src_expr]]] ]
        operand_trees[str(['l', 'edx'])] = ['%',
            [orig_src, ['mod', [src_expr]]] ]
    return operand_trees


After running the previous functions on a simple basic block, we will obtain operand trees representing the operations performed by the instructions on their operands.



In the end we can ask the value of a register and we will see, in a single extended expression, the value of the register as a result of all operations taking place in the basic block.


>>> operand_trees["['l', 'ecx']"]
['l', '(0x1A, 26)']

>>> operand_trees["['l', 'al']"]
['ss',
  [['[',
    [['+',
      [['l', 'ebp'],
      ['%',
        [['+',
          [['+',
            [['l', 'eax'],
            ['-', [['+',
              [['l', 'ebp'],
              ['l', 'lowercase_chars']]]]]]],
          ['l', '13']]],
        ['mod', [['l', '(0x1A, 26)']]]]],
      ['l', 'lowercase_chars']]]]]]]




We can also graphically represent the result of the operand composition in BinNavi.
The following function will traverse the operand tree and add the nodes to the graph instance provided.


# Convert nested lists into a graph object
# that can be visualized
#
def tree_to_graph(tree, g=Graph(), parent=None):
    if isinstance(tree[1], list):
        node_str = tree[0]
        node = g.add_node(str(node_str))
        if parent:
            g.add_edge(parent, node)
        for child in tree[1]:
            tree_to_graph(child, g, node)
    else:
        leaf = g.add_node(str(tree[1]))
        if parent:
            g.add_edge(parent, leaf)
    return g



And running it on one of the composed operand trees results in:


tree_to_graph(operand_trees["['l', 'al']"], Graph()).show()





As a last step we can represent the expression textually and add it to the node.


# Convert nested lists into a single expression as
# a string
#

def tree_to_expression(tree):
    root = tree[0]
    children = tree[1]
    if isinstance(children, list):
        parsed_children = [tree_to_expression(child) for child in children]
    else:
        return children
    if root == '*':
        return '(' + '*'.join(parsed_children) + ')'
    elif root == '+':
        return '(' + '+'.join(parsed_children) + ')'
    elif root == '-':
        return '(-' + parsed_children[0] + ')'
    elif root == '[':
        return '[' + ' '.join(parsed_children) + ']'
    elif root == 'INV':
        return 'INV('+parsed_children[0]+')'
    elif root == 'mod':
        return parsed_children[0]
    elif root == '%':
        return '(' + parsed_children[0]+'%'+parsed_children[1] + ')'
    elif root == 'l':
        return children
    elif root == 'ss':
        return parsed_children[0]



And running the following in BinNavi's Python interpreter will generate the expression and add it to the basic block as a comment in BinNavi's view.


fgv = Graphview.get_open_views()[0]
expr = tree_to_expression(operand_trees["['l', 'al']"])
fgv.set_node_comment(0x4a4703L, expr)





The resulting expression represents the ROT13 operation for lowercase characters. (The code belongs to the Mydoom.D worm)
ebp+lowercase_chars points to a string containing the alphabet. eax is the pointer to the character to encode/decode within the alphabet, and its position is calculated by subtracting the beginning address of the alphabet from eax. Then 13 is added to it and wrapped around by getting the value modulo 26 and added again to the beginning of the alphabet to obtain the final decoded/encoded character.
This expression is a nice and condensed form of expressing the functionality implemented in the basic block.

This post illustrates the versatility of a properly abstracted representation of the operands and instructions and the level of automation that can be achieved in a relatively simple manner with the help of the interactive Python scripting in BinNavi.

Tuesday, May 15, 2007

BinNavi: Simplifying code

This in a slightly expanded example of what Halvar and I showed in the Sabre-Security trainings last October. With it I illustrate how to, in BinNavi, build a small dataflow analyzer in Python so that it can do some work for us.

Lets imagine that there's a basic block like the following, which is part of the ROT13 routine in the Mydoom worm.



Ideally we would like to have such tool what would spare us from having to figure out what the assembly code in that block does, regardless of how simple it might be in this case. Such functionality would make analysis much less tedious and less error prone.

Given that BinNavi offers an interactive Python interpreter with access to every single function, basic block, instruction, operand and expressions within an operand, this is a problem that we could try to address with some scripting.



The dataflow reconstruction


Assembly instructions move data around and modify it by performing different operations on it. Values are fetched from registers, memory or specified explicitly and the results are stored back in registers or memory.

So, if we would like to be able to transform the operations into something more readable we could track all the assignments and operations and compose some sort of combined expression. That's actually much easier than it might sound like.

Taking advantage of Python's dictionaries (aka hashes, aka associative arrays, aka maps) we could proceed as follows.

We process individually each instruction. As each has different semantics, we have to deal with them individually (some groups of similar instructions can be handled somehow generically tho.).

First the instruction might push or pop something from the stack and we could emulate that behaviour with a simple list.

Second the instruction might get data from a source register/memory/immediate and, upon operating on it, assign it to a destination operand. We will just consider these two cases for the time being, this is only an exercise after all ;)

In this case we could store in our dictionary the source item using as key the destination register or memory location. That would allow us to keep the current state of the registers and memory locations. For instance: lea eax, [edi+2]

The destination of the lea operation is eax the source/result is [edi+2]. Hence we could have an entry in our dictionary that maps eax to edi+2, there are no brackets here because lea will store the value of the addition, not its contents, so we don't need the brackets (which stand for memory-dereference)

If a register is accessed later on by another instruction and we know where the value comes from (because we stored the previous assignment operation in our dictionary) we could compose it together, having a combined expression that tracks all data. Following the example, imagine the instruction mov bl, [eax+esi], that will put into bl the value pointed to by [eax+esi], in our dictionary we would map bl to have the value [eax+esi]

But, if we would check if we have any of the expression parts already in our dictionary (remember that eax maps to edi+2) then we can expand [eax+esi] into [edi+2+esi]. One might be able to infer now that this procedure can be taken further, composing longer expressions that would reduce the amount and complexity of the assembly code to look at.

As another illustrative example, one could assign the the AL key in the dictionary the nested lists representing the expression tree for the memory reference as seen in the following figure.



As just seen, operands are treated as graphs by BinNavi. That allows for a versatile manipulation of their contents. We can transform the tree to nested lists and operate on those.


  • The form of the nested lists is

  • [root , [child_1 , child_2 , ...]]

  • Where each children has the same form


For instance, the following two operand trees



would take the following from as nested lists:

[b4, [ecx]]

[b4, [ss, ['[', [+, [ebp], [4294967240]]]]]

In short, in order to do basic reconstruction of the data flow the following operations need to be performed:

  • Keep track of all assigned values using a dictionary

  • Substituting leafs with values already in our dictionary

  • Perform substitutions and assign SRCs to DSTs hence updating the dictionary to the last state


In a following post, I'll show an implementation using BinNavi's embedded Python interpreter.

Tuesday, March 20, 2007

A PE trick, the Thread Local Storage

In our training, when discussing packer's tricks and the PE file format, Pedram and I talk about different ways of executing code in a PE executable before the entry point (the one pointed to by the AddressOfEntryPoint field of the Optional header) is given control by the Windows loader.

One possible way of achieving it is to use the TLS directory entry of the PE file format's headers. TLS stands for Thread Local Storage and it's meant to be used to allocate storage for thread-specific data. The TLS structure, IMAGE_TLS_DIRECTORY, pointed to by the TLS directory entry has a small number of fields. The one of special interest is the one pointing to a list of callbacks, AddressOfCallBacks.

During the class I got asked how one would implement the functionality required to get code to run, called from the TLS callbacks. I had a rough an idea how the implementation would go but had never tried implementing it myself. So before the training was over I started to look into it and finally, a few days later, got it to work.

So, in order to put this together the first thing I did was to dig up the definition of the IMAGE_TLS_DIRECTORY structure.


typedef struct _IMAGE_TLS_DIRECTORY {
    UINT32 StartAddressOfRawData;
    UINT32 EndAddressOfRawData;
    PUINT32 AddressOfIndex;
    PIMAGE_TLS_CALLBACK *AddressOfCallBacks;
    UINT32 SizeOfZeroFill;
    UINT32 Characteristics;
} IMAGE_TLS_DIRECTORY, *PIMAGE_TLS_DIRECTORY;



Then I started hacking it together with a hexeditor, editing a harmless test PE file in order to have a TLS directory entry that would point to my manually hex-crafted structure.

PE File, TLS construction A

I then added some placeholder code


90
90
90
C2 0C 00
nop
nop
nop
retn 0x0c



which would handle the stack as a TLS callback is expected to

typedef void (MODENTRY *PIMAGE_TLS_CALLBACK) ( PTR DllHandle, UINT32 Reason, PTR Reserved );


and pointed to it from the callback list I created. I then pointed the callback field, AddressOfCallBacks, in the TLS structure to my callback list.

PE File, TLS construction B

And everything should be fine according to the plans... but nope!! and this is where I was stuck for a few days. The best I could do was to get my TLS callback code run on program unload, but never before the entry point was given control. Puzzling...




I dug out a file with a working TLS. Ilfak wrote a while ago about TLS here and had a nice, small example. (And, by the way, IDA does read and handle TLS just fine and marks them as entry points as well. Very convenient!)

I was set to get mine working, so I started looking at what his was doing different from mine (besides him just being sane and not doing it with a hex-editor ;-) )

I took a look at all the PE headers but none of the differences seemed to have anything to do with my sample not working.

I started to grow slightly uncomfortable and decided to bring in the artillery. Taking a look at how the windows loader (residing in NTDLL.DLL) handles both files and seeing what's affecting my TLS callback not being called should help. So I brought up BinNavi and traced the execution path of both binaries being loaded by Windows.

First thing was to trace the execution of Ilfak's example, I wanted to see all functions visited in the windows loader as his executable was being loaded. The TLS callbacks would have to be called by one of these.

Ilfak's callgraph trace

I then recorded the execution path of my test executable and took a look at what functions were being visited in both traces. (All the nodes in the following graph are visited by the working example, the green ones are the ones visited by mine, so there's a lot of superfluous code I can skip looking at)

Ilfak's callgraph trace with mine

Eventually spotted a function called when processing both binaries, _LdrpRunInitializeRoutines, that looked like a good candidate to be the one calling the TLS callbacks and took a look at the execution traces within that specific function.

In the following graph each node represents a basic block, the red one is where the TLS callbacks are called from. That's the node reached in the working example but not in mine. The green nodes are all the ones visited in the case the execution flow reaches the red basic block. The darker ones are the execution trace of my test. Hence I need to figure out which conditions are diverting the execution flow and how they are related to things I could change in my test program.

Ilfak's code and my trace

Now I could see the common parts of the execution path and a couple of branches that were taken differently. Given the visual output, it's extremely easy to see what branches were different and I could now check what affected the flow.

The TLS callbacks were ran immediately after the following condition

The critical branch

Which, tracing it back, comes for an initial check at the beginning of the function

The initial condition

The following article helped me when I was trying to figure out what was going on. According to it, the function _LdrpClearLoad­InProgress returns the number of DLLs currently loaded. That's the value that gets assigned to the variable that gets compared to zero and makes the flow of my test program diverge from Ilfak's working example. Therefore TLS callbacks only get run when a given amount of DLLs have already been loaded and that was the reason my test didn't run on load... I only needed to add one mode DLL to the import table for it to work. Fortunately it was easy to spot with BinNavi.

Thank you to cailin for the proofreading.

Tuesday, February 06, 2007

BinNavi database format

In the latest version of BinNavi we moved to SQL for storage of the disassembly information.

We spent a fair amount of time thinking about how to best store the disassembly information in a way that would be as little architecture-dependent as possible and allow for fast querying while, at the same time, trying to not make it too hard to use directly through SQL.

We also wanted to be able to have a central repository of all disassembly data, doing away with the need of keeping local databases that easily get out of sync. A central repository has other advantages like allowing for different users to look at the same project.

The disassembly data is currently exported from IDA into the database via a exporter , ida2sql, written in Python and requiring IDAPython. This exporter is included with BinNavi and also made available separately. If you want to play with it just drops us a line at Sabre.

One can already perform interesting analysis by just using SQL queries directly, although we provide a lot of the functionality in a more convenient form through BinNavi's integrated Python interface.

The core set of tables looks like this:

BinNavi's database schema

The basic table layout is:

  • modules, which holds the information about all the disassemblies in the database, the name of the file, the date it was imported and a comment field

For each of the following the module id will be appended to the table name.

  • functions, containing all the functions in the disassembly and specifying their address name and type

  • basic_blocks, all the basic blocks in the disassembly and their parent function

  • instructions, the instructions making the basic blocks. Contains the data making up the instruction together with its address, mnemonic and parent basic block

  • callgraph relates all callers and callees

  • control_flow_graph expresses all the links between basic blocks

  • operand_strings contains the operand strings as shown by IDA

  • expression_tree represents all expressions composing the operands as a tree

  • operand_tuples maps addresses to the operands used by the instruction at such location

  • expression_substitutions allows to replace any part of the expression tree of an operand with a string, variable names are handled through this table

  • operand_expressions relates the operands to the expressions composing them

  • address_references contains all references, both to code and data labeled with their type

  • sections holds the raw data for the section composing the binary source for the disassembly


Currently only MySQL has been thoroughly tested. Some of the advantages the schema brings are:
  • Supports multiple modules in a single database

  • Instruction operands are stored as trees, which enables it to support a variety of architectures, efficient storage and advanced querying

  • Use of SQL statements to perform advanced analysis and data-mining

  • The exporter module from IDA to the SQL schema currently supports METAPC(x86), PPC and ARM exporting (the latter two are still in beta). More will be added in the future.

Some query examples



Listing the modules in the database:
SELECT * FROM modules

Counting the number of functions:
SELECT count(address) FROM functions

Counting the number of basic blocks (blocks shared by functions will be counted multiple times):
SELECT count(address) FROM basic_blocks

and without counting the shared ones:
SELECT count(DISTINCT(address)) FROM basic_blocks;

How about a histogram showing the of mnemonic distribution?
SELECT mnemonic, COUNT(mnemonic) as mnem_count FROM instructions GROUP BY mnemonic ORDER BY mnem_count;


.
.
or, 275
inc, 361
movzx, 377
leave, 392
sub, 403
stos, 429
.
.

or something a bit more elaborate like getting all addresses in a disassembly where a specific register is used:

SELECT
    HEX(instructions.address), mnemonic
FROM
    instructions
JOIN
    (operand_tuples, operand_expressions, expression_tree)
ON
    instructions.address = operand_tuples.address AND
    operand_tuples.operand_id = operand_expressions.operand_id AND
    expression_tree.id = operand_expressions.expr_id
WHERE
    symbol='ebp';



.
.
71CF14D3, mov
71CF14D6, mov
71CF14E9, pop
71CF15E1, push
71CF15E2, mov
71CF15E6, mov
71CF1617, pop
.
.

for queries like the last one or more complex ones it's usually a good idea to move to the embedded Python interpreter in BinNavi before getting lost in SQL...

Saturday, January 20, 2007

BinNavi's basic block handling

A while back I talked about the problem of highly optimized code and the resulting problems when we want to store it in a database while allowing for all possible constructs.
In this post I'll show how the recently released BinNavi 1.2 handles some cases where code is shared among functions and basic blocks exhibit non-typical characteristics.

Note: I could not get the Microsoft or Intel compilers to produce compiled code with functions sharing basic blocks as an example for this post. Using PGO (Profiling Guided Optimization) and other optimizations proved fun but the furthest I could go was producing multi-chunked functions. I might need to play more with it in order to get chunks to be shared... Anyway, I just picked a Microsoft DLL for the example. Specifically wkssvc.dll.

In this DLL there are two functions sharing some blocks on their exit paths. The shared code is shown in green in this graphic exported from BinNavi.





Interestingly enough, those blocks share exactly the same code but one function has six of them while the other has seven. I previously commented on the issue here and happends because of a reference in one of the functions that targets the shared code and causes a block to split.

Here one can see the shared code grouped and highlighted. Click on the image for a larger view.



Now a zoomed version of the block on the left:



And the one on the right:



Within the database, BinNavi handles this by allowing instructions to belong to multiple basic blocks, as well as basic blocks to belong to multiple functions. And it just works...