[Quickpost] [IDAPython] Locating libc in an unknown firmware without string references.

Very often, you find yourself reversing a completely unknown firmware from some memory dump, and know very little about it.  Probably, the processor architecture, the kind of work it makes, etc.

Generally, you can search for patterns (like the opcodes from the function prologue) to try to define the first functions , look for strings that could add some extra info, look for headers giving us an idea of how the firmware is structured and of course, try to identify the libc itself and its location.

This last two points are, in my opinion, the most important ones.

Often, we have to go without all this important information. Maybe we don’t have any strings. Or we have it but there are no code references to it so we can’t link them to the code. Maybe, we can’t reproduce the in-memory layout of the firmware and its structure.

Well, this is the exact situation that made me think on developing this script.

The basic idea is pretty simple. As generally, all the libc functions are put together, I will try to obtain the location of the libc by locating memcpy (or memset, or memcmp). To accomplish this, I will analyze the function graph of all the defined functions and will do a few assumptions:

1 – In most cases, the memcpy, memset and memcmp function graph contain a single basic block loop.
(I know, this is a hell of an assumption).
2 – This functions, are called very, very often, so they should have a lot of references to it.

So, if we enumerate all the functions containing a single basic block loop and then we sort it in a descending order, from the most referenced to the less referenced. It’s very probable that we will end up with any of this three functions (memcpy, most likely) between the first 5 listed.

Obviously, this can vary a lot between different firmwares and it’s very susceptible to false positives but in my experience, the technique works very well and, at the end, it’s really easy to manually validate the obtained results.


from idaapi import *

###############################################

def _branches_from (ea):
    if is_call_insn(ea):
        return []
    xrefsgen = CodeRefsFrom(ea, 1)
    xrefs=[]
    for xref in xrefsgen:
      xrefs.append(xref)
    if len(xrefs) == 1 and xrefs[0] == NextNotTail(ea):
        xrefs = []
    return xrefs

###############################################

def _branches_from_noflow (ea):
    if is_call_insn(ea):
        return []
    xrefsgen = CodeRefsFrom(ea, 0)
    xrefs=[]
    for xref in xrefsgen:
      xrefs.append(xref)
    if len(xrefs) == 1 and xrefs[0] == NextNotTail(ea):
        xrefs = []
    return xrefs

###############################################

def _branches_to (ea):
    xrefs        = []
    prev_ea      = PrevNotTail(ea)
    prev_code_ea = prev_ea
    while not isCode(GetFlags(prev_code_ea)):
        prev_code_ea = PrevNotTail(prev_code_ea)
    for xref in CodeRefsTo(ea, 1):
        if not is_call_insn(xref) and xref not in [prev_ea, prev_code_ea]:
            xrefs.append(xref)
    return xrefs

########################################################################################

def get_bb_from_function(function):
    """
    Returns a list of the basic blocks contained in the function.
    """

    #print "[+] Enumerating basic blocks of %08x" % function
    basic_blocks_from = []
    basic_blocks_to = []
    function_end   = FindFuncEnd(function)

#    print "Function start: %08x" % function
#    print "Function end:   %08x" % function_end

    basic_blocks_from.append(function)

    # Walk over all the address looking for references *to* Code
    for address in range(function, function_end):
        refsfrom = CodeRefsFrom(address, 1)
        refsfrom = list(refsfrom)

        # We just care for the ones pointing to TWO places (Next EIP + Other Address)
        # And these will be pointing within our function
        if len(refsfrom) > 1:
            if  (function < refsfrom[0] < function_end) and (function < refsfrom[1] < function_end):
                basic_blocks_from.append(refsfrom[0])
                basic_blocks_from.append(refsfrom[1])

     # Walk over all the address looking for referenced addresses
    for address in range(function, function_end):
        # If this address was already appended, keep going
        if address in basic_blocks_from:
            continue
        refsto = CodeRefsTo(address, 1)
        refsto = list(refsto)

        if len(refsto) > 1:
            if  function < refsto[0] < function_end:
                basic_blocks_to.append(address)

    return basic_blocks_from + basic_blocks_to

def get_bb_limits(bb_addr):
        eip = bb_addr

        while True:
            eip=NextHead(eip,MaxEA())
            if eip==0xffffffff:
                end_addr = eip
                break

            refs_from=_branches_from(eip)
            refs_to=_branches_to(eip)

            if len(refs_from) or len(refs_to) or is_ret_insn(eip) or (eip == MaxEA()):
              end_addr = eip
              break

        return bb_addr, end_addr

def get_all_bb(ea):
    bb_init_list = get_bb_from_function(ea)
    bb_list = []

    for bb_addr in bb_init_list:
        bb_init, bb_end = get_bb_limits(bb_addr)
        bb_list.append((bb_init, bb_end))

    return bb_list

def find_loops(bb_limits):
    try:
        if _branches_from_noflow(bb_limits[1])[0] == bb_limits[0]:
            print "[+] There is a loop at basic block = %08x" % bb_limits[0]
            return True
    except:
        return False

def has_loops(funct_ea):

    bb_list = get_all_bb(funct_ea)
    for x in bb_list:
        #print x
        if find_loops(x):
            return True

    return False

def get_num_of_callers(function_ea):

        f_name = GetFunctionName(function_ea)
        # save the callers name and size
        # return len(list(CodeRefsTo(function_ea, 0))) # Only code refs
        # return len(list(CodeRefsTo(function_ea, 0))) + len(list(DataRefsTo(function_ea))) # Code refs + Data refs
        return len(list(CodeRefsTo(function_ea, 0))) + len(list(DataRefsTo(function_ea))) + len(list(DataRefsTo(function_ea+1))) # Code refs + Data refs + (Data refs+1)ARM thingy

#####################################################################################################
#####################################################################################################

def run():
    DEBUG = 0

    ea = ScreenEA()
    functs_with_loops = []

    # Loop through all the functions
    for function_ea in Functions(SegStart(ea), SegEnd(ea)):
        if DEBUG: print "DEBUG: Finding loops"
        if has_loops(function_ea):
            if DEBUG: print "DEBUG: Appending functions with loops"
            functs_with_loops.append([function_ea, get_num_of_callers(function_ea)])

    if DEBUG: print "DEBUG: Finished appending of 'loop' functions"
    # Sort it
    functs_with_loops = sorted (functs_with_loops, key=lambda functions:functions[1])
    functs_with_loops.reverse()

    print "\n\n[ ] Most called 'loop' functions:"

    for function in functs_with_loops[:10]:
        print "[+] %s is called %d times" % (hex(function[0]), function[1])

    print "[+] I think %s could be memcpy or memset" % hex(functs_with_loops[0][0])

I hope this can be helpful for you. I suppose that, as usual, it’s full of bugs and it could be improved a lot so please, feel free to suggest/add/fix anything you want.

~ by aLS -- on July 2, 2012.

Leave a comment