Quickpost: IDAPython script to identify unrecognized functions.
Hey folks! This time I’m gonna share with you a small IDAPython tool made by Federico Muttis (aka @acid_; maybe you remember him from the pretty awesome Pidgin vulnerability, or the WebEx one). This is one of those scripts that you end up using over and over when working with obscure firmware, memory dumps or even unknown pieces of code. A lot of us have written something like this in the past. It’s a must. But I felt that we really needed something with a slightly more generic approach, like the one Acid took.
Let’s see what he has to say about it:
When reversing unknown binaries, such as firmware or anything that isn’t a standard executable format (ELF, PE, etc.), it’s pretty common that IDA doesn’t recognize most of the functions.
This is when I usually start hitting “C” whenever something looks like code, and then define everything that looks like functions using “P”.
Of course IDA helps a bit, e.g. when you find a function that jumps to another section of the file, it disassembles that part and defines some functions.
But sometimes the binary file is just too large, and even if IDA helps by defining such sections of the file as code/functions, there is a lot of undefined code as well.
This little IDAPython script finds all your defined functions, takes the first four bytes of each one (the opcode of its first instruction) and searches for them in the rest of the file. If the opcode is found at an address that isn’t already part of a function, it does MakeCode, which is the same as hitting “C”, and then MakeFunction (the IDC equivalent of “P”).
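To make the idea concrete outside of IDA, here is a minimal sketch of the same search on a plain byte string. Everything in it is an assumption for illustration: `find_candidates`, the toy `image`, the fixed 4-byte width and the aligned stepping are mine, not part of the script, which delegates the search to `idc.FindBinary`.

```python
def find_candidates(blob, known_starts, width=4):
    """Return offsets whose first `width` bytes match a known function start."""
    # Collect the leading bytes of every function we already know about.
    signatures = {blob[ea:ea + width] for ea in known_starts
                  if ea + width <= len(blob)}
    known = set(known_starts)
    candidates = []
    # Assumption: fixed-width instructions, so step in aligned slots.
    for ea in range(0, len(blob) - width + 1, width):
        if ea not in known and blob[ea:ea + width] in signatures:
            candidates.append(ea)
    return candidates

# Toy image: the ASCII bytes "PUSH" stand in for a 4-byte prologue encoding.
image = b"PUSHaaaaPUSHbbbbccccPUSH"
print(find_candidates(image, known_starts=[0]))  # [8, 20]
```

Each returned offset is only a candidate. In IDA the equivalent step would still be MakeCode followed by MakeFunction, and either of those can fail on data that merely looks like a prologue.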
It’s worth mentioning that the script also filters which opcodes count as function prologues, based on a set of common instructions (e.g. “STMFD” for ARM, “PUSH” and “MOV”).
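The filter in the script is a plain substring test on the disassembly text, so a line such as `BL remove_entry` would pass because it contains "mov". If that turns out to be too loose for your target, one possible variation is to compare only the mnemonic. The helper below is a hypothetical sketch working on disassembly strings, not part of the original script:

```python
PROLOGUES = {"STMFD", "PUSH", "MOV"}

def looks_like_prologue(dis_text, prologues=PROLOGUES):
    """True if the first token of a disassembly line is a known prologue mnemonic."""
    tokens = dis_text.split()
    if not tokens:
        return False
    # Compare case-insensitively so "push" and "PUSH" are treated alike.
    return tokens[0].upper() in prologues

for line in ["STMFD SP!, {R4-R7,LR}", "push ebp", "BL remove_entry"]:
    print(line, "->", looks_like_prologue(line))
```

The trade-off is that an exact mnemonic match is stricter: variants like `MOVS` no longer pass unless you add them to the set.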
You should modify it to suit your needs.
```python
import idc
import struct
import idautils

def find_all( opcode_str ):
    # Collect every address where the byte pattern occurs.
    ret = []
    ea = idc.FindBinary(0, 1, opcode_str)
    while ea != idc.BADADDR:
        ret.append(ea)
        ea = idc.FindBinary(ea + 4, 1, opcode_str)
    return ret

def define_functions():
    # The function first searches for all user defined functions, reads
    # the opcodes and searches for those opcodes in the rest of the file.
    #
    # You can extend this by adding more disassembled instructions that
    # you believe are function prologues.
    #
    # Obviously not every PUSH is a function start, this is only a filter
    # against erroneously defined functions. So if you define a function
    # that starts with another instruction (and you think there could be
    # other functions that start with that instruction), just add it here.
    prologues = ["STMFD", "push", "PUSH", "mov", "MOV"]

    print "Finding all signatures"
    ea = 0
    opcodes = set()
    for funcea in idautils.Functions(idc.SegStart(ea), idc.SegEnd(ea)):
        # Get the opcode
        start_opcode = idc.Dword(funcea)
        # Get the disassembled text
        dis_text = idc.GetDisasm(funcea)
        we_like_it = False
        # Filter possible errors on manually defined functions
        for prologue in prologues:
            if prologue in dis_text:
                we_like_it = True
        # If it passes the filter, add the opcode to the search list.
        if we_like_it:
            opcodes.add(start_opcode)

    print "# different opcodes: %x" % (len(opcodes))
    while len(opcodes) > 0:
        # Search for this opcode in the rest of the file
        opcode_bin = opcodes.pop()
        opcode_str = " ".join(x.encode("hex") for x in struct.pack("<L", opcode_bin))
        print "Searching for " + opcode_str
        matches = find_all( opcode_str )
        for matchea in matches:
            # If the opcode is found in a non-function
            if not idc.GetFunctionName(matchea):
                # Try to make code and function
                print "Defining function at " + hex(matchea)
                idc.MakeCode(matchea)
                idc.MakeFunction(matchea)
    print "We're done!"

define_functions()
```
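One detail worth a closer look is how the script turns the dword into the search string: `x.encode("hex")` only works on Python 2 strings. If you need the same formatting under Python 3, something like the following should do. This is a sketch, and `dword_to_search_string` is a hypothetical helper name, not something from the script:

```python
import struct

def dword_to_search_string(value):
    """Format a 32-bit value as space-separated little-endian hex bytes."""
    packed = struct.pack("<L", value)
    # bytearray yields integers on both Python 2 and Python 3.
    return " ".join("%02x" % b for b in bytearray(packed))

# A sample 32-bit value, used here purely as an example.
print(dword_to_search_string(0xE92D4010))  # 10 40 2d e9
```

The little-endian packing matters: the bytes appear in the string in the order they sit in the file, which is the reverse of how the dword reads as a number.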
This is an example of a firmware file with only user (and IDA) defined functions:
And this is after the script ran:
Obviously, blue means code within a function.