Disassembling Arduino generated code


Johnny Five... No disassemble!


I recently needed to work out how many clock cycles certain functions in my code took to execute, so it was time to find the HEX file and disassemble it back to ASM instructions.  The Arduino environment shields you from getting your hands dirty with the files the compiler generates, but on Windows they can be found in a temporary directory under:

C:\Users\USERNAME\AppData\Local\Temp\buildXYZ.tmp

where "USERNAME" is your ... user name and XYZ is some long integer that Arduino comes up with.  Make sure you have recently compiled your code in Arduino, because the folders are deleted when you exit Arduino.  In this folder you'll find some interesting files; the .HEX and .ELF files are of particular interest.  Either make a .BAT file or use the CMD prompt to execute the following commands.
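If hunting for the folder by hand gets tedious, something like the following might help. It is only a sketch, assuming the folders sit directly under the temp directory and follow the build*.tmp naming described above; the function names newest_build_dir and list_outputs are my own, not part of Arduino.

```python
import glob
import os
import tempfile


def newest_build_dir(temp_root=None):
    """Return the most recently modified build*.tmp folder, or None."""
    # Assumption: build folders live directly in the temp directory
    # and are named build<number>.tmp, as described in the post.
    if temp_root is None:
        temp_root = tempfile.gettempdir()
    pattern = os.path.join(temp_root, "build*.tmp")
    candidates = [d for d in glob.glob(pattern) if os.path.isdir(d)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def list_outputs(build_dir):
    """List the .hex and .elf files found in a build folder."""
    return sorted(name for name in os.listdir(build_dir)
                  if name.lower().endswith((".hex", ".elf")))


if __name__ == "__main__":
    build_dir = newest_build_dir()
    if build_dir is None:
        print("No build folder found; compile a sketch in Arduino first")
    else:
        print(build_dir, list_outputs(build_dir))
```

Picking the folder by modification time is a guess at "the one I just compiled"; if you have several Arduino windows open, check the result by eye.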

Disassemble from HEX

We'll use avr-objdump, but we need to specify the architecture.  Find the necessary chip here:


For the Atmega328 MCU we're told to use "avr5" as the target architecture.

avr-objdump -j .sec1 -d -m avr5 foo.cpp.hex > output.txt

The ASM output is saved to "output.txt"

Disassemble from ELF

avr-objdump -d -S foo.cpp.elf > output.txt

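The ASM output is saved to "output.txt".  The -S flag interleaves the original source lines with the disassembly wherever the ELF file contains debug information.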
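Each instruction line in output.txt starts with a hex address and a colon, followed by tab-separated fields: the raw bytes, the mnemonic, then the operands. Here is a minimal sketch of pulling the address and mnemonic out of one line. The sample line is made up for illustration and parse_line is a hypothetical helper name.

```python
def parse_line(line):
    """Return (address, mnemonic) for an instruction line, else None."""
    head, sep, rest = line.partition(":")
    if not sep:
        return None
    try:
        address = int(head.strip(), 16)
    except ValueError:
        return None  # a label, a source line or a header
    fields = rest.split("\t")
    if len(fields) < 3:
        return None  # raw data bytes with no mnemonic
    return address, fields[2].strip().upper()


# Hypothetical sample line, invented for this example
sample = "     55e:\t8f 5f       \tsubi\tr24, 0xFF\t; 255\n"
print(parse_line(sample))                 # (1374, 'SUBI')
print(parse_line("0000055e <loop>:\n"))   # None
```

Lines that are labels, interleaved source or plain data fall through to None, so they can simply be skipped when counting.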

Counting clock cycles

Now that we have the ASM code, we want to count up the clock cycles used per instruction within a given line range.  I wrote a quick program in Python to scrape the datasheet for the Atmega328 chip and generate a dictionary of opcodes with the minimum and maximum clock cycles they can take, giving:
commands = {'BRLT': [1, 2], 'CPI': 1, 'CPSE': [1, 2, 3], 'CPC': 1, 'EOR': 1, 'WDR': 1, 
'MOVW': 1, 'BRLO': [1, 2], 'RETI': 4, 'SBIC': [1, 2, 3], 'SBIW': 2, 'SBIS': [1, 2, 3], 
'BRSH': [1, 2], 'MULSU': 2, 'BRPL': [1, 2], 'LPM': 3, 'BRCS': [1, 2], 'BRTS': [1, 2], 
'BRCC': [1, 2], 'OR': 1, 'BSET': 1, 'BRTC': [1, 2], 'BRNE': [1, 2], 'TST': 1, 'DEC': 1, 
'ROL': 1, 'IN': 1, 'LDD': 2, 'JMP': 3, 'SBRC': [1, 2, 3], 'LDS': 2, 'LSR': 1, 'ROR': 1, 
'SBRS': [1, 2, 3], 'SEV': 1, 'ANDI': 1, 'BRMI': [1, 2], 'SUBI': 1, 'AND': 1, 'BRVS': [1, 2],
 'CBI': 2, 'MOV': 1, 'CBR': 1, 'BRVC': [1, 2], 'RJMP': 2, 'NOP': 1, 'BREQ': [1, 2], 'INC': 1,
 'SUB': 1, 'NEG': 1, 'BRHS': [1, 2], 'ICALL': 3, 'RET': 4, 'BLD': 1, 'MUL': 2, 'BRHC': [1, 2], 
'COM': 1, 'ASR': 1, 'SET': 1, 'SES': 1, 'SER': 1, 'BST': 1, 'SEZ': 1, 'IJMP': 2, 'SEC': 1, 
'SWAP': 1, 'PUSH': 2, 'SEN': 1, 'SEI': 1, 'SEH': 1, 'CLN': 1, 'CLH': 1, 'CLI': 1, 'ADIW': 2, 
'CLC': 1, 'ADD': 1, 'ADC': 1, 'CLZ': 1, 'LDI': 1, 'CLT': 1, 'CLV': 1, 'CP': 1, 'CLR': 1, 
'CLS': 1, 'SBR': 1, 'ST': 2, 'BRGE': [1, 2], 'SBC': 1, 'ORI': 1, 'SBI': 2, 'RCALL': 3, 
'FMUL': 2, 'MULS': 2, 'BCLR': 1, 'LD': 2, 'SBCI': 1, 'BRBS': [1, 2], 'POP': 2, 'SLEEP': 1, 
'BRBC': [1, 2], 'STD': 2, 'STS': 2, 'FMULSU': 2, 'BRID': [1, 2], 'FMULS': 2, 'OUT': 1, 
'CALL': 4, 'BRIE': [1, 2], 'LSL': 1}

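i.e. BRLT takes either 1 or 2 clock cycles: 1 if the branch is not taken, 2 if it is.  The skip instructions (CPSE, SBIC, SBIS, SBRC, SBRS) take 1 cycle if nothing is skipped, 2 if a one-word instruction is skipped and 3 if a two-word instruction is skipped.  Note this set may not be (probably won't be) useful for other chips!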
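To make the min/max idea concrete, here is a small worked sketch using a subset of the dictionary above. The helper names cycle_range and total_range are my own, and the three-instruction sequence is a made-up example.

```python
# Subset of the dictionary above, enough for this example
commands = {'LDI': 1, 'DEC': 1, 'BRNE': [1, 2], 'CPSE': [1, 2, 3]}


def cycle_range(cmd):
    """Return (min, max) cycles for one mnemonic."""
    cycles = commands[cmd]
    if isinstance(cycles, list):
        return min(cycles), max(cycles)
    return cycles, cycles


def total_range(mnemonics):
    """Sum the (min, max) cycles over a list of mnemonics."""
    lo = hi = 0
    for cmd in mnemonics:
        cmd_lo, cmd_hi = cycle_range(cmd)
        lo += cmd_lo
        hi += cmd_hi
    return lo, hi


# Hypothetical sequence: load a counter, decrement, branch if not zero
print(total_range(['LDI', 'DEC', 'BRNE']))  # (3, 4)
```

Note that a sum like this counts each listed instruction once. If the BRNE jumps back to form a loop, the loop body has to be multiplied by the iteration count by hand.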

It is then a fairly trivial task to go through output.txt, look at the commands in a certain line range (given as the hex addresses in the left-hand column), and spit out a minimum and maximum clock cycle count.

def sumclockcycles(startline, endline):
    ## startline and endline are the hex addresses from the left-hand
    ## column of output.txt, and are inclusive
    error = 0
    mincycles = 0
    maxcycles = 0

    with open("output.txt", 'r') as f:
        for line in f:
            if ":" not in line:
                continue
            splitline = line.split(":")
            try:
                ln = int(splitline[0].strip(), 16)
            except ValueError:
                ## Not an instruction line (label, source line, header)
                continue
            if ln < startline or ln > endline:
                continue
            splitrestline = splitline[1].split("\t")
            if len(splitrestline) < 3:
                ## Raw data bytes with no mnemonic
                continue
            cmd = splitrestline[2].strip().upper()
            if cmd in commands:
                cycles = commands[cmd]
                ##print(hex(ln), cmd, cycles)
                if isinstance(cycles, list):
                    mincycles += min(cycles)
                    maxcycles += max(cycles)
                else:
                    mincycles += cycles
                    maxcycles += cycles
            else:
                print("Cmd not found", cmd, splitrestline)
                error += 1
    return (mincycles, maxcycles, error)

print(sumclockcycles(0x55e, 0x5e6))
Hence interrupts, functions, etc. can be critically assessed for performance.
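A cycle count is easier to judge once it is turned into time. The sketch below assumes a 16 MHz clock, which is typical for Atmega328-based Arduino boards but is an assumption here; the cycle counts in the example are made up, and cycles_to_us and report are hypothetical helper names.

```python
F_CPU_HZ = 16000000  # assumption: 16 MHz clock, adjust for your board


def cycles_to_us(cycles, f_cpu_hz=F_CPU_HZ):
    """Convert a clock cycle count to microseconds."""
    return cycles * 1e6 / f_cpu_hz


def report(mincycles, maxcycles, f_cpu_hz=F_CPU_HZ):
    """Format a min/max cycle count together with its duration."""
    return "%d-%d cycles = %.2f-%.2f us" % (
        mincycles, maxcycles,
        cycles_to_us(mincycles, f_cpu_hz),
        cycles_to_us(maxcycles, f_cpu_hz))


# Hypothetical counts, not measured from any real sketch
print(report(120, 160))  # 120-160 cycles = 7.50-10.00 us
```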
