Programming in the small

How small can a C program be?

(For this post, I am using an Atmel ATTINY4313a AVR processor, but most of this stuff should apply to C code compiled for any 8-bit AVR chip with the avr-gcc compiler. This includes the Arduino.)

Here is the smallest useful C program I could come up with…

#include <avr/io.h>

int main(void) {
    DDRA |= 0x01;         // Set the PA0 bit to output.

    while (1) {           // Repeat forever
        PINA |= 0x01;     // Toggle the bit
    }
}

First it sets pin PA0 to output mode, then it toggles it on and off as fast as it can forever.

Here is the assembly code that it compiles down to…

3c: d0 9a      sbi   0x1a, 0;  // DDRA |= 0x01
3e: c8 9a      sbi   0x19, 0;  // PINA |= 0x01
40: fe cf      rjmp  .-4;      // Jump back and do it again...

Which is pretty short and sweet – only 6 bytes long! Note that I ORed the values into the registers so the compiler could use the set bit (SBI) instruction, which is only 1 word long. It really doesn’t get any smaller than this.

We can check to make sure the program actually works by connecting an oscilloscope to pin 5 of the chip, and we see this…


Processor speed at bootup = 1 MHz
Time for each cycle = 1 / 1 MHz = 1 us/cycle
Cycles for SBI instruction = 2 cycles
Cycles for RJMP instruction = 2 cycles
Total cycles to toggle bit on then off = 2 * (2 cycles + 2 cycles) = 8 cycles
Total period = 8 cycles * 1 us/cycle = 8 us
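
The same arithmetic can be written as a small host-side C sketch (it runs on a PC, not on the chip). The function names are made up for illustration; the cycle counts are the ones listed above.

```c
#include <stdint.h>

/* Cycle counts taken from the list above. */
enum { SBI_CYCLES = 2, RJMP_CYCLES = 2 };

/* One pass through the loop (SBI + RJMP) flips the pin once,
   so a full on-then-off period takes two passes. */
static uint32_t toggle_period_cycles(void) {
    return 2u * (SBI_CYCLES + RJMP_CYCLES);
}

/* Period in nanoseconds for a given CPU clock in Hz. */
static uint64_t toggle_period_ns(uint32_t f_cpu_hz) {
    return (uint64_t)toggle_period_cycles() * 1000000000ull / f_cpu_hz;
}
```

With a 1 MHz clock, toggle_period_ns(1000000) works out to 8000 ns, which is the 8 us computed above.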

It looks like this is in fact our code talking to us, all 6 bytes of it.

Unfortunately, when we look at what actually downloaded into the chip, we see that it used up 70 bytes of our precious program memory! Who invited the other 64 bytes to this party? Let’s take a look at the compiler output and see…

00000000 <__vectors>:
   0:	14 c0 rjmp	.+40     	; 0x2a <__ctors_end>
   2:	1b c0 rjmp	.+54     	; 0x3a <__bad_interrupt>
   4:	1a c0 rjmp	.+52     	; 0x3a <__bad_interrupt>
   6:	19 c0 rjmp	.+50     	; 0x3a <__bad_interrupt>
   8:	18 c0 rjmp	.+48     	; 0x3a <__bad_interrupt>
   a:	17 c0 rjmp	.+46     	; 0x3a <__bad_interrupt>
   c:	16 c0 rjmp	.+44     	; 0x3a <__bad_interrupt>
   e:	15 c0 rjmp	.+42     	; 0x3a <__bad_interrupt>
  10:	14 c0 rjmp	.+40     	; 0x3a <__bad_interrupt>
  12:	13 c0 rjmp	.+38     	; 0x3a <__bad_interrupt>
  14:	12 c0 rjmp	.+36     	; 0x3a <__bad_interrupt>
  16:	11 c0 rjmp	.+34     	; 0x3a <__bad_interrupt>
  18:	10 c0 rjmp	.+32     	; 0x3a <__bad_interrupt>
  1a:	0f c0 rjmp	.+30     	; 0x3a <__bad_interrupt>
  1c:	0e c0 rjmp	.+28     	; 0x3a <__bad_interrupt>
  1e:	0d c0 rjmp	.+26     	; 0x3a <__bad_interrupt>
  20:	0c c0 rjmp	.+24     	; 0x3a <__bad_interrupt>
  22:	0b c0 rjmp	.+22     	; 0x3a <__bad_interrupt>
  24:	0a c0 rjmp	.+20     	; 0x3a <__bad_interrupt>
  26:	09 c0 rjmp	.+18     	; 0x3a <__bad_interrupt>
  28:	08 c0 rjmp	.+16     	; 0x3a <__bad_interrupt>

0000002a <__ctors_end>:
  2a:	11 24 eor	r1, r1
  2c:	1f be out	0x3f, r1	; 63
  2e:	cf e5 ldi	r28, 0x5F	; 95
  30:	d1 e0 ldi	r29, 0x01	; 1
  32:	de bf out	0x3e, r29	; 62
  34:	cd bf out	0x3d, r28	; 61
  36:	02 d0 rcall	.+4      	; 0x3c 
  38:	04 c0 rjmp	.+8      	; 0x42 <_exit>

0000003a <__bad_interrupt>:
  3a:	e2 cf rjmp	.-60     	; 0x0 <__vectors>

0000003c <main>:

  3c:	d0 9a sbi	0x1a, 0	; 26
  3e:	c8 9a sbi	0x19, 0	; 25
  40:	fe cf rjmp	.-4      	; 0x3e <__SP_H__>

00000042 <_exit>:
  42:	f8 94 cli

00000044 <__stop_program>:
  44:	ff cf rjmp	.-2      	; 0x44 <__stop_program>

You can spot our little routine just after main(), but it is drowning in a sea of other code.
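
If you want to check the jump targets in the listing by hand, here is a rough host-side sketch of how an RJMP opcode decodes. The helper names are my own invention; the only facts used are that RJMP has 0xC in its top nibble and a signed 12-bit word offset in the low bits, and that the listing prints each opcode low byte first.

```c
#include <stdint.h>

/* The listing prints bytes low byte first: "14 c0" is opcode 0xC014. */
static uint16_t opcode_from_bytes(uint8_t lo, uint8_t hi) {
    return (uint16_t)((hi << 8) | lo);
}

/* Returns 1 if the 16-bit opcode is an RJMP (top nibble 0xC). */
static int is_rjmp(uint16_t opcode) {
    return (opcode & 0xF000u) == 0xC000u;
}

/* The low 12 bits are a signed offset in words;
   the listing shows it in bytes, so multiply by 2. */
static int rjmp_byte_offset(uint16_t opcode) {
    int k = opcode & 0x0FFF;
    if (k & 0x0800) {
        k -= 0x1000;              /* sign-extend the 12-bit value */
    }
    return k * 2;
}

/* Target = address of the following instruction + byte offset. */
static unsigned rjmp_target(unsigned addr, uint16_t opcode) {
    return (unsigned)((int)addr + 2 + rjmp_byte_offset(opcode));
}
```

For the Reset vector at address 0, the bytes "14 c0" decode to an offset of +40 bytes and a target of 0x2a, matching the listing. The "e2 cf" at 0x3a decodes to -60 bytes, which lands back on address 0.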

It turns out that the C compiler throws lots of extra stuff in that, under normal circumstances, makes C programmers’ (and compiler writers’) lives easier. Here is the breakdown of the 70 bytes…

Section                  Bytes
Interrupt vector table      42
Initialization code         16
Bad Interrupt Vector         2
main (our program)           6
Exit routine                 4
Total                       70

Let's take each of these and see what they do and what we can do about them.

Interrupt Vector Table

Whenever the processor gets interrupted from running normal step-by-step code, it will jump to one of these addresses based on what interrupted it. This is part of the defined behavior of the chip. If Timer 1 overflows, it jumps to vector #6. If the Analog Compare triggers, it jumps to vector #12. There are 21 vectors in all, each for a different source of interrupts. Each vector is really just an instruction to jump to someplace else, so each vector takes up 2 bytes. 21 vectors * 2 bytes/vector = 42 bytes.
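
As a quick sanity check on that layout, here is a tiny host-side sketch that maps a vector number to its byte address in the table. The function names are hypothetical, and it assumes vectors are numbered from 1 with the Reset vector as #1.

```c
#include <stdint.h>

enum { NUM_VECTORS = 21, BYTES_PER_VECTOR = 2 };

/* Byte address of a vector's slot, assuming the Reset vector is #1. */
static unsigned vector_byte_address(unsigned vector_number) {
    return (vector_number - 1u) * BYTES_PER_VECTOR;
}

/* Total size of the table in bytes. */
static unsigned vector_table_size(void) {
    return NUM_VECTORS * BYTES_PER_VECTOR;
}
```

Under that numbering, vector #1 sits at address 0x00 and vector #21 at 0x28, which is the last rjmp in the table above. The table size of 42 bytes (0x2a) is exactly where the initialization code begins.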

Note that the first vector (at address 0) is particularly important because this is the Reset vector. This is where the processor starts up after a reset – including when it gets turned on.

Initialization Code

Here is the initialization code….

  2a:	11 24       	eor	r1, r1
  2c:	1f be       	out	0x3f, r1	; 63
  2e:	cf e5       	ldi	r28, 0x5F	; 95
  30:	d1 e0       	ldi	r29, 0x01	; 1
  32:	de bf       	out	0x3e, r29	; 62
  34:	cd bf       	out	0x3d, r28	; 61
  36:	02 d0       	rcall	.+4      	; 0x3c

The first line clears register R1 to equal 0 (anything XORed with itself is zero). The compiler often needs a zero handy, so it dedicates register 1 to always and forever have a zero in it. This is how that original zero gets there.

The next line clears out the Status Register (location 0x3f) by loading it with zero (using the handy zero that was put in R1 on the line before). I’m not really sure why they do this…

The next 4 lines set up the Stack Pointer (0x3D) to point to the top of RAM (0x5F). It also puts an 0x01 in location 0x3E, which according to the chip’s documentation should be a reserved location and not used. Maybe this is a benign copy-paste error from another chip that supported a 2-byte Stack Pointer?
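
To see what value those four instructions actually leave behind, the two bytes can be combined as if 0x3E were the high byte of a 16-bit Stack Pointer. This is a host-side sketch, not chip code. The helper names are hypothetical, and the SRAM start address (0x60) and size (256 bytes) used in the note below are assumptions for illustration, so check them against the datasheet.

```c
#include <stdint.h>

/* Combine the byte written to 0x3E (high) with the byte written to 0x3D (low). */
static uint16_t stack_pointer(uint8_t high, uint8_t low) {
    return (uint16_t)((high << 8) | low);
}

/* Last address of an SRAM block of 'size' bytes starting at 'start'. */
static uint16_t ram_end(uint16_t start, uint16_t size) {
    return (uint16_t)(start + size - 1u);
}
```

stack_pointer(0x01, 0x5F) is 0x015F, which is also what ram_end(0x60, 256) gives. If those assumed figures match the chip, the write to 0x3E may simply be the high byte of a 2-byte Stack Pointer rather than a leftover from another part.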

The last line calls into the main() function of our C code.

Bad Interrupt Vector

This looks like just a vector for other vectors to point to, and all of the unassigned interrupt vectors point to it. The Bad Interrupt Vector itself just jumps back to the Reset Vector. This means that if, say, you get a timer interrupt and you have not set up a vector for it, then the processor will first jump to the Timer Interrupt Vector, which will send it to the Bad Interrupt Vector, which will send it to the Reset Vector, which will send it to the Initialization code. Seems like it would be much easier and more efficient just to have all the unassigned vectors point directly to wherever you want them to go (currently the Reset Vector).

Exit Routine

This is code that executes when the main() function returns (ours loops forever, so this never happens in our case). All the Exit Routine does is turn off all interrupts and then loop forever. Again, I'm not sure why you’d do this – I can imagine writing a program that was completely interrupt driven and just sets up all the interrupts in main() and then returns. Because this Exit Routine turns off all interrupts, this won't work, and I must add an extra while(1) in my main().

At least there is nothing magic going on here – we can see exactly where all of the extra bytes are coming from, and figure out why they are there (or don’t need to be!).

Tune in next time for some drastic cutting….

Cutting power the easy way

Turned off all unused units like the analog comparator, Timer0, USI, and USART.

Saved about 0.2mA. Not much, but I guess worth the tiny amount of effort.

ACSR |= _BV(ACD);        // Turn off analog compare unit. We don't use it, so save power. Saves about 0.1 mA (3.6 mA drops to 3.5 mA).

PRR = _BV(PRTIM0) | _BV(PRUSI) | _BV(PRUSART);        // Turn off Timer/Counter0, USI, USART since we don't need them. Saves about 0.1 mA.
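
One thing to watch in register writes like these: in avr-libc, bit names such as ACD or PRUSI are bit positions, not masks, so they need to go through _BV() (or a shift) before being ORed together. Here is a host-side sketch of the difference. The bit positions below are assumed for illustration only, and BIT_MASK is a stand-in for _BV so the example compiles off-chip.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only; check the chip's header. */
enum { PRUSART_BIT = 0, PRUSI_BIT = 1, PRTIM0_BIT = 2 };

/* Same idea as avr-libc's _BV(): turn a bit position into a mask. */
#define BIT_MASK(n) (1u << (n))

/* ORing the raw positions together: 2 | 1 | 0. */
static uint8_t or_of_positions(void) {
    return (uint8_t)(PRTIM0_BIT | PRUSI_BIT | PRUSART_BIT);
}

/* ORing the masks together: (1<<2) | (1<<1) | (1<<0). */
static uint8_t or_of_masks(void) {
    return (uint8_t)(BIT_MASK(PRTIM0_BIT) | BIT_MASK(PRUSI_BIT) | BIT_MASK(PRUSART_BIT));
}
```

With these assumed positions, ORing the positions gives 0x03 (only bits 0 and 1 set), while ORing the masks gives 0x07 (all three bits set).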

Brownout detector already off via fuses. Won’t really worry about setting IO pins to output since the input is disabled during sleep and we will be asleep almost all the time except when updating display.

I guess it is now all about maximizing time in SLEEP!

Commit here…