EmbeddedRelated.com
Forums

AVR interrupt response time

Started by Pygmi January 12, 2005
I just started my first time-critical project with AVRs,
and by time-critical I mean interrupt response times.
So far I have been using avr-gcc (3.3.x) and I have been
pretty happy with it. And I have written ALL code in C.

I'm hoping to get some code executed within 2 us or so
after an external interrupt (INT0/INT1 on an ATMega32).
Today I wrote the code to be executed and ended up with
approx. 20 instructions/cycles. With a 16 MHz clock that
means something like 1.25 us. Nothing much to optimize
there.
From the datasheets I have found out that it takes 4 cycles
minimum (?) to jump to the interrupt handler. Adding some
register saving and stuff, I was expecting less than 0.5 us
before my own code starts executing => resulting in <2 us.

Ok, that was what I was hoping...

When I compiled the code and ran it, I noticed that it
took about 2.5 us to start executing my code?!?!
(I used an ATMega8 as I don't have any M32 at the
moment, but I guess that isn't relevant??)
I checked the list file, and one reason is the
LENGTHY prologue added by gcc to the interrupt
handler (17 instructions!!!), saving a LOT of registers...

Two questions:
1. Even with 4 cycles + 17 instructions there is 1 us
    missing?? What else happens before my own handler
    code starts executing?
2. Is there any way to tell gcc NOT to 'push' all those
    registers in the prologue??

And finally:
If I expect my own code to execute
within 0.5 us, is assembler the only way to go??

Thanks in advance for any info....
Any links to good resources are appreciated as well.
I would REALLY like to know exactly what happens there.

Pygmi


"Pygmi" <bronco_castor@hotmail.com> wrote in message
news:B2fFd.411$6X6.281@read3.inet.fi...
The processor first synchronizes the external input to its own clock; that takes 2 clocks. The processor also has to finish the currently executing instruction. It takes 3 cycles to go to the interrupt vector, from where it executes a jump to your ISR, another 3 cycles. That is 8 to 11 cycles in total, depending on the executing instruction, or at most 0.6875 us at 16 MHz. Then it has entered your ISR; you at least need to save the status register and a few registers before useful work can be done.

How did you check the response time? With a scope?

Assembly will be necessary if you want to squeeze out every last bit of performance. What's the application that makes this so critical?

Jeroen
"Jeroen" <jayjay.1974@xs4all.nl> wrote in message
news:41e58945$0$6208$e4fe514c@news.xs4all.nl...
Thanks for the response.

Yes, I checked the response time with a scope: from the external signal to the first executed instruction of my "own" code in the interrupt handler.

I need to service ISA bus logic (I/O reads/writes), and I have been told that R/W requests should be serviced within 2.5 us (so not actually 2 us). I'm not quite sure about the 2.5 us requirement, but if it is valid, it seems to be too tight for an AVR at 16 MHz... Maybe if this could be the only interrupt in the system, or with nested interrupts.

..or I should forget all about interrupts and do what I need by polling. Not very tempting.
..or just use a faster processor (which would also mean a jump from AVR to another architecture)
..or the solution is a dual-ported RAM??
..or some other option... there are of course options... but at additional HW cost, of course

Pygmi
On Wed, 12 Jan 2005 19:40:17 GMT, "Pygmi" <bronco_castor@hotmail.com> wrote:

>I just started my first time critical project with AVR's.
>And time critical meaning interrupt response times.
>So far I have been using avr-gcc (3.3.x) and I have been
>pretty happy with it. And I have written ALL code in C.
For the lowest latency, the fastest way is to dedicate some registers for use only within the interrupt code; that way you don't have to push/pop anything, just copy the status register to a register. If you can tell your C compiler never to use certain registers in foreground code, and write your interrupt code in assembler, this will give the fastest response.

It may also be that the standard C interrupt handler can be modified to save less, if it doesn't use all the registers it saves. Take a look at the assembler it generates; you may be able to hand-tweak it.
"Pygmi" <bronco_castor@hotmail.com> wrote in message
news:2fhFd.500$6X6.308@read3.inet.fi...
Latency on bigger processors is usually even worse... Interrupt latency on, for example, an 80386 can take hundreds of cycles.

It's better to have some hardware to interface the ISA bus. A small, cheap CPLD is best; all you really need is an address decoder and a few registers. The ISA bus runs at 8 MHz and the AVR at 16 MHz, so there are just 2 AVR clock cycles (at most 2 single-cycle instructions) per ISA bus cycle. A jump alone takes 3 cycles. So it's not possible; the AVR just can't do anything useful in that time. Only a much faster CPU could do it, and even then the load would be very high.

The ISA interface can be done in plain HCT logic and will only be a few chips. A possible solution is a GAL20V8 as address decoder. This decoder generates two strobes: one to clock a '574 that stores the data from the data bus, and another to enable a '244 that passes data from the AVR to the ISA bus. INT0/1 on the AVR can be used to let the AVR know something has been written. An external interrupt needs to last at least 2 AVR clock cycles before it's recognized, so to be on the safe side it's better to use a flip-flop that's set by the address decoder and reset by the AVR. The output of the FF goes to the INT0/1 input.

This costs only 4 chips that cost next to nothing. If board space is at a premium, a 44-pin CPLD like a MAX7000S could be used.

Jeroen
Jeroen wrote:
And all that assumes that the executing code has no critical sections implemented by disabling interrupts. Does no AVR instruction take over 3 cycles? What about a return? What about other interrupts, and returns from them, if any? Hairy.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!
> When I compiled the code and ran it, I noticed that it took about 2.5 us to start executing my code?!?!
> (I used ATMega8 as I don't have any M32 at the moment, but I guess it isn't relevant??)
> I checked the list file and one reason is the LENGTHY prologue added by gcc into interrupt
> handler (17 instructions!!!), saving LOT of registers...
You can try a better compiler than WinAVR!

// IAR C interrupt handler
#pragma vector=12
__interrupt void handler()
{
    BYTE i = PORTB;
    PORTB = 0xF0;
    PORTB = 0x0F;
    PORTB = i;
}

Generated code:

     51  __interrupt void handler()
   \                     handler:
     52  {
   \   00000000   931A       ST      -Y,R17
   \   00000002   930A       ST      -Y,R16
     53      BYTE i = PORTB;
   \   00000004   B318       IN      R17,0x18
     54      PORTB = 0xF0;
   \   00000006   EF00       LDI     R16,240
   \   00000008   BB08       OUT     0x18,R16
     55      PORTB = 0x0F;
   \   0000000A   E00F       LDI     R16,15
   \   0000000C   BB08       OUT     0x18,R16
     56      PORTB = i;
   \   0000000E   BB18       OUT     0x18,R17
     57  }
   \   00000010   9109       LD      R16,Y+
   \   00000012   9119       LD      R17,Y+
   \   00000014   9518       RETI
     58

Two registers used, two registers pushed. As you see, there is no reason to even push the PSR in this case, since the flags do not get updated.

If you need fast interrupt response, and need to do a lot, then consider dividing the handler into two parts. The first (minimal) part does the fast processing and, at the end, makes an external interrupt pending; that interrupt's handler continues the processing after the fast handler has exited.

__no_init __register BYTE SavePortB @ 4;   // put SavePortB in register R4

#pragma vector=TIMER
__interrupt void fast_handler(void)
{
    SavePortB = PORTB;
    set_ext_interrupt_pending();
}

#pragma vector=EXT_INT_HANDLER
__interrupt void slow_handler(void)
{
    // Continue slow processing after fast handler has exited.
    PORTB = 0xF0;
    PORTB = 0x0F;
    PORTB = SavePortB;
}

Since the processing in the fast handler is minimal, a good compiler should push very few registers.

There is a 4 kB restricted C compiler for tests. You have to contact IAR personally to get it; it is not on their web page. It does not generate assembly code, only object code.

--
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they
may, or may not be shared by my employer, Atmel Sweden.
> And all that assumes that the executing code has no critical
> sections implemented by disabling interrupts. Does no AVR
> instruction take over 3 cycles? What about a return? What about
> other interrupts and returns from them, if any. Hairy.
Don't forget that the main reason for long worst-case interrupt latencies is probably another interrupt handler which does not re-enable the global interrupt flag (which would allow nested interrupts). This conflict will only appear AFTER customer shipment, according to Murphy's law. You have to add together ALL interrupts in the system which have higher priority to find your worst-case latency. This is not something that can be tested; you have to do the calculations.

--
Best Regards
Ulf
Ulf Samuelsson wrote:
Of course it is not impossible that the OP has something that runs a basic loop and has only one interrupt in the system, in which case there will be no critical sections and the latency is controlled by the longest instruction. However, the return instruction in many systems implies an interrupt disable for the following instruction, as a measure to avoid stack overflow in some worst cases. There are also special cases, such as the x86 string instructions when using a repeat prefix. Don't know about the AVR.

--
Chuck F
"Ulf Samuelsson" <ulf@NOSPAMatmel.com> wrote in message
news:34lvauF4e3rq9U1@individual.net...
> You can try a better compiler than WinAVR!
Actually I made a quick test today: an empty handler, with the code written using inline assembly. The prologue of 17 instructions dropped to 4, and the time from the external trigger dropped from >2.5 us to 1.2 us or so (at 14.75 MHz). I was also able to snip a few instructions out of my own code.

So, I think there is hope. It is possible to get it all done within 2.5 us, but not with handlers written using the C/gcc combination. Maybe with C/IAR or some other compiler, and of course by writing those critical parts in assembly.

I also understand that this is only halfway there. I need to design & write the other code in such a way that this one critical handler is serviced with minimum delay... But it is possible... maybe. If only there were an AVR running with a 25-30 MHz clock.
> Two registers used, two registers pushed.
> As you see, there is no reason to even push the PSR in this case since the
> flags do not get updated.
I must admit that this seems very reasonable.
> If you need fast interrupt response, and need to do a lot,
> then consider to divide the handler into two parts.
Not a bad idea in general. But in this case I want fast service (i.e. the end result) after the external interrupt, so dividing would make it even worse.
> There is a 4kB restricted C compiler for tests.
> You have to personally contact IAR to get it. It is not on their web page.
> This does not generate assembly code, only object code.
I have seen you recommending IAR earlier. Unfortunately I have some bad memories of working with the IAR compiler... OK, that was several years back with the Hitachi H8, so maybe not very relevant today... but at that time we were considering moving to gcc to get rid of all the bugs in the IAR libs. But all this happened in the last century, so maybe I should give it a try one of these days.
Thanks,
Pygmi