AVR interrupt response time

I just started my first time-critical project with AVRs,
and time-critical here means interrupt response times.
So far I have been using avr-gcc (3.3.x) and I have been
pretty happy with it. I have written ALL the code in C.

I'm hoping to get some code executed within 2 us or so
after external interrupt (INT0/INT1 with ATMega32).
I wrote the code to be executed today and ended up with
approx. 20 instructions/cycles. With a 16 MHz clock that
means something like 1.25 us. Nothing much to optimize
there.
From the datasheet I found out that it takes 4 cycles
minimum (?) to jump to the interrupt handler. Adding some
register saving and stuff, I was expecting less than 0.5 us
to start executing my own code => resulting in <2 us.

Ok, that was what I was hoping...

When I compiled the code and ran it, I noticed that it
took about 2.5 us to start executing my code?!?!
(I used an ATMega8 as I don't have any M32 at the
moment, but I guess that isn't relevant??)
I checked the list file and one reason is the
LENGTHY prologue added by gcc to the interrupt
handler (17 instructions!!!), saving a LOT of registers...

Two questions:
1. Even with 4 cycles + 17 instructions there is 1 us
    missing?? What else happens before my own handler
    code starts executing?
2. Is there any way to tell gcc NOT to 'push' all those
    registers in the prologue??

And finally:
If I expect my own code to start executing
within 0.5 us, is assembler the only way to go??

Thanks in advance for any info....
Any links to good resources are appreciated as well.
I would REALLY like to know exactly what happens there.

Pygmi

Reply by Jeroen ● January 12, 2005

The processor first synchronizes the external input to its own clock;
that takes 2 clocks. The processor also has to finish the currently
executing instruction. It takes 3 cycles to go to the interrupt vector, from
where it executes a jump to your ISR, another 3 cycles. This is 8 to
11 cycles in total, depending on the executing instruction, or up to
0.6875 us at 16 MHz. Then it has entered your ISR; you at least need to save
the status register and a few registers before useful work can be done.
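
As a rough check of the arithmetic above, here is a minimal C sketch. The function names are made up for illustration, and the per-stage cycle counts are simply the ones quoted in this reply, not values verified against a datasheet.

```c
#include <stdint.h>

/* Cycle counts as quoted in the reply above (assumptions, not datasheet values). */
enum {
    SYNC_CYCLES       = 2, /* synchronizing the external input to the CPU clock */
    MAX_FINISH_CYCLES = 3, /* finishing the instruction in progress: 0..3 extra */
    VECTOR_CYCLES     = 3, /* getting to the interrupt vector */
    JUMP_CYCLES       = 3  /* jump from the vector to the ISR */
};

/* Convert a cycle count to nanoseconds for a CPU clock given in Hz. */
double cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    return (double)cycles * 1e9 / (double)f_cpu_hz;
}

/* Best case: the interrupted instruction costs no extra cycles. */
uint32_t entry_cycles_best(void)
{
    return SYNC_CYCLES + VECTOR_CYCLES + JUMP_CYCLES;
}

/* Worst case: the longest assumed instruction must finish first. */
uint32_t entry_cycles_worst(void)
{
    return entry_cycles_best() + MAX_FINISH_CYCLES;
}
```

With a 16 MHz clock, cycles_to_ns() gives 500 ns for the 8-cycle case and 687.5 ns for the 11-cycle case, before any register saving in the ISR itself.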

How did you check the response time? With a scope?

Assembly will be necessary if you want to squeeze out every last bit of
performance. What's the application that makes this so critical?

Jeroen

Reply by Pygmi ● January 12, 2005

Thanks for the response.

Yes, I checked the response time with a scope, from the external
signal to the first executed instruction of my "own" code in the interrupt
handler.

I need to service ISA bus logic (I/O reads/writes), and I have
been told that R/W requests should be serviced within 2.5 us
(so not actually 2 us). I'm not quite sure about the 2.5 us requirement,
but if it is valid, it seems to be too much for an AVR at 16 MHz...
Maybe if this could be the only interrupt in the system, or with
nested interrupts.

..or I should forget all about interrupts and do the things I need
by polling. Not very tempting.
..or just a faster processor (which would also mean a jump from
AVR to another architecture)
..or the solution is a dual-ported RAM??
..or some other option...there are of course options...but at
additional HW cost of course
Pygmi

Reply by Mike Harrison ● January 12, 2005

On Wed, 12 Jan 2005 19:40:17 GMT, "Pygmi" <bronco_castor@hotmail.com> wrote:

>I just started my first time critical project with AVR's.
>And time critical meaning interrupt response times.
>So far I have been using avr-gcc (3.3.x) and I have been
>pretty happy with it. And I have written ALL code in C.

For the lowest latency, dedicate some registers for use only within the
interrupt code - that way you don't have to push/pop anything, just copy the
status register to a register.

If you can tell your C compiler never to use certain registers in foreground
code, and write your interrupt code in assembler, this will give the fastest
response.

It may be that the standard C interrupt handler can be modified to reduce what
it saves if it doesn't use all the registers it saves - take a look at the
assembler it generates - you may be able to hand-tweak it.

Reply by Jeroen ● January 12, 2005

Latency on bigger processors is usually even worse... Interrupt latency on,
for example, an 80386 can be hundreds of cycles.

It's better to have some hardware to interface to the ISA bus. A small cheap
CPLD is best; all you really need is an address decoder and a few registers.
The ISA bus runs at 8 MHz and the AVR at 16 MHz; that is just 2 AVR clock
cycles for each ISA bus clock. A jump alone is 3 cycles. So it's not possible;
the AVR just can't do anything useful in that time. Only a much faster CPU
could do it, and even then the load would be very high.
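
To put numbers on the clock ratio, here is a small C sketch. The function names are hypothetical; the 8 MHz, 16 MHz and 2.5 us figures are the ones mentioned in this thread.

```c
#include <stdint.h>

/* Whole CPU cycles that fit into one period of a slower bus clock. */
uint32_t cpu_cycles_per_bus_clock(uint32_t f_cpu_hz, uint32_t f_bus_hz)
{
    return f_cpu_hz / f_bus_hz;
}

/* Whole CPU cycles available inside a deadline given in nanoseconds. */
uint32_t cpu_cycles_in_deadline(uint32_t f_cpu_hz, uint32_t deadline_ns)
{
    /* 64-bit intermediate so that f_cpu_hz * deadline_ns cannot overflow. */
    return (uint32_t)(((uint64_t)f_cpu_hz * deadline_ns) / 1000000000ULL);
}
```

For a 16 MHz AVR and an 8 MHz bus clock the first function returns 2, and a 2.5 us deadline leaves 40 CPU cycles in total, interrupt entry and prologue included.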

The ISA interface can be done in plain HCT logic, and will only be a few
chips. A possible solution is a GAL20V8 as address decoder. This decoder will
generate two strobes: one to enable a '574 that stores the data from the
data bus, and another to pass data from the AVR to the ISA bus via a '244.
INT0/1 on the AVR can be used to let the AVR know something has been written.
An external interrupt pulse needs to last at least 2 AVR clock cycles before
it's recognized, but to be on the safe side, it's better to use a flip-flop
that's set by the address decoder and reset by the AVR. The output of the FF
goes to the INT0/1 input. This takes only 4 chips that cost next to nothing.
If board space is at a premium, a 44-pin CPLD like a MAX7000S could be used.

Jeroen

Reply by CBFalconer ● January 12, 2005

Jeroen wrote:
> The processor first synchronizes the external input to its own
> clock; that takes 2 clocks. The processor also has to finish
> the currently executing instruction. It takes 3 cycles to go to the
> interrupt vector, from where it executes a jump to your ISR,
> another 3 cycles. This is 8 cycles to 11 cycles total time,
> depending on the executing instruction; or 0.6875 us. Then it has
> entered your ISR; you at least need to save the status register
> and a few registers before useful work can be done.
>
> How did you check the response time? With a scope?
>
> Assembly will be necessary if you want to squeeze out every last
> bit of performance. What's the application that makes this so
> critical?

And all that assumes that the executing code has no critical
sections implemented by disabling interrupts.  Does no AVR
instruction take over 3 cycles?  What about a return?  What about
other interrupts and returns from them, if any?  Hairy.

-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!

Reply by Ulf Samuelsson ● January 12, 2005

> When I compiled the code and ran it, I noticed that it took about 2.5 us
> to start executing my code?!?!
> (I used ATMega8 as I don't have any M32 at the moment, but I guess it
> isn't relevant??)
> I checked the list file and one reason is the LENGTHY prologue added by
> gcc into interrupt
> handler (17 instructions!!!), saving LOT of registers...


You can try a better compiler than WinAVR!


// IAR C interrupt  handler
#pragma vector=12
__interrupt void handler()
{
  BYTE i = PORTB;
  PORTB = 0xF0;
  PORTB = 0x0F;
  PORTB = i;
}


Generated code

     51          __interrupt void handler()
   \                     handler:
     52          {
   \   00000000   931A                       ST      -Y,R17
   \   00000002   930A                       ST      -Y,R16
     53            BYTE i = PORTB;
   \   00000004   B318                       IN      R17,0x18
     54            PORTB = 0xF0;
   \   00000006   EF00                       LDI     R16,240
   \   00000008   BB08                       OUT     0x18,R16
     55            PORTB = 0x0F;
   \   0000000A   E00F                       LDI     R16,15
   \   0000000C   BB08                       OUT     0x18,R16
     56            PORTB = i;
   \   0000000E   BB18                       OUT     0x18,R17
     57          }
   \   00000010   9109                       LD      R16,Y+
   \   00000012   9119                       LD      R17,Y+
   \   00000014   9518                       RETI
     58

Two registers used, two registers pushed.
As you see, there is no reason to even push the status register (SREG) in
this case, since the flags do not get updated.

If you need fast interrupt response and need to do a lot,
then consider dividing the handler into two parts.

The first part does minimal fast processing and, at the end, sets an
external interrupt pending, which continues the processing after the fast
interrupt has exited.

__no_init __register BYTE SavePortB @4;        // Put the saved value in register R4
#pragma vector=TIMER
__interrupt void fast_handler(void)
{
    SavePortB = PORTB;
    set_ext_interrupt_pending();
}

#pragma vector=EXT_INT_HANDLER
__interrupt void slow_handler(void)
{
    // Continue slow processing after fast handler has exited.
    PORTB = 0xF0;
    PORTB = 0x0F;
    PORTB = SavePortB;    // restore the value saved by fast_handler
}

Since the processing is minimal in the fast handler, very few registers
should be pushed by a good compiler.


There is a 4 kB restricted C compiler for tests.
You have to contact IAR personally to get it; it is not on their web page.
It does not generate assembly code, only object code.



-- 
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they
may, or may not be shared by my employer, Atmel Sweden.

Reply by Ulf Samuelsson ● January 12, 2005

> And all that assumes that the executing code has no critical
> sections implemented by disabling interrupts.  Does no AVR
> instruction take over 3 cycles?  What about a return?  What about
> other interrupts and returns from them, if any?  Hairy.
>

Don't forget that the main reason for long worst-case interrupt latencies is
probably another interrupt handler that does not re-enable the global
interrupt flag, and so does not allow nested interrupts.
This conflict will only appear AFTER customer shipment, according to Murphy's
law.
You have to add together ALL interrupts in the system which have higher
priority to find your worst-case latency.
This is not something that can be tested. You have to do the calculations.
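
The calculation could be organized roughly like this in C. It is only a sketch: the struct, the function name and any cycle counts fed into it are assumptions for illustration, and it simply assumes that every blocking handler runs once, back to back, before the critical one gets in.

```c
#include <stddef.h>
#include <stdint.h>

/* One interrupt source that can delay the critical handler. */
struct isr_info {
    const char *name;       /* label for reports only */
    uint32_t    max_cycles; /* worst-case run time with interrupts disabled */
};

/*
 * Worst-case cycles before the critical ISR starts: every blocking ISR is
 * assumed to run once, back to back, and the hardware entry latency of the
 * critical interrupt is added on top.
 */
uint32_t worst_case_latency_cycles(const struct isr_info *blockers,
                                   size_t count,
                                   uint32_t entry_cycles)
{
    uint32_t total = entry_cycles;
    for (size_t i = 0; i < count; i++) {
        total += blockers[i].max_cycles;
    }
    return total;
}
```

For example, two made-up handlers of 60 and 45 cycles plus an 11-cycle entry latency add up to 116 cycles. Critical sections in the main code that disable interrupts would have to be added in the same way.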

-- 
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they
may, or may not be shared by my employer, Atmel Sweden.

Reply by CBFalconer ● January 13, 2005

Of course it is not impossible that the OP has something that runs
a basic loop and has only one interrupt in the system, in which
case there will be no critical sections and the latency is
controlled by the longest instruction.  However the return
instruction in many systems implies interrupt disable for the
following instruction, as a measure to avoid stack overflow in some
worst cases.  There are also special cases, such as the x86 string
instructions when using a repeat prefix.  Don't know about the AVR.

-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!

Reply by Pygmi ● January 13, 2005

"Ulf Samuelsson" <ulf@NOSPAMatmel.com> wrote in message
news:34lvauF4e3rq9U1@individual.net...
> > When I compiled the code and ran it, I noticed that it took about 2.5 us
> > to start executing my code?!?!
> > (I used ATMega8 as I don't have any M32 at the moment, but I guess it
> > isn't relevant??)
> > I checked the list file and one reason is the LENGTHY prologue added by
> > gcc into interrupt
> > handler (17 instructions!!!), saving LOT of registers...
>
>
> You can try a better compiler than WinAVR!
>

Actually I made a quick test today: an empty handler, with
the code written in inline assembly. The prologue of
17 instructions dropped down to 4, and the time from the external
trigger went from >2.5 us down to 1.2 us or so (at 14.75 MHz).
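
A back-of-the-envelope check of these numbers in C. Big assumption, since the listing was not posted: the 17-instruction prologue is taken to be 15 PUSH instructions (2 cycles each on the ATmega8) plus 2 single-cycle instructions. The 8 to 11 entry cycles are Jeroen's figures from earlier in the thread, and the function names are made up.

```c
#include <stdint.h>

/* Cycles spent in a prologue made of 2-cycle PUSHes and 1-cycle instructions. */
uint32_t prologue_cycles(uint32_t push_count, uint32_t single_cycle_count)
{
    return 2u * push_count + single_cycle_count;
}

/* Nanoseconds from the interrupt edge to the first instruction of user code. */
double time_to_user_code_ns(uint32_t entry_cycles, uint32_t prologue,
                            uint32_t f_cpu_hz)
{
    return (double)(entry_cycles + prologue) * 1e9 / (double)f_cpu_hz;
}
```

Under those assumptions prologue_cycles(15, 2) is 32 cycles, so 8 to 11 entry cycles plus the prologue come to roughly 2.7 to 2.9 us at 14.75 MHz, which is in the same region as the measured value and would account for the "missing" time.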

I was also able to snip a few instructions out of my own code.
So, I think there is hope. It is possible to get it all done within
2.5 us. But not with handlers written using the C/gcc combination.
Maybe with C/IAR or some other compiler. And of course
by writing those critical parts in assembly.

I do understand also that this is only half way there. I need to
design & write the other code in such a way that this one
critical handler is serviced with minimum delay...
But it is possible....maybe.

If only there were an AVR running with a 25-30 MHz clock.

>
> // IAR C interrupt  handler
> #pragma vector=12
> __interrupt void handler()
> {
>   BYTE i = PORTB;
>   PORTB = 0xF0;
>   PORTB = 0x0F;
>   PORTB = i;
> }
>
>
> Generated code
>
>      51          __interrupt void handler()
>    \                     handler:
>      52          {
>    \   00000000   931A                       ST      -Y,R17
>    \   00000002   930A                       ST      -Y,R16
>      53            BYTE i = PORTB;
>    \   00000004   B318                       IN      R17,0x18
>      54            PORTB = 0xF0;
>    \   00000006   EF00                       LDI     R16,240
>    \   00000008   BB08                       OUT     0x18,R16
>      55            PORTB = 0x0F;
>    \   0000000A   E00F                       LDI     R16,15
>    \   0000000C   BB08                       OUT     0x18,R16
>      56            PORTB = i;
>    \   0000000E   BB18                       OUT     0x18,R17
>      57          }
>    \   00000010   9109                       LD      R16,Y+
>    \   00000012   9119                       LD      R17,Y+
>    \   00000014   9518                       RETI
>      58
>
> Two registers used, two registers pushed.
> As you see, there is no reason to even push the PSR in this case since the
> flags do not get updated.
>

I must admit that this seems very reasonable.

> If you need fast interrupt response, and need to do a lot,
> then consider to divide the handler into two parts.

Not a bad idea. In general. But in this case I want fast service
(i.e. end result) after external interrupt. So dividing would make
it even worse.

> First part (minimal) does minimal fast processing and at the end, it sets
> an external interrupt
> which continues the processing after the fast interrupt has exited.
>
> __no_init __register BYTE SavePortB @4;        Put i in Register r4
> #pragma vector=TIMER
> __interrupt void fast_handler(void)
> {
>    SavePortB = PORTB;
>   set_ext_interrupt_pending();
> }
>
> #pragma    vector=EXT_INT_HANDLER
> __interrupt    void    slow_handler(void)
> {
>     // Continue slow processing after fast handler has exited.
>       PORTB = 0xF0;
>       PORTB = 0x0F;
>       PORTB = i;
> }
>
> Since the processing is minimal in the fast handler, very few registers
> should be pushed by a good compiler.
>
>
> There is a 4kB restricted C compiler for tests.
> You have to personally contact IAR to get it. It is not on their web page.
> This does not generate assembly code , only object code.
>
>

I have seen you recommending IAR earlier. Unfortunately I have some
bad memories of working with the IAR compiler.
...ok, several years back with the Hitachi H8
...so maybe not very relevant today
...but at that time we were considering going to gcc to get rid of all
the bugs in the IAR libs. But all this happened last century.

So, maybe I should give it a try one of these days.

>
> -- 
> Best Regards
> Ulf at atmel dot com
> These comments are intended to be my own opinion and they
> may, or may not be shared by my employer, Atmel Sweden.
>

Thanks
Pygmi
