
Atmel releasing FLASH AVR32 ?

Started by -jg March 19, 2007
"Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:ev5v86$nt5$1@aioe.org...

I think this is the crux of the problem, so let's address this first:

> Tell me how your interrupt system will make the pipeline execute
> instructions for two interrupts A and B occurring at the same time.
Neither case can execute the instructions of both interrupts exactly at the same time (only a multicore would execute A1 and B1 at the same time, not serially like below).
> A1:B1:A2:B2:A3:B3:A4:B4:A5:B5:A6:B6:A7:B7
>
> Instead of
>
> B1:B2:B3:B4:B5:B6:B7:A1:A2:A3:A4:A5:A6:A7
>
> Which I believe is the normal way for interrupts to behave...
>
> You may want to note the time until both threads/interrupt
The code executes in the order as you wrote above (assuming one instruction per cycle). In both cases interrupt handling starts and stops exactly at the same time, so there is no difference in total interrupt latency. In both cases instructions are executed serially, but with different interleaving. However any interleaving (like A1:A2:A3:B1:B2:B3:B4:B5:B6:A4:A5:A6:A7:B7) is correct as the interrupts are independent.

Now where do you see a problem? If you do, please remember that just about all CPUs today execute interrupts serially without any issues, and that multithreaded CPUs do interleave instructions differently depending on circumstances (e.g. other interrupts).

Wilco
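A minimal sketch of the timing argument above (an illustration added here, not code from the thread): two independent 7-instruction handlers A and B, one instruction per cycle, run under the two schedules Ulf listed. Both schedules retire the final instruction at cycle 14; only the interleaving of the individual instructions differs.

/* Illustrative only: completion times of two 7-instruction handlers A and B
 * under the two schedules discussed above, assuming 1 instruction per cycle. */
#include <stdio.h>

#define LEN 7  /* instructions per handler (A1..A7, B1..B7) */

static void report(const char *name, const char *schedule, int n)
{
    int finish_a = 0, finish_b = 0;
    for (int cycle = 1; cycle <= n; cycle++) {
        if (schedule[cycle - 1] == 'A') finish_a = cycle;
        else                            finish_b = cycle;
    }
    printf("%-10s A done at cycle %2d, B done at cycle %2d, all done at %2d\n",
           name, finish_a, finish_b, n);
}

int main(void)
{
    /* Round-robin interleaving: A1:B1:A2:B2:...:A7:B7 */
    const char interleaved[] = "ABABABABABABAB";
    /* Serialised interrupts:   B1..B7 then A1..A7     */
    const char serialised[]  = "BBBBBBBAAAAAAA";

    report("threaded", interleaved, 2 * LEN);
    report("interrupt", serialised, 2 * LEN);
    return 0;
}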
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> skrev i meddelandet 
news:ysyRh.2432$gr2.319@newsfe4-gui.ntli.net...
> > "Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message > news:ev5v86$nt5$1@aioe.org... > > I think this is the crux of the problem, so let's address this first: > >> Tell me how your interrupt system will make the pipeline execute >> instructions for two interrupts A and B occuring in the same time. > > Neither case can execute the instructions of both interrupts > exactly at the same time (only a multicore would execute A1 > and B1 at the same time, not serially like below). > >> A1:B1:A2:B2:A3:B3:A4:B4:A5:B5:A6:B6:A7:B7 >> >> Instead of >> >> B1:B2:B3:B4:B5:B6:B7:A1:A2:A3:A4:A5:A6:A7 >> >> Which I believe is the normal way for interrupts to behave... >> >> You may want to note the time until both threads/interrupt > > The code executes in the order as you wrote above (assuming > one instruction per cycle). In both cases interrupt handling starts > and stops exactly at the same time, so there is no difference in > total interrupt latency. In both cases instructions are executed > serially, but with different interleaving. However any interleaving > (like A1:A2:A3:B1:B2:B3:B4:B5:B6:A4:A5:A6:A7:B7) is correct > as the interrupts are independent. > > Now where do you see a problem? If you do, please remember > that just about all CPUs today execute interrupts serially without > any issues, and that multithreaded CPUs do interleave instructions > differently depending on circumstances (eg. other interrupts).
If instructions A1 and B1 both read the SPI slave data from an I/O port,
the SPI masters can release the data as soon as B1 has completed in case 1,
which is after 2 clock cycles.

If you adopt an interrupt structure, then the SPI masters can only release
the data after 8 clocks in the second case.

Your interrupt structure is in this case 4 times slower...

The latency for      THREADING   INTERRUPT
___________________________________________
thread B                 2           1      clocks
thread A                 1           8      clocks

If you are interested in worst case performance, then the interrupt
structure is 4 times slower in reacting to the event.

If you have more interrupts, then it can take forever and ever
for the last interrupt to handle its input pin.

With the right allocation structure you can, in a multithreaded CPU,
guarantee that you are allocated a certain number of instructions
per time quantum. This is what you need to support worst case performance.
> Wilco
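A sketch (again an addition, not from the posts) of the worst-case "first read" latency tabulated above. Assumptions: N emulated SPI channels, handlers of L single-cycle instructions each, the I/O read is the first instruction of every handler, round-robin issue in the threaded case and back-to-back handlers in the interrupt case. For N = 2 and L = 7 it reproduces the 2-versus-8 numbers in the table; the 16-channel figures match the numbers that come up later in the thread.

/* Worst-case cycle at which the *last-served* channel performs its first
 * I/O read. Assumptions (not from the thread): N channels, L single-cycle
 * instructions per handler, the read is instruction 1, no entry overhead.
 *   threaded (round-robin): every channel issues its first instruction in the
 *                           first round, so the last one reads at cycle N
 *   interrupt (serialised): the last handler starts after (N-1)*L cycles, so
 *                           its read completes at cycle (N-1)*L + 1           */
#include <stdio.h>

static int worst_read_threaded(int n)         { return n; }
static int worst_read_interrupt(int n, int l) { return (n - 1) * l + 1; }

int main(void)
{
    printf("2 channels, 7-instr handlers:  threaded %d, interrupt %d cycles\n",
           worst_read_threaded(2), worst_read_interrupt(2, 7));   /* 2 vs 8   */
    printf("16 channels, 8-instr handlers: threaded %d, interrupt %d cycles\n",
           worst_read_threaded(16), worst_read_interrupt(16, 8)); /* 16 vs 121 */
    return 0;
}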
"Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:ev7uof$md1$1@aioe.org...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> skrev i meddelandet > news:ysyRh.2432$gr2.319@newsfe4-gui.ntli.net... >> >> "Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:ev5v86$nt5$1@aioe.org... >> >> I think this is the crux of the problem, so let's address this first: >> >>> Tell me how your interrupt system will make the pipeline execute >>> instructions for two interrupts A and B occuring in the same time. >> >> Neither case can execute the instructions of both interrupts >> exactly at the same time (only a multicore would execute A1 >> and B1 at the same time, not serially like below). >> >>> A1:B1:A2:B2:A3:B3:A4:B4:A5:B5:A6:B6:A7:B7 >>> >>> Instead of >>> >>> B1:B2:B3:B4:B5:B6:B7:A1:A2:A3:A4:A5:A6:A7 >>> >>> Which I believe is the normal way for interrupts to behave... >>> >>> You may want to note the time until both threads/interrupt >> >> The code executes in the order as you wrote above (assuming >> one instruction per cycle). In both cases interrupt handling starts >> and stops exactly at the same time, so there is no difference in >> total interrupt latency. In both cases instructions are executed >> serially, but with different interleaving. However any interleaving >> (like A1:A2:A3:B1:B2:B3:B4:B5:B6:A4:A5:A6:A7:B7) is correct >> as the interrupts are independent. >> >> Now where do you see a problem? If you do, please remember >> that just about all CPUs today execute interrupts serially without >> any issues, and that multithreaded CPUs do interleave instructions >> differently depending on circumstances (eg. other interrupts). > > > If instruction A1 and B1 both read the SPI slave data from an I/O port, > the SPI masters can release the data already when B1 has completed in case 1 > which is after 2 clock cycles.
That's a big if... You'd usually need some more instructions to signal you've read the bit, so it is not necessarily the first instruction that is critical. Anyway, it doesn't matter, see below.
> If you adopt an interrupt structure, then the SPI masters can only release
> the data after 8 clocks in the second case.
Correct. This will delay the next interrupt for A so that next time round interrupts for A and B are not received at the same time.
> Your interrupt structure is in this case 4 times slower...
More accurately the first instruction has a 3 times higher latency, while the last instruction has 75% of the latency. And when averaged over all instructions the latency of both cases is the same...
> The latency for      THREADING   INTERRUPT
> ___________________________________________
> thread B                 2           1      clocks
> thread A                 1           8      clocks
>
> If you are interested in worst case performance, then the interrupt
> structure is 4 times slower in reacting to the event.
Correct, execution of the first instruction is indeed slower. However this has nothing to do with the maximum frequency of the SPIs...

In both cases the fastest we can receive bits from the SPIs is 2 bits every 16 cycles. Irrespective of how many SPIs you emulate, the maximum SPI frequency depends on the total time taken by the interrupt routines. So clearly the latency of the first instruction does not matter at all.
> If you have more interrupts, then it can take forever and ever
> for the last interrupt to handle its input pin.
Only if higher priority interrupts occur. The interrupt structure is designed so that each interrupt can meet its worst-case deadline. This is similar to allocating time quanta, but rather than guaranteeing a timeslot that is fast enough to handle the worst case, you guarantee the worst case by setting interrupt priorities. A different methodology, but the end result is the same.
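One way to make "you guarantee the worst case by setting interrupt priorities" concrete is classic fixed-priority response-time analysis. The sketch below is an addition (neither poster's code) with made-up handler costs and inter-arrival times: the worst-case response of interrupt i is its own handler time plus the preemption from every higher-priority source, iterated to a fixed point.

/* Sketch of standard fixed-priority response-time analysis (illustrative).
 * For interrupt i with handler cost C[i] and minimum inter-arrival T[i],
 * priorities ordered 0 = highest:
 *     R_i = C_i + sum over j < i of ceil(R_i / T_j) * C_j
 * iterated until it stops changing (no divergence guard, for brevity). */
#include <stdio.h>

#define N 3

static long response_time(int i, const long c[], const long t[])
{
    long r = c[i], prev;
    do {
        prev = r;
        r = c[i];
        for (int j = 0; j < i; j++)                   /* higher-priority sources  */
            r += ((prev + t[j] - 1) / t[j]) * c[j];   /* ceil(prev / T_j) * C_j   */
    } while (r != prev);
    return r;
}

int main(void)
{
    /* Made-up example: three interrupt sources, costs and periods in cycles. */
    const long c[N] = { 8, 8, 8 };
    const long t[N] = { 100, 100, 100 };

    for (int i = 0; i < N; i++)
        printf("interrupt %d: worst-case response %ld cycles\n",
               i, response_time(i, c, t));
    return 0;
}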
> With the right allocation structure you can, in a multithreaded CPU,
> guarantee that you are allocated a certain number of instructions
> per time quantum. This is what you need to support worst case performance.
Sure (with time quanta things become more predictable but the latency goes up too - you can't have both!). However I'm still at a loss as to why you claim that multithreading would allow for higher frequency SPIs...

Wilco
>> The latency for      THREADING   INTERRUPT
>> ___________________________________________
>> thread B                 2           1      clocks
>> thread A                 1           8      clocks
>>
>> If you are interested in worst case performance, then the interrupt
>> structure is 4 times slower in reacting to the event.
>
> Correct, execution of the first instruction is indeed slower. However
> this has nothing to do with the maximum frequency of the SPIs...
>
> In both cases the fastest we can receive bits from the SPIs is 2 bits
> every 16 cycles. Irrespective of how many SPIs you emulate, the maximum
> SPI frequency depends on the total time taken by the interrupt routines.
> So clearly the latency of the first instruction does not matter at all.
You don't see how it scales.

Assume a 50/50 duty cycle on the SPI clock. The SPI slaves read on the positive edge and the SPI masters alter data on the negative edge. Data while the SPI clock is low is to be considered INVALID. The interrupts occur on the positive edge, and the master must keep data valid and the clock high until the last interrupt has sampled its I/O.

If you have 16 SPIs, then the master cannot release the data until the last of the 16 interrupt routines has sampled its I/O. So it can release after (15 * 8) + 1 = 121 clocks, forcing the total SPI cycle to be 2 * 121 = 242 cycles. If we assume that interrupt processing takes a number of clock cycles (12 in the case of Cortex), you add 16 * 12 = 192 clocks to the 121, giving 313 for half a period and a total cycle of 626 cycles.

In the multithreading case, the master can release after 16 cycles, but the period is also limited by the execution time of each thread, so you will have an SPI cycle of 16 * 8 = 128 cycles.

It is really very simple if you open your eyes.
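The arithmetic in the post above, restated as a sketch (a restatement, not code from the thread). Assumptions taken from the post: 16 emulated SPI slaves, 8 cycles of handler work per slave, a 50/50 clock, the master holding clock and data until the last slave has sampled, and optionally the quoted 12-cycle per-interrupt entry cost.

/* Restating the numbers in the post above (illustrative sketch).
 * 16 emulated SPI slaves, 8 cycles of handler work each, 50/50 clock duty,
 * the master holds clock and data until the last slave has sampled its pin. */
#include <stdio.h>

int main(void)
{
    const int slaves      = 16;
    const int handler_len = 8;    /* instructions per handler, 1 cycle each     */
    const int irq_entry   = 12;   /* per-interrupt entry cost quoted for Cortex */

    /* Interrupt case, no entry cost: the last handler's read finishes after
       (slaves - 1) * handler_len + 1 cycles, and that bounds the half-period. */
    int half_irq = (slaves - 1) * handler_len + 1;            /* 121            */
    printf("interrupt, no entry cost:  half period %d, full SPI cycle %d\n",
           half_irq, 2 * half_irq);                           /* 121, 242       */

    /* Interrupt case with entry cost: add 12 cycles for each of the 16 entries. */
    int half_irq_ovh = half_irq + slaves * irq_entry;         /* 121 + 192 = 313 */
    printf("interrupt, 12-cycle entry: half period %d, full SPI cycle %d\n",
           half_irq_ovh, 2 * half_irq_ovh);                   /* 313, 626       */

    /* Multithreaded case: every thread has sampled after 16 cycles, but the
       period is still bounded by the work per bit: 16 threads * 8 cycles.    */
    int cycle_mt = slaves * handler_len;                      /* 128            */
    printf("multithreaded:             full SPI cycle %d\n", cycle_mt);
    return 0;
}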
>> If you have more interrupts, then it can take forever and ever
>> for the last interrupt to handle its input pin.
>
> Only if higher priority interrupts occur. The interrupt structure is
> designed so that each interrupt can meet its worst-case deadline. This is
> similar to allocating time quanta, but rather than guaranteeing a timeslot
> that is fast enough to handle the worst case, you guarantee the worst case
> by setting interrupt priorities. A different methodology, but the end
> result is the same.
No it isn't... Worst-case latency for multithreading is in this case 16 clock cycles, versus 121 clock cycles for the interrupt case. The priority argument falls apart when the interrupts all have the same priority.

Also remember that one key benefit is that multithreading allows two groups to develop S/W independently of each other and lets a third party use that S/W as a library. If you run a real-time operating system where the two threads have to share, then it becomes a mess, which usually results in having two CPUs instead of two threads.
>> With the right allocation structure you can, in a multithreaded CPU,
>> guarantee that you are allocated a certain number of instructions
>> per time quantum. This is what you need to support worst case performance.
>
> Sure (with time quanta things become more predictable but the latency
> goes up too - you can't have both!). However I'm still at a loss as to why
> you claim that multithreading would allow for higher frequency SPIs...
Just do the numbers...
> Wilco
--
Best Regards,
Ulf Samuelsson

This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB