
Atmel releasing FLASH AVR32 ?

Started by -jg March 19, 2007
"Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:ev5v86$nt5$1@aioe.org...

I think this is the crux of the problem, so let's address this first:

> Tell me how your interrupt system will make the pipeline execute
> instructions for two interrupts A and B occurring at the same time.
Neither case can execute the instructions of both interrupts exactly at the same time (only a multicore would execute A1 and B1 at the same time, not serially like below).
> A1:B1:A2:B2:A3:B3:A4:B4:A5:B5:A6:B6:A7:B7
>
> Instead of
>
> B1:B2:B3:B4:B5:B6:B7:A1:A2:A3:A4:A5:A6:A7
>
> Which I believe is the normal way for interrupts to behave...
>
> You may want to note the time until both threads/interrupt
The code executes in the order as you wrote above (assuming one instruction per cycle). In both cases interrupt handling starts and stops exactly at the same time, so there is no difference in total interrupt latency. In both cases instructions are executed serially, but with different interleaving. However any interleaving (like A1:A2:A3:B1:B2:B3:B4:B5:B6:A4:A5:A6:A7:B7) is correct as the interrupts are independent.

Now where do you see a problem? If you do, please remember that just about all CPUs today execute interrupts serially without any issues, and that multithreaded CPUs do interleave instructions differently depending on circumstances (e.g. other interrupts).

Wilco
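A minimal sketch of the timing argument above (an illustration added here, not code from the thread): two independent 7-instruction handlers A and B, one instruction per cycle, run under the two schedules Ulf listed. Both schedules retire the final instruction at cycle 14; only the interleaving of the individual instructions differs.

/* Illustrative only: completion times of two 7-instruction handlers A and B
 * under the two schedules discussed above, assuming 1 instruction per cycle. */
#include <stdio.h>

#define LEN 7  /* instructions per handler (A1..A7, B1..B7) */

static void report(const char *name, const char *schedule, int n)
{
    int finish_a = 0, finish_b = 0;
    for (int cycle = 1; cycle <= n; cycle++) {
        if (schedule[cycle - 1] == 'A') finish_a = cycle;
        else                            finish_b = cycle;
    }
    printf("%-10s A done at cycle %2d, B done at cycle %2d, all done at %2d\n",
           name, finish_a, finish_b, n);
}

int main(void)
{
    /* Round-robin interleaving: A1:B1:A2:B2:...:A7:B7 */
    const char interleaved[] = "ABABABABABABAB";
    /* Serialised interrupts:   B1..B7 then A1..A7     */
    const char serialised[]  = "BBBBBBBAAAAAAA";

    report("threaded", interleaved, 2 * LEN);
    report("interrupt", serialised, 2 * LEN);
    return 0;
}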
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> skrev i meddelandet 
news:ysyRh.2432$gr2.319@newsfe4-gui.ntli.net...
> > "Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message > news:ev5v86$nt5$1@aioe.org... > > I think this is the crux of the problem, so let's address this first: > >> Tell me how your interrupt system will make the pipeline execute >> instructions for two interrupts A and B occuring in the same time. > > Neither case can execute the instructions of both interrupts > exactly at the same time (only a multicore would execute A1 > and B1 at the same time, not serially like below). > >> A1:B1:A2:B2:A3:B3:A4:B4:A5:B5:A6:B6:A7:B7 >> >> Instead of >> >> B1:B2:B3:B4:B5:B6:B7:A1:A2:A3:A4:A5:A6:A7 >> >> Which I believe is the normal way for interrupts to behave... >> >> You may want to note the time until both threads/interrupt > > The code executes in the order as you wrote above (assuming > one instruction per cycle). In both cases interrupt handling starts > and stops exactly at the same time, so there is no difference in > total interrupt latency. In both cases instructions are executed > serially, but with different interleaving. However any interleaving > (like A1:A2:A3:B1:B2:B3:B4:B5:B6:A4:A5:A6:A7:B7) is correct > as the interrupts are independent. > > Now where do you see a problem? If you do, please remember > that just about all CPUs today execute interrupts serially without > any issues, and that multithreaded CPUs do interleave instructions > differently depending on circumstances (eg. other interrupts).
If instructions A1 and B1 both read the SPI slave data from an I/O port,
the SPI masters can release the data as soon as B1 has completed in case 1,
which is after 2 clock cycles.

If you adopt an interrupt structure, then the SPI masters can only release
the data after 8 clocks in the second case.

Your interrupt structure is in this case 4 times slower...

The latency for      THREADING   INTERRUPT
___________________________________________
thread B                 2           1      clocks
thread A                 1           8      clocks

If you are interested in worst case performance, then the interrupt
structure is 4 times slower in reacting to the event.

If you have more interrupts, then it can take forever and ever
for the last interrupt to handle its input pin.

With the right allocation structure you can, in a multithreaded CPU,
guarantee that you are allocated a certain number of instructions
per time quantum. This is what you need to support worst case performance.
> Wilco
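A sketch (again an addition, not from the posts) of the worst-case "first read" latency tabulated above. Assumptions: N emulated SPI channels, handlers of L single-cycle instructions each, the I/O read is the first instruction of every handler, round-robin issue in the threaded case and back-to-back handlers in the interrupt case. For N = 2 and L = 7 it reproduces the 2-versus-8 numbers in the table; the 16-channel figures match the numbers that come up later in the thread.

/* Worst-case cycle at which the *last-served* channel performs its first
 * I/O read. Assumptions (not from the thread): N channels, L single-cycle
 * instructions per handler, the read is instruction 1, no entry overhead.
 *   threaded (round-robin): every channel issues its first instruction in the
 *                           first round, so the last one reads at cycle N
 *   interrupt (serialised): the last handler starts after (N-1)*L cycles, so
 *                           its read completes at cycle (N-1)*L + 1           */
#include <stdio.h>

static int worst_read_threaded(int n)         { return n; }
static int worst_read_interrupt(int n, int l) { return (n - 1) * l + 1; }

int main(void)
{
    printf("2 channels, 7-instr handlers:  threaded %d, interrupt %d cycles\n",
           worst_read_threaded(2), worst_read_interrupt(2, 7));   /* 2 vs 8   */
    printf("16 channels, 8-instr handlers: threaded %d, interrupt %d cycles\n",
           worst_read_threaded(16), worst_read_interrupt(16, 8)); /* 16 vs 121 */
    return 0;
}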
"Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:ev7uof$md1$1@aioe.org...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> skrev i meddelandet > news:ysyRh.2432$gr2.319@newsfe4-gui.ntli.net... >> >> "Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:ev5v86$nt5$1@aioe.org... >> >> I think this is the crux of the problem, so let's address this first: >> >>> Tell me how your interrupt system will make the pipeline execute >>> instructions for two interrupts A and B occuring in the same time. >> >> Neither case can execute the instructions of both interrupts >> exactly at the same time (only a multicore would execute A1 >> and B1 at the same time, not serially like below). >> >>> A1:B1:A2:B2:A3:B3:A4:B4:A5:B5:A6:B6:A7:B7 >>> >>> Instead of >>> >>> B1:B2:B3:B4:B5:B6:B7:A1:A2:A3:A4:A5:A6:A7 >>> >>> Which I believe is the normal way for interrupts to behave... >>> >>> You may want to note the time until both threads/interrupt >> >> The code executes in the order as you wrote above (assuming >> one instruction per cycle). In both cases interrupt handling starts >> and stops exactly at the same time, so there is no difference in >> total interrupt latency. In both cases instructions are executed >> serially, but with different interleaving. However any interleaving >> (like A1:A2:A3:B1:B2:B3:B4:B5:B6:A4:A5:A6:A7:B7) is correct >> as the interrupts are independent. >> >> Now where do you see a problem? If you do, please remember >> that just about all CPUs today execute interrupts serially without >> any issues, and that multithreaded CPUs do interleave instructions >> differently depending on circumstances (eg. other interrupts). > > > If instruction A1 and B1 both read the SPI slave data from an I/O port, > the SPI masters can release the data already when B1 has completed in case 1 > which is after 2 clock cycles.
That's a big if... You'd usually need some more instructions to signal you've read the bit, so it is not necessarily the first instruction that is critical. Anyway, it doesn't matter, see below.
> If you adopt an interrupt structure, then the SPI masters can only release
> the data after 8 clocks in the second case.
Correct. This will delay the next interrupt for A so that next time round interrupts for A and B are not received at the same time.
> Your interrupt structure is in this case 4 times slower...
More accurately the first instruction has a 3 times higher latency, while the last instruction has 75% of the latency. And when averaged over all instructions the latency of both cases is the same...
> The latency for      THREADING   INTERRUPT
> ___________________________________________
> thread B                 2           1      clocks
> thread A                 1           8      clocks
>
> If you are interested in worst case performance, then the interrupt
> structure is 4 times slower in reacting to the event.
Correct, execution of the first instruction is indeed slower. However this has nothing to do with the maximum frequency of the SPIs...

In both cases the fastest we can receive bits from the SPIs is 2 bits every 16 cycles. Irrespective of how many SPIs you emulate, the maximum SPI frequency depends on the total time taken by the interrupt routines. So clearly the latency of the first instruction does not matter at all.
> If you have more interrupts, then it can take forever and ever
> for the last interrupt to handle its input pin.
Only if higher priority interrupts occur. The interrupt structure is designed so that each interrupt can meet its worst-case deadline. This is similar to allocating time quanta, but rather than guaranteeing a timeslot that is fast enough to handle the worst case, you guarantee the worst case by setting interrupt priorities. A different methodology, but the end result is the same.
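One way to make "you guarantee the worst case by setting interrupt priorities" concrete is classic fixed-priority response-time analysis. The sketch below is an addition (neither poster's code) with made-up handler costs and inter-arrival times: the worst-case response of interrupt i is its own handler time plus the preemption from every higher-priority source, iterated to a fixed point.

/* Sketch of standard fixed-priority response-time analysis (illustrative).
 * For interrupt i with handler cost C[i] and minimum inter-arrival T[i],
 * priorities ordered 0 = highest:
 *     R_i = C_i + sum over j < i of ceil(R_i / T_j) * C_j
 * iterated until it stops changing (no divergence guard, for brevity). */
#include <stdio.h>

#define N 3

static long response_time(int i, const long c[], const long t[])
{
    long r = c[i], prev;
    do {
        prev = r;
        r = c[i];
        for (int j = 0; j < i; j++)                   /* higher-priority sources  */
            r += ((prev + t[j] - 1) / t[j]) * c[j];   /* ceil(prev / T_j) * C_j   */
    } while (r != prev);
    return r;
}

int main(void)
{
    /* Made-up example: three interrupt sources, costs and periods in cycles. */
    const long c[N] = { 8, 8, 8 };
    const long t[N] = { 100, 100, 100 };

    for (int i = 0; i < N; i++)
        printf("interrupt %d: worst-case response %ld cycles\n",
               i, response_time(i, c, t));
    return 0;
}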
> With the right allocation structure you can, in a multithreaded CPU,
> guarantee that you are allocated a certain number of instructions
> per time quantum. This is what you need to support worst case performance.
Sure (with time quanta things become more predictable but the latency goes up too - you can't have both!). However I'm still at a loss as to why you claim that multithreading would allow for higher frequency SPIs...

Wilco
>> The latency for      THREADING   INTERRUPT
>> ___________________________________________
>> thread B                 2           1      clocks
>> thread A                 1           8      clocks
>>
>> If you are interested in worst case performance, then the interrupt
>> structure is 4 times slower in reacting to the event.
>
> Correct, execution of the first instruction is indeed slower. However
> this has nothing to do with the maximum frequency of the SPIs...
>
> In both cases the fastest we can receive bits from the SPIs is 2 bits
> every 16 cycles. Irrespective of how many SPIs you emulate, the maximum
> SPI frequency depends on the total time taken by the interrupt routines.
> So clearly the latency of the first instruction does not matter at all.
You don't see how it scales.

Assume a 50/50 duty cycle on the SPI clock. The SPI slaves read on the positive edge and the SPI masters alter data on the negative edge. Data while the SPI clock is low is to be considered INVALID. The interrupts occur on the positive edge, and the master must keep data valid and the clock high until the last interrupt has sampled its I/O.

If you have 16 SPIs, then the master cannot release the data until the last of the 16 interrupt routines has sampled its I/O. So it can release after (15 * 8) + 1 = 121 clocks, forcing the total SPI cycle to be 2 * 121 = 242 cycles. If we assume that interrupt processing takes a number of clock cycles (12 in the case of Cortex), you add 16 * 12 = 192 clocks to the 121, giving 313 for half a period and a total cycle of 626 cycles.

In the multithreading case, the master can release after 16 cycles, but the period is also limited by the execution time of each thread, so you will have an SPI cycle of 16 * 8 = 128 cycles.

It is really very simple if you open your eyes.
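The arithmetic in the post above, restated as a sketch (a restatement, not code from the thread). Assumptions taken from the post: 16 emulated SPI slaves, 8 cycles of handler work per slave, a 50/50 clock, the master holding clock and data until the last slave has sampled, and optionally the quoted 12-cycle per-interrupt entry cost.

/* Restating the numbers in the post above (illustrative sketch).
 * 16 emulated SPI slaves, 8 cycles of handler work each, 50/50 clock duty,
 * the master holds clock and data until the last slave has sampled its pin. */
#include <stdio.h>

int main(void)
{
    const int slaves      = 16;
    const int handler_len = 8;    /* instructions per handler, 1 cycle each     */
    const int irq_entry   = 12;   /* per-interrupt entry cost quoted for Cortex */

    /* Interrupt case, no entry cost: the last handler's read finishes after
       (slaves - 1) * handler_len + 1 cycles, and that bounds the half-period. */
    int half_irq = (slaves - 1) * handler_len + 1;            /* 121            */
    printf("interrupt, no entry cost:  half period %d, full SPI cycle %d\n",
           half_irq, 2 * half_irq);                           /* 121, 242       */

    /* Interrupt case with entry cost: add 12 cycles for each of the 16 entries. */
    int half_irq_ovh = half_irq + slaves * irq_entry;         /* 121 + 192 = 313 */
    printf("interrupt, 12-cycle entry: half period %d, full SPI cycle %d\n",
           half_irq_ovh, 2 * half_irq_ovh);                   /* 313, 626       */

    /* Multithreaded case: every thread has sampled after 16 cycles, but the
       period is still bounded by the work per bit: 16 threads * 8 cycles.    */
    int cycle_mt = slaves * handler_len;                      /* 128            */
    printf("multithreaded:             full SPI cycle %d\n", cycle_mt);
    return 0;
}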
>> If you have more interrupts, then it can take forever and ever
>> for the last interrupt to handle its input pin.
>
> Only if higher priority interrupts occur. The interrupt structure is
> designed so that each interrupt can meet its worst-case deadline. This is
> similar to allocating time quanta, but rather than guaranteeing a timeslot
> that is fast enough to handle the worst case, you guarantee the worst case
> by setting interrupt priorities. A different methodology, but the end
> result is the same.
No it isn't... Worst-case latency for multithreading is in this case 16 clock cycles, versus 121 clock cycles for the interrupt case. The priority argument falls apart when the interrupts all have the same priority.

Also remember that one key benefit is that multithreading allows two groups to develop S/W independently of each other and lets a third party use that S/W as a library. If you run a real-time operating system where the two threads have to share, then it becomes a mess, which usually results in having two CPUs instead of two threads.
>> With the right allocation structure you can, in a multithreaded CPU,
>> guarantee that you are allocated a certain number of instructions
>> per time quantum. This is what you need to support worst case performance.
>
> Sure (with time quanta things become more predictable but the latency
> goes up too - you can't have both!). However I'm still at a loss as to why
> you claim that multithreading would allow for higher frequency SPIs...
Just do the numbers...
> Wilco
--
Best Regards,
Ulf Samuelsson

This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB