EmbeddedRelated.com
Forums
The 2026 Embedded Online Conference

Can you turn off Pipeline in ARM Cortex M3

Started by Klaus Kragelund September 8, 2015
On 09.9.2015 г. 18:07, Tom Gardner wrote:
> On 09/09/15 13:09, Dimiter_Popoff wrote:
>> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>>> Disabling the pipeline would make the code more determistic
>>>
>>> Not significantly.
>>>
>>> Even with i486s, with their tiny caches, the ratio between
>>> mean and worst case (IIRC) interrupt latencies could be 10:1
>>> (from memory 70us vs 700us).
>>
>> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
>> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
>
> The figures were from memory, but were definitely in units of
> time (s), not conductance (S).
Hah! For decades I have thought the second was supposed to be abbreviated with a capital S... maybe that was the case in the past, maybe I remembered wrongly. I have been pretty consistent in my documentation etc.; if I have used a lower-case s, it has been an error of mine... :-). Thanks for noting that. It may take me a while to get used to the lower-case s, but I'll work on it.
> > So clearly I has mis-remembered clocks as microseconds,
> ...

Ah, that sounds more sane. 700 is still huge, but comparable to, say, the 70 of a 68k MCU of that era (its division was in that ballpark).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
On Wed, 09 Sep 2015 10:18:54 +0100, Tom Gardner
<spamjunk@blueyonder.co.uk> wrote:

>Quite frankly, if you are worried about any determinism effects
>due to pipelining, either you are misguided or your hardware/software
>architecture needs to be changed.
The Pentium IV has a _20_ stage pipeline: a mispredicted branch on that chip has a significant impact on performance even if both branch targets are in cache.

Of course, nobody in their right mind would use such a chip in a HRT system, and it is true that load stalls are far more worrisome than a pipeline flush ... but it is not "misguided" to be concerned about the pipeline length.

George
On 09.9.2015 г. 16:24, John Devereux wrote:
> Dimiter_Popoff <dp@tgi-sci.com> writes:
>
>> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>>> Disabling the pipeline would make the code more determistic
>>>
>>> Not significantly.
>>>
>>> Even with i486s, with their tiny caches, the ratio between
>>> mean and worst case (IIRC) interrupt latencies could be 10:1
>>> (from memory 70us vs 700us).
>>
>> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
>> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
>
> We are talking about a CM3, that has sub-microsecond latency (12 cycles
> from memory).
I expected something like that. Do the 12 cycles include the worst-case opcode in execution (probably division)? (I guess they do; trapping must take just a cycle or two.)

The 6800 needed 22 cycles to stack all its registers... or maybe to stack them, fetch the vector and go there; actually I think it was the latter. The last time I needed that figure must have been over 30 years ago. Strange that I remember it (unless I just think I remember something, which could well be the case).

Dimiter
On 09/09/15 16:51, Dimiter_Popoff wrote:
> On 09.9.2015 г. 18:07, Tom Gardner wrote:
>> On 09/09/15 13:09, Dimiter_Popoff wrote:
>>> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>>>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>>>> Disabling the pipeline would make the code more determistic
>>>>
>>>> Not significantly.
>>>>
>>>> Even with i486s, with their tiny caches, the ratio between
>>>> mean and worst case (IIRC) interrupt latencies could be 10:1
>>>> (from memory 70us vs 700us).
>>>
>>> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
>>> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
>>
>> The figures were from memory, but were definitely in units of
>> time (s), not conductance (S).
>
> Hah! I have thought for decades second was supposed to be abbreviated
> to a capital S.... may be it has been the case in the past, may be
> I remembered wrongly.
It never was.

However, Tektronix used to get it wrong on their oscilloscopes, and as a metrology company they really ought to have known better. I suspect they finally realised the error of their ways when specifying their digitising scopes' sampling rate, e.g. 100MS/s.

Not that S=samples is correct!
> I have been pretty consistent in my documentation
> etc., if I have used a lower case s it has been an error of mine... :-).
> Thanks for noting that, it may take me a while to get used to the
> lower case s but I'll work on it.
Hey! A convert! My life's work is complete :)
On 09.9.2015 г. 19:25, Tom Gardner wrote:
> On 09/09/15 16:51, Dimiter_Popoff wrote:
>> On 09.9.2015 г. 18:07, Tom Gardner wrote:
>>> On 09/09/15 13:09, Dimiter_Popoff wrote:
>>>> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>>>>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>>>>> Disabling the pipeline would make the code more determistic
>>>>>
>>>>> Not significantly.
>>>>>
>>>>> Even with i486s, with their tiny caches, the ratio between
>>>>> mean and worst case (IIRC) interrupt latencies could be 10:1
>>>>> (from memory 70us vs 700us).
>>>>
>>>> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
>>>> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
>>>
>>> The figures were from memory, but were definitely in units of
>>> time (s), not conductance (S).
>>
>> Hah! I have thought for decades second was supposed to be abbreviated
>> to a capital S.... may be it has been the case in the past, may be
>> I remembered wrongly.
>
> It never was.
>
> However, Tektronix used to get it wrong on their oscilloscopes - and
> as a metrology company they really ought to have know better. I
> suspect they finally realised the error of their ways when specifying
> their digitising scope's sampling rate, e.g. 100MS/s.
> Not that S=samples is correct!
Maybe I took it from them. I used to look at some of their service manuals when I was making my first steps in analog design (back then I wanted to build an oscilloscope... never built it; there are not many projects I have left unfinished, but this is one of them. I learned a lot while trying, though. I may still build one; now at least I know how to :D ).

Dimiter
On 09/09/15 17:41, Dimiter_Popoff wrote:
> On 09.9.2015 г. 19:25, Tom Gardner wrote:
>> On 09/09/15 16:51, Dimiter_Popoff wrote:
>>> On 09.9.2015 г. 18:07, Tom Gardner wrote:
>>>> On 09/09/15 13:09, Dimiter_Popoff wrote:
>>>>> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>>>>>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>>>>>> Disabling the pipeline would make the code more determistic
>>>>>>
>>>>>> Not significantly.
>>>>>>
>>>>>> Even with i486s, with their tiny caches, the ratio between
>>>>>> mean and worst case (IIRC) interrupt latencies could be 10:1
>>>>>> (from memory 70us vs 700us).
>>>>>
>>>>> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
>>>>> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
>>>>
>>>> The figures were from memory, but were definitely in units of
>>>> time (s), not conductance (S).
>>>
>>> Hah! I have thought for decades second was supposed to be abbreviated
>>> to a capital S.... may be it has been the case in the past, may be
>>> I remembered wrongly.
>>
>> It never was.
>>
>> However, Tektronix used to get it wrong on their oscilloscopes - and
>> as a metrology company they really ought to have know better. I
>> suspect they finally realised the error of their ways when specifying
>> their digitising scope's sampling rate, e.g. 100MS/s.
>> Not that S=samples is correct!
>
> May be I have taken it from them. I used to look at some of their
> service manuals when I was making my first steps in analog design
> (back then I wanted to build an oscilloscope... never built it,
> not many projects I have left unfinished but this is one of them.
> Learned a lot while trying though. I may still build one, now at
> least I know how to :D ).
One of my projects, when I get A Round Tuit, is to make a 2 GS/s 4 GHz scope. The trick is to find a way to do it with only trivial analogue front-end components. Hence no amplifiers, no ADCs, just a 50 ohm input with very simple analogue components. Yes, there will be limitations, but that's half the fun :)

Yes, I know you can buy remarkably fast, remarkably cheap ADCs nowadays. But I want to do it for tens of dollars, not thousands :)
On Wednesday, September 9, 2015 at 17:57:44 UTC+2, dp wrote:
> On 09.9.2015 г. 16:24, John Devereux wrote:
> > Dimiter_Popoff <dp@tgi-sci.com> writes:
> >
> >> On 09.9.2015 г. 12:18, Tom Gardner wrote:
> >>> On 08/09/15 21:51, Klaus Kragelund wrote:
> >>>> Disabling the pipeline would make the code more determistic
> >>>
> >>> Not significantly.
> >>>
> >>> Even with i486s, with their tiny caches, the ratio between
> >>> mean and worst case (IIRC) interrupt latencies could be 10:1
> >>> (from memory 70us vs 700us).
> >>
> >> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
> >> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
> >
> > We are talking about a CM3, that has sub-microsecond latency (12 cycles
> > from memory).
>
> I expected something like that - do the 12 cycles include the worst case
> opcode in execution (probably division)? (I guess it does, trapping
> must take just a cycle or two).
>
> The 6800 needed 22 cycles to stack all its registers.... or may be to
> stack them and to fetch the vector and go there, actually I think it
> was the latter. Last time I have needed that figure must have been
> over 30 years ago, strange I remember it (unless I just think I
> remember something, could well be the case).
>
> Dimiter
Instructions like divide are aborted to take the interrupt, so the 12 cycles (29 with floating point) from the interrupt to the first ISR instruction executed is the real maximum.

-Lasse
On Wednesday, September 9, 2015 at 7:20:16 PM UTC+2, lasselangwad...@gmail.com wrote:
> On Wednesday, September 9, 2015 at 17:57:44 UTC+2, dp wrote:
> > On 09.9.2015 г. 16:24, John Devereux wrote:
> > > Dimiter_Popoff <dp@tgi-sci.com> writes:
> > >
> > >> On 09.9.2015 г. 12:18, Tom Gardner wrote:
> > >>> On 08/09/15 21:51, Klaus Kragelund wrote:
> > >>>> Disabling the pipeline would make the code more determistic
> > >>>
> > >>> Not significantly.
> > >>>
> > >>> Even with i486s, with their tiny caches, the ratio between
> > >>> mean and worst case (IIRC) interrupt latencies could be 10:1
> > >>> (from memory 70us vs 700us).
> > >>
> > >> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
> > >> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.
> > >
> > > We are talking about a CM3, that has sub-microsecond latency (12 cycles
> > > from memory).
> >
> > I expected something like that - do the 12 cycles include the worst case
> > opcode in execution (probably division)? (I guess it does, trapping
> > must take just a cycle or two).
> >
> > The 6800 needed 22 cycles to stack all its registers.... or may be to
> > stack them and to fetch the vector and go there, actually I think it
> > was the latter. Last time I have needed that figure must have been
> > over 30 years ago, strange I remember it (unless I just think I
> > remember something, could well be the case).
> >
> > Dimiter
>
> instructions like divide are aborted to take the interrupt, so the 12 cycles
> (29 with floating point) from interrupt to first isr instruction executed it the real maximum
Yes, and that takes 600 ns on a 50 MHz CPU with no optimization.

Is it correct that the pipeline also works during the ISR latency, so that it may shorten the 12 cycles to an effective 4 (for a 3-stage pipeline)?

Cheers

Klaus
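[Editor's note] As a quick check of the cycle counts quoted in this thread, the conversion from cycles to time can be sketched as below. The function name `cycles_to_ns` is made up for illustration, and the model assumes zero-wait-state memory, so the cycle count is the only contributor. Under that assumption, 12 cycles at 50 MHz works out to 240 ns and 29 cycles to 580 ns, so the 600 ns mentioned above is closer to the 29-cycle figure than to the 12-cycle one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from any vendor library): convert a cycle
 * count to nanoseconds at a given core clock, rounded to the nearest
 * nanosecond. Assumes no wait states, so cycles map directly to time. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0u);
    /* 64-bit intermediate so cycles * 1e9 cannot overflow. */
    return ((uint64_t)cycles * 1000000000ull + core_hz / 2u) / core_hz;
}
```

For example, `cycles_to_ns(12, 50000000)` gives the latency for the 12-cycle case on a 50 MHz part.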
On 09/09/15 21:47, Klaus Kragelund wrote:
> On Wednesday, September 9, 2015 at 7:20:16 PM UTC+2,
> lasselangwad...@gmail.com wrote:
>> On Wednesday, September 9, 2015 at 17:57:44 UTC+2, dp wrote:
>>> On 09.9.2015 г. 16:24, John Devereux wrote:
>>>> Dimiter_Popoff <dp@tgi-sci.com> writes:
>>>>
>>>>> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>>>>>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>>>>>> Disabling the pipeline would make the code more
>>>>>>> determistic
>>>>>>
>>>>>> Not significantly.
>>>>>>
>>>>>> Even with i486s, with their tiny caches, the ratio between
>>>>>> mean and worst case (IIRC) interrupt latencies could be
>>>>>> 10:1 (from memory 70us vs 700us).
>>>>>
>>>>> Hmmm, these are huge figures, even the 70uS is too huge I
>>>>> suppose. A 1 MHz 6800 had IRQ latency in the range of 30uS or
>>>>> so.
>>>>
>>>> We are talking about a CM3, that has sub-microsecond latency
>>>> (12 cycles from memory).
>>>
>>> I expected something like that - do the 12 cycles include the
>>> worst case opcode in execution (probably division)? (I guess it
>>> does, trapping must take just a cycle or two).
>>>
>>> The 6800 needed 22 cycles to stack all its registers.... or may
>>> be to stack them and to fetch the vector and go there, actually I
>>> think it was the latter. Last time I have needed that figure must
>>> have been over 30 years ago, strange I remember it (unless I just
>>> think I remember something, could well be the case).
>>>
>>> Dimiter
>>
>> instructions like divide are aborted to take the interrupt, so the
>> 12 cycles (29 with floating point) from interrupt to first isr
>> instruction executed it the real maximum
>
> Yes, and that takes 600ns for at 50MHz cpu with no optimization
>
> Is it correct that the pipeline works also in the ISR latency, so
> that is may shorten off the 12 cycles to effective 4? (for at 3 stage
> pipeline)
A 3-stage pipeline does not mean that the cpu does 3 instructions per clock cycle! The 12 cycle maximum latency of interrupts on an M3/M4 takes the pipeline into account, but assumes there are no latencies on memory or buses (all instructions and vectors in cache or single-cycle memory).
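[Editor's note] The point about pipeline depth versus throughput can be illustrated with the usual textbook model, sketched below. This is an idealised model (no stalls, no branches, at most one instruction entering the pipeline per cycle), not a cycle-accurate description of the Cortex-M3, and the function name is hypothetical. In this model the first instruction still needs one cycle per stage to complete, and each later instruction completes one cycle after the previous one, so the pipeline approaches one instruction per cycle rather than three.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total cycles to complete `instructions`
 * instructions on an idealised `stages`-deep pipeline with no stalls
 * or flushes. The first instruction takes `stages` cycles; every
 * further instruction adds one cycle. */
uint32_t pipeline_total_cycles(uint32_t stages, uint32_t instructions)
{
    assert(stages > 0u);
    if (instructions == 0u) {
        return 0u;
    }
    return stages + instructions - 1u;
}
```

With 3 stages, 100 instructions take 102 cycles in this model, which is essentially the same as the 100 cycles of a non-pipelined machine that finishes one instruction per cycle; the pipeline does not divide a fixed cycle count such as the interrupt entry sequence by three.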
>
> Cheers
>
> Klaus
>
On 09.9.2015 г. 22:47, Klaus Kragelund wrote:
> ....
>
> Yes, and that takes 600ns for at 50MHz cpu with no optimization
>
> Is it correct that the pipeline works also in the ISR latency, so that is may shorten off the 12 cycles to effective 4? (for at 3 stage pipeline)
No, the pipeline needs filling, so it does not improve latency. But that is not the point here. It looks as if you are trying to figure out a way to use a hammer on a screw; screws are meant to be used with screwdrivers, not with hammers.

If you do not have enough timers to program them so that you get the dead time, you can still use those you have to initiate the opening of one of the transistors, and have the core open the other one afterwards in an interrupt handler. You can only guarantee the minimum time that way, but it might be enough if the IRQ latency is in the ns range; perhaps you can afford a jitter of a microsecond [uS is what I would have written, how am I supposed to do that now? "us" is more ambiguous? Tom?] or so. If you cannot, there is nothing much better you can do with that MCU anyway.

Dimiter
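[Editor's note] The trade-off described above (a guaranteed minimum dead time, with jitter on top) can be put into numbers with a small sketch like the one below. All names and the example figures are hypothetical; `min_cycles` stands for the best-case cycle count from the timer event to the handler switching the second transistor, and `jitter_cycles` for whatever extra delay the rest of the system can add (wait states, masked or higher-priority interrupts). The minimum is rounded down and the maximum rounded up, so both bounds err on the cautious side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: the range of dead times the ISR-based scheme
 * can produce. */
typedef struct {
    uint32_t min_ns; /* guaranteed minimum dead time */
    uint32_t max_ns; /* minimum plus worst-case jitter */
} dead_time_window;

/* Compute the dead-time window from cycle counts and the core clock.
 * min_ns is rounded down and max_ns rounded up (conservative). */
dead_time_window dead_time_bounds(uint32_t min_cycles,
                                  uint32_t jitter_cycles,
                                  uint32_t core_hz)
{
    assert(core_hz > 0u);
    dead_time_window w;
    uint64_t max_cycles = (uint64_t)min_cycles + jitter_cycles;
    w.min_ns = (uint32_t)(((uint64_t)min_cycles * 1000000000ull) / core_hz);
    w.max_ns = (uint32_t)((max_cycles * 1000000000ull + core_hz - 1u) / core_hz);
    return w;
}

/* True if the window never goes below the dead time the power stage
 * needs and never exceeds what the application can tolerate. */
bool dead_time_ok(dead_time_window w,
                  uint32_t required_min_ns,
                  uint32_t tolerated_max_ns)
{
    return w.min_ns >= required_min_ns && w.max_ns <= tolerated_max_ns;
}
```

As an illustration only: with an assumed 20 cycles best case and 50 cycles of jitter at 50 MHz, the window is 400 ns to 1400 ns, which would pass a 300 ns minimum with a 1500 ns tolerance but fail a 500 ns minimum.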