On Tue, 08 Sep 2015 14:09:41 -0700, Klaus Kragelund wrote:
> Thanks. I looked deeper into the Cortex M3. It has a 3 level pipeline,
> and you are right the code will be predictable. I just need to take into
> account that I cannot count on the performance boost that the pipeline
> offers in all cases

I am not sure that the pipeline depth is an architectural feature of the M3. I'd think it's an implementation detail that could change, e.g. when your current P/N is replaced. Plus, as others have said, you are at the mercy of other implementation details: caches (if any), memory bank effects, interrupt latencies, etc.

If you need cycle-accurate timing, you should use something that is guaranteed to be cycle accurate, for instance the PRUs in the TI AM335x, as used in e.g. the BeagleBone (http://beagleboard.org/pru), or dedicated logic/an FPGA.
Can you turn off Pipeline in ARM Cortex M3
Started by ●September 8, 2015
Reply by ●September 8, 2015
Reply by ●September 8, 2015
On 08.9.2015 г. 23:32, Tim Wescott wrote:
> On Tue, 08 Sep 2015 22:13:49 +0300, Dimiter_Popoff wrote:
>
>> On 08.9.2015 г. 20:55, Tim Wescott wrote:
>>> On Tue, 08 Sep 2015 19:48:19 +0200, David Brown wrote:
>>> .....
>>>> For modern embedded PPC cores (such as Freescale's MPC5xxx families,
>>>> using the z6 core), the EIEIO instruction has been replaced by the
>>>> depressingly boring MBAR opcode. It's a great step backward, in my
>>>> opinion.
>>>
>>> Man, you go to sleep for JUST ONE DECADE and they go and change things!
>>>
>>> I just want to know if that mnemonic was intentional -- I know it would
>>> have been if I'd been on the team and had enough influence.
>>>
>>>
>> Oh I suspect it has been intentional - the guy who did the power
>> architecture has been too good to not have a sense of humour.
>> The mnemonics overall are no good (few of them have made it into my vpa,
>> mostly those which are cpu unique) but this one just can't have come by
>> chance :-).
>>
>> On the OP issue - trying to do timing in the nS range using the
>> processor load/store is no good. Two output compare (OC) timer outputs
>> will do what is needed, there should be plenty of these on any mcu
>> nowadays (???).
>
> If I read it right the OP is using two output compares per half bridge,
> but he is concerned about a compare happening at just the wrong moment
> and having insufficient dead time.
>

I don't get it then: what is stopping him from having the two OCs always offset by the dead time needed?

Dimiter
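The "always offset by the dead time" idea can be sketched as a small calculation. This is only an illustration under assumptions the thread does not confirm: a single up-counting timer of `period` ticks, with one compare pair per output. The type and function names are hypothetical and are not part of any CC2650 driver.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical edge positions, in timer ticks, for one half bridge
 * driven from an up-counter that wraps after `period` ticks. */
typedef struct {
    uint32_t ls_off;  /* low side turns off   */
    uint32_t hs_on;   /* high side turns on   */
    uint32_t hs_off;  /* high side turns off  */
    uint32_t ls_on;   /* low side turns on    */
} half_bridge_edges_t;

/* Derive all four edges from the requested high-side on-time so that the
 * two outputs are separated by `dead` ticks at both transitions.
 * Returns false if the request does not fit into one period. */
bool half_bridge_edges(uint32_t period, uint32_t hs_ticks, uint32_t dead,
                       half_bridge_edges_t *out)
{
    /* Both dead-time gaps plus the high-side pulse must fit in one period. */
    if (dead > period / 2u || hs_ticks > period - 2u * dead) {
        return false;
    }
    out->ls_off = 0u;                  /* low side off at counter wrap  */
    out->hs_on  = dead;                /* high side on after first gap  */
    out->hs_off = dead + hs_ticks;     /* end of the high-side pulse    */
    out->ls_on  = out->hs_off + dead;  /* low side on after second gap  */
    return true;
}
```

Because every edge is computed from the same two inputs, the gap between the outputs is `dead` ticks by construction, whatever duty cycle is requested.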
Reply by ●September 9, 2015
On 9/8/2015 4:51 PM, Klaus Kragelund wrote:
>
> We need deadtime for sure, I am just trying to see if I can avoid
> using external circuitry to blank simultaneous LS and HS active
> signals.
>
> The processor is running at 48MHz and most instructions are executed
> in less than 2 clock cycles. So something like 40ns max per
> instruction. So lets say I have a control loop that updates the
> compare values at 10kHz (100us period). I could add an interrupt to
> trigger at the compare value and handle the deadtime in raw code (or
> for long deadtime, initiate a timer to set the time). Even if this
> code would take 10-20 cycles, it's still less than 1us, so 1% of the
> period. I may be able to tolerate this since this is not an high end
> product
>
> Still, using 20 cents for deadtime circuit begins to sound like a
> needed option. The above constrution is not clean and will cause
> jitter in the PWM duty cycle due to issues when transitions overlap

I think you are making this too complicated. How much dead time do you need? What is the range of the times you need to set and what are the absolute limits?

--
Rick
Reply by ●September 9, 2015
On 08/09/15 22:51, Klaus Kragelund wrote:
> On Tuesday, September 8, 2015 at 7:23:57 PM UTC+2, Tim Wescott
> wrote:
>> On Tue, 08 Sep 2015 05:43:15 -0700, Klaus Kragelund wrote:
>>
>>> Hi
>>>
>>> I am not an embedded expert, so please be patient
>>>
>>> I have an application with 6 phase PWM and the CC2650 TI
>>> processor does not have deadtime support (to avoid cross
>>> conduction in a 3 stage halfbridge design)
>>>
>>> So, I could code this so when the timer PWM compare capture is
>>> updated, I check the value that is needed to setup and adjust
>>> both the lowside and highside compare values.
>>>
>>> That requires IF statement, and no control of where the program
>>> might continue in flash and thus the 3 stage pipeline in the
>>> Cortex M3 must be flushed
>>>
>>> A colleague said it would require a lot of code to do that. But,
>>> is it possible to disable the pipeline all together, so there
>>> will be no flushes and time used for this check is determined by
>>> the clock frequency directly? (no optimization from the
>>> pipeline)
>>
>> I'm pretty sure that your concern is that as you change the duty
>> cycle you may update one capture compare (I'm gonna call it 'CC')
>> value in a way that causes both transistors to be on at the same
>> time, then have the timer fire off, then update the other one --
>> yes? What, I ask, is a bit of noxious smoke between friends?
>>
>
> You are correct, the objective is to use the microcontroller without
> the crossconduction and resulting smoke.
>
> The PWM frequency is above 10kHz, and the update of the compare
> capture can happen almost at that frequency, so for worst case I
> define that at 10kHz
>
> We need deadtime for sure, I am just trying to see if I can avoid
> using external circuitry to blank simultaneous LS and HS active
> signals.

Why can't you just use two separate timer outputs with non-overlapping signals? At worst, you will need an inverter on the output if the timer block does not support inverting the output on the second block, but most microcontroller timers can do that themselves (I am not familiar with the exact device you are using).

>
> The processor is running at 48MHz and most instructions are executed
> in less than 2 clock cycles. So something like 40ns max per
> instruction. So lets say I have a control loop that updates the
> compare values at 10kHz (100us period). I could add an interrupt to
> trigger at the compare value and handle the deadtime in raw code (or
> for long deadtime, initiate a timer to set the time). Even if this
> code would take 10-20 cycles, it's still less than 1us, so 1% of the
> period. I may be able to tolerate this since this is not an high end
> product
>
> Still, using 20 cents for deadtime circuit begins to sound like a
> needed option. The above constrution is not clean and will cause
> jitter in the PWM duty cycle due to issues when transitions overlap
>
>
>> My first urge is to change the hardware. This situation should not
>> have been allowed to develop in the first place -- either someone
>> should have used a processor with dead time control, or they should
>> have used gate drive circuitry with dead time control (there are
>> scads of ways to do this in hardware-only), or they should have
>> made damned sure that they knew how to make it work in software.
>>
>> If you have any influence over the hardware at all, I would start
>> by checking the schematic -- if you're lucky, someone used a gate
>> driver with dead-time control, meaning you can just add the
>> appropriate capacitor and you're done. Or someone may have put in
>> the older-style diode-and-resistor network that accomplishes the
>> same thing.
>>
>
> I have control over the HW, but need to save every cent possible
>
>> If all of that failed, I would check to see if the processor
>> buffers the CC numbers -- some companies design their PWM
>> peripherals so that the command registers are buffered and are only
>> written at a specific point in the PWM cycle. If you interrupt on
>> this point, and always manage to write the command values well
>> within one PWM interval, then all you need to do is make sure to
>> write the correct values.
>>
>
> As far as I can see, it does not. I would prefer center aligned PWM,
> and it does not support that, so I need to make adjustments to get
> that working too
>
> I am seriously contemplating a semi SW PWM, a function that checks
> the compare values and triggering a timer that when runs out triggers
> the relevant output. That way I have 100% control of the PWM outputs,
> but it would take a lot of computing power, but I do not care about
> that
>
> One could perhaps even setup the DMA trigger, so on the run-out of
> the timer, the relevant output is set directly by the DMA

Using DMA to transfer new values into the PWM unit might be convenient if you don't want to respond to an interrupt, but you should also manage it from an interrupt on the timer.

>
>> Failing all else, I would monitor the direction that the PWM is
>> going, and always write the CC commands in such an order that
>> during the interval that one register has been written and the
>> other hasn't, the dead time is increased rather than made
>> overlapping. This may cause the occasional inefficient operation
>> and some strange EMI issues, but at least it won't let out the
>> magic smoke. As long as your CC registers are declared volatile
>> and your hardware doesn't do anything funny then you should be OK.
>>
>> If you are concerned that the pipeline may disorder your ordered
>> memory writes, the ARM has an instruction to flush the pipeline
>> before proceeding (I'm pretty sure that it's absolutely unnecessary
>> in your case -- but if you're feeling paranoid it's there.) If you
>> were using a PowerPC processor then I could recommend the EIEIO
>> instruction which has my FAVORITE MNEMONIC EVER, but you're not, so
>> you'll have to live with whatever stogy British mnemonic goes with
>> the ARM stuff.
>
> The pipeline question was to quantify if the flushing of it would
> cause a hickup/stall of the code, but I guess not. Disabling the
> pipeline would make the code more determistic
>

Please forget about the pipeline:

1. You cannot disable the pipeline, making the discussion pointless.

2. Disabling the pipeline would not affect the predictability and determinism of the code - the pipeline is deterministic.

3. Flushing pipelines does not disable them.

4. The determinism of the cpu is dominated by effects such as memory buses, caches, instruction pre-fetches, etc. The pipeline itself would be a minor issue even if it were non-deterministic.

5. If your design depends on accuracy, predictability or jitter in the cpu execution speed, the design is broken. You are re-arranging the chairs on the deck of the Titanic.
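The write-ordering fallback quoted above from Tim Wescott's post can be sketched roughly as follows. This is a sketch under assumptions, not a tested driver: it assumes unbuffered, memory-mapped compare registers that take effect as soon as they are written, it handles only the high-side-off/low-side-on edge pair, and it assumes the low-side turn-on compare is always later in the cycle than the high-side turn-off compare. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Update one edge pair so that, between the two register writes, the gap
 * between "high side off" and "low side on" can only grow, never shrink.
 * The registers are passed as volatile pointers so the compiler keeps
 * both writes and keeps them in program order. */
void update_edges_safely(volatile uint32_t *hs_off_reg,
                         volatile uint32_t *ls_on_reg,
                         uint32_t new_hs_off, uint32_t new_ls_on)
{
    if (new_hs_off > *hs_off_reg) {
        /* Edges move later: push the low-side turn-on out first, so the
         * old (earlier) high-side turn-off still precedes it with room. */
        *ls_on_reg  = new_ls_on;
        *hs_off_reg = new_hs_off;
    } else {
        /* Edges move earlier (or stay): pull the high-side turn-off in
         * first, so the old (later) low-side turn-on leaves a wider gap. */
        *hs_off_reg = new_hs_off;
        *ls_on_reg  = new_ls_on;
    }
}
```

In the window between the two writes the dead time is temporarily longer than requested, which matches the "occasional inefficient operation" Tim mentions. The other edge pair (low side off, high side on) would need the mirrored rule.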
Reply by ●September 9, 2015
On 08/09/15 21:51, Klaus Kragelund wrote:
> Disabling the pipeline would make the code more determistic

Not significantly.

Even with i486s, with their tiny caches, the ratio between mean and worst case (IIRC) interrupt latencies could be 10:1 (from memory 70us vs 700us).

Modern processors have much larger caches, and the variations can be far larger. The disparity between cache speeds and main memory speeds is far larger, so cache misses have a larger effect on the latency.

The i960 enabled its cache to be frozen, to avoid that kind of problem.

Quite frankly, if you are worried about any determinism effects due to pipelining, either you are misguided or your hardware/software architecture needs to be changed.

If /variations/ in instruction timing really are that critical, then you'll have to use something like the XMOS toolchain.
Reply by ●September 9, 2015
On 09.9.2015 г. 12:18, Tom Gardner wrote:
> On 08/09/15 21:51, Klaus Kragelund wrote:
>> Disabling the pipeline would make the code more determistic
>
> Not significantly.
>
> Even with i486s, with their tiny caches, the ratio between
> mean and worst case (IIRC) interrupt latencies could be 10:1
> (from memory 70us vs 700us).

Hmmm, these are huge figures, even the 70uS is too huge I suppose. A 1 MHz 6800 had IRQ latency in the range of 30uS or so.

Dimiter
Reply by ●September 9, 2015
On Wednesday, September 9, 2015 at 8:26:56 AM UTC+2, rickman wrote:
> On 9/8/2015 4:51 PM, Klaus Kragelund wrote:
> >
> > We need deadtime for sure, I am just trying to see if I can avoid
> > using external circuitry to blank simultaneous LS and HS active
> > signals.
> >
> > The processor is running at 48MHz and most instructions are executed
> > in less than 2 clock cycles. So something like 40ns max per
> > instruction. So lets say I have a control loop that updates the
> > compare values at 10kHz (100us period). I could add an interrupt to
> > trigger at the compare value and handle the deadtime in raw code (or
> > for long deadtime, initiate a timer to set the time). Even if this
> > code would take 10-20 cycles, it's still less than 1us, so 1% of the
> > period. I may be able to tolerate this since this is not an high end
> > product
> >
> > Still, using 20 cents for deadtime circuit begins to sound like a
> > needed option. The above constrution is not clean and will cause
> > jitter in the PWM duty cycle due to issues when transitions overlap
>
> I think you are making this too complicated. How much dead time do you
> need? What is the range of the times you need to set and what are the
> absolute limits?
>

The minimum deadtime is about 1us, maximum is 2-3us

Regards

Klaus
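For scale, these dead times can be converted into timer ticks. The sketch below assumes the PWM timer is clocked at the 48 MHz CPU clock mentioned earlier in the thread, which the thread itself does not confirm. `dead_time_ticks` is a hypothetical helper, not part of any TI library.

```c
#include <stdint.h>

/* Convert a dead time in nanoseconds to timer ticks, rounding up so the
 * programmed dead time is never shorter than the one requested.
 * The multiplication is done in 64 bits to avoid overflow. */
uint32_t dead_time_ticks(uint32_t timer_hz, uint32_t dead_ns)
{
    uint64_t scaled = (uint64_t)timer_hz * dead_ns;
    return (uint32_t)((scaled + 999999999ULL) / 1000000000ULL);
}
```

With a 48 MHz timer clock, `dead_time_ticks(48000000u, 1000u)` gives 48 ticks and `dead_time_ticks(48000000u, 3000u)` gives 144 ticks, against 4800 ticks for one 10 kHz PWM period.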
Reply by ●September 9, 2015
On Wednesday, September 9, 2015 at 10:42:33 AM UTC+2, David Brown wrote:
> On 08/09/15 22:51, Klaus Kragelund wrote:
> > On Tuesday, September 8, 2015 at 7:23:57 PM UTC+2, Tim Wescott
> > wrote:
> >> On Tue, 08 Sep 2015 05:43:15 -0700, Klaus Kragelund wrote:
> >>
> >>> Hi
> >>>
> >>> I am not an embedded expert, so please be patient
> >>>
> >>> I have an application with 6 phase PWM and the CC2650 TI
> >>> processor does not have deadtime support (to avoid cross
> >>> conduction in a 3 stage halfbridge design)
> >>>
> >>> So, I could code this so when the timer PWM compare capture is
> >>> updated, I check the value that is needed to setup and adjust
> >>> both the lowside and highside compare values.
> >>>
> >>> That requires IF statement, and no control of where the program
> >>> might continue in flash and thus the 3 stage pipeline in the
> >>> Cortex M3 must be flushed
> >>>
> >>> A colleague said it would require a lot of code to do that. But,
> >>> is it possible to disable the pipeline all together, so there
> >>> will be no flushes and time used for this check is determined by
> >>> the clock frequency directly? (no optimization from the
> >>> pipeline)
> >>
> >> I'm pretty sure that your concern is that as you change the duty
> >> cycle you may update one capture compare (I'm gonna call it 'CC')
> >> value in a way that causes both transistors to be on at the same
> >> time, then have the timer fire off, then update the other one --
> >> yes? What, I ask, is a bit of noxious smoke between friends?
> >>
> >
> > You are correct, the objective is to use the microcontroller without
> > the crossconduction and resulting smoke.
> >
> > The PWM frequency is above 10kHz, and the update of the compare
> > capture can happen almost at that frequency, so for worst case I
> > define that at 10kHz
> >
> > We need deadtime for sure, I am just trying to see if I can avoid
> > using external circuitry to blank simultaneous LS and HS active
> > signals.
>
> Why can't you just use two separate timer outputs with non-overlapping
> signals? At worst, you will need an inverter on the output if the timer
> block does not support inverting the output on the second block, but
> most microcontroller timers can do that themselves (I am not familiar
> with the exact device you are using).
>

I can generate standard 6 phase PWM that way, but I need center aligned PWM in which the individual PWM signal goes both high and low with a wait state before next cycle

Regards

Klaus
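One way to picture center-aligned output on an edge-aligned (up-counting) timer is to place each pulse symmetrically around the middle of the period, using one compare for the rising edge and one for the falling edge. The sketch below only does that arithmetic. Whether the CC2650 timers can set an output on one compare and clear it on another is not confirmed anywhere in this thread, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Rising and falling edge positions, in timer ticks. */
typedef struct {
    uint32_t rise;
    uint32_t fall;
} pulse_edges_t;

/* Center a pulse of `on_ticks` within a period of `period` ticks.
 * With an odd amount of idle time the pulse sits one tick early.
 * Returns false if the pulse is longer than the period. */
bool centered_pulse(uint32_t period, uint32_t on_ticks, pulse_edges_t *out)
{
    if (on_ticks > period) {
        return false;
    }
    out->rise = (period - on_ticks) / 2u;  /* idle time before the pulse */
    out->fall = out->rise + on_ticks;      /* pulse ends on_ticks later  */
    return true;
}
```

For a 4800-tick period and a 2400-tick pulse this gives a rising edge at tick 1200 and a falling edge at tick 3600, leaving equal idle time on both sides.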
Reply by ●September 9, 2015
Dimiter_Popoff <dp@tgi-sci.com> writes:
> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>> Disabling the pipeline would make the code more determistic
>>
>> Not significantly.
>>
>> Even with i486s, with their tiny caches, the ratio between
>> mean and worst case (IIRC) interrupt latencies could be 10:1
>> (from memory 70us vs 700us).
>
> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.

We are talking about a CM3, which has sub-microsecond latency (12 cycles from memory).

>
> Dimiter
>

--
John Devereux
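To put the cycle counts in this thread on the same scale as the dead time, here is a small conversion sketch. It assumes the 48 MHz CPU clock Klaus mentioned and a nonzero `cpu_hz`. `cycles_to_ns` is a hypothetical helper.

```c
#include <stdint.h>

/* Time taken by `cycles` CPU cycles at `cpu_hz`, in nanoseconds,
 * rounded down. `cpu_hz` must be nonzero. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / cpu_hz);
}
```

At 48 MHz, 12 cycles is 250 ns and 20 cycles is about 416 ns, both well under the 1 us minimum dead time Klaus quoted.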
Reply by ●September 9, 2015
On 09/09/15 13:09, Dimiter_Popoff wrote:
> On 09.9.2015 г. 12:18, Tom Gardner wrote:
>> On 08/09/15 21:51, Klaus Kragelund wrote:
>>> Disabling the pipeline would make the code more determistic
>>
>> Not significantly.
>>
>> Even with i486s, with their tiny caches, the ratio between
>> mean and worst case (IIRC) interrupt latencies could be 10:1
>> (from memory 70us vs 700us).
>
> Hmmm, these are huge figures, even the 70uS is too huge I suppose.
> A 1 MHz 6800 had IRQ latency in the range of 30uS or so.

The figures were from memory, but were definitely in units of time (s), not conductance (S).

I've disinterred the original, "Perils of the PC Cache" by Phillip J Koopman. In that he read data from a port and put it in a simple circular queue.

Naively looking at the data book indicated it would take 104 clocks. Measuring the mean time of 1e6 iterations took 149.6 clocks. The worst case took 272 clocks.

So clearly I had misremembered clocks as microseconds, and there was a factor of 2:1 (i.e. 100%) for the I/D caches, not 10:1.

With the I/D caches turned off, he measured min-max of 484-508 clocks, i.e. 5% - a considerable improvement.

But then he also measured the effects when the TLB and cache could get in the way, and found that the mean was 300 clocks and the max was 900 clocks, which is the source of my 10:1.

Summary: even trivial I/D caches caused a 100% variation between mean and max. Switching caches off reduced that to a 5% variation.

TL;DR: forget worst-case predictability if you have caches; loss of predictability is inherent with caches.
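The clock counts quoted above can be turned into ratios with a few lines of arithmetic. The two helpers below are hypothetical and use only the numbers given in the post. The inputs are assumed to be positive.

```c
/* Ratio of the worst-case time to the mean time. */
double worst_to_mean(double worst_clocks, double mean_clocks)
{
    return worst_clocks / mean_clocks;
}

/* Spread between the minimum and maximum time, as a percentage
 * of the minimum. */
double spread_percent(double min_clocks, double max_clocks)
{
    return 100.0 * (max_clocks - min_clocks) / min_clocks;
}
```

For the cached case, `worst_to_mean(272.0, 149.6)` is about 1.82. With the caches off, `spread_percent(484.0, 508.0)` is about 5.0.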







