EmbeddedRelated.com

Can you turn off Pipeline in ARM Cortex M3

Started by Klaus Kragelund September 8, 2015
Hi

I am not an embedded expert, so please be patient

I have an application with 6-phase PWM, and the TI CC2650 processor does not have dead-time support (to avoid cross conduction in a 3-stage half-bridge design).

So, I could code this so that when the timer PWM capture/compare value is updated, I check the value that needs to be set up and adjust both the low-side and high-side compare values.

That requires an IF statement, with no control over where the program might continue in flash, and thus the 3-stage pipeline in the Cortex-M3 must be flushed.

A colleague said it would require a lot of code to do that. But is it possible to disable the pipeline altogether, so there will be no flushes and the time used for this check is determined directly by the clock frequency? (No optimization from the pipeline.)

Regards

Klaus
On 08/09/15 14:43, Klaus Kragelund wrote:
> [...]
> A colleague said it would require a lot of code to do that. But, is
> it possible to disable the pipeline altogether, so there will be no
> flushes and time used for this check is determined by the clock
> frequency directly? (no optimization from the pipeline)

If I understand you correctly, what you are trying to get here is cycle-accurate deterministic instruction counts for a series of instructions - i.e., you want to be sure of /exactly/ how long those instructions will take, in order to make exactly the right changes to your lowside and highside values.

If that is true, then the pipeline in the cpu is only one relatively minor issue - there are many more factors that can affect exact timing. Some factors can be eliminated or reduced (depending on the details of the chip), but not all.

Putting it bluntly, you don't have that sort of control - and if you think you need it, you've got a poor design (of hardware or software). Take a step back and look at what you are really trying to do, and if you have the right approach.

If you conclude that you /do/ need accurate timing, but not necessarily cycle accurate, then there are various possibilities to deal with that. Disabling the cpu's pipeline is not one of those possibilities. Post some rough code, and perhaps someone can give you some ideas. (Also note what compiler you are using, as this sort of stuff can be compiler-dependent.)

On 08/09/15 14:31, David Brown wrote:
> [...]
> If I understand you correctly, what you are trying to get here is
> cycle-accurate deterministic instruction counts for a series of
> instructions - i.e., you want to be sure of /exactly/ how long those
> instructions will take, in order to make exactly the right changes to
> your lowside and highside values.
> [...]
> If you conclude that you /do/ need accurate timing, but not necessarily
> cycle accurate, then there are various possibilities to deal with that.
> Disabling the cpu's pipeline is not one of those possibilities. Post
> some rough code, and perhaps someone can give you some ideas. (Also
> note what compiler you are using, as this sort of stuff can be
> compiler-dependent.)

There is, of course, a significant difference between predictability, repeatability and worst-case behaviour. I have no idea whether the OP was thinking of that.

If you want the compiler to predict the number of cycles required, then the only processor/compiler that I know can do that is the XMOS series. Multicore variants are surprisingly cheap at digikey. Next time I have a hard real-time control-loop, I'll look at them very seriously.

On 9/8/2015 8:43 AM, Klaus Kragelund wrote:
> [...]
> That requires IF statement, and no control of where the program might
> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
> be flushed

I don't know the details of the Cortex line, but most processors assume that processing will continue in sequence, and the pipeline is flushed only if the branch is taken. So this is entirely predictable if you know which way the code branches.

You have not indicated exactly what the concern is. Whatever your issue with the pipeline is, I doubt you really need to "turn it off", which would slow your code to as little as 1/3 of its normal speed.

> A colleague said it would require a lot of code to do that. But, is
> it possible to disable the pipeline altogether, so there will be no
> flushes and time used for this check is determined by the clock
> frequency directly? (no optimization from the pipeline)

You haven't given much info to go on. The ARM instruction set also includes conditional instructions, which are always fetched in line but only executed if the appropriate flag is set (or clear). I believe the timing is always the same for those. If you code in assembly, I expect you can find a suitable set of code to meet your needs, whatever they may be.

--
Rick

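A minimal sketch of how the check might be written so that it needs no data-dependent branch, assuming an up-counting timer in which the high side is on from `hi_on` to `hi_off` and the low side from `lo_on` until the counter wraps at `period`. The type `leg_compare_t`, the function names and that waveform layout are hypothetical and not taken from the CC2650 documentation. On a Cortex-M3 the Thumb-2 instruction set provides conditional execution through IT blocks, and a compiler will often (not always) compile a simple ternary like the ones below without a branch; checking the disassembly is the only way to be sure.

```c
#include <stdint.h>

/* Hypothetical compare values for one half-bridge leg, in timer ticks. */
typedef struct {
    uint32_t hi_on;   /* high side switches on at this count  */
    uint32_t hi_off;  /* high side switches off at this count */
    uint32_t lo_on;   /* low side switches on at this count   */
} leg_compare_t;

/* Simple ternaries; typically compiled to conditional instructions. */
static inline uint32_t min_u32(uint32_t a, uint32_t b) { return (a < b) ? a : b; }
static inline uint32_t max_u32(uint32_t a, uint32_t b) { return (a > b) ? a : b; }

/*
 * Compute the compare values for a requested duty (in ticks) so that a gap
 * of dead_ticks separates each high-side and low-side transition.
 * Assumes period >= 2 * dead_ticks.
 */
leg_compare_t leg_compare(uint32_t duty_ticks, uint32_t dead_ticks, uint32_t period)
{
    leg_compare_t c;
    /* The low side goes off at the wrap, so wait dead_ticks before high-side on. */
    c.hi_on  = dead_ticks;
    /* Clamp the duty to the range [dead_ticks, period - dead_ticks]. */
    c.hi_off = max_u32(dead_ticks, min_u32(duty_ticks, period - dead_ticks));
    /* The low side comes on dead_ticks after the high side goes off. */
    c.lo_on  = c.hi_off + dead_ticks;
    return c;
}
```

With `leg_compare(500, 10, 1000)` the high side runs from count 10 to 500 and the low side from 510 until the wrap.
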
On Tue, 08 Sep 2015 05:43:15 -0700, Klaus Kragelund wrote:

> [...]
> I have an application with 6 phase PWM and the CC2650 TI processor does
> not have deadtime support (to avoid cross conduction in a 3 stage
> halfbridge design)
>
> So, I could code this so when the timer PWM compare capture is updated,
> I check the value that is needed to setup and adjust both the lowside
> and highside compare values.
> [...]

I'm pretty sure that your concern is that as you change the duty cycle you may update one capture compare (I'm gonna call it 'CC') value in a way that causes both transistors to be on at the same time, then have the timer fire off, then update the other one -- yes? What, I ask, is a bit of noxious smoke between friends?

My first urge is to change the hardware. This situation should not have been allowed to develop in the first place -- either someone should have used a processor with dead time control, or they should have used gate drive circuitry with dead time control (there are scads of ways to do this in hardware-only), or they should have made damned sure that they knew how to make it work in software.

If you have any influence over the hardware at all, I would start by checking the schematic -- if you're lucky, someone used a gate driver with dead-time control, meaning you can just add the appropriate capacitor and you're done. Or someone may have put in the older-style diode-and-resistor network that accomplishes the same thing.

If all of that failed, I would check to see if the processor buffers the CC numbers -- some companies design their PWM peripherals so that the command registers are buffered and are only written at a specific point in the PWM cycle. If you interrupt on this point, and always manage to write the command values well within one PWM interval, then all you need to do is make sure to write the correct values.

Failing all else, I would monitor the direction that the PWM is going, and always write the CC commands in such an order that during the interval that one register has been written and the other hasn't, the dead time is increased rather than made overlapping. This may cause the occasional inefficient operation and some strange EMI issues, but at least it won't let out the magic smoke. As long as your CC registers are declared volatile and your hardware doesn't do anything funny then you should be OK.

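A minimal sketch of the "failing all else" write ordering, assuming the high side is on until a count `hi_off` and the low side switches on at a later count `lo_on`. The register pointers and function names are hypothetical stand-ins, not CC2650 register names, and only this one edge pair is handled. The branch decides only the order of the two writes, so its timing does not affect whether the outputs can overlap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the low-side turn-on value must be written first.
 * If the high side is going to switch off later than before, moving the
 * low-side turn-on first widens the gap while only one register is updated.
 */
static inline bool write_low_side_first(uint32_t old_hi_off, uint32_t new_hi_off)
{
    return new_hi_off > old_hi_off;
}

/* Update both compare values in the order that never shrinks the dead time. */
void update_leg(volatile uint32_t *hi_off_reg, volatile uint32_t *lo_on_reg,
                uint32_t new_hi_off, uint32_t new_lo_on)
{
    if (write_low_side_first(*hi_off_reg, new_hi_off)) {
        *lo_on_reg  = new_lo_on;   /* duty increasing: delay low-side on first */
        *hi_off_reg = new_hi_off;
    } else {
        *hi_off_reg = new_hi_off;  /* duty decreasing: advance high-side off first */
        *lo_on_reg  = new_lo_on;
    }
}
```

For example, going from (500, 510) to (600, 610) passes through (500, 610), a gap of 110 ticks, rather than through (600, 510), which would overlap.
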
If you are concerned that the pipeline may disorder your ordered memory writes, the ARM has an instruction to flush the pipeline before proceeding (I'm pretty sure that it's absolutely unnecessary in your case -- but if you're feeling paranoid it's there.) If you were using a PowerPC processor then I could recommend the EIEIO instruction which has my FAVORITE MNEMONIC EVER, but you're not, so you'll have to live with whatever stodgy British mnemonic goes with the ARM stuff.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

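The instruction alluded to above is presumably one of the ARMv7-M barrier instructions: `DMB`, `DSB` and `ISB` (the last of which flushes the pipeline). CMSIS exposes them as `__DMB()`, `__DSB()` and `__ISB()`. A minimal sketch, assuming GCC or Clang inline assembly; the macro `DATA_BARRIER` and the function `write_ordered` are hypothetical names, and on a build that is not targeting a Cortex-M3/M4 the macro falls back to a compiler-only barrier so the code can be exercised off-target.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Data memory barrier: earlier memory accesses complete before later ones. */
#define DATA_BARRIER() __asm__ volatile ("dmb" ::: "memory")
#else
/* Host fallback (assumption): only stops the compiler from reordering. */
#define DATA_BARRIER() __asm__ volatile ("" ::: "memory")
#endif

/* Write two registers with a barrier between them to enforce the order. */
void write_ordered(volatile uint32_t *first, uint32_t first_value,
                   volatile uint32_t *second, uint32_t second_value)
{
    *first = first_value;
    DATA_BARRIER();
    *second = second_value;
}
```

Because both targets are declared `volatile`, the compiler already keeps the two stores in program order; the barrier is the belt-and-braces step for the paranoid case described above.
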
On 08/09/15 16:44, Tom Gardner wrote:
> [...]
> There is, of course, a significant difference between
> predictability, repeatability and worst-case behaviour.
> I have no idea whether the OP was thinking of that.

Absolutely - and once the OP has thought about the real issues and what he actually needs, we can suggest ideas to implement it.

> If you want the compiler to predict the number of cycles
> required, then the only processor/compiler that I know can
> do that is the XMOS series. Multicore variants are surprisingly
> cheap at digikey. Next time I have a hard real-time control-loop,
> I'll look at them very seriously.

I have used XMOS devices a little, a few years ago. They are definitely an interesting architecture (my boss always worries when a developer describes a chip or a project as "interesting" :-). The development tools were a bit problematic at that time, and their example code was a bit of a mess, but I believe things have improved since then. I would enjoy doing another project with them. Just beware that they have quite limited memory that is needed for both program and data - although XMOS are keen on doing both USB and Ethernet in software, the chips don't have enough RAM to do much with such interfaces.

On 08/09/15 19:23, Tim Wescott wrote:
> [...]
> If you are concerned that the pipeline may disorder your ordered memory
> writes, the ARM has an instruction to flush the pipeline before
> proceeding (I'm pretty sure that it's absolutely unnecessary in your case
> -- but if you're feeling paranoid it's there.) If you were using a
> PowerPC processor then I could recommend the EIEIO instruction which has
> my FAVORITE MNEMONIC EVER, but you're not, so you'll have to live with
> whatever stodgy British mnemonic goes with the ARM stuff.

For modern embedded PPC cores (such as Freescale's MPC5xxx families, using the z6 core), the EIEIO instruction has been replaced by the depressingly boring MBAR opcode. It's a great step backward, in my opinion.
On Tue, 08 Sep 2015 19:48:19 +0200, David Brown wrote:

> [...]
> For modern embedded PPC cores (such as Freescale's MPC5xxx families,
> using the z6 core), the EIEIO instruction has been replaced by the
> depressingly boring MBAR opcode. It's a great step backward, in my
> opinion.

Man, you go to sleep for JUST ONE DECADE and they go and change things!

I just want to know if that mnemonic was intentional -- I know it would have been if I'd been on the team and had enough influence.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

On 08.9.2015 г. 20:55, Tim Wescott wrote:
> On Tue, 08 Sep 2015 19:48:19 +0200, David Brown wrote:
>.....
>> For modern embedded PPC cores (such as Freescale's MPC5xxx families,
>> using the z6 core), the EIEIO instruction has been replaced by the
>> depressingly boring MBAR opcode. It's a great step backward, in my
>> opinion.
>
> Man, you go to sleep for JUST ONE DECADE and they go and change things!
>
> I just want to know if that mnemonic was intentional -- I know it would
> have been if I'd been on the team and had enough influence.

Oh I suspect it has been intentional - the guy who did the power architecture has been too good to not have a sense of humour. The mnemonics overall are no good (few of them have made it into my vpa, mostly those which are cpu unique) but this one just can't have come by chance :-).

On the OP issue - trying to do timing in the ns range using the processor load/store is no good. Two output compare (OC) timer outputs will do what is needed, there should be plenty of these on any mcu nowadays (???).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/

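For the output-compare approach the dead time has to be expressed in timer ticks. A minimal sketch of that conversion, rounding up so the gap is never shorter than requested. The function name is hypothetical, and the 48 MHz figure in the usage note is an assumption about the timer clock, not something stated in the thread.

```c
#include <stdint.h>

/*
 * Convert a dead time in nanoseconds to timer ticks, rounding up.
 * The product is formed in 64 bits so it cannot overflow for any
 * 32-bit inputs.
 */
uint32_t dead_time_ticks(uint32_t dead_ns, uint32_t timer_hz)
{
    uint64_t product = (uint64_t)dead_ns * timer_hz;
    return (uint32_t)((product + 999999999ULL) / 1000000000ULL);
}
```

At an assumed 48 MHz timer clock, `dead_time_ticks(500, 48000000u)` gives 24 ticks, and a 100 ns request rounds up to 5 ticks (about 104 ns). The resulting tick count is the offset between the two compare values.
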
On 08/09/15 18:42, David Brown wrote:
> [...]
>> There is, of course, a significant difference between
>> predictability, repeatability and worst-case behaviour.
>> I have no idea whether the OP was thinking of that.
>
> Absolutely - and once the OP has thought about the real issues and what he
> actually needs, we can suggest ideas to implement it.

It would help if he told us his goal or problem, not his solution. 'Twas ever thus.

>> If you want the compiler to predict the number of cycles
>> required, then the only processor/compiler that I know can
>> do that is the XMOS series. Multicore variants are surprisingly
>> cheap at digikey. Next time I have a hard real-time control-loop,
>> I'll look at them very seriously.
>
> I have used XMOS devices a little, a few years ago. They are definitely an
> interesting architecture (my boss always worries when a developer describes a
> chip or a project as "interesting" :-).

:)

> The development tools were a bit
> problematic at that time, and their example code was a bit of a mess, but I
> believe things have improved since then. I would enjoy doing another project
> with them. Just beware that they have quite limited memory that is needed for
> both program and data - although XMOS are keen on doing both USB and Ethernet in
> software, the chips don't have enough RAM to do much with such interfaces.

Just so. But I'll take the stance that a hard real-time kernel should be small, and that usb/ethernet should be out of that loop.
