Can you turn off the pipeline in the ARM Cortex-M3?

Hi

I am not an embedded expert, so please be patient

I have an application with 6-phase PWM, and the TI CC2650 processor does not have dead-time support (needed to avoid cross-conduction in a 3-stage half-bridge design).

So, I could code this so that when the timer PWM capture/compare value is updated, I check the value that needs to be set and adjust both the low-side and high-side compare values.

That requires an IF statement, with no control over where the program continues in flash, and thus the 3-stage pipeline in the Cortex-M3 must be flushed.

A colleague said it would require a lot of code to do that. But is it possible to disable the pipeline altogether, so that there are no flushes and the time used for this check is determined directly by the clock frequency (with no optimization from the pipeline)?

Regards

Klaus

Reply by David Brown, September 8, 2015

On 08/09/15 14:43, Klaus Kragelund wrote:
> Hi
> 
> I am not an embedded expert, so please be patient
> 
> I have an application with 6 phase PWM and the CC2650 TI processor
> does not have deadtime support (to avoid cross conduction in a 3
> stage halfbridge design)
> 
> So, I could code this so when the timer PWM compare capture is
> updated, I check the value that is needed to setup and adjust both
> the lowside and highside compare values.
> 
> That requires IF statement, and no control of where the program might
> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
> be flushed
> 
> A colleague said it would require a lot of code to do that. But, is
> it possible to disable the pipeline all together, so there will be no
> flushes and time used for this check is determined by the clock
> frequency directly? (no optimization from the pipeline)
> 
> Regards
> 
> Klaus
> 

If I understand you correctly, what you are trying to get here is
cycle-accurate deterministic instruction counts for a series of
instructions - i.e., you want to be sure of /exactly/ how long those
instructions will take, in order to make exactly the right changes to
your lowside and highside values.

If that is true, then the pipeline in the cpu is only one relatively
minor issue - there are many more factors that can affect exact timing.
 Some factors can be eliminated or reduced (depending on the details of
the chip), but not all.

Putting it bluntly, you don't have that sort of control - and if you
think you need it, you've got a poor design (of hardware or software).
Take a step back and look at what you are really trying to do, and if
you have the right approach.

If you conclude that you /do/ need accurate timing, but not necessarily
cycle accurate, then there are various possibilities to deal with that.
 Disabling the cpu's pipeline is not one of those possibilities.  Post
some rough code, and perhaps someone can give you some ideas.  (Also
note what compiler you are using, as this sort of stuff can be
compiler-dependent.)

Reply by Tom Gardner, September 8, 2015

On 08/09/15 14:31, David Brown wrote:
> On 08/09/15 14:43, Klaus Kragelund wrote:
>> Hi
>>
>> I am not an embedded expert, so please be patient
>>
>> I have an application with 6 phase PWM and the CC2650 TI processor
>> does not have deadtime support (to avoid cross conduction in a 3
>> stage halfbridge design)
>>
>> So, I could code this so when the timer PWM compare capture is
>> updated, I check the value that is needed to setup and adjust both
>> the lowside and highside compare values.
>>
>> That requires IF statement, and no control of where the program might
>> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
>> be flushed
>>
>> A colleague said it would require a lot of code to do that. But, is
>> it possible to disable the pipeline all together, so there will be no
>> flushes and time used for this check is determined by the clock
>> frequency directly? (no optimization from the pipeline)
>>
>> Regards
>>
>> Klaus
>>
>
> If I understand you correctly, what you are trying to get here is
> cycle-accurate deterministic instruction counts for a series of
> instructions - i.e., you want to be sure of /exactly/ how long those
> instructions will take, in order to make exactly the right changes to
> your lowside and highside values.
>
> If that is true, then the pipeline in the cpu is only one relatively
> minor issue - there are many more factors that can affect exact timing.
>   Some factors can be eliminated or reduced (depending on the details of
> the chip), but not all.
>
> Putting it bluntly, you don't have that sort of control - and if you
> think you need it, you've got a poor design (of hardware or software).
> Take a step back and look at what you are really trying to do, and if
> you have the right approach.
>
> If you conclude that you /do/ need accurate timing, but not necessarily
> cycle accurate, then there are various possibilities to deal with that.
>   Disabling the cpu's pipeline is not one of those possibilities.  Post
> some rough code, and perhaps someone can give you some ideas.  (Also
> note what compiler you are using, as this sort of stuff can be
> compiler-dependent.)

There is, of course, a significant difference between
predictability, repeatability and worst-case behaviour.
I have no idea whether the OP was thinking of that.

If you want the compiler to predict the number of cycles
required, then the only processor/compiler that I know can
do that is the XMOS series. Multicore variants are surprisingly
cheap at digikey. Next time I have a hard real-time control-loop,
I'll look at them very seriously.

Reply by rickman, September 8, 2015

On 9/8/2015 8:43 AM, Klaus Kragelund wrote:
> Hi
>
> I am not an embedded expert, so please be patient
>
> I have an application with 6 phase PWM and the CC2650 TI processor
> does not have deadtime support (to avoid cross conduction in a 3
> stage halfbridge design)
>
> So, I could code this so when the timer PWM compare capture is
> updated, I check the value that is needed to setup and adjust both
> the lowside and highside compare values.
>
> That requires IF statement, and no control of where the program might
> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
> be flushed

I don't know the details of the Cortex line, but most processors assume 
the processing will continue in sequence and if the branch is taken the 
pipeline is flushed.  So this is entirely predictable if you know which 
way the code branches.  You have not indicated exactly what the concern 
is.  Whatever your issue with the pipeline is, I doubt you really need
to "turn it off", which would slow your code to as little as 1/3 of its
speed.

> A colleague said it would require a lot of code to do that. But, is
> it possible to disable the pipeline all together, so there will be no
> flushes and time used for this check is determined by the clock
> frequency directly? (no optimization from the pipeline)

You haven't given much info to go on.  The ARM instruction set also 
includes conditional instructions which are always fetched in line, but 
only executed if the appropriate flag is set vs. clear.  I believe the 
timing is always the same for those.  If you code in assembly I expect 
you can find a suitable set of code to meet your needs whatever they may 
be.
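
As a rough illustration of the branch-free idea (a sketch, not code from this thread): the Cortex-M3 runs Thumb-2, where conditional execution is done through IT blocks, and a compiler may or may not choose them for a given C expression. A mask-based select in plain C avoids depending on that choice. The helper names select_u32 and min_u32 are made up for this example.

```c
#include <stdint.h>

/* Return a if cond is non-zero, otherwise b, without an if statement.
   The mask is all ones when cond is true and all zeros when it is false. */
static inline uint32_t select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Smaller of two values, built on the select above. */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return select_u32(a < b, a, b);
}
```

For instance, min_u32(requested, limit) clamps a compare value with no source-level branch. Whether the generated code really takes the same number of cycles on both paths still has to be checked in the disassembly for the compiler and options in use.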

-- 

Rick

Reply by Tim Wescott, September 8, 2015

On Tue, 08 Sep 2015 05:43:15 -0700, Klaus Kragelund wrote:

> Hi
> 
> I am not an embedded expert, so please be patient
> 
> I have an application with 6 phase PWM and the CC2650 TI processor does
> not have deadtime support (to avoid cross conduction in a 3 stage
> halfbridge design)
> 
> So, I could code this so when the timer PWM compare capture is updated,
> I check the value that is needed to setup and adjust both the lowside
> and highside compare values.
> 
> That requires IF statement, and no control of where the program might
> continue in flash and thus the 3 stage pipeline in the Cortex M3 must be
> flushed
> 
> A colleague said it would require a lot of code to do that. But, is it
> possible to disable the pipeline all together, so there will be no
> flushes and time used for this check is determined by the clock
> frequency directly? (no optimization from the pipeline)

I'm pretty sure that your concern is that as you change the duty cycle 
you may update one capture compare (I'm gonna call it 'CC') value in a 
way that causes both transistors to be on at the same time, then have the 
timer fire off, then update the other one -- yes?  What, I ask, is a bit 
of noxious smoke between friends?

My first urge is to change the hardware.  This situation should not have 
been allowed to develop in the first place -- either someone should have 
used a processor with dead time control, or they should have used gate 
drive circuitry with dead time control (there are scads of ways to do 
this in hardware-only), or they should have made damned sure that they 
knew how to make it work in software.

If you have any influence over the hardware at all, I would start by 
checking the schematic -- if you're lucky, someone used a gate driver 
with dead-time control, meaning you can just add the appropriate 
capacitor and you're done.  Or someone may have put in the older-style 
diode-and-resistor network that accomplishes the same thing.

If all of that failed, I would check to see if the processor buffers the 
CC numbers -- some companies design their PWM peripherals so that the 
command registers are buffered and are only written at a specific point 
in the PWM cycle.  If you interrupt on this point, and always manage to 
write the command values well within one PWM interval, then all you need 
to do is make sure to write the correct values.

Failing all else, I would monitor the direction that the PWM is going, 
and always write the CC commands in such an order that during the 
interval that one register has been written and the other hasn't, the 
dead time is increased rather than made overlapping.  This may cause the 
occasional inefficient operation and some strange EMI issues, but at 
least it won't let out the magic smoke.  As long as your CC registers are 
declared volatile and your hardware doesn't do anything funny then you 
should be OK.
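
A minimal sketch of that write ordering, assuming a simplified model in which the high side switches off at one compare count and the low side switches on at a later count in the same period. The variables cc_high_off and cc_low_on are plain stand-ins for the real compare registers, and set_duty is a hypothetical name; the actual CC2650 register layout is not shown. It also assumes the same dead_ticks value is used on every call.

```c
#include <stdint.h>

/* Stand-ins for the two timer compare registers. On real hardware these
   would be volatile accesses to the peripheral instead. */
static volatile uint32_t cc_high_off = 0; /* high side switches off at this count */
static volatile uint32_t cc_low_on = 0;   /* low side switches on at this count */

/* Write both compare values in an order that only ever widens the gap
   between high-side turn-off and low-side turn-on while one register is
   new and the other is still old. Returns that half-updated gap in ticks,
   purely so the behaviour can be checked. */
static int32_t set_duty(uint32_t new_high_off, uint32_t dead_ticks)
{
    uint32_t new_low_on = new_high_off + dead_ticks;
    int32_t gap_mid;

    if (new_high_off > cc_high_off) {
        /* Duty increasing: push the low-side turn-on later first. */
        cc_low_on = new_low_on;
        gap_mid = (int32_t)(cc_low_on - cc_high_off);
        cc_high_off = new_high_off;
    } else {
        /* Duty decreasing or unchanged: pull the high-side turn-off earlier first. */
        cc_high_off = new_high_off;
        gap_mid = (int32_t)(cc_low_on - cc_high_off);
        cc_low_on = new_low_on;
    }
    return gap_mid;
}
```

In both directions the intermediate gap is at least dead_ticks, so a timer event that lands between the two writes sees extra dead time rather than overlap.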

If you are concerned that the pipeline may disorder your ordered memory 
writes, the ARM has an instruction to flush the pipeline before 
proceeding (I'm pretty sure that it's absolutely unnecessary in your case 
-- but if you're feeling paranoid it's there.)  If you were using a 
PowerPC processor then I could recommend the EIEIO instruction which has 
my FAVORITE MNEMONIC EVER, but you're not, so you'll have to live with 
whatever stodgy British mnemonic goes with the ARM stuff.
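
For reference, ARMv7-M (which the Cortex-M3 implements) provides the DMB, DSB and ISB barrier instructions; ISB is the one that flushes the pipeline. Below is a hedged sketch of ordering two register writes with a barrier, assuming GCC or Clang inline-assembly syntax. ORDER_BARRIER and write_pair are hypothetical names, and on non-ARM hosts the macro falls back to a compiler-only barrier so the snippet still builds.

```c
#include <stdint.h>

/* Data memory barrier on ARMv7 and later; compiler-only barrier elsewhere
   (an assumption made so the example compiles on a development host). */
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
#define ORDER_BARRIER() __asm__ __volatile__("dmb sy" ::: "memory")
#else
#define ORDER_BARRIER() __asm__ __volatile__("" ::: "memory")
#endif

/* Write v1 to *first, then v2 to *second, with a barrier in between so
   the second store is not issued ahead of the first. */
static void write_pair(volatile uint32_t *first, volatile uint32_t *second,
                       uint32_t v1, uint32_t v2)
{
    *first = v1;
    ORDER_BARRIER();
    *second = v2;
}
```

As noted above, volatile accesses to the compare registers are normally enough in this case; the barrier is belt and braces.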

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by David Brown, September 8, 2015

On 08/09/15 16:44, Tom Gardner wrote:
> On 08/09/15 14:31, David Brown wrote:
>> On 08/09/15 14:43, Klaus Kragelund wrote:
>>> Hi
>>>
>>> I am not an embedded expert, so please be patient
>>>
>>> I have an application with 6 phase PWM and the CC2650 TI processor
>>> does not have deadtime support (to avoid cross conduction in a 3
>>> stage halfbridge design)
>>>
>>> So, I could code this so when the timer PWM compare capture is
>>> updated, I check the value that is needed to setup and adjust both
>>> the lowside and highside compare values.
>>>
>>> That requires IF statement, and no control of where the program might
>>> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
>>> be flushed
>>>
>>> A colleague said it would require a lot of code to do that. But, is
>>> it possible to disable the pipeline all together, so there will be no
>>> flushes and time used for this check is determined by the clock
>>> frequency directly? (no optimization from the pipeline)
>>>
>>> Regards
>>>
>>> Klaus
>>>
>>
>> If I understand you correctly, what you are trying to get here is
>> cycle-accurate deterministic instruction counts for a series of
>> instructions - i.e., you want to be sure of /exactly/ how long those
>> instructions will take, in order to make exactly the right changes to
>> your lowside and highside values.
>>
>> If that is true, then the pipeline in the cpu is only one relatively
>> minor issue - there are many more factors that can affect exact timing.
>>   Some factors can be eliminated or reduced (depending on the details of
>> the chip), but not all.
>>
>> Putting it bluntly, you don't have that sort of control - and if you
>> think you need it, you've got a poor design (of hardware or software).
>> Take a step back and look at what you are really trying to do, and if
>> you have the right approach.
>>
>> If you conclude that you /do/ need accurate timing, but not necessarily
>> cycle accurate, then there are various possibilities to deal with that.
>>   Disabling the cpu's pipeline is not one of those possibilities.  Post
>> some rough code, and perhaps someone can give you some ideas.  (Also
>> note what compiler you are using, as this sort of stuff can be
>> compiler-dependent.)
>
> There is, of course, a significant difference between
> predictability, repeatability and worst-case behaviour.
> I have no idea whether the OP was thinking of that.

Absolutely - and once the OP has thought about the real issues and what 
he actually needs, we can suggest ideas to implement it.

>
> If you want the compiler to predict the number of cycles
> required, then the only processor/compiler that I know can
> do that is the XMOS series. Multicore variants are surprisingly
> cheap at digikey. Next time I have a hard real-time control-loop,
> I'll look at them very seriously.
>

I have used XMOS devices a little, a few years ago.  They are definitely 
an interesting architecture (my boss always worries when a developer 
describes a chip or a project as "interesting" :-).  The development 
tools were a bit problematic at that time, and their example code was a 
bit of a mess, but I believe things have improved since then.  I would 
enjoy doing another project with them.  Just beware that they have quite 
limited memory that is needed for both program and data - although XMOS 
are keen on doing both USB and Ethernet in software, the chips don't 
have enough RAM to do much with such interfaces.

Reply by David Brown, September 8, 2015

On 08/09/15 19:23, Tim Wescott wrote:
> On Tue, 08 Sep 2015 05:43:15 -0700, Klaus Kragelund wrote:
>
>> Hi
>>
>> I am not an embedded expert, so please be patient
>>
>> I have an application with 6 phase PWM and the CC2650 TI processor does
>> not have deadtime support (to avoid cross conduction in a 3 stage
>> halfbridge design)
>>
>> So, I could code this so when the timer PWM compare capture is updated,
>> I check the value that is needed to setup and adjust both the lowside
>> and highside compare values.
>>
>> That requires IF statement, and no control of where the program might
>> continue in flash and thus the 3 stage pipeline in the Cortex M3 must be
>> flushed
>>
>> A colleague said it would require a lot of code to do that. But, is it
>> possible to disable the pipeline all together, so there will be no
>> flushes and time used for this check is determined by the clock
>> frequency directly? (no optimization from the pipeline)
>
> I'm pretty sure that your concern is that as you change the duty cycle
> you may update one capture compare (I'm gonna call it 'CC') value in a
> way that causes both transistors to be on at the same time, then have the
> timer fire off, then update the other one -- yes?  What, I ask, is a bit
> of noxious smoke between friends?
>
> My first urge is to change the hardware.  This situation should not have
> been allowed to develop in the first place -- either someone should have
> used a processor with dead time control, or they should have used gate
> drive circuitry with dead time control (there are scads of ways to do
> this in hardware-only), or they should have made damned sure that they
> knew how to make it work in software.
>
> If you have any influence over the hardware at all, I would start by
> checking the schematic -- if you're lucky, someone used a gate driver
> with dead-time control, meaning you can just add the appropriate
> capacitor and you're done.  Or someone may have put in the older-style
> diode-and-resistor network that accomplishes the same thing.
>
> If all of that failed, I would check to see if the processor buffers the
> CC numbers -- some companies design their PWM peripherals so that the
> command registers are buffered and are only written at a specific point
> in the PWM cycle.  If you interrupt on this point, and always manage to
> write the command values well within one PWM interval, then all you need
> to do is make sure to write the correct values.
>
> Failing all else, I would monitor the direction that the PWM is going,
> and always write the CC commands in such an order that during the
> interval that one register has been written and the other hasn't, the
> dead time is increased rather than made overlapping.  This may cause the
> occasional inefficient operation and some strange EMI issues, but at
> least it won't let out the magic smoke.  As long as your CC registers are
> declared volatile and your hardware doesn't do anything funny then you
> should be OK.
>
> If you are concerned that the pipeline may disorder your ordered memory
> writes, the ARM has an instruction to flush the pipeline before
> proceeding (I'm pretty sure that it's absolutely unnecessary in your case
> -- but if you're feeling paranoid it's there.)  If you were using a
> PowerPC processor then I could recommend the EIEIO instruction which has
> my FAVORITE MNEMONIC EVER, but you're not, so you'll have to live with
> whatever stogy British mnemonic goes with the ARM stuff.
>

For modern embedded PPC cores (such as Freescale's MPC5xxx families, 
using the z6 core), the EIEIO instruction has been replaced by the 
depressingly boring MBAR opcode.  It's a great step backward, in my opinion.

Reply by Tim Wescott, September 8, 2015

On Tue, 08 Sep 2015 19:48:19 +0200, David Brown wrote:

> On 08/09/15 19:23, Tim Wescott wrote:
>> On Tue, 08 Sep 2015 05:43:15 -0700, Klaus Kragelund wrote:
>>
>>> Hi
>>>
>>> I am not an embedded expert, so please be patient
>>>
>>> I have an application with 6 phase PWM and the CC2650 TI processor
>>> does not have deadtime support (to avoid cross conduction in a 3 stage
>>> halfbridge design)
>>>
>>> So, I could code this so when the timer PWM compare capture is
>>> updated, I check the value that is needed to setup and adjust both the
>>> lowside and highside compare values.
>>>
>>> That requires IF statement, and no control of where the program might
>>> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
>>> be flushed
>>>
>>> A colleague said it would require a lot of code to do that. But, is it
>>> possible to disable the pipeline all together, so there will be no
>>> flushes and time used for this check is determined by the clock
>>> frequency directly? (no optimization from the pipeline)
>>
>> I'm pretty sure that your concern is that as you change the duty cycle
>> you may update one capture compare (I'm gonna call it 'CC') value in a
>> way that causes both transistors to be on at the same time, then have
>> the timer fire off, then update the other one -- yes?  What, I ask, is
>> a bit of noxious smoke between friends?
>>
>> My first urge is to change the hardware.  This situation should not
>> have been allowed to develop in the first place -- either someone
>> should have used a processor with dead time control, or they should
>> have used gate drive circuitry with dead time control (there are scads
>> of ways to do this in hardware-only), or they should have made damned
>> sure that they knew how to make it work in software.
>>
>> If you have any influence over the hardware at all, I would start by
>> checking the schematic -- if you're lucky, someone used a gate driver
>> with dead-time control, meaning you can just add the appropriate
>> capacitor and you're done.  Or someone may have put in the older-style
>> diode-and-resistor network that accomplishes the same thing.
>>
>> If all of that failed, I would check to see if the processor buffers
>> the CC numbers -- some companies design their PWM peripherals so that
>> the command registers are buffered and are only written at a specific
>> point in the PWM cycle.  If you interrupt on this point, and always
>> manage to write the command values well within one PWM interval, then
>> all you need to do is make sure to write the correct values.
>>
>> Failing all else, I would monitor the direction that the PWM is going,
>> and always write the CC commands in such an order that during the
>> interval that one register has been written and the other hasn't, the
>> dead time is increased rather than made overlapping.  This may cause
>> the occasional inefficient operation and some strange EMI issues, but
>> at least it won't let out the magic smoke.  As long as your CC
>> registers are declared volatile and your hardware doesn't do anything
>> funny then you should be OK.
>>
>> If you are concerned that the pipeline may disorder your ordered memory
>> writes, the ARM has an instruction to flush the pipeline before
>> proceeding (I'm pretty sure that it's absolutely unnecessary in your
>> case -- but if you're feeling paranoid it's there.)  If you were using
>> a PowerPC processor then I could recommend the EIEIO instruction which
>> has my FAVORITE MNEMONIC EVER, but you're not, so you'll have to live
>> with whatever stogy British mnemonic goes with the ARM stuff.
>>
>>
> For modern embedded PPC cores (such as Freescale's MPC5xxx families,
> using the z6 core), the EIEIO instruction has been replaced by the
> depressingly boring MBAR opcode.  It's a great step backward, in my
> opinion.

Man, you go to sleep for JUST ONE DECADE and they go and change things!

I just want to know if that mnemonic was intentional -- I know it would 
have been if I'd been on the team and had enough influence.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Dimiter_Popoff, September 8, 2015

On 08.9.2015 20:55, Tim Wescott wrote:
> On Tue, 08 Sep 2015 19:48:19 +0200, David Brown wrote:
>.....
>> For modern embedded PPC cores (such as Freescale's MPC5xxx families,
>> using the z6 core), the EIEIO instruction has been replaced by the
>> depressingly boring MBAR opcode.  It's a great step backward, in my
>> opinion.
>
> Man, you go to sleep for JUST ONE DECADE and they go and change things!
>
> I just want to know if that mnemonic was intentional -- I know it would
> have been if I'd been on the team and had enough influence.
>

Oh, I suspect it was intentional - the guy who did the Power
architecture was too good not to have a sense of humour.
The mnemonics overall are no good (few of them have made it into
my vpa, mostly those which are cpu unique) but this one just can't
have come by chance :-).

On the OP issue - trying to do timing in the ns range using
processor loads/stores is no good. Two output compare (OC)
timer outputs will do what is needed; there should be plenty of
these on any MCU nowadays (???).
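
To make the two-output-compare idea concrete, here is a small sketch that computes the switching counts for a complementary pair with dead time on both edges. It assumes an up-counting timer that runs from 0 to period; the pwm_edges_t type and the pwm_edges function are invented for illustration and are not a CC2650 API.

```c
#include <stdint.h>

/* Timer counts at which each output switches within one PWM period. */
typedef struct {
    uint32_t high_on;
    uint32_t high_off;
    uint32_t low_on;
    uint32_t low_off;
} pwm_edges_t;

/* The high side is on from high_on to high_off, the low side from low_on
   to low_off. The low side switches off at the period wrap, and the high
   side waits `dead` ticks after the wrap before switching on. */
static pwm_edges_t pwm_edges(uint32_t period, uint32_t duty, uint32_t dead)
{
    pwm_edges_t e;

    if (dead > period) dead = period;   /* keep all values inside the period */
    if (duty > period) duty = period;

    e.high_on = dead;
    e.high_off = (duty > dead) ? duty : dead;   /* zero-length pulse if duty <= dead */
    e.low_on = (dead > period - e.high_off) ? period : e.high_off + dead;
    e.low_off = period;
    return e;
}
```

With period 1000, duty 400 and dead 20, this gives a high-side pulse from 20 to 400 and a low-side pulse from 420 to 1000. Once the four counts sit in compare registers, the timer hardware produces the edges and the CPU timing no longer matters at the nanosecond level.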

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/

Reply by Tom Gardner, September 8, 2015

On 08/09/15 18:42, David Brown wrote:
> On 08/09/15 16:44, Tom Gardner wrote:
>> On 08/09/15 14:31, David Brown wrote:
>>> On 08/09/15 14:43, Klaus Kragelund wrote:
>>>> Hi
>>>>
>>>> I am not an embedded expert, so please be patient
>>>>
>>>> I have an application with 6 phase PWM and the CC2650 TI processor
>>>> does not have deadtime support (to avoid cross conduction in a 3
>>>> stage halfbridge design)
>>>>
>>>> So, I could code this so when the timer PWM compare capture is
>>>> updated, I check the value that is needed to setup and adjust both
>>>> the lowside and highside compare values.
>>>>
>>>> That requires IF statement, and no control of where the program might
>>>> continue in flash and thus the 3 stage pipeline in the Cortex M3 must
>>>> be flushed
>>>>
>>>> A colleague said it would require a lot of code to do that. But, is
>>>> it possible to disable the pipeline all together, so there will be no
>>>> flushes and time used for this check is determined by the clock
>>>> frequency directly? (no optimization from the pipeline)
>>>>
>>>> Regards
>>>>
>>>> Klaus
>>>>
>>>
>>> If I understand you correctly, what you are trying to get here is
>>> cycle-accurate deterministic instruction counts for a series of
>>> instructions - i.e., you want to be sure of /exactly/ how long those
>>> instructions will take, in order to make exactly the right changes to
>>> your lowside and highside values.
>>>
>>> If that is true, then the pipeline in the cpu is only one relatively
>>> minor issue - there are many more factors that can affect exact timing.
>>>   Some factors can be eliminated or reduced (depending on the details of
>>> the chip), but not all.
>>>
>>> Putting it bluntly, you don't have that sort of control - and if you
>>> think you need it, you've got a poor design (of hardware or software).
>>> Take a step back and look at what you are really trying to do, and if
>>> you have the right approach.
>>>
>>> If you conclude that you /do/ need accurate timing, but not necessarily
>>> cycle accurate, then there are various possibilities to deal with that.
>>>   Disabling the cpu's pipeline is not one of those possibilities.  Post
>>> some rough code, and perhaps someone can give you some ideas.  (Also
>>> note what compiler you are using, as this sort of stuff can be
>>> compiler-dependent.)
>>
>> There is, of course, a significant difference between
>> predictability, repeatability and worst-case behaviour.
>> I have no idea whether the OP was thinking of that.
>
> Absolutely - and once the OP has thought about the real issues and what he
> actually needs, we can suggest ideas to implement it.

It would help if he told us his goal or problem, not his
solution. 'Twas ever thus.


>> If you want the compiler to predict the number of cycles
>> required, then the only processor/compiler that I know can
>> do that is the XMOS series. Multicore variants are surprisingly
>> cheap at digikey. Next time I have a hard real-time control-loop,
>> I'll look at them very seriously.
>>
>
> I have used XMOS devices a little, a few years ago.  They are definitely an
> interesting architecture (my boss always worries when a developer describes a
> chip or a project as "interesting" :-).

:)

> The development tools were a bit
> problematic at that time, and their example code was a bit of a mess, but I
> believe things have improved since then.  I would enjoy doing another project
> with them.  Just beware that they have quite limited memory that is needed for
> both program and data - although XMOS are keen on doing both USB and Ethernet in
> software, the chips don't have enough RAM to do much with such interfaces.

Just so. But I'll take the stance that a hard real-time kernel
should be small, and that usb/ethernet should be out of that loop.
