Short blocking delay in Cortex-M0+

Reply by Dave Nadler ● February 24, 2017

On Friday, February 24, 2017 at 12:58:26 PM UTC-5, Stefan Reuther wrote:
> If you want a piece of assembly to show up in your binary, write that
> piece of assembly (and not some piece of C code that happens to produce
> that piece of assembly today). Even with volatile, the compiler is free
> to decide between a register or a variable, possibly different increment
> and compare instructions, unrolling the loop, etc.

Even that is not enough; GCC will optimize out assembler NOPs if you're
not careful. For Cortex-M4, NOPs get eaten in the pipeline; can't remember
about M0.

Regardless, if you want a reliable delay: 

1) You MUST check the delay on a scope, in RELEASE build with
the optimization settings you expect for production.

2) You MUST recheck with the scope after any toolchain update or optimization
setting change.

Hope that helps!
Best Regards, Dave

PS: Just last week I had to do a short delay with a volatile iterator
for loop, and check it on the scope ;-)
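
A minimal sketch of the kind of loop described above, assuming a GCC-compatible compiler. The name delay_nops is hypothetical. Marking the asm statement volatile means it has to be emitted once per iteration, but it says nothing about how long an iteration takes, so the scope check in points 1) and 2) still applies.

```c
#include <stdint.h>

/* Hypothetical helper: executes `count` NOP instructions in a loop.
 * The time per iteration depends on the core, the compiler and the
 * optimization settings, so it has to be measured, not assumed. */
static uint32_t delay_nops(uint32_t count)
{
    uint32_t done = 0;

    for (; done < count; done++) {
        /* volatile: the compiler may not drop or merge this statement */
        __asm__ volatile ("nop");
    }
    return done;   /* number of NOPs issued */
}
```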

Reply by mrfirmware ● March 22, 2017

On Thursday, February 23, 2017 at 12:00:23 PM UTC-5, John Speth wrote:
> On 2/23/2017 8:15 AM, pozz wrote:
> > During startup, I need a short and not precise delay, before configuring
> > clocks, timers and other peripherals (at startup the CPU runs with
> > internal clock).
> >
> > What do you suggest?
> >
> > I think there's a simpler method than configuring a hardware timer.
> >
> > I need to check the status of an input pin, *after* enabling internal
> > pull-up. I'd like to introduce a short delay after enabling internal
> > pull-up, otherwise there's a risk I will read a transient level (maybe 0
> > or 1).
> 
> First thing after main() starts, configure your pin and run a spin loop 
> based delay, then read the pin.  There's probably no need for a timer at 
> that stage of start up.
> 
> void delay(void)
> {
>    // Use volatile so the optimizer will not nullify this code
>    volatile int i;
>    for(i = 0; i < YOUR_DELAY; i++);
> }
> 
> JJS

How is it legal for the compiler to optimize out the iterator? If it does, your compiler is broken. Don't confuse <optimized out> from gdb with "I removed your code." That for loop will execute because you told the compiler to loop for YOUR_DELAY counts. It must emit that code even if 'i' is not visible in the debugger. Volatile is only needed when the compiler cannot see any possible path for the variable to change within the current scope of execution, e.g. ISR, another thread, or a hardware mapped address.

As for a delay in this case I'd read that GPIO port N times to create the delay. The port read will have a longer execution duration and you may be able to empirically determine how many reads it takes before it stabilizes. Often times port reads can be rather CPU clock independent and depend more on the port peripheral clock so you may even be able to get reliable timing regardless of the CPU clock.
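
A sketch of the port-read idea, assuming the input register is a memory-mapped 32-bit word. The function name and the port_in parameter are hypothetical; on real hardware the pointer would be the address of the GPIO input register taken from the device header. The number of reads still has to be found empirically, as described above.

```c
#include <stdint.h>

/* Hypothetical helper: reads a port input register `reads` times and
 * returns the last value seen (0 if no read was made). The pointer is
 * volatile-qualified, so every read in the loop must be performed. */
static uint32_t read_port_n_times(const volatile uint32_t *port_in,
                                  unsigned reads)
{
    uint32_t value = 0;

    while (reads-- > 0u) {
        value = *port_in;   /* one volatile access per iteration */
    }
    return value;
}
```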

Reply by mrfirmware ● March 22, 2017

On Wednesday, March 22, 2017 at 12:29:29 PM UTC-4, mrfirmware wrote:
> On Thursday, February 23, 2017 at 12:00:23 PM UTC-5, John Speth wrote:
> > On 2/23/2017 8:15 AM, pozz wrote:
> > > During startup, I need a short and not precise delay, before configuring
> > > clocks, timers and other peripherals (at startup the CPU runs with
> > > internal clock).
> > >
> > > What do you suggest?
> > >
> > > I think there's a simpler method than configuring a hardware timer.
> > >
> > > I need to check the status of an input pin, *after* enabling internal
> > > pull-up. I'd like to introduce a short delay after enabling internal
> > > pull-up, otherwise there's a risk I will read a transient level (maybe 0
> > > or 1).
> > 
> > First thing after main() starts, configure your pin and run a spin loop 
> > based delay, then read the pin.  There's probably no need for a timer at 
> > that stage of start up.
> > 
> > void delay(void)
> > {
> >    // Use volatile so the optimizer will not nullify this code
> >    volatile int i;
> >    for(i = 0; i < YOUR_DELAY; i++);
> > }
> > 
> > JJS
> 
> How is it legal for the compiler to optimize out the iterator? If it does, your compiler is broken. Don't confuse <optimized out> from gdb with "I removed your code." That for loop will execute because you told the compiler to loop for YOUR_DELAY counts. It must emit that code even if 'i' is not visible in the debugger. Volatile is only needed when the compiler cannot see any possible path for the variable to change within the current scope of execution, e.g. ISR, another thread, or a hardware mapped address.
> 
> As for a delay in this case I'd read that GPIO port N times to create the delay. The port read will have a longer execution duration and you may be able to empirically determine how many reads it takes before it stabilizes. Often times port reads can be rather CPU clock independent and depend more on the port peripheral clock so you may even be able to get reliable timing regardless of the CPU clock.

Gah! I forgot about optimizers. Yes the -O2 flag will remove the for loop. It's not correct C though to do that. The port read is immune to optimizer magic.

Reply by Hans-Bernhard Bröker ● March 22, 2017

On 22.03.2017 at 17:48, mrfirmware wrote:

> Gah! I forgot about optimizers. Yes the -O2 flag will remove the for
> loop. It's not correct C though to do that.

Actually in the case at hand, it can still be considered correct, 
depending how you interpret some rather involved wording in the standard.

> The port read is immune to optimizer magic.

If there were one in the code under consideration, it might be.  But 
there isn't, so the entire loop is fair game.

Reply by David Brown ● March 23, 2017

On 22/03/17 17:48, mrfirmware wrote:
> On Wednesday, March 22, 2017 at 12:29:29 PM UTC-4, mrfirmware wrote:
>> On Thursday, February 23, 2017 at 12:00:23 PM UTC-5, John Speth wrote:
>>> On 2/23/2017 8:15 AM, pozz wrote:
>>>> During startup, I need a short and not precise delay, before configuring
>>>> clocks, timers and other peripherals (at startup the CPU runs with
>>>> internal clock).
>>>>
>>>> What do you suggest?
>>>>
>>>> I think there's a simpler method than configuring a hardware timer.
>>>>
>>>> I need to check the status of an input pin, *after* enabling internal
>>>> pull-up. I'd like to introduce a short delay after enabling internal
>>>> pull-up, otherwise there's a risk I will read a transient level (maybe 0
>>>> or 1).
>>>
>>> First thing after main() starts, configure your pin and run a spin loop 
>>> based delay, then read the pin.  There's probably no need for a timer at 
>>> that stage of start up.
>>>
>>> void delay(void)
>>> {
>>>    // Use volatile so the optimizer will not nullify this code
>>>    volatile int i;
>>>    for(i = 0; i < YOUR_DELAY; i++);
>>> }
>>>
>>> JJS
>>
>> How is it legal for the compiler to optimize out the iterator? If it
>> does, your compiler is broken. Don't confuse <optimized out> from gdb
>> with "I removed your code." That for loop will execute because you told
>> the compiler to loop for YOUR_DELAY counts. It must emit that code even
>> if 'i' is not visible in the debugger. Volatile is only needed when the
>> compiler cannot see any possible path for the variable to change within
>> the current scope of execution, e.g. ISR, another thread, or a hardware
>> mapped address.
>>
>> As for a delay in this case I'd read that GPIO port N times to create
>> the delay. The port read will have a longer execution duration and you
>> may be able to empirically determine how many reads it takes before it
>> stabilizes. Often times port reads can be rather CPU clock independent
>> and depend more on the port peripheral clock so you may even be able to
>> get reliable timing regardless of the CPU clock.
> 
> Gah! I forgot about optimizers. Yes the -O2 flag will remove the for
> loop. It's not correct C though to do that. The port read is immune to
> optimizer magic.
> 

It is /entirely/ correct for the compiler to remove a delay loop like
this if there is no "volatile" involved.  For C, there are certain
"observable behaviours" in a program.  These are:

1. Accesses to volatile objects (reads or writes).  Ordering is
important, but timing is not.
2. Program start and exit.
3. Input and output of "interactive devices" through <stdio.h> functions
like printf(), fread() and fwrite().
4. The data written to files, at program termination.
5. Any function calls where the functions might do one of the above four
things.

The compiler has to generate code that produces the same results with
respect to these "observable behaviours" as the "C abstract machine" does.

And that is /it/.  The compiler is free to do anything it wants, as long
as these rules are followed.

For embedded systems, this usually boils down to just "program start"
and "volatile accesses".  Even when you use "printf", the key point is
that this will eventually use volatile accesses to send data to UART
hardware.  And function calls are considered "observable behaviour" if
the compiler does not know for sure that they involve no volatile accesses.

A loop that simply counts up does not access anything volatile - the
compiler can freely remove it.  The compiler can shuffle things around
any way it wants, as long as it does not break the ordering of volatile
accesses.  It can remove or simplify any code it wants, as long as the
results are that the same data is written out to volatile objects and
the same reads are made of volatile objects.

It can do this regardless of any optimisation settings.  Using -O0 or no
optimisation flags does /not/ turn off optimisations - it merely tells
the compiler not to work too hard on optimising.  Turning on -O2 can
never make correct code (in the C sense) into incorrect code - it just
asks the compiler to work harder to generate more efficient code that
does exactly the same thing as before.

And note that C has no concept of time - there is no way in C to express
a delay, or to suggest that some code has to be faster or slower than
other code.  There is only ordering of volatiles and the expectation
that a decent compiler won't insert extra slow code.
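
To make the distinction concrete, here is a small sketch with two hypothetical functions. Both return the same value, so as far as the C abstract machine is concerned they are interchangeable; the difference is only in what the compiler is obliged to emit.

```c
#include <stdint.h>

/* No volatile access inside the loop: the compiler may replace the
 * whole function body with "return count;" at any optimization level. */
static uint32_t spin_plain(uint32_t count)
{
    uint32_t i;

    for (i = 0; i < count; i++) {
        /* empty */
    }
    return i;
}

/* Every read and write of i is a volatile access, so each one must be
 * performed, in order. How long that takes is still unspecified. */
static uint32_t spin_volatile(uint32_t count)
{
    volatile uint32_t i;

    for (i = 0; i < count; i++) {
        /* empty */
    }
    return i;
}
```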

Reply by Tim Wescott ● March 23, 2017

On Thu, 23 Mar 2017 10:05:51 +0100, David Brown wrote:

> Turning on -O2 can never make correct code (in the C sense) into
> incorrect code - it just asks the compiler to work harder to
> generate more efficient code that does exactly the same thing as
> before.

Just wanted to double down on that -- if bumping up the optimization 
level screws the pooch, then your code is at fault.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

Reply by mrfirmware ● March 23, 2017

On Thursday, March 23, 2017 at 1:54:41 PM UTC-4, Tim Wescott wrote:
> On Thu, 23 Mar 2017 10:05:51 +0100, David Brown wrote:
> 
> > Turning on -O2 can never make correct code (in the C sense) into
> > incorrect code - it just asks the compiler to work harder to
> > generate more efficient code that does exactly the same thing as
> > before.
> 
> Just wanted to double down on that -- if bumping up the optimization 
> level screws the pooch, then your code is at fault.
> 
> -- 
> 
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com
> 
> I'm looking for work -- see my website!

Yeah, in retrospect what was I thinking? I guess this is why I just read or write the port pin multiple times to sample reliably/create a pulse. Or snapshot a free running hw timer and wait for it to exceed some offset from the snapshot time.
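
The timer-snapshot approach could be sketched as follows. Everything here is an assumption for illustration: timer_now() stands in for reading a free-running 32-bit up-counter, and it is simulated with a variable so the example runs on a host. On a real part it would be a volatile read of the timer's count register.

```c
#include <stdint.h>

/* Simulated free-running counter (hypothetical stand-in for hardware).
 * Each read advances the count by one tick. */
static uint32_t sim_ticks;

static uint32_t timer_now(void)
{
    return sim_ticks++;
}

/* Busy-wait until at least `ticks` counts have elapsed since entry.
 * Returns the elapsed count actually observed. */
static uint32_t wait_ticks(uint32_t ticks)
{
    const uint32_t start = timer_now();
    uint32_t elapsed;

    do {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        elapsed = timer_now() - start;
    } while (elapsed < ticks);

    return elapsed;
}
```

Comparing the difference against the offset, rather than comparing the raw count against start + offset, is what keeps the wait correct when the counter rolls over.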

Reply by pozz ● March 24, 2017

On 23/02/2017 17:15, pozz wrote:
> During startup, I need a short and not precise delay, before configuring
> clocks, timers and other peripherals (at startup the CPU runs with
> internal clock).
>
> What do you suggest?
>
> I think there's a simpler method than configuring a hardware timer.
>
> I need to check the status of an input pin, *after* enabling internal
> pull-up. I'd like to introduce a short delay after enabling internal
> pull-up, otherwise there's a risk I will read a transient level (maybe 0
> or 1).

I found some interesting code in the Atmel Software Framework. It is based
on the following function, defined in the delay.c file:

__attribute__((optimize("-Os")))
__attribute__ ((section(".ramfunc")))
void portable_delay_cycles(unsigned long n)
{
     (void)n;

     __asm (
         "loop: DMB	\n"
         "SUB r0, r0, #1 \n"
         "CMP r0, #0  \n"
         "BNE loop         "
     );
}


In delay.h there are some macros:

void portable_delay_cycles(unsigned long n);

#define cpu_ms_2_cy(ms, f_cpu)  \
         (((uint64_t)(ms) * (f_cpu) + (uint64_t)(7e3-1ul)) / (uint64_t)7e3)
#define cpu_us_2_cy(us, f_cpu)  \
         (((uint64_t)(us) * (f_cpu) + (uint64_t)(7e6-1ul)) / (uint64_t)7e6)

#define delay_cycles               portable_delay_cycles

#define cpu_delay_s(delay) delay_cycles(cpu_ms_2_cy(1000 * delay, F_CPU))
#define cpu_delay_ms(delay) delay_cycles(cpu_ms_2_cy(delay, F_CPU))
#define cpu_delay_us(delay) delay_cycles(cpu_us_2_cy(delay, F_CPU))


However, I haven't measured how precise the delays generated by those
functions are.
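
As a worked example of the arithmetic in those macros: the divisors 7e3 and 7e6 read as though one loop iteration is taken to cost 7 CPU cycles, and the added 7e3-1 / 7e6-1 term rounds the result up. That is an inference from the constants, not something the framework states here. The 8 MHz clock and the wrapper function names below are hypothetical.

```c
#include <stdint.h>

/* Macros copied from the delay.h excerpt above. */
#define cpu_ms_2_cy(ms, f_cpu)  \
         (((uint64_t)(ms) * (f_cpu) + (uint64_t)(7e3-1ul)) / (uint64_t)7e3)
#define cpu_us_2_cy(us, f_cpu)  \
         (((uint64_t)(us) * (f_cpu) + (uint64_t)(7e6-1ul)) / (uint64_t)7e6)

/* Hypothetical wrappers so the results are easy to inspect. */
static uint64_t loops_for_us(uint32_t us, uint32_t f_cpu)
{
    return cpu_us_2_cy(us, f_cpu);
}

static uint64_t loops_for_ms(uint32_t ms, uint32_t f_cpu)
{
    return cpu_ms_2_cy(ms, f_cpu);
}

/* At an assumed 8 MHz clock:
 *   10 us = 80 cycles,   80 / 7   = 11.43 -> rounded up to 12 loops
 *    1 ms = 8000 cycles, 8000 / 7 = 1142.86 -> rounded up to 1143 loops */
```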

Reply by David Brown ● March 24, 2017

On 24/03/17 12:23, pozz wrote:
> On 23/02/2017 17:15, pozz wrote:
>> During startup, I need a short and not precise delay, before configuring
>> clocks, timers and other peripherals (at startup the CPU runs with
>> internal clock).
>>
>> What do you suggest?
>>
>> I think there's a simpler method than configuring a hardware timer.
>>
>> I need to check the status of an input pin, *after* enabling internal
>> pull-up. I'd like to introduce a short delay after enabling internal
>> pull-up, otherwise there's a risk I will read a transient level (maybe 0
>> or 1).
> 
> I found some interesting code in Atmel Software Framework. It is based
> on the following function defined in delay.c file:
> 
> __attribute__((optimize("-Os")))
> __attribute__ ((section(".ramfunc")))
> void portable_delay_cycles(unsigned long n)
> {
>     (void)n;
> 
>     __asm (
>         "loop: DMB    \n"
>         "SUB r0, r0, #1 \n"
>         "CMP r0, #0  \n"
>         "BNE loop         "
>     );
> }
> 

Someone should teach the Atmel folk about gcc inline assembly...

They might also realise that "written in ARM assembly" and "portable"
don't really go together!

> 
> In delay.h there are some macros:
> 
> void portable_delay_cycles(unsigned long n);
> 
> #define cpu_ms_2_cy(ms, f_cpu)  \
>         (((uint64_t)(ms) * (f_cpu) + (uint64_t)(7e3-1ul)) / (uint64_t)7e3)
> #define cpu_us_2_cy(us, f_cpu)  \
>         (((uint64_t)(us) * (f_cpu) + (uint64_t)(7e6-1ul)) / (uint64_t)7e6)
> 
> #define delay_cycles               portable_delay_cycles
> 
> #define cpu_delay_s(delay) delay_cycles(cpu_ms_2_cy(1000 * delay, F_CPU))
> #define cpu_delay_ms(delay) delay_cycles(cpu_ms_2_cy(delay, F_CPU))
> #define cpu_delay_us(delay) delay_cycles(cpu_us_2_cy(delay, F_CPU))
> 
> 
> However I didn't measure how precise are the delays generated by those
> functions.

For a multi-MHz cpu clock, they should certainly be better than µs
precision.  The point of the function is to put the code in ram, so that
its timing is not dependent on things like flash access delays.

Of course, if interrupts are enabled, it's a different matter.

Reply by pozz ● March 24, 2017

On 24/03/2017 12:57, David Brown wrote:
> On 24/03/17 12:23, pozz wrote:
>> On 23/02/2017 17:15, pozz wrote:
>>> During startup, I need a short and not precise delay, before configuring
>>> clocks, timers and other peripherals (at startup the CPU runs with
>>> internal clock).
>>>
>>> What do you suggest?
>>>
>>> I think there's a simpler method than configuring a hardware timer.
>>>
>>> I need to check the status of an input pin, *after* enabling internal
>>> pull-up. I'd like to introduce a short delay after enabling internal
>>> pull-up, otherwise there's a risk I will read a transient level (maybe 0
>>> or 1).
>>
>> I found some interesting code in Atmel Software Framework. It is based
>> on the following function defined in delay.c file:
>>
>> __attribute__((optimize("-Os")))
>> __attribute__ ((section(".ramfunc")))
>> void portable_delay_cycles(unsigned long n)
>> {
>>     (void)n;
>>
>>     __asm (
>>         "loop: DMB    \n"
>>         "SUB r0, r0, #1 \n"
>>         "CMP r0, #0  \n"
>>         "BNE loop         "
>>     );
>> }
>>
>
> Someone should teach the Atmel folk about gcc inline assembly...

What is wrong with that code?
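
One possible answer, offered as a sketch rather than as what David Brown had in mind: the function relies on n still being in r0 when the asm runs, it changes r0 and the condition flags without telling the compiler, and the global label "loop" will clash if the code is ever inlined or duplicated. A version using GCC extended asm might look like the following. The name spin_iterations, the early return for n == 0 and the non-ARM fallback are assumptions added for illustration. The DMB from the original is left out, so the cycle count per iteration differs and the 7e3/7e6 constants would not carry over; the timing of the ARM branch is not characterised here.

```c
#include <stdint.h>

/* Hypothetical replacement: spins for n loop iterations and returns the
 * remaining count (always 0). */
static inline uint32_t spin_iterations(uint32_t n)
{
    if (n == 0u) {
        return 0u;   /* a decrement-first loop would otherwise wrap around */
    }
#if defined(__arm__)
    __asm__ volatile (
        ".syntax unified      \n"
        "1: subs %0, %0, #1   \n"   /* decrement and set the flags */
        "   bne  1b           \n"   /* numeric local label: safe if inlined */
        : "+l" (n)                  /* n is read and written, low register */
        :
        : "cc"                      /* condition flags are modified */
    );
#else
    /* Fallback for non-ARM builds, e.g. host-side tests. The empty asm
     * makes the compiler assume n may change, so the loop is kept. */
    while (n != 0u) {
        __asm__ volatile ("" : "+r" (n));
        n--;
    }
#endif
    return n;
}
```

With the operand declared as "+l" (n), the compiler chooses the register and knows it is modified, so the function no longer depends on the calling convention leaving n in r0.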
