EmbeddedRelated.com
Forums

Short blocking delay in Cortex-M0+

Started by pozz February 23, 2017
On Friday, February 24, 2017 at 12:58:26 PM UTC-5, Stefan Reuther wrote:
> If you want a piece of assembly to show up in your binary, write that > piece of assembly (and not some piece of C code that happens to produce > that piece of assembly today). Even with volatile, the compiler is free > to decide between a register or a variable, possibly different increment > and compare instructions, unrolling the loop, etc.
Even that is not enough; GCC will optimize out assembler NOPs if you're not careful. For Cortex-M4, NOPs get eaten in the pipeline; can't remember about M0. Regardless, if you want a reliable delay: 1) You MUST check the delay on a scope, in RELEASE build with the optimization settings you expect for production. 2) You MUST recheck with the scope after any toolchain update or optimization setting change. Hope that helps! Best Regards, Dave PS: Just last week I had to do a short delay with a volatile iterator for loop, and check it on the scope ;-)
On Thursday, February 23, 2017 at 12:00:23 PM UTC-5, John Speth wrote:
> On 2/23/2017 8:15 AM, pozz wrote: > > During startup, I need a short and not precise delay, before configuring > > clocks, timers and other peripherals (at startup the CPU runs with > > internal clock). > > > > What do you suggest? > > > > I think there's a simpler method than configuring a hardware timer. > > > > I need to check the status of an input pin, *after* enabling internal > > pull-up. I'd like to introduce a short delay after enabling internal > > pull-up, otherwise there's a risk I will read a transient level (maybe 0 > > or 1). > > First thing after main() starts, configure your pin and run a spin loop > based delay, then read the pin. There's probably no need for a timer at > that stage of start up. > > void delay(void) > { > // Use volatile so the optimizer will not nullify this code > volatile int i; > for(i = 0; i < YOUR_DELAY; i++); > } > > JJS
How is it legal for the compiler to optimize out the iterator? If it does, your compiler is broken. Don't confuse <optimized out> from gdb with "I removed your code." That for loop will execute because you told the compiler to loop for YOUR_DELAY counts. It must emit that code even if 'i' is not visible in the debugger. Volatile is only needed when the compiler cannot see any possible path for the variable to change with in the current scope of execution, e.g. ISR, another thread, or a hardware mapped address. As for a delay in this case I'd read that GPIO port N times to create the delay. The port read will have a longer execution duration and you may be able to empirically determine how many reads it takes before it stabilizes. Often times port reads can be rather CPU clock independent and depend more on the port peripheral clock so you may even be able to get reliable timing regardless of the CPU clock.
On Wednesday, March 22, 2017 at 12:29:29 PM UTC-4, mrfirmware wrote:
> On Thursday, February 23, 2017 at 12:00:23 PM UTC-5, John Speth wrote: > > On 2/23/2017 8:15 AM, pozz wrote: > > > During startup, I need a short and not precise delay, before configuring > > > clocks, timers and other peripherals (at startup the CPU runs with > > > internal clock). > > > > > > What do you suggest? > > > > > > I think there's a simpler method than configuring a hardware timer. > > > > > > I need to check the status of an input pin, *after* enabling internal > > > pull-up. I'd like to introduce a short delay after enabling internal > > > pull-up, otherwise there's a risk I will read a transient level (maybe 0 > > > or 1). > > > > First thing after main() starts, configure your pin and run a spin loop > > based delay, then read the pin. There's probably no need for a timer at > > that stage of start up. > > > > void delay(void) > > { > > // Use volatile so the optimizer will not nullify this code > > volatile int i; > > for(i = 0; i < YOUR_DELAY; i++); > > } > > > > JJS > > How is it legal for the compiler to optimize out the iterator? If it does, your compiler is broken. Don't confuse <optimized out> from gdb with "I removed your code." That for loop will execute because you told the compiler to loop for YOUR_DELAY counts. It must emit that code even if 'i' is not visible in the debugger. Volatile is only needed when the compiler cannot see any possible path for the variable to change with in the current scope of execution, e.g. ISR, another thread, or a hardware mapped address. > > As for a delay in this case I'd read that GPIO port N times to create the delay. The port read will have a longer execution duration and you may be able to empirically determine how many reads it takes before it stabilizes. Often times port reads can be rather CPU clock independent and depend more on the port peripheral clock so you may even be able to get reliable timing regardless of the CPU clock.
Gah! I forgot about optimizers. Yes the -O2 flag will remove the for loop. It's not correct C though to do that. The port read is immune to optimizer magic.
Am 22.03.2017 um 17:48 schrieb mrfirmware:

> Gah! I forgot about optimizers. Yes the -O2 flag will remove the for > loop. It's not correct C though to do that.
Actually in the case at hand, it can still be considered correct, depending how you interpret some rather involved wording in the standard.
> The port read is immune to optimizer magic.
If there were one in the code under consideration, it might be. But there isn't, so the entire loop is fair game.
On 22/03/17 17:48, mrfirmware wrote:
> On Wednesday, March 22, 2017 at 12:29:29 PM UTC-4, mrfirmware wrote: >> On Thursday, February 23, 2017 at 12:00:23 PM UTC-5, John Speth wrote: >>> On 2/23/2017 8:15 AM, pozz wrote: >>>> During startup, I need a short and not precise delay, before configuring >>>> clocks, timers and other peripherals (at startup the CPU runs with >>>> internal clock). >>>> >>>> What do you suggest? >>>> >>>> I think there's a simpler method than configuring a hardware timer. >>>> >>>> I need to check the status of an input pin, *after* enabling internal >>>> pull-up. I'd like to introduce a short delay after enabling internal >>>> pull-up, otherwise there's a risk I will read a transient level (maybe 0 >>>> or 1). >>> >>> First thing after main() starts, configure your pin and run a spin loop >>> based delay, then read the pin. There's probably no need for a timer at >>> that stage of start up. >>> >>> void delay(void) >>> { >>> // Use volatile so the optimizer will not nullify this code >>> volatile int i; >>> for(i = 0; i < YOUR_DELAY; i++); >>> } >>> >>> JJS >> >> How is it legal for the compiler to optimize out the iterator? If it >> does, your compiler is broken. Don't confuse <optimized out> from gdb >> with "I removed your code." That for loop will execute because you told >> the compiler to loop for YOUR_DELAY counts. It must emit that code even >> if 'i' is not visible in the debugger. Volatile is only needed when the >> compiler cannot see any possible path for the variable to change with in >> the current scope of execution, e.g. ISR, another thread, or a hardware >> mapped address. >> >> As for a delay in this case I'd read that GPIO port N times to create >> the delay. The port read will have a longer execution duration and you >> may be able to empirically determine how many reads it takes before it >> stabilizes. Often times port reads can be rather CPU clock independent >> and depend more on the port peripheral clock so you may even be able to >> get reliable timing regardless of the CPU clock. > > Gah! I forgot about optimizers. Yes the -O2 flag will remove the for > loop. It's not correct C though to do that. The port read is immune to > optimizer magic. >
It is /entirely/ correct for the compiler to remove a delay loop like this if there is no "volatile" involved. For C, there are certain "observable behaviours" in a program. These are: 1. Accesses to volatile objects (reads or writes). Ordering is important, but timing is not. 2. Program start and exit. 3. Input and output of "interactive devices" through <stdio.h> functions like printf(), fread() and fwrite(). 4. The data written to files, at program termination. 5. Any function calls where the functions might do one of the above four things. The compiler has to generate code that produces the same results with respect to these "observable behaviours" as the "C abstract machine" does. And that is /it/. The compiler is free to do anything it wants, as long as these rules are followed. For embedded systems, this usually boils down to just "program start" and "volatile accesses". Even when you use "printf", the key point is that this will eventually use volatile accesses to send data to UART hardware. And function calls are considered "observable behaviour" if the compiler does not know for sure that they involve no volatile accesses. A loop that simple counts up does not access anything volatile - the compiler can freely remove it. The compiler can shuffle things around any way it wants, as long as it does not break the ordering of volatile accesses. It can remove or simplify any code it wants, as long as the results are that the same data is written out to volatile objects and the same reads are made of volatile objects. It can do this regardless of any optimisation settings. Using -O0 or no optimisation flags does /not/ turn off optimisations - it merely tells the compiler not to work too hard on optimising. Turning on -O2 can never make correct code (in the C sense) into incorrect code - it just asks the compiler to work harder to generate more efficient code that does exactly the same thing as before. And note that C has no concept of time - there is no way in C to express a delay, or to suggest that some code has to be faster or slower than other code. There is only ordering of volatiles and the expectation that a decent compiler won't insert extra slow code.
On Thu, 23 Mar 2017 10:05:51 +0100, David Brown wrote:

> Turning on -O2 can never make correct code (in the C sense) into > incorrect code - it just asks the compiler to work harder to > generate more efficient code that does exactly the same thing as > before.
Just wanted to double down on that -- if bumping up the optimization level screws the pooch, then your code is at fault. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!
On Thursday, March 23, 2017 at 1:54:41 PM UTC-4, Tim Wescott wrote:
> On Thu, 23 Mar 2017 10:05:51 +0100, David Brown wrote: > > > Turning on -O2 can never make correct code (in the C sense) into > > incorrect code - it just asks the compiler to work harder to > > generate more efficient code that does exactly the same thing as > > before. > > Just wanted to double down on that -- if bumping up the optimization > level screws the pooch, then your code is at fault. > > -- > > Tim Wescott > Wescott Design Services > http://www.wescottdesign.com > > I'm looking for work -- see my website!
Yeah, in retrospect what was I thinking? I guess this is why I just read or write the port pin multiple times to sample reliably/create a pulse. Or snapshot a free running hw timer and wait for it to exceed some offset from the snapshot time.
Il 23/02/2017 17:15, pozz ha scritto:
> During startup, I need a short and not precise delay, before configuring > clocks, timers and other peripherals (at startup the CPU runs with > internal clock). > > What do you suggest? > > I think there's a simpler method than configuring a hardware timer. > > I need to check the status of an input pin, *after* enabling internal > pull-up. I'd like to introduce a short delay after enabling internal > pull-up, otherwise there's a risk I will read a transient level (maybe 0 > or 1).
I found some interesting code in Atmel Software Framework. It is based on the following function defined in delay.c file: __attribute__((optimize("-Os"))) __attribute__ ((section(".ramfunc"))) void portable_delay_cycles(unsigned long n) { (void)n; __asm ( "loop: DMB \n" "SUB r0, r0, #1 \n" "CMP r0, #0 \n" "BNE loop " ); } In delay.h there are some macros: void portable_delay_cycles(unsigned long n); #define cpu_ms_2_cy(ms, f_cpu) \ (((uint64_t)(ms) * (f_cpu) + (uint64_t)(7e3-1ul)) / (uint64_t)7e3) #define cpu_us_2_cy(us, f_cpu) \ (((uint64_t)(us) * (f_cpu) + (uint64_t)(7e6-1ul)) / (uint64_t)7e6) #define delay_cycles portable_delay_cycles #define cpu_delay_s(delay) delay_cycles(cpu_ms_2_cy(1000 * delay, F_CPU)) #define cpu_delay_ms(delay) delay_cycles(cpu_ms_2_cy(delay, F_CPU)) #define cpu_delay_us(delay) delay_cycles(cpu_us_2_cy(delay, F_CPU)) However I didn't measure how precise are the delays generated by those functions.
On 24/03/17 12:23, pozz wrote:
> Il 23/02/2017 17:15, pozz ha scritto: >> During startup, I need a short and not precise delay, before configuring >> clocks, timers and other peripherals (at startup the CPU runs with >> internal clock). >> >> What do you suggest? >> >> I think there's a simpler method than configuring a hardware timer. >> >> I need to check the status of an input pin, *after* enabling internal >> pull-up. I'd like to introduce a short delay after enabling internal >> pull-up, otherwise there's a risk I will read a transient level (maybe 0 >> or 1). > > I found some interesting code in Atmel Software Framework. It is based > on the following function defined in delay.c file: > > __attribute__((optimize("-Os"))) > __attribute__ ((section(".ramfunc"))) > void portable_delay_cycles(unsigned long n) > { > (void)n; > > __asm ( > "loop: DMB \n" > "SUB r0, r0, #1 \n" > "CMP r0, #0 \n" > "BNE loop " > ); > } >
Someone should teach the Atmel folk about gcc inline assembly... They might also realise that "written in AVR32 assembly" and "portable" don't really go together!
> > In delay.h there are some macros: > > void portable_delay_cycles(unsigned long n); > > #define cpu_ms_2_cy(ms, f_cpu) \ > (((uint64_t)(ms) * (f_cpu) + (uint64_t)(7e3-1ul)) / (uint64_t)7e3) > #define cpu_us_2_cy(us, f_cpu) \ > (((uint64_t)(us) * (f_cpu) + (uint64_t)(7e6-1ul)) / (uint64_t)7e6) > > #define delay_cycles portable_delay_cycles > > #define cpu_delay_s(delay) delay_cycles(cpu_ms_2_cy(1000 * delay, F_CPU)) > #define cpu_delay_ms(delay) delay_cycles(cpu_ms_2_cy(delay, F_CPU)) > #define cpu_delay_us(delay) delay_cycles(cpu_us_2_cy(delay, F_CPU)) > > > However I didn't measure how precise are the delays generated by those > functions.
For a multi-MHz cpu clock, they should certainly be better than &#4294967295;s precision. The point of the function is to put the code in ram, so that its timing is not dependent on things like flash access delays. Of course, if interrupts are enabled, it's a different matter.
Il 24/03/2017 12:57, David Brown ha scritto:
> On 24/03/17 12:23, pozz wrote: >> Il 23/02/2017 17:15, pozz ha scritto: >>> During startup, I need a short and not precise delay, before configuring >>> clocks, timers and other peripherals (at startup the CPU runs with >>> internal clock). >>> >>> What do you suggest? >>> >>> I think there's a simpler method than configuring a hardware timer. >>> >>> I need to check the status of an input pin, *after* enabling internal >>> pull-up. I'd like to introduce a short delay after enabling internal >>> pull-up, otherwise there's a risk I will read a transient level (maybe 0 >>> or 1). >> >> I found some interesting code in Atmel Software Framework. It is based >> on the following function defined in delay.c file: >> >> __attribute__((optimize("-Os"))) >> __attribute__ ((section(".ramfunc"))) >> void portable_delay_cycles(unsigned long n) >> { >> (void)n; >> >> __asm ( >> "loop: DMB \n" >> "SUB r0, r0, #1 \n" >> "CMP r0, #0 \n" >> "BNE loop " >> ); >> } >> > > Someone should teach the Atmel folk about gcc inline assembly...
What is wrong with that code?