EmbeddedRelated.com

Short blocking delay in Cortex-M0+

Started by pozz February 23, 2017
On 24/03/17 13:54, pozz wrote:
> On 24/03/2017 12:57, David Brown wrote:
>> On 24/03/17 12:23, pozz wrote:
>>> On 23/02/2017 17:15, pozz wrote:
>>>> During startup, I need a short and not precise delay, before
>>>> configuring clocks, timers and other peripherals (at startup the
>>>> CPU runs with internal clock).
>>>>
>>>> What do you suggest?
>>>>
>>>> I think there's a simpler method than configuring a hardware timer.
>>>>
>>>> I need to check the status of an input pin, *after* enabling internal
>>>> pull-up. I'd like to introduce a short delay after enabling internal
>>>> pull-up, otherwise there's a risk I will read a transient level
>>>> (maybe 0 or 1).
>>>
>>> I found some interesting code in Atmel Software Framework. It is based
>>> on the following function defined in delay.c file:
>>>
>>> __attribute__((optimize("-Os")))
>>> __attribute__ ((section(".ramfunc")))
>>> void portable_delay_cycles(unsigned long n)
>>> {
>>>     (void)n;
>>>
>>>     __asm (
>>>         "loop: DMB \n"
>>>         "SUB r0, r0, #1 \n"
>>>         "CMP r0, #0 \n"
>>>         "BNE loop "
>>>     );
>>> }
>>>
>>
>> Someone should teach the Atmel folk about gcc inline assembly...
>
> What is wrong with that code?
>
That is written in the old "basic" format for inline assembly, which for most purposes has been replaced by the "extended" format. As it says in the manual:

"Using extended asm (see Extended Asm) typically produces smaller, safer, and more efficient code, and in most cases it is a better solution than basic asm."

This would have been better written:

    __attribute__((section(".ramfunc")))
    void portable_delay_cycles(unsigned long n)
    {
        asm volatile (
            "   dmb\n"
            "1:\n"
            "   sub %[n], %[n], #1\n"
            "   cmp %[n], #0\n"
            "   bne 1b"
            : [n] "+r" (n) :: "cc");
    }

I would actually prefer such functions to be declared "static inline", and not as a separately compiled function in a different segment, to avoid function call overhead in the delay calculation. But I appreciate the thought of putting it into ram like this.

There is very little point in specifying an optimisation attribute for a function containing nothing but assembly!

When written as basic assembly, the compiler has fewer opportunities to do much with the code. Extended assembly gives the compiler details of exactly what you need, and exactly what you are doing. If the compiler can move the code around (such as for a "static inline" function, or if you have link-time optimisation enabled), then the compiler is able to generate better code. With the basic assembly, the compiler must assume that the assembly code will corrupt the volatile registers r0..r3, lr and the condition codes. With extended assembly, the compiler knows exactly which registers are in use - and it does not have to pick r0 for the loop counter. It also means the compiler knows exactly how "n" is going to be used.

Also, there is no need to put a "data memory barrier" /inside/ the loop! I don't quite see how Atmel count 7 clocks per loop iteration - I see 6 myself. But putting the DMB outside the loop reduces it by two clocks per iteration, thus increasing the resolution.

Finally, they might like to note that:

    void portable_delay_cycles(unsigned long n) {
        while (n--) {
            asm volatile ("");
        }
    }

gives a faster loop, as well as actually being portable!

The best code for accurate delays is usually made by having static inline functions, so that the compiler can calculate and compensate for loop entry/exit directly, and use whatever registers make most sense for efficient code.
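For the original use case (a short, imprecise wait after enabling a pull-up, before any timer is configured), the "static inline" approach might look like the sketch below. The names F_CPU_HZ, DELAY_CYCLES_PER_LOOP, delay_us_to_loops, delay_loops and delay_us_approx are hypothetical, and both the 1 MHz clock and the 3 cycles per iteration are assumptions that would need checking against the actual part and the generated code.

```c
#include <stdint.h>

/* Assumed values for illustration only; check them for your part. */
#define F_CPU_HZ              1000000u  /* hypothetical start-up clock, in Hz */
#define DELAY_CYCLES_PER_LOOP 3u        /* assumed cost of one loop iteration */

/* Convert microseconds to loop iterations, rounding up at both steps
   so the resulting delay errs on the long side. */
static inline uint32_t delay_us_to_loops(uint32_t us)
{
    uint64_t cycles = ((uint64_t)us * F_CPU_HZ + 999999u) / 1000000u;
    return (uint32_t)((cycles + DELAY_CYCLES_PER_LOOP - 1u) / DELAY_CYCLES_PER_LOOP);
}

/* Busy-wait for n iterations; the empty asm statement stops the
   compiler from removing the otherwise empty loop. */
static inline void delay_loops(uint32_t n)
{
    while (n--) {
        __asm__ volatile ("");
    }
}

/* Approximate delay: loop entry/exit and any call overhead are ignored. */
static inline void delay_us_approx(uint32_t us)
{
    delay_loops(delay_us_to_loops(us));
}
```

With a constant argument and optimisation enabled, the compiler can usually fold the conversion at compile time, so a call such as delay_us_approx(10) between enabling the pull-up and reading the pin should reduce to loading a constant and running the loop.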
On 24.3.17 16:21, David Brown wrote:
> This would have been better written:
>
>     __attribute__((section(".ramfunc")))
>     void portable_delay_cycles(unsigned long n)
>     {
>         asm volatile (
>             "   dmb\n"
>             "1:\n"
>             "   sub %[n], %[n], #1\n"
>             "   cmp %[n], #0\n"
>             "   bne 1b"
>             : [n] "+r" (n) :: "cc");
>     }
>
> Finally, they might like to note that:
>
>     void portable_delay_cycles(unsigned long n) {
>         while (n--) {
>             asm volatile ("");
>         }
>     }
>
> gives a faster loop, as well as actually being portable!
>
I would delete the cmp instruction and change the sub instruction to a subs instruction.

One more vote for David's C version.

--
-TV
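As a rough illustration of the cmp/subs suggestion, the extended-asm loop could be reduced to two instructions as sketched below. This is untested on hardware. The function name delay_loops_subs is hypothetical. The "+l" (low register) constraint and the ".syntax unified" directive are assumptions about what GNU as needs to accept "subs" when targeting Thumb-1, and the zero check is an addition so that n == 0 does not wrap around to a very long delay. The non-ARM branch exists only so the sketch compiles elsewhere.

```c
/* Two-instruction delay loop: subs sets the flags, so no cmp is needed. */
static inline void delay_loops_subs(unsigned long n)
{
    if (n == 0) {
        return;                 /* avoid wrapping to 0xFFFFFFFF iterations */
    }
#if defined(__arm__)
    __asm__ volatile (
        "   .syntax unified\n"  /* assumed necessary for 'subs' on Thumb-1 */
        "1:\n"
        "   subs %[n], %[n], #1\n"
        "   bne  1b"
        : [n] "+l" (n)          /* low register, read and written */
        :
        : "cc");                /* condition flags are modified */
#else
    /* Portable fallback for non-ARM hosts. */
    while (n--) {
        __asm__ volatile ("");
    }
#endif
}
```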
On 24/03/17 20:26, Tauno Voipio wrote:
> On 24.3.17 16:21, David Brown wrote:
>>
>> Finally, they might like to note that:
>>
>>     void portable_delay_cycles(unsigned long n) {
>>         while (n--) {
>>             asm volatile ("");
>>         }
>>     }
>>
>> gives a faster loop, as well as actually being portable!
>>
>> The best code for accurate delays is usually made by having static
>> inline functions, so that the compiler can calculate and compensate for
>> loop entry/exit directly, and use whatever registers make most sense for
>> efficient code.
>>
>
> I would delete the cmp instruction and change the sub instruction
> to a subs instruction.

That is what gcc generates for the C loop above (I said it gave a faster loop, but did not post the generated code).
>
> One more vote to the C version of David.
>

Yes. The " asm volatile(""); " trick might be new to some people - it tells the compiler "pretend something important is happening here, even though there is no code". It is cheaper than the traditional idea of making the loop variable volatile to force the compiler to keep the loop, or the alternative of using an assembly "nop" instruction.
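To make the comparison concrete, the three ways of keeping a delay loop alive mentioned above could be written as in the sketch below. The function names are hypothetical, and the relative cost of each variant depends on the compiler and options, so the generated code is worth inspecting before relying on any timing.

```c
/* 1. Empty asm statement: emits no instruction, but the compiler
      must keep the loop because the statement is volatile. */
static inline void delay_barrier(unsigned long n)
{
    while (n--) {
        __asm__ volatile ("");
    }
}

/* 2. Volatile loop variable: the counter is typically kept in memory,
      so every iteration pays for a load and a store. */
static inline void delay_volatile(unsigned long n)
{
    for (volatile unsigned long i = n; i != 0; i--) {
        /* empty body */
    }
}

/* 3. Explicit nop: one extra instruction per iteration. */
static inline void delay_nop(unsigned long n)
{
    while (n--) {
        __asm__ volatile ("nop");
    }
}
```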