EmbeddedRelated.com

gnu compiler optimizes out "asm" statements

Started by Tim Wescott May 28, 2015
On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Thu, 28 May 2015 19:06:04 +0000, Simon Clubley wrote:
>
>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>
>>> Another data point: I'm optimizing at O1. When I build at O0, it
>>> works.
>>>
>> In that case, try my suggestion of marking the asm statement itself as
>> volatile.
>>
>> Simon.
>
> The compiler doesn't allow that.
>

It works for me in C. What syntax are you using ?

Here's one example from a test program:

    asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:

> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>> On Thu, 28 May 2015 19:06:04 +0000, Simon Clubley wrote:
>>
>>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>>
>>>> Another data point: I'm optimizing at O1. When I build at O0, it
>>>> works.
>>>>
>>> In that case, try my suggestion of marking the asm statement itself as
>>> volatile.
>>>
>>> Simon.
>>
>> The compiler doesn't allow that.
>>
> It works for me in C. What syntax are you using ?
>
> Here's one example from a test program:
>
> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");

I need to take my brain out and examine it under a microscope to see how large it is, apparently.

I was using "volatile asm". "asm volatile" compiles, and works great, to boot.

So -- more kosher than setting the "optimize" attribute of the whole function to "O0", do you think?

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
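
The ordering point above can be shown in a few lines. This is only a minimal sketch: the helper name `keep_value` is made up for illustration, and the empty template with a "+r" operand is a portable stand-in for a real instruction so that it builds with GCC or Clang on any target.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: passes x through an empty
// asm statement that the optimizer is not allowed to delete.
static inline std::uint32_t keep_value(std::uint32_t x)
{
    // Accepted: the qualifier comes after the asm keyword.
    asm volatile("" : "+r"(x));

    // Rejected, as reported in this thread: qualifier before the keyword.
    // volatile asm("" : "+r"(x));

    return x;
}
```

Calling `keep_value(42u)` simply returns its argument; the interesting part is which of the two spellings the compiler accepts.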
On Thu, 28 May 2015 14:17:33 -0500, Tim Wescott wrote:

> On Thu, 28 May 2015 13:27:31 -0500, Tim Wescott wrote:
>
>> This is related to my question about interrupts in an STM32F303
>> processor. It turns out that the problem is in the compiler (or I'm
>> going insane, which is never outside the realm of possibility when I'm
>> working on embedded software).
>>
>> I'm coding in C++, and I'm using a clever dodge for protecting chunks
>> of code from getting interrupted. Basically, I have a class that
>> protects a block of code from being interrupted. The constructor saves
>> the interrupt state then disables interrupts, and the destructor
>> restores interrupts.
>>
>> This has been reliable for me for years, but now the destructor is not
>> being called. I suspect that the optimizer can't make sense of it
>> because of the asm statements, and is throwing it away.
>>
>> If someone knows the proper gnu-magic to tell the optimizer not to do
>> that, I'd appreciate it. I'm going to look in my documentation, but I
>> want to make sure I use the right method, and don't just stumble onto
>> something that works for now but should be deprecated, or is fragile,
>> or whatever.
>>
>> Here's the "protect a block" class:
>>
>> typedef class CProtect {
>> public:
>>
>>   CProtect(void)
>>   {
>>     int primask_copy;
>>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>>         "cpsid i\n\t"                       // disable interrupts
>>         : [primask_copy] "=r" (primask_copy));
>>     _primask = primask_copy;
>>   }
>>
>>   ~CProtect()
>>   {
>>     int primask_copy = _primask;
>>     // Restore interrupts to their previous value
>>     asm("msr primask, %[primask_copy]"
>>         : : [primask_copy] "r" (primask_copy));
>>   }
>>
>> private:
>>   volatile int _primask;
>> } CProtect;
>>
>> and here's how it's used:
>>
>> {
>>   CProtect protect;
>>
>>   // critical code goes here
>> }
>
> This works (with the optimize attribute specified for each function, and
> the level set at O0), but I would like some opinions on whether it is
> kosher. It works even when the overall optimization level is set to
> "O3", which is cool.
>
> typedef class CProtect {
> public:
>
>   CProtect(void) __attribute__ ((__optimize__ ("O0")))
>   {
>     int primask_copy;
>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>         "cpsid i\n\t"                       // disable interrupts
>         : [primask_copy] "=r" (primask_copy));
>     _primask = primask_copy;
>   }
>
>   ~CProtect() __attribute__ ((__optimize__ ("O0")))
>   {
>     int primask_copy = _primask;
>     // Restore interrupts to their previous value
>     asm("msr primask, %[primask_copy]"
>         : : [primask_copy] "r" (primask_copy));
>   }
>
> private:
>   volatile int _primask;
> } CProtect;

This also works (note commented-out optimize attributes, and "asm volatile"):

    class CProtect {
    public:

      CProtect(void) // __attribute__ ((__optimize__ ("O0")))
      {
        int primask_copy;
        asm volatile ("mrs %[primask_copy], primask\n\t"  // save interrupt
                      "cpsid i\n\t"                       // disable interrupts
                      : [primask_copy] "=r" (primask_copy));
        _primask = primask_copy;
      }

      ~CProtect() // __attribute__ ((__optimize__ ("O0")))
      {
        int primask_copy = _primask;
        // Restore interrupts to their previous value
        asm volatile ("msr primask, %[primask_copy]"
                      : : [primask_copy] "r" (primask_copy));
      }

    private:
      volatile int _primask;
    };

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>
> This works (with the optimize attribute specified for each function, and
> the level set at O0), but I would like some opinions on whether it is
> kosher. It works even when the overall optimization level is set to "O3",
> which is cool.
>
> typedef class CProtect
> {
> public:
>
>   CProtect(void) __attribute__ ((__optimize__ ("O0")))
>   {
>     int primask_copy;

[Code example snipped.]

Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-)

The code feels to me like you are trying to trick the compiler instead of solving the core problem and the proposed solution feels "fragile".

Are you sure you can't use "asm volatile" with C++ code ?

I don't know if that would solve your problem but if it did, it would feel more "legitimate" to me as volatile is documented to behave in certain ways as you can see from the page I pointed you to.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>> It works for me in C. What syntax are you using ?
>>
>> Here's one example from a test program:
>>
>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>
> I need to take my brain out and examine it under a microscope to see how
> large it is, apparently.
>
> I was using "volatile asm". "asm volatile" compiles, and works great, to
> boot.
>
> So -- more kosher than setting the "optimize" attribute of the whole
> function to "O0", do you think?
>

Certainly (at least based on previous experience).

It will be interesting to see if others agree or if there's any issues I have not thought of.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On 28.5.15 22:30, Simon Clubley wrote:
> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>>> It works for me in C. What syntax are you using ?
>>>
>>> Here's one example from a test program:
>>>
>>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>>
>> I need to take my brain out and examine it under a microscope to see how
>> large it is, apparently.
>>
>> I was using "volatile asm". "asm volatile" compiles, and works great, to
>> boot.
>>
>> So -- more kosher than setting the "optimize" attribute of the whole
>> function to "O0", do you think?
>>
>
> Certainly (at least based on previous experience).
>
> It will be interesting to see if others agree or if there's any issues
> I have not thought of.
>
> Simon.
>

For embedded code, my favorite is -Os.

--
-TV
On 2015-05-28, Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
> On 28.5.15 22:30, Simon Clubley wrote:
>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>>>> It works for me in C. What syntax are you using ?
>>>>
>>>> Here's one example from a test program:
>>>>
>>>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>>>
>>> I need to take my brain out and examine it under a microscope to see how
>>> large it is, apparently.
>>>
>>> I was using "volatile asm". "asm volatile" compiles, and works great, to
>>> boot.
>>>
>>> So -- more kosher than setting the "optimize" attribute of the whole
>>> function to "O0", do you think?
>>>
>>
>> Certainly (at least based on previous experience).
>>
>> It will be interesting to see if others agree or if there's any issues
>> I have not thought of.
>>
>
> For embedded code, my favorite is -Os.
>

Interesting. How does -Os change the behaviour of asm volatile ?

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On 28/05/15 21:06, Simon Clubley wrote:
> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>
>> Another data point: I'm optimizing at O1. When I build at O0, it works.
>>
>
> In that case, try my suggestion of marking the asm statement itself
> as volatile.
>
> Simon.
>

That is almost certainly the issue.

Some compilers consider inline assembly as "volatile" - they view them as something scary, and make sure everything before them is completely finished before executing the secret assembly code, and basically turn off all optimisation around the inline assembly call.

gcc (and clang, and a few other compilers) is not like that - it provides ways for the programmer to tell the compiler exactly what the assembly code affects or depends on, so that it can optimise around it. This is extremely useful for some sorts of inline assembly, and it lets you make good use of processor instructions that cannot easily be expressed in C (such as a bit reverse instruction) with only the bare minimum being written in assembly. It also means you don't have to mess around with things like the "primask_copy" variable in this CProtect class - gcc understands these things, and makes copies in registers as needed.

The flipside is that you have to know the rules, and be very careful to apply them.

A key rule here is "volatile". A normal inline assembly instruction is considered non-volatile - the compiler is free to omit it if it is dead code, and can re-order it as it finds convenient. (Inline assembly statements with no outputs, and whose inputs don't involve addresses, are considered "volatile" by default as they would be pointless if they didn't do something unknown to the compiler.) So step one is to make the inline assembly statements "volatile" so the compiler knows it has to execute them, and it has to do so in order.

The second key rule is the interaction of "volatile" accesses (either volatile reads and writes, volatile inline assembly, or calls to unknown external code) and normal accesses. C does not specify this ordering in any way. So in code like this:

    int a;
    volatile int v;

    void foo(void) {
        a = 0;
        v = 1;
        a++;
        v = 2;
        a++;
    }

the compiler can re-arrange writes to "a" with writes to "v". It can replace all accesses to a with a "a = 2;", and it can put that before, in the middle, or at the end of the two volatile writes to v.

The same applies to volatile assembly. Consider this:

    uint64_t big;

    void atomic_write(uint64_t x) {
        asm volatile("disableInterrupts");
        big = x;
        asm volatile("enableInterrupts");
    }

This will not work, except by luck - the compiler can re-order the write to "big" with respect to the interrupt disable/enable, and therefore destroy your hopes of making an atomic write.

The way to deal with this is either by making the write to "big" volatile, by adding artificial volatile dependencies that enforce the order, or by using "clobbers" in the assembly statements. Clobbers can be quite sophisticated when you want to get the maximal performance (by using minimal clobbers), but the easiest and therefore safest method is to clobber "memory":

    void atomic_write(uint64_t x) {
        asm volatile("disableInterrupts" ::: "memory");
        big = x;
        asm volatile("enableInterrupts" ::: "memory");
    }

The memory clobber tells the compiler that the inline assembly might read or write memory in unexpected ways - all statements that logically write something to memory that appear before the inline assembly, must complete those writes. And any logical reads from memory after the inline assembly, cannot be started until after the assembly. Data from memory cannot be cached in registers across the assembly.

This is often used with an empty inline assembly:

    static inline void compilerBarrier(void) {
        asm volatile("" ::: "memory");
    }

Once we have cleaned up the other minor issues in your class (the unnecessary "volatile" on the private member, the unnecessary typedef, the use of "int" instead of "uint32_t", and the use of reserved identifiers with leading underscores), we get this:

    #include <stdint.h>

    class CProtect {
    public :
        CProtect(void)
        {
            asm volatile("mrs %[primask_], primask\n"
                         "cpsid i"
                         : [primask_] "=r" (primask_) : : "memory");
        }

        ~CProtect()
        {
            asm volatile("msr primask, %[primask_]"
                         : : [primask_] "r" (primask_) : "memory");
        }

    private :
        uint32_t primask_;
    };

    extern uint64_t big;

    void atomic_write(uint64_t x) {
        CProtect protect;
        big = x;
    }

Compiling with this command line (using the usual optimisation setting -Os):

    /opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb

gives this assembly:

    21               _Z12atomic_writey:
    22                   .fnstart
    23               .LFB6:
    24                   @ args = 0, pretend = 0, frame = 0
    25                   @ frame_needed = 0, uses_anonymous_args = 0
    26                   @ link register save eliminated.
    27                   @ 9 "a.cpp" 1
    28 0000 EFF31083     mrs r3, primask
    29 0004 72B6         cpsid i
    30                   @ 0 "" 2
    31                   .thumb
    32 0006 034A         ldr r2, .L2
    33 0008 C2E90001     strd r0, [r2]
    34                   @ 15 "a.cpp" 1
    35 000c 83F31088     msr primask, r3
    36                   @ 0 "" 2
    37                   .thumb
    38 0010 7047         bx lr
    39               .L3:
    40 0012 00BF         .align 2
    41               .L2:
    42 0014 00000000     .word big
    43                   .cantunwind
    44                   .fnend

And that, I believe, is both correct and optimal.
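
One property of the save-and-restore pattern that is easy to check off-target is that guards nest: an inner guard puts back "disabled", not "enabled". The following is only a host-side sketch under stated assumptions. `fake_primask`, `save_and_disable`, `restore`, `HostProtect` and `nested_demo` are all hypothetical names, and a plain variable stands in for the PRIMASK register; the only piece taken from the post is the empty `asm volatile("" ::: "memory")` compiler barrier.

```cpp
#include <cstdint>

// Stand-in for the PRIMASK register so the sketch runs on a host.
// On a Cortex-M target the two helpers below would be the mrs/cpsid
// and msr instructions, each carrying its own "memory" clobber.
static std::uint32_t fake_primask = 0;   // 0 = interrupts enabled

static inline std::uint32_t save_and_disable()
{
    std::uint32_t previous = fake_primask;
    fake_primask = 1;                    // simulate masking interrupts
    asm volatile("" ::: "memory");       // compiler barrier
    return previous;
}

static inline void restore(std::uint32_t previous)
{
    asm volatile("" ::: "memory");       // compiler barrier
    fake_primask = previous;             // put back the saved state
}

class HostProtect {
public:
    HostProtect() : saved_(save_and_disable()) {}
    ~HostProtect() { restore(saved_); }

    // A guard owns one saved state, so copying it makes no sense.
    HostProtect(const HostProtect&) = delete;
    HostProtect& operator=(const HostProtect&) = delete;

private:
    std::uint32_t saved_;
};

// Returns the simulated mask as seen after the inner guard is destroyed
// but while the outer guard is still alive.
static std::uint32_t nested_demo()
{
    HostProtect outer;
    {
        HostProtect inner;
    }
    return fake_primask;
}
```

In this simulation `nested_demo()` reports the mask as still set after the inner guard goes away, and the mask only returns to its original value once the outer guard is destroyed. A pair of unconditional disable/enable calls would not have that property.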
On Thu, 28 May 2015 19:26:15 +0000, Simon Clubley wrote:

> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>
>> This works (with the optimize attribute specified for each function,
>> and the level set at O0), but I would like some opinions on whether it
>> is kosher. It works even when the overall optimization level is set to
>> "O3", which is cool.
>>
>> typedef class CProtect {
>> public:
>>
>>   CProtect(void) __attribute__ ((__optimize__ ("O0")))
>>   {
>>     int primask_copy;
>
> [Code example snipped.]
>
> Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-)
Well, there's a reason I'm tossing it out to the group for comment!
>
> The code feels to me like you are trying to trick the compiler instead
> of solving the core problem and the proposed solution feels "fragile".
Me, too. Actually, I had been compiling at -O1, possibly because with the Cortex M3 processor set it worked at that level but not higher.
> Are you sure you can't use "asm volatile" with C++ code ?
I can. I just can't use "volatile asm". See my own reply that's parallel with yours.
> I don't know if that would solve your problem but if it did, it would
> feel more "legitimate" to me as volatile is documented to behave in
> certain ways as you can see from the page I pointed you to.
>
"asm volatile" certainly seems to fix the issue (which ended up being that the optimizer had an extraneous call to part of the constructor, not a missing call to the destructor, BTW). -- www.wescottdesign.com
On 28/05/15 21:41, Simon Clubley wrote:
> On 2015-05-28, Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
>> On 28.5.15 22:30, Simon Clubley wrote:
>>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>>>>> It works for me in C. What syntax are you using ?
>>>>>
>>>>> Here's one example from a test program:
>>>>>
>>>>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>>>>
>>>> I need to take my brain out and examine it under a microscope to see how
>>>> large it is, apparently.
>>>>
>>>> I was using "volatile asm". "asm volatile" compiles, and works great, to
>>>> boot.
>>>>
>>>> So -- more kosher than setting the "optimize" attribute of the whole
>>>> function to "O0", do you think?
>>>>
The general rule is that if you think you need to reduce optimisation to make your code work, your code is wrong. Very occasionally, the compiler is broken - but that should be rare indeed.
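
A common instance of code that only "works" with optimisation turned down is a missing volatile on a flag shared with an interrupt handler. The sketch below is illustrative only: `data_ready` and `spin_until_ready` are hypothetical names, and no real interrupt is involved.

```cpp
// Hypothetical flag that an interrupt handler would set on a real target.
static volatile bool data_ready = false;

// Spins until the flag is set and returns how many polls it took.
// Because data_ready is volatile, every iteration performs a real load.
// If the qualifier were dropped, an optimizing build would be allowed to
// hoist the load out of the loop, so a flag set later might never be seen.
static unsigned spin_until_ready()
{
    unsigned polls = 0;
    while (!data_ready) {
        ++polls;
    }
    return polls;
}
```

Built at -O0 the non-volatile version tends to reload the flag on every pass and appears fine, which is how this kind of bug stays hidden until the optimisation level is raised.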
>>>
>>> Certainly (at least based on previous experience).
>>>
>>> It will be interesting to see if others agree or if there's any issues
>>> I have not thought of.
>>>
>>
>> For embedded code, my favorite is -Os.
>>
>
> Interesting. How does -Os change the behaviour of asm volatile ?
>
> Simon.
>
"-Os" does most of the "-O2" optimisations, except for an emphasis on smaller size if the speed optimisation in "-O2" would expand the code significantly. (Note that you still get inlining and occasional loop unrolling - but only if the result is smaller code, or if you asked for the inlining explicitly.) As always with optimisation flags, it keeps correct code correct - but makes it more likely that poor code (such as missing or incorrect volatiles) breaks dramatically.