EmbeddedRelated.com
Forums

gnu compiler optimizes out "asm" statements

Started by Tim Wescott May 28, 2015
On 28/05/15 21:15, Tim Wescott wrote:
> On Thu, 28 May 2015 21:10:08 +0200, Lanarcam wrote: > >> Le 28/05/2015 21:04, Tim Wescott a écrit : >>> On Thu, 28 May 2015 13:27:31 -0500, Tim Wescott wrote: >>> >>>> This is related to my question about interrupts in an STM32F303 >>>> processor. It turns out that the problem is in the compiler (or I'm >>>> going insane, which is never outside the realm of possibility when I'm >>>> working on embedded software). >>>> >>>> I'm coding in C++, and I'm using a clever dodge for protecting chunks >>>> of code from getting interrupted. Basically, I have a class that >>>> protects a block of code from being interrupted. The constructor >>>> saves the interrupt state then disables interrupts, and the destructor >>>> restores interrupts. >>>> >>>> This has been reliable for me for years, but now the destructor is not >>>> being called. I suspect that the optimizer can't make sense of it >>>> because of the asm statements, and is throwing it away. >>>> >>>> If someone knows the proper gnu-magic to tell the optimizer not to do >>>> that, I'd appreciate it. I'm going to look in my documentation, but I >>>> want to make sure I use the right method, and don't just stumble onto >>>> something that works for now but should be depreciated, or is fragile, >>>> or whatever. >>>> >>>> Here's the "protect a block" class: >>>> >>>> typedef class CProtect { >>>> public: >>>> >>>> CProtect(void) >>>> { >>>> int primask_copy; >>>> asm("mrs %[primask_copy], primask\n\t" // save interrupt >>>> status >>>> "cpsid i\n\t" // disable interrupts : >>>> [primask_copy] >>>> "=r" (primask_copy)); >>>> _primask = primask_copy; >>>> } >>>> >>>> ~CProtect() >>>> { >>>> int primask_copy = _primask; >>>> // Restore interrupts to their previous value asm("msr primask, >>>> %[primask_copy]" : : [primask_copy] >>>> "r" (primask_copy)); >>>> } >>>> >>>> private: >>>> volatile int _primask; >>>> } CProtect; >>>> >>>> and here's how it's used: >>>> >>>> { >>>> CProtect protect; >>>> >>>> // critical code goes here >>>> } >>> >>> Another data point: I'm optimizing at O1. When I build at O0, it >>> works. >>> >> Could you try: Cprotect *protect = new Cprotect(); > > I could, but here at Wescott Design Services we have a fairly hard to > overcome rule that says "don't thrash the heap". My boss would kill me, > which would hurt me twice because I'm my boss. >
In addition, the code would be bigger, slower, open to out-of-memory or memory fragmentation issues, pull in a large chunk of library (if this is the only "new"), and it would still be wrong.
On 28/05/15 21:57, Tim Wescott wrote:
> On Thu, 28 May 2015 19:26:15 +0000, Simon Clubley wrote: > >> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote: >>> >>> This works (with the optimize attribute specified for each function, >>> and the level set at O0), but I would like some opinions on whether it >>> is kosher. It works even when the overall optimization level is set to >>> "O3", >>> which is cool. >>> >>> typedef class CProtect { >>> public: >>> >>> CProtect(void) __attribute__ ((__optimize__ ("O0"))) >>> { >>> int primask_copy; >> >> [Code example snipped.] >> >> Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-) > > Well, there's a reason I'm tossing it out to the group for comment!
It was a good question - I hope my other reply post is useful to others in this group too.
>> >> The code feels to me like you are trying to trick the compiler instead >> of solving the core problem and the proposed solution feels "fragile". > > Me, too. Actually, I had been compiling at -O1, possibly because with > the Cortex M3 processor set it worked at that level but not higher.
It can be legitimate to use optimize attributes to change the way code is generated (such as using "-Os" for most of the program, and "-O3" for a function where you really want loop unrolling). But changing optimisation levels to hide bugs in the code is rarely a good idea! Remember, optimisation levels are hints to the compiler - they are not commands, and they do not change the logic of your code.
> >> Are you sure you can't use "asm volatile" with C++ code ? > > I can. I just can't use "volatile asm". See my own reply that's > parallel with yours. > >> I don't know if that would solve your problem but if it did, it would >> feel more "legitimate" to me as volatile is documented to behave in >> certain ways as you can see from the page I pointed you to. >> > > "asm volatile" certainly seems to fix the issue (which ended up being > that the optimizer had an extraneous call to part of the constructor, not > a missing call to the destructor, BTW). >
The way you write that implies that the compiler was making a mistake - the compiler was generating legitimate code from the source given. And don't forget that you almost certainly need the "memory" clobber, unless your protected code contains volatile accesses which force the ordering.
On 2015-05-28, David Brown <david.brown@hesbynett.no> wrote:
> On 28/05/15 21:57, Tim Wescott wrote: >> On Thu, 28 May 2015 19:26:15 +0000, Simon Clubley wrote: >>> >>> [Code example snipped.] >>> >>> Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-) >> >> Well, there's a reason I'm tossing it out to the group for comment! > > It was a good question - I hope my other reply post is useful to others > in this group too. >
Sorry Tim, I was just trying to make a good natured comment. Sorry if it didn't come across that way.
>>> I don't know if that would solve your problem but if it did, it would >>> feel more "legitimate" to me as volatile is documented to behave in >>> certain ways as you can see from the page I pointed you to. >>> >> >> "asm volatile" certainly seems to fix the issue (which ended up being >> that the optimizer had an extraneous call to part of the constructor, not >> a missing call to the destructor, BTW). >> > > The way you write that implies that the compiler was making a mistake - > the compiler was generating legitimate code from the source given. > > And don't forget that you almost certainly need the "memory" clobber, > unless your protected code contains volatile accesses which force the > ordering. >
Indeed. The memory clobber is a lesson I learnt long ago as you can see from the example line of code I posted. I should have remembered to mention that one as well. Simon. -- Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world
On Thu, 28 May 2015 21:57:00 +0200, David Brown wrote:

> On 28/05/15 21:06, Simon Clubley wrote: >> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote: >>> >>> Another data point: I'm optimizing at O1. When I build at O0, it >>> works. >>> >>> >> In that case, try my suggestion of marking the asm statement itself as >> volatile. >> >> Simon. >> >> > That is almost certainly the issue. > > Some compilers consider inline assembly as "volatile" - they view them > as something scary, and make sure everything before them is completely > finished before executing the secret assembly code, and basically turn > off all optimisation around the inline assembly call. > > gcc (and clang, and a few other compilers) is not like that - it > provides ways for the programmer to tell the compiler exactly what the > assembly code affects or depends on, so that it can optimise around it. > This is extremely useful for some sorts of inline assembly, and it > lets you make good use of processor instructions that cannot easily be > expressed in C (such as a bit reverse instruction) with only the bare > minimum being written in assembly. It also means you don't have to mess > around with things like the "primask_copy" variable in this CProtect > class - gcc understands these things, and makes copies in registers as > needed. > > The flipside is that you have to know the rules, and be very careful to > apply them. > > A key rule here is "volatile". A normal inline assembly instruction is > considered non-volatile - the compiler is free to omit it if it is dead > code, and can re-order it as it finds convenient. (Inline assembly > statements with no outputs, and whose inputs don't involve addresses, > are considered "volatile" by default as they would be pointless if they > didn't do something unknown to the compiler.) So step one is to make > the inline assembly codes "volatile" so the compiler knows it has > execute them, and it has to do so in order. > > The second key rule is the interaction of "volatile" accesses (either > volatile reads and writes, volatile inline assembly, or calls to unknown > external code) and normal accesses. C does not specify this ordering in > any way. So in code like this: > > int a; > volatile int v; > > void foo(void) { > a = 0; > v = 1; > a++; > v = 2; > a++; > } > > the compiler can re-arrange writes to "a" with writes to "v". It can > replace all accesses to a with a "a = 2;", and it can put that before, > in the middle, or at the end of the two volatile writes to v. > > The same applies to volatile assembly. > > Consider this: > > uint64_t big; > > void atomic_write(uint64_t x) { > asm volatile("disableInterrupts"); > big = x; > asm volatile("enableInterrupts"); > } > > This will not work, except by luck - the compiler can re-order the write > to "big" with respect to the interrupt disable/enable, and therefore > destroy your hopes of making an atomic write. > > The way to deal with this is either by making the write to "big" > volatile, to add artificial volatile dependencies that enforce the > order, or by using "clobbers" in the assembly statements. Clobbers can > be quite sophisticated when you want to get the maximal performance (by > using minimal clobbers), but the easiest and therefore safest method is > to clobber "memory": > > void atomic_write(uint64_t x) { > asm volatile("disableInterrupts" ::: "memory"); > big = x; > asm volatile("enableInterrupts" ::: "memory"); > } > > The memory clobber tells the compiler that the inline assembly might > read or write memory in unexpected ways - all statements that logically > write something to memory that appear before the inline assembly, must > complete those writes. And any logical reads from memory after the > inline assembly, cannot be started until after the assembly. Data from > memory cannot be cached in registers across the assembly. > > This is often used with an empty inline assembly: > > static inline void compilerBarrier(void) { > asm volatile("" ::: "memory"); > } > > > Once we have cleaned up the other minor issues in your class (the > unnecessary "volatile" on the private member, the unnecessary typedef, > the use of "int" instead of "uint32_t", and the use of reserved > identifiers with leading underscores), we get this: > > > #include <stdint.h> > > class CProtect { > public : > CProtect(void) { > asm volatile("mrs %[primask_], primask\n" > "cpsid i" > : [primask_] "=r" (primask_) : : "memory"); > } > > ~CProtect() { > asm volatile("msr primask, %[primask_]" > : : [primask_] "r" (primask_) > : "memory"); > } > private : > uint32_t primask_; > }; > > extern uint64_t big; > > void atomic_write(uint64_t x) { > CProtect protect; big = x; > } > > > Compiling with this command line (using the usual optimisation setting > -Os): > > /opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall > -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb > > gives this assembly: > > > 21 _Z12atomic_writey: > 22 .fnstart 23 .LFB6: > 24 @ args = 0, pretend = 0, frame = 0
25
> @ frame_needed = 0, uses_anonymous_args = 0 26
@ link
> register save eliminated. > 27 @ 9 "a.cpp" 1 28 0000 EFF31083 mrs r3,
primask 29
> 0004 72B6 cpsid i 30 @ 0 "" 2
31
> .thumb 32 0006 034A ldr r2, .L2 33 0008 C2E90001
strd r0,
> [r2] > 34 @ 15 "a.cpp" 1 35 000c 83F31088
msr primask, r3 36
> @ 0 "" 2 37 .thumb 38
0010 7047 bx
> lr 39 .L3: > 40 0012 00BF .align 2 41 .L2: > 42 0014 00000000 .word big 43
.cantunwind 44
> .fnend > > > And that, I believe, is both correct and optimal.
Wow. Thank you for that. I try to understand the whys and the wherefores of "asm", but sometimes I fall short -- maybe you need to write compilers to get it all. (OTOH, if you need a control loop closed -- call me). That code has been around for a long time, but I dimly recall that part of the reason I used the "primask_copy" stuff was because if the object should be built on the stack rather than using a register, gcc or the assembler would bitch because the mrs instruction only works with registers. If "primask_" ends up in a memory location then the compiler or assembler would just halt with a nastygram to the effect that you're not doing your job right. My assumption was that using "primask_copy" would keep the thing from bombing in the event that the object was created on the stack, but any side effects from using it (except for the volatile part, which was my fault) would get optimized away. So wouldn't it be better to continue to use "primask_copy", but take away its volatility and that of "primask_"? -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
On Thu, 28 May 2015 16:00:35 -0500, Tim Wescott wrote:

> On Thu, 28 May 2015 21:57:00 +0200, David Brown wrote: > >> On 28/05/15 21:06, Simon Clubley wrote: >>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote: >>>> >>>> Another data point: I'm optimizing at O1. When I build at O0, it >>>> works. >>>> >>>> >>> In that case, try my suggestion of marking the asm statement itself as >>> volatile. >>> >>> Simon. >>> >>> >> That is almost certainly the issue. >> >> Some compilers consider inline assembly as "volatile" - they view them >> as something scary, and make sure everything before them is completely >> finished before executing the secret assembly code, and basically turn >> off all optimisation around the inline assembly call. >> >> gcc (and clang, and a few other compilers) is not like that - it >> provides ways for the programmer to tell the compiler exactly what the >> assembly code affects or depends on, so that it can optimise around it. >> This is extremely useful for some sorts of inline assembly, and it >> lets you make good use of processor instructions that cannot easily be >> expressed in C (such as a bit reverse instruction) with only the bare >> minimum being written in assembly. It also means you don't have to >> mess around with things like the "primask_copy" variable in this >> CProtect class - gcc understands these things, and makes copies in >> registers as needed. >> >> The flipside is that you have to know the rules, and be very careful to >> apply them. >> >> A key rule here is "volatile". A normal inline assembly instruction is >> considered non-volatile - the compiler is free to omit it if it is dead >> code, and can re-order it as it finds convenient. (Inline assembly >> statements with no outputs, and whose inputs don't involve addresses, >> are considered "volatile" by default as they would be pointless if they >> didn't do something unknown to the compiler.) So step one is to make >> the inline assembly codes "volatile" so the compiler knows it has >> execute them, and it has to do so in order. >> >> The second key rule is the interaction of "volatile" accesses (either >> volatile reads and writes, volatile inline assembly, or calls to >> unknown external code) and normal accesses. C does not specify this >> ordering in any way. So in code like this: >> >> int a; >> volatile int v; >> >> void foo(void) { >> a = 0; >> v = 1; >> a++; >> v = 2; >> a++; >> } >> >> the compiler can re-arrange writes to "a" with writes to "v". It can >> replace all accesses to a with a "a = 2;", and it can put that before, >> in the middle, or at the end of the two volatile writes to v. >> >> The same applies to volatile assembly. >> >> Consider this: >> >> uint64_t big; >> >> void atomic_write(uint64_t x) { >> asm volatile("disableInterrupts"); >> big = x; >> asm volatile("enableInterrupts"); >> } >> >> This will not work, except by luck - the compiler can re-order the >> write to "big" with respect to the interrupt disable/enable, and >> therefore destroy your hopes of making an atomic write. >> >> The way to deal with this is either by making the write to "big" >> volatile, to add artificial volatile dependencies that enforce the >> order, or by using "clobbers" in the assembly statements. Clobbers can >> be quite sophisticated when you want to get the maximal performance (by >> using minimal clobbers), but the easiest and therefore safest method is >> to clobber "memory": >> >> void atomic_write(uint64_t x) { >> asm volatile("disableInterrupts" ::: "memory"); >> big = x; >> asm volatile("enableInterrupts" ::: "memory"); >> } >> >> The memory clobber tells the compiler that the inline assembly might >> read or write memory in unexpected ways - all statements that logically >> write something to memory that appear before the inline assembly, must >> complete those writes. And any logical reads from memory after the >> inline assembly, cannot be started until after the assembly. Data from >> memory cannot be cached in registers across the assembly. >> >> This is often used with an empty inline assembly: >> >> static inline void compilerBarrier(void) { >> asm volatile("" ::: "memory"); >> } >> >> >> Once we have cleaned up the other minor issues in your class (the >> unnecessary "volatile" on the private member, the unnecessary typedef, >> the use of "int" instead of "uint32_t", and the use of reserved >> identifiers with leading underscores), we get this: >> >> >> #include <stdint.h> >> >> class CProtect { >> public : >> CProtect(void) { >> asm volatile("mrs %[primask_], primask\n" >> "cpsid i" >> : [primask_] "=r" (primask_) : : "memory"); >> } >> >> ~CProtect() { >> asm volatile("msr primask, %[primask_]" >> : : [primask_] "r" (primask_) >> : "memory"); >> } >> private : >> uint32_t primask_; >> }; >> >> extern uint64_t big; >> >> void atomic_write(uint64_t x) { >> CProtect protect; big = x; >> } >> >> >> Compiling with this command line (using the usual optimisation setting >> -Os): >> >> /opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall >> -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb >> >> gives this assembly: >> >> >> 21 _Z12atomic_writey: >> 22 .fnstart 23 .LFB6: >> 24 @ args = 0, pretend = 0, frame = 0 > 25 >> @ frame_needed = 0, uses_anonymous_args = 0 26 > @ link >> register save eliminated. >> 27 @ 9 "a.cpp" 1 28 0000 EFF31083 mrs r3, > primask 29 >> 0004 72B6 cpsid i 30 @ 0 "" 2 > 31 >> .thumb 32 0006 034A ldr r2, .L2 33 0008 C2E90001 > strd r0, >> [r2] >> 34 @ 15 "a.cpp" 1 35 000c 83F31088 > msr primask, r3 36 >> @ 0 "" 2 37 .thumb 38 > 0010 7047 bx >> lr 39 .L3: >> 40 0012 00BF .align 2 41 .L2: >> 42 0014 00000000 .word big 43 > .cantunwind 44 >> .fnend >> >> >> And that, I believe, is both correct and optimal. > > Wow. Thank you for that. I try to understand the whys and the > wherefores of "asm", but sometimes I fall short -- maybe you need to > write compilers to get it all. > > (OTOH, if you need a control loop closed -- call me). > > That code has been around for a long time, but I dimly recall that part > of the reason I used the "primask_copy" stuff was because if the object > should be built on the stack rather than using a register, gcc or the > assembler would bitch because the mrs instruction only works with > registers. If "primask_" ends up in a memory location then the compiler > or assembler would just halt with a nastygram to the effect that you're > not doing your job right. > > My assumption was that using "primask_copy" would keep the thing from > bombing in the event that the object was created on the stack, but any > side effects from using it (except for the volatile part, which was my > fault) would get optimized away. > > So wouldn't it be better to continue to use "primask_copy", but take > away its volatility and that of "primask_"?
This is what I have at the moment. class CProtect { public: CProtect(void) { int primask_copy; asm volatile ("mrs %[primask_copy], primask\n\t" // save interrupt status "cpsid i\n\t" // disable interrupts : [primask_copy] "=r" (primask_copy) : : "memory"); _primask = primask_copy; } ~CProtect() { int primask_copy = _primask; // Restore interrupts to their previous value asm volatile ("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy) : "memory"); } private: int _primask; }; It appears to generate sensible assembly code, and it works when I test it. I haven't tried to force a build of a "CProtect" object on the stack (I'm not sure exactly how, for that matter, other than it would involve lots and lots of local variables). -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
I think you have your answer on this, but speaking as a gcc
maintainer...

> asm("mrs %[primask_copy], primask\n\t" // save interrupt status > "cpsid i\n\t" // disable interrupts > : [primask_copy] "=r" (primask_copy));
Here you have told gcc that you'd doing "something" that will produce a value, which gcc needs to deal with. Since gcc doesn't know what the value will be, and must store it in a volatile member, it has no choice but to run the "something".
> asm("msr primask, %[primask_copy]" : : [primask_copy] > "r" (primask_copy));
In this case, you've told gcc that "something" needs the value of primask_copy, but doesn't do anything visible with it - you have specified no outputs or clobbers, so gcc concludes (based on what you told it!) that the instruction has no side effects and thus gcc is free to omit it if it chooses. If you need gcc to keep your asm around, you need to give it a reason. Either: 1. Add an output constraint that gcc must deal with, like the first asm call. (not appropriate in this case) 2. Indicate that the asm has side effects via a clobber that describes those side effects. Clobbering "all of memory" is one way, but also tells gcc to forget every memory optimization it's made at that point, which is (duh) sub-optimal. 3. Tell gcc to keep the asm despite the lack of visible side effects, which is what "volatile" is for: asm volatile ("something" . . .) The "volatile" tells gcc that the asm does something "unspecified and unoptimizable" - which means not only will gcc keep the asm around, but it will also ensure that other volatile operations specified before the asm happen before the asm, and operations after happen after, i.e. preserving the order of volatile operations. Note: If you use an old-style asm, which has no i/o parameters: asm ("something") gcc assumes the asm is doing something behind the compiler's back (because, in this case, there's no way to tell it otherwise), and leaves it alone. This is not the preferred way of doing things, since you and gcc have no idea if you're going to step on each other's toes.
On Thu, 28 May 2015 17:15:06 -0400, DJ Delorie wrote:

> I think you have your answer on this, but speaking as a gcc > maintainer... > >> asm("mrs %[primask_copy], primask\n\t" // save interrupt status >> "cpsid i\n\t" // disable interrupts : >> [primask_copy] "=r" (primask_copy)); > > Here you have told gcc that you'd doing "something" that will produce a > value, which gcc needs to deal with. Since gcc doesn't know what the > value will be, and must store it in a volatile member, it has no choice > but to run the "something". > >> asm("msr primask, %[primask_copy]" : : [primask_copy] >> "r" (primask_copy)); > > In this case, you've told gcc that "something" needs the value of > primask_copy, but doesn't do anything visible with it - you have > specified no outputs or clobbers, so gcc concludes (based on what you > told it!) that the instruction has no side effects and thus gcc is free > to omit it if it chooses. > > If you need gcc to keep your asm around, you need to give it a reason. > Either: > > 1. Add an output constraint that gcc must deal with, like the first asm > call. (not appropriate in this case) > > 2. Indicate that the asm has side effects via a clobber that describes > those side effects. Clobbering "all of memory" is one way, but also > tells gcc to forget every memory optimization it's made at that > point, which is (duh) sub-optimal. > > 3. Tell gcc to keep the asm despite the lack of visible side effects, > which is what "volatile" is for: > > asm volatile ("something" . . .) > > The "volatile" tells gcc that the asm does something "unspecified and > unoptimizable" - which means not only will gcc keep the asm around, > but it will also ensure that other volatile operations specified > before the asm happen before the asm, and operations after happen > after, i.e. preserving the order of volatile operations. > > Note: If you use an old-style asm, which has no i/o parameters: > > asm ("something") > > gcc assumes the asm is doing something behind the compiler's back > (because, in this case, there's no way to tell it otherwise), and leaves > it alone. This is not the preferred way of doing things, since you and > gcc have no idea if you're going to step on each other's toes.
Thanks DJ. David's method is designed to force the two asm blocks to be before and after the "code in the middle", without asking the developer using CProtect (who is presumably naive, 'cuz it might be me!) to make all of the stuff in the middle volatile or anything else. I think this is more or less in keeping with the intent of my CProtect class, because it turns off all interrupts, which should be done for an absolute minimal amount of time. (While I have occasional help from other developers, I do review code, and I'm persnickety about not egregiously turning off interrupts - you do it for a good reason, and you do it for as short an amount of time as you can.) I don't know if there is a better way to meet this intent, but I'm open to suggestion, or even, occasionally, wisdom. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
On 28/05/15 23:29, Tim Wescott wrote:
> On Thu, 28 May 2015 17:15:06 -0400, DJ Delorie wrote: > >> I think you have your answer on this, but speaking as a gcc >> maintainer... >> >>> asm("mrs %[primask_copy], primask\n\t" // save interrupt status >>> "cpsid i\n\t" // disable interrupts : >>> [primask_copy] "=r" (primask_copy)); >> >> Here you have told gcc that you'd doing "something" that will produce a >> value, which gcc needs to deal with. Since gcc doesn't know what the >> value will be, and must store it in a volatile member, it has no choice >> but to run the "something". >> >>> asm("msr primask, %[primask_copy]" : : [primask_copy] >>> "r" (primask_copy)); >> >> In this case, you've told gcc that "something" needs the value of >> primask_copy, but doesn't do anything visible with it - you have >> specified no outputs or clobbers, so gcc concludes (based on what you >> told it!) that the instruction has no side effects and thus gcc is free >> to omit it if it chooses. >> >> If you need gcc to keep your asm around, you need to give it a reason. >> Either: >> >> 1. Add an output constraint that gcc must deal with, like the first asm >> call. (not appropriate in this case) >> >> 2. Indicate that the asm has side effects via a clobber that describes >> those side effects. Clobbering "all of memory" is one way, but also >> tells gcc to forget every memory optimization it's made at that >> point, which is (duh) sub-optimal. >> >> 3. Tell gcc to keep the asm despite the lack of visible side effects, >> which is what "volatile" is for: >> >> asm volatile ("something" . . .) >> >> The "volatile" tells gcc that the asm does something "unspecified and >> unoptimizable" - which means not only will gcc keep the asm around, >> but it will also ensure that other volatile operations specified >> before the asm happen before the asm, and operations after happen >> after, i.e. preserving the order of volatile operations. >> >> Note: If you use an old-style asm, which has no i/o parameters: >> >> asm ("something") >> >> gcc assumes the asm is doing something behind the compiler's back >> (because, in this case, there's no way to tell it otherwise), and leaves >> it alone. This is not the preferred way of doing things, since you and >> gcc have no idea if you're going to step on each other's toes. > > Thanks DJ. > > David's method is designed to force the two asm blocks to be before and > after the "code in the middle", without asking the developer using > CProtect (who is presumably naive, 'cuz it might be me!) to make all of > the stuff in the middle volatile or anything else. > > I think this is more or less in keeping with the intent of my CProtect > class, because it turns off all interrupts, which should be done for an > absolute minimal amount of time. (While I have occasional help from other > developers, I do review code, and I'm persnickety about not egregiously > turning off interrupts - you do it for a good reason, and you do it for > as short an amount of time as you can.) >
That is exactly the point of using a memory clobber (and volatile). As DJ says, it is often overkill - you don't want to use a full memory clobber in the middle of a critical loop, for example. But here it is better to be extra safe than extra fast! And in practical situations for the CProtect class, I think it will seldom be significantly sub-optimal.
> I don't know if there is a better way to meet this intent, but I'm open > to suggestion, or even, occasionally, wisdom. >
I know there are ways to use a more refined clobber list - I /think/ you can just list the variables that you know need to be ordered with respect to the assembly. But I am not experienced enough at them to give advice here, and using a less general clobber would mean removing the "memory" clobber from the CProtect class and requiring class users to have their own guards (asm volatile ("" ::: <clobber-list>); ). That would defeat the ease-of-use and safety of the class.
On 28/05/15 23:00, Tim Wescott wrote:
> On Thu, 28 May 2015 21:57:00 +0200, David Brown wrote: >
<snip>
>> And that, I believe, is both correct and optimal. > > Wow. Thank you for that. I try to understand the whys and the > wherefores of "asm", but sometimes I fall short -- maybe you need to > write compilers to get it all.
You don't need to write compilers to understand this stuff (I would like to be involved in gcc development, but I simply haven't got the time. Maybe when the kids are all grown up). I just find this stuff interesting.
> > (OTOH, if you need a control loop closed -- call me).
I know - I bought one of your books. Unfortunately, I lost it before I'd read much of it! But most of my programs only need fairly simple control systems.
> > That code has been around for a long time, but I dimly recall that part > of the reason I used the "primask_copy" stuff was because if the object > should be built on the stack rather than using a register, gcc or the > assembler would bitch because the mrs instruction only works with > registers. If "primask_" ends up in a memory location then the compiler > or assembler would just halt with a nastygram to the effect that you're > not doing your job right. > > My assumption was that using "primask_copy" would keep the thing from > bombing in the event that the object was created on the stack, but any > side effects from using it (except for the volatile part, which was my > fault) would get optimized away. > > So wouldn't it be better to continue to use "primask_copy", but take away > its volatility and that of "primask_"? >
The "r" in the assembly statement tells gcc that it needs to go in a register - there are other letters used for memory operands, or more general register-or-memory operands (not so useful for the ARM, but nice for m68k or other cpus). So gcc already knows it needs to move primask_ into a register if it is not there from before. Thus there is no need for you to make an extra copy. Of course, the optimiser will remove the extra primask_copy if you have included it, so you still get optimal generated code. But I think it is neater without it in the source.
It's clear that asm volatile() solves the problem but the reason why 
will not be obvious to a casual reader.  I prefer to explicitly disable 
optimization using a pragma bracketing the code so that the asm 
statement (severely non-portable, by the way) will not be optimized away.

JJS