
gnu compiler optimizes out "asm" statements

Started by Tim Wescott May 28, 2015
On Thu, 28 May 2015 23:37:19 +0200, David Brown wrote:

> On 28/05/15 23:29, Tim Wescott wrote:
>> On Thu, 28 May 2015 17:15:06 -0400, DJ Delorie wrote:
>>
>>> I think you have your answer on this, but speaking as a gcc
>>> maintainer...
>>>
>>>>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>>>>         "cpsid i\n\t"                       // disable interrupts
>>>>         : [primask_copy] "=r" (primask_copy));
>>>
>>> Here you have told gcc that you're doing "something" that will produce
>>> a value, which gcc needs to deal with.  Since gcc doesn't know what
>>> the value will be, and must store it in a volatile member, it has no
>>> choice but to run the "something".
>>>
>>>>     asm("msr primask, %[primask_copy]"
>>>>         : : [primask_copy] "r" (primask_copy));
>>>
>>> In this case, you've told gcc that "something" needs the value of
>>> primask_copy, but doesn't do anything visible with it - you have
>>> specified no outputs or clobbers, so gcc concludes (based on what you
>>> told it!) that the instruction has no side effects and thus gcc is
>>> free to omit it if it chooses.
>>>
>>> If you need gcc to keep your asm around, you need to give it a reason.
>>> Either:
>>>
>>> 1. Add an output constraint that gcc must deal with, like the first
>>>    asm call.  (not appropriate in this case)
>>>
>>> 2. Indicate that the asm has side effects via a clobber that describes
>>>    those side effects.  Clobbering "all of memory" is one way, but
>>>    also tells gcc to forget every memory optimization it's made at
>>>    that point, which is (duh) sub-optimal.
>>>
>>> 3. Tell gcc to keep the asm despite the lack of visible side effects,
>>>    which is what "volatile" is for:
>>>
>>>      asm volatile ("something" . . .)
>>>
>>>    The "volatile" tells gcc that the asm does something "unspecified
>>>    and unoptimizable" - which means not only will gcc keep the asm
>>>    around, but it will also ensure that other volatile operations
>>>    specified before the asm happen before the asm, and operations
>>>    after happen after, i.e. preserving the order of volatile
>>>    operations.
>>>
>>> Note: If you use an old-style asm, which has no i/o parameters:
>>>
>>>      asm ("something")
>>>
>>> gcc assumes the asm is doing something behind the compiler's back
>>> (because, in this case, there's no way to tell it otherwise), and
>>> leaves it alone.  This is not the preferred way of doing things, since
>>> you and gcc have no idea if you're going to step on each other's toes.
>>
>> Thanks DJ.
>>
>> David's method is designed to force the two asm blocks to be before and
>> after the "code in the middle", without asking the developer using
>> CProtect (who is presumably naive, 'cuz it might be me!) to make all of
>> the stuff in the middle volatile or anything else.
>>
>> I think this is more or less in keeping with the intent of my CProtect
>> class, because it turns off all interrupts, which should be done for an
>> absolute minimal amount of time.  (While I have occasional help from
>> other developers, I do review code, and I'm persnickety about not
>> egregiously turning off interrupts - you do it for a good reason, and
>> you do it for as short an amount of time as you can.)
>>
> That is exactly the point of using a memory clobber (and volatile).  As
> DJ says, it is often overkill - you don't want to use a full memory
> clobber in the middle of a critical loop, for example.  But here it is
> better to be extra safe than extra fast!  And in practical situations
> for the CProtect class, I think it will seldom be significantly
> sub-optimal.
>
>> I don't know if there is a better way to meet this intent, but I'm open
>> to suggestion, or even, occasionally, wisdom.
>>
> I know there are ways to use a more refined clobber list - I /think/ you
> can just list the variables that you know need to be ordered with
> respect to the assembly.  But I am not experienced enough at them to
> give advice here, and using a less general clobber would mean removing
> the "memory" clobber from the CProtect class and requiring class users
> to have their own guards (asm volatile ("" ::: <clobber-list>); ).  That
> would defeat the ease-of-use and safety of the class.
You perceive exactly how I want to use the class.  Indeed, I'm willing to
give up a bit of optimal behavior in return for generality and safety.

Straying off topic from this thread, but I find that things like this
CProtect class are (a) invaluable, and (b) way dangerous in the hands of
ninnies.  I've seen them used by lazy developers around huge swaths of
code "because it makes everything work" when what was really needed was a
couple of wrappers around non-atomic operations on variables that were
accessed in different threads of execution.

Of course, the "everything" that was "fixed" was the part that particular
developer was rewarded for making work, not the other time-critical parts
of the code that were broken in the process.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
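Pulling DJ's and David's advice together, a minimal sketch of what the class
might look like (Cortex-M target and GCC extended asm assumed; this is an
illustration of the approach being discussed, not a drop-in implementation).
The "volatile" keeps gcc from deleting or reordering the asm, and the
"memory" clobber keeps ordinary loads and stores inside the protected region:

class CProtect
{
public:
    CProtect()
    {
        asm volatile("mrs %[pm], primask\n\t"   // save interrupt status
                     "cpsid i"                  // disable interrupts
                     : [pm] "=r" (_primask)
                     :
                     : "memory");
    }

    ~CProtect()
    {
        // Restore the saved state - re-enables only if interrupts were
        // enabled when the constructor ran.
        asm volatile("msr primask, %[pm]"
                     :
                     : [pm] "r" (_primask)
                     : "memory");
    }

private:
    int _primask;
};

// Used exactly as in the original post:
// {
//     CProtect protect;
//     // critical code goes here
// }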
On 2015-05-28, John Speth <johnspeth@yahoo.com> wrote:

> It's clear that asm volatile() solves the problem but the reason why
> will not be obvious to a casual reader.  I prefer to explicitly
> disable optimization using a pragma bracketing the code so that the
> asm statement (severely non-portable, by the way) will not be
> optimized away.
Sorry, that's just plain wrong.

The compiler is free to optimize as much as it wants in any manner it
wants at _any_time_ _regardless_of_options_.

Sure, specifying -O0 might work today using version X.Y.Z of the
compiler for this particular target on this particular host.

You are not justified in depending on the assumption it won't fail for a
different target, with a different compiler version (either older or
newer), on a different host, or even on a different day.  [OK, the last
one is slightly hyperbolic.]

Just write correct code.

--
Grant Edwards   grant.b.edwards at gmail.com
Yow! I always have fun because I'm out of my mind!!!
On Thu, 28 May 2015 22:07:41 +0000, Grant Edwards wrote:

> On 2015-05-28, John Speth <johnspeth@yahoo.com> wrote:
>
>> It's clear that asm volatile() solves the problem but the reason why
>> will not be obvious to a casual reader.  I prefer to explicitly disable
>> optimization using a pragma bracketing the code so that the asm
>> statement (severely non-portable, by the way) will not be optimized
>> away.
>
> Sorry, that's just plain wrong.
>
> The compiler is free to optimize as much as it wants in any manner it
> wants at _any_time_ _regardless_of_options_.
>
> Sure, specifying -O0 might work today using version X.Y.Z of the
> compiler for this particular target on this particular host.
>
> You are not justified in depending on the assumption it won't fail for
> a different target, with a different compiler version (either older or
> newer), on a different host, or even on a different day.  [OK, the last
> one is slightly hyperbolic.]
>
> Just write correct code.
And besides, anything with "asm" in it is automatically weird chit and
for the advanced programmer only; the less the casual reader is inclined
to become a casual writer, the better.

I will admit that in this particular code base the intent of the code is
rather under-commented, because at its original writing it was purely
for me.  I need to change that...

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On Thursday, May 28, 2015 at 4:09:43 PM UTC-4, David Brown wrote:
> It was a good question - I hope my other reply post is useful to others
> in this group too.
Your reply was great - thanks for the extremely clear and detailed
response to a question for which it's really hard to find a concise
answer.  Thanks!

Best Regards,
Dave
Grant Edwards <invalid@invalid.invalid> writes:
> The compiler is free to optimize as much as it wants in any manner it
> wants at _any_time_ _regardless_of_options_.
Case in point: gcc 5 *will* do some optimization even at -O0.
Tim Wescott wrote:
> This is related to my question about interrupts in an STM32F303
> processor.  It turns out that the problem is in the compiler (or I'm
> going insane, which is never outside the realm of possibility when I'm
> working on embedded software).
>
> I'm coding in C++, and I'm using a clever dodge for protecting chunks of
> code from getting interrupted.  Basically, I have a class that protects a
> block of code from being interrupted.  The constructor saves the
> interrupt state then disables interrupts, and the destructor restores
> interrupts.
>
> This has been reliable for me for years, but now the destructor is not
> being called.  I suspect that the optimizer can't make sense of it
> because of the asm statements, and is throwing it away.
>
> If someone knows the proper gnu-magic to tell the optimizer not to do
> that, I'd appreciate it.  I'm going to look in my documentation, but I
> want to make sure I use the right method, and don't just stumble onto
> something that works for now but should be deprecated, or is fragile, or
> whatever.
>
> Here's the "protect a block" class:
>
> typedef class CProtect
> {
> public:
>
>     CProtect(void)
>     {
>         int primask_copy;
>         asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>             "cpsid i\n\t"                       // disable interrupts
>             : [primask_copy] "=r" (primask_copy));
>         _primask = primask_copy;
>     }
>
>     ~CProtect()
>     {
>         int primask_copy = _primask;
>         // Restore interrupts to their previous value
>         asm("msr primask, %[primask_copy]"
>             : : [primask_copy] "r" (primask_copy));
>     }
>
> private:
>     volatile int _primask;
> } CProtect;
>
> and here's how it's used:
>
> {
>     CProtect protect;
>
>     // critical code goes here
> }
>
There are no side effects in the destructor.

Notes:
- extern is not extern "C".  You can dern tootin' make 'em
  extern "C" but I don't see the point.
- SFAIK, the "extern...inline" combination works.  I have
  no way to test it.  "extern...inline" isn't critical; it's
  just more or less an apology for making things slightly
  more complex.
- If you are not using separate compilation it gets
  even simpler.

I would:

#ifndef _CPROTECT_H_
#define _CPROTECT_H_
// CProtect.h

extern void inline interruptsOn(void);
extern void inline interruptsOff(void);

typedef class CProtect
{
private:

public:

    CProtect(void)
    {
        interruptsOff();
    }

    ~CProtect()
    {
        interruptsOn();
    }

} CProtect;
#endif


// CProtect.cpp

#include <CProtect.h>

static int _primask;

void inline interruptsOff(void)
{
    int primask_copy;
    asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
        "cpsid i\n\t"                       // disable interrupts
        : [primask_copy] "=r" (primask_copy));
    _primask = primask_copy;
}

void inline interruptsOn()
{
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm("msr primask, %[primask_copy]"
        : : [primask_copy] "r" (primask_copy));
}

--
Les Cargill
On 28/05/15 23:57, Tim Wescott wrote:
> On Thu, 28 May 2015 23:37:19 +0200, David Brown wrote:
>
<snip>
>>>
>> That is exactly the point of using a memory clobber (and volatile).  As
>> DJ says, it is often overkill - you don't want to use a full memory
>> clobber in the middle of a critical loop, for example.  But here it is
>> better to be extra safe than extra fast!  And in practical situations
>> for the CProtect class, I think it will seldom be significantly
>> sub-optimal.
>>
>>> I don't know if there is a better way to meet this intent, but I'm open
>>> to suggestion, or even, occasionally, wisdom.
>>>
>>>
>> I know there are ways to use a more refined clobber list - I /think/ you
>> can just list the variables that you know need to be ordered with
>> respect to the assembly.  But I am not experienced enough at them to
>> give advice here, and using a less general clobber would mean removing
>> the "memory" clobber from the CProtect class and requiring class users
>> to have their own guards (asm volatile ("" ::: <clobber-list>); ).  That
>> would defeat the ease-of-use and safety of the class.
>
> You perceive exactly how I want to use the class.  Indeed, I'm willing to
> give up a bit of optimal behavior in return for generality and safety.
>
> Straying off topic from this thread, but I find that things like this
> CProtect class are (a) invaluable, and (b) way dangerous in the hands of
> ninnies.  I've seen them used by lazy developers around huge swaths of
> code "because it makes everything work" when what was really needed was a
> couple of wrappers around non-atomic operations on variables that were
> accessed in different threads of execution.
It would be nice if there were some standard way to handle these things,
to save people making them themselves (and either having to learn more
than they wanted to know about compilers, or risk getting it subtly
wrong).  But there is no standard method in C or C++, because these
languages have very little support for multiple threads of execution
(either from an RTOS, or interrupt handlers).  This also means solutions
won't be cross-target, and are usually not even directly portable to
different compilers on the same target.

There are several alternatives, however.

First off - if you are using an RTOS (FreeRTOS, MQX, whatever), the OS
will provide macros, functions, types, etc., for atomic access and for
safely transferring data between threads.  This should always be your
first port of call when you have an RTOS.

Target libraries sometimes have support for interrupt control.  avr-libc,
which is usually used with avr-gcc, has macros for making uninterruptable
blocks:
<http://nongnu.org/avr-libc/user-manual/group__util__atomic.html>

The msp430 gcc port has an elegant method - it implements a "critical"
function attribute, which causes interrupt state to be saved and disabled
at the start of the function, and restored at the end.  It works fine
with inline functions too.  I would like to see that one implemented on
other gcc targets (if you have the spare time, DJ ?).

There is the type "sig_atomic_t" in <signal.h>, which has existed since
the dawn of C.  You can use it to make guards around access to bigger
data objects, with judicious use of volatile for ordering - the resulting
code is a bit awkward, but it can be made portable and it does not need
to disable interrupts.  It is theoretically subject to livelock, however.

C11 and C++11 both provide some atomic and multi-threading support in the
language and standard libraries.  Unfortunately, these are not well
supported, significantly (and subtly) different from each other, overly
complicated, under-specified, optionally implemented, poor matches for
existing OS's, and very ugly.
>
> Of course, the "everything" that was "fixed" was the part that particular
> developer was rewarded for making work, not the other time-critical parts
> of the code that were broken in the process.
>
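As a rough illustration of the sig_atomic_t approach described above (the
ISR and all names here are hypothetical; only the flag handshake is the
point), a minimal sketch:

#include <csignal>    // std::sig_atomic_t

// A larger object guarded by a one-word flag.  Both are volatile so the
// compiler keeps the accesses in the order written; sig_atomic_t is the
// one type the language guarantees can be read and written atomically
// with respect to asynchronous interruption.
static volatile int latest_sample;
static volatile std::sig_atomic_t sample_ready = 0;

// Hypothetical interrupt handler: publish the data first, then raise the
// flag.
void adc_isr(int raw)
{
    latest_sample = raw;
    sample_ready = 1;
}

// Main-loop consumer: test the flag, copy the data, then clear the flag.
// (A sample published between the copy and the clear can be missed -
// part of the awkwardness mentioned above.)
bool poll_sample(int &out)
{
    if (!sample_ready)
        return false;
    out = latest_sample;
    sample_ready = 0;
    return true;
}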
On 28/05/15 23:51, John Speth wrote:
> It's clear that asm volatile() solves the problem but the reason why
> will not be obvious to a casual reader.  I prefer to explicitly disable
> optimization using a pragma bracketing the code so that the asm
> statement (severely non-portable, by the way) will not be optimized away.
>
Write /correct/ code - code that depends on optimisation settings for
correctness is wrong code.

If the reason for needing "volatile" in the asm statements is not clear
to casual readers, then either comment it, or make sure casual readers
don't fiddle with things they don't understand.
On 29/05/15 04:23, Les Cargill wrote:
> Tim Wescott wrote:
>> This is related to my question about interrupts in an STM32F303
>> processor.  It turns out that the problem is in the compiler (or I'm
>> going insane, which is never outside the realm of possibility when I'm
>> working on embedded software).
>>
>> I'm coding in C++, and I'm using a clever dodge for protecting chunks of
>> code from getting interrupted.  Basically, I have a class that protects a
>> block of code from being interrupted.  The constructor saves the
>> interrupt state then disables interrupts, and the destructor restores
>> interrupts.
>>
>> This has been reliable for me for years, but now the destructor is not
>> being called.  I suspect that the optimizer can't make sense of it
>> because of the asm statements, and is throwing it away.
>>
>> If someone knows the proper gnu-magic to tell the optimizer not to do
>> that, I'd appreciate it.  I'm going to look in my documentation, but I
>> want to make sure I use the right method, and don't just stumble onto
>> something that works for now but should be deprecated, or is fragile, or
>> whatever.
>>
>> Here's the "protect a block" class:
>>
>> typedef class CProtect
>> {
>> public:
>>
>>     CProtect(void)
>>     {
>>         int primask_copy;
>>         asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>>             "cpsid i\n\t"                       // disable interrupts
>>             : [primask_copy] "=r" (primask_copy));
>>         _primask = primask_copy;
>>     }
>>
>>     ~CProtect()
>>     {
>>         int primask_copy = _primask;
>>         // Restore interrupts to their previous value
>>         asm("msr primask, %[primask_copy]"
>>             : : [primask_copy] "r" (primask_copy));
>>     }
>>
>> private:
>>     volatile int _primask;
>> } CProtect;
>>
>> and here's how it's used:
>>
>> {
>>     CProtect protect;
>>
>>     // critical code goes here
>> }
>>
>
> There are no side effects in the destructor.
>
> Notes:
> - extern is not extern "C".  You can dern tootin' make 'em
>   extern "C" but I don't see the point.
> - SFAIK, the "extern...inline" combination works.  I have
>   no way to test it.  "extern...inline" isn't critical; it's
>   just more or less an apology for making things slightly
>   more complex.
> - If you are not using separate compilation it gets
>   even simpler.
>
> I would:
>
> #ifndef _CPROTECT_H_
> #define _CPROTECT_H_
> // CProtect.h
>
> extern void inline interruptsOn(void);
> extern void inline interruptsOff(void);
>
> typedef class CProtect
> {
> private:
>
> public:
>
>     CProtect(void)
>     {
>         interruptsOff();
>     }
>
>     ~CProtect()
>     {
>         interruptsOn();
>     }
>
> } CProtect;
> #endif
>
>
> // CProtect.cpp
>
> #include <CProtect.h>
>
> static int _primask;
>
> void inline interruptsOff(void)
> {
>     int primask_copy;
>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>         "cpsid i\n\t"                       // disable interrupts
>         : [primask_copy] "=r" (primask_copy));
>     _primask = primask_copy;
> }
>
> void inline interruptsOn()
> {
>     int primask_copy = _primask;
>     // Restore interrupts to their previous value
>     asm("msr primask, %[primask_copy]"
>         : : [primask_copy] "r" (primask_copy));
> }
>
This looks like you believe making an external function call enforces a
memory order.  This is simply incorrect - if the compiler knows the
effects the function may have, it can use that information to re-order
the code.  In particular, it knows something about the data the function
could access (it cannot legally access function-local or file static
data if the data does not "escape" by passing pointers to it out to
unknown code), and it may have full access to the source code through
inlining or link-time optimisation.

So while the use of external functions might appear to work, and /will/
work in some contexts, it will work by accident of compiler flags and
limited compiler capabilities, rather than by design.

The use of "inline" (or "extern inline") here is utterly pointless - if
the compiler is able to inline the code, then you lose all hope of it
working.  And when the code is /not/ inlined, you generate significantly
bigger and slower code for something that should boil down to 2
instructions for the constructor and 1 instruction for the destructor.
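For contrast, a sketch of how helper functions could be written so they stay
correct even when the compiler inlines them: the ordering guarantee comes
from "volatile" and the "memory" clobber on the asm itself, not from the
function-call boundary, and the saved state is returned to the caller rather
than parked in a file-scope static.  (Names are illustrative, not anyone's
actual code.)

static inline int interruptsOff(void)
{
    int primask_copy;
    asm volatile("mrs %[pm], primask\n\t"   // save interrupt status
                 "cpsid i"                  // disable interrupts
                 : [pm] "=r" (primask_copy)
                 :
                 : "memory");
    return primask_copy;    // caller holds the saved state
}

static inline void interruptsOn(int primask_copy)
{
    // Restore interrupts to their previous state
    asm volatile("msr primask, %[pm]"
                 :
                 : [pm] "r" (primask_copy)
                 : "memory");
}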
On Thursday, May 28, 2015 at 10:17:18 PM UTC-4, Les Cargill wrote:
> ...
>     CProtect(void)
>     {
>         interruptsOff();
>     }
Careful Les, that's a bug if entered in a context where interrupts are already disabled and must remain so...
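To illustrate the point: keeping the saved PRIMASK per instance (rather than
in a single static) makes nesting safe, because an inner critical section
restores "still disabled" instead of unconditionally re-enabling interrupts.
A sketch, using the hypothetical helpers from the earlier reply:

class CProtect
{
public:
    CProtect()  : _saved(interruptsOff()) {}   // save state, then disable
    ~CProtect() { interruptsOn(_saved); }      // restore exactly what was saved
private:
    int _saved;
};

// {
//     CProtect outer;      // interrupts were on: now off, "on" saved
//     {
//         CProtect inner;  // already off: "off" saved
//     }                    // restores "off" - still protected
// }                        // restores "on"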
