
gnu compiler optimizes out "asm" statements

Started by Tim Wescott May 28, 2015
On Thu, 28 May 2015 23:37:19 +0200, David Brown wrote:

> On 28/05/15 23:29, Tim Wescott wrote:
>> On Thu, 28 May 2015 17:15:06 -0400, DJ Delorie wrote:
>>
>>> I think you have your answer on this, but speaking as a gcc
>>> maintainer...
>>>
>>>>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>>>>         "cpsid i\n\t"                       // disable interrupts
>>>>         : [primask_copy] "=r" (primask_copy));
>>>
>>> Here you have told gcc that you're doing "something" that will produce
>>> a value, which gcc needs to deal with.  Since gcc doesn't know what
>>> the value will be, and must store it in a volatile member, it has no
>>> choice but to run the "something".
>>>
>>>>     asm("msr primask, %[primask_copy]"
>>>>         : : [primask_copy] "r" (primask_copy));
>>>
>>> In this case, you've told gcc that "something" needs the value of
>>> primask_copy, but doesn't do anything visible with it - you have
>>> specified no outputs or clobbers, so gcc concludes (based on what you
>>> told it!) that the instruction has no side effects and thus gcc is
>>> free to omit it if it chooses.
>>>
>>> If you need gcc to keep your asm around, you need to give it a reason.
>>> Either:
>>>
>>> 1. Add an output constraint that gcc must deal with, like the first
>>>    asm call.  (not appropriate in this case)
>>>
>>> 2. Indicate that the asm has side effects via a clobber that describes
>>>    those side effects.  Clobbering "all of memory" is one way, but
>>>    also tells gcc to forget every memory optimization it's made at
>>>    that point, which is (duh) sub-optimal.
>>>
>>> 3. Tell gcc to keep the asm despite the lack of visible side effects,
>>>    which is what "volatile" is for:
>>>
>>>      asm volatile ("something" . . .)
>>>
>>>    The "volatile" tells gcc that the asm does something "unspecified
>>>    and unoptimizable" - which means not only will gcc keep the asm
>>>    around, but it will also ensure that other volatile operations
>>>    specified before the asm happen before the asm, and operations
>>>    after happen after, i.e. preserving the order of volatile
>>>    operations.
>>>
>>> Note: If you use an old-style asm, which has no i/o parameters:
>>>
>>>      asm ("something")
>>>
>>> gcc assumes the asm is doing something behind the compiler's back
>>> (because, in this case, there's no way to tell it otherwise), and
>>> leaves it alone.  This is not the preferred way of doing things, since
>>> you and gcc have no idea if you're going to step on each other's toes.
>>
>> Thanks DJ.
>>
>> David's method is designed to force the two asm blocks to be before and
>> after the "code in the middle", without asking the developer using
>> CProtect (who is presumably naive, 'cuz it might be me!) to make all of
>> the stuff in the middle volatile or anything else.
>>
>> I think this is more or less in keeping with the intent of my CProtect
>> class, because it turns off all interrupts, which should be done for an
>> absolute minimal amount of time.  (While I have occasional help from
>> other developers, I do review code, and I'm persnickety about not
>> egregiously turning off interrupts - you do it for a good reason, and
>> you do it for as short an amount of time as you can.)
>>
> That is exactly the point of using a memory clobber (and volatile).  As
> DJ says, it is often overkill - you don't want to use a full memory
> clobber in the middle of a critical loop, for example.  But here it is
> better to be extra safe than extra fast!  And in practical situations
> for the CProtect class, I think it will seldom be significantly
> sub-optimal.
>
>> I don't know if there is a better way to meet this intent, but I'm open
>> to suggestion, or even, occasionally, wisdom.
>>
> I know there are ways to use a more refined clobber list - I /think/ you
> can just list the variables that you know need to be ordered with
> respect to the assembly.  But I am not experienced enough at them to
> give advice here, and using a less general clobber would mean removing
> the "memory" clobber from the CProtect class and requiring class users
> to have their own guards (asm volatile ("" ::: <clobber-list>); ).  That
> would defeat the ease-of-use and safety of the class.
You perceive exactly how I want to use the class.  Indeed, I'm willing to
give up a bit of optimal behavior in return for generality and safety.

Straying off topic from this thread, but I find that things like this
CProtect class are (a) invaluable, and (b) way dangerous in the hands of
ninnies.  I've seen them used by lazy developers around huge swaths of
code "because it makes everything work" when what was really needed was a
couple of wrappers around non-atomic operations on variables that were
accessed in different threads of execution.

Of course, the "everything" that was "fixed" was the part that particular
developer was rewarded for making work, not the other time-critical parts
of the code that were broken in the process.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
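Pulling DJ's and David's advice together, a minimal sketch of what the class
might look like (Cortex-M target and GCC extended asm assumed; this is an
illustration of the approach being discussed, not a drop-in implementation).
The "volatile" keeps gcc from deleting or reordering the asm, and the
"memory" clobber keeps ordinary loads and stores inside the protected region:

class CProtect
{
public:
    CProtect()
    {
        asm volatile("mrs %[pm], primask\n\t"   // save interrupt status
                     "cpsid i"                  // disable interrupts
                     : [pm] "=r" (_primask)
                     :
                     : "memory");
    }

    ~CProtect()
    {
        // Restore the saved state - re-enables only if interrupts were
        // enabled when the constructor ran.
        asm volatile("msr primask, %[pm]"
                     :
                     : [pm] "r" (_primask)
                     : "memory");
    }

private:
    int _primask;
};

// Used exactly as in the original post:
// {
//     CProtect protect;
//     // critical code goes here
// }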
On 2015-05-28, John Speth <johnspeth@yahoo.com> wrote:

> It's clear that asm volatile() solves the problem but the reason why
> will not be obvious to a casual reader.  I prefer to explicitly
> disable optimization using a pragma bracketing the code so that the
> asm statement (severely non-portable, by the way) will not be
> optimized away.
Sorry, that's just plain wrong.

The compiler is free to optimize as much as it wants in any manner it
wants at _any_time_ _regardless_of_options_.

Sure, specifying -O0 might work today using version X.Y.Z of the
compiler for this particular target on this particular host.

You are not justified in depending on the assumption it won't fail for a
different target, with a different compiler version (either older or
newer), on a different host, or even on a different day.  [OK, the last
one is slightly hyperbolic.]

Just write correct code.

--
Grant Edwards   grant.b.edwards at gmail.com
Yow! I always have fun because I'm out of my mind!!!
On Thu, 28 May 2015 22:07:41 +0000, Grant Edwards wrote:

> On 2015-05-28, John Speth <johnspeth@yahoo.com> wrote:
>
>> It's clear that asm volatile() solves the problem but the reason why
>> will not be obvious to a casual reader.  I prefer to explicitly disable
>> optimization using a pragma bracketing the code so that the asm
>> statement (severely non-portable, by the way) will not be optimized
>> away.
>
> Sorry, that's just plain wrong.
>
> The compiler is free to optimize as much as it wants in any manner it
> wants at _any_time_ _regardless_of_options_.
>
> Sure, specifying -O0 might work today using version X.Y.Z of the
> compiler for this particular target on this particular host.
>
> You are not justified in depending on the assumption it won't fail for
> a different target, with a different compiler version (either older or
> newer), on a different host, or even on a different day.  [OK, the last
> one is slightly hyperbolic.]
>
> Just write correct code.
And besides, anything with "asm" in it is automatically weird chit and
for the advanced programmer only; the less the casual reader is inclined
to become a casual writer, the better.

I will admit that in this particular code base the intent of the code is
rather under-commented, because at its original writing it was purely
for me.  I need to change that...

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On Thursday, May 28, 2015 at 4:09:43 PM UTC-4, David Brown wrote:
> It was a good question - I hope my other reply post is useful to others
> in this group too.
Your reply was great - thanks for the extremely clear and detailed
response to a question for which it's really hard to find a concise
answer.  Thanks!

Best Regards,
Dave
Grant Edwards <invalid@invalid.invalid> writes:
> The compiler is free to optimize as much as it wants in any manner it
> wants at _any_time_ _regardless_of_options_.
Case in point: gcc 5 *will* do some optimization even at -O0.
Tim Wescott wrote:
> This is related to my question about interrupts in an STM32F303
> processor.  It turns out that the problem is in the compiler (or I'm
> going insane, which is never outside the realm of possibility when I'm
> working on embedded software).
>
> I'm coding in C++, and I'm using a clever dodge for protecting chunks of
> code from getting interrupted.  Basically, I have a class that protects a
> block of code from being interrupted.  The constructor saves the
> interrupt state then disables interrupts, and the destructor restores
> interrupts.
>
> This has been reliable for me for years, but now the destructor is not
> being called.  I suspect that the optimizer can't make sense of it
> because of the asm statements, and is throwing it away.
>
> If someone knows the proper gnu-magic to tell the optimizer not to do
> that, I'd appreciate it.  I'm going to look in my documentation, but I
> want to make sure I use the right method, and don't just stumble onto
> something that works for now but should be deprecated, or is fragile, or
> whatever.
>
> Here's the "protect a block" class:
>
> typedef class CProtect
> {
> public:
>
>     CProtect(void)
>     {
>         int primask_copy;
>         asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>             "cpsid i\n\t"                       // disable interrupts
>             : [primask_copy] "=r" (primask_copy));
>         _primask = primask_copy;
>     }
>
>     ~CProtect()
>     {
>         int primask_copy = _primask;
>         // Restore interrupts to their previous value
>         asm("msr primask, %[primask_copy]"
>             : : [primask_copy] "r" (primask_copy));
>     }
>
> private:
>     volatile int _primask;
> } CProtect;
>
> and here's how it's used:
>
> {
>     CProtect protect;
>
>     // critical code goes here
> }
>
There are no side effects in the destructor.

Notes:
- extern is not extern "C".  You can dern tootin' make 'em
  extern "C" but I don't see the point.
- SFAIK, the "extern...inline" combination works.  I have
  no way to test it.  "extern...inline" isn't critical; it's
  just more or less an apology for making things slightly
  more complex.
- If you are not using separate compilation it gets
  even simpler.

I would:

#ifndef _CPROTECT_H_
#define _CPROTECT_H_
// CProtect.h

extern void inline interruptsOn(void);
extern void inline interruptsOff(void);

typedef class CProtect
{
private:

public:

    CProtect(void)
    {
        interruptsOff();
    }

    ~CProtect()
    {
        interruptsOn();
    }

} CProtect;
#endif


// CProtect.cpp

#include <CProtect.h>

static int _primask;

void inline interruptsOff(void)
{
    int primask_copy;
    asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
        "cpsid i\n\t"                       // disable interrupts
        : [primask_copy] "=r" (primask_copy));
    _primask = primask_copy;
}

void inline interruptsOn()
{
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm("msr primask, %[primask_copy]"
        : : [primask_copy] "r" (primask_copy));
}

--
Les Cargill
On 28/05/15 23:57, Tim Wescott wrote:
> On Thu, 28 May 2015 23:37:19 +0200, David Brown wrote:
>
<snip>
>>>
>> That is exactly the point of using a memory clobber (and volatile).  As
>> DJ says, it is often overkill - you don't want to use a full memory
>> clobber in the middle of a critical loop, for example.  But here it is
>> better to be extra safe than extra fast!  And in practical situations
>> for the CProtect class, I think it will seldom be significantly
>> sub-optimal.
>>
>>> I don't know if there is a better way to meet this intent, but I'm open
>>> to suggestion, or even, occasionally, wisdom.
>>>
>>>
>> I know there are ways to use a more refined clobber list - I /think/ you
>> can just list the variables that you know need to be ordered with
>> respect to the assembly.  But I am not experienced enough at them to
>> give advice here, and using a less general clobber would mean removing
>> the "memory" clobber from the CProtect class and requiring class users
>> to have their own guards (asm volatile ("" ::: <clobber-list>); ).  That
>> would defeat the ease-of-use and safety of the class.
>
> You perceive exactly how I want to use the class.  Indeed, I'm willing to
> give up a bit of optimal behavior in return for generality and safety.
>
> Straying off topic from this thread, but I find that things like this
> CProtect class are (a) invaluable, and (b) way dangerous in the hands of
> ninnies.  I've seen them used by lazy developers around huge swaths of
> code "because it makes everything work" when what was really needed was a
> couple of wrappers around non-atomic operations on variables that were
> accessed in different threads of execution.
It would be nice if there were some standard way to handle these things,
to save people making them themselves (and either having to learn more
than they wanted to know about compilers, or risk getting it subtly
wrong).  But there is no standard method in C or C++, because these
languages have very little support for multiple threads of execution
(either from an RTOS, or interrupt handlers).  This also means solutions
won't be cross-target, and are usually not even directly portable to
different compilers on the same target.

There are several alternatives, however.

First off - if you are using an RTOS (FreeRTOS, MQX, whatever), the OS
will provide macros, functions, types, etc., for atomic access and for
safely transferring data between threads.  This should always be your
first port of call when you have an RTOS.

Target libraries sometimes have support for interrupt control.  avr-libc,
which is usually used with avr-gcc, has macros for making uninterruptable
blocks:
<http://nongnu.org/avr-libc/user-manual/group__util__atomic.html>

The msp430 gcc port has an elegant method - it implements a "critical"
function attribute, which causes interrupt state to be saved and disabled
at the start of the function, and restored at the end.  It works fine
with inline functions too.  I would like to see that one implemented on
other gcc targets (if you have the spare time, DJ ?).

There is the type "sig_atomic_t" in <signal.h>, which has existed since
the dawn of C.  You can use it to make guards around access to bigger
data objects, with judicious use of volatile for ordering - the resulting
code is a bit awkward, but it can be made portable and it does not need
to disable interrupts.  It is theoretically subject to livelock, however.

C11 and C++11 both provide some atomic and multi-threading support in the
language and standard libraries.  Unfortunately, these are not well
supported, significantly (and subtly) different from each other, overly
complicated, under-specified, optionally implemented, poor matches for
existing OS's, and very ugly.
>
> Of course, the "everything" that was "fixed" was the part that particular
> developer was rewarded for making work, not the other time-critical parts
> of the code that were broken in the process.
>
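As a rough illustration of the sig_atomic_t approach described above (the
ISR and all names here are hypothetical; only the flag handshake is the
point), a minimal sketch:

#include <csignal>    // std::sig_atomic_t

// A larger object guarded by a one-word flag.  Both are volatile so the
// compiler keeps the accesses in the order written; sig_atomic_t is the
// one type the language guarantees can be read and written atomically
// with respect to asynchronous interruption.
static volatile int latest_sample;
static volatile std::sig_atomic_t sample_ready = 0;

// Hypothetical interrupt handler: publish the data first, then raise the
// flag.
void adc_isr(int raw)
{
    latest_sample = raw;
    sample_ready = 1;
}

// Main-loop consumer: test the flag, copy the data, then clear the flag.
// (A sample published between the copy and the clear can be missed -
// part of the awkwardness mentioned above.)
bool poll_sample(int &out)
{
    if (!sample_ready)
        return false;
    out = latest_sample;
    sample_ready = 0;
    return true;
}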
On 28/05/15 23:51, John Speth wrote:
> It's clear that asm volatile() solves the problem but the reason why
> will not be obvious to a casual reader.  I prefer to explicitly disable
> optimization using a pragma bracketing the code so that the asm
> statement (severely non-portable, by the way) will not be optimized away.
>
Write /correct/ code - code that depends on optimisation settings for
correctness is wrong code.

If the reason for needing "volatile" in the asm statements is not clear
to casual readers, then either comment it, or make sure casual readers
don't fiddle with things they don't understand.
On 29/05/15 04:23, Les Cargill wrote:
> Tim Wescott wrote:
>> This is related to my question about interrupts in an STM32F303
>> processor.  It turns out that the problem is in the compiler (or I'm
>> going insane, which is never outside the realm of possibility when I'm
>> working on embedded software).
>>
>> I'm coding in C++, and I'm using a clever dodge for protecting chunks of
>> code from getting interrupted.  Basically, I have a class that protects a
>> block of code from being interrupted.  The constructor saves the
>> interrupt state then disables interrupts, and the destructor restores
>> interrupts.
>>
>> This has been reliable for me for years, but now the destructor is not
>> being called.  I suspect that the optimizer can't make sense of it
>> because of the asm statements, and is throwing it away.
>>
>> If someone knows the proper gnu-magic to tell the optimizer not to do
>> that, I'd appreciate it.  I'm going to look in my documentation, but I
>> want to make sure I use the right method, and don't just stumble onto
>> something that works for now but should be deprecated, or is fragile, or
>> whatever.
>>
>> Here's the "protect a block" class:
>>
>> typedef class CProtect
>> {
>> public:
>>
>>     CProtect(void)
>>     {
>>         int primask_copy;
>>         asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>>             "cpsid i\n\t"                       // disable interrupts
>>             : [primask_copy] "=r" (primask_copy));
>>         _primask = primask_copy;
>>     }
>>
>>     ~CProtect()
>>     {
>>         int primask_copy = _primask;
>>         // Restore interrupts to their previous value
>>         asm("msr primask, %[primask_copy]"
>>             : : [primask_copy] "r" (primask_copy));
>>     }
>>
>> private:
>>     volatile int _primask;
>> } CProtect;
>>
>> and here's how it's used:
>>
>> {
>>     CProtect protect;
>>
>>     // critical code goes here
>> }
>>
>
> There are no side effects in the destructor.
>
> Notes:
> - extern is not extern "C".  You can dern tootin' make 'em
>   extern "C" but I don't see the point.
> - SFAIK, the "extern...inline" combination works.  I have
>   no way to test it.  "extern...inline" isn't critical; it's
>   just more or less an apology for making things slightly
>   more complex.
> - If you are not using separate compilation it gets
>   even simpler.
>
> I would:
>
> #ifndef _CPROTECT_H_
> #define _CPROTECT_H_
> // CProtect.h
>
> extern void inline interruptsOn(void);
> extern void inline interruptsOff(void);
>
> typedef class CProtect
> {
> private:
>
> public:
>
>     CProtect(void)
>     {
>         interruptsOff();
>     }
>
>     ~CProtect()
>     {
>         interruptsOn();
>     }
>
> } CProtect;
> #endif
>
>
> // CProtect.cpp
>
> #include <CProtect.h>
>
> static int _primask;
>
> void inline interruptsOff(void)
> {
>     int primask_copy;
>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>         "cpsid i\n\t"                       // disable interrupts
>         : [primask_copy] "=r" (primask_copy));
>     _primask = primask_copy;
> }
>
> void inline interruptsOn()
> {
>     int primask_copy = _primask;
>     // Restore interrupts to their previous value
>     asm("msr primask, %[primask_copy]"
>         : : [primask_copy] "r" (primask_copy));
> }
>
This looks like you believe making an external function call enforces a
memory order.  This is simply incorrect - if the compiler knows the
effects the function may have, it can use that information to re-order
the code.  In particular, it knows something about the data the function
could access (it cannot legally access function-local or file static
data if the data does not "escape" by passing pointers to it out to
unknown code), and it may have full access to the source code through
inlining or link-time optimisation.

So while the use of external functions might appear to work, and /will/
work in some contexts, it will work by accident of compiler flags and
limited compiler capabilities, rather than by design.

The use of "inline" (or "extern inline") here is utterly pointless - if
the compiler is able to inline the code, then you lose all hope of it
working.  And when the code is /not/ inlined, you generate significantly
bigger and slower code for something that should boil down to 2
instructions for the constructor and 1 instruction for the destructor.
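For contrast, a sketch of how helper functions could be written so they stay
correct even when the compiler inlines them: the ordering guarantee comes
from "volatile" and the "memory" clobber on the asm itself, not from the
function-call boundary, and the saved state is returned to the caller rather
than parked in a file-scope static.  (Names are illustrative, not anyone's
actual code.)

static inline int interruptsOff(void)
{
    int primask_copy;
    asm volatile("mrs %[pm], primask\n\t"   // save interrupt status
                 "cpsid i"                  // disable interrupts
                 : [pm] "=r" (primask_copy)
                 :
                 : "memory");
    return primask_copy;    // caller holds the saved state
}

static inline void interruptsOn(int primask_copy)
{
    // Restore interrupts to their previous state
    asm volatile("msr primask, %[pm]"
                 :
                 : [pm] "r" (primask_copy)
                 : "memory");
}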
On Thursday, May 28, 2015 at 10:17:18 PM UTC-4, Les Cargill wrote:
> ...
>     CProtect(void)
>     {
>         interruptsOff();
>     }
Careful Les, that's a bug if entered in a context where interrupts are already disabled and must remain so...
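To illustrate the point: keeping the saved PRIMASK per instance (rather than
in a single static) makes nesting safe, because an inner critical section
restores "still disabled" instead of unconditionally re-enabling interrupts.
A sketch, using the hypothetical helpers from the earlier reply:

class CProtect
{
public:
    CProtect()  : _saved(interruptsOff()) {}   // save state, then disable
    ~CProtect() { interruptsOn(_saved); }      // restore exactly what was saved
private:
    int _saved;
};

// {
//     CProtect outer;      // interrupts were on: now off, "on" saved
//     {
//         CProtect inner;  // already off: "off" saved
//     }                    // restores "off" - still protected
// }                        // restores "on"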
