gnu compiler optimizes out "asm" statements (page 2)

Reply by Simon Clubley, May 28, 2015

On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Thu, 28 May 2015 19:06:04 +0000, Simon Clubley wrote:
>
>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>
>>> Another data point: I'm optimizing at O1.  When I build at O0, it
>>> works.
>>>
>>>
>> In that case, try my suggestion of marking the asm statement itself as
>> volatile.
>> 
>> Simon.
>
> The compiler doesn't allow that.
>

It works for me in C. What syntax are you using ?

Here's one example from a test program:

asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Tim Wescott, May 28, 2015

On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:

> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>> On Thu, 28 May 2015 19:06:04 +0000, Simon Clubley wrote:
>>
>>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>>
>>>> Another data point: I'm optimizing at O1.  When I build at O0, it
>>>> works.
>>>>
>>>>
>>> In that case, try my suggestion of marking the asm statement itself as
>>> volatile.
>>> 
>>> Simon.
>>
>> The compiler doesn't allow that.
>>
>>
> It works for me in C. What syntax are you using ?
> 
> Here's one example from a test program:
> 
> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");

I need to take my brain out and examine it under a microscope to see how 
large it is, apparently.

I was using "volatile asm".  "asm volatile" compiles, and works great, to 
boot.
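For anyone tripping over the same thing, here is a minimal sketch of the spelling GCC and Clang accept: the qualifier comes after the keyword. GCC's manual also notes that `asm` is a GNU extension in C, so C code built with a strict `-std=` option needs the `__asm__` / `__volatile__` spellings. The macro and function names below are hypothetical, and the asm string is left empty only so the sketch builds on any target.

```cpp
// Accepted by GCC and Clang: the qualifier goes *after* the keyword.
//   asm volatile("...");      // OK
//   volatile asm("...");      // rejected by Tim's C++ compiler
// In C built with a strict -std= option (e.g. -std=c99), plain "asm" is
// not recognised, so the double-underscore spellings are the portable choice.
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int store_then_reload(int *p, int value) {
    *p = value;
    COMPILER_BARRIER();  // the "memory" clobber says memory may have changed
    return *p;           // so *p must be re-read rather than reused from a register
}
```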

So -- more kosher than setting the "optimize" attribute of the whole 
function to "O0", do you think?

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Tim Wescott, May 28, 2015

On Thu, 28 May 2015 14:17:33 -0500, Tim Wescott wrote:

> On Thu, 28 May 2015 13:27:31 -0500, Tim Wescott wrote:
> 
>> This is related to my question about interrupts in an STM32F303
>> processor.  It turns out that the problem is in the compiler (or I'm
>> going insane, which is never outside the realm of possibility when I'm
>> working on embedded software).
>> 
>> I'm coding in C++, and I'm using a clever dodge for protecting chunks
>> of code from getting interrupted.  Basically, I have a class that
>> protects a block of code from being interrupted.  The constructor saves
>> the interrupt state then disables interrupts, and the destructor
>> restores interrupts.
>> 
>> This has been reliable for me for years, but now the destructor is not
>> being called.  I suspect that the optimizer can't make sense of it
>> because of the asm statements, and is throwing it away.
>> 
>> If someone knows the proper gnu-magic to tell the optimizer not to do
>> that, I'd appreciate it.  I'm going to look in my documentation, but I
>> want to make sure I use the right method, and don't just stumble onto
>> something that works for now but should be deprecated, or is fragile,
>> or whatever.
>> 
>> Here's the "protect a block" class:
>> 
>> typedef class CProtect {
>>   public:
>> 
>>   CProtect(void)
>>   {
>>     int primask_copy;
>>     asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
>>         "cpsid i\n\t"             // disable interrupts
>>         : [primask_copy] "=r" (primask_copy));
>>     _primask = primask_copy;
>>   }
>> 
>>   ~CProtect()
>>   {
>>     int primask_copy = _primask;
>>     // Restore interrupts to their previous value
>>     asm("msr primask, %[primask_copy]" : : [primask_copy]
>>         "r" (primask_copy));
>>   }
>> 
>>   private:
>>   volatile int  _primask;
>> } CProtect;
>> 
>> and here's how it's used:
>> 
>>         {
>>           CProtect protect;
>> 
>>           // critical code goes here
>>         }
> 
> This works (with the optimize attribute specified for each function, and
> the level set at O0), but I would like some opinions on whether it is
> kosher.  It works even when the overall optimization level is set to
> "O3", which is cool.
> 
> typedef class CProtect {
>   public:
> 
>   CProtect(void) __attribute__ ((__optimize__ ("O0")))
>   {
>     int primask_copy;
>     asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
>         "cpsid i\n\t"             // disable interrupts
>         : [primask_copy] "=r" (primask_copy));
>     _primask = primask_copy;
>   }
> 
>   ~CProtect() __attribute__ ((__optimize__ ("O0")))
>   {
>     int primask_copy = _primask;
>     // Restore interrupts to their previous value
>     asm("msr primask, %[primask_copy]" : : [primask_copy]
>         "r" (primask_copy));
>   }
> 
>   private:
>   volatile int  _primask;
> } CProtect;

This also works (note commented-out optimize attributes, and "asm 
volatile"):

class CProtect
{
  public:

  CProtect(void) // __attribute__ ((__optimize__ ("O0")))
  {
    int primask_copy;
    asm volatile ("mrs %[primask_copy], primask\n\t"  // save interrupt
        "cpsid i\n\t"                   // disable interrupts
        : [primask_copy] "=r" (primask_copy));
    _primask = primask_copy;
  }

  ~CProtect() // __attribute__ ((__optimize__ ("O0")))
  {
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm volatile ("msr primask, %[primask_copy]"
                  : : [primask_copy] "r" (primask_copy));
  }

  private:
  volatile int  _primask;
};
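The point of saving PRIMASK, rather than just re-enabling interrupts in the destructor, is that the guard can then be nested safely. Below is a host-side sketch of that logic. It assumes a plain variable (the hypothetical `fake_primask`) stands in for the real register, and plain C++ stands in for the `mrs`/`cpsid`/`msr` instructions, so the logic can be run off-target.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for the Cortex-M PRIMASK register,
// so the save/restore logic can run off-target: 1 = masked, 0 = enabled.
static uint32_t fake_primask = 0;

class FakeProtect {
public:
    // Mirrors "mrs rX, primask; cpsid i": remember the old state, then mask.
    FakeProtect() : saved_(fake_primask) { fake_primask = 1; }
    // Mirrors "msr primask, rX": put back exactly what was there before.
    ~FakeProtect() { fake_primask = saved_; }
private:
    uint32_t saved_;
};

// Records the mask state at three points of a nested critical section.
void nested_demo(uint32_t out[3]) {
    {
        FakeProtect outer;
        {
            FakeProtect inner;
            out[0] = fake_primask;  // inside both guards: masked
        }
        out[1] = fake_primask;      // inner guard restored "masked", not "enabled"
    }
    out[2] = fake_primask;          // outer guard restored the original state
}
```

If the destructor instead re-enabled interrupts unconditionally (a bare `cpsie i`), `out[1]` would be 0. Interrupts would then be back on while the outer block still expects them to be off.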


-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Simon Clubley, May 28, 2015

On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>
> This works (with the optimize attribute specified for each function, and 
> the level set at O0), but I would like some opinions on whether it is 
> kosher.  It works even when the overall optimization level is set to "O3", 
> which is cool.
>
> typedef class CProtect
> {
>   public:
>
>   CProtect(void) __attribute__ ((__optimize__ ("O0")))
>   {
>     int primask_copy;

[Code example snipped.]

Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-)

The code feels to me like you are trying to trick the compiler instead
of solving the core problem and the proposed solution feels "fragile".

Are you sure you can't use "asm volatile" with C++ code ?

I don't know if that would solve your problem but if it did, it would
feel more "legitimate" to me as volatile is documented to behave in
certain ways as you can see from the page I pointed you to.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Simon Clubley, May 28, 2015

On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>> It works for me in C. What syntax are you using ?
>> 
>> Here's one example from a test program:
>> 
>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>
> I need to take my brain out and examine it under a microscope to see how 
> large it is, apparently.
>
> I was using "volatile asm".  "asm volatile" compiles, and works great, to 
> boot.
>
> So -- more kosher than setting the "optimize" attribute of the whole 
> function to "O0", do you think?
>

Certainly (at least based on previous experience).

It will be interesting to see if others agree or if there are any issues
I have not thought of.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Tauno Voipio, May 28, 2015

On 28.5.15 22:30, Simon Clubley wrote:
> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>>> It works for me in C. What syntax are you using ?
>>>
>>> Here's one example from a test program:
>>>
>>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>>
>> I need to take my brain out and examine it under a microscope to see how
>> large it is, apparently.
>>
>> I was using "volatile asm".  "asm volatile" compiles, and works great, to
>> boot.
>>
>> So -- more kosher than setting the "optimize" attribute of the whole
>> function to "O0", do you think?
>>
>
> Certainly (at least based on previous experience).
>
> It will be interesting to see if others agree or if there are any issues
> I have not thought of.
>
> Simon.
>

For embedded code, my favorite is -Os.

-- 

-TV

Reply by Simon Clubley, May 28, 2015

On 2015-05-28, Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
> On 28.5.15 22:30, Simon Clubley wrote:
>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>>>> It works for me in C. What syntax are you using ?
>>>>
>>>> Here's one example from a test program:
>>>>
>>>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>>>
>>> I need to take my brain out and examine it under a microscope to see how
>>> large it is, apparently.
>>>
>>> I was using "volatile asm".  "asm volatile" compiles, and works great, to
>>> boot.
>>>
>>> So -- more kosher than setting the "optimize" attribute of the whole
>>> function to "O0", do you think?
>>>
>>
>> Certainly (at least based on previous experience).
>>
>> It will be interesting to see if others agree or if there are any issues
>> I have not thought of.
>>
>
> For embedded code, my favorite is -Os.
>

Interesting. How does -Os change the behaviour of asm volatile ?

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by David Brown, May 28, 2015

On 28/05/15 21:06, Simon Clubley wrote:
> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>
>> Another data point: I'm optimizing at O1.  When I build at O0, it works.
>>
>
> In that case, try my suggestion of marking the asm statement itself
> as volatile.
>
> Simon.
>

That is almost certainly the issue.

Some compilers treat all inline assembly as "volatile" - they view it
as something scary, make sure everything before it is completely
finished before executing the secret assembly code, and basically turn
off all optimisation around the inline assembly.

gcc (and clang, and a few other compilers) is not like that - it 
provides ways for the programmer to tell the compiler exactly what the 
assembly code affects or depends on, so that it can optimise around it.
This is extremely useful for some sorts of inline assembly, and it
lets you make good use of processor instructions that cannot easily be 
expressed in C (such as a bit reverse instruction) with only the bare 
minimum being written in assembly.  It also means you don't have to mess 
around with things like the "primask_copy" variable in this CProtect 
class - gcc understands these things, and makes copies in registers as 
needed.

The flipside is that you have to know the rules, and be very careful to 
apply them.

A key rule here is "volatile".  A normal inline assembly statement is
considered non-volatile - the compiler is free to omit it if it is dead
code (its outputs are never used), and can re-order it as it finds
convenient.  (Inline assembly statements with no outputs are considered
"volatile" by default, as they would be pointless if they didn't do
something unknown to the compiler.)  So step one is to make the inline
assembly statements "volatile" so the compiler knows it has to execute
them, and has to do so in order.

The second key rule is the interaction of "volatile" accesses (either 
volatile reads and writes, volatile inline assembly, or calls to unknown 
external code) and normal accesses.  C does not specify this ordering in 
any way.  So in code like this:

int a;
volatile int v;

void foo(void) {
	a = 0;
	v = 1;
	a++;
	v = 2;
	a++;
}

the compiler can re-arrange writes to "a" with writes to "v".  It can
replace all accesses to "a" with a single "a = 2;", and it can put that
before, in the middle of, or after the two volatile writes to "v".
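To make that concrete, here is a hand-written sketch of one rewrite the compiler is allowed to make. `foo_as_compiled` is a name made up for illustration, not real compiler output. The point is that the volatile writes to `v` must all happen, in order, while `a` only needs to end up with the right final value.

```cpp
int a;
volatile int v;

// The source as written.
void foo() {
    a = 0;
    v = 1;
    a++;
    v = 2;
    a++;
}

// One transformation the compiler may legally make: both volatile writes
// to v survive in their original order, but every access to the ordinary
// variable a collapses into a single store, placed wherever is convenient.
void foo_as_compiled() {
    v = 1;
    v = 2;
    a = 2;
}
```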

The same applies to volatile assembly.

Consider this:

#include <stdint.h>

uint64_t big;

void atomic_write(uint64_t x) {
	asm volatile("cpsid i");	// disable interrupts (Cortex-M)
	big = x;
	asm volatile("cpsie i");	// enable interrupts (Cortex-M)
}

This will not work, except by luck - the compiler can re-order the write 
to "big" with respect to the interrupt disable/enable, and therefore 
destroy your hopes of making an atomic write.

There are three ways to deal with this: make the write to "big" 
volatile, add artificial volatile dependencies that enforce the order, 
or use "clobbers" in the assembly statements.  Clobbers can be quite 
sophisticated when you want to get the maximal performance (by using 
minimal clobbers), but the easiest and therefore safest method is to 
clobber "memory":

void atomic_write(uint64_t x) {
	asm volatile("cpsid i" ::: "memory");	// disable interrupts
	big = x;
	asm volatile("cpsie i" ::: "memory");	// enable interrupts
}

The memory clobber tells the compiler that the inline assembly might 
read or write memory in unexpected ways.  Any writes to memory that 
logically appear before the inline assembly must be completed before 
it, and any logical reads from memory after the inline assembly cannot 
be started until after it.  Data from memory cannot be cached in 
registers across the assembly.
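If you would rather not write inline assembly just for a barrier, C++11 provides `std::atomic_signal_fence` in `<atomic>`. It orders memory accesses with respect to a signal handler running on the same thread, which is a close analogue of an interrupt handler on a single-core microcontroller. This is a swapped-in technique, not what was posted here. Strictly, the standard defines its effect in terms of atomic objects. GCC implements the non-relaxed forms as a compiler-only barrier, equivalent to the "memory" clobber, so it typically emits no instruction. The interrupt disable/enable instructions themselves still need `asm volatile`. A sketch, using a hypothetical `shared_count` variable:

```cpp
#include <atomic>
#include <cstdint>

uint32_t shared_count;  // hypothetical ordinary (non-volatile) data shared with an ISR

// Standard-C++ compiler barrier. With GCC this acts like
// asm volatile("" ::: "memory"): no instruction is emitted, but the
// compiler may not keep memory values cached in registers across it.
inline void signal_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void bump_twice() {
    shared_count += 1;
    signal_barrier();   // the two increments are not merged across this point
    shared_count += 1;
}
```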

This is often used with an empty inline assembly:

static inline void compilerBarrier(void) {
	asm volatile("" ::: "memory");
}
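A classic use of that empty barrier is keeping a crude busy-wait loop alive. An empty counting loop has no visible effect, so the optimiser may delete it outright, but a volatile asm in the loop body must be executed on every iteration. A minimal sketch follows; the `spin` name is hypothetical, and the delay it gives is not calibrated to any time unit.

```cpp
#include <cstdint>

static inline void compilerBarrier(void) {
    asm volatile("" ::: "memory");
}

// Crude busy-wait: without the barrier, an empty counting loop could be
// removed entirely, since nothing observable happens inside it.
uint32_t spin(uint32_t iterations) {
    uint32_t i;
    for (i = 0; i < iterations; i++) {
        compilerBarrier();  // volatile asm: must run once per iteration
    }
    return i;
}
```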

Once we have cleaned up the other minor issues in your class (the 
unnecessary "volatile" on the private member, the unnecessary typedef, 
the use of "int" instead of "uint32_t", and the use of reserved 
identifiers with leading underscores), we get this:

#include <stdint.h>

class CProtect {
public :
    CProtect(void) {
        asm volatile("mrs %[primask_], primask\n"
                     "cpsid i"
                     : [primask_] "=r" (primask_)
                     : : "memory");
    }

    ~CProtect() {
        asm volatile("msr primask, %[primask_]"
                     : : [primask_] "r" (primask_)
                     : "memory");
    }
private :
    uint32_t primask_;
};

extern uint64_t big;

void atomic_write(uint64_t x) {
     CProtect protect;
     big = x;
}
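Two RAII details are worth adding to a guard like this. First, delete the copy operations, since a copied guard would restore the saved state twice. Second, make sure the guard is a named object. An unnamed temporary such as `CProtect();` or `CProtect{};` is destroyed at the end of its own statement, so it protects nothing. The sketch below runs on the host, with a plain variable (the hypothetical `sim_primask`) standing in for PRIMASK.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for PRIMASK (1 = masked, 0 = enabled).
static uint32_t sim_primask = 0;

class ScopedIrqMask {
public:
    ScopedIrqMask() : saved_(sim_primask) { sim_primask = 1; }
    ~ScopedIrqMask() { sim_primask = saved_; }
    // A copy would restore the saved state a second time, so forbid copying.
    ScopedIrqMask(const ScopedIrqMask &) = delete;
    ScopedIrqMask &operator=(const ScopedIrqMask &) = delete;
private:
    uint32_t saved_;
};

// Pitfall: the temporary is constructed and destroyed in one statement.
uint32_t mask_after_temporary() {
    ScopedIrqMask{};       // masks, then immediately restores
    return sim_primask;    // 0: nothing is protected here
}

// Correct use: the named guard lives until the closing brace.
uint32_t mask_with_named_guard() {
    ScopedIrqMask protect;
    return sim_primask;    // 1: masked while the guard is alive
}
```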

Compiling with this command line (using the usual optimisation setting -Os):

/opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb

gives this assembly:

   21              	_Z12atomic_writey:
   22              		.fnstart
   23              	.LFB6:
   24              		@ args = 0, pretend = 0, frame = 0
   25              		@ frame_needed = 0, uses_anonymous_args = 0
   26              		@ link register save eliminated.
   27              	@ 9 "a.cpp" 1
   28 0000 EFF31083 		mrs r3, primask
   29 0004 72B6     	cpsid i
   30              	@ 0 "" 2
   31              		.thumb
   32 0006 034A     		ldr	r2, .L2
   33 0008 C2E90001 		strd	r0, [r2]
   34              	@ 15 "a.cpp" 1
   35 000c 83F31088 		msr primask, r3
   36              	@ 0 "" 2
   37              		.thumb
   38 0010 7047     		bx	lr
   39              	.L3:
   40 0012 00BF     		.align	2
   41              	.L2:
   42 0014 00000000 		.word	big
   43              		.cantunwind
   44              		.fnend

And that, I believe, is both correct and optimal.

Reply by Tim Wescott, May 28, 2015

On Thu, 28 May 2015 19:26:15 +0000, Simon Clubley wrote:

> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>
>> This works (with the optimize attribute specified for each function,
>> and the level set at O0), but I would like some opinions on whether it
>> is kosher.  It works even when the overall optimization level is set to
>> "O3", which is cool.
>>
>> typedef class CProtect {
>>   public:
>>
>>   CProtect(void) __attribute__ ((__optimize__ ("O0")))
>>   {
>>     int primask_copy;
> 
> [Code example snipped.]
> 
> Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-)

Well, there's a reason I'm tossing it out to the group for comment!
> 
> The code feels to me like you are trying to trick the compiler instead
> of solving the core problem and the proposed solution feels "fragile".

Me, too.  Actually, I had been compiling at -O1, possibly because with 
the Cortex M3 processor set it worked at that level but not higher.

> Are you sure you can't use "asm volatile" with C++ code ?

I can.  I just can't use "volatile asm".  See my own reply that's 
parallel with yours.

> I don't know if that would solve your problem but if it did, it would
> feel more "legitimate" to me as volatile is documented to behave in
> certain ways as you can see from the page I pointed you to.
> 

"asm volatile" certainly seems to fix the issue (which ended up being 
that the optimizer had an extraneous call to part of the constructor, not 
a missing call to the destructor, BTW).

-- 
www.wescottdesign.com

Reply by David Brown, May 28, 2015

On 28/05/15 21:41, Simon Clubley wrote:
> On 2015-05-28, Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
>> On 28.5.15 22:30, Simon Clubley wrote:
>>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>> On Thu, 28 May 2015 19:18:35 +0000, Simon Clubley wrote:
>>>>> It works for me in C. What syntax are you using ?
>>>>>
>>>>> Here's one example from a test program:
>>>>>
>>>>> asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");
>>>>
>>>> I need to take my brain out and examine it under a microscope to see how
>>>> large it is, apparently.
>>>>
>>>> I was using "volatile asm".  "asm volatile" compiles, and works great, to
>>>> boot.
>>>>
>>>> So -- more kosher than setting the "optimize" attribute of the whole
>>>> function to "O0", do you think?
>>>>

The general rule is that if you think you need to reduce optimisation to 
make your code work, your code is wrong.  Very occasionally, the 
compiler is broken - but that should be rare indeed.

>>>
>>> Certainly (at least based on previous experience).
>>>
>>> It will be interesting to see if others agree or if there are any issues
>>> I have not thought of.
>>>
>>
>> For embedded code, my favorite is -Os.
>>
>
> Interesting. How does -Os change the behaviour of asm volatile ?
>
> Simon.
>

"-Os" does most of the "-O2" optimisations, except for an emphasis on 
smaller size if the speed optimisation in "-O2" would expand the code 
significantly.  (Note that you still get inlining and occasional loop 
unrolling - but only if the result is smaller code, or if you asked for 
the inlining explicitly.)

As always with optimisation flags, it keeps correct code correct - but 
makes it more likely that poor code (such as missing or incorrect 
volatiles) breaks dramatically.
