gnu compiler optimizes out "asm" statements| page 3

Reply by David Brown ●May 28, 20152015-05-28

On 28/05/15 21:15, Tim Wescott wrote:
> On Thu, 28 May 2015 21:10:08 +0200, Lanarcam wrote:
>
>> Le 28/05/2015 21:04, Tim Wescott a &eacute;crit :
>>> On Thu, 28 May 2015 13:27:31 -0500, Tim Wescott wrote:
>>>
>>>> This is related to my question about interrupts in an STM32F303
>>>> processor.  It turns out that the problem is in the compiler (or I'm
>>>> going insane, which is never outside the realm of possibility when I'm
>>>> working on embedded software).
>>>>
>>>> I'm coding in C++, and I'm using a clever dodge for protecting chunks
>>>> of code from getting interrupted.  Basically, I have a class that
>>>> protects a block of code from being interrupted.  The constructor
>>>> saves the interrupt state then disables interrupts, and the destructor
>>>> restores interrupts.
>>>>
>>>> This has been reliable for me for years, but now the destructor is not
>>>> being called.  I suspect that the optimizer can't make sense of it
>>>> because of the asm statements, and is throwing it away.
>>>>
>>>> If someone knows the proper gnu-magic to tell the optimizer not to do
>>>> that, I'd appreciate it.  I'm going to look in my documentation, but I
>>>> want to make sure I use the right method, and don't just stumble onto
>>>> something that works for now but should be depreciated, or is fragile,
>>>> or whatever.
>>>>
>>>> Here's the "protect a block" class:
>>>>
>>>> typedef class CProtect {
>>>>     public:
>>>>
>>>>     CProtect(void)
>>>>     {
>>>>       int primask_copy;
>>>>       asm("mrs %[primask_copy], primask\n\t"     // save interrupt
>>>>       status
>>>>           "cpsid i\n\t"             // disable interrupts :
>>>>           [primask_copy]
>>>>           "=r" (primask_copy));
>>>>       _primask = primask_copy;
>>>>     }
>>>>
>>>>     ~CProtect()
>>>>     {
>>>>       int primask_copy = _primask;
>>>>       // Restore interrupts to their previous value asm("msr primask,
>>>>       %[primask_copy]" : : [primask_copy]
>>>>           "r" (primask_copy));
>>>>     }
>>>>
>>>>     private:
>>>>     volatile int  _primask;
>>>> } CProtect;
>>>>
>>>> and here's how it's used:
>>>>
>>>>           {
>>>>             CProtect protect;
>>>>
>>>>             // critical code goes here
>>>>           }
>>>
>>> Another data point: I'm optimizing at O1.  When I build at O0, it
>>> works.
>>>
>> Could you try: Cprotect *protect = new Cprotect();
>
> I could, but here at Wescott Design Services we have a fairly hard to
> overcome rule that says "don't thrash the heap".  My boss would kill me,
> which would hurt me twice because I'm my boss.
>

In addition, the code would be bigger, slower, open to out-of-memory or 
memory fragmentation issues, pull in a large chunk of library (if this 
is the only "new"), and it would still be wrong.

Reply by David Brown ●May 28, 20152015-05-28

On 28/05/15 21:57, Tim Wescott wrote:
> On Thu, 28 May 2015 19:26:15 +0000, Simon Clubley wrote:
>
>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>
>>> This works (with the optimize attribute specified for each function,
>>> and the level set at O0), but I would like some opinions on whether it
>>> is kosher.  It works even when the overall optimization level is set to
>>> "O3",
>>> which is cool.
>>>
>>> typedef class CProtect {
>>>    public:
>>>
>>>    CProtect(void) __attribute__ ((__optimize__ ("O0")))
>>>    {
>>>      int primask_copy;
>>
>> [Code example snipped.]
>>
>> Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-)
>
> Well, there's a reason I'm tossing it out to the group for comment!

It was a good question - I hope my other reply post is useful to others 
in this group too.

>>
>> The code feels to me like you are trying to trick the compiler instead
>> of solving the core problem and the proposed solution feels "fragile".
>
> Me, too.  Actually, I had been compiling at -O1, possibly because with
> the Cortex M3 processor set it worked at that level but not higher.

It can be legitimate to use optimize attributes to change the way code 
is generated (such as using "-Os" for most of the program, and "-O3" for 
a function where you really want loop unrolling).  But changing 
optimisation levels to hide bugs in the code is rarely a good idea! 
Remember, optimisation levels are hints to the compiler - they are not 
commands, and they do not change the logic of your code.

>
>> Are you sure you can't use "asm volatile" with C++ code ?
>
> I can.  I just can't use "volatile asm".  See my own reply that's
> parallel with yours.
>
>> I don't know if that would solve your problem but if it did, it would
>> feel more "legitimate" to me as volatile is documented to behave in
>> certain ways as you can see from the page I pointed you to.
>>
>
> "asm volatile" certainly seems to fix the issue (which ended up being
> that the optimizer had an extraneous call to part of the constructor, not
> a missing call to the destructor, BTW).
>

The way you write that implies that the compiler was making a mistake - 
the compiler was generating legitimate code from the source given.

And don't forget that you almost certainly need the "memory" clobber, 
unless your protected code contains volatile accesses which force the 
ordering.

Reply by Simon Clubley ●May 28, 20152015-05-28

On 2015-05-28, David Brown <david.brown@hesbynett.no> wrote:
> On 28/05/15 21:57, Tim Wescott wrote:
>> On Thu, 28 May 2015 19:26:15 +0000, Simon Clubley wrote:
>>>
>>> [Code example snipped.]
>>>
>>> Sorry Tim, but my initial reaction, in a good natured way, is yuck! :-)
>>
>> Well, there's a reason I'm tossing it out to the group for comment!
>
> It was a good question - I hope my other reply post is useful to others 
> in this group too.
>

Sorry Tim, I was just trying to make a good natured comment. Sorry if it
didn't come across that way.

>>> I don't know if that would solve your problem but if it did, it would
>>> feel more "legitimate" to me as volatile is documented to behave in
>>> certain ways as you can see from the page I pointed you to.
>>>
>>
>> "asm volatile" certainly seems to fix the issue (which ended up being
>> that the optimizer had an extraneous call to part of the constructor, not
>> a missing call to the destructor, BTW).
>>
>
> The way you write that implies that the compiler was making a mistake - 
> the compiler was generating legitimate code from the source given.
>
> And don't forget that you almost certainly need the "memory" clobber, 
> unless your protected code contains volatile accesses which force the 
> ordering.
>

Indeed. The memory clobber is a lesson I learnt long ago as you can see
from the example line of code I posted. I should have remembered to
mention that one as well.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Tim Wescott ●May 28, 20152015-05-28

On Thu, 28 May 2015 21:57:00 +0200, David Brown wrote:

> On 28/05/15 21:06, Simon Clubley wrote:
>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>
>>> Another data point: I'm optimizing at O1.  When I build at O0, it
>>> works.
>>>
>>>
>> In that case, try my suggestion of marking the asm statement itself as
>> volatile.
>>
>> Simon.
>>
>>
> That is almost certainly the issue.
> 
> Some compilers consider inline assembly as "volatile" - they view them
> as something scary, and make sure everything before them is completely
> finished before executing the secret assembly code, and basically turn
> off all optimisation around the inline assembly call.
> 
> gcc (and clang, and a few other compilers) is not like that - it
> provides ways for the programmer to tell the compiler exactly what the
> assembly code affects or depends on, so that it can optimise around it.
>   This is extremely useful for some sorts of inline assembly, and it
> lets you make good use of processor instructions that cannot easily be
> expressed in C (such as a bit reverse instruction) with only the bare
> minimum being written in assembly.  It also means you don't have to mess
> around with things like the "primask_copy" variable in this CProtect
> class - gcc understands these things, and makes copies in registers as
> needed.
> 
> The flipside is that you have to know the rules, and be very careful to
> apply them.
> 
> A key rule here is "volatile".  A normal inline assembly instruction is
> considered non-volatile - the compiler is free to omit it if it is dead
> code, and can re-order it as it finds convenient.  (Inline assembly
> statements with no outputs, and whose inputs don't involve addresses,
> are considered "volatile" by default as they would be pointless if they
> didn't do something unknown to the compiler.)  So step one is to make
> the inline assembly codes "volatile" so the compiler knows it has
> execute them, and it has to do so in order.
> 
> The second key rule is the interaction of "volatile" accesses (either
> volatile reads and writes, volatile inline assembly, or calls to unknown
> external code) and normal accesses.  C does not specify this ordering in
> any way.  So in code like this:
> 
> int a;
> volatile int v;
> 
> void foo(void) {
> 	a = 0;
> 	v = 1;
> 	a++;
> 	v = 2;
> 	a++;
> }
> 
> the compiler can re-arrange writes to "a" with writes to "v".  It can
> replace all accesses to a with a "a = 2;", and it can put that before,
> in the middle, or at the end of the two volatile writes to v.
> 
> The same applies to volatile assembly.
> 
> Consider this:
> 
> uint64_t big;
> 
> void atomic_write(uint64_t x) {
> 	asm volatile("disableInterrupts");
> 	big = x;
> 	asm volatile("enableInterrupts");
> }
> 
> This will not work, except by luck - the compiler can re-order the write
> to "big" with respect to the interrupt disable/enable, and therefore
> destroy your hopes of making an atomic write.
> 
> The way to deal with this is either by making the write to "big"
> volatile, to add artificial volatile dependencies that enforce the
> order, or by using "clobbers" in the assembly statements.  Clobbers can
> be quite sophisticated when you want to get the maximal performance (by
> using minimal clobbers), but the easiest and therefore safest method is
> to clobber "memory":
> 
> void atomic_write(uint64_t x) {
> 	asm volatile("disableInterrupts" ::: "memory");
> 	big = x;
> 	asm volatile("enableInterrupts" ::: "memory");
> }
> 
> The memory clobber tells the compiler that the inline assembly might
> read or write memory in unexpected ways - all statements that logically
> write something to memory that appear before the inline assembly, must
> complete those writes.  And any logical reads from memory after the
> inline assembly, cannot be started until after the assembly.  Data from
> memory cannot be cached in registers across the assembly.
> 
> This is often used with an empty inline assembly:
> 
> static inline void compilerBarrier(void) {
> 	asm volatile("" ::: "memory");
> }
> 
> 
> Once we have cleaned up the other minor issues in your class (the
> unnecessary "volatile" on the private member, the unnecessary typedef,
> the use of "int" instead of "uint32_t", and the use of reserved
> identifiers with leading underscores), we get this:
> 
> 
> #include <stdint.h>
> 
> class CProtect {
> public :
>      CProtect(void) {
>         	asm volatile("mrs %[primask_], primask\n"
>                      "cpsid i"
>                      : [primask_] "=r" (primask_) : : "memory");
>      }
> 
>      ~CProtect() {
>         	asm volatile("msr primask, %[primask_]"
>                      : : [primask_] "r" (primask_)
>                      : "memory");
>      }
> private :
>      uint32_t primask_;
> };
> 
> extern uint64_t big;
> 
> void atomic_write(uint64_t x) {
>      CProtect protect; big = x;
> }
> 
> 
> Compiling with this command line (using the usual optimisation setting
> -Os):
> 
> /opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall
> -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb
> 
> gives this assembly:
> 
> 
>    21              	_Z12atomic_writey:
>    22              		.fnstart 23              	.LFB6:
>    24              		@ args = 0, pretend = 0, frame = 0 
25              
>    	@ frame_needed = 0, uses_anonymous_args = 0 26              
		@ link
>    register save eliminated.
>    27              	@ 9 "a.cpp" 1 28 0000 EFF31083 		mrs r3, 
primask 29
>    0004 72B6     	cpsid i 30              	@ 0 "" 2 
31              	
>    .thumb 32 0006 034A     		ldr	r2, .L2 33 0008 C2E90001 
		strd	r0,
>    [r2]
>    34              	@ 15 "a.cpp" 1 35 000c 83F31088 		
msr primask, r3 36
>                 	@ 0 "" 2 37              		.thumb 38 
0010 7047     		bx
>    lr 39              	.L3:
>    40 0012 00BF     		.align	2 41              	.L2:
>    42 0014 00000000 		.word	big 43              
		.cantunwind 44        
>         		.fnend
> 
> 
> And that, I believe, is both correct and optimal.

Wow.  Thank you for that.  I try to understand the whys and the 
wherefores of "asm", but sometimes I fall short -- maybe you need to 
write compilers to get it all.

(OTOH, if you need a control loop closed -- call me).

That code has been around for a long time, but I dimly recall that part 
of the reason I used the "primask_copy" stuff was because if the object 
should be built on the stack rather than using a register, gcc or the 
assembler would bitch because the mrs instruction only works with 
registers.  If "primask_" ends up in a memory location then the compiler 
or assembler would just halt with a nastygram to the effect that you're 
not doing your job right.

My assumption was that using "primask_copy" would keep the thing from 
bombing in the event that the object was created on the stack, but any 
side effects from using it (except for the volatile part, which was my 
fault) would get optimized away.

So wouldn't it be better to continue to use "primask_copy", but take away 
its volatility and that of "primask_"?

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Tim Wescott ●May 28, 20152015-05-28

On Thu, 28 May 2015 16:00:35 -0500, Tim Wescott wrote:

> On Thu, 28 May 2015 21:57:00 +0200, David Brown wrote:
> 
>> On 28/05/15 21:06, Simon Clubley wrote:
>>> On 2015-05-28, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>>>
>>>> Another data point: I'm optimizing at O1.  When I build at O0, it
>>>> works.
>>>>
>>>>
>>> In that case, try my suggestion of marking the asm statement itself as
>>> volatile.
>>>
>>> Simon.
>>>
>>>
>> That is almost certainly the issue.
>> 
>> Some compilers consider inline assembly as "volatile" - they view them
>> as something scary, and make sure everything before them is completely
>> finished before executing the secret assembly code, and basically turn
>> off all optimisation around the inline assembly call.
>> 
>> gcc (and clang, and a few other compilers) is not like that - it
>> provides ways for the programmer to tell the compiler exactly what the
>> assembly code affects or depends on, so that it can optimise around it.
>>   This is extremely useful for some sorts of inline assembly, and it
>> lets you make good use of processor instructions that cannot easily be
>> expressed in C (such as a bit reverse instruction) with only the bare
>> minimum being written in assembly.  It also means you don't have to
>> mess around with things like the "primask_copy" variable in this
>> CProtect class - gcc understands these things, and makes copies in
>> registers as needed.
>> 
>> The flipside is that you have to know the rules, and be very careful to
>> apply them.
>> 
>> A key rule here is "volatile".  A normal inline assembly instruction is
>> considered non-volatile - the compiler is free to omit it if it is dead
>> code, and can re-order it as it finds convenient.  (Inline assembly
>> statements with no outputs, and whose inputs don't involve addresses,
>> are considered "volatile" by default as they would be pointless if they
>> didn't do something unknown to the compiler.)  So step one is to make
>> the inline assembly codes "volatile" so the compiler knows it has
>> execute them, and it has to do so in order.
>> 
>> The second key rule is the interaction of "volatile" accesses (either
>> volatile reads and writes, volatile inline assembly, or calls to
>> unknown external code) and normal accesses.  C does not specify this
>> ordering in any way.  So in code like this:
>> 
>> int a;
>> volatile int v;
>> 
>> void foo(void) {
>> 	a = 0;
>> 	v = 1;
>> 	a++;
>> 	v = 2;
>> 	a++;
>> }
>> 
>> the compiler can re-arrange writes to "a" with writes to "v".  It can
>> replace all accesses to a with a "a = 2;", and it can put that before,
>> in the middle, or at the end of the two volatile writes to v.
>> 
>> The same applies to volatile assembly.
>> 
>> Consider this:
>> 
>> uint64_t big;
>> 
>> void atomic_write(uint64_t x) {
>> 	asm volatile("disableInterrupts");
>> 	big = x;
>> 	asm volatile("enableInterrupts");
>> }
>> 
>> This will not work, except by luck - the compiler can re-order the
>> write to "big" with respect to the interrupt disable/enable, and
>> therefore destroy your hopes of making an atomic write.
>> 
>> The way to deal with this is either by making the write to "big"
>> volatile, to add artificial volatile dependencies that enforce the
>> order, or by using "clobbers" in the assembly statements.  Clobbers can
>> be quite sophisticated when you want to get the maximal performance (by
>> using minimal clobbers), but the easiest and therefore safest method is
>> to clobber "memory":
>> 
>> void atomic_write(uint64_t x) {
>> 	asm volatile("disableInterrupts" ::: "memory");
>> 	big = x;
>> 	asm volatile("enableInterrupts" ::: "memory");
>> }
>> 
>> The memory clobber tells the compiler that the inline assembly might
>> read or write memory in unexpected ways - all statements that logically
>> write something to memory that appear before the inline assembly, must
>> complete those writes.  And any logical reads from memory after the
>> inline assembly, cannot be started until after the assembly.  Data from
>> memory cannot be cached in registers across the assembly.
>> 
>> This is often used with an empty inline assembly:
>> 
>> static inline void compilerBarrier(void) {
>> 	asm volatile("" ::: "memory");
>> }
>> 
>> 
>> Once we have cleaned up the other minor issues in your class (the
>> unnecessary "volatile" on the private member, the unnecessary typedef,
>> the use of "int" instead of "uint32_t", and the use of reserved
>> identifiers with leading underscores), we get this:
>> 
>> 
>> #include <stdint.h>
>> 
>> class CProtect {
>> public :
>>      CProtect(void) {
>>         	asm volatile("mrs %[primask_], primask\n"
>>                      "cpsid i"
>>                      : [primask_] "=r" (primask_) : : "memory");
>>      }
>> 
>>      ~CProtect() {
>>         	asm volatile("msr primask, %[primask_]"
>>                      : : [primask_] "r" (primask_)
>>                      : "memory");
>>      }
>> private :
>>      uint32_t primask_;
>> };
>> 
>> extern uint64_t big;
>> 
>> void atomic_write(uint64_t x) {
>>      CProtect protect; big = x;
>> }
>> 
>> 
>> Compiling with this command line (using the usual optimisation setting
>> -Os):
>> 
>> /opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall
>> -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb
>> 
>> gives this assembly:
>> 
>> 
>>    21              	_Z12atomic_writey:
>>    22              		.fnstart 23              	.LFB6:
>>    24              		@ args = 0, pretend = 0, frame = 0
> 25
>>    	@ frame_needed = 0, uses_anonymous_args = 0 26
> 		@ link
>>    register save eliminated.
>>    27              	@ 9 "a.cpp" 1 28 0000 EFF31083 		mrs r3,
> primask 29
>>    0004 72B6     	cpsid i 30              	@ 0 "" 2
> 31
>>    .thumb 32 0006 034A     		ldr	r2, .L2 33 0008 C2E90001
> 		strd	r0,
>>    [r2]
>>    34              	@ 15 "a.cpp" 1 35 000c 83F31088
> msr primask, r3 36
>>                 	@ 0 "" 2 37              		.thumb 38
> 0010 7047     		bx
>>    lr 39              	.L3:
>>    40 0012 00BF     		.align	2 41              	.L2:
>>    42 0014 00000000 		.word	big 43
> 		.cantunwind 44
>>         		.fnend
>> 
>> 
>> And that, I believe, is both correct and optimal.
> 
> Wow.  Thank you for that.  I try to understand the whys and the
> wherefores of "asm", but sometimes I fall short -- maybe you need to
> write compilers to get it all.
> 
> (OTOH, if you need a control loop closed -- call me).
> 
> That code has been around for a long time, but I dimly recall that part
> of the reason I used the "primask_copy" stuff was because if the object
> should be built on the stack rather than using a register, gcc or the
> assembler would bitch because the mrs instruction only works with
> registers.  If "primask_" ends up in a memory location then the compiler
> or assembler would just halt with a nastygram to the effect that you're
> not doing your job right.
> 
> My assumption was that using "primask_copy" would keep the thing from
> bombing in the event that the object was created on the stack, but any
> side effects from using it (except for the volatile part, which was my
> fault) would get optimized away.
> 
> So wouldn't it be better to continue to use "primask_copy", but take
> away its volatility and that of "primask_"?

This is what I have at the moment.

class CProtect
{
  public:

  CProtect(void)
  {
    int primask_copy;
    asm volatile 
       ("mrs %[primask_copy], primask\n\t"  // save interrupt status
        "cpsid i\n\t"                       // disable interrupts
        : [primask_copy] "=r" (primask_copy) : : "memory");
    _primask = primask_copy;
  }

  ~CProtect()
  {
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm volatile ("msr primask, %[primask_copy]" : : [primask_copy]
                  "r" (primask_copy) : "memory");
  }

  private:
  int  _primask;
};

It appears to generate sensible assembly code, and it works when I test 
it.

I haven't tried to force a build of a "CProtect" object on the stack (I'm 
not sure exactly how, for that matter, other than it would involve lots 
and lots of local variables).

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by DJ Delorie ●May 28, 20152015-05-28

I think you have your answer on this, but speaking as a gcc
maintainer...

>     asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
>         "cpsid i\n\t"             // disable interrupts
>         : [primask_copy] "=r" (primask_copy));

Here you have told gcc that you'd doing "something" that will produce a
value, which gcc needs to deal with.  Since gcc doesn't know what the
value will be, and must store it in a volatile member, it has no choice
but to run the "something".

>     asm("msr primask, %[primask_copy]" : : [primask_copy]
>         "r" (primask_copy));

In this case, you've told gcc that "something" needs the value of
primask_copy, but doesn't do anything visible with it - you have
specified no outputs or clobbers, so gcc concludes (based on what you
told it!) that the instruction has no side effects and thus gcc is free
to omit it if it chooses.

If you need gcc to keep your asm around, you need to give it a reason.
Either:

1. Add an output constraint that gcc must deal with, like the first asm
   call.  (not appropriate in this case)

2. Indicate that the asm has side effects via a clobber that describes
   those side effects.  Clobbering "all of memory" is one way, but also
   tells gcc to forget every memory optimization it's made at that
   point, which is (duh) sub-optimal.

3. Tell gcc to keep the asm despite the lack of visible side effects,
   which is what "volatile" is for:

   asm volatile ("something" . . .)

   The "volatile" tells gcc that the asm does something "unspecified and
   unoptimizable" - which means not only will gcc keep the asm around,
   but it will also ensure that other volatile operations specified
   before the asm happen before the asm, and operations after happen
   after, i.e.  preserving the order of volatile operations.

Note: If you use an old-style asm, which has no i/o parameters:

   asm ("something")

gcc assumes the asm is doing something behind the compiler's back
(because, in this case, there's no way to tell it otherwise), and leaves
it alone.  This is not the preferred way of doing things, since you and
gcc have no idea if you're going to step on each other's toes.

Reply by Tim Wescott ●May 28, 20152015-05-28

On Thu, 28 May 2015 17:15:06 -0400, DJ Delorie wrote:

> I think you have your answer on this, but speaking as a gcc
> maintainer...
> 
>>     asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
>>         "cpsid i\n\t"             // disable interrupts :
>>         [primask_copy] "=r" (primask_copy));
> 
> Here you have told gcc that you'd doing "something" that will produce a
> value, which gcc needs to deal with.  Since gcc doesn't know what the
> value will be, and must store it in a volatile member, it has no choice
> but to run the "something".
> 
>>     asm("msr primask, %[primask_copy]" : : [primask_copy]
>>         "r" (primask_copy));
> 
> In this case, you've told gcc that "something" needs the value of
> primask_copy, but doesn't do anything visible with it - you have
> specified no outputs or clobbers, so gcc concludes (based on what you
> told it!) that the instruction has no side effects and thus gcc is free
> to omit it if it chooses.
> 
> If you need gcc to keep your asm around, you need to give it a reason.
> Either:
> 
> 1. Add an output constraint that gcc must deal with, like the first asm
>    call.  (not appropriate in this case)
> 
> 2. Indicate that the asm has side effects via a clobber that describes
>    those side effects.  Clobbering "all of memory" is one way, but also
>    tells gcc to forget every memory optimization it's made at that
>    point, which is (duh) sub-optimal.
> 
> 3. Tell gcc to keep the asm despite the lack of visible side effects,
>    which is what "volatile" is for:
> 
>    asm volatile ("something" . . .)
> 
>    The "volatile" tells gcc that the asm does something "unspecified and
>    unoptimizable" - which means not only will gcc keep the asm around,
>    but it will also ensure that other volatile operations specified
>    before the asm happen before the asm, and operations after happen
>    after, i.e.  preserving the order of volatile operations.
> 
> Note: If you use an old-style asm, which has no i/o parameters:
> 
>    asm ("something")
> 
> gcc assumes the asm is doing something behind the compiler's back
> (because, in this case, there's no way to tell it otherwise), and leaves
> it alone.  This is not the preferred way of doing things, since you and
> gcc have no idea if you're going to step on each other's toes.

Thanks DJ.

David's method is designed to force the two asm blocks to be before and 
after the "code in the middle", without asking the developer using 
CProtect (who is presumably naive, 'cuz it might be me!) to make all of 
the stuff in the middle volatile or anything else.

I think this is more or less in keeping with the intent of my CProtect 
class, because it turns off all interrupts, which should be done for an 
absolute minimal amount of time. (While I have occasional help from other 
developers, I do review code, and I'm persnickety about not egregiously 
turning off interrupts - you do it for a good reason, and you do it for 
as short an amount of time as you can.)

I don't know if there is a better way to meet this intent, but I'm open 
to suggestion, or even, occasionally, wisdom.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by David Brown ●May 28, 20152015-05-28

On 28/05/15 23:29, Tim Wescott wrote:
> On Thu, 28 May 2015 17:15:06 -0400, DJ Delorie wrote:
>
>> I think you have your answer on this, but speaking as a gcc
>> maintainer...
>>
>>>      asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
>>>          "cpsid i\n\t"             // disable interrupts :
>>>          [primask_copy] "=r" (primask_copy));
>>
>> Here you have told gcc that you'd doing "something" that will produce a
>> value, which gcc needs to deal with.  Since gcc doesn't know what the
>> value will be, and must store it in a volatile member, it has no choice
>> but to run the "something".
>>
>>>      asm("msr primask, %[primask_copy]" : : [primask_copy]
>>>          "r" (primask_copy));
>>
>> In this case, you've told gcc that "something" needs the value of
>> primask_copy, but doesn't do anything visible with it - you have
>> specified no outputs or clobbers, so gcc concludes (based on what you
>> told it!) that the instruction has no side effects and thus gcc is free
>> to omit it if it chooses.
>>
>> If you need gcc to keep your asm around, you need to give it a reason.
>> Either:
>>
>> 1. Add an output constraint that gcc must deal with, like the first asm
>>     call.  (not appropriate in this case)
>>
>> 2. Indicate that the asm has side effects via a clobber that describes
>>     those side effects.  Clobbering "all of memory" is one way, but also
>>     tells gcc to forget every memory optimization it's made at that
>>     point, which is (duh) sub-optimal.
>>
>> 3. Tell gcc to keep the asm despite the lack of visible side effects,
>>     which is what "volatile" is for:
>>
>>     asm volatile ("something" . . .)
>>
>>     The "volatile" tells gcc that the asm does something "unspecified and
>>     unoptimizable" - which means not only will gcc keep the asm around,
>>     but it will also ensure that other volatile operations specified
>>     before the asm happen before the asm, and operations after happen
>>     after, i.e.  preserving the order of volatile operations.
>>
>> Note: If you use an old-style asm, which has no i/o parameters:
>>
>>     asm ("something")
>>
>> gcc assumes the asm is doing something behind the compiler's back
>> (because, in this case, there's no way to tell it otherwise), and leaves
>> it alone.  This is not the preferred way of doing things, since you and
>> gcc have no idea if you're going to step on each other's toes.
>
> Thanks DJ.
>
> David's method is designed to force the two asm blocks to be before and
> after the "code in the middle", without asking the developer using
> CProtect (who is presumably naive, 'cuz it might be me!) to make all of
> the stuff in the middle volatile or anything else.
>
> I think this is more or less in keeping with the intent of my CProtect
> class, because it turns off all interrupts, which should be done for an
> absolute minimal amount of time. (While I have occasional help from other
> developers, I do review code, and I'm persnickety about not egregiously
> turning off interrupts - you do it for a good reason, and you do it for
> as short an amount of time as you can.)
>

That is exactly the point of using a memory clobber (and volatile).  As 
DJ says, it is often overkill - you don't want to use a full memory 
clobber in the middle of a critical loop, for example.  But here it is 
better to be extra safe than extra fast!  And in practical situations 
for the CProtect class, I think it will seldom be significantly sub-optimal.

> I don't know if there is a better way to meet this intent, but I'm open
> to suggestion, or even, occasionally, wisdom.
>

I know there are ways to use a more refined clobber list - I /think/ you 
can just list the variables that you know need to be ordered with 
respect to the assembly.  But I am not experienced enough at them to 
give advice here, and using a less general clobber would mean removing 
the "memory" clobber from the CProtect class and requiring class users 
to have their own guards (asm volatile ("" ::: <clobber-list>); ).  That 
would defeat the ease-of-use and safety of the class.

Reply by David Brown ●May 28, 20152015-05-28

On 28/05/15 23:00, Tim Wescott wrote:
> On Thu, 28 May 2015 21:57:00 +0200, David Brown wrote:
>
<snip>
>> And that, I believe, is both correct and optimal.
>
> Wow.  Thank you for that.  I try to understand the whys and the
> wherefores of "asm", but sometimes I fall short -- maybe you need to
> write compilers to get it all.

You don't need to write compilers to understand this stuff (I would like 
to be involved in gcc development, but I simply haven't got the time. 
Maybe when the kids are all grown up).  I just find this stuff interesting.

>
> (OTOH, if you need a control loop closed -- call me).

I know - I bought one of your books.  Unfortunately, I lost it before 
I'd read much of it!  But most of my programs only need fairly simple 
control systems.

>
> That code has been around for a long time, but I dimly recall that part
> of the reason I used the "primask_copy" stuff was because if the object
> should be built on the stack rather than using a register, gcc or the
> assembler would bitch because the mrs instruction only works with
> registers.  If "primask_" ends up in a memory location then the compiler
> or assembler would just halt with a nastygram to the effect that you're
> not doing your job right.
>
> My assumption was that using "primask_copy" would keep the thing from
> bombing in the event that the object was created on the stack, but any
> side effects from using it (except for the volatile part, which was my
> fault) would get optimized away.
>
> So wouldn't it be better to continue to use "primask_copy", but take away
> its volatility and that of "primask_"?
>

The "r" in the assembly statement tells gcc that it needs to go in a 
register - there are other letters used for memory operands, or more 
general register-or-memory operands (not so useful for the ARM, but nice 
for m68k or other cpus).  So gcc already knows it needs to move primask_ 
into a register if it is not there from before.  Thus there is no need 
for you to make an extra copy.

Of course, the optimiser will remove the extra primask_copy if you have 
included it, so you still get optimal generated code.  But I think it is 
neater without it in the source.

Reply by John Speth ●May 28, 20152015-05-28

It's clear that asm volatile() solves the problem but the reason why 
will not be obvious to a casual reader.  I prefer to explicitly disable 
optimization using a pragma bracketing the code so that the asm 
statement (severely non-portable, by the way) will not be optimized away.

JJS

Previous 1 234 5 6 Next

gnu compiler optimizes out "asm" statements

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group