EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

Beware of GNU-ARM compiler for Cortex-M0/M0+/M1

Started by StateMachineCOM October 10, 2017
What level is this -O ?
All I know are -O0, -O1, -O2, -O3, -Os

On 12.10.17 08:22, raimond.dragomir@gmail.com wrote:
> What level is this -O ? > All I know are -O0, -O1, -O2, -O3, -Os >
-O is the same as -O1. My preference for small embedded targets is -Os. -- -TV
On 11/10/17 18:51, David Brown wrote:
> On 11/10/17 18:03, StateMachineCOM wrote: >> Thank you everyone for attention. There is really no need to be >> hostile. I'm NOT trying to sell you anything. I merely didn't have the >> time to distill the problem to be completely "context free". >> >> But I was was able to distill the problem to a relatively small >> snippet of code without any external dependencies or macros. I filed >> this information as an official bug report at GCC-ARM-Embedded, please >> see: >> >> https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849 >> >> As I experimented with this code, the excessive type casting in the >> condition for the if statement seems to be implicated (the bug goes >> away if I remove some of this type casting). The type casting has been >> added in the first place to satisfy static analysis with PC-Lint for >> MISRA-C compliance. >> > > No hostility was intended - I just want to make sure that this issue is > considered properly, and followed up properly.  I have seen too many > people drop into a newsgroup like this and make claims about compiler > bugs, then disappear (perhaps in embarrassment) when it is their own > code that is found faulty.  I want to push you to follow the thread here > and keep things updated > > Thank you for posting the test code (in the launchpad bug report).  I > can't see anything wrong with the code you wrote so far.  I am a little > short on time just now (it's dinner time here :-) ) but I will do some > experiments with the code as soon as I get the chance, and get back to you.
I have just added this comment to the gcc-arm-embedded bug report linked above: I agree that it is surprising that "asm volatile" statements can be re-arranged with respect to each other, and with respect to other volatile accesses. This seems to be a particular problem with asm statements that return outputs - "volatile" is used primarily to tell the compiler that you can get different outputs at different times, even if the inputs (if any) are the same. For asm statements with no outputs, the compiler appears to assume they have a special function and should not be moved. As far as I know, there is no way in C or as a gcc extension to specify ordering of statements or executable code - you can only specify ordering of memory (via volatile accesses). Even the traditional method of declaring a function (like "foo" in the sample) externally is in danger - with link-time optimisation, the compiler knows everything and can re-arrange bits of "foo" with respect to the asm statements or volatile accesses. A related problem is well documented for the AVR gcc port: http://www.nongnu.org/avr-libc/user-manual/optimization.html There is, however, a solution to all this. (I have told the avr-libc folks about it a number of times, but they have not added my solution to their webpage. I have given up trying to persuade them.) I have three macros defined that I use in circumstances like these: #define forceDependency(val) \ asm volatile("" :: "" (val) : ) #define forgetCompilerKnowledge(v) \ asm ("" : "+g" (v)) #define forgetCompilerBlock(start, size) \ do { typedef struct { char x[size]; } XS; XS *p = (XS *) start; \ asm ("" : "+m" (*p)); \ } while (0); The first one tells the compiler "I am using "val" here, so you have to evaluate it before this point". The second one tells the compiler "I am using "val" here, and I might change it, so you have to evaluate it before this point, but forget anything you know about it". The third one is just another version that can handle data of any size. Putting "forceDependency(status)" after the "mrs" instruction, but before the "foo()" call, ensures that the compiler has evaluated "status" before calling foo. It makes most sense to put it before the "cpsid i" instruction, but that does not appear to be critical. The neatest arrangement is to combine it with the cpsid: uint32_t status; asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: ); asm volatile ("cpsid i" :: "" (status) :); foo(); asm volatile ("msr PRIMASK, %0" :: "r" (status) : );
On 12/10/17 09:09, Tauno Voipio wrote:
> On 12.10.17 08:22, raimond.dragomir@gmail.com wrote: >> What level is this -O ? >> All I know are -O0, -O1, -O2, -O3, -Os >> > > -O is the same as -O1. > > My preference for small embedded targets is -Os. >
I am tending to use -O2 more than -Os for Cortex-M targets. Often the code they generate is almost identical, but -Os can sometimes sacrifice quite a lot of performance for very minor space savings. It will depend on your target, your code, and your requirements, however.
On Thursday, October 12, 2017 at 10:04:55 AM UTC-4, David Brown wrote:
> On 11/10/17 18:51, David Brown wrote: > . . . > Putting "forceDependency(status)" after the "mrs" instruction, but > before the "foo()" call, ensures that the compiler has evaluated > "status" before calling foo. It makes most sense to put it before the > "cpsid i" instruction, but that does not appear to be critical. The > neatest arrangement is to combine it with the cpsid: > > uint32_t status; > asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: ); > asm volatile ("cpsid i" :: "" (status) :); > > foo(); > > asm volatile ("msr PRIMASK, %0" :: "r" (status) : );
Yes! This solution seems to fix the problem! The code compiles correctly at all optimization levels (no explicit optimization, -O, -O1, -O2, -O3, -Os). Thanks a lot, David. --MMS
On 12/10/17 16:43, StateMachineCOM wrote:
> On Thursday, October 12, 2017 at 10:04:55 AM UTC-4, David Brown wrote: >> On 11/10/17 18:51, David Brown wrote: >> . . . >> Putting "forceDependency(status)" after the "mrs" instruction, but >> before the "foo()" call, ensures that the compiler has evaluated >> "status" before calling foo. It makes most sense to put it before the >> "cpsid i" instruction, but that does not appear to be critical. The >> neatest arrangement is to combine it with the cpsid: >> >> uint32_t status; >> asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: ); >> asm volatile ("cpsid i" :: "" (status) :); >> >> foo(); >> >> asm volatile ("msr PRIMASK, %0" :: "r" (status) : ); > > Yes! This solution seems to fix the problem! The code compiles correctly at all optimization levels (no explicit optimization, -O, -O1, -O2, -O3, -Os). > > Thanks a lot, David. >
You are very welcome. I think this kind of stuff is fun, and it's nice when it can help people. David
On 12.10.17 17:04, David Brown wrote:
> On 11/10/17 18:51, David Brown wrote: >> On 11/10/17 18:03, StateMachineCOM wrote: >>> Thank you everyone for attention. There is really no need to be >>> hostile. I'm NOT trying to sell you anything. I merely didn't have >>> the time to distill the problem to be completely "context free". >>> >>> But I was was able to distill the problem to a relatively small >>> snippet of code without any external dependencies or macros. I filed >>> this information as an official bug report at GCC-ARM-Embedded, >>> please see: >>> >>> https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849 >>> >>> As I experimented with this code, the excessive type casting in the >>> condition for the if statement seems to be implicated (the bug goes >>> away if I remove some of this type casting). The type casting has >>> been added in the first place to satisfy static analysis with PC-Lint >>> for MISRA-C compliance. >>> >> >> No hostility was intended - I just want to make sure that this issue >> is considered properly, and followed up properly.  I have seen too >> many people drop into a newsgroup like this and make claims about >> compiler bugs, then disappear (perhaps in embarrassment) when it is >> their own code that is found faulty.  I want to push you to follow the >> thread here and keep things updated >> >> Thank you for posting the test code (in the launchpad bug report).  I >> can't see anything wrong with the code you wrote so far.  I am a >> little short on time just now (it's dinner time here :-) ) but I will >> do some experiments with the code as soon as I get the chance, and get >> back to you. > > I have just added this comment to the gcc-arm-embedded bug report linked > above: > > > I agree that it is surprising that "asm volatile" statements can be > re-arranged with respect to each other, and with respect to other > volatile accesses.  This seems to be a particular problem with asm > statements that return outputs - "volatile" is used primarily to tell > the compiler that you can get different outputs at different times, even > if the inputs (if any) are the same.  For asm statements with no > outputs, the compiler appears to assume they have a special function and > should not be moved. > > As far as I know, there is no way in C or as a gcc extension to specify > ordering of statements or executable code - you can only specify > ordering of memory (via volatile accesses).  Even the traditional method > of declaring a function (like "foo" in the sample) externally is in > danger - with link-time optimisation, the compiler knows everything and > can re-arrange bits of "foo" with respect to the asm statements or > volatile accesses. > > A related problem is well documented for the AVR gcc port: > > http://www.nongnu.org/avr-libc/user-manual/optimization.html > > There is, however, a solution to all this.  (I have told the avr-libc > folks about it a number of times, but they have not added my solution to > their webpage.  I have given up trying to persuade them.) > > > I have three macros defined that I use in circumstances like these: > > #define forceDependency(val) \ >                 asm volatile("" :: "" (val) : ) > > #define forgetCompilerKnowledge(v) \ >                 asm ("" : "+g" (v)) > > #define forgetCompilerBlock(start, size) \ >     do { typedef struct { char x[size]; } XS; XS *p = (XS *) start; \ >       asm ("" : "+m" (*p)); \ >     } while (0); > > > > The first one tells the compiler "I am using "val" here, so you have to > evaluate it before this point".  The second one tells the compiler "I am > using "val" here, and I might change it, so you have to evaluate it > before this point, but forget anything you know about it".  The third > one is just another version that can handle data of any size. > > > Putting "forceDependency(status)" after the "mrs" instruction, but > before the "foo()" call, ensures that the compiler has evaluated > "status" before calling foo.  It makes most sense to put it before the > "cpsid i" instruction, but that does not appear to be critical.  The > neatest arrangement is to combine it with the cpsid: > >     uint32_t status; >     asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: ); >     asm volatile ("cpsid i" :: "" (status) :); > >     foo(); > >     asm volatile ("msr PRIMASK, %0" :: "r" (status) : ); > >
Or: uint32_t status; asm volatile ( "mrs %0, PRIMASK\n\t" "cpsid i" :: "=r" (status)); Combining the instructions into a single asm statement does it. -- -TV
On 12/10/17 18:36, Tauno Voipio wrote:
> On 12.10.17 17:04, David Brown wrote: >> On 11/10/17 18:51, David Brown wrote: >>> On 11/10/17 18:03, StateMachineCOM wrote: >>>> Thank you everyone for attention. There is really no need to be >>>> hostile. I'm NOT trying to sell you anything. I merely didn't have >>>> the time to distill the problem to be completely "context free". >>>> >>>> But I was was able to distill the problem to a relatively small >>>> snippet of code without any external dependencies or macros. I filed >>>> this information as an official bug report at GCC-ARM-Embedded, >>>> please see: >>>> >>>> https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849 >>>> >>>> As I experimented with this code, the excessive type casting in the >>>> condition for the if statement seems to be implicated (the bug goes >>>> away if I remove some of this type casting). The type casting has >>>> been added in the first place to satisfy static analysis with >>>> PC-Lint for MISRA-C compliance. >>>> >>> >>> No hostility was intended - I just want to make sure that this issue >>> is considered properly, and followed up properly.  I have seen too >>> many people drop into a newsgroup like this and make claims about >>> compiler bugs, then disappear (perhaps in embarrassment) when it is >>> their own code that is found faulty.  I want to push you to follow >>> the thread here and keep things updated >>> >>> Thank you for posting the test code (in the launchpad bug report).  I >>> can't see anything wrong with the code you wrote so far.  I am a >>> little short on time just now (it's dinner time here :-) ) but I will >>> do some experiments with the code as soon as I get the chance, and >>> get back to you. >> >> I have just added this comment to the gcc-arm-embedded bug report >> linked above: >> >> >> I agree that it is surprising that "asm volatile" statements can be >> re-arranged with respect to each other, and with respect to other >> volatile accesses.  This seems to be a particular problem with asm >> statements that return outputs - "volatile" is used primarily to tell >> the compiler that you can get different outputs at different times, >> even if the inputs (if any) are the same.  For asm statements with no >> outputs, the compiler appears to assume they have a special function >> and should not be moved. >> >> As far as I know, there is no way in C or as a gcc extension to >> specify ordering of statements or executable code - you can only >> specify ordering of memory (via volatile accesses).  Even the >> traditional method of declaring a function (like "foo" in the sample) >> externally is in danger - with link-time optimisation, the compiler >> knows everything and can re-arrange bits of "foo" with respect to the >> asm statements or volatile accesses. >> >> A related problem is well documented for the AVR gcc port: >> >> http://www.nongnu.org/avr-libc/user-manual/optimization.html >> >> There is, however, a solution to all this.  (I have told the avr-libc >> folks about it a number of times, but they have not added my solution >> to their webpage.  I have given up trying to persuade them.) >> >> >> I have three macros defined that I use in circumstances like these: >> >> #define forceDependency(val) \ >>                  asm volatile("" :: "" (val) : ) >> >> #define forgetCompilerKnowledge(v) \ >>                  asm ("" : "+g" (v)) >> >> #define forgetCompilerBlock(start, size) \ >>      do { typedef struct { char x[size]; } XS; XS *p = (XS *) start; \ >>        asm ("" : "+m" (*p)); \ >>      } while (0); >> >> >> >> The first one tells the compiler "I am using "val" here, so you have >> to evaluate it before this point".  The second one tells the compiler >> "I am using "val" here, and I might change it, so you have to evaluate >> it before this point, but forget anything you know about it".  The >> third one is just another version that can handle data of any size. >> >> >> Putting "forceDependency(status)" after the "mrs" instruction, but >> before the "foo()" call, ensures that the compiler has evaluated >> "status" before calling foo.  It makes most sense to put it before the >> "cpsid i" instruction, but that does not appear to be critical.  The >> neatest arrangement is to combine it with the cpsid: >> >>      uint32_t status; >>      asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: ); >>      asm volatile ("cpsid i" :: "" (status) :); >> >>      foo(); >> >>      asm volatile ("msr PRIMASK, %0" :: "r" (status) : ); >> >> > > Or: > >        uint32_t status; >        asm volatile ( >          "mrs %0, PRIMASK\n\t" >          "cpsid i" >          :: "=r" (status)); > > Combining the instructions into a single asm statement does it. >
No, it does not - not in this case. Putting the two instructions in a single asm means that these two instructions will follow each other. However, in this example it does not hinder the compiler from moving the "foo()" call above this asm statement, which is the wrong order. The ordering enforcement of gcc's asm instructions is not very clear in the documentation. But it appears (from my testing and experience) that "asm volatile" statements with no output are not moved in relation to other statements (at least, volatile memory accesses, unknown external function calls, and other no-output asm volatile statements). "asm volatile" statements with an output are often moved. So the first important point here is to have an asm volatile statement with no output (such as the "cpsid i" one, or a "forceDependency" macro) before "foo()". The second important point is to force the usage of the output of the "mrs" statement, by using its output as an artificial input to another asm statement. Does that make it clearer? (This stuff is hard to understand, and hard to explain - especially when I can't claim to have a full understanding myself.) You can try it out using the online compiler at <https://gotbolt.org> . If we are lucky, the following link will have the example code: <https://godbolt.org/g/jJX8i4>
On 10/10/17 16:47, StateMachineCOM wrote:
> The popular GNU-ARM toolset has had long-known issues for the > Cortex-M0/M0+/M1 (ARMv6-M architecture). Specifically, people have > reported very inefficient code generated, see "Cortex M0/M0+/M1/M23 > BAD Optimisation in GCC" https://embdev.net/topic/426508 . > > But while so far people reported only inefficient code, I would like > to make people aware of *incorrect* code generated by GNU-ARM for > Cortex-M0/M0+. > > The issue was detected with interrupt disabling and has been > documented in a bug report for the QP framework, see > https://sourceforge.net/p/qpc/bugs/184/ . The experiments performed > with the latest available GUN-ARM (GNU Tools for ARM Embedded > Processors 6-2017-q2-update, 6.3.1 20170620 release) clearly show > incorrect code generated at optimization level -O, while the same > code compiled at -O2 level seemed to be correct. > > Please be careful with GNU-ARM for ARMv6-M architecture and > preferably avoid using it for these CPUs as long as the issue remains > unresolved. > > Miro Samek state-machine.com >
Just to keep people updated - this has been confirmed as a gcc bug. "asm volatile" statements are not supposed to be moveable across other "asm volatile" statements. Details can be seen at: <https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849> <https://gcc.gnu.org/ml/gcc-help/2017-10/msg00061.html> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602> In addition, a memory clobber has been added to the CMSIS function "__get_PRIMASK" for gcc as a workaround.
On Thursday, October 12, 2017 at 10:04:55 AM UTC-4, David Brown wrote:
> On 11/10/17 18:51, David Brown wrote: > > On 11/10/17 18:03, StateMachineCOM wrote: > >> Thank you everyone for attention. There is really no need to be > >> hostile. I'm NOT trying to sell you anything. I merely didn't have the > >> time to distill the problem to be completely "context free". > >> > >> But I was was able to distill the problem to a relatively small > >> snippet of code without any external dependencies or macros. I filed > >> this information as an official bug report at GCC-ARM-Embedded, please > >> see: > >> > >> https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849 > >> > >> As I experimented with this code, the excessive type casting in the > >> condition for the if statement seems to be implicated (the bug goes > >> away if I remove some of this type casting). The type casting has been > >> added in the first place to satisfy static analysis with PC-Lint for > >> MISRA-C compliance. > >> > > > > No hostility was intended - I just want to make sure that this issue is > > considered properly, and followed up properly.&nbsp; I have seen too many > > people drop into a newsgroup like this and make claims about compiler > > bugs, then disappear (perhaps in embarrassment) when it is their own > > code that is found faulty.&nbsp; I want to push you to follow the thread here > > and keep things updated > > > > Thank you for posting the test code (in the launchpad bug report).&nbsp; I > > can't see anything wrong with the code you wrote so far.&nbsp; I am a little > > short on time just now (it's dinner time here :-) ) but I will do some > > experiments with the code as soon as I get the chance, and get back to you. > > I have just added this comment to the gcc-arm-embedded bug report linked > above: > > > I agree that it is surprising that "asm volatile" statements can be > re-arranged with respect to each other, and with respect to other > volatile accesses. This seems to be a particular problem with asm > statements that return outputs - "volatile" is used primarily to tell > the compiler that you can get different outputs at different times, even > if the inputs (if any) are the same. For asm statements with no > outputs, the compiler appears to assume they have a special function and > should not be moved. > > As far as I know, there is no way in C or as a gcc extension to specify > ordering of statements or executable code - you can only specify > ordering of memory (via volatile accesses). Even the traditional method > of declaring a function (like "foo" in the sample) externally is in > danger - with link-time optimisation, the compiler knows everything and > can re-arrange bits of "foo" with respect to the asm statements or > volatile accesses.
To prevent compiler reordering I use this inline asm macro. __asm__ __volatile("":::"memory"); I'm pretty sure that tells gcc that it should not reorder above and below that statement. So you might sprinkle those about?
> A related problem is well documented for the AVR gcc port: > > http://www.nongnu.org/avr-libc/user-manual/optimization.html > > There is, however, a solution to all this. (I have told the avr-libc > folks about it a number of times, but they have not added my solution to > their webpage. I have given up trying to persuade them.) > > > I have three macros defined that I use in circumstances like these: > > #define forceDependency(val) \ > asm volatile("" :: "" (val) : ) > > #define forgetCompilerKnowledge(v) \ > asm ("" : "+g" (v)) > > #define forgetCompilerBlock(start, size) \ > do { typedef struct { char x[size]; } XS; XS *p = (XS *) start; \ > asm ("" : "+m" (*p)); \ > } while (0); > > > > The first one tells the compiler "I am using "val" here, so you have to > evaluate it before this point". The second one tells the compiler "I am > using "val" here, and I might change it, so you have to evaluate it > before this point, but forget anything you know about it". The third > one is just another version that can handle data of any size. > > > Putting "forceDependency(status)" after the "mrs" instruction, but > before the "foo()" call, ensures that the compiler has evaluated > "status" before calling foo. It makes most sense to put it before the > "cpsid i" instruction, but that does not appear to be critical. The > neatest arrangement is to combine it with the cpsid: > > uint32_t status; > asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: ); > asm volatile ("cpsid i" :: "" (status) :); > > foo(); > > asm volatile ("msr PRIMASK, %0" :: "r" (status) : );

The 2024 Embedded Online Conference