Beware of GNU-ARM compiler for Cortex-M0/M0+/M1

Started by StateMachineCOM October 10, 2017
The popular GNU-ARM toolset has had long-known issues for the Cortex-M0/M0+/M1
(ARMv6-M architecture). Specifically, people have reported very inefficient code
generated, see "Cortex M0/M0+/M1/M23 BAD Optimisation in GCC"
https://embdev.net/topic/426508 .

But while so far people reported only inefficient code, I would like to make people
aware of *incorrect* code generated by GNU-ARM for Cortex-M0/M0+.

The issue was detected with interrupt disabling and has been documented in a bug
report for the QP framework, see https://sourceforge.net/p/qpc/bugs/184/ . The
experiments performed with the latest available GNU-ARM (GNU Tools for ARM Embedded
Processors 6-2017-q2-update, 6.3.1 20170620 release) clearly show incorrect code
generated at optimization level -O, while the same code compiled at -O2 level seemed
to be correct.

Please be careful with GNU-ARM for ARMv6-M architecture and preferably avoid using
it for these CPUs as long as the issue remains unresolved.

Miro Samek
state-machine.com
Thanks Miro for bringing this to our attention...
On 10.10.17 17:47, StateMachineCOM wrote:
> [...]
> The issue was detected with interrupt disabling and has been documented in a bug report for the QP framework, see https://sourceforge.net/p/qpc/bugs/184/ .
> [...]
Do you have an example source snippet? QP feels pretty heavy for Cortex-M0.

--
-TV
On 10/10/17 16:47, StateMachineCOM wrote:
> [...]
> The issue was detected with interrupt disabling and has been documented in a bug report for the QP framework, see https://sourceforge.net/p/qpc/bugs/184/ .
> [...]
It is impossible for anyone to determine if this is a bug in the compiler or a bug in the QS macros without giving us the source of the test. Can you give us the source of these macros (or if they are proprietary, a roughly equivalent source that shows the same problems)? I'd like to see it, and try it on a simple case such as the example in the linked page.

    void crit_section_test(void) {
        uint32_t i;
        for(i = 0; i < 10; i++) {
            QS_BEGIN(123, 0);
                QS_U32(8, 0);
            QS_END();
        }
    }

My guess here is that there is a misunderstanding or error in the embedded assembly in these macros. gcc inline assembly can be a bit fiddly to get exactly right.
On 10.10.17 21:55, David Brown wrote:
> [...]
> My guess here is that there is a misunderstanding or error in the embedded assembly in these macros. gcc inline assembly can be a bit fiddly to get exactly right.
I just wonder if QP attempts to use the exclusive-access instruction pair (LDREX / STREX), which does not exist on M0 and M1.

--
-TV
On 10/10/17 21:35, Tauno Voipio wrote:
> [...]
> I just wonder if QP attempts to use the exclusive access instruction pairs (LDREX / STREX), which do not exist in M0 and M1.
The link he gave includes a screendump of the generated assembly - there is no LDREX or STREX there.

My guesses for the problem are missing "volatile" in the asm statements, multiple independent asm statements where there should be a single one, or incorrect dependency information in the asm statements or other code.

gcc does a lot of optimisation and re-arrangement of code, including with inline assembly. It is easy to get it wrong when you depend on the order of the code in a way that the compiler does not know about.
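To illustrate the pitfalls listed above, here is a minimal sketch of an interrupt-masking critical section written with GCC extended asm. It is not the QP code and not the code from the bug report; the names crit_enter, crit_exit, and shared_counter are made up for this example. It assumes an ARMv6-M target (detected through the __ARM_ARCH_6M__ predefined macro) and falls back to a plain compiler barrier on any other target so that it still builds on a host. The points it tries to show are: "volatile" on every asm statement, the read of PRIMASK and the CPSID kept together in a single statement, and a "memory" clobber so the compiler does not move memory accesses across the boundary.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only (not the QP macros). */

/* Save the current interrupt mask and disable interrupts. */
static inline uint32_t crit_enter(void) {
#if defined(__ARM_ARCH_6M__)
    uint32_t primask;
    /* One asm statement: the compiler cannot schedule code between
     * the MRS and the CPSID. "memory" stops it from moving loads or
     * stores of shared data across this point. */
    __asm__ volatile ("mrs %0, primask\n\t"
                      "cpsid i"
                      : "=r" (primask)
                      :
                      : "memory");
    return primask;
#else
    /* Host fallback: compiler barrier only, no real masking. */
    __asm__ volatile ("" : : : "memory");
    return 0U;
#endif
}

/* Restore the interrupt mask saved by crit_enter(). */
static inline void crit_exit(uint32_t primask) {
#if defined(__ARM_ARCH_6M__)
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
#else
    (void)primask;
    __asm__ volatile ("" : : : "memory");
#endif
}

/* Data that would be shared with an interrupt handler. */
static volatile uint32_t shared_counter;

/* Same loop shape as the test above, with the macros replaced by
 * the hypothetical helpers. Returns the final counter value. */
uint32_t crit_section_test(void) {
    uint32_t i;
    for (i = 0U; i < 10U; i++) {
        uint32_t saved = crit_enter();
        shared_counter++;          /* protected read-modify-write */
        crit_exit(saved);
    }
    return shared_counter;
}
```

Saving and restoring PRIMASK, rather than unconditionally re-enabling interrupts, is a design choice that lets such sections nest; whether the QP port does the same is not something this sketch claims.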
On 10/10/2017 20:55, David Brown wrote:
> It is impossible for anyone to determine if this is a bug in the compiler or a bug in the QS macros without giving us the source of the test. Can you give us the source of these macros (or if they are proprietary, a roughly equivalent source that shows the same problems)?
I think QP/C is an open-source project (even if it isn't free to use for commercial business). The source code is here: https://github.com/QuantumLeaps/qpc
> I'd like to see it, and try it on a simple case such as the example in the linked page. [...] My guess here is that there is a misunderstanding or error in the embedded assembly in these macros. gcc inline assembly can be a bit fiddly to get exactly right.
QS_BEGIN and QS_END are defined in include/qs.h, but they depend on many other macros.
On 11/10/17 16:01, pozz wrote:
> [...]
> The source code is here: https://github.com/QuantumLeaps/qpc
> [...]
> QS_BEGIN and QS_END are defined in include/qs.h, but they depend on many other macros.
Yes, I saw the source was there - but I have no interest in the project, and no interest in digging through all the source of that project to try to find the problem. The OP is one of the people behind that project, as far as I can see - he should be able to provide a small self-contained equivalent definition for the macros so that we can get to the bottom of his problem.

My take on this at the moment is that it is most likely to be a flaw in the QP code, not the compiler.

I am happy to help, whether it turns out to be a compiler problem or a QP problem. But the OP has to do some work here, not just give a hit-and-run FUD about the compiler that is far and away the dominant tool for these microcontrollers. "Avoid using gcc for the M0/M0+" is advice to avoid those microcontrollers entirely.
Thank you, everyone, for your attention. There is really no need to be hostile. I'm NOT
trying to sell you anything. I merely didn't have the time to distill the problem to
be completely "context free".

But I was able to distill the problem to a relatively small snippet of code
without any external dependencies or macros. I filed this information as an official
bug report at GCC-ARM-Embedded, please see:

https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849

In my experiments with this code, the excessive type casting in the condition of
the if statement seems to be implicated (the bug goes away if I remove some of the
casts). The casts were added in the first place to satisfy static analysis with
PC-Lint for MISRA-C compliance.

--MMS
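For readers who have not seen this style of code, the following is a hypothetical illustration of a heavily cast filter test next to a plain version of the same test. It is not the snippet from the bug report; glb_filter and all function names are invented here, and the one-bit-per-record layout is an assumption. In standard C the two forms mean the same thing, so a host-side comparison like this can only confirm that the source is equivalent. It cannot show what a particular ARMv6-M compiler build generates for it.

```c
#include <stdint.h>

/* Hypothetical filter table: one enable bit per record ID (0..255). */
static uint8_t glb_filter[32];

/* Heavily cast form, in the style often written to keep a MISRA-C
 * checker quiet. Every operand and intermediate result is cast. */
static int filter_on_cast(uint8_t rec) {
    return ((uint_fast8_t)((uint_fast8_t)glb_filter[(uint8_t)(rec) >> 3]
             & (uint_fast8_t)((uint_fast8_t)1
                              << ((uint8_t)(rec) & (uint8_t)7)))
            != (uint_fast8_t)0);
}

/* Plain form that relies on the usual integer promotions. */
static int filter_on_plain(uint8_t rec) {
    return (glb_filter[rec >> 3] & (1U << (rec & 7U))) != 0U;
}

/* Count the record IDs for which the two forms disagree. */
unsigned filter_mismatches(void) {
    unsigned mismatches = 0U;
    unsigned rec;
    for (rec = 0U; rec < 256U; rec++) {
        if (filter_on_cast((uint8_t)rec) != filter_on_plain((uint8_t)rec)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A check of this kind is useful mainly as a sanity test when simplifying the casts: if the count is zero for representative table contents, the simplified condition can replace the cast-heavy one without changing behaviour.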
On 11/10/17 18:03, StateMachineCOM wrote:
> Thank you everyone for attention. There is really no need to be hostile.
> [...]
> https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849
> [...]
No hostility was intended - I just want to make sure that this issue is considered properly, and followed up properly. I have seen too many people drop into a newsgroup like this and make claims about compiler bugs, then disappear (perhaps in embarrassment) when it is their own code that is found faulty. I want to push you to follow the thread here and keep things updated.

Thank you for posting the test code (in the launchpad bug report). I can't see anything wrong with the code you wrote so far. I am a little short on time just now (it's dinner time here :-) ) but I will do some experiments with the code as soon as I get the chance, and get back to you.