
gnu compiler optimizes out "asm" statements

Started by Tim Wescott May 28, 2015
David Brown wrote:
> On 31/05/15 18:28, Les Cargill wrote:
>> David Brown wrote:
<snip>
>>
>
> If your C macros are similarly incorrect, then you are merely lucky that
> they have worked. Feel free to post those macros, and I will try to
> comment on them too.
>
David - sorry.

I had apparently missed your post in the maelstrom. I was
unfamiliar with the "memory clobber" idiom.

Thanks for that, David.

For what it is worth, the reference to extern class routines and
variables was sufficient last time I'd faced this. As you say,
I'd probably been lucky.
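For anyone else following along, the idiom in question looks roughly like
this (a minimal sketch, assuming GCC or Clang on a Cortex-M target; the
function names here are just for illustration):

    // The "memory" clobber tells the compiler that memory may have been
    // changed, so it must not cache values in registers across the asm
    // statement, nor move loads/stores past it.  "volatile" stops the
    // asm statement itself from being optimised away or hoisted.
    static inline void interrupts_off(void)
    {
        __asm__ volatile ("cpsid i" ::: "memory");
    }

    static inline void interrupts_on(void)
    {
        __asm__ volatile ("cpsie i" ::: "memory");
    }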
> Do you believe that my class, as posted earlier, has errors? If so, I
> would like to know - while I am confident that it is correct, I would be
> glad to hear of any suspected problems or situations where it would not
> work.
>
Nope - it was very clear once I'd found it. I would not have posted at all had I seen that.
> But assuming it /is/ correct, why would you prefer to use a solution
> that is more limited in its use cases, dependent on the compiler's
> optimisation flags, and big and slow in the cases when it might work?
>
Because it had worked for me (when we were using inline asm to
do something horrible) once before. Like you say, it was
probably just luck that it worked.

I did not find it particularly big and slow - the "inline" did what
you'd expect. We weren't exactly tight for cycles.

The "static" thing, though, was me not understanding the intention of
Tim's original code. I really thought it was just "on or off" and
didn't see any nesting needed.
-- Les Cargill
On 01/06/15 05:28, Les Cargill wrote:
> David Brown wrote:
>> On 31/05/15 18:28, Les Cargill wrote:
>>> David Brown wrote:
> <snip>
>>>
>>
>> If your C macros are similarly incorrect, then you are merely lucky that
>> they have worked. Feel free to post those macros, and I will try to
>> comment on them too.
>>
>
> David - sorry.
>
> I had apparently missed your post in the maelstrom. I was
> unfamiliar with the "memory clobber" idiom.
>
> Thanks for that, David.
Sorry if I was a bit crass with you in my posts last night - it hadn't occurred to me that you had simply failed to see my posts in the mass in this thread.
>
> For what it is worth, the reference to extern class routines and
> variables was sufficient last time I'd faced this. As you say,
> I'd probably been lucky.
Compilers usually generate code in the order of operations in the source
code, unless they have particular reason to re-order them. This means
that you can very often get away with code that doesn't enforce the
order specifically. And if you don't have LTO or other sorts of
whole-program optimisation enabled, then external function calls form a
barrier to a fair amount of re-ordering. Thus the external function
calls here /usually/ work.

However, compilers are getting smarter and smarter - and they are
optimising over a wider range with more cross-function and cross-module
optimisation. Techniques that worked well enough before are now
failing. Some people think this is a bad thing - that compilers are
getting "smart-ass" rather than "smart". Personally, I am a fan of
compiler optimisation, but it means that we need to be more accurate in
telling the compiler /exactly/ what we want. But it has the advantage
of letting us write neater and clearer source code, knowing that the
compiler will produce efficient object code in the end.

As well as varying between compilers (or versions of compilers, or
selection of optimisation flags), these things also vary between
targets. When you are using an ARM Cortex-M3/M4 (as in this thread),
the cpu is pretty straightforward and does one thing at a time. Thus
the compiler can order generated instructions in the same way as the
source code, without any penalty - this makes debugging and
single-stepping easier, as well as being easier to understand the
generated code. But if you are using a PowerPC, or a MIPS processor,
especially the higher-end devices with multiple-issue super-scalar cpus,
the compiler does a great deal of re-ordering and re-scheduling to get
maximum overlap from instructions. With such targets, you need to be
/very/ careful about your order enforcements (and may also need
additional assembly instructions to flush the cpu's write buffers, or
synchronise pipelines in some way).
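To make the distinction concrete, here is a rough sketch (my own
illustration, assuming GCC-style inline assembly - not code from anyone's
project in this thread) of a pure compiler barrier versus one that also
synchronises the cpu:

    // Pure compiler barrier: emits no instruction, but the compiler may
    // not reorder or cache memory accesses across it.  On a simple
    // in-order Cortex-M this is usually all that is needed.
    static inline void compiler_barrier(void)
    {
        __asm__ volatile ("" ::: "memory");
    }

    // Compiler barrier plus hardware synchronisation, for targets where
    // the cpu itself buffers and reorders accesses.  Cortex-M syntax is
    // shown; a PowerPC build would use "sync"/"isync", MIPS would use
    // "sync" instead.
    static inline void hardware_barrier(void)
    {
        __asm__ volatile ("dsb sy\n\tisb sy" ::: "memory");
    }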
>
>> Do you believe that my class, as posted earlier, has errors? If so, I
>> would like to know - while I am confident that it is correct, I would be
>> glad to hear of any suspected problems or situations where it would not
>> work.
>>
>
> Nope - it was very clear once I'd found it. I would not have posted
> at all had I seen that.
>
It's nice to have the confirmation.
>> But assuming it /is/ correct, why would you prefer to use a solution
>> that is more limited in its use cases, dependent on the compiler's
>> optimisation flags, and big and slow in the cases when it might work?
>>
>
>
> Because it had worked for me (when we were using inline asm to
> do something horrible) once before. Like you say, it was
> probably just luck that it worked.
>
> I did not find it particularly big and slow - the "inline" did what
> you'd expect. We weren't exactly tight for cycles.
>
> The "static" thing, though, was me not understanding the intention of
> Tim's original code. I really thought it was just "on or off" and
> didn't see any nesting needed.
>
Tim didn't mention nesting anywhere (at least, not that I noticed) - I took it as an implicit requirement (in particular, it is one of the reasons you need to restore the previous interrupt state rather than assuming interrupts are always on at the constructor). Once you have thought of the idea of being able to nest the critical sections, it is obviously a useful feature. If nothing else, it helps make the class simpler to use.
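For illustration only - this is /not/ the class I posted earlier, just a
sketch of the same save-and-restore idea, assuming GCC on a Cortex-M3/M4.
Nesting falls out naturally when the constructor saves PRIMASK and the
destructor restores it:

    #include <cstdint>

    class CriticalSection {
    public:
        CriticalSection()
        {
            // Save the current interrupt state, then disable interrupts.
            // The "memory" clobber keeps the compiler from moving accesses
            // to shared data out of the protected region.
            __asm__ volatile ("mrs %0, primask\n\t"
                              "cpsid i"
                              : "=r" (saved_primask_) :: "memory");
        }

        ~CriticalSection()
        {
            // Restore whatever the state was on entry, so a nested
            // CriticalSection does not re-enable interrupts too early.
            __asm__ volatile ("msr primask, %0"
                              :: "r" (saved_primask_) : "memory");
        }

        CriticalSection(const CriticalSection&) = delete;
        CriticalSection& operator=(const CriticalSection&) = delete;

    private:
        std::uint32_t saved_primask_;
    };

    // Usage: the protected region is simply the scope of the object.
    void update_shared_counter(volatile std::uint32_t& counter)
    {
        CriticalSection cs;   // interrupts off, previous state saved
        ++counter;            // protected access
    }                         // previous state restored here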
David Brown wrote:
> On 01/06/15 05:28, Les Cargill wrote:
>> David Brown wrote:
>>> On 31/05/15 18:28, Les Cargill wrote:
>>>> David Brown wrote:
>> <snip>
>>>>
>>>
>>> If your C macros are similarly incorrect, then you are merely lucky that
>>> they have worked. Feel free to post those macros, and I will try to
>>> comment on them too.
>>>
>>
>> David - sorry.
>>
>> I had apparently missed your post in the maelstrom. I was
>> unfamiliar with the "memory clobber" idiom.
>>
>> Thanks for that, David.
>
> Sorry if I was a bit crass with you in my posts last night - it hadn't
> occurred to me that you had simply failed to see my posts in the mass in
> this thread.
>
No apology needed - I was ... quite confused :) You were quite
professional about it.

I saw your *postS*, but not the critical *post*. As soon as I realized
that there was a construct specifically for warning the toolchain that
"here we made memory changes", I realized that was probably the only way
to reliably enforce this - the issue had been deliberated on, and the
solution provided wasn't just there as decoration.
>>
>> For what it is worth, the reference to extern class routines and
>> variables was sufficient last time I'd faced this. As you say,
>> I'd probably been lucky.
>
> Compilers usually generate code in the order of operations in the source
> code, unless they have particular reason to re-order them. This means
> that you can very often get away with code that doesn't enforce the
> order specifically. And if you don't have LTO or other sorts of
> whole-program optimisation enabled, then external function calls form a
> barrier to a fair amount of re-ordering. Thus the external function
> calls here /usually/ work.
>
Yes. Emphasis "usually".
> However, compilers are getting smarter and smarter - and they are
> optimising over a wider range with more cross-function and cross-module
> optimisation. Techniques that worked well enough before are now
> failing. Some people think this is a bad thing - that compilers are
> getting "smart-ass" rather than "smart".
:)
> Personally, I am a fan of
> compiler optimisation, but it means that we need to be more accurate in
> telling the compiler /exactly/ what we want. But it has the advantage
> of letting us write neater and clearer source code, knowing that the
> compiler will produce efficient object code in the end.
>
Indeed.
> As well as varying between compilers (or versions of compilers, or
> selection of optimisation flags), these things also vary between
> targets. When you are using an ARM Cortex-M3/M4 (as in this thread),
> the cpu is pretty straightforward and does one thing at a time. Thus
> the compiler can order generated instructions in the same way as the
> source code, without any penalty - this makes debugging and
> single-stepping easier, as well as being easier to understand the
> generated code. But if you are using a PowerPC, or a MIPS processor,
> especially the higher-end devices with multiple-issue super-scalar cpus,
> the compiler does a great deal of re-ordering and re-scheduling to get
> maximum overlap from instructions. With such targets, you need to be
> /very/ careful about your order enforcements (and may also need
> additional assembly instructions to flush the cpu's write buffers, or
> synchronise pipelines in some way).
>
>>
>>> Do you believe that my class, as posted earlier, has errors? If so, I
>>> would like to know - while I am confident that it is correct, I would be
>>> glad to hear of any suspected problems or situations where it would not
>>> work.
>>>
>>
>> Nope - it was very clear once I'd found it. I would not have posted
>> at all had I seen that.
>>
>
> It's nice to have the confirmation.
>
Always good to find the "bug". :)
>>> But assuming it /is/ correct, why would you prefer to use a solution
>>> that is more limited in its use cases, dependent on the compiler's
>>> optimisation flags, and big and slow in the cases when it might work?
>>>
>>
>>
>> Because it had worked for me (when we were using inline asm to
>> do something horrible) once before. Like you say, it was
>> probably just luck that it worked.
>>
>> I did not find it particularly big and slow - the "inline" did what
>> you'd expect. We weren't exactly tight for cycles.
>>
>> The "static" thing, though, was me not understanding the intention of
>> Tim's original code. I really thought it was just "on or off" and
>> didn't see any nesting needed.
>>
>
> Tim didn't mention nesting anywhere (at least, not that I noticed) - I
> took it as an implicit requirement (in particular, it is one of the
> reasons you need to restore the previous interrupt state rather than
> assuming interrupts are always on at the constructor).
Sure.
> Once you have
> thought of the idea of being able to nest the critical sections, it is
> obviously a useful feature. If nothing else, it helps make the class
> simpler to use.
>
Absolutely.

-- Les Cargill
