
gnu compiler optimizes out "asm" statements

Started by Tim Wescott May 28, 2015
David Brown wrote:
> On 31/05/15 18:28, Les Cargill wrote:
>> David Brown wrote:
<snip>
>>
>
> If your C macros are similarly incorrect, then you are merely lucky that
> they have worked. Feel free to post those macros, and I will try to
> comment on them too.
>
David - sorry.

I had apparently missed your post in the maelstrom. I was
unfamiliar with the "memory clobber" idiom.

Thanks for that, David.

For what it is worth, the reference to extern class routines and
variables was sufficient last time I'd faced this. As you say,
I'd probably been lucky.
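For anyone else following along, the idiom in question looks roughly like
this (a minimal sketch, assuming GCC or Clang on a Cortex-M target; the
function names here are just for illustration):

    // The "memory" clobber tells the compiler that memory may have been
    // changed, so it must not cache values in registers across the asm
    // statement, nor move loads/stores past it.  "volatile" stops the
    // asm statement itself from being optimised away or hoisted.
    static inline void interrupts_off(void)
    {
        __asm__ volatile ("cpsid i" ::: "memory");
    }

    static inline void interrupts_on(void)
    {
        __asm__ volatile ("cpsie i" ::: "memory");
    }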
> Do you believe that my class, as posted earlier, has errors? If so, I
> would like to know - while I am confident that it is correct, I would be
> glad to hear of any suspected problems or situations where it would not
> work.
>
Nope - it was very clear once I'd found it. I would not have posted at all had I seen that.
> But assuming it /is/ correct, why would you prefer to use a solution
> that is more limited in its use cases, dependent on the compiler's
> optimisation flags, and big and slow in the cases when it might work?
>
Because it had worked for me (when we were using inline asm to
do something horrible) once before. Like you say, it was
probably just luck that it worked.

I did not find it particularly big and slow - the "inline" did what
you'd expect. We weren't exactly tight for cycles.

The "static" thing, though, was me not understanding the intention of
Tim's original code. I really thought it was just "on or off" and
didn't see any nesting needed.
-- Les Cargill
On 01/06/15 05:28, Les Cargill wrote:
> David Brown wrote:
>> On 31/05/15 18:28, Les Cargill wrote:
>>> David Brown wrote:
> <snip>
>>>
>>
>> If your C macros are similarly incorrect, then you are merely lucky that
>> they have worked. Feel free to post those macros, and I will try to
>> comment on them too.
>>
>
> David - sorry.
>
> I had apparently missed your post in the maelstrom. I was
> unfamiliar with the "memory clobber" idiom.
>
> Thanks for that, David.
Sorry if I was a bit crass with you in my posts last night - it hadn't occurred to me that you had simply failed to see my posts in the mass in this thread.
>
> For what it is worth, the reference to extern class routines and
> variables was sufficient last time I'd faced this. As you say,
> I'd probably been lucky.
Compilers usually generate code in the order of operations in the source
code, unless they have particular reason to re-order them. This means
that you can very often get away with code that doesn't enforce the
order specifically. And if you don't have LTO or other sorts of
whole-program optimisation enabled, then external function calls form a
barrier to a fair amount of re-ordering. Thus the external function
calls here /usually/ work.

However, compilers are getting smarter and smarter - and they are
optimising over a wider range with more cross-function and cross-module
optimisation. Techniques that worked well enough before are now
failing. Some people think this is a bad thing - that compilers are
getting "smart-ass" rather than "smart". Personally, I am a fan of
compiler optimisation, but it means that we need to be more accurate in
telling the compiler /exactly/ what we want. But it has the advantage
of letting us write neater and clearer source code, knowing that the
compiler will produce efficient object code in the end.

As well as varying between compilers (or versions of compilers, or
selection of optimisation flags), these things also vary between
targets. When you are using an ARM Cortex-M3/M4 (as in this thread),
the cpu is pretty straightforward and does one thing at a time. Thus
the compiler can order generated instructions in the same way as the
source code, without any penalty - this makes debugging and
single-stepping easier, as well as being easier to understand the
generated code. But if you are using a PowerPC, or a MIPS processor,
especially the higher-end devices with multiple-issue super-scalar cpus,
the compiler does a great deal of re-ordering and re-scheduling to get
maximum overlap from instructions. With such targets, you need to be
/very/ careful about your order enforcements (and may also need
additional assembly instructions to flush the cpu's write buffers, or
synchronise pipelines in some way).
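To make the distinction concrete, here is a rough sketch (my own
illustration, assuming GCC-style inline assembly - not code from anyone's
project in this thread) of a pure compiler barrier versus one that also
synchronises the cpu:

    // Pure compiler barrier: emits no instruction, but the compiler may
    // not reorder or cache memory accesses across it.  On a simple
    // in-order Cortex-M this is usually all that is needed.
    static inline void compiler_barrier(void)
    {
        __asm__ volatile ("" ::: "memory");
    }

    // Compiler barrier plus hardware synchronisation, for targets where
    // the cpu itself buffers and reorders accesses.  Cortex-M syntax is
    // shown; a PowerPC build would use "sync"/"isync", MIPS would use
    // "sync" instead.
    static inline void hardware_barrier(void)
    {
        __asm__ volatile ("dsb sy\n\tisb sy" ::: "memory");
    }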
>
>> Do you believe that my class, as posted earlier, has errors? If so, I
>> would like to know - while I am confident that it is correct, I would be
>> glad to hear of any suspected problems or situations where it would not
>> work.
>>
>
> Nope - it was very clear once I'd found it. I would not have posted
> at all had I seen that.
>
It's nice to have the confirmation.
>> But assuming it /is/ correct, why would you prefer to use a solution
>> that is more limited in its use cases, dependent on the compiler's
>> optimisation flags, and big and slow in the cases when it might work?
>>
>
>
> Because it had worked for me (when we were using inline asm to
> do something horrible) once before. Like you say, it was
> probably just luck that it worked.
>
> I did not find it particularly big and slow - the "inline" did what
> you'd expect. We weren't exactly tight for cycles.
>
> The "static" thing, though, was me not understanding the intention of
> Tim's original code. I really thought it was just "on or off" and
> didn't see any nesting needed.
>
Tim didn't mention nesting anywhere (at least, not that I noticed) - I took it as an implicit requirement (in particular, it is one of the reasons you need to restore the previous interrupt state rather than assuming interrupts are always on at the constructor). Once you have thought of the idea of being able to nest the critical sections, it is obviously a useful feature. If nothing else, it helps make the class simpler to use.
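For illustration only - this is /not/ the class I posted earlier, just a
sketch of the same save-and-restore idea, assuming GCC on a Cortex-M3/M4.
Nesting falls out naturally when the constructor saves PRIMASK and the
destructor restores it:

    #include <cstdint>

    class CriticalSection {
    public:
        CriticalSection()
        {
            // Save the current interrupt state, then disable interrupts.
            // The "memory" clobber keeps the compiler from moving accesses
            // to shared data out of the protected region.
            __asm__ volatile ("mrs %0, primask\n\t"
                              "cpsid i"
                              : "=r" (saved_primask_) :: "memory");
        }

        ~CriticalSection()
        {
            // Restore whatever the state was on entry, so a nested
            // CriticalSection does not re-enable interrupts too early.
            __asm__ volatile ("msr primask, %0"
                              :: "r" (saved_primask_) : "memory");
        }

        CriticalSection(const CriticalSection&) = delete;
        CriticalSection& operator=(const CriticalSection&) = delete;

    private:
        std::uint32_t saved_primask_;
    };

    // Usage: the protected region is simply the scope of the object.
    void update_shared_counter(volatile std::uint32_t& counter)
    {
        CriticalSection cs;   // interrupts off, previous state saved
        ++counter;            // protected access
    }                         // previous state restored here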
David Brown wrote:
> On 01/06/15 05:28, Les Cargill wrote:
>> David Brown wrote:
>>> On 31/05/15 18:28, Les Cargill wrote:
>>>> David Brown wrote:
>> <snip>
>>>>
>>>
>>> If your C macros are similarly incorrect, then you are merely lucky that
>>> they have worked. Feel free to post those macros, and I will try to
>>> comment on them too.
>>>
>>
>> David - sorry.
>>
>> I had apparently missed your post in the maelstrom. I was
>> unfamiliar with the "memory clobber" idiom.
>>
>> Thanks for that, David.
>
> Sorry if I was a bit crass with you in my posts last night - it hadn't
> occurred to me that you had simply failed to see my posts in the mass in
> this thread.
>
No apology needed - I was ... quite confused :) You were quite
professional about it.

I saw your *postS*, but not the critical *post*. As soon as I realized
that there was a construct specifically for warning the toolchain that
"here we made memory changes", I realized that was probably the only way
to reliably enforce this - the issue had been deliberated on, and the
solution provided wasn't just there as decoration.
>>
>> For what it is worth, the reference to extern class routines and
>> variables was sufficient last time I'd faced this. As you say,
>> I'd probably been lucky.
>
> Compilers usually generate code in the order of operations in the source
> code, unless they have particular reason to re-order them. This means
> that you can very often get away with code that doesn't enforce the
> order specifically. And if you don't have LTO or other sorts of
> whole-program optimisation enabled, then external function calls form a
> barrier to a fair amount of re-ordering. Thus the external function
> calls here /usually/ work.
>
Yes. Emphasis "usually".
> However, compilers are getting smarter and smarter - and they are
> optimising over a wider range with more cross-function and cross-module
> optimisation. Techniques that worked well enough before are now
> failing. Some people think this is a bad thing - that compilers are
> getting "smart-ass" rather than "smart".
:)
> Personally, I am a fan of
> compiler optimisation, but it means that we need to be more accurate in
> telling the compiler /exactly/ what we want. But it has the advantage
> of letting us write neater and clearer source code, knowing that the
> compiler will produce efficient object code in the end.
>
Indeed.
> As well as varying between compilers (or versions of compilers, or
> selection of optimisation flags), these things also vary between
> targets. When you are using an ARM Cortex-M3/M4 (as in this thread),
> the cpu is pretty straightforward and does one thing at a time. Thus
> the compiler can order generated instructions in the same way as the
> source code, without any penalty - this makes debugging and
> single-stepping easier, as well as being easier to understand the
> generated code. But if you are using a PowerPC, or a MIPS processor,
> especially the higher-end devices with multiple-issue super-scalar cpus,
> the compiler does a great deal of re-ordering and re-scheduling to get
> maximum overlap from instructions. With such targets, you need to be
> /very/ careful about your order enforcements (and may also need
> additional assembly instructions to flush the cpu's write buffers, or
> synchronise pipelines in some way).
>
>>
>>> Do you believe that my class, as posted earlier, has errors? If so, I
>>> would like to know - while I am confident that it is correct, I would be
>>> glad to hear of any suspected problems or situations where it would not
>>> work.
>>>
>>
>> Nope - it was very clear once I'd found it. I would not have posted
>> at all had I seen that.
>>
>
> It's nice to have the confirmation.
>
Always good to find the "bug". :)
>>> But assuming it /is/ correct, why would you prefer to use a solution
>>> that is more limited in its use cases, dependent on the compiler's
>>> optimisation flags, and big and slow in the cases when it might work?
>>>
>>
>>
>> Because it had worked for me (when we were using inline asm to
>> do something horrible) once before. Like you say, it was
>> probably just luck that it worked.
>>
>> I did not find it particularly big and slow - the "inline" did what
>> you'd expect. We weren't exactly tight for cycles.
>>
>> The "static" thing, though, was me not understanding the intention of
>> Tim's original code. I really thought it was just "on or off" and
>> didn't see any nesting needed.
>>
>
> Tim didn't mention nesting anywhere (at least, not that I noticed) - I
> took it as an implicit requirement (in particular, it is one of the
> reasons you need to restore the previous interrupt state rather than
> assuming interrupts are always on at the constructor).
Sure.
> Once you have
> thought of the idea of being able to nest the critical sections, it is
> obviously a useful feature. If nothing else, it helps make the class
> simpler to use.
>
Absolutely.

-- Les Cargill
