Reply by David Brown April 28, 2012
On 28/04/12 13:24, Paul Curtis wrote:
> David,
>
> > gcc also has a mechanism for telling the compiler about the most likely
> > result of a conditional branch (although all compilers have
> heuristics for
> > common cases, such as loops, that are generally quite good). gcc has a
> > function "__builtin_expect(exp, c)" which returns "exp" without
> generating
> > any code - but tells the compiler that "exp" will probably equal "c".
> > Knowledge of branch direction can be very useful - for bigger processors,
> > you sometimes have static branch prediction hints in the
> instructions, and
> > for many processors there are significant cycle differences in taking or
> > skipping a conditional branch, so you
> > (normally) want the common case to be the fastest.
>
> I had reason to try this out in ARM GCC only recently; it didn't have any
> effect on code generation for that particular application. I wanted to
> optimize the hot path through the code, and GCC decided that it would rather
> use 'cbz' to branch to the hot code.
>
> This type of code is not uncommon:
>
> #define CLEARLY_I_CANT_DO_SOMETHING_TRIVIAL_WITH(X) ((X) == 0)
>
> __attribute__((noreturn)) void throw_an_exception(void);
>
> void do_something_trivial_with(int x)
> {
>     if (CLEARLY_I_CANT_DO_SOMETHING_TRIVIAL_WITH(x))
>         throw_an_exception();
>
>     // Carry on to do something trivial with x, no function calls.
> }
>
> In this case, I want an early-out to throw an exception, which is a function
> call. The code that has the meat is trivial. Hence, I would like the hot
> path to execute straight and, as it has no function call, there is no need
> to preserve the link register.
>
> Unfortunately, try as I might, I could not convince gcc to code this such
> that throw_an_exception() was off the hot path (i.e. was branched to) using
> __builtin_expect, nor could I tease GCC into not saving the registers for
> the hot path--it decided it needed to save the registers in a common header
> for both paths.
>
> Oh well.
>

Many of the compilers I've used - gcc included - are good, but none of
them are perfect. There is always scope for more.

gcc 4.6 introduced partial inlining optimisations. In a case like yours
above, if the compiler is able to inline "do_something_trivial_with"
then it could inline the hotpath, but leave the cold path as a function
call. I haven't tested anything like that, but maybe it will give you
what you want - or at least come close.

mvh.,

David

Beginning Microcontrollers with the MSP430

Reply by Paul Curtis April 28, 2012
David,

> gcc also has a mechanism for telling the compiler about the most likely
> result of a conditional branch (although all compilers have heuristics for
> common cases, such as loops, that are generally quite good). gcc has a
> function "__builtin_expect(exp, c)" which returns "exp" without generating
> any code - but tells the compiler that "exp" will probably equal "c".
> Knowledge of branch direction can be very useful - for bigger processors,
> you sometimes have static branch prediction hints in the instructions, and
> for many processors there are significant cycle differences in taking or
> skipping a conditional branch, so you
> (normally) want the common case to be the fastest.

I had reason to try this out in ARM GCC only recently; it didn't have any
effect on code generation for that particular application. I wanted to
optimize the hot path through the code, and GCC decided that it would rather
use 'cbz' to branch to the hot code.

This type of code is not uncommon:

#define CLEARLY_I_CANT_DO_SOMETHING_TRIVIAL_WITH(X) ((X) == 0)

__attribute__((noreturn)) void throw_an_exception(void);

void do_something_trivial_with(int x)
{
    if (CLEARLY_I_CANT_DO_SOMETHING_TRIVIAL_WITH(x))
        throw_an_exception();

    // Carry on to do something trivial with x, no function calls.
}

In this case, I want an early-out to throw an exception, which is a function
call. The code that has the meat is trivial. Hence, I would like the hot
path to execute straight and, as it has no function call, there is no need
to preserve the link register.

Unfortunately, try as I might, I could not convince gcc to code this such
that throw_an_exception() was off the hot path (i.e. was branched to) using
__builtin_expect, nor could I tease GCC into not saving the registers for
the hot path--it decided it needed to save the registers in a common header
for both paths.

Oh well.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

Reply by Hugo Brunert April 28, 2012
You couldn't have said it better! :-)

From: m... [mailto:m...] On Behalf
Of David Brown
Sent: Saturday, April 28, 2012 7:00 AM
To: m...
Cc: Jon Kirwan
Subject: Re: [msp430] Re: Warning in IAR when performing bitwise not on
unsigned char (corrected)

Reply by David Brown April 28, 2012
On 27/04/12 23:39, Jon Kirwan wrote:
> On Fri, 27 Apr 2012 12:31:09 +0200, you wrote:
>
>> On 26/04/2012 22:52, Jon Kirwan wrote:
>>
>> An assembly programmer will normally know a lot more details of
>> the usage of data and resources - a C compiler will only know it if
>> it is clearly told. So an assembly programmer will know that "n" is
>> not used in the external function "bar", and can be hoisted to the
>> beginning of a loop - the C compiler can only do that if you tell
>> it (such as by making "n" static or local to the function).
>>
>> This extra knowledge gives the assembly programmer more freedom to
>> optimise - the compiler has to /prove/ such knowledge.
>
> This additional knowledge is, in many cases, easily provided by the
> programmer to the compiler. This was done in many cases, but the one
> I am most familiar with is Bulldog. In that compiler, the programmer
> could inform the compiler which condition of a branch was "more
> likely" and since you could later profile the code through execution
> and collect statistics any such early guess could be refined by
> empirical data and then folded back into the source code. Just as an
> example.
>

There are actually several mechanisms by which you can provide more
information to a C compiler - some common to the C standards, some specific to
particular compilers - and there are ways for C compilers to figure out
more knowledge if they are smart enough.

For example, the "n" in this case could be made "static" - then the
compiler would know it could not be affected (legally) by an external
function, assuming the address of "n" was never passed out of the
module. Some programmers use "static" on file-level functions and
variables whenever possible - this results in better modularisation and
static error checking as well as giving the compiler better
opportunities for optimisations. Others follow the K&R habit of writing
without thinking and using the default "everything global" linkage.

Other hints are compiler-specific. I hope Paul will forgive me for
using gcc mechanisms here - I am more familiar with them, and there are
quite a few of them. For example, if a function only uses its inputs,
neither reads nor writes global data, and has no side effects, then you
can mark it with the "const" attribute. The compiler then knows it can
optimise calls to that function far more aggressively.

gcc also has a mechanism for telling the compiler about the most likely
result of a conditional branch (although all compilers have heuristics
for common cases, such as loops, that are generally quite good). gcc
has a function "__builtin_expect(exp, c)" which returns "exp" without
generating any code - but tells the compiler that "exp" will probably
equal "c". Knowledge of branch direction can be very useful - for
bigger processors, you sometimes have static branch prediction hints in
the instructions, and for many processors there are significant cycle
differences in taking or skipping a conditional branch, so you
(normally) want the common case to be the fastest.

Then there are compiler mechanisms for figuring out these things. One
example is profiling - many toolchains have support for profiling
programs and feeding that profile data back to the compiler. Rather
than relying on the programmer to say which branch is most likely, you
get the real-world results. Of course, profiling is seldom
non-intrusive, and is particularly challenging in the embedded world.
Some compilers will track the ranges that variables and data can have, and
use that for optimising. In particular, if they find that a function is
only ever called with a single constant value for one of its parameters,
then that parameter can be optimised away and replaced with the
constant. Sometimes compilers will use "assert" macros, or
compiler-specific builtins, to let the programmer tell it more details.

It used to be the case that compilers just worked on one function at a
time. These days it is standard practice that when optimising, the
compiler works on a whole file at a time. This means that the compiler
has a great deal of knowledge - in particular it knows all about
"static" functions and data (assuming they don't escape by passing
pointers around), and can use that for optimisation. The "big gun"
here is whole-program or link-time optimisation.
> Similarly, it is quite possible to specify to a compiler that two
> edges of code need to execute in the same time so that the entire
> conditional and associated code blocks must have fixed execution
> time. A compiler, which I'm sure you would admit because I think you
> yourself have brought this point up many times in defense of
> compilers over assembly coding, can do the "bookkeeping" better than
> a human can -- certainly, more consistently well if not as tightly
> perhaps.

Here you have a case that C cannot handle - it has no concept of timing.
It would be nice if it did, and that it could provide features like
this (as you say, a program can do the bookkeeping with less effort and
more accuracy than a human). However, it's worth remembering that on a
steadily higher proportion of systems, such exact timing is no longer
possible in software - even when coding in assembly. Things like
pipelining, scheduling, caches, flash accelerators, etc., mean that
execution times vary for all but the smallest microcontrollers. Such
precise software timing is still sometimes useful - but I think it is a
rare, niche usage, and it is usually better to use hardware (timers,
communication hardware, etc.).

>
> The short of it is that compilers haven't really been extended in
> ways where, as you say, the programmer could provide more of what
> they know about what they are doing than perhaps could be the case.
>

I both agree and disagree - you can give quite a lot of information to
the compiler, but there are certainly cases where it would be good to
give more. The most obvious and simple case is that it would be very
useful to be able to give the compiler a range for values (like a
subrange type in Pascal or Ada), which could improve static error
checking and code generation.

> The long of it gets back to why I posted on this at all. It was news
> to me that "many" compilers can't hoist something that simple. I
> believe Paul has as comprehensive a view as anyone I may know. So I
> learned something new and, frankly, I was surprised because of the
> simplicity of the optimization we are talking about. (I'm not talking
> about cases where the compiler cannot know for sure, because of side
> effects that are possible. I'm talking only about cases where the
> compiler could know full well and still doesn't do it.)

It was news to me too. I know there are many "cheap and cheerful"
compilers around for different targets, where the emphasis is on
ease-of-use and friendly support rather than small and fast code
generation, but I had expected such a "big name" compiler to handle such
a simple optimisation.

>
> Frankly, I had always _assumed_ (I think I checked one time a long
> time back -- perhaps it was with the Apollo DN 3000 system) that it
> did hoist such things.
>

Well, I've always said that if you want the best out of your tools, you
have to know them well and you have to pick them carefully. While I see
much less need for writing in assembly than you do, I don't think you
can get the best out of your C programming unless you are able to read
the generated assembly with enough understanding to see if it could be
improved.

> It's things like that, which would cause me to start MODIFYING my
> code writing.

Sometimes it is necessary to adapt your code style to the target or
compiler. This may be a bad thing - after all, it lessens portability.
But it may also be a good thing - as a general rule, better
modularisation and localisation (such as using "static" and local
variables where possible) gives better generated code, and better
quality source code. Writing clear source code is not necessarily at
odds with writing code that the compiler handles well.

> And David, you of all people are one of those who would
> argue, I believe, that a coder should focus on clear, expository
> writing of code and "let the compiler do its job." But now knowing
> that something this silly, this easy to hoist, isn't hoisted?? I will
> sometimes now write C code differently given this new knowledge. And
> it won't be as easy to read because of it. I don't like being pushed
> that way. (That is, I may do so _after_ first writing simply and
> making it work correctly, and then later "fixing" things to do hand
> optimizations. Something, that in this case, I don't think I should
> be forced to even think about. It's just too easy to handle by a
> compiler, in my opinion. Certainly, it SHOULD be handled by it. I
> should just write simply in this case and the compiler should produce
> identical code as though I'd hand-hoisted it.)
>

Well, in this particular case, there are two simple choices - each of
which gives /better/ source code, and should also give better target
code (unless the compiler is outrageously bad). One is to make "n"
static to the module - if it does not have to have global linkage, then
it is always better to make it static. The other is to make a local
variable in the function set to "n+2" - it should definitely hoist the
operation, and it gives you a place to comment on why you are using
"n+2" in the first place.

In general, however, you have a very good point. And the only answer is
"it depends". Sometimes it really is important to squeeze every cycle
out of the code - and then you /do/ have to test whether a "for" loop or
a "while" loop gives the fastest results on this particular
target/compiler combination, and you have to accept that the code you
write looks messier.

But most of the time, "good enough" is good enough - you can write the
clearest code you can, and accept that there are missed optimisation
opportunities. (This is not specific to C, of course - you get exactly
the same effect when programming in assembly. No sane developer aims to
make /all/ their assembly code run as fast as possible at the expense of
development time and code maintainability.)

You also have to balance portability and code efficiency and clarity.
If the code will only ever run on an 8-bit micro, it's okay to use
"uint8_t" types - even if they are less efficient than "uint32_t" when
running on a 32-bit device, and even if you /could/ have used
"uint_fast8_t" to get optimal code on both. If your code is
target-specific, then emphasise clarity first (and "uint8_t" is clearer
to read than "uint_fast8_t"), efficiency second, and portability third.
If you are writing a library for heavy use, however, you might put
portability first, efficiency second and clarity third - typical library
sources are full of conditional compilations and horrible macros in
order to work well on a range of toolchains.

So when you are coding, identify your targets and toolchains - don't try
to make your code work with everything. So what if one compiler that
you will never use generates poor code on a target that you will never
use? What matters for /your/ coding is the targets that /you/ will use
(unless you are aiming explicitly for particularly portable code).

>> > Well, I would not bother with ancient tools - compiler optimisation
>> has improved with time.
>
> But Paul specifically attempted something with 2008 Visual Studio.
> And Paul has not been sweeping in his claim that ALL compilers fail
> in this regard. Given a modern compiler that doesn't do it and given
> the implications I take from Paul's unwillingness to make a sweeping
> statement about all compilers, I conclude that some compilers may
> handle it well. And if a modern one used for mainstream programming
> on the widest operating system used doesn't, then a place to start
> looking is elsewhere. That doesn't close off modern compilers -- need
> to look there, too. But it argues for some backward examination. And
> better yet, if I find that there has been reverse evolution in tools,
> that would also be interesting.
>

Paul has found an example of a compiler that produces poor code in this
case - a sweeping claim that /all/ modern compilers will do the hoist
here is therefore clearly wrong. But a less sweeping claim that /most/
/good/ /optimising/ compilers will do the hoist is far from disproved.
As has been noted by others, this particular compiler is not considered
to be very good - MS is famous for many things, but not the quality of
their compilers (they are also the only case I know of who changed the
ordering of bitfields between two minor revisions of their tools!).
People writing software for Windows only use MSVC if they don't worry
much about the code speed - if they want faster code they use Intel's
compiler (or gcc, or a few other options). So please, don't get too
hung up on this one example.
mvh.,

David
>> Jon
>
Reply by Jon Kirwan April 28, 2012
On Fri, 27 Apr 2012 12:31:09 +0200, you wrote:

>On 26/04/2012 22:52, Jon Kirwan wrote:
>
>Since you were posting to me, I guess I'd better reply with some more
>over-generalised pronouncements! (I know I do this - sometimes
>intentionally to emphasise a point or provoke discussions, sometimes
>unintentionally.)
>
>> On Thu, 26 Apr 2012 16:28:38 +0200, David wrote:
>>
>> >OK, I won't argue any more!
>> >
>> >I wonder if there is a particular reason that MSVC is bad here (I assume
>> >you agree that gcc has generated valid code with the n+2 moved out of
>> >the loop). Perhaps MSVC turns off optimisations when there is a
>> >volatile in the code?
>> >
>> >Anyway, this is getting way off topic for this group - perhaps it's best
>> >just to drop this branch of the thread.
>>
>> Actually, it is germane. One of the perennial topics that
>> comes up from time to time -- and I think you are guilty here
>> of making over-reaching comments in this regard (as are we
>> all on some subjects, I admit) -- is whether or not assembly
>> coding is "dead" or not.
>
>I think you are stretching the topic a lot more here - there is a big
>step between "how does the compiler optimise or fail to optimise" to "is
>assembly dead?". There is a relationship, certainly, but it is not a
>direct jump.
>
>Still, this is a discussion group, and if you and others want to discuss
>this here and now, then that's fine. I only suggested dropping the
>branch because it looked like few people were interested.
>
>> This kind of optimization is dead obvious for any assembly
>> coder who would, almost without thought about it, do the
>> promotion out of the loop (at some point -- not necessarily
>> as the first step in getting things working right.) If C
>> compilers cannot, in this day and age now, be trusted to do
>> such trivia then it leaves open many other questions.
>
>An assembly programmer will normally know a lot more details of the
>usage of data and resources - a C compiler will only know it if it is
>clearly told. So an assembly programmer will know that "n" is not used
>in the external function "bar", and can be hoisted to the beginning of a
>loop - the C compiler can only do that if you tell it (such as by making
>"n" static or local to the function).
>
>This extra knowledge gives the assembly programmer more freedom to
>optimise - the compiler has to /prove/ such knowledge.

This additional knowledge is, in many cases, easily provided
by the programmer to the compiler. It has been done in many
compilers, but the one I am most familiar with is Bulldog. In
that compiler, the programmer could inform the compiler which
condition of a branch was "more likely", and since you could
later profile the code through execution and collect
statistics, any such early guess could be refined by empirical
data and then folded back into the source code. Just as an
example.
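The same feed-back loop exists in GCC today via `__builtin_expect`, which came up earlier in the thread. A minimal sketch, assuming a GCC-compatible compiler; the function and data are invented for illustration:

```c
/* Conventional wrappers around GCC's __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Count non-zero samples.  The hint tells the compiler zeros are rare,
   so it can lay the increment out on the straight-line (hot) path and
   branch away only for the cold case. */
int count_nonzero(const int *samples, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (unlikely(samples[i] == 0))
            continue;               /* cold: taken rarely */
        count++;                    /* hot: falls through */
    }
    return count;
}
```

With `-fprofile-generate` and `-fprofile-use`, GCC can also replace such hand-written guesses with measured branch statistics, which is essentially the refinement step described above.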

Similarly, it is quite possible to specify to a compiler that
two edges of code need to execute in the same time, so that
the entire conditional and its associated code blocks have
fixed execution time. A compiler can do the "bookkeeping" for
that better than a human can -- certainly more consistently
well, if not as tightly -- a point I think you yourself have
brought up many times in defense of compilers over assembly
coding.
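The "both edges in equal time" idea can also be expressed at source level with branchless selection. This is only a sketch of the pattern with invented names, not something any compiler is obliged to preserve; real constant-time code has to be verified against the generated assembly:

```c
#include <stdint.h>

/* Select a or b without a data-dependent branch: build an all-ones or
   all-zeros mask from the condition, then merge.  Both "edges" cost
   the same instructions regardless of cond. */
uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```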

The short of it is that compilers haven't really been
extended in ways that would let the programmer provide more
of what they know about what they are doing, though perhaps
they could be.

The long of it gets back to why I posted on this at all. It
was news to me that "many" compilers can't hoist something
that simple. I believe Paul has as comprehensive a view as
anyone I may know. So I learned something new and, frankly, I
was surprised because of the simplicity of the optimization
we are talking about. (I'm not talking about cases where the
compiler cannot know for sure, because of side effects that
are possible. I'm talking only about cases where the compiler
could know full well and still doesn't do it.)

Frankly, I had always _assumed_ (I think I checked one time a
long time back -- perhaps it was with the Apollo DN 3000
system) that it did hoist such things.

It's things like that, which would cause me to start
MODIFYING my code writing. And David, you of all people are
one of those who would argue, I believe, that a coder should
focus on clear, expository writing of code and "let the
compiler do its job." But now knowing that something this
silly, this easy to hoist, isn't hoisted?? I will sometimes
now write C code differently given this new knowledge. And it
won't be as easy to read because of it. I don't like being
pushed that way. (That is, I may do so _after_ first writing
simply and making it work correctly, and then later "fixing"
things to do hand optimizations. Something, that in this
case, I don't think I should be forced to even think about.
It's just too easy to handle by a compiler, in my opinion.
Certainly, it SHOULD be handled by it. I should just write
simply in this case and the compiler should produce identical
code as though I'd hand-hoisted it.)
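The kind of rewrite being resigned to here looks roughly like this. The names `n` and `bar` follow the earlier example in the thread; the body of `bar` is invented so the sketch compiles, and in the real scenario it lives in another file:

```c
int n = 5;                    /* external-linkage global from the example */
void bar(int x) { (void)x; }  /* stand-in: really lives elsewhere */

/* As written: a cautious compiler reloads n and redoes n + 2 every
   iteration, because for all it knows bar() modifies the global. */
int sum_plain(int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        bar(i);
        sum += n + 2;
    }
    return sum;
}

/* Hand-hoisted: the invariant lives in a local the compiler can keep
   in a register across the call - correct only as long as bar()
   really does leave n alone. */
int sum_hoisted(int count)
{
    int sum = 0;
    int n2 = n + 2;
    for (int i = 0; i < count; i++) {
        bar(i);
        sum += n2;
    }
    return sum;
}
```

Both functions compute the same result; the second merely states by hand what the compiler could not prove for itself.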

>

>Well, I would not bother with ancient tools - compiler optimisation has
>improved with time.

But Paul specifically attempted something with Visual Studio 2008.
And Paul has not been sweeping in his claim that ALL
compilers fail in this regard. Given a modern compiler that
doesn't do it, and given the implications I take from Paul's
unwillingness to make a sweeping statement about all
compilers, I conclude that some compilers may handle it well.
And if a modern one used for mainstream programming on the most
widely used operating system doesn't, then a place to start
looking is elsewhere. That doesn't close off modern compilers
-- need to look there, too. But it argues for some backward
examination. And better yet, if I find that there has been
reverse evolution in tools, that would also be interesting.

>

Jon
Reply by Joe Radomski April 27, 2012
Can't say I was a child of the 70s.... I did some programming in '79, but really started in 1980.. Learned 6502 assembler on a Commodore PET 4004.. Was in heaven when I got an ATARI 800 with OMNIMON installed.. One of the best debuggers I ever used... Could break into any program, assemble and disassemble directly to/from memory and then continue execution..  Best Basic I ever used (that looked like Basic) was from OSS.. It originally was called Basic XL, then became Basic XE.. So many built-in functions.. I loved the way it extended ATARI Basic and kept its unique string handling and ability to easily integrate 6502 calls into Basic..
 
BTW did you ever notice the 6502 code in Terminator? (at least that's what I vaguely remember it was)
 
By the mid 80s I was into microcontrollers (although I was programming 68k as well), and used the 805x, 6805 and TMS7000 derivatives.. Believe it or not, I never really got into C (although I could use it when needed) until the 2000s.. Most of the projects I worked on didn't need to be portable, and I loved the freedom of assembler, so I stayed away from higher-level languages.
 
 
 
>________________________________
>From: Paul Curtis
>To: m...
>Sent: Thursday, April 26, 2012 7:31 PM
>Subject: Re: [msp430] Warning in IAR when performing bitwise not on unsigned char (corrected)
>
>>> In general, it is not worthwhile to implement a complex system in assembly
>>> because the ROI is low to negative.  If you have a complex system with a
>>> great ROI which must be coded to squeeze the best from specialist hardware,
>>> then you require somebody who can get the most out of your hardware, who
>>> may well not be an assembly programmer--just an excellent programmer--and
>>> can turn himself to programming such hardware.
>>
>> In general, only works in general.  Some things only work well when you code
>> for them in assembly ( or a variant thereof ) things like HAL/CL I'd
>> consider more assembly than C for instance. In general I find that the
>> programmers who come from an assembly background are much more useful in
>> these areas, at least this has been my experience in the games industry.
>
>OK, well as a child of the 60s, and growing up in the 70s with the micro revolution, I passed through FORTRAN IV to BASIC to 6502 assembly code to Z80 assembly code and then to Pascal and 68k assembly code…  I guess you haven't lived as a programmer unless you've written meaningful code at a low level.  Personally I loved the built-in 6502 assembler of the Acorn Atom--absolutely stinking stroke of genius.
>
>>
>> So in general at your regular coding company who's turning out average
>> performing software that’s really not going to gain much from assembler,
>> you're right. But we still have a need for the others.
>>
>
>Although I have had games published in the past, which I guess I am proud of as it got me through uni, I now have zero time for any sort of game.  I should qualify this: I believe I have more years behind me than I have productive years ahead of me, and I want to squeeze the most that I can out of those remaining years.  As much as I love writing code, and my sig shows you I still love writing software and creating Defender from scratch, I have no desire to play games.  I find writing software so much more fulfilling.
>
>>> Right now, we don't churn out "assembly programmers".  I think that's a
>>> good thing.  We should be turning out programmers that are generally
>>> useful and know about different aspects of programming, including low-level
>>> aspects.  Unfortunately the dire state of ICT in the UK means we are up
>>> shit creek and have very little native technical resource, unless you need
>>> somebody to cook up some HTML or print a Word document.
>>
>> I disagree, I think it’s a bad thing, but I'm basing it on my work
>> experience. Learning asm first gave me a greater insight into coding, most
>> of the programmers I know wouldn't be able to do look at the asm output from
>> the compiler as you did earlier to even know if its broken or generating
>> good code.
>
>Doesn't this simply come from experience?  Given a normal distribution of programmers, I guess you really are interested in only the top one percent or better for your positions.
>
>>
>>> Having done my fair share of coding in DSP assembly language for data
>>> broadcast and Eurofighter radar real time systems, I know that there is a
>>> place for assembly coding--but not the whole system, we just coded the
>>> absolutely time-critical parts in assembly code and the
>>> tie-it-all-together code was in Pascal, Modula-2, C or Ada.
>>
>> I mostly do graphics rendering related tasks (have done Eurofighter sim
>> in the distant past), so in that field it's very useful.
>
>This just comes down to squeezing the best you can out of what you have.  To do that, no compiler will compete with a competent programmer for much of the intricate pieces of the jigsaw.  State of the art is not quite there yet in compiler tech.
>
>>
>> Agreed, tie the code together in C/C++, but learn assembly; we use it all the
>> time in our industry. If I post a job for a programmer, I get other
>> recruiters and companies calling me asking me to pass on the ones that we
>> pass on because everyone is hurting so bad. I can get common-or-garden C++
>> people fairly easily, but they're useless when it comes to under the hood.
>>
>>
>> Quoting Colin Chapman, simplify, and add lightness. If he were a programmer,
>> I reckon he'd do it in asm ;)
>
>Wicked.  :-)  Did you know that happens to have special significance for me?
>
>-- Paul.
>
>
Reply by Hugo Brunert April 27, 2012
In the Windows PC world, I would be willing to bet that less than 1% of
the programmers look at the size and speed of their generated code, and
care about it.

Windows is not a realtime operating system, and therefore MS doesn't care
what size or speed of code their products generate. If it's too big, get more
RAM; if it's too slow, reboot ( UGHhhh ) -- MS's answer to every problem. I
don't think they have requirements put on them to produce something
measured in size or speed. Just do it.

Embedded programmers DO really care about the size and speed of their
product, sometimes spending hours trying to reduce it by 10 bytes or
shaving microseconds or nanoseconds off a function.

Just about every requirement I have has a size and speed ruler attached
to it.

We are comparing two worlds in which, even though the programmers have
the same title, the end results are far too different to compare.

From: m... [mailto:m...] On Behalf
Of David Brown
Sent: Friday, April 27, 2012 6:31 AM
To: m...
Subject: Re: [msp430] Re: Warning in IAR when performing bitwise not on
unsigned char (corrected)

On 26/04/2012 22:52, Jon Kirwan wrote:

Since you were posting to me, I guess I'd better reply with some more
over-generalised pronouncements! (I know I do this - sometimes
intentionally to emphasise a point or provoke discussions, sometimes
unintentionally.)

> On Thu, 26 Apr 2012 16:28:38 +0200, David wrote:
>
> >OK, I won't argue any more!
> >
> >I wonder if there is a particular reason that MSVC is bad here (I assume
> >you agree that gcc has generated valid code with the n+2 moved out of
> >the loop). Perhaps MSVC turns off optimisations when there is a
> >volatile in the code?
> >
> >Anyway, this is getting way off topic for this group - perhaps it's best
> >just to drop this branch of the thread.
>
> Actually, it is germane. One of the perennial topics that
> comes up from time to time -- and I think you are guilty here
> of making over-reaching comments in this regard (as are we
> all on some subjects, I admit) -- is whether or not assembly
> coding is "dead" or not.
>

I think you are stretching the topic a lot more here - there is a big
step between "how does the compiler optimise or fail to optimise" to "is
assembly dead?". There is a relationship, certainly, but it is not a
direct jump.

Still, this is a discussion group, and if you and others want to discuss
this here and now, then that's fine. I only suggested dropping the
branch because it looked like few people were interested.

> This kind of optimization is dead obvious for any assembly
> coder who would, almost without thought about it, do the
> promotion out of the loop (at some point -- not necessarily
> as the first step in getting things working right.) If C
> compilers cannot, in this day and age now, be trusted to do
> such trivia then it leaves open many other questions.
>

An assembly programmer will normally know a lot more details of the
usage of data and resources - a C compiler will only know it if it is
clearly told. So an assembly programmer will know that "n" is not used
in the external function "bar", and can be hoisted to the beginning of a
loop - the C compiler can only do that if you tell it (such as by making
"n" static or local to the function).

This extra knowledge gives the assembly programmer more freedom to
optimise - the compiler has to /prove/ such knowledge.

However, you must be aware that the assembly programmer is not scalable.
You can keep track of a few items in your head, and you can make use
of documentation or coding standards to help. But for larger or more
complex systems, especially if there are many programmers involved, it's
going to get messy. In this example, what happens when a programmer
later modifies the implementation of "bar" so that it /does/ change "n"?
Things fall to pieces - and it can only be prevented by manually
checking the documentation of what code is allowed to use which
variables.

I have always agreed that an assembly programmer can write code that is
smaller and faster than a C compiler will generate. But an assembly
programmer has a much harder time writing /structured/, /maintainable/,
/standardised/, /scalable/, /low risk/ code that is as small and fast -
and generally cannot do so in a similar development time. The C
compiler may not be quite as smart as the assembler programmer about
optimising the code - but it will generally make correct code (assuming
correct source code, of course) and will not have trouble when
assumptions about variable usage change over time. With assembly, your
assumptions are all in documentation or in the programmers' heads - with
C, many assumptions can be expressed in the language and checked by the
tools.

If you want to compare the generated code of compilers and assembly
programmers, you must also understand how to get the best from the
compiler - many (most?) C programmers do not. And knowledge of assembly
is useful here too - to get to know your compiler, you should look at
the generated assembly, and see what comes out. In the original example
with a call to an external function "bar" in the loop, then there are
several ways to adjust the source code to let the compiler generate
better code. But if you don't know these things, or don't use them,
then you will get bigger and slower code.
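One such adjustment was hinted at above: give "n" internal linkage and never take its address, so the compiler can prove the call cannot touch it. A sketch with invented names; `bar` is defined here only so the fragment links, but the point holds when it lives in another translation unit:

```c
static int n = 5;             /* internal linkage, address never taken */

void bar(int x) { (void)x; }  /* stands in for code in another file */

/* Because "n" cannot be named outside this translation unit and its
   address never escapes, the compiler may prove bar() cannot modify
   it and hoist n + 2 out of the loop on its own. */
int sum(int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        bar(i);
        total += n + 2;
    }
    return total;
}
```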

> I will be looking over other compilers on this. I have a LOT
> of old tools lying about. Metaware, for example, from the mid
> 1980's is sitting on 5 1/4" floppies. Fran DeRemer and Tom
> Pennello, despite some excessive Christian evangelism in
> their technical documents, did take a different and
> interesting direction in their compilers and I'm interested
> to see how they behave here. Also, Microsoft today may not do
> that optimization, but I have decades of older tools from
> them and not all may have similar behaviors. It will be
> interesting to do, in a little while when I get around to
> collecting up the tools, installing them under Win98SE and
> then trying them out, again.
>

Well, I would not bother with ancient tools - compiler optimisation has
improved with time.

As I said, I am somewhat surprised that MSVC code was so bad here.

> I'm also still very interested in Paul's comments about
> register coloring and I need to think (and refresh myself
> about the algorithms again) about this more. I just picked up
> the newer Aho book and haven't had a chance to read it, yet.
> So this might be a motivation to rummage through that, as
> well.

"Register colouring" is a term used in algorithms for allocating
registers - what data goes into which registers at which time. For
small functions and processors with few registers, it's easy enough -
for larger functions it can get very complicated trying to track the
lifetimes of data, variables, intermediate results, etc., and allocating
them to different registers at different times. Paul was, I think,
alluding to the possibility that it is sometimes more efficient to
re-calculate "n+2" than to calculate it once and store it - perhaps
because for other code the compiler also needs to track "n" in a
register and dedicating another register to "n+2" would slow down other
code.

This is another situation where compilers will normally beat assembly
programmers - if you have lots of registers and complex code, then the
compiler can get more optimal register allocation simply because it is
very difficult for the assembly programmer to keep everything in his
head. (Of course, the assembly programmer /could/ work it all out by
hand on paper - but that takes a lot of time and effort, and simple
changes to the algorithm can mean having to re-do the whole thing.)
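A rough illustration of the trade-off alluded to here (entirely invented code): with several values live across the loop, caching "n + 2" in yet another register can cost more than recomputing a one-cycle add.

```c
/* Four accumulators plus i, n, len and a are all live in the loop; on
   a register-starved target, dedicating one more register to n + 2
   could force a spill, so recomputing it each iteration may be
   cheaper.  Either choice gives the same result - it is purely an
   allocation decision for the compiler. */
int busy(const int *a, int len, int n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < len; i++) {
        s0 += a[i] * n;
        s1 += a[i] ^ n;
        s2 += a[i] - n;
        s3 += a[i] + (n + 2);   /* cache in a register, or recompute? */
    }
    return s0 + s1 + s2 + s3;
}
```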

>
> Bottom line is -- this argues unfavorably (admittedly only
> one data point and there remains many many counter arguments
> as well) to those who say that there is no longer ANY reason
> for assembly coding because of the supposed quality of modern
> C compilers. If they can't handle this.... well.
>

No - the bottom line is don't use MS's compiler if you want small and
fast code :-) gcc handled it perfectly well, with as good code as one
could write by hand.

To be more accurate, the bottom line is that if you want to code in C
and generate small and fast code, you need to make sure you have a good
compiler and you know how to use it properly. After all, the C code was
written in a way that was not giving the compiler the best chance,
especially on an x86.



Reply by David Brown April 27, 2012
On 27/04/2012 12:50, Paul Curtis wrote:
> > > http://www.worldofspectrum.org/infoseekid.cgi?id=0004851
> > >
> > > http://www.worldofspectrum.org/infoseekid.cgi?id=0004849
> > >
> > > All developed on a 48K spectrum...
> > >
> >
> > Very nice. But I bet your tape recorder could read the tapes it wrote, so
> > you didn't have to start from scratch for each crash, reset, or when your
> > parents told you to turn the damn thing off and go to bed! (I was about
> > 12 or so when I got a Spectrum.)
>
> I was ~18 when I wrote the above and in the first year of uni; Starbike was
> developed with the help of a Microdrive. Well, I say help, but microdrives
> are not known for their reliability, so source was regularly lost... I had
> three cartridges which I rotated, but really they were not enough to sustain
> development...
>

I had a Microdrive myself later on, and as you say their reliability was
not legendary. Still, they were faster and safer than tapes.

At university, we didn't go in much for practical programming. We were
more followers of Knuth - "Be careful with this program. I haven't
tested it, only proven it correct." The preferred medium for
programming was paper and pencil - computers were only used to type up
reports proving that the programs work.

Reply by Paul Curtis April 27, 2012
> > http://www.worldofspectrum.org/infoseekid.cgi?id=0004851
> >
> > http://www.worldofspectrum.org/infoseekid.cgi?id=0004849
> >
> > All developed on a 48K spectrum...
> >
>
> Very nice. But I bet your tape recorder could read the tapes it wrote, so
> you didn't have to start from scratch for each crash, reset, or when your
> parents told you to turn the damn thing off and go to bed! (I was about
> 12 or so when I got a Spectrum.)

I was ~18 when I wrote the above and in the first year of uni; Starbike was
developed with the help of a Microdrive. Well, I say help, but microdrives
are not known for their reliability, so source was regularly lost... I had
three cartridges which I rotated, but really they were not enough to sustain
development...

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

Reply by David Brown April 27, 2012
On 27/04/2012 11:06, Paul Curtis wrote:
> > > OK, well as a child of the 60s, and growing up in the 70s with the
> > > micro revolution, I passed through FORTRAN IV to BASIC to 6502
> > > assembly code to Z80 assembly code and then to Pascal and 68k assembly
> > > code... I guess you haven't lived as a programmer unless you've written
> > > meaningful code at a low level. Personally I loved the built-in 6502
> > > assembler of the Acorn Atom--absolutely stinking stroke of genius.
> >
> > I never used the Atom, but assuming its assembler was similar to that of
> > the BBC Micro then I agree. What I found particularly cool was how well
> > you could mix Basic and Assembly. BBC Basic was probably the best Basic I
> > ever used. And in addition you could use the A%, X% and Y% fast integer
> > variables in Basic to pass data back and forth between assembly snippets.
> > You could also call OS functions easily from assembly.
>
> BBC BASIC? Huh, spoilt!
>
> > But you haven't done /real/ programming until you have hand-assembled
> > code
> > for a ZX spectrum (Z80 processor), typed it all into a Basic loader
> > program on the chewing-gum keyboard, and done your debugging based on the
> > sound made by the power supply. Of course, if it hangs you pull the plug
> > and start again (or load from a tape recorder, if you are lucky).
> > It encouraged careful thought and accurate programming, rather than "try
> > it and see" coding!
>
> Indeed.
>
> http://www.worldofspectrum.org/infoseekid.cgi?id=0004851
>
> http://www.worldofspectrum.org/infoseekid.cgi?id=0004849
>
> All developed on a 48K spectrum...
>

Very nice. But I bet your tape recorder could read the tapes it wrote,
so you didn't have to start from scratch for each crash, reset, or when
your parents told you to turn the damn thing off and go to bed! (I was
about 12 or so when I got a Spectrum.)