EmbeddedRelated.com
Forums

Warning in IAR when performing bitwise not on unsigned char (corrected)

Started by distantship101 April 3, 2012
> >> Paul, just a side-bar question. If the loop body didn't include a
> >> function call which arguably may modify n, but instead was a block of
> >> C statements that clearly didn't modify n, would the 'n+2'
> >> computation then be lifted outside the loop?
> >
> > It could be. Whether it is or not is a different question that I
> > cannot answer, in general.
>
> I was asking this question 'in particular.' I meant in the case of the
> compiler you used for the disassembly I see above.

The answer is no, it would not.

> > In compilers it is common to make integer conversions explicit in the
> > intermediate representation.
>
> Ah hah! That's a concrete tidbit to work from. So if I understand this
> correctly, the integer promotion is made during parsing and syntax
> analysis and before anything else.

Integer promotions are inserted into the tree to make explicit the
conversion that is implicit in the source code. Integer conversions are
also explicit. The compiler may well use the fact that a conversion is
implicit during analysis to issue a warning, but it should never issue a
warning for a "programmer knows best" type case (though there are cases
where some do, e.g. casting away const).

> So the original programmer "hint" is, in effect, lost to the compiler and
> isn't recoverable should there be a reason to do so.

As far as I can see, there is no hint in my source code.

> > The optimiser
> > can then narrow conversions or even discard them during some
> > phases of optimisation--generally there is more than one place in the
> > compiler where such things can be detected and discarded, and it's
> > usually a pragmatic decision on where it's easiest to detect and
> > modify "an" IR.
>
> Can you think of cases where the "narrowing conversion" logic would _FAIL_
> to make a narrowing conversion, lacking the original programmer's explicit
> syntax declaring the type,

Lots of compilers fail to do this. Consider global x and y...

signed char x;
signed char y;

x = y - 1;

Now, this can be compiled in a lot of different ways. Simple:

(1) Load byte y; sign extend to int; subtract one; assign byte to x
(implicit truncation). Finish.

Or by recognizing you can narrow:

(2) Load byte y; subtract one; assign byte to x. Finish.

Or even:

(3) Load byte y; assign byte to x; subtract (byte) one from x. Finish.

Now, (1) is great for a first-cut compiler to ensure that you get code
working on the chip. All implementers have been here. (2) requires more
analysis. (3) happens to reduce register pressure on memory-memory
architectures as there is no register involved, at the possible expense of
bigger/slower code.

> but where it may have been possible to do more
> _with_ that information present at the time the narrowing conversion logic
> operates?

I don't think that a well-implemented compiler would be at any disadvantage.

> Or is there a 1:1 situation where ALL such cases where narrowing
> is possible under the rules, that the loss of the explicit syntax has
> absolutely no possible impact on the options available during narrowing?

If I understand your question correctly, a well-implemented compiler would
not be at any disadvantage when narrowing expressions.

-- Paul.

Beginning Microcontrollers with the MSP430

On 24/04/2012 23:17, Jon Kirwan wrote:
> Paul, just a side-bar question. If the loop body didn't
> include a function call which arguably may modify n, but
> instead was a block of C statements that clearly didn't
> modify n, would the 'n+2' computation then be lifted outside
> the loop? (n isn't volatile and n+2 should be unvarying
> during loop execution.) It seems to me the answer is yes,
> it would lift the unvarying subexpression outside the loop.
> But I'm curious, anyway, and just thought I'd ask.
>

In most cases (assuming you enable optimisation, of course), I think
compilers will lift out the "n+2" if they can be sure that "n" is not
affected by bar() - either by knowing the definition of bar(), or if "n"
is static and its address never leaks outside the module. But if "n"
is an externally linked global (i.e., just a plain "signed char n" like
here), then the compiler can't guarantee that bar() won't change it - so
it can't optimise away the access even though it is not volatile.

> Also, and this is a weird recollection about the C standards
> (can't recall if C89 or C99 or both -- and did you bring up
> the 2011 standard???) but I seem to recall that for the
> comparison under discussion that takes place, x < n+2, the
> compiler is permitted to avoid explicit "integer promotions"
> in the generated code if the compiler can determine that the
> actual emitted code acts "as if" the promotions had occurred.
> (While the situation you show above may not permit this to be
> fully determinable without knowledge of the value of n on
> entry, I could easily pony up one that should be calculable
> by a compiler as working "as if" without the promotions
> having to occur. So the question would remain.) While I'm
> probably wrong about that recollection, I'm also curious
> about your thoughts there.
>
> Jon
>

The "as if" rule /always/ applies in C (and C++), and always has done.
C is described in terms of an abstract machine - the generated object
code always has to produce the same outputs from the same inputs, in
the same ordering, as if it ran on the "abstract machine". Other than
that, the compiler can do whatever it wants regarding optimisations,
pessimisations (if that's a real word), manipulations, transformations,
etc. The code has to match on certain key points - program entry and
exit (including arguments and return values), external library calls,
all volatile accesses, some types of inline assembly, etc. But in
between these it can do as it wants.

So if the compiler thinks it would like to implement the "signed char"
loop using floating point, it can do so - as long as it is sure it runs
the correct number of loops.

> In most cases (assuming you enable optimisation, of course), I think
> compilers will lift out the "n+2" if they can be sure that "n" is not
> affected by bar() - either by knowing the definition of bar(), or if "n"
> is static and its address never leaks outside the module.

This is a generalization that you cannot make: using "most cases" and
"will".

A compiler will do this only if (1) it performs the analysis in order to
detect this transformation and (2) it deems it profitable to do so. Not only
that, the compiler may detect this, perform the transformation, and later
realize during coloring that things are not going to plan and decide to
rematerialize the computation to reduce register pressure and find a
coloring.

I just tried the fragment below on Visual Studio 2008's C compiler at the
highest optimization level (/Ox).

volatile int i;
signed char n;

void foo(void)
{
    signed char x;
    for (x = 1; x < n+2; ++x)
        ++i;
}

This is what I got:

00E41990 movsx edx,byte ptr [n (0EE58BEh)]
00E41997 mov ecx,1
00E4199C add edx,2
00E4199F cmp edx,ecx
00E419A1 mov al,cl
00E419A3 jle foo+3Ah (0E419CAh)
00E419A5 push esi
00E419A6 jmp foo+20h (0E419B0h)
00E419A8 lea esp,[esp]
00E419AF nop
++i;
00E419B0 add dword ptr [i (0EE5954h)],ecx
00E419B6 movsx edx,byte ptr [n (0EE58BEh)]
00E419BD add al,cl
00E419BF movsx esi,al
00E419C2 add edx,2
00E419C5 cmp esi,edx
00E419C7 jl foo+20h (0E419B0h)
00E419C9 pop esi

Clearly Visual Studio doesn't lift out the loop-invariant addition of 2 or
realize that n itself is loop invariant. Visual Studio also sign extends x
each time through the loop.

Regards,

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

On 26/04/2012 10:06, Paul Curtis wrote:
> > In most cases (assuming you enable optimisation, of course), I think
> > compilers will lift out the "n+2" if they can be sure that "n" is not
> > affected by bar() - either by knowing the definition of bar(), or if "n"
> > is static and its address never leaks outside the module.
>
> This is a generalization that you cannot make: using "most cases" and
> "will".
>

Well, it is always a risk to generalise too much.

> A compiler will do this only if (1) it performs the analysis in order to
> detect this transformation (2) it deems it profitable to do so. Not only
> that, the compiler may detect this, perform the transformation, and later
> realize during coloring that things are not going to plan and decide to
> rematerialize the computation to reduce register pressure and find a
> coloring.

There are sometimes circumstances in which "obvious" optimisations are
not really optimal, and the compiler will often take more into account
than you see at a single glance. Things like instruction pipeline
scheduling can often mean the best code is different from what one might
think.

But in this particular case, I would be surprised to see an optimising
compiler that does not lift the n+2 calculation. And if this really is
the code MSVS produces for "optimised" code, then I /am/ surprised.
It's crap code.

>
> I just tried the fragment below on Visual Studio 2008's C compiler at the
> highest optimization level (/Ox).
>
> volatile int i;
> signed char n;
>
> void foo(void)
> {
>     signed char x;
>     for (x = 1; x < n+2; ++x)
>         ++i;
> }
>
> This is what I got:
>
> 00E41990 movsx edx,byte ptr [n (0EE58BEh)]
> 00E41997 mov ecx,1
> 00E4199C add edx,2
> 00E4199F cmp edx,ecx
> 00E419A1 mov al,cl
> 00E419A3 jle foo+3Ah (0E419CAh)
> 00E419A5 push esi
> 00E419A6 jmp foo+20h (0E419B0h)
> 00E419A8 lea esp,[esp]
> 00E419AF nop
> ++i;
> 00E419B0 add dword ptr [i (0EE5954h)],ecx
> 00E419B6 movsx edx,byte ptr [n (0EE58BEh)]
> 00E419BD add al,cl
> 00E419BF movsx esi,al
> 00E419C2 add edx,2
> 00E419C5 cmp esi,edx
> 00E419C7 jl foo+20h (0E419B0h)
> 00E419C9 pop esi
>
> Clearly Visual Studio doesn't lift out the loop-invariant addition of 2 or
> realize that n itself is loop invariant. Visual Studio also sign extends x
> each time through the loop.
>
> Regards,

gcc produces similar code without optimising:

foo:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movb $1, -1(%ebp)
jmp .L2
.L3:
movl i, %eax
addl $1, %eax
movl %eax, i
addb $1, -1(%ebp)
.L2:
movsbl -1(%ebp),%eax
movzbl n, %edx
movsbl %dl, %edx
addl $2, %edx
cmpl %edx, %eax
jl .L3
leave
ret

But with optimisation (-Os, which means most standard optimisations with
a balance on low space), we get:

foo:
movsbl n, %edx
xorl %eax, %eax
pushl %ebp
movl %esp, %ebp
incl %edx
jmp .L2
.L3:
movl i, %ecx
incl %ecx
movl %ecx, i
.L2:
incl %eax
cmpl %eax, %edx
jge .L3
popl %ebp
ret

Not only does it move out the "n+2", but it in fact gets eliminated
entirely.

> But with optimisation (-Os, which means most standard optimisations with
> a balance on low space), we get:
>
> foo:
> movsbl n, %edx
> xorl %eax, %eax
> pushl %ebp
> movl %esp, %ebp
> incl %edx
> jmp .L2
> .L3:
> movl i, %ecx
> incl %ecx
> movl %ecx, i
> .L2:
> incl %eax
> cmpl %eax, %edx
> jge .L3
> popl %ebp
> ret
>
>
> Not only does it move out the "n+2", but it in fact gets eliminated
> entirely.

I think you're mistaken. x < n+2 is rewritten as x <= n+1 with n+1 computed
in the preheader.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

On 26/04/2012 15:29, Paul Curtis wrote:
> > But with optimisation (-Os, which means most standard optimisations with
> > a balance on low space), we get:
> >
> > foo:
> > movsbl n, %edx
> > xorl %eax, %eax
> > pushl %ebp
> > movl %esp, %ebp
> > incl %edx
> > jmp .L2
> > .L3:
> > movl i, %ecx
> > incl %ecx
> > movl %ecx, i
> > .L2:
> > incl %eax
> > cmpl %eax, %edx
> > jge .L3
> > popl %ebp
> > ret
> >
> >
> > Not only does it move out the "n+2", but it in fact gets eliminated
> > entirely.
>
> I think you're mistaken. x < n+2 is rewritten as x <= n+1 with n+1 computed
> in the preheader.
>

Yes, you are right - I should think before I write...

Still, I am surprised that MSVS generated such poor code when optimising
- are you sure you have the compiler options correct?

> Yes, you are right - I should think before I write...
>
> Still, I am surprised that MSVS generated such poor code when optimising
> - are you sure you have the compiler options correct?

100% sure. I changed the optimization levels and saw the code change.

If I compile a plain vanilla C app:

E:\tmp>more n.c

volatile int i;
signed char n;

void foo(void)
{
    signed char x;
    for (x = 1; x < n+2; ++x)
        ++i;
}

E:\tmp>cl /Ox /c /Fan.s n.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for
80x86
Copyright (C) Microsoft Corporation. All rights reserved.

n.c

E:\tmp>more n.s
; Listing generated by Microsoft (R) Optimizing Compiler Version
15.00.21022.08

TITLE E:\tmp\n.c
.686P
.XMM
include listing.inc
.model flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

_DATA SEGMENT
COMM _i:DWORD
COMM _n:BYTE
_DATA ENDS
PUBLIC _foo
; Function compile flags: /Ogtpy
_TEXT SEGMENT
_foo PROC
; File e:\tmp\n.c
; Line 7
movsx edx, BYTE PTR _n
mov ecx, 1
add edx, 2
cmp edx, ecx
mov al, cl
jle SHORT $LN1@foo
push esi
npad 10
$LL3@foo:
; Line 8
add DWORD PTR _i, ecx
movsx edx, BYTE PTR _n
add al, cl
movsx esi, al
add edx, 2
cmp esi, edx
jl SHORT $LL3@foo
pop esi
$LN1@foo:
; Line 9
ret 0
_foo ENDP
_TEXT ENDS
END

I don't think you can argue with that.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

On 26/04/2012 16:30, Paul Curtis wrote:
> > Yes, you are right - I should think before I write...
> >
> > Still, I am surprised that MSVS generated such poor code when optimising
> > - are you sure you have the compiler options correct?
>
> 100% sure. I changed the optimization levels and saw the code change.
>
> If I compile a plain vanilla C app:
>
> E:\tmp>more n.c
>
> volatile int i;
> signed char n;
>
> void foo(void)
> {
>     signed char x;
>     for (x = 1; x < n+2; ++x)
>         ++i;
> }
>
> E:\tmp>cl /Ox /c /Fan.s n.c
> Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for
> 80x86
> Copyright (C) Microsoft Corporation. All rights reserved.
>
> n.c
>
> E:\tmp>more n.s
> ; Listing generated by Microsoft (R) Optimizing Compiler Version
> 15.00.21022.08
>
> TITLE E:\tmp\n.c
> .686P
> .XMM
> include listing.inc
> .model flat
>
> INCLUDELIB LIBCMT
> INCLUDELIB OLDNAMES
>
> _DATA SEGMENT
> COMM _i:DWORD
> COMM _n:BYTE
> _DATA ENDS
> PUBLIC _foo
> ; Function compile flags: /Ogtpy
> _TEXT SEGMENT
> _foo PROC
> ; File e:\tmp\n.c
> ; Line 7
> movsx edx, BYTE PTR _n
> mov ecx, 1
> add edx, 2
> cmp edx, ecx
> mov al, cl
> jle SHORT $LN1@foo
> push esi
> npad 10
> $LL3@foo:
> ; Line 8
> add DWORD PTR _i, ecx
> movsx edx, BYTE PTR _n
> add al, cl
> movsx esi, al
> add edx, 2
> cmp esi, edx
> jl SHORT $LL3@foo
> pop esi
> $LN1@foo:
> ; Line 9
> ret 0
> _foo ENDP
> _TEXT ENDS
> END
>
> I don't think you can argue with that.
>

OK, I won't argue any more!

I wonder if there is a particular reason that MSVC is bad here (I assume
you agree that gcc has generated valid code with the n+2 moved out of
the loop). Perhaps MSVC turns off optimisations when there is a
volatile in the code?

Anyway, this is getting way off topic for this group - perhaps it's best
just to drop this branch of the thread.

mvh,

David

On Thu, 26 Apr 2012 16:28:38 +0200, David wrote:

>OK, I won't argue any more!
>
>I wonder if there is a particular reason that MSVC is bad here (I assume
>you agree that gcc has generated valid code with the n+2 moved out of
>the loop). Perhaps MSVC turns off optimisations when there is a
>volatile in the code?
>
>Anyway, this is getting way off topic for this group - perhaps it's best
>just to drop this branch of the thread.

Actually, it is germane. One of the perennial topics here --
and I think you are guilty of making over-reaching comments
in this regard (as are we all on some subjects, I admit) --
is whether or not assembly coding is "dead".

This kind of optimization is dead obvious to any assembly
coder, who would, almost without thinking about it, hoist the
computation out of the loop (at some point -- not necessarily
as the first step in getting things working right.) If C
compilers cannot, in this day and age now, be trusted to do
such trivia then it leaves open many other questions.

I will be looking over other compilers on this. I have a LOT
of old tools lying about. MetaWare, for example, from the
mid-1980s is sitting on 5 1/4" floppies. Frank DeRemer and Tom
Pennello, despite some excessive Christian evangelism in
their technical documents, did take a different and
interesting direction in their compilers and I'm interested
to see how they behave here. Also, Microsoft today may not do
that optimization, but I have decades of older tools from
them and not all may have similar behaviors. It will be
interesting to do, in a little while when I get around to
collecting up the tools, installing them under Win98SE and
then trying them out, again.

I'm also still very interested in Paul's comments about
register coloring and I need to think (and refresh myself
about the algorithms again) about this more. I just picked up
the newer Aho book and haven't had a chance to read it, yet.
So this might be a motivation to rummage through that, as
well.

Bottom line is -- this argues unfavorably (admittedly it's only
one data point and there remain many counter-arguments
as well) against those who say that there is no longer ANY reason
for assembly coding because of the supposed quality of modern
C compilers. If they can't handle this.... well.

Jon
> Actually, it is germane. One of the perennial topics here --
> and I think you are guilty of making over-reaching comments
> in this regard (as are we all on some subjects, I admit) --
> is whether or not assembly coding is "dead".

Assembly coding for small systems is alive and well; for large, complex systems, it is dead because the effort required to implement such a system in assembly code is simply not worthwhile.

> This kind of optimization is dead obvious to any assembly
> coder, who would, almost without thinking about it, hoist the
> computation out of the loop (at some point -- not necessarily
> as the first step in getting things working right.) If C
> compilers cannot, in this day and age now, be trusted to do
> such trivia then it leaves open many other questions.

Compilers for x86 have a thankless task: there is no way that they can, in general, predict how a program will execute on the hardware, so they need to take a "balanced view". Why expect Microsoft's compilers to do a bang-up job? That's not what Microsoft does, and it was Intel that got frustrated at MS for not improving their compilers.

This isn't just a QoI (quality of implementation) question, either. There are a whole lot of ways that the internals of a compiler can work, a whole lot of issues with ABI specifications which get forced on compilers, and also the question of what a programmer should reasonably expect of a compiler.

> I will be looking over other compilers on this. I have a LOT
> of old tools lying about. MetaWare, for example, from the
> mid-1980s is sitting on 5 1/4" floppies. Frank DeRemer and Tom
> Pennello, despite some excessive Christian evangelism in
> their technical documents, did take a different and
> interesting direction in their compilers and I'm interested
> to see how they behave here.

Good luck. No such thing as "signed" in classic K&R C.

> I'm also still very interested in Paul's comments about
> register coloring and I need to think (and refresh myself
> about the algorithms again) about this more.

http://en.wikipedia.org/wiki/Rematerialization

> I just picked up
> the newer Aho book and haven't had a chance to read it, yet.

Yeah, don't bother. If you are burning personal time, this is better (but check the errata!):

http://www.amazon.com/Advanced-Compiler-Design-Implementation-Muchnick/dp/1558603204/

> So this might be a motivation to rummage through that, as
> well.

Like I said, don't bother. It's a bit of a yawn.

> Bottom line is -- this argues unfavorably (admittedly it's only
> one data point and there remain many counter-arguments
> as well) against those who say that there is no longer ANY reason
> for assembly coding because of the supposed quality of modern
> C compilers. If they can't handle this.... well.

Compilers are tools: programmers must practice their craft.

-- Paul.