C18 Compiler again

Reply by Thad Smith ●June 7, 2010

David Brown wrote:
> D Yuniskis wrote:
>> Hi Meindert,
>>
>> Meindert Sprang wrote:
>>> Unbelievable.....
>>>
>>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>>> experience with CCS. Apparently the optimizer of C18 is not that good. For
>>> instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
>>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>>> compiler should recognise that a shift by 16, stored to an 8 bit port could
>>> easily be done by simply accessing the 3rd byte.... sheesh....
>>
>> Is LATF *defined* as a uint8_t?  (i.e., does the compiler *know* it
>> can discard all but the lowest 8 bits?)
>>
> 
> That's irrelevant (or should be!) - expressions are evaluated in their 
> own right, and /then/ cast to the type of the LHS.  The compiler should, 
> as it does, initially treat it as a 32-bit shift, but it's a poor 
> compiler that can't optimise a 32-bit shift by 16 to something better 
> than this.  Optimising it to a single byte transfer comes logically at a 
> later stage.

And the later stage optimally comes before generating final code.
It is logical that a good optimizer would transform the statement into a
single-byte move.

-- 
Thad
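
A minimal sketch of the equivalence being argued here, not taken from any
compiler's output: the first function is what the C source asks for (a
32-bit shift, then conversion to 8 bits), the second is the single byte
access an optimizer could substitute. The function names and the
is_little_endian() helper are hypothetical, and uint32_t stands in for the
poster's uint32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reports whether the first byte in memory is the LSB. */
static int is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* What the source says: evaluate the 32-bit shift, then narrow to 8 bits. */
uint8_t third_byte_by_shift(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}

/* What an optimizer may do instead: read one byte of the stored value.
   Bits 16..23 live at offset 2 on a little-endian layout and at offset 1
   on a big-endian one. */
uint8_t third_byte_by_access(uint32_t addr)
{
    uint8_t bytes[sizeof addr];
    memcpy(bytes, &addr, sizeof addr);
    return bytes[is_little_endian() ? 2 : 1];
}
```

Both functions return the same value for every input, which is why the
substitution is legal no matter how the left-hand side is declared.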

Reply by George Neuner ●June 7, 2010

On Mon, 7 Jun 2010 20:18:35 +0000 (UTC), Grant Edwards
<invalid@invalid.invalid> wrote:

>On 2010-06-07, George Neuner <gneuner2@comcast.net> wrote:
>
>> I've been programming since 1977 and I have never seen any compiler
>> turn a long word shift (and/or mask) into a corresponding short word
>> or byte access.  Every compiler I have ever worked with would perform
>> the shift.
>
>Really?
>
>I've seen quite a few compilers do that.  For example, gcc for ARM
>does:

Interesting.  But now that I think about it, I almost never use shift with a
constant count - it's almost always a computed shift - and even when
the shift is constant, the value is often in a variable anyway due to
surrounding processing.

- What version of GCC is it?
- What does it do if the shift count is a variable?  
- What does it do for ((ul & 0xFFFFFF) >> 8) or ((ul >> 8) & 0xFFFF)?

If it recognizes the last as wanting just the middle word then that
would be impressive.

George

Reply by Grant Edwards ●June 7, 2010

On 2010-06-08, George Neuner <gneuner2@comcast.net> wrote:
> On Mon, 7 Jun 2010 20:18:35 +0000 (UTC), Grant Edwards
><invalid@invalid.invalid> wrote:
>
>>On 2010-06-07, George Neuner <gneuner2@comcast.net> wrote:
>>
>>> I've been programming since 1977 and I have never seen any compiler
>>> turn a long word shift (and/or mask) into a corresponding short word
>>> or byte access.  Every compiler I have ever worked with would perform
>>> the shift.
>>
>>Really?
>>
>>I've seen quite a few compilers do that.  For example, gcc for ARM
>>does:
>
> Interesting.  But now that I think about it, I almost never use shift with a
> constant count - it's almost always a computed shift - and even when
> the shift is constant, the value is often in a variable anyway due to
> surrounding processing.
>
> - What version of GCC is it?

4.4.3

> - What does it do if the shift count is a variable?

It uses a shift instruction.  There's not really anything else it could
do with a variable shift count.

> - What does it do for ((ul & 0xFFFFFF) >> 8)

        ldr     r0, [r3, #0]
        mov     r0, r0, asl #8
        mov     r0, r0, lsr #16
        
> or ((ul >> 8) & 0xFFFF)?

        ldr     r0, [r3, #0]
        mov     r0, r0, asl #8
        mov     r0, r0, lsr #16

> If it recognizes the last as wanting just the middle word then that
> would be impressive.

Recognizing the last two as wanting just the middle word is moot because
that 16-bit word is misaligned and can't be accessed using a 16-bit load
instruction.

-- 
Grant
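
A short sketch of why gcc can emit identical code for both expressions:
for a 32-bit value, mask-then-shift, shift-then-mask, and the "asl #8;
lsr #16" pair all select bits 8..23. The function names are hypothetical,
and uint32_t is used instead of unsigned long as an assumption, so that the
width is 32 bits on any host.

```c
#include <stdint.h>

/* ((ul & 0xFFFFFF) >> 8): drop the top byte, then shift out the bottom one. */
uint32_t middle_mask_then_shift(uint32_t ul)
{
    return (ul & 0xFFFFFFu) >> 8;
}

/* ((ul >> 8) & 0xFFFF): shift out the bottom byte, then drop the top one. */
uint32_t middle_shift_then_mask(uint32_t ul)
{
    return (ul >> 8) & 0xFFFFu;
}

/* Mirrors the generated pair: the left shift by 8 pushes the top byte out
   of the 32-bit register, the logical right shift by 16 removes the bottom
   byte and zero-fills. */
uint32_t middle_two_shifts(uint32_t ul)
{
    return (uint32_t)(ul << 8) >> 16;
}
```

The two-shift form needs no mask constant, which fits the ARM output shown
above.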

Reply by George Neuner ●June 8, 2010

On Mon, 7 Jun 2010 20:18:35 +0000 (UTC), Grant Edwards
<invalid@invalid.invalid> wrote:

>On 2010-06-07, George Neuner <gneuner2@comcast.net> wrote:
>
>> I've been programming since 1977 and I have never seen any compiler
>> turn a long word shift (and/or mask) into a corresponding short word
>> or byte access.  Every compiler I have ever worked with would perform
>> the shift.
>
>Really?
>
>I've seen quite a few compilers do that.  For example, gcc for ARM
>does:
>
>------------------------------testit.c------------------------------
>unsigned long ul;
>
>unsigned char foo(void)
>{
>  return ul>>8;
>}
>
>unsigned short bar(void)
>{
>  return ul>>16;
>}
>------------------------------testit.c------------------------------
>
>$ /home/nextgen/toolchain/bin/arm-linux-gcc -c -Os -S -fomit-frame-pointer testit.c                   
>
>------------------------------testit.s------------------------------
>        .arch armv5te
>[...]
>        .file   "testit.c"
>        .text
>        .align  2
>        .global foo
>        .type   foo, %function
>foo:
>        ldr     r3, .L3
>        ldrb    r0, [r3, #1]    @ zero_extendqisi2
>        bx      lr
>.L4:
>        .align  2
>.L3:
>        .word   ul
>        .size   foo, .-foo
>        .align  2
>        .global bar
>        .type   bar, %function
>bar:
>        ldr     r3, .L7
>        ldrh    r0, [r3, #2]
>        bx      lr
>.L8:
>        .align  2
>.L7:
>        .word   ul
>        .size   bar, .-bar
>        .comm   ul,4,4
>[...]
>------------------------------testit.s------------------------------


GCC 4.4.0 on x86 with the same flags gives: 

------------------------------testit.s------------------------------
	.file	"testit.c"
	.text
.globl _foo
	.def	_foo;	.scl	2;	.type	32;	.endef
_foo:
	movl	_ul, %eax
	shrl	$8, %eax
	ret
.globl _bar
	.def	_bar;	.scl	2;	.type	32;	.endef
_bar:
	movzwl	_ul+2, %eax
	ret
	.comm	_ul, 16	 # 4
------------------------------testit.s------------------------------

It optimized the half shift (ul>>16 in bar, now a 16-bit load) but not the
quarter shift (ul>>8 in foo, still a 32-bit load and shift).

George

Reply by John Temples ●June 8, 2010

On 2010-06-07, Meindert Sprang <ms@NOJUNKcustomORSPAMware.nl> wrote:
> Apparently the optimizer of C18 is not that good. For
> instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
> loop where 4 registers really get shifted 16 times in a loop.

Here's what Hi-Tech's PIC18 compiler does:

   853                           ;t.c: 59: LATF = addr >> 16;
   854  00FFFA  C0FE  FF8E          movff   _addr+2,3982    ;volatile

-- 
John W. Temples, III

Reply by David Brown ●June 8, 2010

On 08/06/2010 04:34, Thad Smith wrote:
> David Brown wrote:
>> D Yuniskis wrote:
>>> Hi Meindert,
>>>
>>> Meindert Sprang wrote:
>>>> Unbelievable.....
>>>>
>>>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>>>> experience with CCS. Apparently the optimizer of C18 is not that good. For
>>>> instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
>>>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>>>> compiler should recognise that a shift by 16, stored to an 8 bit port could
>>>> easily be done by simply accessing the 3rd byte.... sheesh....
>>>
>>> Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it
>>> can discard all but the lowest 8 bits?)
>>>
>>
>> That's irrelevant (or should be!) - expressions are evaluated in their
>> own right, and /then/ cast to the type of the LHS. The compiler
>> should, as it does, initially treat it as a 32-bit shift, but it's a
>> poor compiler that can't optimise a 32-bit shift by 16 to something
>> better than this. Optimising it to a single byte transfer comes
>> logically at a later stage.
>
> And the later stage optimally comes before generating final code.

Yes, I meant a later logical stage within the compiler.  Note that it 
may be an /actual/ later stage (such as a peephole optimisation), or 
combined with earlier optimisations.  It comes later logically, but the 
actual order is implementation dependent.

> It is logical that a good optimizer transform the statement to single
> byte move.
>

Reply by David Brown ●June 8, 2010

On 08/06/2010 04:47, Grant Edwards wrote:
> On 2010-06-08, George Neuner <gneuner2@comcast.net> wrote:
>> On Mon, 7 Jun 2010 20:18:35 +0000 (UTC), Grant Edwards
>> <invalid@invalid.invalid> wrote:
>>
>>> On 2010-06-07, George Neuner <gneuner2@comcast.net> wrote:
>>>
>>>> I've been programming since 1977 and I have never seen any compiler
>>>> turn a long word shift (and/or mask) into a corresponding short word
>>>> or byte access.  Every compiler I have ever worked with would perform
>>>> the shift.
>>>
>>> Really?
>>>
>>> I've seen quite a few compilers do that.  For example, gcc for ARM
>>> does:
>>

Some compilers will use shifts, some will use byte or word movements.

On the ARM, a compiler will often use shifts because shifts (especially 
by constants) are very cheap on the ARM architecture, while unaligned 
and non-32-bit memory accesses may be expensive or illegal (depending on 
the ARM variant).

A quick test with avr-gcc shows that it uses byte register movements 
rather than shifts, although it's not optimal for 32-bit values (it is 
fine for 16-bit values, which are much more common in an 8-bit world). 
For your example below of "((ul & 0xFFFFFF) >> 8)" it is close to perfect.

>> Interesting.  But now that I think about it, I almost never use shift with a
>> constant count - it's almost always a computed shift - and even when
>> the shift is constant, the value is often in a variable anyway due to
>> surrounding processing.
>>
>> - What version of GCC is it?
>
> 4.4.3
>
>> - What does it do if the shift count is a variable?
>
> It uses a shift instruction.  There's not really anything else it could
> do with a variable shift count.
>
>> - What does it do for ((ul & 0xFFFFFF) >> 8)
>
>          ldr     r0, [r3, #0]
>          mov     r0, r0, asl #8
>          mov     r0, r0, lsr #16
>
>> or ((ul >> 8) & 0xFFFF)?
>
>          ldr     r0, [r3, #0]
>          mov     r0, r0, asl #8
>          mov     r0, r0, lsr #16
>
>> If it recognizes the last as wanting just the middle word then that
>> would be impressive.
>
> Recognizing the last two as wanting just the middle word is moot because
> that 16-bit word is misaligned and can't be accessed using a 16-bit load
> instruction.
>

That's very nice code generation - faster (on an ARM anyway) than using 
masking.

Reply by Meindert Sprang ●June 8, 2010

"D Yuniskis" <not.going.to.be@seen.com> wrote in message
news:hujf2k$5et$1@speranza.aioe.org...
> Hi Meindert,
> Is LATF *defined* as a uint8_t?  (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)

Yes.

> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?

Yes.

> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?

Both types are explicitly typed as unsigned. That is as far as my influence
goes. Even the crappy toy compiler of CCS does this right. My Imagecraft AVR
compiler does it right. I even remember that my old Franklin/Keil C51
compiler does it right.

Meindert

Reply by Meindert Sprang ●June 8, 2010

"George Neuner" <gneuner2@comcast.net> wrote in message
news:a5hq06549580ro9ufrk65v1icujokkomho@4ax.com...
> On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
> You're asking a lot.

 I beg to differ...

> I've been programming since 1977 and I have never seen any compiler
> turn a long word shift (and/or mask) into a corresponding short word
> or byte access.  Every compiler I have ever worked with would perform
> the shift.

Well, my experience with embedded cross compilers is different, see my other
post. And I think it is fair to demand such a thing since embedded compilers
are supposed to be tight on hardware resources.

My AVR compiler for instance does a real load-OR-store operation when more
than one bit is set in the constant but a nice single SBI instruction when
only one bit needs to be set. This keeps the C code ANSI compliant and this
makes optimal use of processor resources. And that is IMO how an embedded
cross compiler should work.

Meindert
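
A sketch of the two source forms being contrasted, in plain ANSI C. The
variable fake_port is a hypothetical stand-in for a memory-mapped I/O
register so the code can run on a host, and is_single_bit() only
illustrates the kind of check a compiler might apply to the constant; it is
not taken from any particular compiler. On a real AVR, SBI is also limited
to the low I/O address range.

```c
#include <stdint.h>

/* Hypothetical stand-in for an I/O register; volatile so each access is kept. */
static volatile uint8_t fake_port;

/* True when exactly one bit of the mask is set. */
int is_single_bit(uint8_t mask)
{
    return mask != 0 && (mask & (uint8_t)(mask - 1u)) == 0;
}

/* One bit in the constant: a candidate for a single bit-set instruction. */
void set_one_bit(void)
{
    fake_port |= (uint8_t)(1u << 3);
}

/* Two bits in the constant: needs a load, an OR, and a store. */
void set_two_bits(void)
{
    fake_port |= (uint8_t)((1u << 3) | (1u << 4));
}

uint8_t read_port(void)
{
    return fake_port;
}
```

Both functions use the same C operator; only the constant differs, so the
choice of instruction is left entirely to the compiler.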

Reply by Meindert Sprang ●June 8, 2010

"D Yuniskis" <not.going.to.be@seen.com> wrote in message
news:hujj28$bn5$1@speranza.aioe.org...
> It would be informative to know what sort of "helper routines"
> the compiler calls on.  E.g., it might (inelegantly) treat this
> as "CALL SHIFT_LONG_RIGHT, repeat" -- in which case the
> 4 temp access is the canned representation of *any* "long int".

This is the code that does the shift:

 0FCC8    0E10     MOVLW 0x10
 0FCCA    90D8     BCF 0xfd8, 0, ACCESS
 0FCCC    3203     RRCF 0x3, F, ACCESS
 0FCCE    3202     RRCF 0x2, F, ACCESS
 0FCD0    3201     RRCF 0x1, F, ACCESS
 0FCD2    3200     RRCF 0, F, ACCESS
 0FCD4    06E8     DECF 0xfe8, F, ACCESS
 0FCD6    E1F9     BNZ 0xfcca

The loop is executed 16 times (>>16) and 4 locations are shifted through the
carry bit, if I understand this correctly.... yuck!

Meindert
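
A C model of what the listing above appears to do, as a sketch under stated
assumptions: 0xfd8 and 0xfe8 are taken to be the PIC18 STATUS and WREG
registers, and locations 0..3 are taken to hold the 32-bit value least
significant byte first. The function name is hypothetical.

```c
#include <stdint.h>

/* b[0] is the least significant byte, b[3] the most significant.
   Shifts the 4-byte value right by 16, one bit per pass, and returns
   the resulting low byte. */
uint8_t shift_right_16_loop(uint8_t b[4])
{
    uint8_t count = 16;                       /* MOVLW 0x10 */
    do {
        uint8_t carry = 0;                    /* BCF STATUS, C */
        for (int i = 3; i >= 0; --i) {        /* RRCF 3, 2, 1, 0 */
            uint8_t out = b[i] & 1u;          /* bit that falls into carry */
            b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
            carry = out;
        }
    } while (--count != 0);                   /* DECF WREG; BNZ */
    return b[0];
}
```

After the 16 passes (64 single-byte rotates), b[0] holds exactly what b[2]
held at the start, which is the single byte move the rest of the thread
asks for.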
