EmbeddedRelated.com
Forums

C18 Compiler again

Started by Meindert Sprang June 7, 2010
Unbelievable.....

I'm playing around with the Microchip C18 compiler after a hair-splitting
experience with CCS. Apparently the optimizer of C18 is not that good. For
instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
loop where 4 registers really get shifted 16 times in a loop. Any decent
compiler should recognise that a shift by 16, stored to an 8 bit port could
easily be done by simply accessing the 3rd byte.... sheesh....

Meindert


Hi Meindert,

Meindert Sprang wrote:
> Unbelievable.....
>
> I'm playing around with the Microchip C18 compiler after a hair-splitting
> experience with CCS. Apparently the optimizer of C18 is not that good. For
> instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
> loop where 4 registers really get shifted 16 times in a loop. Any decent
> compiler should recognise that a shift by 16, stored to an 8 bit port could
> easily be done by simply accessing the 3rd byte.... sheesh....

Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it
can discard all but the lowest 8 bits?)

Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
I.e., can the compiler be confused (by the definition) into thinking
it is signed and opting for a sign-preserving shift?

How about:

    uint8_t pointer;

    pointer = (uint8_t *) &addr;
    LATF = pointer[2];

Clumsy, admittedly, but perhaps more obvious what's going on?
(I would have added that this would be easy for an optimizer
to reduce to an "addressing operation" but I also would have
expected your shift to be recognized as an easy optimization!)

D Yuniskis wrote:
> Hi Meindert,
>
> Meindert Sprang wrote:
>> Unbelievable.....
>>
>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>> experience with CCS. Apparently the optimizer of C18 is not that good. For
>> instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>> compiler should recognise that a shift by 16, stored to an 8 bit port could
>> easily be done by simply accessing the 3rd byte.... sheesh....
>
> Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)
>
> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?
>
> How about:
>
> uint8_t pointer;

    uint8_t *pointer;

(sorry, too early in the morning to be writing code :> )

> pointer = (uint8_t *) &addr;
> LATF = pointer[2];
>
> Clumsy, admittedly, but perhaps more obvious what's going on?
> (I would have added that this would be easy for an optimizer
> to reduce to an "addressing operation" but I also would have
> expected your shift to be recognized as an easy optimization!)

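A minimal sketch of the byte-pointer idea above next to the plain shift, written for a hosted C compiler, not for C18 itself. The function names are made up for illustration. The pointer form only picks bits 16..23 when the low byte is stored first (little-endian), which is assumed here for the PIC18 target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable form: shift, then truncate. Independent of byte order. */
uint8_t byte2_by_shift(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}

/* Byte-pointer form from the post. Returns bits 16..23 only when the
 * value is stored low byte first (assumption: little-endian target). */
uint8_t byte2_by_pointer(uint32_t addr)
{
    const uint8_t *p = (const uint8_t *)&addr;
    return p[2];
}

/* Reports whether this machine stores the low byte first. */
int is_little_endian(void)
{
    const uint32_t probe = 1u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Self-check: the pointer form is compared only where it applies. */
void check_byte2(void)
{
    assert(byte2_by_shift(0x12345678u) == 0x34);
    if (is_little_endian()) {
        assert(byte2_by_pointer(0x12345678u) == 0x34);
    }
}
```

The shift form states the intent without depending on memory layout; the pointer form trades that portability for code that a weak optimizer cannot easily make slow.
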
On Mon, 07 Jun 2010 11:17:34 +0200, Meindert Sprang wrote:

> Unbelievable.....
>
> I'm playing around with the Microchip C18 compiler after a
> hair-splitting experience with CCS. Apparently the optimizer of C18 is
> not that good. For instance: LATF = addr >> 16; where addr is an
> uint32, is compiled into a loop where 4 registers really get shifted 16
> times in a loop. Any decent compiler should recognise that a shift by
> 16, stored to an 8 bit port could easily be done by simply accessing the
> 3rd byte.... sheesh....
>
> Meindert

From the Microchip supplied USB code

    POINTER addr;
    LATF = addr.bHigh;   //simple and to the point

If addr is static this will probably compile to a simple movff. If addr is
on the stack it gets a little more complicated.

    #ifndef TYPEDEFS_H
    #define TYPEDEFS_H

    typedef unsigned char   byte;           // 8-bit
    typedef unsigned int    word;           // 16-bit
    typedef unsigned long   dword;          // 32-bit

    typedef union _BYTE
    {
        byte _byte;
        struct
        {
            unsigned b0:1;
            unsigned b1:1;
            unsigned b2:1;
            unsigned b3:1;
            unsigned b4:1;
            unsigned b5:1;
            unsigned b6:1;
            unsigned b7:1;
        };
    } BYTE;

    typedef union _WORD
    {
        word _word;
        struct
        {
            byte byte0;
            byte byte1;
        };
        struct
        {
            BYTE Byte0;
            BYTE Byte1;
        };
        struct
        {
            BYTE LowB;
            BYTE HighB;
        };
        struct
        {
            byte v[2];
        };
    } WORD;
    #define LSB(a)      ((a).v[0])
    #define MSB(a)      ((a).v[1])

    typedef union _DWORD
    {
        dword _dword;
        struct
        {
            byte byte0;
            byte byte1;
            byte byte2;
            byte byte3;
        };
        struct
        {
            word word0;
            word word1;
        };
        struct
        {
            BYTE Byte0;
            BYTE Byte1;
            BYTE Byte2;
            BYTE Byte3;
        };
        struct
        {
            WORD Word0;
            WORD Word1;
        };
        struct
        {
            byte v[4];
        };
    } DWORD;
    #define LOWER_LSB(a)    ((a).v[0])
    #define LOWER_MSB(a)    ((a).v[1])
    #define UPPER_LSB(a)    ((a).v[2])
    #define UPPER_MSB(a)    ((a).v[3])

    typedef void(*pFunc)(void);

    typedef union _POINTER
    {
        struct
        {
            byte bLow;
            byte bHigh;
        };
        word _word;             // bLow & bHigh

        byte* bRam;             // Ram byte pointer: 2 bytes pointer pointing
                                // to 1 byte of data
        word* wRam;             // Ram word pointer: 2 bytes pointer pointing
                                // to 2 bytes of data

        rom byte* bRom;         // Size depends on compiler setting
        rom word* wRom;
    } POINTER;

    typedef enum _BOOL { FALSE = 0, TRUE } BOOL;

    #define OK      TRUE
    #define FAIL    FALSE

    #endif //TYPEDEFS_H

-- 
Joe Chisolm
Marble Falls, Tx.
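A reduced sketch of the union approach for the 32-bit case from the original question. Note that bHigh in the POINTER union above is the second byte of a 16-bit value, so for addr >> 16 the matching field would seem to be byte2 (or UPPER_LSB) of the DWORD union. The type dword_bytes, the macro UPPER_LSB_OF and the function names below are hypothetical stand-ins, and little-endian storage is assumed for the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the DWORD union, cut down to two views. */
typedef union {
    uint32_t value;   /* the whole 32-bit quantity */
    uint8_t  v[4];    /* the same storage, one byte at a time */
} dword_bytes;

/* Same idea as UPPER_LSB(): v[2] holds bits 16..23 when the low byte
 * is stored first (assumption: little-endian target). */
#define UPPER_LSB_OF(a) ((a).v[2])

uint8_t upper_lsb(uint32_t addr)
{
    dword_bytes d;
    d.value = addr;
    return UPPER_LSB_OF(d);
}

/* Nonzero when this machine stores the low byte first. */
int low_byte_first(void)
{
    dword_bytes probe;
    probe.value = 1u;
    return probe.v[0] == 1;
}

/* Self-check against the shift form, valid on little-endian machines. */
void check_upper_lsb(void)
{
    if (low_byte_first()) {
        assert(upper_lsb(0x12345678u) == (uint8_t)(0x12345678u >> 16));
    }
}
```

With the variable declared as the union type from the start, the byte access needs no copy at all, which is why a static variable could plausibly end up as a single byte move.
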
On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
<ms@NOJUNKcustomORSPAMware.nl> wrote:

>Unbelievable.....
>
>I'm playing around with the Microchip C18 compiler after a hair-splitting
>experience with CCS. Apparently the optimizer of C18 is not that good. For
>instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
>loop where 4 registers really get shifted 16 times in a loop. Any decent
>compiler should recognise that a shift by 16, stored to an 8 bit port could
>easily be done by simply accessing the 3rd byte.... sheesh....
>
>Meindert

You're asking a lot.

I've been programming since 1977 and I have never seen any compiler
turn a long word shift (and/or mask) into a corresponding short word
or byte access. Every compiler I have ever worked with would perform
the shift.

That said, something is wrong if it takes 4 registers. I don't know
the PIC18, but I never encountered any chip that required more than 2
registers to shift a value. Many chips have only a 1-bit shifter and
require a loop to do larger shifts - but many such chips microcode the
shift loop so the programmer sees only a simple instruction. But,
occasionally, you do run into oddballs that need large shifts spelled
out.

Most likely you're somehow reading the (dis)assembly incorrectly: 4
temporaries that are really mapped into the same register. If the
compiler (or chip) really does need 4 registers to do a shift, then
it's a piece of sh*t.

George
On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:

> On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
> <ms@NOJUNKcustomORSPAMware.nl> wrote:
>
>>Unbelievable.....
>>
>>I'm playing around with the Microchip C18 compiler after a
>>hair-splitting experience with CCS. Apparently the optimizer of C18 is
>>not that good. For instance: LATF = addr >> 16; where addr is an
>>uint32, is compiled into a loop where 4 registers really get shifted 16
>>times in a loop. Any decent compiler should recognise that a shift by
>>16, stored to an 8 bit port could easily be done by simply accessing the
>>3rd byte.... sheesh....
>>
>>Meindert
>
> You're asking a lot.
>
> I've been programming since 1977 and I have never seen any compiler turn
> a long word shift (and/or mask) into a corresponding short word or byte
> access. Every compiler I have ever worked with would perform the shift.
>
> That said, something is wrong if it takes 4 registers. I don't know the
> PIC18, but I never encountered any chip that required more than 2
> registers to shift a value. Many chips have only a 1-bit shifter and
> require a loop to do larger shifts - but many such chips microcode the
> shift loop so the programmer sees only a simple instruction. But,
> occasionally, you do run into oddballs that need large shifts spelled
> out.
>
> Most likely you're somehow reading the (dis)assembly incorrectly: 4
> temporaries that are really mapped into the same register. If the
> compiler (or chip) really does need 4 registers to do a shift, then it's
> a piece of sh*t.
>
> George

You have an 8 bit architecture shifting a 32 bit value, shifting out of one
byte and into the next, thus 4 temps. You have 1 bit shifts. I suspect
the compiler is generating a right shift into carry so the code can
tell if a 1 needs to be moved into the most significant bit of the next
byte.

-- 
Joe Chisolm
Marble Falls, Tx.
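The carry-chained shift described above can be modelled in plain C as follows. This is a sketch of the general technique on an 8-bit machine, not a transcription of what C18 emits; the function names and the b[0]-is-lowest byte order are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a 32-bit value, held as four bytes with b[0] least significant,
 * right by one bit. Bit 0 of each byte becomes the carry that enters
 * bit 7 of the next lower byte, like a chain of rotate-through-carry
 * instructions working from the top byte down. */
void shift_right_1(uint8_t b[4])
{
    uint8_t carry = 0;   /* unsigned shift: a 0 enters at the top */
    for (int i = 3; i >= 0; --i) {
        uint8_t next_carry = (uint8_t)(b[i] & 1u);   /* bit leaving this byte */
        b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
        carry = next_carry;
    }
}

/* Naive multi-bit shift: repeat the one-bit shift count times. */
void shift_right_n(uint8_t b[4], uint8_t count)
{
    while (count--) {
        shift_right_1(b);
    }
}

/* Self-check: 0x12345678 >> 16 == 0x00001234. */
void check_shift_loop(void)
{
    uint8_t b[4] = { 0x78, 0x56, 0x34, 0x12 };
    shift_right_n(b, 16);
    assert(b[0] == 0x34 && b[1] == 0x12 && b[2] == 0x00 && b[3] == 0x00);
}
```

For a shift by 16 this touches all four bytes sixteen times, 64 byte operations in total, to produce a result whose only live byte was already sitting in b[2].
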
Hi Joe,

Joe Chisolm wrote:
> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>
>> You're asking a lot.
>>
>> I've been programming since 1977 and I have never seen any compiler turn
>> a long word shift (and/or mask) into a corresponding short word or byte
>> access. Every compiler I have ever worked with would perform the shift.
>>
>> That said, something is wrong if it takes 4 registers. I don't know the
>> PIC18, but I never encountered any chip that required more than 2
>> registers to shift a value. Many chips have only a 1-bit shifter and
>> require a loop to do larger shifts - but many such chips microcode the
>> shift loop so the programmer sees only a simple instruction. But,
>> occasionally, you do run into oddballs that need large shifts spelled
>> out.
>>
>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>> temporaries that are really mapped into the same register. If the
>> compiler (or chip) really does need 4 registers to do a shift, then it's
>> a piece of sh*t.

It would be informative to know what sort of "helper routines"
the compiler calls on. E.g., it might (inelegantly) treat
this as "CALL SHIFT_LONG_RIGHT, repeat" -- in which case the
4 temp access is the canned representation of *any* "long int".

> You have a 8 bit architecture shifting a 32 bit value, shifting out of one
> byte and into the next, thus 4 temps. You have 1 bit shifts. I suspect
> the compiler is generating a right shift into carry so the code can
> tell if a 1 needs to be moved into the most significant bit of the next
> byte.

I think George is commenting that a *smart* compiler can
realize that an (e.g.) 8 bit shift is:

    foo[2] = foo[3]
    foo[1] = foo[2]
    foo[0] = foo[1]

(if you are casting to a narrower data type and can discard foo[3])

and a *9* bit shift is the same as the above with a *single*
bit shift introduced (i.e., you operate on a byte at a time
instead of the entire "long")

(recall, the shift amount is a constant available at compile time)
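The byte-move decomposition above can be sketched in C as below, with b[0] as the least significant byte (an assumption for the example, as are the function names). One detail when the moves are done in place: they have to run from the low byte upward, so each source byte is read before it is overwritten; the three assignments in the post read best as a parallel mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Constant shift right by 8: three byte moves and one clear.
 * In place, the moves go from the low byte upward. */
void shift_right_8(uint8_t b[4])
{
    b[0] = b[1];
    b[1] = b[2];
    b[2] = b[3];
    b[3] = 0;
}

/* Constant shift right by 9: the byte moves, then one single-bit shift.
 * b[3] is already 0 after the moves, so only three bytes need work. */
void shift_right_9(uint8_t b[4])
{
    shift_right_8(b);
    b[0] = (uint8_t)((b[0] >> 1) | ((b[1] & 1u) << 7));
    b[1] = (uint8_t)((b[1] >> 1) | ((b[2] & 1u) << 7));
    b[2] = (uint8_t)(b[2] >> 1);
}

/* Self-check: 0x12345678 >> 8 == 0x00123456, >> 9 == 0x00091A2B. */
void check_const_shifts(void)
{
    uint8_t x[4] = { 0x78, 0x56, 0x34, 0x12 };
    uint8_t y[4] = { 0x78, 0x56, 0x34, 0x12 };
    shift_right_8(x);
    assert(x[0] == 0x56 && x[1] == 0x34 && x[2] == 0x12 && x[3] == 0x00);
    shift_right_9(y);
    assert(y[0] == 0x2B && y[1] == 0x1A && y[2] == 0x09 && y[3] == 0x00);
}
```

If the result is then stored to an 8-bit destination, only b[0] matters and every other assignment is dead code an optimizer could drop.
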
On 2010-06-07, George Neuner <gneuner2@comcast.net> wrote:

> I've been programming since 1977 and I have never seen any compiler
> turn a long word shift (and/or mask) into a corresponding short word
> or byte access. Every compiler I have ever worked with would perform
> the shift.

Really? I've seen quite a few compilers do that. For example,
gcc for ARM does:

------------------------------testit.c------------------------------
unsigned long ul;

unsigned char foo(void)
{
  return ul>>8;
}

unsigned short bar(void)
{
  return ul>>16;
}
------------------------------testit.c------------------------------

$ /home/nextgen/toolchain/bin/arm-linux-gcc -c -Os -S -fomit-frame-pointer testit.c

------------------------------testit.s------------------------------
        .arch armv5te
[...]
        .file   "testit.c"
        .text
        .align  2
        .global foo
        .type   foo, %function
foo:
        ldr     r3, .L3
        ldrb    r0, [r3, #1]    @ zero_extendqisi2
        bx      lr
.L4:
        .align  2
.L3:
        .word   ul
        .size   foo, .-foo
        .align  2
        .global bar
        .type   bar, %function
bar:
        ldr     r3, .L7
        ldrh    r0, [r3, #2]
        bx      lr
.L8:
        .align  2
.L7:
        .word   ul
        .size   bar, .-bar
        .comm   ul,4,4
[...]
------------------------------testit.s------------------------------

-- 
Grant Edwards               grant.b.edwards        Yow! I'm young ... I'm
                                  at               HEALTHY ... I can HIKE
                              gmail.com            THRU CAPT GROGAN'S LUMBAR
                                                   REGIONS!
D Yuniskis wrote:
> Hi Meindert,
>
> Meindert Sprang wrote:
>> Unbelievable.....
>>
>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>> experience with CCS. Apparently the optimizer of C18 is not that good. For
>> instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>> compiler should recognise that a shift by 16, stored to an 8 bit port could
>> easily be done by simply accessing the 3rd byte.... sheesh....
>
> Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)
>

That's irrelevant (or should be!) - expressions are evaluated in their
own right, and /then/ cast to the type of the LHS. The compiler should,
as it does, initially treat it as a 32-bit shift, but it's a poor
compiler that can't optimise a 32-bit shift by 16 to something better
than this. Optimising it to a single byte transfer comes logically at a
later stage.

> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?
>

I believe that uint32_t /must/ be an unsigned 32-bit integer. If the
compiler cannot work with such a type, then no such type should exist in
<stdint.h>. A standards-compliant compiler is not allowed to cheat in
that way. Of course, I don't know if Microchip's compiler claims to be
standards compliant...

> How about:
>
> uint8_t pointer;
>
> pointer = (uint8_t *) &addr;
> LATF = pointer[2];
>
> Clumsy, admittedly, but perhaps more obvious what's going on?
> (I would have added that this would be easy for an optimizer
> to reduce to an "addressing operation" but I also would have
> expected your shift to be recognized as an easy optimization!)
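The question of whether uint32_t is really unsigned and really 32 bits can be checked directly on a given toolchain. A minimal sketch, assuming the compiler provides <stdint.h> and <limits.h> (whether C18 does is not established in this thread); the function names are made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Nonzero when uint32_t is unsigned: -1 converts to the largest value
 * of the type instead of staying negative. */
int uint32_is_unsigned(void)
{
    return (uint32_t)-1 > 0;
}

/* Nonzero when uint32_t is exactly 32 bits wide. */
int uint32_is_32_bits(void)
{
    return sizeof(uint32_t) * CHAR_BIT == 32 && UINT32_MAX == 0xFFFFFFFFu;
}

/* Right shift of an unsigned value fills with zeros, so a set top bit
 * does not smear downward as it could with a signed type. */
uint32_t shift_right_16(uint32_t x)
{
    return x >> 16;
}

/* Self-check of all three properties. */
void check_uint32(void)
{
    assert(uint32_is_unsigned());
    assert(uint32_is_32_bits());
    assert(shift_right_16(0x80000000u) == 0x00008000u);
}
```

If any of these checks failed, the typedef behind uint32_t would be the thing to fix before looking at the code the shift compiles to.
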
On Mon, 07 Jun 2010 12:59:49 -0700, D Yuniskis wrote:

> Hi Joe,
>
> Joe Chisolm wrote:
>> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>>
>>> You're asking a lot.
>>>
>>> I've been programming since 1977 and I have never seen any compiler
>>> turn a long word shift (and/or mask) into a corresponding short word
>>> or byte access. Every compiler I have ever worked with would perform
>>> the shift.
>>>
>>> That said, something is wrong if it takes 4 registers. I don't know
>>> the PIC18, but I never encountered any chip that required more than 2
>>> registers to shift a value. Many chips have only a 1-bit shifter and
>>> require a loop to do larger shifts - but many such chips microcode the
>>> shift loop so the programmer sees only a simple instruction. But,
>>> occasionally, you do run into oddballs that need large shifts spelled
>>> out.
>>>
>>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>>> temporaries that are really mapped into the same register. If the
>>> compiler (or chip) really does need 4 registers to do a shift, then
>>> it's a piece of sh*t.
>
> It would be informative to know what sort of "helper routines" the
> compiler calls on. E.g., it might (inelegantly) treat this as "CALL
> SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the
> canned representation of *any* "long int".
>

I agree with your statement. The C18 suite has some canned libraries
like 32 bit division and such. There are other helper routines for
doing delays and such.

>> You have a 8 bit architecture shifting a 32 bit value, shifting out of
>> one byte and into the next, thus 4 temps. You have 1 bit shifts. I
>> suspect the compiler is generating a right shift into carry so the code
>> can tell if a 1 needs to be moved into the most significant bit of the
>> next byte.
>
> I think George is commenting that a *smart* compiler can realize that an
> (e.g.) 8 bit shift is:
>     foo[2] = foo[3]
>     foo[1] = foo[2]
>     foo[0] = foo[1]
> (if you are casting to a narrower data type and can discard foo[3])
>
> and a *9* bit shift is the same as the above with a *single* bit shift
> introduced (i.e., you operate on a byte at a time instead of the entire
> "long")
>
> (recall, the shift amount is a constant available at compile time)

I just did a test using C18. I chose an 18F86J10 (for no particular reason
other than I remember it has a port F and thus a LATF).

For:

    static unsigned long addr;
    LATF = addr >> 16;

I get results similar to what you have above. The compiler "shifts" addr
into a 32 bit temp by doing two byte moves and two clear byte instructions.
It then does a 1 byte move into LATF from the temp.

I'm not sure what version the OP is using or what else might be going
on behind the scenes with addr. I agree a compiler should be smarter
but for the price (free) C18 is not bad for smaller projects.

BTW: I did a quick test with gcc 4.4.1 and it does a load, shift 16 and
a store byte.

-- 
Joe Chisolm
Marble Falls, Tx.
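The sequence reported above (two byte moves, two clears, then one byte move) can be modelled in C like this. It is a sketch of the described behaviour, not the actual C18 output; the byte array stands in for the 32-bit variable with addr[0] as the least significant byte, and the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the reported code: build the shifted 32-bit temp with two
 * byte moves and two clears, then move one byte to the port. */
uint8_t latf_via_temp(const uint8_t addr[4])
{
    uint8_t temp[4];
    temp[0] = addr[2];   /* byte move */
    temp[1] = addr[3];   /* byte move */
    temp[2] = 0;         /* clear */
    temp[3] = 0;         /* clear */
    return temp[0];      /* the one byte that reaches LATF */
}

/* What further optimization could reduce it to: a single byte move. */
uint8_t latf_direct(const uint8_t addr[4])
{
    return addr[2];
}

/* Self-check: both forms agree for 0x12345678. */
void check_latf(void)
{
    const uint8_t addr[4] = { 0x78, 0x56, 0x34, 0x12 };
    assert(latf_via_temp(addr) == 0x34);
    assert(latf_direct(addr) == 0x34);
}
```

Four of the five byte operations in the first form produce values that are never read, which is the gap between "no shift loop" and the single movff mentioned earlier in the thread.
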