C18 Compiler again

Unbelievable.....

I'm playing around with the Microchip C18 compiler after a hair-splitting
experience with CCS. Apparently the optimizer of C18 is not that good. For
instance:  LATF = addr >> 16; where addr is a uint32, is compiled into a
loop where 4 registers really get shifted 16 times. Any decent compiler
should recognise that a shift by 16, stored to an 8-bit port, could
easily be done by simply accessing the 3rd byte.... sheesh....

Meindert

Reply by D Yuniskis ● June 7, 2010

Hi Meindert,

Meindert Sprang wrote:
> Unbelievable.....
> 
> I'm playing around with the Microchip C18 compiler after a hair-splitting
> experience with CCS. Apparently the optimizer of C18 is not that good. For
> instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
> loop where 4 registers really get shifted 16 times in a loop. Any decent
> compiler should recognise that a shift by 16, stored to an 8 bit port could
> easily be done by simply accessing the 3rd byte.... sheesh....

Is LATF *defined* as a uint8_t?  (i.e., does the compiler *know* it
can discard all but the lowest 8 bits?)

Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
I.e., can the compiler be confused (by the definition) into thinking
it is signed and opting for a sign-preserving shift?

How about:

uint8_t pointer;

pointer = (uint8_t *) &addr;
LATF = pointer[2];

Clumsy, admittedly, but perhaps more obvious what's going on?
(I would have added that this would be easy for an optimizer
to reduce to an "addressing operation" but I also would have
expected your shift to be recognized as an easy optimization!)

Reply by D Yuniskis ● June 7, 2010

D Yuniskis wrote:
> Hi Meindert,
> 
> Meindert Sprang wrote:
>> Unbelievable.....
>>
>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>> experience with CCS. Apparently the optimizer of C18 is not that good. 
>> For
>> instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>> compiler should recognise that a shift by 16, stored to an 8 bit port 
>> could
>> easily be done by simply accessing the 3rd byte.... sheesh....
> 
> Is LATF *defined* as a uint8_t?  (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)
> 
> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?
> 
> How about:
> 
> uint8_t pointer;

uint8_t *pointer;

(sorry, too early in the morning to be writing code  :> )

> pointer = (uint8_t *) &addr;
> LATF = pointer[2];
> 
> Clumsy, admittedly, but perhaps more obvious what's going on?
> (I would have added that this would be easy for an optimizer
> to reduce to an "addressing operation" but I also would have
> expected your shift to be recognized as an easy optimization!)
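For reference, the corrected pointer version can be written out as a small host-side sketch. It assumes little-endian storage (which is what makes index 2 the byte holding bits 16..23) and uses a plain return value in place of the LATF register; the function names are made up for illustration.

```c
#include <stdint.h>

/* Return bits 16..23 of a 32-bit value by addressing its third byte.
 * ASSUMPTION: little-endian storage, so index 2 is the third-lowest byte. */
uint8_t byte2_via_pointer(uint32_t addr)
{
    const uint8_t *p = (const uint8_t *)&addr;  /* byte-wise view of addr */
    return p[2];
}

/* Reference version using the shift; independent of byte order. */
uint8_t byte2_via_shift(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}
```

On a little-endian target both functions return the same byte, so on the PIC the assignment would read LATF = p[2]; the shift form stays correct if the code is ever moved to a big-endian machine.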

Reply by Joe Chisolm ● June 7, 2010

On Mon, 07 Jun 2010 11:17:34 +0200, Meindert Sprang wrote:

> Unbelievable.....
> 
> I'm playing around with the Microchip C18 compiler after a
> hair-splitting experience with CCS. Apparently the optimizer of C18 is
> not that good. For instance:  LATF = addr >> 16; where addr is an
> uint32, is compiled into a loop where 4 registers really get shifted 16
> times in a loop. Any decent compiler should recognise that a shift by
> 16, stored to an 8 bit port could easily be done by simply accessing the
> 3rd byte.... sheesh....
> 
> Meindert

From the Microchip supplied USB code

POINTER addr;

LATF = addr.bHigh;  //simple and to the point


If addr is static this will probably compile to a simple movff.  If addr
is on the stack it gets a little more complicated.


#ifndef TYPEDEFS_H
#define TYPEDEFS_H

typedef unsigned char   byte;           // 8-bit
typedef unsigned int    word;           // 16-bit
typedef unsigned long   dword;          // 32-bit

typedef union _BYTE
{
    byte _byte;
    struct
    {
        unsigned b0:1;
        unsigned b1:1;
        unsigned b2:1;
        unsigned b3:1;
        unsigned b4:1;
        unsigned b5:1;
        unsigned b6:1;
        unsigned b7:1;
    };
} BYTE;

typedef union _WORD
{
    word _word;
    struct
    {
        byte byte0;
        byte byte1;
    };
    struct
    {
        BYTE Byte0;
        BYTE Byte1;
    };
    struct
    {
        BYTE LowB;
        BYTE HighB;
    };
    struct
    {
        byte v[2];
    };
} WORD;
#define LSB(a)      ((a).v[0])
#define MSB(a)      ((a).v[1])

typedef union _DWORD
{
    dword _dword;
    struct
    {
        byte byte0;
        byte byte1;
        byte byte2;
        byte byte3;
    };
    struct
    {
        word word0;
        word word1;
    };
    struct
    {
        BYTE Byte0;
        BYTE Byte1;
        BYTE Byte2;
        BYTE Byte3;
    };
    struct
    {
        WORD Word0;
        WORD Word1;
    };
    struct
    {
        byte v[4];
    };
} DWORD;
#define LOWER_LSB(a)    ((a).v[0])
#define LOWER_MSB(a)    ((a).v[1])
#define UPPER_LSB(a)    ((a).v[2])
#define UPPER_MSB(a)    ((a).v[3])

typedef void(*pFunc)(void);

typedef union _POINTER
{
    struct
    {
        byte bLow;
        byte bHigh;
    };
    word _word;     // bLow & bHigh
    byte* bRam;     // Ram byte pointer: 2 bytes pointer pointing
                    // to 1 byte of data
    word* wRam;     // Ram word pointer: 2 bytes pointer pointing
                    // to 2 bytes of data

    rom byte* bRom;   // Size depends on compiler setting
    rom word* wRom;
} POINTER;

typedef enum _BOOL { FALSE = 0, TRUE } BOOL;

#define OK      TRUE
#define FAIL    FALSE

#endif //TYPEDEFS_H
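
A note on applying this to the original question: as far as the header shows, POINTER only names bLow and bHigh, which are the two lowest bytes, so for a 32-bit addr the DWORD union (byte2, or the UPPER_LSB macro) looks like the closer match for bits 16..23. The sketch below is a cut-down stand-in that compiles on a host compiler; the type name dword_view, the member name value and the function name are hypothetical, and little-endian storage is assumed.

```c
#include <stdint.h>

/* Cut-down, hypothetical stand-in for the DWORD union in the header above. */
typedef union {
    uint32_t value;   /* whole 32-bit quantity (the header calls this _dword) */
    uint8_t  v[4];    /* byte view; v[0] is the LSB on a little-endian target */
} dword_view;

#define UPPER_LSB(a) ((a).v[2])   /* same definition as in the header */

/* Pick out bits 16..23 without any shifting. */
uint8_t address_bits_16_to_23(uint32_t addr)
{
    dword_view d;
    d.value = addr;
    return UPPER_LSB(d);   /* on the PIC this would be: LATF = UPPER_LSB(d); */
}
```

With the real header the equivalent would be a DWORD variable and LATF = UPPER_LSB(addr); or LATF = addr.byte2;.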

-- 
Joe Chisolm
Marble Falls, Tx.

Reply by George Neuner ● June 7, 2010

On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
<ms@NOJUNKcustomORSPAMware.nl> wrote:

>Unbelievable.....
>
>I'm playing around with the Microchip C18 compiler after a hair-splitting
>experience with CCS. Apparently the optimizer of C18 is not that good. For
>instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
>loop where 4 registers really get shifted 16 times in a loop. Any decent
>compiler should recognise that a shift by 16, stored to an 8 bit port could
>easily be done by simply accessing the 3rd byte.... sheesh....
>
>Meindert

You're asking a lot.

I've been programming since 1977 and I have never seen any compiler
turn a long word shift (and/or mask) into a corresponding short word
or byte access.  Every compiler I have ever worked with would perform
the shift.

That said, something is wrong if it takes 4 registers.  I don't know
the PIC18, but I never encountered any chip that required more than 2
registers to shift a value.  Many chips have only a 1-bit shifter and
require a loop to do larger shifts - but many such chips microcode the
shift loop so the programmer sees only a simple instruction.  But,
occasionally, you do run into oddballs that need large shifts spelled
out.

Most likely you're somehow reading the (dis)assembly incorrectly: 4
temporaries that are really mapped into the same register.  If the
compiler (or chip) really does need 4 registers to do a shift, then
it's a piece of sh*t.

George

Reply by Joe Chisolm ● June 7, 2010

On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:

> On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
> <ms@NOJUNKcustomORSPAMware.nl> wrote:
> 
>>Unbelievable.....
>>
>>I'm playing around with the Microchip C18 compiler after a
>>hair-splitting experience with CCS. Apparently the optimizer of C18 is
>>not that good. For instance:  LATF = addr >> 16; where addr is an
>>uint32, is compiled into a loop where 4 registers really get shifted 16
>>times in a loop. Any decent compiler should recognise that a shift by
>>16, stored to an 8 bit port could easily be done by simply accessing the
>>3rd byte.... sheesh....
>>
>>Meindert
> 
> You're asking a lot.
> 
> I've been programming since 1977 and I have never seen any compiler turn
> a long word shift (and/or mask) into a corresponding short word or byte
> access.  Every compiler I have ever worked with would perform the shift.
> 
> That said, something is wrong if it takes 4 registers.  I don't know the
> PIC18, but I never encountered any chip that required more than 2
> registers to shift a value.  Many chips have only a 1-bit shifter and
> require a loop to do larger shifts - but many such chips microcode the
> shift loop so the programmer sees only a simple instruction.  But,
> occasionally, you do run into oddballs that need large shifts spelled
> out.
> 
> Most likely you're somehow reading the (dis)assembly incorrectly: 4
> temporaries that are really mapped into the same register.  If the
> compiler (or chip) really does need 4 registers to do a shift, then it's
> a piece of sh*t.
> 
> George

You have an 8-bit architecture shifting a 32-bit value, shifting out of one
byte and into the next, thus 4 temps.  You have 1-bit shifts.  I suspect
the compiler is generating a right shift into carry so the code can
tell if a 1 needs to be moved into the most significant bit of the next
byte.
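
To make that concrete, here is a host-side C model of the "shift one bit through carry, four bytes, sixteen times" sequence. It is only a sketch of the idea: the function names are hypothetical, and it is not taken from C18's actual output.

```c
#include <stdint.h>

/* Shift a 32-bit value, held as four bytes (b[0] = least significant),
 * right by one bit the way an 8-bit CPU does it: start at the top byte
 * and hand each byte's outgoing bit 0 to the next lower byte via a carry. */
static void shift_right_1(uint8_t b[4])
{
    uint8_t carry = 0;                    /* nothing enters the top byte */
    for (int i = 3; i >= 0; --i) {
        uint8_t out = b[i] & 1u;          /* bit that falls out of this byte */
        b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
        carry = out;                      /* becomes bit 7 of the next lower byte */
    }
}

/* The naive form of addr >> 16: sixteen passes over all four bytes. */
uint8_t naive_shift16_low_byte(uint32_t addr)
{
    uint8_t b[4];
    for (int i = 0; i < 4; ++i)
        b[i] = (uint8_t)(addr >> (8 * i));    /* split into bytes, LSB first */
    for (int n = 0; n < 16; ++n)
        shift_right_1(b);
    return b[0];                              /* the byte that would go to LATF */
}
```

That is 64 single-byte shift steps to produce one byte that was already sitting in memory, which is the inefficiency the original post complains about.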

-- 
Joe Chisolm
Marble Falls, Tx.

Reply by D Yuniskis ● June 7, 2010

Hi Joe,

Joe Chisolm wrote:
> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
> 
>> You're asking a lot.
>>
>> I've been programming since 1977 and I have never seen any compiler turn
>> a long word shift (and/or mask) into a corresponding short word or byte
>> access.  Every compiler I have ever worked with would perform the shift.
>>
>> That said, something is wrong if it takes 4 registers.  I don't know the
>> PIC18, but I never encountered any chip that required more than 2
>> registers to shift a value.  Many chips have only a 1-bit shifter and
>> require a loop to do larger shifts - but many such chips microcode the
>> shift loop so the programmer sees only a simple instruction.  But,
>> occasionally, you do run into oddballs that need large shifts spelled
>> out.
>>
>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>> temporaries that are really mapped into the same register.  If the
>> compiler (or chip) really does need 4 registers to do a shift, then it's
>> a piece of sh*t.

It would be informative to know what sort of "helper routines"
the compiler calls on.  E.g., it might (inelegantly) treat this
as "CALL SHIFT_LONG_RIGHT, repeat" -- in which case the
4 temp access is the canned representation of *any* "long int".

> You have a 8 bit architecture shifting a 32 bit value, shifting out of one
> byte and into the next, thus 4 temps.  You have 1 bit shifts.  I suspect
> the compiler is generating a right shift into carry so the code can
> tell if a 1 needs to be moved into the most significant bit of the next
> byte.

I think George is commenting that a *smart* compiler can
realize that an (e.g.) 8 bit shift is:
foo[0] = foo[1]
foo[1] = foo[2]
foo[2] = foo[3]
(if you are casting to a narrower data type and can discard foo[3])

and a *9* bit shift is the same as the above with a *single*
bit shift introduced (i.e., you operate on a byte at a time
instead of the entire "long")

(recall, the shift amount is a constant available at
compile time)
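
A sketch of that decomposition in C, assuming the value is held as four bytes with index 0 as the least significant one. The function names are hypothetical, and a real compiler would do this at code-generation time for a constant n rather than at run time.

```c
#include <stdint.h>

/* Right-shift a 32-bit value held as bytes (b[0] = LSB) by n bits, 0 <= n < 32.
 * Whole bytes are moved; only the remaining n % 8 bits are really shifted. */
void shift_right_bytes(uint8_t b[4], unsigned n)
{
    unsigned byte_moves = n / 8;   /* e.g. 1 for n = 8 and for n = 9 */
    unsigned bit_shift  = n % 8;   /* e.g. 0 for n = 8, 1 for n = 9 */

    /* Byte moves, lowest destination first so no source is overwritten early. */
    for (unsigned i = 0; i < 4; ++i)
        b[i] = (i + byte_moves < 4) ? b[i + byte_moves] : 0;

    /* Residual shift, one byte at a time, pulling bits down from the byte above. */
    if (bit_shift != 0) {
        for (unsigned i = 0; i < 4; ++i) {
            uint8_t above = (i < 3) ? b[i + 1] : 0;
            b[i] = (uint8_t)((b[i] >> bit_shift) | (above << (8 - bit_shift)));
        }
    }
}

/* Convenience wrapper: split, shift, and reassemble. */
uint32_t shift_right_via_bytes(uint32_t x, unsigned n)
{
    uint8_t b[4];
    for (unsigned i = 0; i < 4; ++i)
        b[i] = (uint8_t)(x >> (8 * i));
    shift_right_bytes(b, n);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For n = 16 the residual loop never runs, and if only b[0] is needed afterwards the whole thing collapses to a single byte move.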

Reply by Grant Edwards ● June 7, 2010

On 2010-06-07, George Neuner <gneuner2@comcast.net> wrote:

> I've been programming since 1977 and I have never seen any compiler
> turn a long word shift (and/or mask) into a corresponding short word
> or byte access.  Every compiler I have ever worked with would perform
> the shift.

Really?

I've seen quite a few compilers do that.  For example, gcc for ARM
does:

------------------------------testit.c------------------------------
unsigned long ul;

unsigned char foo(void)
{
  return ul>>8;
}

unsigned short bar(void)
{
  return ul>>16;
}
------------------------------testit.c------------------------------

$ /home/nextgen/toolchain/bin/arm-linux-gcc -c -Os -S -fomit-frame-pointer testit.c                   

------------------------------testit.s------------------------------
        .arch armv5te
[...]
        .file   "testit.c"
        .text
        .align  2
        .global foo
        .type   foo, %function
foo:
        ldr     r3, .L3
        ldrb    r0, [r3, #1]    @ zero_extendqisi2
        bx      lr
.L4:
        .align  2
.L3:
        .word   ul
        .size   foo, .-foo
        .align  2
        .global bar
        .type   bar, %function
bar:
        ldr     r3, .L7
        ldrh    r0, [r3, #2]
        bx      lr
.L8:
        .align  2
.L7:
        .word   ul
        .size   bar, .-bar
        .comm   ul,4,4
[...]
------------------------------testit.s------------------------------


-- 
Grant Edwards               grant.b.edwards        Yow! I'm young ... I'm
                                  at               HEALTHY ... I can HIKE
                              gmail.com            THRU CAPT GROGAN'S LUMBAR
                                                   REGIONS!

Reply by David Brown ● June 7, 2010

D Yuniskis wrote:
> Hi Meindert,
> 
> Meindert Sprang wrote:
>> Unbelievable.....
>>
>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>> experience with CCS. Apparently the optimizer of C18 is not that good. 
>> For
>> instance:  LATF = addr >> 16; where addr is an uint32, is compiled into a
>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>> compiler should recognise that a shift by 16, stored to an 8 bit port 
>> could
>> easily be done by simply accessing the 3rd byte.... sheesh....
> 
> Is LATF *defined* as a uint8_t?  (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)
> 

That's irrelevant (or should be!) - expressions are evaluated in their 
own right, and /then/ cast to the type of the LHS.  The compiler should, 
as it does, initially treat it as a 32-bit shift, but it's a poor 
compiler that can't optimise a 32-bit shift by 16 to something better 
than this.  Optimising it to a single byte transfer comes logically at a 
later stage.
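
Spelled out as two steps, with a plain variable standing in for the 8-bit port register (the names here are hypothetical):

```c
#include <stdint.h>

/* What "LATF = addr >> 16;" means in C terms. */
uint8_t store_to_8bit(uint32_t addr)
{
    uint32_t full   = addr >> 16;      /* step 1: the shift, done at 32-bit width */
    uint8_t  latf_v = (uint8_t)full;   /* step 2: conversion keeps the low 8 bits */
    return latf_v;
}
```

An optimiser is free to notice that only bits 16..23 of addr survive step 2 and fetch just that byte, as long as the result is the same.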

> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?
> 

I believe that uint32_t /must/ be an unsigned 32-bit integer.  If the 
compiler cannot work with such a type, then no such type should exist in 
<stdint.h>.  A standards-compliant compiler is not allowed to cheat in 
that way.  Of course, I don't know if Microchip's compiler claims to be 
standards compliant...
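
Where a C11 host compiler is available, those two properties can be checked at compile time. This is a sketch under that assumption; the thread does not establish whether C18 ships a <stdint.h> at all, and top_half is a made-up name.

```c
#include <limits.h>
#include <stdint.h>

/* If <stdint.h> provides uint32_t, both of these must hold. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be exactly 32 bits");
_Static_assert((uint32_t)-1 > 0, "uint32_t must be unsigned");

/* Right-shifting an unsigned value always shifts in zeros. */
uint32_t top_half(uint32_t x)
{
    return x >> 16;
}
```

With a signed 32-bit type, right-shifting a negative value is implementation-defined, which is the sign-preserving case the question worries about.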

> How about:
> 
> uint8_t pointer;
> 
> pointer = (uint8_t *) &addr;
> LATF = pointer[2];
> 
> Clumsy, admittedly, but perhaps more obvious what's going on?
> (I would have added that this would be easy for an optimizer
> to reduce to an "addressing operation" but I also would have
> expected your shift to be recognized as an easy optimization!)

Reply by Joe Chisolm ● June 7, 2010

On Mon, 07 Jun 2010 12:59:49 -0700, D Yuniskis wrote:

> Hi Joe,
> 
> Joe Chisolm wrote:
>> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>> 
>>> You're asking a lot.
>>>
>>> I've been programming since 1977 and I have never seen any compiler
>>> turn a long word shift (and/or mask) into a corresponding short word
>>> or byte access.  Every compiler I have ever worked with would perform
>>> the shift.
>>>
>>> That said, something is wrong if it takes 4 registers.  I don't know
>>> the PIC18, but I never encountered any chip that required more than 2
>>> registers to shift a value.  Many chips have only a 1-bit shifter and
>>> require a loop to do larger shifts - but many such chips microcode the
>>> shift loop so the programmer sees only a simple instruction.  But,
>>> occasionally, you do run into oddballs that need large shifts spelled
>>> out.
>>>
>>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>>> temporaries that are really mapped into the same register.  If the
>>> compiler (or chip) really does need 4 registers to do a shift, then
>>> it's a piece of sh*t.
> 
> It would be informative to know what sort of "helper routines" the
> compiler calls on.  E.g., it might (inelegantly) treat this as "CALL
> SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the
> canned representation of *any* "long int".
> 

I agree with your statement. The C18 suite has some canned libraries like
32 bit division and such.  There are other helper routines for doing
delays and such.

>> You have a 8 bit architecture shifting a 32 bit value, shifting out of
>> one byte and into the next, thus 4 temps.  You have 1 bit shifts.  I
>> suspect the compiler is generating a right shift into carry so the code
>> can tell if a 1 needs to be moved into the most significant bit of the
>> next byte.
> 
> I think George is commenting that a *smart* compiler can realize that an
> (e.g.) 8 bit shift is:
> foo[0] = foo[1]
> foo[1] = foo[2]
> foo[2] = foo[3]
> (if you are casting to a narrower data type and can discard foo[3])
> 
> and a *9* bit shift is the same as the above with a *single* bit shift
> introduced (i.e., you operate on a byte at a time instead of the entire
> "long")
> 
> (recall, the shift amount is a constant available at compile time)

I just did a test using C18.  I chose an 18F86J10 (for no particular
reason other than I remember it has a port F and thus a LATF).

For:
static unsigned long addr;
LATF = addr >> 16;

I get results similar to what you have above.  The compiler "shifts"
addr into a 32 bit temp by doing two byte moves and two clear byte
instructions.  It then does a 1 byte move into LATF from the temp.
I'm not sure what version the OP is using or what else might be going
on behind the scenes with addr.  I agree a compiler should be smarter
but for the price (free) C18 is not bad for smaller projects.

BTW: I did a quick test with gcc 4.4.1 and it does a load, shift 16 and
a store byte.


-- 
Joe Chisolm
Marble Falls, Tx.
