
Warning in IAR when performing bitwise not on unsigned char (corrected)

Started by distantship101 April 3, 2012
So?

"And now for something completely different."

From Monty Python.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

> -----Original Message-----
> From: m... [mailto:m...] On Behalf Of
> Frances Fischer
> Sent: 13 April 2012 13:00
> To: m...
> Subject: RE: [msp430] Re: Warning in IAR when performing bitwise not on
> unsigned char (corrected)
>
>
>
> From TI:
>
>
>
> The MSP430 documentation does not provide a list of all the op-codes
> because there are so many available addressing modes. However, a
> description is available for the individual bits that make up the various
> opcodes, depending on instruction and addressing mode.
>
> The MSP430xxxx Family User's Guide (e.g. the MSP430x1xx Family User's
> Guide or the MSP430x2xx Family User's Guide) shows all the information
> available for the instruction set in the 'RISC 16-Bit CPU' chapter. The
> 'Addressing Modes' section explains the 'As' and 'Ad' bits. In the
> 'Instruction Set' section you can see how the HEX representation of an
> instruction is built from these bits:
>
> 1. opcode
> 2. S-Reg (0b0000 = R0, 0b0001 = R1 ... 0b1111 = R15)
> 3. D-Reg (0b0000 = R0, 0b0001 = R1 ... 0b1111 = R15)
> 4. Ad
> 5. As
> 6. Byte or Word operation (B/W)
>
> The section 'Instruction Set Description' contains the Core Instruction
> Map.
>
>
> The section 'Instruction Cycles and Lengths' outlines the number of clock
> cycles used for the instructions. The number of CPU clock cycles required
> for an instruction depends on the instruction format and the addressing
> modes used - not the instruction itself. The number of clock cycles refers
> to the MCLK.
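
As a sketch of how the fields listed above pack into a Format I
(double-operand) word - my illustration rather than TI's text, following
the layout described in the User's Guide; the helper name is mine:

#include <stdint.h>

/* Format I layout: [15:12] opcode, [11:8] S-Reg, [7] Ad, [6] B/W,
   [5:4] As, [3:0] D-Reg. */
static uint16_t msp430_format1(unsigned opcode, unsigned sreg,
                               unsigned ad, unsigned bw,
                               unsigned as, unsigned dreg)
{
    return (uint16_t)(((opcode & 0xFu) << 12) |
                      ((sreg   & 0xFu) <<  8) |
                      ((ad     & 0x1u) <<  7) |
                      ((bw     & 0x1u) <<  6) |
                      ((as     & 0x3u) <<  4) |
                       (dreg   & 0xFu));
}

/* Example: MOV R10, R11 (register-register, word operation) is
   opcode 0x4, As=00, Ad=0, B/W=0, giving
   msp430_format1(0x4, 10, 0, 0, 0, 11) == 0x4A0B. */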

Hi,

> > char is the smallest addressable unit; the plain char type, which is
> > either signed or unsigned, is different from both signed char and
> > unsigned char.
> > The plain char type is selected, by the implementation, to be either
> > signed or unsigned to generate the best code (usually fastest/smallest).
> > Thus there is most certainly a signed char type and the valid
> > character '\377' could well be -1. This matters when you are collating
> > strings.
>
> My point is that the /name/ "char" is inappropriate for numbers - it is
> illogical, and code written to store numbers or do arithmetic on a "char"
> does not make sense.

typedef char a_small_integer_t; // solve ills

> It (presumably) will work - but writing code that
> works is only half the battle of writing good quality software. K&R could
> just as well have called the type a "banana" - it would do the same job,
> and it would be equally silly to talk about "signed banana" as "signed
> char".

The world has come to know what a signed char is.

> Contrast it with Pascal, which defines a "Character" to hold characters,
> and "ShortInt" to hold 8-bit integers (Pascal is equally bad at specifying
> exact sizes, though you can improve it with subrange types).

True Pascal has never had the short integer type: it only has the integer
type and subrange types for integers, and REAL for floating point. Borland
extended Pascal with "useful stuff" such that it should not have been called
Pascal anymore and SHORTINT was something they invented. Ken Bowles and the
UCSD crowd did similar stuff with long integers and the long integer unit.
Don't get me wrong, I like Pascal, I love Modula-2, and I adore Lisp, but
language design is an art, not a science.

> Both "Character" and "ShortInt" boil down to an 8-bit unit on most
> systems, or a bigger unit on systems that can't address bytes. But the
> names are distinct to indicate different usage, and help people write
> clear, logical and correct code.

A typedef would sort out naming.

> > The appropriate choice of type for 8-bit arithmetic values is
> > "uint8_t" or "int8_t" - these names say what you mean, so that the
> > code you write is clearer and more logical.
> >
> > No. These are defined for STORAGE only: uintN_t is stored in exactly
> > N bits in two's complement form without any padding. That type name
> > may not
>
> Unsigned data is not "two's complement", as it is unsigned. And to my
> knowledge, C standards don't actually require two's complement for signed
> data.

Slip of the finger with the u; however, if you have an int8_t, and you are
on a two's complement machine, these types require two's complement storage.
Section 7.20.1.1 of the 2011 standard. Other plain types can be stored
signed-magnitude.
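
As a minimal C11 sketch of what 7.20.1.1 buys you (assuming the target
provides int8_t at all):

#include <limits.h>
#include <stdint.h>

/* Exact-width types have exact limits: int8_t, when present, must be
   an 8-bit two's complement type with no padding. */
_Static_assert(INT8_MIN == -128 && INT8_MAX == 127,
               "int8_t is exactly [-128, 127]");

/* Plain signed char need only cover [-127, 127]; SCHAR_MIN may
   legally be -127 on a non-two's-complement machine. */
_Static_assert(SCHAR_MIN <= -127 && SCHAR_MAX >= 127,
               "signed char guarantees only [-127, 127]");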

[ snip ]

> Yes, all arithmetic on uint8_t and int8_t is done by first promoting the
> types to "int" (or, if necessary, unsigned int).
>
> That doesn't change the fact that it makes sense to write "uint8_t x =
> 100;" but it is meaningless to write "unsigned char x = 100;" - even
> though the generated code is identical.

char can store a character of the ASCII character set, as can an int. There
is no "character" data type in C; everything is an integer type, signed or
unsigned, of one width or another.
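
A quick demonstration of that point: a character constant in C is an int,
not a char (the "4" below assumes a 32-bit int; only the difference
matters):

#include <stdio.h>

int main(void)
{
    /* In C, unlike C++, 'a' has type int. */
    printf("%zu %zu\n", sizeof('a'), sizeof(char)); /* e.g. "4 1" */
    return 0;
}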

> > All variants of char must meet minimum requirements according to the
> > standard and can be relied upon to be present. Therefore, char is more
> > portable than uint8_t and, by its definition, both signed and unsigned
> > char can be relied upon to store something between [-127, 127] (yes,
> > that is
> > correct) or [0, 255].
>
> As explained above, char is not portable - one of the reasons being
> precisely that it is only defined with minimum ranges. When you need
> highly portable, appropriately-named types with these ranges, you use
> "uint_least8_t" and "int_least8_t".

No sane software engineer is going to use these types; I mean, how many
times have you come across these in code? How many times have you seen the
use of the PRI macros?

And developers don't need to: anything of any use today runs on a two's
complement byte-addressable machine with IEC 60559 floating point--anything
else can be considered out of the ordinary and relegated. C 2011 has enough
bits and pieces to deal with such "non-standard" architectures.
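
For the record, this is what that machinery looks like in use (a sketch;
the PRI macros come from <inttypes.h>):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint_least8_t u = 200; /* at least 8 bits, possibly wider */
    int_least8_t  s = -5;

    /* Each PRI macro expands to the conversion specifier matching
       whatever width the implementation actually chose. */
    printf("%" PRIuLEAST8 " %" PRIdLEAST8 "\n", u, s);
    return 0;
}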

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

On 13/04/2012 14:36, Paul Curtis wrote:
> Hi,
>
>>> char is the smallest addressable unit; the plain char type, which is
>>> either signed or unsigned, is different from both signed char and
>>> unsigned char.
>>> The plain char type is selected, by the implementation, to be either
>>> signed or unsigned to generate the best code (usually fastest/smallest).
>>> Thus there is most certainly a signed char type and the valid
>>> character '\377' could well be -1. This matters when you are collating
>>> strings.
>>
>> My point is that the /name/ "char" is inappropriate for numbers - it is
>> illogical, and code written to store numbers or do arithmetic on a "char"
>> does not make sense.
>
> typedef char a_small_integer_t; // solve ills

I /really/ hope no one writes code like that! You are relying on "char"
being signed on a particular platform (alternatively, you are relying on
it being unsigned - and "a_small_integer_t" being unsigned).
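
A tiny demonstration of the trap (which result you get depends on the ABI):

#include <stdio.h>

int main(void)
{
    char c = '\377';
    /* Prints -1 where plain char is signed (typical x86 ABIs) and
       255 where it is unsigned (ARM ABIs, for example). */
    printf("%d\n", (int)c);
    return 0;
}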

If you think you ever need to know whether a platform uses "signed" or
"unsigned" for plain "char", then you have made a big mistake.

So having established that you need to specify "signed" or "unsigned",
and I presume you are trying to make this work even on awkward
architectures, we now have:

typedef int_least8_t a_small_integer_t; // Welcome to the 20th century!
>
>> It (presumably) will work - but writing code that
>> works is only half the battle of writing good quality software. K&R could
>> just as well have called the type a "banana" - it would do the same job,
>> and it would be equally silly to talk about "signed banana" as "signed
>> char".
>
> The world has come to know what a signed char is.

No it hasn't. Most C programmers think "signed char" is always 8-bit,
two's complement, with a range [-128, 127]. "int8_t" may still leave
the lower limit unspecified as -128 or -127, but it is still better.

In the great majority of cases, of course, "signed char" /is/ [-128,
127]. But just because lots of people are used to writing code in a
poorer way, is no excuse for continuing to do so. (The exception, of
course, is when modifying existing code - then it is often more
important to be consistent with the existing style than to use a better
one.)
There are many MISRA rules that I disagree with, but they got this one
right - you should not use things like "char", "int", "short" or "long",
but use better-specified typedefs. And if you /do/ use "char", you
must use "signed char" or "unsigned char".
Remember, "explicit is better than implicit" - write what you /mean/
when you write code.

>
>> Contrast it with Pascal, which defines a "Character" to hold characters,
>> and "ShortInt" to hold 8-bit integers (Pascal is equally bad at specifying
>> exact sizes, though you can improve it with subrange types).
>
> True Pascal has never had the short integer type: it only has the integer
> type and subrange types for integers, and REAL for floating point. Borland
> extended Pascal with "useful stuff" such that it should not have been called
> Pascal anymore and SHORTINT was something they invented. Ken Bowles and the
> UCSD crowd did similar stuff with long integers and the long integer unit.
> Don't get me wrong, I like Pascal, I love Modula-2, and I adore Lisp, but
> language design is an art, not a science.
>

You are right about the history of Pascal, of course - but it's
difficult to distinguish "pure Pascal" and "Borland Pascal", especially
as "Borland Pascal" is the de facto standard.

I agree entirely about language design being an art - and it depends
heavily on the expected use of the language. I am a Python fan, and I
don't complain that numbers there are far less specified. But at least
number types are called "int" and "float", not "character".

>> Both "Character" and "ShortInt" boil down to an 8-bit unit on most
>> systems, or a bigger unit on systems that can't address bytes. But the
>> names are distinct to indicate different usage, and help people write
>> clear, logical and correct code.
>
> A typedef would sort out naming.

Exactly correct - so in C programming, use an appropriate typedef name
to mean what you say, instead of illogical "native" type names. You can
find a good selection of appropriate typedef names in <stdint.h>.
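
For instance (a sampler, not an exhaustive list; the exact-width names
are optional on unusual targets):

#include <stdint.h>

uint8_t       exact;  /* exactly 8 bits, no padding - optional     */
int_least16_t least;  /* smallest type with at least 16 bits       */
int_fast32_t  fast;   /* the "fastest" type with at least 32 bits  */
intmax_t      widest; /* the widest signed integer type available  */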

>
>>> The appropriate choice of type for 8-bit arithmetic values is
>>> "uint8_t" or "int8_t" - these names say what you mean, so that the
>>> code you write is clearer and more logical.
>>>
>>> No. These are defined for STORAGE only: uintN_t is stored in exactly
>>> N bits in two's complement form without any padding. That type name
>>> may not
>>
>> Unsigned data is not "two's complement", as it is unsigned. And to my
>> knowledge, C standards don't actually require two's complement for signed
>> data.
>
> Slip of the finger with the u; however, if you have an int8_t, and you are
> on a two's complement machine, these types require two's complement storage.
> Section 7.20.1.1 of the 2011 standard. Other plain types can be stored
> signed-magnitude.

Does this mean that on a two's complement machine, something like an
"int16_t" would be required to be stored as two's complement, while a
"short int" (which is likely to be 16-bit as well) could theoretically
be stored signed-magnitude or some other representation? I can't
imagine that would ever occur, but it would be interesting to know if
the letter of the law allowed it.

>
> [ snip ]
>
>> Yes, all arithmetic on uint8_t and int8_t is done by first promoting the
>> types to "int" (or, if necessary, unsigned int).
>>
>> That doesn't change the fact that it makes sense to write "uint8_t x =
>> 100;" but it is meaningless to write "unsigned char x = 100;" - even
>> though the generated code is identical.
>
> char can store a character of the ASCII character set, as can an int. There
> is no "character" data type in C; everything is an integer type, signed or
> unsigned, of one width or another.
>
>>> All variants of char must meet minimum requirements according to the
>>> standard and can be relied upon to be present. Therefore, char is more
>>> portable than uint8_t and, by its definition, both signed and unsigned
>>> char can be relied upon to store something between [-127, 127] (yes,
>>> that is
>>> correct) or [0, 255].
>>
>> As explained above, char is not portable - one of the reasons being
>> precisely that it is only defined with minimum ranges. When you need
>> highly portable, appropriately-named types with these ranges, you use
>> "uint_least8_t" and "int_least8_t".
>
> No sane software engineer is going to use these types; I mean, how many
> times have you come across these in code? How many times have you seen the
> use of the PRI macros?
>

I haven't used those types myself - but then, I've managed to avoid
using targets that can't address bytes, except for one project long ago.
Whenever anyone suggests "wouldn't a TMS320 DSP be a good choice
here", I make sure they know /exactly/ where they can put the monstrosity.

Most code doesn't have to be particularly portable, and there is no
point in sacrificing readability in the name of unnecessary portability.

On the other hand, I /have/ used int_fast8_t and uint_fast8_t on a few
occasions for code that is portable between an AVR and an msp430, and
for which the need for fast code made it worth the inconvenience.
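
The pattern was roughly this (a sketch of the idea, not code from that
project; the function name is mine):

#include <stdint.h>

/* uint_fast8_t is free to be 8 bits on the AVR, where 8-bit registers
   are cheapest, and to widen on the MSP430 if that avoids extension
   instructions. */
void scale(uint8_t *buf, uint_fast8_t len, uint8_t k)
{
    for (uint_fast8_t i = 0; i != len; ++i)
        buf[i] = (uint8_t)(buf[i] * k);
}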

> And developers don't need to: anything of any use today runs on a two's
> complement byte-addressable machine with IEC 60559 floating point--anything
> else can be considered out of the ordinary and relegated. C 2011 has enough
> bits and pieces to deal with such "non-standard" architectures.
>

Agreed.

Another reason to love

***.B or ***.W

Just stirring as usual.

Al

On 13/04/2012 10:33 PM, David Brown wrote:
> [ snip ]

Glad to see you back Al!

Blakely

--- In m..., Onestone wrote:
> [ snip ]

> > typedef char a_small_integer_t; // solve ills
>
> I /really/ hope no one writes code like that! You are relying on "char"
> being signed on a particular platform (alternatively, you are relying on
> it being unsigned - and "a_small_integer_t" being unsigned).

I purposely stated "a small integer", I did not say "a small signed integer"
or "a small unsigned integer"; it's just "a small integer" that generates
the "best code" for a particular architecture.

> If you think you ever need to know whether a platform uses "signed" or
> "unsigned" for plain "char", then you have made a big mistake.

I agree up to a point. I would say that you need to take care and observe
caution when plain char can be signed or unsigned. This bites people when
they don't expect it:

int lut[256] = { ... };

void foo(char *p)
{
    int x = lut[*p++];
}

If plain char is signed, then this simple-looking statement is prone to fail
when using a common locale, such as hu_HU, when the input comes from a
string with locale-specific extended characters. And when collating
strings, plain char as signed might get you unexpected results unless you
use the C library collation functions.
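
The usual defence is to convert through unsigned char before indexing
(a sketch; foo_safe is my name, nothing standard):

int lut[256];

void foo_safe(const char *p)
{
    /* Going via unsigned char maps extended characters to 128..255
       instead of producing a negative subscript when plain char is
       signed. */
    int x = lut[(unsigned char)*p++];
    (void)x;
}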

> I presume you are trying to make this work even on awkward architectures,
> we now have:
>
> typedef int_least8_t a_small_integer_t; // Welcome to the 20th century!

Wow. I'm in the 21st century--way ahead of you, there. :-)

> There are many MISRA rules that I disagree with, but they got this one
> right - you should not use things like "char", "int", "short" or "long",
> but use better-specified typedefs. And if you /do/ use "char", you must
> use "signed char" or "unsigned char".

There are worse things in C. The worst is the sheer number of precedence
levels for operators. Modula-2 and Pascal have this right: you need only
remember the precedence of multiplication, addition, and the relational
operators. That serves everybody because it maps easily to mathematical
terms and products.
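
A classic illustration of how C's table bites (a small sketch):

#include <stdio.h>

int main(void)
{
    unsigned flags = 0x10;

    /* == binds tighter than &, so the first test is
       flags & (0x0F == 0), i.e. flags & 0. */
    int wrong = flags & 0x0F == 0;
    int right = (flags & 0x0F) == 0;

    printf("%d %d\n", wrong, right); /* prints "0 1" */
    return 0;
}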

> You are right about the history of Pascal, of course - but it's difficult
> to distinguish "pure Pascal" and "Borland Pascal", especially as "Borland
> Pascal" is the de facto standard.

ISO Extended Pascal is the one true standard; and Prospero did a good job of
implementing it. Pascal is a backwater language now.

> > Slip of the finger with the u; however, if you have an int8_t, and you
> are
> > on a two's complement machine, these types require two's complement
> storage.
> > Section 7.20.1.1 of the 2011 standard. Other plain types can be stored
> > signed-magnitude.
>
> Does this mean that on a two's complement machine, something like an
> "int16_t" would be required to be stored as two's complement, while a
> "short int" (which is likely to be 16-bit as well) could theoretically
> be stored signed-magnitude or some other representation?

Good question.

> I can't imagine that would ever occur, but it would be interesting to know
> if the letter of the law allowed it.

I would have to work this through; I can't imagine many signed-magnitude
machines in use today other than for decimal arithmetic. Even that is now
covered by the latest 60559 standard; this is some more delight that I need
to take in and understand.

> > No sane software engineer is going to use these types; I mean, how many
> > times have you come across these in code? How many times have you seen
> > the use of the PRI macros?
>
> I haven't used those types myself - but then, I've managed to avoid
> using targets that can't address bytes, except for one project long ago.
> Whenever anyone suggests "wouldn't a TMS320 DSP be a good choice
> here", I make sure they know /exactly/ where they can put the monstrosity.

Having programmed low-level DSP in C on a TMS320C40 for a radar application,
I must say I don't really like that architecture.

> Most code doesn't have to be particularly portable, and there is no
> point in sacrificing readability in the name of unnecessary portability.

Indeed.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

On 13/04/2012 15:53, Paul Curtis wrote:
>>> typedef char a_small_integer_t; // solve ills
>>
>> I /really/ hope no one writes code like that! You are relying on "char"
>> being signed on a particular platform (alternatively, you are relying on
>> it being unsigned - and "a_small_integer_t" being unsigned).
>
> I purposely stated "a small integer", I did not say "a small signed integer"
> or "a small unsigned integer"; it's just "a small integer" that generates
> the "best code" for a particular architecture.
>
>> If you think you ever need to know whether a platform uses "signed" or
>> "unsigned" for plain "char", then you have made a big mistake.
> I agree up to a point. I would say that you need to take care and observe
> caution when plain char can be signed or unsigned. This bites people when
> they don't expect it:
>
> int lut[256] = { ... };
>
> void foo(char *p)
> {
>     int x = lut[*p++];
> }
>
> If plain char is signed, then this simple-looking statement is prone to fail
> when using a common locale, such as hu_HU, when the input comes from a
> string with locale-specific extended characters. And when collating
> strings, plain char as signed might get you unexpected results unless you
> use the C library collation functions.
>

There's a good reason why gcc has "-Wchar-subscripts" as a warning, and
I suspect many other compilers will warn against such issues.
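
The warning fires on exactly the pattern above; with gcc it is included
in -Wall (peek is a made-up name for illustration):

int lut[256];

int peek(char c)
{
    return lut[c]; /* gcc: "warning: array subscript has type 'char'" */
}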

>> I presume you are trying to make this work even on awkward architectures,
>> we now have:
>>
>> typedef int_least8_t a_small_integer_t; // Welcome to the 20th century!
>
> Wow. I'm in the 21st century--way ahead of you, there. :-)

That's 1-1 on the silly typos score :-)

>
>> There are many MISRA rules that I disagree with, but they got this one
>> right - you should not use things like "char", "int", "short" or "long",
>> but use better-specified typedefs. And if you /do/ use "char", you must
>> use "signed char" or "unsigned char".
>
> There are worse things in C. The worst is the sheer number of precedence
> levels for operators. Modula-2 and Pascal have this right: you need only
> remember the precedence of multiplication, addition, and the relational
> operators. That serves everybody because it maps easily to mathematical
> terms and products.
>

Didn't I warn you not to get started here? The only sane way to write
expressions in C is to rely on precedence for basic mathematical and
relational operators, as you suggest, and use brackets for the rest.

>> You are right about the history of Pascal, of course - but it's difficult
>> to distinguish "pure Pascal" and "Borland Pascal", especially as "Borland
>> Pascal" is the de facto standard.
>
> ISO Extended Pascal is the one true standard; and Prospero did a good job of
> implementing it. Pascal is a backwater language now.
>

Yes, but the trouble with "ISO Pascal" is that its usage is tiny
compared to "Borland Pascal" - I didn't actually know that /anyone/
implemented it. There are also a few others, such as embedded Pascal
compilers and Free Pascal, that typically support some Borland additions
and some of their own.

>>> Slip of the finger with the u; however, if you have an int8_t, and you
>> are
>>> on a two's complement machine, these types require two's complement
>> storage.
>>> Section 7.20.1.1 of the 2011 standard. Other plain types can be stored
>>> signed-magnitude.
>>
>> Does this mean that on a two's complement machine, something like an
>> "int16_t" would be required to be stored as two's complement, while a
>> "short int" (which is likely to be 16-bit as well) could theoretically
>> be stored signed-magnitude or some other representation?
>
> Good question.
>
>> I can't imagine that would ever occur, but it would be interesting to know
>> if the letter of the law allowed it.
>
> I would have to work this through; I can't imagine many signed-magnitude
> machines in use today other than for decimal arithmetic. Even that is now
> covered by the latest 60559 standard; this is some more delight that I need
> to take in and understand.
>
>>> No sane software engineer is going to use these types; I mean, how many
>>> times have you come across these in code? How many times have you seen
>>> the use of the PRI macros?
>>
>> I haven't used those types myself - but then, I've managed to avoid
>> using targets that can't address bytes, except for one project long ago.
>> Whenever anyone suggests "wouldn't a TMS320 DSP be a good choice
>> here", I make sure they know /exactly/ where they can put the monstrosity.
>
> Having programmed low-level DSP in C on a TMS320C40 for a radar application,
> I must say I don't really like that architecture.
>

I'd put it a little more strongly - but then, I'm not as diplomatic as you!

>> Most code doesn't have to be particularly portable, and there is no
>> point in sacrificing readability in the name of unnecessary portability.
>
> Indeed.
>

Best regards,

David

On Fri, 13 Apr 2012 10:10:40 +0100, Paul wrote:

>This is even more prevalent when using signed char types which are not
>natively supported on the MSP430:
>
>/* 8-bit-blinkered AVR programmers always assume that "smaller is more
>efficient". */
>
>signed char n;
>void bar(int x);
>
>void foo(void)
>{
> signed char x;
> for (x = 1; x < n+2; ++x)
> bar(x);
>}
>
> MOV.B #1, R11
> JMP @48
>@49
> MOV.B R11, R15
> SXT R15
> CALL #_bar
> ADD.B #1, R11
>@48
> MOV.B &_n, R15
> SXT R15
> ADD.W #2, R15
> MOV.B R11, R14
> SXT R14
> CMP.W R15, R14
> JL @49
>
>Here we see the converse: we need to sign extend R15 after x is moved to it;
>this is because the .B operations zero the high order part of the register,
>they do not sign extend it. And the simple-looking comparison of two signed
>chars with a +1 in the mix? Yes, that looks great, doesn't it?
>



Paul, just a side-bar question. If the loop body didn't
include a function call which arguably may modify n, but
instead was a block of C statements that clearly didn't
modify n, would the 'n+2' computation then be lifted outside
the loop? (n isn't volatile and n+2 should be unvarying
during loop execution.) It seems to me the answer is yes,
it would lift the unvarying subexpression outside the loop.
But I'm curious, anyway, and just thought I'd ask.

Also, and this is a weird recollection about the C standards
(can't recall if C89 or C99 or both -- and did you bring up
the 2011 standard???) but I seem to recall that for the
comparison under discussion that takes place, x < n+2, the
compiler is permitted to avoid explicit "integer promotions"
in the generated code if the compiler can determine that the
actual emitted code acts "as if" the promotions had occurred.
(While the situation you show above may not permit this to be
fully determinable without knowledge of the value of n on
entry, I could easily pony up one that should be calculable
by a compiler as working "as if" without the promotions
having to occur. So the question would remain.) While I'm
probably wrong about that recollection, I'm also curious
about your thoughts there.

Jon
On 24 Apr 2012, at 22:17, Jon Kirwan wrote:

> On Fri, 13 Apr 2012 10:10:40 +0100, Paul wrote:
>
>> [ snip ]
>
> Paul, just a side-bar question. If the loop body didn't
> include a function call which arguably may modify n, but
> instead was a block of C statements that clearly didn't
> modify n, would the 'n+2' computation then be lifted outside
> the loop?

It could be. Whether it is or not is a different question that I cannot answer, in general.
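
At the source level, the hoisting would amount to this (a hand-written
sketch, not actual compiler output):

signed char n;

void foo(void)
{
    int limit = n + 2; /* loop-invariant, computed once */
    signed char x;
    for (x = 1; x < limit; ++x) {
        /* body that provably does not modify n */
    }
}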

> (n isn't volatile and n+2 should be unvarying
> during loop execution.) It seems to me the answer is yes,
> it would lift the unvarying subexpression outside the loop.
> But I'm curious, anyway, and just thought I'd ask.
>
> Also, and this is a weird recollection about the C standards
> (can't recall if C89 or C99 or both -- and did you bring up
> the 2011 standard???) but I seem to recall that for the
> comparison under discussion that takes place, x < n+2, the
> compiler is permitted to avoid explicit "integer promotions"
> in the generated code if the compiler can determine that the
> actual emitted code acts "as if" the promotions had occurred.

That is correct. The execution must match the abstract semantics; the compiler can do all it wants to rearrange the program as long as the side effects of the program are maintained as per the abstract machine state.

In compilers it is common to make integer conversions explicit in the intermediate representation. The optimiser can then narrow conversions or even discard them during some phases of optimisation--generally there is more than one place in the compiler where such things can be detected and discarded, and it's usually a pragmatic decision on where it's easiest to detect and modify "an" IR.
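
A small case where the as-if rule lets the promotions vanish entirely
(a sketch; add8 is a made-up name):

#include <stdint.h>

/* The abstract semantics promote a and b to int, add them, then
   convert the sum back to uint8_t. Because the narrowing conversion
   discards the high bits anyway, a compiler may emit a plain 8-bit
   add with no widening at all. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}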

> (While the situation you show above may not permit this to be
> fully determinable without knowledge of the value of n on
> entry, I could easily pony up one that should be calculable
> by a compiler as working "as if" without the promotions
> having to occur. So the question would remain.) While I'm
> probably wrong about that recollection, I'm also curious
> about your thoughts there.

All ISO standards are specified using an abstract machine, so it's equally applicable to C89/90, C99, and C11.

-- Paul.

On Tue, 24 Apr 2012 23:13:30 +0100, Paul wrote:

>On 24 Apr 2012, at 22:17, Jon Kirwan wrote:
>
>> On Fri, 13 Apr 2012 10:10:40 +0100, Paul wrote:
>>
>>> [ snip ]
>>
>> Paul, just a side-bar question. If the loop body didn't
>> include a function call which arguably may modify n, but
>> instead was a block of C statements that clearly didn't
>> modify n, would the 'n+2' computation then be lifted outside
>> the loop?
>
> It could be. Whether it is or not is a different question
> that I cannot answer, in general.

I was asking this question 'in particular.' I meant in the
case of the compiler you used for the disassembly I see
above. Not in general. That answer I think I already knew.

>> (n isn't volatile and n+2 should be unvarying
>> during loop execution.) It seems to me the answer is yes,
>> it would lift the unvarying subexpression outside the loop.
>> But I'm curious, anyway, and just thought I'd ask.
>>
>> Also, and this is a weird recollection about the C standards
>> (can't recall if C89 or C99 or both -- and did you bring up
>> the 2011 standard???) but I seem to recall that for the
>> comparison under discussion that takes place, x < n+2, the
>> compiler is permitted to avoid explicit "integer promotions"
>> in the generated code if the compiler can determine that the
>> actual emitted code acts "as if" the promotions had occurred.
>
> That is correct. The execution must match the abstract
> semantics; the compiler can do all it wants to rearrange the
> program as long as the side effects of the program are
> maintained as per the abstract machine state.

Right.

> In compilers it is common to make integer conversions
> explicit in the intermediate representation.

Ah hah! That's a concrete tidbit to work from. So if I
understand this correctly, the integer promotion is made
during parsing and syntax analysis and before anything else.
So the original programmer "hint" is, in effect, lost to the
compiler and isn't recoverable should there be a reason to do
so.

> The optimiser
> can then narrow conversion or even discard conversions
> during some phases of optimisation--generally there is more
> than one place in the compiler where such things can be
> detected and discarded, and it's usually a pragmatic
> decision on where it's easiest to detect and modify "an" IR.

Can you think of cases where the "narrowing conversion" logic
would _FAIL_ to make a narrowing conversion, lacking the
original programmer's explicit syntax declaring the type, but
where it may have been possible to do more _with_ that
information present at the time the narrowing conversion
logic operates? Or is there a 1:1 situation where ALL such
cases where narrowing is possible under the rules, that the
loss of the explicit syntax has absolutely no possible impact
on the options available during narrowing?

Now I've set myself a puzzle to consider. Maybe you already
have an answer. But I'll think about ponying up something
interesting to ask the question more precisely, too.

>> (While the situation you show above may not permit this to be
>> fully determinable without knowledge of the value of n on
>> entry, I could easily pony up one that should be calculable
>> by a compiler as working "as if" without the promotions
>> having to occur. So the question would remain.) While I'm
>> probably wrong about that recollection, I'm also curious
>> about your thoughts there.
>
> All ISO standards are specified using an abstract machine,
> so it's equally applicable to C89/90, C99, and C11.

I'm always suspicious of my own failures and leave room for
that.

Thanks,
Jon