Warning in IAR when performing bitwise not on unsigned char (corrected)

Started by distantship101 April 3, 2012
Dan,

> First off, there is no such concept as an "unsigned char" (or a "signed
> char").

But there is such a concept because (a) it is enshrined in the ISO standard
for C and (b) is the smallest addressable unit for the standard such that
sizeof() computes its result using the definition sizeof(char) =
sizeof(unsigned char) = sizeof(signed char) = 1.
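
A minimal sketch of point (b), in case it helps: sizeof counts in units of char, so the three character types always have size 1, and the width of any object in bits follows from CHAR_BIT. The names check_char_sizes and object_bits are made up for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width in bits of an object that sizeof reports
   as `size_in_chars`. CHAR_BIT is at least 8 but may be larger. */
static size_t object_bits(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}

/* These hold on every conforming implementation, including targets
   whose smallest addressable unit is wider than 8 bits. */
static void check_char_sizes(void)
{
    assert(sizeof(char) == 1);
    assert(sizeof(signed char) == 1);
    assert(sizeof(unsigned char) == 1);
    assert(CHAR_BIT >= 8);
}
```

Calling object_bits(sizeof(int)) gives 16 on a target with 8-bit char and 2-byte int.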

> Can show me the difference between a positive letter X and a
> negative letter X? "char" is for characters, letters, and parts of
> strings.

char is the smallest addressable unit; the plain char type, which is either
signed or unsigned, is different from both signed char and unsigned char.
The plain char type is selected, by the implementation, to be either signed or
unsigned to generate the best code (usually fastest/smallest). Thus there
is most certainly a signed char type and the valid character '\377' could
well be -1. This matters when you are collating strings.
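
To make the collation point concrete, here is a small sketch, assuming an 8-bit char. The function names are hypothetical. Ordering two characters by their plain char value gives a result that depends on the implementation's choice of signedness, whereas ordering by unsigned char value (which is how strcmp is specified to compare) does not.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when plain char behaves as a signed type here. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Orders by plain char value: the answer for codes above 127
   depends on whether plain char is signed. */
static int compare_as_plain_char(char a, char b)
{
    return (a > b) - (a < b);
}

/* Orders by unsigned char value: the same answer everywhere. */
static int compare_as_unsigned_char(char a, char b)
{
    unsigned char ua = (unsigned char)a;
    unsigned char ub = (unsigned char)b;
    return (ua > ub) - (ua < ub);
}
```

With a signed plain char, '\377' sorts before 'A' in the first comparison and after it in the second.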

> The appropriate choice of type for 8-bit arithmetic values is
> "uint8_t" or "int8_t" - these names say what you mean, so that the code
> you write is clearer and more logical.

No. These are defined for STORAGE only: uintN_t is stored in exactly N
bits in two's complement form without any padding. That type name may not
even exist in some implementations and, therefore, such types are not
universally portable--char is more portable in this regard. Returning to
the storage requirement, all arithmetic is carried out using the basic types
int, unsigned, and their long and long long variants. No arithmetic is ever
carried out in uint8_t or int8_t because those types are storage types only.
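
The promotion rule is easy to see with the bitwise not from the subject line. This is only a sketch with made-up function names, assuming an 8-bit unsigned char and a two's complement int: the operand is promoted to int before ~ is applied, so the result is a negative int unless it is converted back to the narrow type.

```c
#include <assert.h>

/* ~v is evaluated after v has been promoted to int, so every bit
   above the low eight is set and the result is negative. */
static int not_promoted(unsigned char v)
{
    return ~v;
}

/* Converting back to unsigned char discards the upper bits again. */
static unsigned char not_narrowed(unsigned char v)
{
    return (unsigned char)~v;
}
```

For v = 0x0F the first function returns -16, not 0xF0; only the second returns 0xF0.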

All variants of char must meet minimum requirements according to the
standard and can be relied upon to be present. Therefore, char is more
portable than uint8_t and, by its definition, both signed and unsigned char
can be relied upon to store something between [-127, 127] (yes, that is
correct) or [0, 255].

> The use of "unsigned char" as a type is a hang-over from one of K&R's
> many bad language design decisions, and should be avoided.

I think this is overstating the case rather a lot.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

> Paul,
>
> I always read your posts as you have taught me some important and useful
> things about compilers. Would you care to expand on this remark below? I
> always use unsigned chars as return types on the MSP430, just as you say,
> assuming it uses less space than words and will run faster. I use them as
> my default variable for the same reason. Please tell us more.

The problem with using 8-bit types on a 16-bit machine is that the internal
registers are 16 bits wide. Storing 8-bit data in a 16-bit register is not
without problems. I will simplify this a little so that it isn't too
technical.

unsigned char x;

unsigned char bar(void)
{
    return x;
}

unsigned char foo(void)
{
    return bar() / 3;
}

CrossWorks generates this:

bar:
    MOV.B &_x, R15
    RET

foo:
    CALL #_bar
    MOV.W #3, R14
    CALL #___int16_div
    MOV.B R15, R15
    RET

Now, there is an extra MOV.B R15, R15 to normalize the output of the integer
division: bar()/3 must be computed in integer precision and the result then
reduced to the range of unsigned char to return from foo. The more of these
you do, the more masking operations are required. If you go with plain int or
unsigned, which match the register width of the machine, then the masking
disappears.
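
A sketch of the word-sized variant, for comparison. The names x_wide, bar_wide and foo_wide are invented here, and whether a particular compiler really drops the trailing MOV.B should be checked in its own listing.

```c
#include <assert.h>

unsigned x_wide;   /* hypothetical word-sized counterpart of x */

unsigned bar_wide(void)
{
    return x_wide;
}

/* The quotient is already an unsigned value of register width, so
   nothing has to be narrowed before it is returned. */
unsigned foo_wide(void)
{
    return bar_wide() / 3;
}
```

The results match the unsigned char version for every value that fits in a byte; the only cost is one extra byte of RAM for the variable.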

This is even more prevalent when using signed char types which are not
natively supported on the MSP430:

/* 8-bit-blinkered AVR programmers always assume that "smaller is more
   efficient". */

signed char n;
void bar(int x);

void foo(void)
{
    signed char x;
    for (x = 1; x < n+2; ++x)
        bar(x);
}

    MOV.B #1, R11
    JMP @48
@49
    MOV.B R11, R15
    SXT R15
    CALL #_bar
    ADD.B #1, R11
@48
    MOV.B &_n, R15
    SXT R15
    ADD.W #2, R15
    MOV.B R11, R14
    SXT R14
    CMP.W R15, R14
    JL @49

Here we see the converse: we need to sign extend R15 after x is moved to it;
this is because the .B operations zero the high-order part of the register
rather than sign extending it. And the simple-looking comparison of two signed
chars with a +2 in the mix? Yes, that looks great, doesn't it?

There are more examples of this. In general, you want your parameter types
and return types *not* to be "unsigned char" or "signed char"; you should
prefer them to be "int" if you do not wish to be penalised by the
compiler+architecture combination.
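
Applied to the loop above, the advice might look like the sketch below. The names are hypothetical (n_limit, bar_int, foo_int, plus two counters added only so the behaviour can be observed), and hoisting the bound assumes the callee does not modify it.

```c
#include <assert.h>

signed char n_limit;     /* still stored as a byte, like n above */
static int calls;        /* instrumentation for this example only */
static int last_arg;

void bar_int(int x)
{
    ++calls;
    last_arg = x;
}

void foo_int(void)
{
    int limit = n_limit + 2;   /* widen once, outside the loop */
    int x;                     /* word-sized counter */
    for (x = 1; x < limit; ++x)
        bar_int(x);
}
```

Besides avoiding the repeated sign extension, the int counter also copes with n_limit = 126: the bound is then 128, which an 8-bit signed char counter can never reach.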

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

Hi Paul

Why is it I get a bad class in Delays.h when it defines an unsigned char ?

Thanks Fran


> Hi Paul
>
> Why is it I get a bad class in Delays.h when it defines an unsigned char
> ?

How do you expect me to know? I mean, seriously?

It seems you have an idea on how things are supposed to work (Guru) yet the
compiler may not understand the 8/16/32 bit storage of an individual CPU/MPU.
Some headers assume an 8bit mask is used for all storage, mem or stack.

/* Delay100TCYx
 * Delay multiples of 100 Tcy
 * Passing 0 (zero) results in a delay of 25,600 cycles.
 * The full range of [0,255] is supported.
 */
void Delay100TCYx(PARAM_SCLASS unsigned char);

Try this in several compilers and you will see a BIG difference.

Thoughts?

Hi,

> It seems you have an idea on how things are supposed to work (Guru) yet
> the compiler may not Understand the 8/16/32 bit storage of an individual
> CPU/MPU. Some headers assume an 8bit mask

The compiler always understands; "may not understand" what?

> Is used for all storage, mem or stack.

?

> /* Delay100TCYx
>  * Delay multiples of 100 Tcy
>  * Passing 0 (zero) results in a delay of 25,600 cycles.
>  * The full range of [0,255] is supported.
>  */
> void Delay100TCYx(PARAM_SCLASS unsigned char);

So, PARAM_SCLASS, I would conjecture, is a pre-processor symbol (defined
somehow outside the header) to set the parameter (PARAM) storage class
(SCLASS) and has no reason to be in a prototype.
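
If that conjecture is right, one way such a header could be made to compile elsewhere is to let the symbol expand to nothing when the vendor has not defined it. Standard C allows only "register" as a storage-class specifier on a parameter, so any other expansion would be rejected by a conforming compiler. The sketch below is an assumption for illustration, not the contents of the real Delays.h, and delay100tcy_cycles is a made-up helper that merely encodes the arithmetic in the header comment.

```c
#include <assert.h>

/* Assumption: PARAM_SCLASS normally comes from a vendor header.
   Fall back to an empty expansion when it is not defined. */
#ifndef PARAM_SCLASS
#define PARAM_SCLASS
#endif

/* Prototype as quoted above; the definition lives in the vendor library. */
void Delay100TCYx(PARAM_SCLASS unsigned char);

/* Hypothetical helper: instruction cycles requested by
   Delay100TCYx(units), following the header comment
   (0 means 256 units of 100 Tcy each). */
static unsigned long delay100tcy_cycles(unsigned char units)
{
    unsigned long count = units ? units : 256UL;
    return count * 100UL;
}
```

Here delay100tcy_cycles(0) yields 25600, matching the figure in the comment.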

> Try this in several compilers and you will see a BIG difference

A big difference in what?

> Thoughts?

My thoughts are that there is still no coherent framing of a question.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

If you use different 8/16/32 bit processors, the code compiles differently.

(I believe the MSP430 family is an 8bit processor. Thus any mask is
ignored)

EVEN if you define your mask, the compiler may (or may not) optimize on
an even paragraph.

An observation is not necessarily always postulated as a question. Sorry if
this does not compute for you.

Please try to have a good day.


> If you use different 8/16/32 bit processors, the code compiles
> differently.

Well, of course it compiles differently. Different processors have
different instruction sets.

> (I believe the MSP430 family is an 8bit processor. Thus any mask is
> ignored)

No, the MSP430 is a 16-bit processor and there is indeed an invisible
penalty for using smaller-than-word-size types.

> EVEN if you define your mask, the compiler may (or may not) optimize
> on an even paragraph.

Paragraphs have no place in the MSP430 architecture; this is a hang-over
from the design of older x86 processors.

> An observation is not necessarily always postulated as a question. Sorry
> if this does not compute for you.

There is no observation as far as I can tell other than "it compiles
differently" and "there is a big difference" without any definition of what
the difference is.

> Please try to have a good day.

I'm having a great day.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore running Defender... http://www.vimeo.com/25709426

From TI:

MSP430 does not provide a list of all the op-codes because there are many
available addressing modes. However, a description is available for the
individual bits that make up the various opcodes, depending on instruction
and addressing mode.

The MSP430xxxx Family User's Guide (for example, the MSP430x1xx Family User's
Guide or the MSP430x2xx Family User's Guide) shows all the information
available for the instruction set in the "RISC 16-Bit CPU" chapter. The
"Addressing Modes" section explains the "As" and "Ad" bits. In the
"Instruction Set" section you can see how the hex representation of an
instruction is built from the bits:

1. opcode
2. S-Reg (0b0000 = R0, 0b0001 = R1 ... 0b1111 = R15)
3. D-Reg (0b0000 = R0, 0b0001 = R1 ... 0b1111 = R15)
4. Ad
5. As
6. Byte or Word operation (B/W)

The section "Instruction Set Description" contains the Core Instruction Map.
The section "Instruction Cycles and Lengths" outlines the number of clock
cycles used for the instructions. The number of CPU clock cycles required
for an instruction depends on the instruction format and the addressing
modes used - not the instruction itself. The number of clock cycles refers
to the MCLK.
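
As a rough illustration of how those fields combine, here is a sketch that splits a double-operand (Format I) instruction word into its parts. The struct and function names are invented, and the bit positions are the Format I layout as generally documented for the family; they should be checked against the user's guide for the device in question.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a double-operand (Format I) MSP430 instruction word. */
struct msp430_fmt1 {
    unsigned opcode;  /* bits 15..12 */
    unsigned sreg;    /* bits 11..8, source register */
    unsigned ad;      /* bit 7, destination addressing mode */
    unsigned bw;      /* bit 6, 1 = byte operation, 0 = word */
    unsigned as;      /* bits 5..4, source addressing mode */
    unsigned dreg;    /* bits 3..0, destination register */
};

/* Hypothetical decoder: extracts each field by shift and mask. */
static struct msp430_fmt1 decode_fmt1(uint16_t word)
{
    struct msp430_fmt1 f;
    f.opcode = (word >> 12) & 0xFu;
    f.sreg   = (word >> 8)  & 0xFu;
    f.ad     = (word >> 7)  & 0x1u;
    f.bw     = (word >> 6)  & 0x1u;
    f.as     = (word >> 4)  & 0x3u;
    f.dreg   =  word        & 0xFu;
    return f;
}
```

With this layout the word 0x4F4F decodes to opcode 4 (MOV), source R15, destination R15, register mode on both sides and the byte bit set, which is the MOV.B R15, R15 discussed earlier in the thread.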



On 13/04/2012 10:27, Paul Curtis wrote:
> Dan,
>
> > First off, there is no such concept as an "unsigned char" (or a "signed
> > char").
>
> But there is such a concept because (a) it is enshrined in the ISO standard
> for C and (b) is the smallest addressable unit for the standard such that
> sizeof() computes its result using the definition sizeof(char) =
> sizeof(unsigned char) = sizeof(signed char) = 1.
>
> > Can show me the difference between a positive letter X and a
> > negative letter X? "char" is for characters, letters, and parts of
> > strings.
>
> char is the smallest addressable unit; the plain char type, which is either
> signed or unsigned, is different from both signed char and unsigned char.
> The plain char type is selected, by the implementation, to be either signed or
> unsigned to generate the best code (usually fastest/smallest). Thus there
> is most certainly a signed char type and the valid character '\377' could
> well be -1. This matters when you are collating strings.

My point is that the /name/ "char" is inappropriate for numbers - it is
illogical, and code written to store numbers or do arithmetic on a
"char" does not make sense. It (presumably) will work - but writing
code that works is only half the battle of writing good quality
software. K&R could just as well have called the type a "banana" - it
would do the same job, and it would be equally silly to talk about
"signed banana" as "signed char".

Contrast it with Pascal, which defines a "Character" to hold characters,
and "ShortInt" to hold 8-bit integers (Pascal is equally bad at
specifying exact sizes, though you can improve it with subrange types).
Both "Character" and "ShortInt" boil down to an 8-bit unit on most
systems, or a bigger unit on systems that can't address bytes. But the
names are distinct to indicate different usage, and help people write
clear, logical and correct code.

>
> > The appropriate choice of type for 8-bit arithmetic values is
> > "uint8_t" or "int8_t" - these names say what you mean, so that the code
> > you write is clearer and more logical.
>
> No. These are defined for STORAGE only: uintN_t is stored in exactly N
> bits in two's complement form without any padding. That type name may not

Unsigned data is not "two's complement", as it is unsigned. And to my
knowledge, C standards don't actually require two's complement for
signed data.

> even exist in some implementations and, therefore, such types are not
> universally portable--char is more portable in this regard. Returning to

Yes, "char" may be bigger than 8-bit on machines that can't address
8-bit bytes. That doesn't really make "char" portable in practice -
even though it exists on all architectures, most "portable" code that
uses it assumes it is 8-bit. If you are writing code that needs to work
on targets which don't have "uint8_t" or "int8_t", then the correct
types to use are "uint_least8_t" and "int_least8_t". But failing that,
it is usually best for portability to use "int8_t" and "uint8_t" - then
when you try to compile on a horrible DSP architecture, you will get
clear and obvious compile-time errors, rather than mystical run-time
problems due to assumptions about "char".
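
A small sketch of what using the "least" types involves in practice; add_mod256 is a made-up name. Because uint_least8_t always exists but may be wider than 8 bits, any wrap-around at 256 that the code relies on has to be written out explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two values and wraps at 256 regardless of how wide
   uint_least8_t happens to be on the target. The operands are
   promoted to int, so the sum itself cannot overflow. */
static uint_least8_t add_mod256(uint_least8_t a, uint_least8_t b)
{
    return (uint_least8_t)((a + b) & 0xFFu);
}
```

On a target with a real 8-bit type the mask costs nothing; on one without, it keeps the arithmetic identical.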

> the storage requirement, all arithmetic is carried out using the basic types
> int, unsigned, and their long and long long variants. No arithmetic is ever
> carried out in uint8_t or int8_t because those types are storage types only.
>

Yes, all arithmetic on uint8_t and int8_t is done by first promoting the
types to "int" (or, if necessary, unsigned int).

That doesn't change the fact that it makes sense to write "uint8_t x = 100;"
but it is meaningless to write "unsigned char x = 100;" - even though the
generated code is identical.

> All variants of char must meet minimum requirements according to the
> standard and can be relied upon to be present. Therefore, char is more
> portable than uint8_t and, by its definition, both signed and unsigned char
> can be relied upon to store something between [-127, 127] (yes, that is
> correct) or [0, 255].

As explained above, char is not portable - one of the reasons being
precisely that it is only defined with minimum ranges. When you need
highly portable, appropriately-named types with these ranges, you use
"uint_least8_t" and "int_least8_t".

It is no coincidence that much well-written portable code either uses
"uint8_t" and "int8_t", or defines its own fixed-size types such as
"u8_t", rather than using "char". (They also use fixed-size types
rather than "int" or "long int".)

>
> > The use of "unsigned char" as a type is a hang-over from one of K&R's
> > many bad language design decisions, and should be avoided.
>
> I think this is overstating the case rather a lot.
>

Maybe - but I will continue to state it this way. Most C code that
exists is badly written, and a number of common faults are due to bad
language design. I don't necessarily mean that the design decisions
were wrong at the time when K&R made them - they had their priorities,
and limitations in the computing systems available at the time. But the
end result is features of the C language that should not be used in
clear, high-quality software even though it is legal C. Please don't
ask me to name others, or we'll be here all day!

mvh,

David
