Getting started with AVR and C| page 9

Reply by Stephen Sprunk ● November 29, 2012

On 29-Nov-12 14:36, Keith Thompson wrote:
> upsidedown@downunder.com writes:
>> IMHO CHAR_BIT = 21 is the correct way to handle the Unicode range.
>>
>> On the Unicode list, I even suggested packing three 21-bit characters into
>> a single 64 bit data word as UTF-64 :-)
> 
> I like it -- but it breaks as soon as they add U+200000 or higher, and
> I'm not aware of any guarantee that they won't.

I thought they had guaranteed they would never go above U+10FFFF, which
would break UTF-16.

> I've thought of UTF-24, encoding each character in 3 octets; that's
> good for up to 16,777,216 distinct code points.

AIUI, there are some DSPs with CHAR_BIT==24 (or was that 12?).

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

Reply by Jon Kirwan ● November 29, 2012

On Thu, 29 Nov 2012 22:06:08 +0000 (UTC), Grant Edwards
<invalid@invalid.invalid> wrote:

>On 2012-11-29, Jon Kirwan <jonk@infinitefactors.org> wrote:
>> On Thu, 29 Nov 2012 22:40:41 +0200, upsidedown@downunder.com
>> wrote:
>>
>>>On Thu, 29 Nov 2012 16:36:34 +0000, John Devereux
>>><john@devereux.me.uk> wrote:
>>>
>>>>Grant Edwards <invalid@invalid.invalid> writes:
>>>>
>>>>> On 2012-11-29, Tim Wescott <tim@seemywebsite.com> wrote:
>>>>>
>>>>>> It's certainly what I would expect from gcc-avr.  There's no reason you 
>>>>>> can't make a beautifully compliant, reasonably efficient compiler that 
>>>>>> works well on the AVR.
>>>>>
>>>>> avr-gcc does indeed work very nicely as long as you don't look at the
>>>>> code generated when you use pointers.  You'll go blind -- especially
>>>>> if you're used to something like the msp430.  It's easy to forget that
>>>>> the AVR is an 8-bit CPU not a 16-bit CPU like the '430, and use of
>>>>> 16-bit pointers on the AVR requires a lot of overhead.
>>>>
>>>>Other problem with it is the separate program and data memory
>>>>spaces. Fine for small deeply embedded things but started to show strain
>>>>when I wanted a LCD display, menus etc. I would not use it for a new
>>>>project unless there was a very good reason, ultra-low power
>>>>perhaps. Cortex M3 is much nicer but the chips are much more complicated
>>>>of course.
>>>
>>>Except for self modifying code, why would one want data (program)
>>>access into program space (unless you are writing a linker or
>>>debugger) ??
>>>
>>>While working with PDP-11's in the 1970's, the ability to use separate
>>>I/D (Instruction/Data) space helped a lot to keep code/data in private
>>>64 KiB address spaces.
>>
>> There are good reasons for self-modifying code space.
>
>Nobody said anything about modifying code space.

Sorry I didn't interpret things well.

>The "data" that's put in code space is never modified (at least not
>in any project I've ever seen).

I've needed writable code space. Thunking is one such
example.

>It's not _modifying_ the program space that's the issue (that is
>generally only done for firmware updates, where the entire flash is
>erased and reprogrammed).

While I agree with the "generally" I don't agree that this
translates into 100%.

>Simply _reading_ program space _as_data_ is problematic.  If you've
>got a lot of string constants or constant tables, you want to just
>leave them in flash (program space) rather than copy them all to
>(scarce) RAM on startup.

Indeed. Completely agreed.

Jon

>Now you need three-byte pointers/addresses to differentiate between
>data at 0xABCD in data space and the data at 0xABCD in program space.
>Three byte pointers is how some compilers solve that problem -- but I
>don't think avr-gcc does that.
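
A rough sketch of the three-byte pointer idea described above, simulated on a host rather than taken from any real compiler. The names generic_ptr, generic_read, sim_ram and sim_flash are hypothetical, and the two arrays merely stand in for the AVR's separate data and program memories.

```c
#include <assert.h>
#include <stdint.h>

enum mem_space { SPACE_DATA = 0, SPACE_PROGRAM = 1 };

/* Hypothetical "generic pointer": one tag byte plus a 16-bit address.
   On a host compiler the struct may be padded beyond three bytes. */
struct generic_ptr {
    uint8_t  space;   /* which memory the address refers to */
    uint16_t addr;    /* address within that memory */
};

/* Stand-ins for the two physical memories (assumption, for illustration). */
static uint8_t       sim_ram[0x10000];
static const uint8_t sim_flash[0x10000] = { [0xABCD] = 0x5A };

/* Dispatch on the tag: the same 16-bit address selects a different byte
   depending on the memory space it belongs to. */
static uint8_t generic_read(struct generic_ptr p)
{
    assert(p.space == SPACE_DATA || p.space == SPACE_PROGRAM);
    return (p.space == SPACE_PROGRAM) ? sim_flash[p.addr] : sim_ram[p.addr];
}
```

The price is a run-time test on every dereference, which is presumably why compilers that offer such pointers make them opt-in.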

Reply by pete ● November 29, 2012

James Kuyper wrote:
> 
> On 11/29/2012 09:07 AM, James Kuyper wrote:
> ...
> > "If an int can represent all values of the original type
> > (as restricted
> > by the width, for a bit-field), the value is converted to an int;
> > otherwise, it is converted to an unsigned int. These are called the
> > integer promotions.58) All other types are unchanged by the integer
> > promotions."
> > (6.3.1.1p2) The first use of "integer promotions" in that
> > clause is italicized, which is an ISO convention indicating
> > that the
> > sentence containing that phrase serves
> > as the definition of the phrase.
> 
> I just realized that the meaning of the phrase "All other types"
> is not
> clear without the preceding part of that clause which I snipped:
> 
> > The following may be used in an expression
> > wherever an int or unsigned int may
> > be used:
> > -- An object or expression with an integer type
> > (other than int or unsigned int)
> > whose integer conversion rank is less than
> > or equal to the rank of int and unsigned int.
> > -- A bit-field of type _Bool, int, signed int, or unsigned int.

I recall reading some posts in this newsgroup a long time ago
which claimed that, under certain circumstances,
it was possible in C99
for unsigned int to promote to type signed int.

But that was never the case. 

In C99,
6.3.1.1 paragraph 2 read "less than"
instead of "less than or equal to" as you have above,
and the unsigned int type was covered by "All other types"
in the last sentence.


-- 
pete

Reply by Keith Thompson ● November 29, 2012

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> In comp.lang.c Jon Kirwan <jonk@infinitefactors.org> wrote:
>> On Thu, 29 Nov 2012 11:01:34 -0500, James Kuyper
>> <jameskuyper@verizon.net> wrote:
>  
>>><snip>
>>>Claims have frequently been made on
>>>comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
>
> As I remember the stories, the CRAY-1 had 64 bit char.
[...]

That may well be true; I never used a Cray-1.  (And there was more
emphasis on Fortran, or should I say FORTRAN, than on C.)

By the time I started using Crays, they were running Unicos, Cray's
version of Unix, so they pretty much had to have CHAR_BIT==8.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
    Will write code for food.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Reply by Keith Thompson ● November 29, 2012

Stephen Sprunk <stephen@sprunk.org> writes:
> On 29-Nov-12 14:36, Keith Thompson wrote:
>> upsidedown@downunder.com writes:
>>> IMHO CHAR_BIT = 21 is the correct way to handle the Unicode range.
>>>
>>> On the Unicode list, I even suggested packing three 21-bit characters into
>>> a single 64 bit data word as UTF-64 :-)
>> 
>> I like it -- but it breaks as soon as they add U+200000 or higher, and
>> I'm not aware of any guarantee that they won't.
>
> I thought they had guaranteed they would never go above U+10FFFF, which
> would break UTF-16.

You're right.  <http://www.unicode.org/faq/utf_bom.html> says:

    Both Unicode and ISO 10646 have policies in place that formally
    limit future code assignment to the integer range that can be
    expressed with current UTF-16 (0 to 1,114,111).
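
Since 1,114,111 is 0x10FFFF, every code point fits in 21 bits, and three of them fit in 63. A minimal sketch of that packing arithmetic follows; the names utf64_pack and utf64_get are made up for illustration and do not correspond to any real encoding form.

```c
#include <assert.h>
#include <stdint.h>

#define CP_BITS 21u
#define CP_MASK ((UINT64_C(1) << CP_BITS) - 1)   /* 0x1FFFFF */
#define CP_MAX  UINT32_C(0x10FFFF)               /* 1,114,111 */

/* Pack three code points into the low 63 bits of a 64-bit word.
   The first code point ends up in the highest position; bit 63 stays 0. */
static uint64_t utf64_pack(uint32_t a, uint32_t b, uint32_t c)
{
    assert(a <= CP_MAX && b <= CP_MAX && c <= CP_MAX);
    return ((uint64_t)a << (2 * CP_BITS)) |
           ((uint64_t)b << CP_BITS) |
            (uint64_t)c;
}

/* Extract one code point; index 0 is the first one packed, 2 the last. */
static uint32_t utf64_get(uint64_t word, unsigned index)
{
    assert(index < 3);
    return (uint32_t)((word >> ((2u - index) * CP_BITS)) & CP_MASK);
}
```

For example, utf64_get(utf64_pack(0x41, 0x10FFFF, 0x20AC), 1) gives back 0x10FFFF.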

>> I've thought of UTF-24, encoding each character in 3 octets; that's
>> good for up to 16,777,216 distinct code points.
>
> AIUI, there are some DSPs with CHAR_BIT==24 (or was that 12?).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
    Will write code for food.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Reply by James Kuyper ● November 29, 2012

On 11/29/2012 06:25 PM, pete wrote:
> James Kuyper wrote:
>>
>> On 11/29/2012 09:07 AM, James Kuyper wrote:
>> ...
>>> "If an int can represent all values of the original type
>>> (as restricted
>>> by the width, for a bit-field), the value is converted to an int;
>>> otherwise, it is converted to an unsigned int. These are called the
>>> integer promotions.58) All other types are unchanged by the integer
>>> promotions."
>>> (6.3.1.1p2) The first use of "integer promotions" in that
>>> clause is italicized, which is an ISO convention indicating
>>> that the
>>> sentence containing that phrase serves
>>> as the definition of the phrase.
>>
>> I just realized that the meaning of the phrase "All other types"
>> is not
>> clear without the preceding part of that clause which I snipped:
>>
>>> The following may be used in an expression
>>> wherever an int or unsigned int may
>>> be used:
>>> -- An object or expression with an integer type
>>> (other than int or unsigned int)
>>> whose integer conversion rank is less than
>>> or equal to the rank of int and unsigned int.
>>> -- A bit-field of type _Bool, int, signed int, or unsigned int.
> 
> I recall reading some posts in this newsgroup a long time ago
> which claimed that, under certain circumstances,
> it was possible in C99
> for unsigned int to promote to type signed int.

An unsigned type whose entire range can be represented by an int will
promote to signed int, as can easily be confirmed by checking the above
text, and that point has been raised in this group - there were several
threads that touched on that subject in just this past summer. However,
anyone who claimed that it could happen to "unsigned int" was mistaken.
That clause explicitly applies only to types "other than int or unsigned
int".
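
A small sketch that lets a C11 compiler report the promoted types itself, using unary plus (which applies the integer promotions) inside _Generic. The macro PROMOTED_KIND and the helper names are hypothetical. The outcome for unsigned short depends on the implementation: it promotes to int only where int can hold all of its values.

```c
#include <assert.h>
#include <limits.h>

/* Classify the type of an expression after the integer promotions. */
#define PROMOTED_KIND(x) _Generic(+(x), \
    int:          'i',                  \
    unsigned int: 'u',                  \
    default:      '?')

static char promoted_uchar(void)  { unsigned char  v = 1; return PROMOTED_KIND(v); }
static char promoted_ushort(void) { unsigned short v = 1; return PROMOTED_KIND(v); }
static char promoted_uint(void)   { unsigned int   v = 1; return PROMOTED_KIND(v); }

/* What 6.3.1.1p2 predicts for a narrow unsigned type with the given maximum. */
static char expected_kind(unsigned long type_max)
{
    return (type_max <= (unsigned long)INT_MAX) ? 'i' : 'u';
}

/* Checks that hold on any conforming implementation. */
static void check_promotions(void)
{
    assert(promoted_uint()   == 'u');   /* unsigned int is never promoted */
    assert(promoted_uchar()  == expected_kind(UCHAR_MAX));
    assert(promoted_ushort() == expected_kind(USHRT_MAX));
}
```

With a 16-bit int, as on the AVR, unsigned short comes out as unsigned int; with a 32-bit int it comes out as int.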

> But that was never the case. 
> 
> In C99,
> 6.3.1.1 paragraph 2 read "less than"
> instead of "less than or equal to" as you have above,
> and the unsigned int type was covered by "All other types"
> in the last sentence.

n1256.pdf (which is C99 with all three TCs applied, making it MORE
useful than C99 itself) and n1570.pdf (which is essentially identical to
C2011) both have "less than or equal to". The line is marked as being
changed from C99 in n1256.pdf, implying that one of the TCs is the
reason. My copy of C99 itself is inaccessible right now, so I can't
confirm the nature of the change.
-- 
James Kuyper

Reply by John Devereux ● November 30, 2012

Grant Edwards <invalid@invalid.invalid> writes:

> On 2012-11-29, upsidedown@downunder.com <upsidedown@downunder.com> wrote:
>> On Thu, 29 Nov 2012 16:36:34 +0000, John Devereux
>><john@devereux.me.uk> wrote:
>>
>>>Grant Edwards <invalid@invalid.invalid> writes:
>>>
>>>> On 2012-11-29, Tim Wescott <tim@seemywebsite.com> wrote:
>>>>
>>>>> It's certainly what I would expect from gcc-avr.  There's no reason you 
>>>>> can't make a beautifully compliant, reasonably efficient compiler that 
>>>>> works well on the AVR.
>>>>
>>>> avr-gcc does indeed work very nicely as long as you don't look at the
>>>> code generated when you use pointers.  You'll go blind -- especially
>>>> if you're used to something like the msp430.  It's easy to forget that
>>>> the AVR is an 8-bit CPU not a 16-bit CPU like the '430, and use of
>>>> 16-bit pointers on the AVR requires a lot of overhead.
>>>
>>>Other problem with it is the separate program and data memory
>>>spaces. Fine for small deeply embedded things but started to show strain
>>>when I wanted a LCD display, menus etc. I would not use it for a new
>>>project unless there was a very good reason, ultra-low power
>>>perhaps. Cortex M3 is much nicer but the chips are much more complicated
>>>of course.
>>
>> Except for self modifying code, why would one want data (program)
>> access into program space (unless you are writing a linker or
>> debugger) ??
>
> The "program" space was flash (non-volatile).  The "data" space was
> registers and RAM (volatile).  All non-volatile data (strings, screen
> templates, lookup tables, menu structures, and so) has to be in flash
> memory (IOW "program space").  It makes a _lot_ of sense to just use
> directly from flash instead of copying it all to RAM when RAM is so
> scarce.

Yes, that is precisely it. The AVRs especially tended to have lots of
flash but little RAM. Access to program memory is possible on the AVR,
but you have to use special attribute modifiers everywhere and the
resulting objects become incompatible with the standard libraries, so
you have to write special versions of these...
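
To make that concrete, here is a hedged sketch of what such a "special version" looks like with avr-libc's PROGMEM and pgm_read_byte() from <avr/pgmspace.h>. The helper copy_from_flash and the string menu_title are made-up names, and the #else branch is only a host stand-in so the sketch also builds on a PC.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>          /* avr-libc: PROGMEM, pgm_read_byte() */
#else
/* Host fallback (assumption, not part of avr-libc): flash is ordinary memory. */
#define PROGMEM
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

/* On AVR this string stays in flash; no RAM copy is made at startup. */
static const char menu_title[] PROGMEM = "Main menu";

/* Copy a flash-resident string into a RAM buffer one byte at a time,
   truncating if needed. Returns the number of characters copied,
   not counting the terminator. */
static size_t copy_from_flash(char *dst, const char *src_flash, size_t dst_size)
{
    size_t i = 0;
    assert(dst != NULL && dst_size > 0);
    while (i + 1 < dst_size) {
        char c = (char)pgm_read_byte(src_flash + i);
        if (c == '\0')
            break;
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

avr-libc ships ready-made _P variants such as strcpy_P() for the common string functions, but anything outside that set needs the same treatment by hand.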

Another thing is that, being an 8 bit machine, int and short operations
are not atomic. So you have to be very careful about protecting
variables shared with interrupt handlers (or other tasks in a preemptive
system). Good practice anyway of course but a modern CPU like Cortex M3
is a lot more forgiving since even 32 bit load/store operations are
atomic.
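
A minimal sketch of the usual protection for a 16-bit variable shared with an interrupt handler. The AVR branch uses SREG and cli() from avr-libc; the helper names irq_save, irq_restore and read_ticks, the variable tick_count, and the host stand-in in the #else branch are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t irq_state_t;

#ifdef __AVR__
#include <avr/io.h>                /* SREG */
#include <avr/interrupt.h>         /* cli() */
static irq_state_t irq_save(void)             { irq_state_t s = SREG; cli(); return s; }
static void        irq_restore(irq_state_t s) { SREG = s; }
#else
/* Host stand-in: a flag recording whether interrupts are "enabled". */
static uint8_t sim_irq_enabled = 1;
static irq_state_t irq_save(void)
{
    irq_state_t s = sim_irq_enabled;
    sim_irq_enabled = 0;
    return s;
}
static void irq_restore(irq_state_t s)
{
    assert(sim_irq_enabled == 0);  /* must be called inside a critical section */
    sim_irq_enabled = s;
}
#endif

/* Written by an interrupt handler, read by the main loop. */
static volatile uint16_t tick_count;

/* Read both bytes with interrupts masked, so the handler cannot change
   the value between the two 8-bit loads. */
static uint16_t read_ticks(void)
{
    irq_state_t s = irq_save();
    uint16_t snapshot = tick_count;
    irq_restore(s);
    return snapshot;
}
```

avr-libc's <util/atomic.h> wraps the same save/disable/restore pattern in its ATOMIC_BLOCK() macro.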

[...]


-- 

John Devereux

Reply by Larry Jones ● November 30, 2012

James Kuyper <jameskuyper@verizon.net> wrote:
> 
> n1256.pdf (which is C99 with all three TCs applied, making it MORE
> useful than C99 itself) and n1570.pdf (which is essentially identical to
> C2011) both have "less than or equal to". The line is marked as being
> changed from C99 in n1256.pdf, implying that one of the TCs is the
> reason. My copy of C99 itself is inaccessible right now, so I can't
> confirm the nature of the change.

It was TC2 and the change came from DR 230. It was to handle the case of
enumerated types with the same rank as int; it didn't have anything
to do with unsigned int.
-- 
Larry Jones

I'm a genius. -- Calvin

Reply by ● November 30, 2012

> I believe that C was implemented on the PDP-10. I didn't use
> it when I was programming the PDP-10 (I used assembly, then,
> and some other languages... but not C, until I worked on Unix
> v6 in '78.) But that was a 36-bit machine. And ASCII was
> packed into 7 bits so that 5 chars fit in a word. No one used
> 8, so far as I recall. That was the standard method. So I'm
> curious now what the C implementation did.

For Unix, serial I/O was as important as efficient storage of data.
Most serial terminals can't do more than 8 bits, and usually 7E or 7O.
So, 8 bit char became standard.
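
For anyone unfamiliar with 7E/7O: seven data bits plus an eighth bit chosen to make the count of 1 bits even or odd. A short sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Build the 8-bit frame for a 7-bit character with even parity ("7E"):
   bit 7 is set so that the total number of 1 bits is even. */
static uint8_t add_even_parity(uint8_t ch7)
{
    uint8_t ones = 0;
    assert(ch7 < 0x80);
    for (uint8_t v = ch7; v != 0; v >>= 1)
        ones += v & 1u;
    return (uint8_t)(ch7 | ((ones & 1u) << 7));
}

/* Odd parity ("7O") is the same frame with the parity bit inverted. */
static uint8_t add_odd_parity(uint8_t ch7)
{
    return (uint8_t)(add_even_parity(ch7) ^ 0x80u);
}
```

So 'A' (0x41, two 1 bits) goes out unchanged under 7E, while 'C' (0x43, three 1 bits) goes out as 0xC3.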

Reply by Jon Kirwan ● November 30, 2012

On Fri, 30 Nov 2012 09:26:39 -0800 (PST),
me@linnix.info-for.us wrote:

>> I believe that C was implemented on the PDP-10. I didn't use
>> it when I was programming the PDP-10 (I used assembly, then,
>> and some other languages... but not C, until I worked on Unix
>> v6 in '78.) But that was a 36-bit machine. And ASCII was
>> packed into 7 bits so that 5 chars fit in a word. No one used
>> 8, so far as I recall. That was the standard method. So I'm
>> curious now what the C implementation did.
>
>For Unix, serial I/O was as important as efficient storage of data.

Given the cost of memory back then, primary or secondary, a
great many man-hours were spent on efficient storage. Serial
I/O was almost exclusively used because of how modems worked,
then, for transmission over long distances. (Some may argue
that it requires fewer wires, too, in cables. But that was
less an issue then -- witness the 36-pin and 25-pin Centronics
cables/connectors which were very wire-heavy.) It turns out
that terminals, like the ASR-33 and KSR-35, were often used
without a computer for dial-up modem use over a phone line.
So they used a serial interface, by design. Which meant that
Unix needed to cope with it. But I wouldn't say "as important
as." I worked on the v6 Unix kernel, so I was slightly aware
of the situation.

Of course, I was just bringing up the PDP-10 because of its
odd way of packing 7-bit codes into a 36-bit word.
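
For the curious, a sketch of that packing: five 7-bit characters left-justified in a 36-bit word, with the lowest bit left over. The word is held in a uint64_t here, and the names pack5x7 and unpack5x7 are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORD36_MASK UINT64_C(0xFFFFFFFFF)   /* 36 one-bits */

/* Pack five 7-bit characters into a 36-bit word. The first character
   occupies the most significant 7 bits; the least significant bit of
   the word is left as zero. */
static uint64_t pack5x7(const char chars[5])
{
    uint64_t word = 0;
    for (int i = 0; i < 5; i++) {
        assert(((unsigned char)chars[i] & 0x80u) == 0);   /* must be 7-bit */
        word = (word << 7) | (uint64_t)(chars[i] & 0x7F);
    }
    return (word << 1) & WORD36_MASK;
}

/* Extract character number index (0 = first, 4 = last) from a packed word. */
static char unpack5x7(uint64_t word, int index)
{
    assert(index >= 0 && index < 5);
    return (char)((word >> (29 - 7 * index)) & 0x7F);
}
```

Five characters use 35 of the 36 bits, which is why a C implementation wanting uniformly addressable chars could not simply reuse this layout.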

>Most serial terminals can't do more than 8 bits, and usually 7E or 7O.
>So, 8 bit char became standard.

At the time, there was no real standard at all. I saw equal
numbers of machines using EBCDIC and 6-bit (5-bit Baudot was
waning by this time but I also remember old terminals that
used 5-bit) and 7-bit. No machine used 8-bit for anything,
then. The 8th bit was always just looked at as either 'don't
punch it at all, so the paper tape is more durable' or else
make it even or odd parity. Some of us would write programs
to punch out visible English messages on the tape, which was
one of the few reasons we actually wanted control over 8 bits
(for those paper punch machines that punched 8.) I honestly
hoped, but didn't know, if ASCII would win out in the end. I
almost had a feeling then that I'd be converting from one
code to another the rest of my life, if things continued as
they were. I wanted ASCII to win, though.

Side note: there was only a gradual "coming together" on the
idea that an 8-bit byte was a "good idea." I think a lot of
people these days imagine that it was always as obvious and
as ubiquitous as it is today. But that's not entirely true.
Things went to 8-bit, gradually. Partly, because 8 bits is a
nice 2^3 power thing and partly because ASCII was gradually
taking over as a standard and would fit into a 8-bit byte,
nicely. There was a confluence of forces going on and this
kind of "precipitated out" to what it is today.

Side note again: Recently, I read a "personal history"
talking about the complexity of the ASR-33. The author has no
idea. I also remember quite well the much more complicated
KSR-35. I worked on repairing both, from time to time. By
comparison, the ASR-33 was a toy, designed for less lifetime
and less complex, as well. The earlier KSR-35 was made for
men, so to speak -- extremely well lubricated system with
real man-parts and not toy pieces. The ASR-33 had a cute
little cylinder with the letters on it, not that unlike the
typewriter ball. The KSR-35 had a large hammer block,
instead.

Jon