Reply by Don Y September 14, 2016
On 9/13/2016 11:07 PM, upsidedown@downunder.com wrote:
>> There's just WAY too much effort involved making sense of Unicode >> in a way that your *users* will appreciate and consider intuitive. >> When faced with the I18N/L10N issues, I found it MUCH easier to >> *punt* (if they don't speak English, that's THEIR problem; or, a >> task for someone with more patience/resources than me!) > > Why would a person have to speak English or even know latin letters to > use a computer such as a cell phone ?
I'm not designing a cell phone. And, both parties (user and device) *speak* to each other. Should I also add learning foreign pronunciation algorithms to my list of design issues? :>
Reply by September 14, 2016
On Tue, 13 Sep 2016 15:41:24 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 9/13/2016 3:24 AM, upsidedown@downunder.com wrote: >> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y >> <blockedofcourse@foo.invalid> wrote: >> >>> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >>>> For programming, the most convenient way was to switch the terminal to >>>> US-ASCII, but unfortunately the keyboard layout also changed. >>>> >>>> Digraphs and trigraphs partially solved this problem but was ugly. >>>> >>>> With the introduction of the 8 bit ISO Latin, a large number of >>>> languages and programming could be handled with that single code page, >>>> greatly simplifying data exchange in Europe, Americas, Africa and >>>> Australia. >>> >>> The problem with Unicode is that it makes the problem space bigger. >>> Its relatively easy for a developer to decide on appropriate >>> syntax for file names, etc. with ASCII, Latin1, etc. But, start >>> allowing for all these other code points and suddenly the >>> developer needs to be a *linguist* in order to understand what >>> should/might be an appropriate set of constraints for his particular >>> needs. >> >> For _file_ names not a problem, stop scanning at next white space (or >> null in C). Everything in between is the file name, no matter what >> characters are used. > >The file system code is easy cuz it doesn't have to impart meaning to >any particular "characters". But, a there are typically human beings >involved who *do* impart meaning to certain glyphs (as well as the >OS itself), the developer can't ignore that expected meaning. > >I want to use the eight character name "////\\\\". Or, "::::::::". >Or, "><&&&&><". > >How do I differentiate between "RSTUV" and "RSTU<roman numeral 5>"?
That depends on usage. For text rendering, either provide a slightly modified glyph for U+2164 (ROMAN NUMERAL FIVE) or fall back to the letter V (which also works for normal text sorting). However, if the numeric value is of interest, a sequence of Roman numerals such as LXV would be handled as the numeric value 65.
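Roughly like this, as an untested sketch (only the U+2160..U+2169 block is handled; the function name is mine, everything else is up to the caller):

    #include <stdint.h>
    #include <stddef.h>

    /* Map a Unicode Roman-numeral code point (U+2160..U+2169, one..ten)
     * to an ASCII fallback string for sorting or searching.
     * Returns NULL if the code point is not in that block. */
    static const char *roman_fallback(uint32_t cp)
    {
        static const char *ascii[] = {
            "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"
        };
        if (cp >= 0x2160 && cp <= 0x2169)
            return ascii[cp - 0x2160];
        return NULL;   /* not a Roman numeral; caller keeps the original glyph */
    }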
>(i.e., people already have trouble visually disambiguating between >the '0' and 'O' glyphs, '1' and 'l', etc.)
This is a glyph problem: similar-looking symbols map to different code points.
>Do you allow control characters in file/pathnames?
Such as NUL? Why not, though the usage might be a bit problematic.
>Why *not* whitespace"? I have a file called "My To-Do List" and >don't have any problems accessing it, etc.
With a purely GUI user interface, not a big problem. However, direct command-line entry or scripts will require all kinds of escape mechanisms.
> >The developer (not the character set) is responsible for presenting >this information in a form that is useful to the user. E.g., MS >will sort files named "1" through "99" as: > 1, 2, ... 10, 11... 20, 21... 99 >whereas UNIX will use a L-R alpha sort: > 1, 10, 11..., 2, 20, 21... 99 > >Will the "RSTUV" user above expect to see "RSTU<roman numeral 5>" >immediately following "RSTUU"? And, where would "RSTUV" fit in >that scheme?
That depends on the sorting locale. Since collation rules usually differ from language to language, you could create your own locale that sorts exactly the way you want.
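As an illustration only (the locale name below is just an example and must exist on the system), locale-aware ordering in C is setlocale() plus strcoll():

    #include <locale.h>
    #include <string.h>
    #include <stdio.h>

    /* Compare two file names using the collation rules of the current
     * LC_COLLATE locale instead of raw byte values. */
    static int name_cmp(const char *a, const char *b)
    {
        return strcoll(a, b);
    }

    int main(void)
    {
        /* "en_US.UTF-8" is only an example locale name. */
        if (setlocale(LC_COLLATE, "en_US.UTF-8") == NULL)
            setlocale(LC_COLLATE, "");   /* fall back to the environment's locale */

        printf("%d\n", name_cmp("RSTUU", "RSTUV"));
        return 0;
    }

Defining a custom locale is then an OS configuration exercise (localedef on glibc systems) rather than a coding one.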
>> For _path_ specifications, there must be some rules how to separate >> the node, path, file name, file extension and file version from each >> other. The separator or other syntactic elements are usually chosen >> from the original 7 bit ASCII character set. What is between these >> separators is irrelevant. > >I parse my namespaces without regard to specific separators. >So, in some objects' identifiers, '/' can be a valid character >while it may have delimited the start of that object name >in the *containing* object. E.g., the single "o/b/j/e/c/t" exists >within the named "container" in the following: > container/o/b/j/e/c/t > >>> Also, we *tend* to associate meaning with each of the (e.g.) ASCII >>> code points. So, 16r31 is the *digit* '1'. There's concensus on >>> that interpretation. >> >> For _numeric_entry_ fields, including the characters 0-9 requires >> Arabic numbers fallback mapping. > >Why? shouldn't the glyphs for the Thai digits (or Laotian) also be >recognized as numeric? Likewise for the roman numerals, the *fullwidth* >(arabic) digits, etc.?
Yes, of course: use a fallback character mapping from Thai digit one to Arabic digit 1. The Roman numerals are more complex, since they do not use a positional system.
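A minimal sketch of such a fallback for numeric entry, assuming the input has already been decoded from UTF-8 to code points (the block ranges are the standard Unicode assignments; the function name is mine):

    #include <stdint.h>

    /* Return the numeric value 0..9 of a decimal-digit code point from a
     * few common blocks, or -1 if it is not a recognized digit.
     * Ranges handled: ASCII, Arabic-Indic, Thai, fullwidth forms. */
    static int digit_value(uint32_t cp)
    {
        if (cp >= 0x0030 && cp <= 0x0039) return (int)(cp - 0x0030); /* ASCII 0-9     */
        if (cp >= 0x0660 && cp <= 0x0669) return (int)(cp - 0x0660); /* Arabic-Indic  */
        if (cp >= 0x0E50 && cp <= 0x0E59) return (int)(cp - 0x0E50); /* Thai          */
        if (cp >= 0xFF10 && cp <= 0xFF19) return (int)(cp - 0xFF10); /* fullwidth     */
        return -1;
    }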
> >A particular braille glyph means different things based on how >the application/user interprets it. > >For example, the letter 'A' is denoted by a single dot in the upper >left corner of the cell. 'B' adds the dot immediately below while >'C' adds the dot to the immediate right, instead. > >Yet, in music braille, the "note" 'A' is denoted by a cell that >is the union of the letters 'B' and 'C' *less* the letter 'A' > >A B C >*. *. ** >.. *. .. >.. .. .. > >Notes: >A B C >.* .* ** >*. ** .* >.. .. .. > >I.e., the same glyph means different things. Imagine labeling a file >of sound samples with their corresponding "notes"... > >> As strange as it sounds, the numbers used in Arabic countries differs >> from those used in Europe. >> >>> However, Unicode just tabulates *glyphs* and deprives many of them >>> of any particular semantic value. E.g., if I pick a glyph out >>> of U+[2800,28FF], there's no way a developer would know what my >>> *intent* was in selecting the glyph at that codepoint. >> >> As long as it is just payload data, why should the programmer worry >> about it ? > >Because the programmer has to deal with the glyph's *meaning* to the >user. Otherwise, why not just list file names and content as 4 digit >unicode codepoints and eliminate the hassle of rendering fonts, >imparting meaning, etc.? > >>> It could >>> "mean" many different things (the developer has to IMPOSE a >>> particular meaning -- like deciding that 'A' is not an alphabetic >>> but, rather, a "hexadecimal character" IN THIS CONTEXT) >> >> In Unicode, there are code points for hexadecimal 0 to F. Very good >> idea to separate the dual usage for A to F. >> Has anyone actually used those code points ? > >The 10 arabic digits that we are accustomed to exist as >U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are >the 10 arabic-indic digits, 10 extended arabic-indic digits, >10 mongolian digits, 10 laotian digits, etc.
Use fallback mapping for numeric entry; use the original code points for rendering.
>[We'll ingore dingbat digits, circled digits, super/subscripted >digits, etc.]
Use fallback mapping.
>Then, stacking glyphs (e.g., the equivalent of diacriticals)... > >There's just WAY too much effort involved making sense of Unicode >in a way that your *users* will appreciate and consider intuitive. >When faced with the I18N/L10N issues, I found it MUCH easier to >*punt* (if they don't speak English, that's THEIR problem; or, a >task for someone with more patience/resources than me!)
Why would a person have to speak English or even know Latin letters to use a computer such as a cell phone?
Reply by Don Y September 13, 2016
On 9/13/2016 3:24 AM, upsidedown@downunder.com wrote:
> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y > <blockedofcourse@foo.invalid> wrote: > >> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >>> For programming, the most convenient way was to switch the terminal to >>> US-ASCII, but unfortunately the keyboard layout also changed. >>> >>> Digraphs and trigraphs partially solved this problem but was ugly. >>> >>> With the introduction of the 8 bit ISO Latin, a large number of >>> languages and programming could be handled with that single code page, >>> greatly simplifying data exchange in Europe, Americas, Africa and >>> Australia. >> >> The problem with Unicode is that it makes the problem space bigger. >> Its relatively easy for a developer to decide on appropriate >> syntax for file names, etc. with ASCII, Latin1, etc. But, start >> allowing for all these other code points and suddenly the >> developer needs to be a *linguist* in order to understand what >> should/might be an appropriate set of constraints for his particular >> needs. > > For _file_ names not a problem, stop scanning at next white space (or > null in C). Everything in between is the file name, no matter what > characters are used.
The file system code is easy cuz it doesn't have to impart meaning to any particular "characters". But, as there are typically human beings involved who *do* impart meaning to certain glyphs (as well as the OS itself), the developer can't ignore that expected meaning.

I want to use the eight character name "////\\\\". Or, "::::::::". Or, "><&&&&><".

How do I differentiate between "RSTUV" and "RSTU<roman numeral 5>"? (i.e., people already have trouble visually disambiguating between the '0' and 'O' glyphs, '1' and 'l', etc.)

Do you allow control characters in file/pathnames?

Why *not* whitespace? I have a file called "My To-Do List" and don't have any problems accessing it, etc.

The developer (not the character set) is responsible for presenting this information in a form that is useful to the user. E.g., MS will sort files named "1" through "99" as:
    1, 2, ... 10, 11... 20, 21... 99
whereas UNIX will use an L-R alpha sort:
    1, 10, 11..., 2, 20, 21... 99

Will the "RSTUV" user above expect to see "RSTU<roman numeral 5>" immediately following "RSTUU"? And, where would "RSTUV" fit in that scheme?
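As an aside, that "MS-style" ordering is just a numeric-aware compare. A rough ASCII-only sketch (untested, name invented):

    #include <ctype.h>
    #include <string.h>

    /* Compare two names so that embedded decimal numbers sort by value:
     * "2" < "10", "file9" < "file10".  ASCII-only sketch; treats "01" == "1". */
    static int natural_cmp(const char *a, const char *b)
    {
        while (*a && *b) {
            if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
                while (*a == '0') a++;            /* skip leading zeros        */
                while (*b == '0') b++;
                size_t la = strspn(a, "0123456789");
                size_t lb = strspn(b, "0123456789");
                if (la != lb) return (la < lb) ? -1 : 1;   /* longer run = bigger */
                int c = strncmp(a, b, la);        /* same length: compare digits */
                if (c) return c;
                a += la; b += lb;
            } else {
                if (*a != *b) return (unsigned char)*a - (unsigned char)*b;
                a++; b++;
            }
        }
        return (unsigned char)*a - (unsigned char)*b;
    }

Real implementations (glibc's strverscmp(), for example) handle more corner cases.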
> For _path_ specifications, there must be some rules how to separate > the node, path, file name, file extension and file version from each > other. The separator or other syntactic elements are usually chosen > from the original 7 bit ASCII character set. What is between these > separators is irrelevant.
I parse my namespaces without regard to specific separators. So, in some objects' identifiers, '/' can be a valid character while it may have delimited the start of that object name in the *containing* object. E.g., the single "o/b/j/e/c/t" exists within the named "container" in the following: container/o/b/j/e/c/t
>> Also, we *tend* to associate meaning with each of the (e.g.) ASCII >> code points. So, 16r31 is the *digit* '1'. There's concensus on >> that interpretation. > > For _numeric_entry_ fields, including the characters 0-9 requires > Arabic numbers fallback mapping.
Why? Shouldn't the glyphs for the Thai digits (or Laotian) also be recognized as numeric? Likewise for the roman numerals, the *fullwidth* (arabic) digits, etc.?

A particular braille glyph means different things based on how the application/user interprets it.

For example, the letter 'A' is denoted by a single dot in the upper left corner of the cell. 'B' adds the dot immediately below while 'C' adds the dot to the immediate right, instead.

Yet, in music braille, the "note" 'A' is denoted by a cell that is the union of the letters 'B' and 'C' *less* the letter 'A'

A  B  C
*. *. **
.. *. ..
.. .. ..

Notes:
A  B  C
.* .* **
*. ** .*
.. .. ..

I.e., the same glyph means different things. Imagine labeling a file of sound samples with their corresponding "notes"...
> As strange as it sounds, the numbers used in Arabic countries differs > from those used in Europe. > >> However, Unicode just tabulates *glyphs* and deprives many of them >> of any particular semantic value. E.g., if I pick a glyph out >> of U+[2800,28FF], there's no way a developer would know what my >> *intent* was in selecting the glyph at that codepoint. > > As long as it is just payload data, why should the programmer worry > about it ?
Because the programmer has to deal with the glyph's *meaning* to the user. Otherwise, why not just list file names and content as 4 digit unicode codepoints and eliminate the hassle of rendering fonts, imparting meaning, etc.?
>> It could >> "mean" many different things (the developer has to IMPOSE a >> particular meaning -- like deciding that 'A' is not an alphabetic >> but, rather, a "hexadecimal character" IN THIS CONTEXT) > > In Unicode, there are code points for hexadecimal 0 to F. Very good > idea to separate the dual usage for A to F. > Has anyone actually used those code points ?
The 10 arabic digits that we are accustomed to exist as U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are the 10 arabic-indic digits, 10 extended arabic-indic digits, 10 mongolian digits, 10 laotian digits, etc.

[We'll ignore dingbat digits, circled digits, super/subscripted digits, etc.]

Then, stacking glyphs (e.g., the equivalent of diacriticals)...

There's just WAY too much effort involved making sense of Unicode in a way that your *users* will appreciate and consider intuitive. When faced with the I18N/L10N issues, I found it MUCH easier to *punt* (if they don't speak English, that's THEIR problem; or, a task for someone with more patience/resources than me!)
Reply by David Brown September 13, 2016
On 13/09/16 17:22, Dennis wrote:
> On 09/13/2016 05:25 AM, David Brown wrote: > >> >> >> If a change makes code clearer, then it is a good thing. Visually >> splitting the words in a multi-word identifier makes code clearer - >> whether that is done using camelCase or underscores is a minor issue. > > I'll go off on a tangent - it can be an important issue. I once worked > with a guy that was visually impaired and used a screen reader for much > of his work. The underscore form would read as (spoken)word > (spoken)underscore (spoken)word... where the camelCase would cause it to > give up and spell it all out. We referred to the underscore form as > "easy reader code". This was over a decade ago so screen readers may be > smarter now.
Unless it is a screen reader specially designed for code, I'd imagine it would have trouble with camelCase words. I think Don knows more about this sort of program. But you are absolutely right that there can be particular circumstances that determine our choices here, and have overriding importance.
> >> Small letters are easier to read (that's why they exist!), and avoid >> unnecessary emphasis - that makes them a good choice in most cases. And >> there is rarely any benefit in indicating that an identifier is a >> constant or a macro (assuming it is defined and used sensibly) - so >> there is no point in making such a dramatic distinction. >> >
Reply by Dennis September 13, 2016
On 09/13/2016 05:25 AM, David Brown wrote:

> > > If a change makes code clearer, then it is a good thing. Visually > splitting the words in a multi-word identifier makes code clearer - > whether that is done using camelCase or underscores is a minor issue.
I'll go off on a tangent - it can be an important issue. I once worked with a guy that was visually impaired and used a screen reader for much of his work. The underscore form would read as (spoken)word (spoken)underscore (spoken)word... where the camelCase would cause it to give up and spell it all out. We referred to the underscore form as "easy reader code". This was over a decade ago so screen readers may be smarter now.
> Small letters are easier to read (that's why they exist!), and avoid > unnecessary emphasis - that makes them a good choice in most cases. And > there is rarely any benefit in indicating that an identifier is a > constant or a macro (assuming it is defined and used sensibly) - so > there is no point in making such a dramatic distinction. >
Reply by David Brown September 13, 2016
On 13/09/16 12:24, upsidedown@downunder.com wrote:
> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y > <blockedofcourse@foo.invalid> wrote: > >> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >>> For programming, the most convenient way was to switch the terminal to >>> US-ASCII, but unfortunately the keyboard layout also changed. >>> >>> Digraphs and trigraphs partially solved this problem but was ugly. >>> >>> With the introduction of the 8 bit ISO Latin, a large number of >>> languages and programming could be handled with that single code page, >>> greatly simplifying data exchange in Europe, Americas, Africa and >>> Australia. >> >> The problem with Unicode is that it makes the problem space bigger. >> Its relatively easy for a developer to decide on appropriate >> syntax for file names, etc. with ASCII, Latin1, etc. But, start >> allowing for all these other code points and suddenly the >> developer needs to be a *linguist* in order to understand what >> should/might be an appropriate set of constraints for his particular >> needs. > > For _file_ names not a problem, stop scanning at next white space (or > null in C). Everything in between is the file name, no matter what > characters are used. >
That sounds fine - but what is "white space" in unicode? In ASCII, it's space, tab, newline and carriage return characters. In unicode, there are far more. Invisible spaces, non-breaking spaces, spaces of different widths, etc. Did you remember to check for the Ogham space mark, for those Celtic file names? Use UTF-8 and stop on a null character. Just let people put spaces of any sort in their filenames, and you only have to worry about / (or \ and : ) as special characters.
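As a sketch (untested, function name invented) - because UTF-8 never reuses the byte values 0x00 or 0x2F inside a multi-byte sequence, a plain byte scan for '/' and NUL is safe:

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Walk a NUL-terminated UTF-8 path and print each component.
     * A raw '/' byte can only ever be the separator itself, so scanning
     * bytes cannot split a multi-byte character in half. */
    static void list_components(const char *path)
    {
        const char *p = path;
        while (*p != '\0') {
            size_t len = strcspn(p, "/");          /* bytes up to next '/' or NUL */
            if (len > 0)
                printf("component: %.*s\n", (int)len, p);
            p += len;
            if (*p == '/')
                p++;                               /* skip the separator */
        }
    }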
> For _path_ specifications, there must be some rules how to separate > the node, path, file name, file extension and file version from each > other. The separator or other syntactic elements are usually chosen > from the original 7 bit ASCII character set. What is between these > separators is irrelevant. > > >> Also, we *tend* to associate meaning with each of the (e.g.) ASCII >> code points. So, 16r31 is the *digit* '1'. There's concensus on >> that interpretation. > > For _numeric_entry_ fields, including the characters 0-9 requires > Arabic numbers fallback mapping. > > As strange as it sounds, the numbers used in Arabic countries differs > from those used in Europe.
That's because our "Arabic numerals" came from India, not Arabia - though they were brought over to Europe by an Arabic mathematician. I believe that in Arabic, the term for them translates as "Indian numerals".
> >> However, Unicode just tabulates *glyphs* and deprives many of them >> of any particular semantic value. E.g., if I pick a glyph out >> of U+[2800,28FF], there's no way a developer would know what my >> *intent* was in selecting the glyph at that codepoint. > > As long as it is just payload data, why should the programmer worry > about it ? > >> It could >> "mean" many different things (the developer has to IMPOSE a >> particular meaning -- like deciding that 'A' is not an alphabetic >> but, rather, a "hexadecimal character" IN THIS CONTEXT) > > In Unicode, there are code points for hexadecimal 0 to F. Very good > idea to separate the dual usage for A to F. > Has anyone actually used those code points ? >
There are lots of cases where the same glyph exists in multiple unicode code points, for different purposes. I have no idea how often they are used.
Reply by David Brown September 13, 2016
On 13/09/16 06:22, Don Y wrote:
> On 9/12/2016 2:11 PM, David Brown wrote: > >>>>> I would NOT, for example, write: >>>>> x=-1; >>>> >>>> Neither would I - I would write "x = -1;". But I believe I am missing >>>> your point with this example. >>> >>> My example would be parsed as: >>> x =- 1 ; >> >> Parsed by who or what? A compiler would parse it as "x = -1;". I >> assume we >> are still talking about C (or C++) ? > > No. Dig out a copy of Whitesmith's or pcc. What is NOW expressed as > "x -= 1" was originally expressed as "x =- 1". Ditto for =+, =%, etc. > So, if you were a lazy typist and liked omitting whitespace by thinking it > redundant, you typed: > x=-1; > and got: > x=x-1; > instead of: > x = (-1); > >> When writing, I would include appropriate spaces so that it is easy to >> see what >> it means. > > My point is that most folks would think x=-1 (or x=+2, etc.) bound the > sign to the value more tightly than to the assignment operator.
Of course they think "x=-1" means "x = -1"! It has been almost forty years since "x =- 1" has been standard C. Most people also think that television is in colour, you can communicate to Australia by telephone, and flares are no longer in fashion. Life moves on.

In this particular case, the number of people who ever learned to write "x =- 1", and are still working as programmers (or even still alive), is tiny. And the number of those who failed to learn to use "x -= 1" at least 35 years ago must be tinier still. Sure, you can /remember/ it - and remember having to change old code to suit new compilers. An old shopkeeper may remember when he made deliveries with a horse and cart - but he does not insist that new employees know about grooming horses.

Backwards compatibility, and compatibility with existing code, is important. That is why we still have many of the poor choices in the design of C as a language - for compatibility. But with each passing year or decade, compatibility with the oldest code gets less and less relevant - except to historians or the occasional very specialist case.

Of all the lines of C code that are in use today, what fraction were written in pre-K&R C when "x =- 1" was valid? One in a million? One in a hundred million? If we exclude code lines that have been converted to later C standards, then I doubt it is nearly that many.
> > The same sort of reasoning applies to > x=y/*p; > where the '*' ends up binding more tightly to the '/' to produce the > comment introducer, "/*". Note that the problem persists in the > language's current form -- but compilers often warn about it > (e.g., detecting "nested" comments, etc.)
That is different in that the parsing rules for C are quite clear here, and are the same as they always have been - /* starts a comment. But unless you have carefully created a pathological case and use a particularly unhelpful compiler (and editor - in this century, most programmers use editors with syntax highlighting), you are going to spot the error very quickly.

C provides enormous opportunity for accidentally writing incorrect code - in many cases, the result is perfectly acceptable C code and will not trigger any warnings. If you were to take the top hundred categories of typos and small mistakes in C code that resulted in compilable but incorrect code, "x=y/*p" would not feature. It /might/ make it onto a list of the top thousand mistakes. It really is that irrelevant. And it is preventable by using spaces. There is a reason that the space bar is the biggest key on the keyboard.
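For the record, the pitfall in question and the cure, as a tiny illustration (plain standard C, nothing more):

    /* The statement below is what the programmer meant: divide x by the
     * value p points at.  Written without the space -- "y = x" immediately
     * followed by "/" and "*p;" -- the slash and star pair up and open a
     * comment instead, silently swallowing code up to the next comment end. */
    int divide(int x, int *p)
    {
        int y = x / *p;      /* or, more defensively: x / (*p) */
        return y;
    }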
> >>> Nowadays, you would see this as: >>> x -= 1 ; >> >> That is a completely different statement. > > No. It is what the statement above HAS BECOME! (as the language evolved) >
Take your head out of your history books. In C, "x-=1" means "x -= 1", while "x=-1" means "x = -1". That is it. It is a simple fact. It matters little what C used to be, decades before most programmers were born.
>>> Would you have noticed that it was NOT assigning "-1" to x? >>> Would you have wasted that precious, limited timeslot that you >>> had access to "The Machine" chasing down a bug that you could have >>> avoided just by adopting a style that, instead, writes this as: >>> x = -1; >> >> I am afraid I still don't get your point. "x = -1;" is exactly how I >> would >> write it. I believe spacing like that is good style, and helps make >> it clear >> to the reader and the writer how the statement is parsed. And like in >> normal >> text, appropriate spacing makes things easier to read. > > See above. > > Or, better yet, google turns up: > > <http://bitsavers.informatik.uni-stuttgart.de/pdf/chromatics/CGC_7900_C_Programmers_Manual_Mar82.pdf> > > > Perhaps an interesting read for folks didn't have to write code > "back then".
I am not as old as you, but I have been programming for about 35 years. I have had my share of hand-assembling code, burning eeproms, using punched tape, and even setting opcodes with DIP switches with a toggle switch for the clock. But I understand the difference between what I do /now/, and what other programmers do /now/, and what I did long ago.
> >>>>> And, I *would* write: >>>>> p = &buffer[0]; >>>> >>>> So would I - because I think the code looks clearer. When I want p to >>>> be a pointer to the first element of "buffer", that's what I write. >>> >>> You'll more frequently encounter: >>> p = buffer; >> >> I know. But I prefer "p = &buffer[0];", because I think it looks >> clearer and >> makes more sense. To my reading, "buffer" is an array - it does not >> make sense >> to assign an array to a pointer-to-int. (I'm guessing at types here, >> since >> they were never declared - adjust them if necessary if "buffer" was an >> array of >> char or something else.) > > I prefer it because I am more hardware oriented. I think of "objects" > (poor choice of words) residing AT memory addresses. So, it is only > natural for me to think about "the address of the zeroth element of > the array".
I am a hardware man too. And I quite appreciate that interpretation as well.
> >> C converts arrays or array operations into pointers and pointer >> operations in >> certain circumstances. I wish it did not - but I can't change the >> language. >> But just because the language allows you to write code in a particular >> way, and >> just because many people /do/ write code in a particular way, does not >> necessarily mean it is a good idea to do so. >> >>>> I'd use parens in places that you'd consider superfluous -- but that >>>>> made bindings very obvious (cuz the compiler would do what I *told* it >>>>> without suggesting that I might be doing something unintended): >>>> >>>> Me too. But that's for clarity of code for both the reader and the >>>> writer - not because I worry that my tools don't follow the rules of C. >>> >>> The problem isn't that tools don't follow the rules. The problem is >>> that >>> a compiler can be conforming WITHOUT holding your hand and reminding you >>> that what you've typed *is* "valid C" -- just not likely the valid C >>> that you assumed! >> >> Again, I can only say - get a decent compiler, and learn how to use it >> properly. If you cannot use good compiler (because no good compilers >> exist for >> your target, at least not in your budget), then use an additional >> static error >> checker. > > For how many different processors have you coded?
I can't remember - perhaps 20 or so. Z80, 6502, 68k, x86, MIPS, COP8, 8051, PIC16, HP43000, ARM, PPC, AVR, AVR32, MSP430, NIOS, XMOS, TMS430, 56Fxxx That's 18 - there are several more whose names I can't remember, and some that I have programmed on without being familiar with the assembly language.
> I have compilers > for processors that never made it into "mass production". And, for > processors that saw very limited/targeted support. If I'm willing to FUND > the development of a compiler -- knowing that I may be the only > customer for that compiler -- then I can "buy" all sorts of capabilities > in that compiler! > > OTOH, if I have to pass those costs along to a client, he may be far less > excited about how "good the tool is" and, instead, wonder why *I* can't > compensate for the tool's quality.
Long ago, anyone wanting to make a C compiler for a new processor would either buy the front end and write their own code generator, or would pay a compiler company to write the code generator to go with their existing front end. Only hobby developers would write their own C front end - for professionals, it was not worth the money unless they were a full-time compiler development company. So you got your front-end already made, with whatever features and warnings it supported. Clearly, the range of features would vary here. And sometimes you wanted to add your own extra features for better support of the target. Now, anyone wanting to make a C compiler for a new processor starts with either gcc or clang, and writes the code generator - again, the front-end is there already.
> > What were tools like 20 years ago? Could you approach your client/employer > and make the case for investing in the development of a compiler with > 2016 capabilities? Running, *effectively*, on 1995 hardware and OS? > Or, would you, instead, argue that the project should be deferred until > the tools were more capable?
The tools I used 20 years ago were not as good as the ones I use now. And the tools I used 20 years ago were not as good as the best ones available 20 years ago - the budget did not stretch. But now, the budget /does/ stretch to high quality tools - for most microcontrollers, /everybody's/ budget stretches far enough because high quality compiler tools are free or very cheap. There are a few microcontrollers where that is not the case (the 8051, the unkillable dinosaur, being an example), but tool quality and price is a factor many people consider when choosing a microcontroller. And how relevant are 20 year old tools to the work I do /today/, writing code /today/ ? Not very relevant at all, except for occasional maintenance of old projects.
> > Or, would you develop work styles that allowed you to produce reliable > products with the tools at your disposal??
The whole point is that 20 years ago I had to have a style that made sense 20 years ago with the tools of that era. Now I have a style that is suited to the tools of /this/ era. Not a lot has changed, because the guiding principles have been the same, but many details have changed. Function-like macros have morphed into static inline functions, home-made size-specific types have changed to uint16_t and friends, etc. Some of my modern style features, such as heavy use of static assertions, could also have been used 20 years ago - I have learned with time and experience. But I refuse to write modern code of poorer quality and with fewer features simply because those features were not available decades ago - or even because those features are not available on all modern compilers.
> >> To be used in a safe, reliable, maintainable, and understandable way, >> you need >> to limit the C features you use and the style you use. (This applies >> to any >> programming language, but perhaps more to C than other languages.) >> And you use >> whatever tools you can to help spot errors as early as possible. >> >>>>> y = x / (*p); >>>> >>>> Me too. >>> >>> Because the compiler would gladly think "y=x/*p;" was THREE characters >>> before the start of a comment -- without telling you that it was >>> interpreting it as such. Again, you turn the crank and then spend >>> precious time staring at your code, wondering why its throwing an error >>> when none should exist (in YOUR mind). Or, worse, mangling input: >>> >>> y=/*p /* this is the intended comment */ >>> x=3; >>> >> >> Again, spaces are your friend. > > Spaces (or parens) are a stylistic choice. The language doesn't mandate > their use. E.g., this is perfectly valid: > y=/*I want to initialize two variables to the same value > in a single stmt*/x=3; > > It's a bad coding STYLE but nothing that the compiler SHOULD complain > about!
Compilers can, do, and should complain about particularly bad style. It's important that such complaints are optional - and for compatibility, they are usually disabled by default. There is no clear division between what is simply a stylistic choice ("x=3" vs. "x = 3", for example), and what is a really /bad/ idea, such as putting comments in the middle of a statement. Thus any complaints about style need to be configurable.

But there is no doubt that such warnings can be helpful in preventing bugs. Warning on "if (x = 3)" is a fine example. Another is gcc 6's new "-Wmisleading-indentation" warning that will warn on:

    if (x == 1)
        a = 2;
        b = 3;

Code like that is wrong - it is bad code, even if it is perfectly legitimate C code, and even if it happens to work. It is a good thing for compilers to complain about it.
> >>>> Thus I would always write: >>>> >>>> if (expression == constant) { .... } >>>> >>>> (I'm not a fan of shouting either - I left all-caps behind with BASIC >>>> long ago.) >>> >>> I ALWAYS use caps for manifest constants. >> >> Many people do. I don't. I really do not think it helps in any way, >> and it >> gives unnecessary emphasis to a rather minor issue, detracting from the >> readability of the code. > > Would you write: > const zero = 0; > const nine = 9; > for (index = zero; index < nine; index++)... > Or: > const start = zero; > const end = nine; > for (index = start; index < end; index++)... > Or: > for (index = START; index < END; index++)... >
No, I would not write any of that. I would not write "const zero = 0" for several reasons. First, it is illegal C - it needs a type. Second, such constants are usually best declared static. Third, it is pointless making a constant whose name is its value. But I /might/ write:

    static const int start = 0;
    static const int end = 9;
    for (int index = start; index < end; index++) {
        ...
    }

Even that is quite unlikely - "start" and "end" would usually have far better names, while "index" is almost certainly better written as "i". (But that is a matter of style :-) ).
> The latter makes it abundantly clear to me (without having to chase down > the declaration/definition of "start", "end", "zero" or "nine". (who > is to say whether "zero" actually maps to '0' vs. "273" -- 0 degrees K!)
The only thing the START and END form makes abundantly clear is that you really, really want everyone looking at the code to see at a glance that START and END won't change - and that is far more important than anything else about the code, such as what it does. If "zero" does not map to zero, don't call it "zero". Call it "zeroK", or "lowestTemperature", or whatever.
>> >> And again, you are missing the point. I am not writing code for a 40 >> year old >> compiler. Nor is the OP. For the most part, I don't need to write >> unpleasant >> code to deal with outdated tools. > > No, YOU are missing the point!
Certainly we seem to be talking at cross-purposes here. It is a matter of viewpoint who is "missing the point" - probably both of us.
> I'm writing code with 40 years of > "respectible track record". You're arguing that I should "fix" > something (i.e., my style preferences) that isn't broken. Because > it differs from what your *20* year track record has found to be > acceptable. Should we both wait and see what next year's > crop of developers comes up with? Maybe Hungarian notation will be > supplanted by Vietnamese notation? Or, we'll decide that using identifiers > expressed in Esperanto is far more universally readable?
Yes, I am arguing that if something in your style is no longer the best choice for modern programming, then you certainly should consider changing it. Clearly you will not do so without good reason, which is absolutely fine.

I am also arguing against recommending new people adopt a style whose benefits are based on ancient tools and your own personal habits. Modern programmers should adopt a forward-looking style that lets them take advantage of modern tools - there is no benefit in adopting your habits or my habits, simply because /we/ are used to them. There are benefits in using, or at least being familiar with, common idioms and styles. But that should not be an overriding concern. Keep good habits, if they are still good - but drop bad habits.
> >> I have, in the past, used far more limited compilers. I have worked with >> compilers that only supported C90/ANSI, meaning I can't mix >> declarations and >> statements. I have worked with compilers that have such god-awful code >> generation that I had to think about every little statement, and limit >> the >> number of local variables I used. Very occasionally, I have to work >> on old >> projects that were made using such tools - and I have to go back to >> the dark >> ages for a while. >> >> Perhaps, long ago when the tools I had were poor and the company I >> worked for >> did not have the budget for a linter, it might have made sense to >> write "if >> (CONSTANT == expression)". But not now - and I don't think there will >> be many >> developers working today for which it would still make sense. > > I can then argue that we shouldn't bother with such an archaic language, > at all! Look at all the cruft it brings along with it! Why not wait to > see what pops up tomorrow as the language (and development style) du jour? > Or, why risk being early adopters -- let's wait a few *weeks* before > adopting tomorrow's advances!
There is a balance between choosing something that is mature, field proven and familiar, and choosing something that is newer and has benefits such as efficiency, clarity, flexibility, safety, etc.

I think that the large majority of work done in C would be better written in a different language, were it not for two factors - existing code written in C, and existing experience of the programmer in C. For most programming tasks, C /is/ archaic - it is limited, inflexible, and error prone. For some tasks, its limitations and its stability as a language are an advantage. But for many tasks, if one could disregard existing C experience, it is a poor choice of language. Thus a lot of software on bigger systems is written in higher level languages, such as Python, Ruby, etc. A lot of software in embedded systems is written in C++ to keep maximal run-time efficiency while getting more powerful development features. New languages such as Go are developed to get a better balance of the advantages of different languages and features.

For a good deal of embedded development, the way forward is to avoid archaic and brain-dead microcontrollers such as the 8051 or the PIC. Stick to solid but modern processors such as ARM or MIPS cores. And move to C++ - /if/ you are good enough to learn and understand how to use that language well in embedded systems. I would wait a few years, not weeks, but not decades, before adopting new languages for embedded programming. Maybe Go will be a better choice in a few years.

We've been through all this with "C vs. assembly" - and there are plenty of people that still use assembly programming for embedded systems because "it was good enough for my grandfather, it's good enough for me", or because they simply refuse to move forward with the times. Like assembly, C will never go away - but it /will/ move further and further into niche areas, and be used "for compatibility with existing code and systems". In the meantime, we can try and write our C code in the best way that modern tools allow.
> > Programming is still an art, not a science. You rely on techniques that > have given you success in the past. When 25% of product returns are due > to "I couldn't figure out how to USE this PoS!", that suggests current > development styles are probably "lacking". > <https://books.google.com/books?id=pAsCYZCMvOAC&pg=PA130&lpg=PA130&dq=product+returns+confusion&source=bl&ots=P_v4nTI0m8&sig=XZ6VGEOtuyJOG7Kwkcd2SMV2_6w&hl=en&sa=X&ved=0ahUKEwiDwtDKt4vPAhVk0YMKHfO-C-AQ6AEINjAD> >
We have only been talking about coding styles, which are a small part of development styles. And development styles are only a small part of products as a whole. Learning to use spaces appropriately and not using Yoda-speak for your conditionals will not mean end-users will automatically like your product!
> >>>>> [would you consider "if (foo==bar())" as preferred to "if >>>>> (bar()==foo)"?] >>>> >>>> I'd usually try to avoid a function call inside an expression inside a >>>> conditional, but if I did then the guiding principle is to make it >>>> clear >>>> to read and understand. >>> >>> Which do you claim is cleared? >>> Imagine foo is "errno". >>> After answering, rethink it knowing that errno is actually a function >>> invocation on this platform. >> >> Neither is clear enough. > > So, what's your solution?
It would depend on the rest of the context, which is missing here, but I'd guess it would be something like:

    int noOfWhatsits = bar();
    if (noOfWhatsits == foo) {
        ...
    }

Local variables are free, and let you divide your code into clear and manageable parts, and their names let you document your code. I use them a lot. (I have also used older and weaker compilers that generated poorer code if you had lots of local variables - I am glad my current style does not have to handle such tools.)
> >>>>> Much of this is a consequence of how I learned to "read" (subvocalize) >>>>> code to myself: >>>>> x = y; >>>>> is "x gets the value of y", not "x equals y". (how would you read >>>>> x ::= >>>>> y ?) >>>>> if ( CONSTANT == expression ) >>>>> is "if expression yields CONSTANT". Catching myself saying "if >>>>> CONSTANT >>>>> gets >>>>> expression" -- or "if variable gets expression": >>>>> if ( CONSTANT = expression ) >>>>> if ( variable = expression ) >>>>> is a red flag that I'm PROBABLY not doing what I want to do. >>>>> >>>>> I'll wager that most folks read: >>>>> x = y; >>>>> as: >>>>> x equals y >>>> >>>> I can't say I vocalize code directly - how does one read things like >>>> brackets or operators like -> ? But "x = y" means "set x equal to y". >>> >>> And, x ::= y effectively says the same -- do you pronounce the colons? >>> Instead, use a verb that indicates the action being performed: >>> x gets y >> >> I don't pronounce colons. I also don't use a programming language >> with a ::= >> operator. But if I did want to pronounce it, then "x gets y" would be >> fine. > > Then you come up with alternative ways of conveying the information > present in that symbology. > > E.g., "::=" (used to initialize a variable) vs '=' (used to define > *constants*, but not variables, and test for value equality) vs ":=:" > (to test for address equality) vs ":=" (chained assignment) in Algol; > '=' vs "==" in C; ":=" vs. '=' and ':' and "==" in Limbo; etc. >
Different languages have different symbols - yes, I know that.
> When conversing with someone not fluent in a language, the actual > punctuation plays an important role.
I am lucky enough to have full use of my hands and my eyes, as well as my mouth. The same applies to other people I discuss code with. I would not try to distinguish "x = y" and "x == y" verbally - I would /write/ it.
> Saying "x gets y" to someone > who isn't familiar with Algol's "::=" would PROBABLY find them > writing "x = y". When "reading" Limbo code to someone, I resort > to "colon-equals", "equals" and "colon" so I'm sure they know > exactly what I'm saying (because the differences are consequential) > >>> I learned to code with left-facing arrows instead of equal signs >>> to make the "assignment" more explicit. >> >> I think that <- or := is a better choice of symbol for assignment. The >> designers of C made a poor choice with "=". >> >>>>> and subtly change their intent, but NOT their words, when >>>>> encountering: >>>>> if ( x == y ) >>>>> as: >>>>> if x equals y >>>> >>>> "if x equals y", or "if x is equal to y". >>>> >>>>> (how would they read "if ( x = y )"?) >>>> >>>> I read it as "warning: suggest parentheses around assignment used as >>>> truth value" :-) >>> >>> No, your *compiler* tells you that. If you've NEVER glossed over >>> such an assignment in your code, you've ALWAYS had a compiler holding >>> your hand for you. I'm glad I didn't have to wait 10 years for such >>> a luxury before *I* started using C. >> >> I have used warnings on such errors for as long as I have used >> compilers that >> conveniently warned about them. But I can say it is very rare that a >> compiler >> /has/ warned about it - it is not a mistake I have made more than a >> couple of >> times. Still, I want my tools to warn me if it were to happen. > > I want my tools to warn me of everything that they are capable of > detecting.
Good. Then we agree - make the best use of the best tools available.
> INCLUDING the tool that occupies my cranial cavity!
I agree. That means not distracting it with things that are easily found automatically by compilers and other tools, so that your mind can concentrate on the difficult stuff.
> >>> I'll frequently store pointers to functions where you might store >>> a variable that you conditionally examine (and then dispatch to >>> one of N places in your code). In my case, I just invoke the >>> function THROUGH the pointer -- the tests were done when I *chose* >>> which pointer to stuff into that variable (why should I bother to >>> repeat them each time this code fragment is executed?) >>> >>> When you do code reviews, do you all sit around with laptops and >>> live code copies? Or, do you pour over LISTINGS with red pens? >>> Do you expect people to just give the code a cursory once-over? >>> Or, do you expect them to *read* the code and act as "human >>> compilers/processors"? >> >> I expect code to be as simple to understand as possible, so that it >> takes as >> little time or effort as possible for other people to read. That way >> others can >> spend their effort on confirming that what the code does is the right >> thing, >> rather than on figuring out what the code does first. > > What's simple to me might not be simple to you! E.g., I naturally > think in terms of pointers. Others prefer array indexes. For me > to jump/call *through* a pointer is far more obvious than encoding some > meaning into a flag at some point in the program. Then, decoding that > flag at another point and dispatching based on that decode! The latter > requires keeping two pieces of code in sync (encode & decode). The > former puts everything in one place (encoding).
Agreed - there is plenty of scope for personal variation and style here.
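A small sketch of that "decide once, then call through the pointer" pattern, with invented handler names:

    #include <stdio.h>

    typedef void (*handler_t)(void);

    static void handle_fast(void) { puts("fast path"); }
    static void handle_safe(void) { puts("safe path"); }

    int main(void)
    {
        /* Decide once, up front, which behaviour applies... */
        int want_fast = 1;                       /* stand-in for a real configuration test */
        handler_t handler = want_fast ? handle_fast : handle_safe;

        /* ...then every later call site just dispatches through the pointer,
         * instead of re-testing a flag at each point of use. */
        for (int i = 0; i < 3; i++)
            handler();

        return 0;
    }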
> > Should I write code at a level that a newbie developer can understand? > Should I limit myself to how expressive and productive I can be out of fear > that someone might not be quick to grasp what I've written?
The right balance here will vary depending on the circumstances - there is no single correct answer (but there are many wrong answers).
> >>> There is no such thing as a perfect style. You adopt a style that >>> works for you and measure its success by how buggy the code is (or >>> is not) that you produce and the time that it takes you to produce it. >> >> Yes. >> >>> >>> I suspect if I sat you down with some of these early and one-of-a-kind >>> compilers, you'd spend a helluva lot more time chasing down bugs that >>> the compiler didn't hint at. And, probably run the risk of some >>> creeping >>> into your release (x=y=3) that you'd never caught. >> >> No. >> >> Static checking by the compiler is not a substitute for writing >> correct code. > > Where am I writing "INcorrect code"? It meets the specifications of the > language. It compiles without errors or warnings. It fulfills the goals > of the specification.
There is more to writing good code than that (and I know you know that). Whether you call bad code that happens to work "correct" or "incorrect" is up to you. But my point here was that you seem to imply I write code with little regard for it being correct or incorrect, and then rely on the compiler to find my errors.
> > It's just that *you* don't like my style! Is that what makes it "not > correct"?
I suspect that in the great majority of cases where I don't like your style, then it is nothing more than that. I might think it is not clear or easy to understand, or not as maintainable as it could be, or simply looks ugly and hard to read, or that it is not as efficient as other styles. I can't say for sure, since about the only things I know for sure about your style is that you like to write "if (3 == x)" rather than "if (x == 3)", and that you like function pointers. It takes a lot more than that for me to label code as "incorrect" or "bad" (assuming the final result does the job required).
> If I replaced the identifiers for manifest constants with lowercase > symbols, > would that make it MORE correct? Should I use camelcase identifiers? > Hungarian notation? Embedded underscores? How do any of these make the > code more or less "correct"?
If a change makes code clearer, then it is a good thing. Visually splitting the words in a multi-word identifier makes code clearer - whether that is done using camelCase or underscores is a minor issue. Small letters are easier to read (that's why they exist!), and avoid unnecessary emphasis - that makes them a good choice in most cases. And there is rarely any benefit in indicating that an identifier is a constant or a macro (assuming it is defined and used sensibly) - so there is no point in making such a dramatic distinction.
> >> You make it sound like I program by throwing a bunch of random symbols >> at the >> compiler, then fixing each point it complains about. The checking is >> automated >> confirmation that I am following the rules I use when coding, plus >> convenient >> checking of silly typos that anyone makes on occasion. >> >>> Instead of pushing an update to the customer over the internet, you'd >>> send someone out to their site to install a new set of EPROMs. Or, >>> have them mail the product back to you as their replacement arrived >>> (via USPS). Kind of annoying for the customer when he's got your >>> device on his commercial fishing boat. Or, has to lose a day's >>> production >>> while it's removed from service, updated and then recertified. >> >> I have mailed EPROMs to customers, or had products delivered back for >> program >> updates. But never due to small mistakes of the sort we are >> discussing here. > > I've *never* had to update a product after delivery. Because the cost of > doing so would easily exceed the profit margin in the product! > > I've had clients request modifications to a product; or tweeks for > specific customers. But, those aren't "bug fixes", they're effectively > "new products" that leverage existing hardware.
And those are updates after delivery. There are many perfectly good reasons for updating software after delivery. All I said was that I have provided updates in a variety of ways, and for a variety of reasons - but never for the sort of mistakes that you seem to think you are immune to because you learned to program with limited tools, while you think /I/ make them all the time because I take advantage of modern tool features.
> "No, there will be no upgrades. You will get it right the first time!" >
That is fine for some projects. I have had cards that have been cemented into the ocean floor - upgrades are not physically possible. And on other projects, customers want to be able to have new features or changes at a later date. I think everyone agrees that shipping something that does not work correctly, and updating for bug fixes, is always a bad idea - just /how/ bad it is will vary.
>> The new programmer should learn to take advantage of new tools, and >> concentrate >> efforts on problems that are relevant rather than problems are no >> longer an >> issue (to the extent that they ever /were/ an issue). And if that new >> programmer is faced with poorer tools, then he or she will have to >> learn the >> idiosyncrasies at the time. And the lack of warnings on "if (x = 1)" is >> unlikely to be the most important point. > > Without the warning, the developer is likely to waste a boatload of > time "seeing what he wants to see" and not what the *compiler* sees.
Without decent warnings, developers (especially new ones) are likely to spend a good deal more time chasing small bugs than they would if the compiler or linter helped them out. But why do you think this particular issue is so important? New C programmers are often told how important it is to distinguish between = and ==, so it is something they look out for, and avoid in most cases. And the Yoda rule only helps in /some/ cases where you have comparisons - you still need to get your = and == right everywhere else.
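For completeness, the case in question - any mainstream compiler with warnings enabled (e.g. gcc or clang with -Wall, which includes -Wparentheses) flags the first form whether or not the constant is on the left:

    void demo(int x)
    {
        if (x = 1) {        /* assignment: warned about by -Wall / -Wparentheses  */
            /* ... */
        }
        if (x == 1) {       /* comparison: what was almost certainly intended     */
            /* ... */
        }
        if (1 == x) {       /* "Yoda" form: "1 = x" would be a hard compile error */
            /* ... */
        }
    }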
> >>> There's a reason you can look at preprocessor output and ASM >>> sources. Because they are the only way to understand some nasty >>> bugs that may arise from your misunderstanding of what a tool >>> SHOULD be doing; *or*, from a defective tool! (a broken tool >>> doesn't have to tell you that it's broken!) >> >> I have almost /never/ had occasion to look at pre-processor output. >> But I do >> recommend that embedded programmers should be familiar enough with the >> assembly >> for their targets that they can look at and understand the assembly >> listing. > > You've probably never "abused" the preprocessor in creative ways! :> >
I have abused preprocessors a bit (any use of ## is abuse!), but I haven't had to look at the output directly to debug that abuse. Maybe I haven't been creative enough in my abuses here.
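An example of the sort of ## abuse I mean - a pasted-together register accessor, with invented names standing in for real memory-mapped registers:

    #include <stdint.h>

    /* Token pasting: PORT_REG(2, DIR) expands to the identifier PORT2_DIR.
     * Handy for families of similarly named registers, but the generated
     * names never appear literally in the source - which is exactly why
     * this style gets called preprocessor abuse. */
    #define PORT_REG(port, reg)   PORT##port##_##reg

    volatile uint8_t PORT2_DIR;           /* stand-in for a real hardware register */

    static void make_output(void)
    {
        PORT_REG(2, DIR) = 0xFF;          /* expands to: PORT2_DIR = 0xFF; */
    }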
Reply by September 13, 2016
On Tue, 13 Sep 2016 01:11:09 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >> For programming, the most convenient way was to switch the terminal to >> US-ASCII, but unfortunately the keyboard layout also changed. >> >> Digraphs and trigraphs partially solved this problem but was ugly. >> >> With the introduction of the 8 bit ISO Latin, a large number of >> languages and programming could be handled with that single code page, >> greatly simplifying data exchange in Europe, Americas, Africa and >> Australia. > >The problem with Unicode is that it makes the problem space bigger. >Its relatively easy for a developer to decide on appropriate >syntax for file names, etc. with ASCII, Latin1, etc. But, start >allowing for all these other code points and suddenly the >developer needs to be a *linguist* in order to understand what >should/might be an appropriate set of constraints for his particular >needs.
For _file_ names not a problem, stop scanning at next white space (or null in C). Everything in between is the file name, no matter what characters are used. For _path_ specifications, there must be some rules how to separate the node, path, file name, file extension and file version from each other. The separator or other syntactic elements are usually chosen from the original 7 bit ASCII character set. What is between these separators is irrelevant.
>Also, we *tend* to associate meaning with each of the (e.g.) ASCII >code points. So, 16r31 is the *digit* '1'. There's concensus on >that interpretation.
For _numeric_entry_ fields, including the characters 0-9 requires Arabic numbers fallback mapping. As strange as it sounds, the numbers used in Arabic countries differ from those used in Europe.
>However, Unicode just tabulates *glyphs* and deprives many of them >of any particular semantic value. E.g., if I pick a glyph out >of U+[2800,28FF], there's no way a developer would know what my >*intent* was in selecting the glyph at that codepoint.
As long as it is just payload data, why should the programmer worry about it ?
>It could >"mean" many different things (the developer has to IMPOSE a >particular meaning -- like deciding that 'A' is not an alphabetic >but, rather, a "hexadecimal character" IN THIS CONTEXT)
In Unicode, there are code points for hexadecimal 0 to F. Very good idea to separate the dual usage for A to F. Has anyone actually used those code points ?
Reply by Don Y September 13, 2016
On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote:
> For programming, the most convenient way was to switch the terminal to > US-ASCII, but unfortunately the keyboard layout also changed. > > Digraphs and trigraphs partially solved this problem but was ugly. > > With the introduction of the 8 bit ISO Latin, a large number of > languages and programming could be handled with that single code page, > greatly simplifying data exchange in Europe, Americas, Africa and > Australia.
The problem with Unicode is that it makes the problem space bigger. It's relatively easy for a developer to decide on appropriate syntax for file names, etc. with ASCII, Latin1, etc. But, start allowing for all these other code points and suddenly the developer needs to be a *linguist* in order to understand what should/might be an appropriate set of constraints for his particular needs.

Also, we *tend* to associate meaning with each of the (e.g.) ASCII code points. So, 16r31 is the *digit* '1'. There's consensus on that interpretation.

However, Unicode just tabulates *glyphs* and deprives many of them of any particular semantic value. E.g., if I pick a glyph out of U+[2800,28FF], there's no way a developer would know what my *intent* was in selecting the glyph at that codepoint. It could "mean" many different things (the developer has to IMPOSE a particular meaning -- like deciding that 'A' is not an alphabetic but, rather, a "hexadecimal character" IN THIS CONTEXT)
Reply by September 13, 2016
On Mon, 12 Sep 2016 22:52:00 +0200, Hans-Bernhard Bröker
<HBBroeker@t-online.de> wrote:

>On 11.09.2016 at 15:14, upsidedown@downunder.com wrote: >> On Sun, 11 Sep 2016 13:36:59 +0200, Hans-Bernhard Bröker >> <HBBroeker@t-online.de> wrote: > >>> It may appear to provide a solution to the issue: "How to keep non-ASCII >>> characters in a plain char?" > >> Unsigned char was the answer for at least 5-10 years before UCS2. > >It really wasn't, because as a solution it was incoherent, incomplete, >and insular. Anyway, there really weren't that many years between C90 >and UCS2. > >8-bit unsigned char for non-ASCII characters created more problems than >it ever solved. Instead of having one obvious, while painful >restriction, we now had > >*) dozens of conflicting interpretations of the same 256 values, >*) no standard way of knowing which of those applied to given input >*) almost no way of combining multiple such streams internally >*) no useful way of outputting a combined stream
Exactly the same problems existed with the 7 bit ISO 646 from the 1960s/70s. In ISO 646 some of the US-ASCII code points were replaced by a national character, such as @ [ \ ] { | } _ ^ ~ $ #

The replacement characters were typically different for each country, or at least each language. Trying to combine texts from different languages produced a mess. This was a problem especially in countries with multiple official languages. Also writing names of foreign origin caused problems.

Think about Pascal or C programming, with the missing [ \ ] { | } ^ characters, which in one code page might have been displayed as Ä Ö Å ä ö å but as something different in another code page. This was especially problematic for printing with a language-specific printer (e.g. daisy wheel), giving quite different-looking printouts depending on which printer was used.

For programming, the most convenient way was to switch the terminal to US-ASCII, but unfortunately the keyboard layout also changed.

Digraphs and trigraphs partially solved this problem but were ugly.

With the introduction of the 8 bit ISO Latin, a large number of languages and programming could be handled with that single code page, greatly simplifying data exchange in Europe, the Americas, Africa and Australia.
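For those who never met them: the digraph spellings are still valid C today (the trigraphs such as ??< ??> ??( ??) ??= were finally removed in C23), so something like this still compiles:

    %:include <stdio.h>              /* %: is the digraph for #          */

    int main(void)
    <%                               /* <% and %> stand for { and }      */
        int a<:3:> = <%1, 2, 3%>;    /* <: and :> stand for [ and ]      */
        printf("%d\n", a<:1:>);      /* prints 2                         */
        return 0;
    %>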