Reply by Don Y September 14, 2016
On 9/13/2016 11:07 PM, upsidedown@downunder.com wrote:
>> There's just WAY too much effort involved making sense of Unicode >> in a way that your *users* will appreciate and consider intuitive. >> When faced with the I18N/L10N issues, I found it MUCH easier to >> *punt* (if they don't speak English, that's THEIR problem; or, a >> task for someone with more patience/resources than me!) > > Why would a person have to speak English or even know latin letters to > use a computer such as a cell phone ?
I'm not designing a cell phone. And, both parties (user and device) *speak* to each other. Should I also add learning foreign pronunciation algorithms to my list of design issues? :>
Reply by September 14, 2016
On Tue, 13 Sep 2016 15:41:24 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 9/13/2016 3:24 AM, upsidedown@downunder.com wrote: >> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y >> <blockedofcourse@foo.invalid> wrote: >> >>> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >>>> For programming, the most convenient way was to switch the terminal to >>>> US-ASCII, but unfortunately the keyboard layout also changed. >>>> >>>> Digraphs and trigraphs partially solved this problem but was ugly. >>>> >>>> With the introduction of the 8 bit ISO Latin, a large number of >>>> languages and programming could be handled with that single code page, >>>> greatly simplifying data exchange in Europe, Americas, Africa and >>>> Australia. >>> >>> The problem with Unicode is that it makes the problem space bigger. >>> Its relatively easy for a developer to decide on appropriate >>> syntax for file names, etc. with ASCII, Latin1, etc. But, start >>> allowing for all these other code points and suddenly the >>> developer needs to be a *linguist* in order to understand what >>> should/might be an appropriate set of constraints for his particular >>> needs. >> >> For _file_ names not a problem, stop scanning at next white space (or >> null in C). Everything in between is the file name, no matter what >> characters are used. > >The file system code is easy cuz it doesn't have to impart meaning to >any particular "characters". But, a there are typically human beings >involved who *do* impart meaning to certain glyphs (as well as the >OS itself), the developer can't ignore that expected meaning. > >I want to use the eight character name "////\\\\". Or, "::::::::". >Or, "><&&&&><". > >How do I differentiate between "RSTUV" and "RSTU<roman numeral 5>"?
That depends on usage. For text rendering, either provide a slightly modified glyph for U+2164 (ROMAN NUMERAL FIVE) or fall back to the letter V (which also works for normal text sorting). However, if the numeric value is of interest, a sequence of Roman numerals such as LXV would be handled as the numeric value 65.
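Roughly like this, as an untested sketch (only the U+2160..U+2169 block is handled; the function name is mine, everything else is up to the caller):

    #include <stdint.h>
    #include <stddef.h>

    /* Map a Unicode Roman-numeral code point (U+2160..U+2169, one..ten)
     * to an ASCII fallback string for sorting or searching.
     * Returns NULL if the code point is not in that block. */
    static const char *roman_fallback(uint32_t cp)
    {
        static const char *ascii[] = {
            "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"
        };
        if (cp >= 0x2160 && cp <= 0x2169)
            return ascii[cp - 0x2160];
        return NULL;   /* not a Roman numeral; caller keeps the original glyph */
    }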
>(i.e., people already have trouble visually disambiguating between >the '0' and 'O' glyphs, '1' and 'l', etc.)
This is a glyph problem: similar-looking symbols map to different code points.
>Do you allow control characters in file/pathnames?
Such as NUL? Why not, though the usage might be a bit problematic.
>Why *not* whitespace"? I have a file called "My To-Do List" and >don't have any problems accessing it, etc.
With a purely GUI user interface, not a big problem. However, direct command-line entry or scripts will require all kinds of escape mechanisms.
> >The developer (not the character set) is responsible for presenting >this information in a form that is useful to the user. E.g., MS >will sort files named "1" through "99" as: > 1, 2, ... 10, 11... 20, 21... 99 >whereas UNIX will use a L-R alpha sort: > 1, 10, 11..., 2, 20, 21... 99 > >Will the "RSTUV" user above expect to see "RSTU<roman numeral 5>" >immediately following "RSTUU"? And, where would "RSTUV" fit in >that scheme?
That depends on the sorting locale. Since collation rules usually differ from language to language, you could create your own locale that sorts exactly the way you want.
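As an illustration only (the locale name below is just an example and must exist on the system), locale-aware ordering in C is setlocale() plus strcoll():

    #include <locale.h>
    #include <string.h>
    #include <stdio.h>

    /* Compare two file names using the collation rules of the current
     * LC_COLLATE locale instead of raw byte values. */
    static int name_cmp(const char *a, const char *b)
    {
        return strcoll(a, b);
    }

    int main(void)
    {
        /* "en_US.UTF-8" is only an example locale name. */
        if (setlocale(LC_COLLATE, "en_US.UTF-8") == NULL)
            setlocale(LC_COLLATE, "");   /* fall back to the environment's locale */

        printf("%d\n", name_cmp("RSTUU", "RSTUV"));
        return 0;
    }

Defining a custom locale is then an OS configuration exercise (localedef on glibc systems) rather than a coding one.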
>> For _path_ specifications, there must be some rules how to separate >> the node, path, file name, file extension and file version from each >> other. The separator or other syntactic elements are usually chosen >> from the original 7 bit ASCII character set. What is between these >> separators is irrelevant. > >I parse my namespaces without regard to specific separators. >So, in some objects' identifiers, '/' can be a valid character >while it may have delimited the start of that object name >in the *containing* object. E.g., the single "o/b/j/e/c/t" exists >within the named "container" in the following: > container/o/b/j/e/c/t > >>> Also, we *tend* to associate meaning with each of the (e.g.) ASCII >>> code points. So, 16r31 is the *digit* '1'. There's concensus on >>> that interpretation. >> >> For _numeric_entry_ fields, including the characters 0-9 requires >> Arabic numbers fallback mapping. > >Why? shouldn't the glyphs for the Thai digits (or Laotian) also be >recognized as numeric? Likewise for the roman numerals, the *fullwidth* >(arabic) digits, etc.?
Yes, of course: use a fallback character mapping from Thai digit one to Arabic digit 1. The Roman numerals are more complex, since they do not use a positional system.
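A minimal sketch of such a fallback for numeric entry, assuming the input has already been decoded from UTF-8 to code points (the block ranges are the standard Unicode assignments; the function name is mine):

    #include <stdint.h>

    /* Return the numeric value 0..9 of a decimal-digit code point from a
     * few common blocks, or -1 if it is not a recognized digit.
     * Ranges handled: ASCII, Arabic-Indic, Thai, fullwidth forms. */
    static int digit_value(uint32_t cp)
    {
        if (cp >= 0x0030 && cp <= 0x0039) return (int)(cp - 0x0030); /* ASCII 0-9     */
        if (cp >= 0x0660 && cp <= 0x0669) return (int)(cp - 0x0660); /* Arabic-Indic  */
        if (cp >= 0x0E50 && cp <= 0x0E59) return (int)(cp - 0x0E50); /* Thai          */
        if (cp >= 0xFF10 && cp <= 0xFF19) return (int)(cp - 0xFF10); /* fullwidth     */
        return -1;
    }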
> >A particular braille glyph means different things based on how >the application/user interprets it. > >For example, the letter 'A' is denoted by a single dot in the upper >left corner of the cell. 'B' adds the dot immediately below while >'C' adds the dot to the immediate right, instead. > >Yet, in music braille, the "note" 'A' is denoted by a cell that >is the union of the letters 'B' and 'C' *less* the letter 'A' > >A B C >*. *. ** >.. *. .. >.. .. .. > >Notes: >A B C >.* .* ** >*. ** .* >.. .. .. > >I.e., the same glyph means different things. Imagine labeling a file >of sound samples with their corresponding "notes"... > >> As strange as it sounds, the numbers used in Arabic countries differs >> from those used in Europe. >> >>> However, Unicode just tabulates *glyphs* and deprives many of them >>> of any particular semantic value. E.g., if I pick a glyph out >>> of U+[2800,28FF], there's no way a developer would know what my >>> *intent* was in selecting the glyph at that codepoint. >> >> As long as it is just payload data, why should the programmer worry >> about it ? > >Because the programmer has to deal with the glyph's *meaning* to the >user. Otherwise, why not just list file names and content as 4 digit >unicode codepoints and eliminate the hassle of rendering fonts, >imparting meaning, etc.? > >>> It could >>> "mean" many different things (the developer has to IMPOSE a >>> particular meaning -- like deciding that 'A' is not an alphabetic >>> but, rather, a "hexadecimal character" IN THIS CONTEXT) >> >> In Unicode, there are code points for hexadecimal 0 to F. Very good >> idea to separate the dual usage for A to F. >> Has anyone actually used those code points ? > >The 10 arabic digits that we are accustomed to exist as >U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are >the 10 arabic-indic digits, 10 extended arabic-indic digits, >10 mongolian digits, 10 laotian digits, etc.
Use fallback mapping for numeric entry; use the original code points for rendering.
>[We'll ingore dingbat digits, circled digits, super/subscripted >digits, etc.]
Use fallback mapping.
>Then, stacking glyphs (e.g., the equivalent of diacriticals)... > >There's just WAY too much effort involved making sense of Unicode >in a way that your *users* will appreciate and consider intuitive. >When faced with the I18N/L10N issues, I found it MUCH easier to >*punt* (if they don't speak English, that's THEIR problem; or, a >task for someone with more patience/resources than me!)
Why would a person have to speak English or even know Latin letters to use a computer such as a cell phone?
Reply by Don Y September 13, 2016
On 9/13/2016 3:24 AM, upsidedown@downunder.com wrote:
> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y > <blockedofcourse@foo.invalid> wrote: > >> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >>> For programming, the most convenient way was to switch the terminal to >>> US-ASCII, but unfortunately the keyboard layout also changed. >>> >>> Digraphs and trigraphs partially solved this problem but was ugly. >>> >>> With the introduction of the 8 bit ISO Latin, a large number of >>> languages and programming could be handled with that single code page, >>> greatly simplifying data exchange in Europe, Americas, Africa and >>> Australia. >> >> The problem with Unicode is that it makes the problem space bigger. >> Its relatively easy for a developer to decide on appropriate >> syntax for file names, etc. with ASCII, Latin1, etc. But, start >> allowing for all these other code points and suddenly the >> developer needs to be a *linguist* in order to understand what >> should/might be an appropriate set of constraints for his particular >> needs. > > For _file_ names not a problem, stop scanning at next white space (or > null in C). Everything in between is the file name, no matter what > characters are used.
The file system code is easy cuz it doesn't have to impart meaning to any particular "characters". But, as there are typically human beings involved who *do* impart meaning to certain glyphs (as well as the OS itself), the developer can't ignore that expected meaning.

I want to use the eight character name "////\\\\". Or, "::::::::". Or, "><&&&&><".

How do I differentiate between "RSTUV" and "RSTU<roman numeral 5>"? (i.e., people already have trouble visually disambiguating between the '0' and 'O' glyphs, '1' and 'l', etc.)

Do you allow control characters in file/pathnames?

Why *not* whitespace? I have a file called "My To-Do List" and don't have any problems accessing it, etc.

The developer (not the character set) is responsible for presenting this information in a form that is useful to the user. E.g., MS will sort files named "1" through "99" as:
    1, 2, ... 10, 11... 20, 21... 99
whereas UNIX will use an L-R alpha sort:
    1, 10, 11..., 2, 20, 21... 99

Will the "RSTUV" user above expect to see "RSTU<roman numeral 5>" immediately following "RSTUU"? And, where would "RSTUV" fit in that scheme?
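As an aside, that "MS-style" ordering is just a numeric-aware compare. A rough ASCII-only sketch (untested, name invented):

    #include <ctype.h>
    #include <string.h>

    /* Compare two names so that embedded decimal numbers sort by value:
     * "2" < "10", "file9" < "file10".  ASCII-only sketch; treats "01" == "1". */
    static int natural_cmp(const char *a, const char *b)
    {
        while (*a && *b) {
            if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
                while (*a == '0') a++;            /* skip leading zeros        */
                while (*b == '0') b++;
                size_t la = strspn(a, "0123456789");
                size_t lb = strspn(b, "0123456789");
                if (la != lb) return (la < lb) ? -1 : 1;   /* longer run = bigger */
                int c = strncmp(a, b, la);        /* same length: compare digits */
                if (c) return c;
                a += la; b += lb;
            } else {
                if (*a != *b) return (unsigned char)*a - (unsigned char)*b;
                a++; b++;
            }
        }
        return (unsigned char)*a - (unsigned char)*b;
    }

Real implementations (glibc's strverscmp(), for example) handle more corner cases.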
> For _path_ specifications, there must be some rules how to separate > the node, path, file name, file extension and file version from each > other. The separator or other syntactic elements are usually chosen > from the original 7 bit ASCII character set. What is between these > separators is irrelevant.
I parse my namespaces without regard to specific separators. So, in some objects' identifiers, '/' can be a valid character while it may have delimited the start of that object name in the *containing* object. E.g., the single "o/b/j/e/c/t" exists within the named "container" in the following: container/o/b/j/e/c/t
>> Also, we *tend* to associate meaning with each of the (e.g.) ASCII >> code points. So, 16r31 is the *digit* '1'. There's concensus on >> that interpretation. > > For _numeric_entry_ fields, including the characters 0-9 requires > Arabic numbers fallback mapping.
Why? Shouldn't the glyphs for the Thai digits (or Laotian) also be recognized as numeric? Likewise for the roman numerals, the *fullwidth* (arabic) digits, etc.?

A particular braille glyph means different things based on how the application/user interprets it.

For example, the letter 'A' is denoted by a single dot in the upper left corner of the cell. 'B' adds the dot immediately below while 'C' adds the dot to the immediate right, instead.

Yet, in music braille, the "note" 'A' is denoted by a cell that is the union of the letters 'B' and 'C' *less* the letter 'A'

A  B  C
*. *. **
.. *. ..
.. .. ..

Notes:
A  B  C
.* .* **
*. ** .*
.. .. ..

I.e., the same glyph means different things. Imagine labeling a file of sound samples with their corresponding "notes"...
> As strange as it sounds, the numbers used in Arabic countries differs > from those used in Europe. > >> However, Unicode just tabulates *glyphs* and deprives many of them >> of any particular semantic value. E.g., if I pick a glyph out >> of U+[2800,28FF], there's no way a developer would know what my >> *intent* was in selecting the glyph at that codepoint. > > As long as it is just payload data, why should the programmer worry > about it ?
Because the programmer has to deal with the glyph's *meaning* to the user. Otherwise, why not just list file names and content as 4 digit unicode codepoints and eliminate the hassle of rendering fonts, imparting meaning, etc.?
>> It could >> "mean" many different things (the developer has to IMPOSE a >> particular meaning -- like deciding that 'A' is not an alphabetic >> but, rather, a "hexadecimal character" IN THIS CONTEXT) > > In Unicode, there are code points for hexadecimal 0 to F. Very good > idea to separate the dual usage for A to F. > Has anyone actually used those code points ?
The 10 arabic digits that we are accustomed to exist as U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are the 10 arabic-indic digits, 10 extended arabic-indic digits, 10 mongolian digits, 10 laotian digits, etc.

[We'll ignore dingbat digits, circled digits, super/subscripted digits, etc.]

Then, stacking glyphs (e.g., the equivalent of diacriticals)...

There's just WAY too much effort involved making sense of Unicode in a way that your *users* will appreciate and consider intuitive. When faced with the I18N/L10N issues, I found it MUCH easier to *punt* (if they don't speak English, that's THEIR problem; or, a task for someone with more patience/resources than me!)
Reply by David Brown September 13, 2016
On 13/09/16 17:22, Dennis wrote:
> On 09/13/2016 05:25 AM, David Brown wrote: > >> >> >> If a change makes code clearer, then it is a good thing. Visually >> splitting the words in a multi-word identifier makes code clearer - >> whether that is done using camelCase or underscores is a minor issue. > > I'll go off on a tangent - it can be an important issue. I once worked > with a guy that was visually impaired and used a screen reader for much > of his work. The underscore form would read as (spoken)word > (spoken)underscore (spoken)word... where the camelCase would cause it to > give up and spell it all out. We referred to the underscore form as > "easy reader code". This was over a decade ago so screen readers may be > smarter now.
Unless it is a screen reader specially designed for code, I'd imagine it would have trouble with camelCase words. I think Don knows more about this sort of program. But you are absolutely right that there can be particular circumstances that determine our choices here, and have overriding importance.
> >> Small letters are easier to read (that's why they exist!), and avoid >> unnecessary emphasis - that makes them a good choice in most cases. And >> there is rarely any benefit in indicating that an identifier is a >> constant or a macro (assuming it is defined and used sensibly) - so >> there is no point in making such a dramatic distinction. >> >
Reply by Dennis September 13, 2016
On 09/13/2016 05:25 AM, David Brown wrote:

> > > If a change makes code clearer, then it is a good thing. Visually > splitting the words in a multi-word identifier makes code clearer - > whether that is done using camelCase or underscores is a minor issue.
I'll go off on a tangent - it can be an important issue. I once worked with a guy that was visually impaired and used a screen reader for much of his work. The underscore form would read as (spoken)word (spoken)underscore (spoken)word... where the camelCase would cause it to give up and spell it all out. We referred to the underscore form as "easy reader code". This was over a decade ago so screen readers may be smarter now.
> Small letters are easier to read (that's why they exist!), and avoid > unnecessary emphasis - that makes them a good choice in most cases. And > there is rarely any benefit in indicating that an identifier is a > constant or a macro (assuming it is defined and used sensibly) - so > there is no point in making such a dramatic distinction. >
Reply by David Brown September 13, 2016
On 13/09/16 12:24, upsidedown@downunder.com wrote:
> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y > <blockedofcourse@foo.invalid> wrote: > >> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >>> For programming, the most convenient way was to switch the terminal to >>> US-ASCII, but unfortunately the keyboard layout also changed. >>> >>> Digraphs and trigraphs partially solved this problem but was ugly. >>> >>> With the introduction of the 8 bit ISO Latin, a large number of >>> languages and programming could be handled with that single code page, >>> greatly simplifying data exchange in Europe, Americas, Africa and >>> Australia. >> >> The problem with Unicode is that it makes the problem space bigger. >> Its relatively easy for a developer to decide on appropriate >> syntax for file names, etc. with ASCII, Latin1, etc. But, start >> allowing for all these other code points and suddenly the >> developer needs to be a *linguist* in order to understand what >> should/might be an appropriate set of constraints for his particular >> needs. > > For _file_ names not a problem, stop scanning at next white space (or > null in C). Everything in between is the file name, no matter what > characters are used. >
That sounds fine - but what is "white space" in unicode? In ASCII, it's space, tab, newline and carriage return characters. In unicode, there are far more. Invisible spaces, non-breaking spaces, spaces of different widths, etc. Did you remember to check for the Ogham space mark, for those Celtic file names? Use UTF-8 and stop on a null character. Just let people put spaces of any sort in their filenames, and you only have to worry about / (or \ and : ) as special characters.
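As a sketch (untested, function name invented) - because UTF-8 never reuses the byte values 0x00 or 0x2F inside a multi-byte sequence, a plain byte scan for '/' and NUL is safe:

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Walk a NUL-terminated UTF-8 path and print each component.
     * A raw '/' byte can only ever be the separator itself, so scanning
     * bytes cannot split a multi-byte character in half. */
    static void list_components(const char *path)
    {
        const char *p = path;
        while (*p != '\0') {
            size_t len = strcspn(p, "/");          /* bytes up to next '/' or NUL */
            if (len > 0)
                printf("component: %.*s\n", (int)len, p);
            p += len;
            if (*p == '/')
                p++;                               /* skip the separator */
        }
    }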
> For _path_ specifications, there must be some rules how to separate > the node, path, file name, file extension and file version from each > other. The separator or other syntactic elements are usually chosen > from the original 7 bit ASCII character set. What is between these > separators is irrelevant. > > >> Also, we *tend* to associate meaning with each of the (e.g.) ASCII >> code points. So, 16r31 is the *digit* '1'. There's concensus on >> that interpretation. > > For _numeric_entry_ fields, including the characters 0-9 requires > Arabic numbers fallback mapping. > > As strange as it sounds, the numbers used in Arabic countries differs > from those used in Europe.
That's because our "Arabic numerals" came from India, not Arabia - though they were brought over to Europe by an Arabic mathematician. I believe that in Arabic, the term for them translates as "Indian numerals".
> >> However, Unicode just tabulates *glyphs* and deprives many of them >> of any particular semantic value. E.g., if I pick a glyph out >> of U+[2800,28FF], there's no way a developer would know what my >> *intent* was in selecting the glyph at that codepoint. > > As long as it is just payload data, why should the programmer worry > about it ? > >> It could >> "mean" many different things (the developer has to IMPOSE a >> particular meaning -- like deciding that 'A' is not an alphabetic >> but, rather, a "hexadecimal character" IN THIS CONTEXT) > > In Unicode, there are code points for hexadecimal 0 to F. Very good > idea to separate the dual usage for A to F. > Has anyone actually used those code points ? >
There are lots of cases where the same glyph exists in multiple unicode code points, for different purposes. I have no idea how often they are used.
Reply by David Brown September 13, 2016
On 13/09/16 06:22, Don Y wrote:
> On 9/12/2016 2:11 PM, David Brown wrote: > >>>>> I would NOT, for example, write: >>>>> x=-1; >>>> >>>> Neither would I - I would write "x = -1;". But I believe I am missing >>>> your point with this example. >>> >>> My example would be parsed as: >>> x =- 1 ; >> >> Parsed by who or what? A compiler would parse it as "x = -1;". I >> assume we >> are still talking about C (or C++) ? > > No. Dig out a copy of Whitesmith's or pcc. What is NOW expressed as > "x -= 1" was originally expressed as "x =- 1". Ditto for =+, =%, etc. > So, if you were a lazy typist and liked omitting whitespace by thinking it > redundant, you typed: > x=-1; > and got: > x=x-1; > instead of: > x = (-1); > >> When writing, I would include appropriate spaces so that it is easy to >> see what >> it means. > > My point is that most folks would think x=-1 (or x=+2, etc.) bound the > sign to the value more tightly than to the assignment operator.
Of course they think "x=-1" means "x = -1"! It has been almost forty years since "x =- 1" has been standard C. Most people also think that television is in colour, you can communicate to Australia by telephone, and flares are no longer in fashion. Life moves on.

In this particular case, the number of people who ever learned to write "x =- 1", and are still working as programmers (or even still alive), is tiny. And the number of those who failed to learn to use "x -= 1" at least 35 years ago must be tinier still. Sure, you can /remember/ it - and remember having to change old code to suit new compilers. An old shopkeeper may remember when he made deliveries with a horse and cart - but he does not insist that new employees know about grooming horses.

Backwards compatibility, and compatibility with existing code, is important. That is why we still have many of the poor choices in the design of C as a language - for compatibility. But with each passing year or decade, compatibility with the oldest code gets less and less relevant - except to historians or the occasional very specialist case.

Of all the lines of C code that are in use today, what fraction were written in pre-K&R C when "x =- 1" was valid? One in a million? One in a hundred million? If we exclude code lines that have been converted to later C standards, then I doubt it is nearly that many.
> > The same sort of reasoning applies to > x=y/*p; > where the '*' ends up binding more tightly to the '/' to produce the > comment introducer, "/*". Note that the problem persists in the > language's current form -- but compilers often warn about it > (e.g., detecting "nested" comments, etc.)
That is different in that the parsing rules for C are quite clear here, and are the same as they always have been - /* starts a comment. But unless you have carefully created a pathological case and use a particularly unhelpful compiler (and editor - in this century, most programmers use editors with syntax highlighting), you are going to spot the error very quickly.

C provides enormous opportunity for accidentally writing incorrect code - in many cases, the result is perfectly acceptable C code and will not trigger any warnings. If you were to take the top hundred categories of typos and small mistakes in C code that resulted in compilable but incorrect code, "x=y/*p" would not feature. It /might/ make it onto a list of the top thousand mistakes. It really is that irrelevant. And it is preventable by using spaces. There is a reason that the space bar is the biggest key on the keyboard.
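For the record, the pitfall in question and the cure, as a tiny illustration (plain standard C, nothing more):

    /* The statement below is what the programmer meant: divide x by the
     * value p points at.  Written without the space -- "y = x" immediately
     * followed by "/" and "*p;" -- the slash and star pair up and open a
     * comment instead, silently swallowing code up to the next comment end. */
    int divide(int x, int *p)
    {
        int y = x / *p;      /* or, more defensively: x / (*p) */
        return y;
    }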
> >>> Nowadays, you would see this as: >>> x -= 1 ; >> >> That is a completely different statement. > > No. It is what the statement above HAS BECOME! (as the language evolved) >
Take your head out of your history books. In C, "x-=1" means "x -= 1", while "x=-1" means "x = -1". That is it. It is a simple fact. It matters little what C used to be, decades before most programmers were born.
>>> Would you have noticed that it was NOT assigning "-1" to x? >>> Would you have wasted that precious, limited timeslot that you >>> had access to "The Machine" chasing down a bug that you could have >>> avoided just by adopting a style that, instead, writes this as: >>> x = -1; >> >> I am afraid I still don't get your point. "x = -1;" is exactly how I >> would >> write it. I believe spacing like that is good style, and helps make >> it clear >> to the reader and the writer how the statement is parsed. And like in >> normal >> text, appropriate spacing makes things easier to read. > > See above. > > Or, better yet, google turns up: > > <http://bitsavers.informatik.uni-stuttgart.de/pdf/chromatics/CGC_7900_C_Programmers_Manual_Mar82.pdf> > > > Perhaps an interesting read for folks didn't have to write code > "back then".
I am not as old as you, but I have been programming for about 35 years. I have had my share of hand-assembling code, burning eeproms, using punched tape, and even setting opcodes with DIP switches with a toggle switch for the clock. But I understand the difference between what I do /now/, and what other programmers do /now/, and what I did long ago.
> >>>>> And, I *would* write: >>>>> p = &buffer[0]; >>>> >>>> So would I - because I think the code looks clearer. When I want p to >>>> be a pointer to the first element of "buffer", that's what I write. >>> >>> You'll more frequently encounter: >>> p = buffer; >> >> I know. But I prefer "p = &buffer[0];", because I think it looks >> clearer and >> makes more sense. To my reading, "buffer" is an array - it does not >> make sense >> to assign an array to a pointer-to-int. (I'm guessing at types here, >> since >> they were never declared - adjust them if necessary if "buffer" was an >> array of >> char or something else.) > > I prefer it because I am more hardware oriented. I think of "objects" > (poor choice of words) residing AT memory addresses. So, it is only > natural for me to think about "the address of the zeroth element of > the array".
I am a hardware man too. And I quite appreciate that interpretation as well.
> >> C converts arrays or array operations into pointers and pointer >> operations in >> certain circumstances. I wish it did not - but I can't change the >> language. >> But just because the language allows you to write code in a particular >> way, and >> just because many people /do/ write code in a particular way, does not >> necessarily mean it is a good idea to do so. >> >>>> I'd use parens in places that you'd consider superfluous -- but that >>>>> made bindings very obvious (cuz the compiler would do what I *told* it >>>>> without suggesting that I might be doing something unintended): >>>> >>>> Me too. But that's for clarity of code for both the reader and the >>>> writer - not because I worry that my tools don't follow the rules of C. >>> >>> The problem isn't that tools don't follow the rules. The problem is >>> that >>> a compiler can be conforming WITHOUT holding your hand and reminding you >>> that what you've typed *is* "valid C" -- just not likely the valid C >>> that you assumed! >> >> Again, I can only say - get a decent compiler, and learn how to use it >> properly. If you cannot use good compiler (because no good compilers >> exist for >> your target, at least not in your budget), then use an additional >> static error >> checker. > > For how many different processors have you coded?
I can't remember - perhaps 20 or so. Z80, 6502, 68k, x86, MIPS, COP8, 8051, PIC16, HP43000, ARM, PPC, AVR, AVR32, MSP430, NIOS, XMOS, TMS430, 56Fxxx That's 18 - there are several more whose names I can't remember, and some that I have programmed on without being familiar with the assembly language.
> I have compilers > for processors that never made it into "mass production". And, for > processors that saw very limited/targeted support. If I'm willing to FUND > the development of a compiler -- knowing that I may be the only > customer for that compiler -- then I can "buy" all sorts of capabilities > in that compiler! > > OTOH, if I have to pass those costs along to a client, he may be far less > excited about how "good the tool is" and, instead, wonder why *I* can't > compensate for the tool's quality.
Long ago, anyone wanting to make a C compiler for a new processor would either buy the front end and write their own code generator, or would pay a compiler company to write the code generator to go with their existing front end. Only hobby developers would write their own C front end - for professionals, it was not worth the money unless they were a full-time compiler development company. So you got your front-end already made, with whatever features and warnings it supported. Clearly, the range of features would vary here. And sometimes you wanted to add your own extra features for better support of the target. Now, anyone wanting to make a C compiler for a new processor starts with either gcc or clang, and writes the code generator - again, the front-end is there already.
> > What were tools like 20 years ago? Could you approach your client/employer > and make the case for investing in the development of a compiler with > 2016 capabilities? Running, *effectively*, on 1995 hardware and OS? > Or, would you, instead, argue that the project should be deferred until > the tools were more capable?
The tools I used 20 years ago were not as good as the ones I use now. And the tools I used 20 years ago were not as good as the best ones available 20 years ago - the budget did not stretch. But now, the budget /does/ stretch to high quality tools - for most microcontrollers, /everybody's/ budget stretches far enough because high quality compiler tools are free or very cheap. There are a few microcontrollers where that is not the case (the 8051, the unkillable dinosaur, being an example), but tool quality and price is a factor many people consider when choosing a microcontroller. And how relevant are 20 year old tools to the work I do /today/, writing code /today/ ? Not very relevant at all, except for occasional maintenance of old projects.
> > Or, would you develop work styles that allowed you to produce reliable > products with the tools at your disposal??
The whole point is that 20 years ago I had to have a style that made sense 20 years ago with the tools of that era. Now I have a style that is suited to the tools of /this/ era. Not a lot has changed, because the guiding principles have been the same, but many details have changed. Function-like macros have morphed into static inline functions, home-made size-specific types have changed to uint16_t and friends, etc. Some of my modern style features, such as heavy use of static assertions, could also have been used 20 years ago - I have learned with time and experience. But I refuse to write modern code of poorer quality and with fewer features simply because those features were not available decades ago - or even because those features are not available on all modern compilers.
> >> To be used in a safe, reliable, maintainable, and understandable way, >> you need >> to limit the C features you use and the style you use. (This applies >> to any >> programming language, but perhaps more to C than other languages.) >> And you use >> whatever tools you can to help spot errors as early as possible. >> >>>>> y = x / (*p); >>>> >>>> Me too. >>> >>> Because the compiler would gladly think "y=x/*p;" was THREE characters >>> before the start of a comment -- without telling you that it was >>> interpreting it as such. Again, you turn the crank and then spend >>> precious time staring at your code, wondering why its throwing an error >>> when none should exist (in YOUR mind). Or, worse, mangling input: >>> >>> y=/*p /* this is the intended comment */ >>> x=3; >>> >> >> Again, spaces are your friend. > > Spaces (or parens) are a stylistic choice. The language doesn't mandate > their use. E.g., this is perfectly valid: > y=/*I want to initialize two variables to the same value > in a single stmt*/x=3; > > It's a bad coding STYLE but nothing that the compiler SHOULD complain > about!
Compilers can, do, and should complain about particularly bad style. It's important that such complaints are optional - and for compatibility, they are usually disabled by default. There is no clear division between what is simply a stylistic choice ("x=3" vs. "x = 3", for example), and what is a really /bad/ idea, such as putting comments in the middle of a statement. Thus any complaints about style need to be configurable.

But there is no doubt that such warnings can be helpful in preventing bugs. Warning on "if (x = 3)" is a fine example. Another is gcc 6's new "-Wmisleading-indentation" warning that will warn on:

    if (x == 1)
        a = 2;
        b = 3;

Code like that is wrong - it is bad code, even if it is perfectly legitimate C code, and even if it happens to work. It is a good thing for compilers to complain about it.
> >>>> Thus I would always write: >>>> >>>> if (expression == constant) { .... } >>>> >>>> (I'm not a fan of shouting either - I left all-caps behind with BASIC >>>> long ago.) >>> >>> I ALWAYS use caps for manifest constants. >> >> Many people do. I don't. I really do not think it helps in any way, >> and it >> gives unnecessary emphasis to a rather minor issue, detracting from the >> readability of the code. > > Would you write: > const zero = 0; > const nine = 9; > for (index = zero; index < nine; index++)... > Or: > const start = zero; > const end = nine; > for (index = start; index < end; index++)... > Or: > for (index = START; index < END; index++)... >
No, I would not write any of that. I would not write "const zero = 0" for several reasons. First, it is illegal C - it needs a type. Second, such constants are usually best declared static. Third, it is pointless making a constant whose name is its value. But I /might/ write:

    static const int start = 0;
    static const int end = 9;
    for (int index = start; index < end; index++) {
        ...
    }

Even that is quite unlikely - "start" and "end" would usually have far better names, while "index" is almost certainly better written as "i". (But that is a matter of style :-) ).
> The latter makes it abundantly clear to me (without having to chase down > the declaration/definition of "start", "end", "zero" or "nine". (who > is to say whether "zero" actually maps to '0' vs. "273" -- 0 degrees K!)
The only thing the START and END form makes abundantly clear is that you really, really want everyone looking at the code to see at a glance that START and END won't change - and that is far more important than anything else about the code, such as what it does. If "zero" does not map to zero, don't call it "zero". Call it "zeroK", or "lowestTemperature", or whatever.
>> >> And again, you are missing the point. I am not writing code for a 40 >> year old >> compiler. Nor is the OP. For the most part, I don't need to write >> unpleasant >> code to deal with outdated tools. > > No, YOU are missing the point!
Certainly we seem to be talking at cross-purposes here. It is a matter of viewpoint who is "missing the point" - probably both of us.
> I'm writing code with 40 years of > "respectible track record". You're arguing that I should "fix" > something (i.e., my style preferences) that isn't broken. Because > it differs from what your *20* year track record has found to be > acceptable. Should we both wait and see what next year's > crop of developers comes up with? Maybe Hungarian notation will be > supplanted by Vietnamese notation? Or, we'll decide that using identifiers > expressed in Esperanto is far more universally readable?
Yes, I am arguing that if something in your style is no longer the best choice for modern programming, then you certainly should consider changing it. Clearly you will not do so without good reason, which is absolutely fine.

I am also arguing against recommending new people adopt a style whose benefits are based on ancient tools and your own personal habits. Modern programmers should adopt a forward-looking style that lets them take advantage of modern tools - there is no benefit in adopting your habits or my habits, simply because /we/ are used to them. There are benefits in using, or at least being familiar with, common idioms and styles. But that should not be an overriding concern. Keep good habits, if they are still good - but drop bad habits.
> >> I have, in the past, used far more limited compilers. I have worked with >> compilers that only supported C90/ANSI, meaning I can't mix >> declarations and >> statements. I have worked with compilers that have such god-awful code >> generation that I had to think about every little statement, and limit >> the >> number of local variables I used. Very occasionally, I have to work >> on old >> projects that were made using such tools - and I have to go back to >> the dark >> ages for a while. >> >> Perhaps, long ago when the tools I had were poor and the company I >> worked for >> did not have the budget for a linter, it might have made sense to >> write "if >> (CONSTANT == expression)". But not now - and I don't think there will >> be many >> developers working today for which it would still make sense. > > I can then argue that we shouldn't bother with such an archaic language, > at all! Look at all the cruft it brings along with it! Why not wait to > see what pops up tomorrow as the language (and development style) du jour? > Or, why risk being early adopters -- let's wait a few *weeks* before > adopting tomorrow's advances!
There is a balance between choosing something that is mature, field proven and familiar, and choosing something that is newer and has benefits such as efficiency, clarity, flexibility, safety, etc.

I think that the large majority of work done in C would be better written in a different language, were it not for two factors - existing code written in C, and existing experience of the programmer in C. For most programming tasks, C /is/ archaic - it is limited, inflexible, and error prone. For some tasks, its limitations and its stability as a language are an advantage. But for many tasks, if one could disregard existing C experience, it is a poor choice of language. Thus a lot of software on bigger systems is written in higher level languages, such as Python, Ruby, etc. A lot of software in embedded systems is written in C++ to keep maximal run-time efficiency while getting more powerful development features. New languages such as Go are developed to get a better balance of the advantages of different languages and features.

For a good deal of embedded development, the way forward is to avoid archaic and brain-dead microcontrollers such as the 8051 or the PIC. Stick to solid but modern processors such as ARM or MIPS cores. And move to C++ - /if/ you are good enough to learn and understand how to use that language well in embedded systems. I would wait a few years, not weeks, but not decades, before adopting new languages for embedded programming. Maybe Go will be a better choice in a few years.

We've been through all this with "C vs. assembly" - and there are plenty of people that still use assembly programming for embedded systems because "it was good enough for my grandfather, it's good enough for me", or because they simply refuse to move forward with the times. Like assembly, C will never go away - but it /will/ move further and further into niche areas, and be used "for compatibility with existing code and systems". In the meantime, we can try and write our C code in the best way that modern tools allow.
> > Programming is still an art, not a science. You rely on techniques that > have given you success in the past. When 25% of product returns are due > to "I couldn't figure out how to USE this PoS!", that suggests current > development styles are probably "lacking". > <https://books.google.com/books?id=pAsCYZCMvOAC&pg=PA130&lpg=PA130&dq=product+returns+confusion&source=bl&ots=P_v4nTI0m8&sig=XZ6VGEOtuyJOG7Kwkcd2SMV2_6w&hl=en&sa=X&ved=0ahUKEwiDwtDKt4vPAhVk0YMKHfO-C-AQ6AEINjAD> >
We have only been talking about coding styles, which are a small part of development styles. And development styles are only a small part of products as a whole. Learning to use spaces appropriately and not using Yoda-speak for your conditionals will not mean end-users will automatically like your product!
> >>>>> [would you consider "if (foo==bar())" as preferred to "if >>>>> (bar()==foo)"?] >>>> >>>> I'd usually try to avoid a function call inside an expression inside a >>>> conditional, but if I did then the guiding principle is to make it >>>> clear >>>> to read and understand. >>> >>> Which do you claim is cleared? >>> Imagine foo is "errno". >>> After answering, rethink it knowing that errno is actually a function >>> invocation on this platform. >> >> Neither is clear enough. > > So, what's your solution?
It would depend on the rest of the context, which is missing here, but I'd guess it would be something like:

    int noOfWhatsits = bar();
    if (noOfWhatsits == foo) {
        ...
    }

Local variables are free, and let you divide your code into clear and manageable parts, and their names let you document your code. I use them a lot. (I have also used older and weaker compilers that generated poorer code if you had lots of local variables - I am glad my current style does not have to handle such tools.)
> >>>>> Much of this is a consequence of how I learned to "read" (subvocalize) >>>>> code to myself: >>>>> x = y; >>>>> is "x gets the value of y", not "x equals y". (how would you read >>>>> x ::= >>>>> y ?) >>>>> if ( CONSTANT == expression ) >>>>> is "if expression yields CONSTANT". Catching myself saying "if >>>>> CONSTANT >>>>> gets >>>>> expression" -- or "if variable gets expression": >>>>> if ( CONSTANT = expression ) >>>>> if ( variable = expression ) >>>>> is a red flag that I'm PROBABLY not doing what I want to do. >>>>> >>>>> I'll wager that most folks read: >>>>> x = y; >>>>> as: >>>>> x equals y >>>> >>>> I can't say I vocalize code directly - how does one read things like >>>> brackets or operators like -> ? But "x = y" means "set x equal to y". >>> >>> And, x ::= y effectively says the same -- do you pronounce the colons? >>> Instead, use a verb that indicates the action being performed: >>> x gets y >> >> I don't pronounce colons. I also don't use a programming language >> with a ::= >> operator. But if I did want to pronounce it, then "x gets y" would be >> fine. > > Then you come up with alternative ways of conveying the information > present in that symbology. > > E.g., "::=" (used to initialize a variable) vs '=' (used to define > *constants*, but not variables, and test for value equality) vs ":=:" > (to test for address equality) vs ":=" (chained assignment) in Algol; > '=' vs "==" in C; ":=" vs. '=' and ':' and "==" in Limbo; etc. >
Different languages have different symbols - yes, I know that.
> When conversing with someone not fluent in a language, the actual > punctuation plays an important role.
I am lucky enough to have full use of my hands and my eyes, as well as my mouth. The same applies to other people I discuss code with. I would not try to distinguish "x = y" and "x == y" verbally - I would /write/ it.
> Saying "x gets y" to someone > who isn't familiar with Algol's "::=" would PROBABLY find them > writing "x = y". When "reading" Limbo code to someone, I resort > to "colon-equals", "equals" and "colon" so I'm sure they know > exactly what I'm saying (because the differences are consequential) > >>> I learned to code with left-facing arrows instead of equal signs >>> to make the "assignment" more explicit. >> >> I think that <- or := is a better choice of symbol for assignment. The >> designers of C made a poor choice with "=". >> >>>>> and subtly change their intent, but NOT their words, when >>>>> encountering: >>>>> if ( x == y ) >>>>> as: >>>>> if x equals y >>>> >>>> "if x equals y", or "if x is equal to y". >>>> >>>>> (how would they read "if ( x = y )"?) >>>> >>>> I read it as "warning: suggest parentheses around assignment used as >>>> truth value" :-) >>> >>> No, your *compiler* tells you that. If you've NEVER glossed over >>> such an assignment in your code, you've ALWAYS had a compiler holding >>> your hand for you. I'm glad I didn't have to wait 10 years for such >>> a luxury before *I* started using C. >> >> I have used warnings on such errors for as long as I have used >> compilers that >> conveniently warned about them. But I can say it is very rare that a >> compiler >> /has/ warned about it - it is not a mistake I have made more than a >> couple of >> times. Still, I want my tools to warn me if it were to happen. > > I want my tools to warn me of everything that they are capable of > detecting.
Good. Then we agree - make the best use of the best tools available.
> INCLUDING the tool that occupies my cranial cavity!
I agree. That means not distracting it with things that are easily found automatically by compilers and other tools, so that your mind can concentrate on the difficult stuff.
> >>> I'll frequently store pointers to functions where you might store >>> a variable that you conditionally examine (and then dispatch to >>> one of N places in your code). In my case, I just invoke the >>> function THROUGH the pointer -- the tests were done when I *chose* >>> which pointer to stuff into that variable (why should I bother to >>> repeat them each time this code fragment is executed?) >>> >>> When you do code reviews, do you all sit around with laptops and >>> live code copies? Or, do you pour over LISTINGS with red pens? >>> Do you expect people to just give the code a cursory once-over? >>> Or, do you expect them to *read* the code and act as "human >>> compilers/processors"? >> >> I expect code to be as simple to understand as possible, so that it >> takes as >> little time or effort as possible for other people to read. That way >> others can >> spend their effort on confirming that what the code does is the right >> thing, >> rather than on figuring out what the code does first. > > What's simple to me might not be simple to you! E.g., I naturally > think in terms of pointers. Others prefer array indexes. For me > to jump/call *through* a pointer is far more obvious than encoding some > meaning into a flag at some point in the program. Then, decoding that > flag at another point and dispatching based on that decode! The latter > requires keeping two pieces of code in sync (encode & decode). The > former puts everything in one place (encoding).
Agreed - there is plenty of scope for personal variation and style here.
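A small sketch of that "decide once, then call through the pointer" pattern, with invented handler names:

    #include <stdio.h>

    typedef void (*handler_t)(void);

    static void handle_fast(void) { puts("fast path"); }
    static void handle_safe(void) { puts("safe path"); }

    int main(void)
    {
        /* Decide once, up front, which behaviour applies... */
        int want_fast = 1;                       /* stand-in for a real configuration test */
        handler_t handler = want_fast ? handle_fast : handle_safe;

        /* ...then every later call site just dispatches through the pointer,
         * instead of re-testing a flag at each point of use. */
        for (int i = 0; i < 3; i++)
            handler();

        return 0;
    }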
> > Should I write code at a level that a newbie developer can understand? > Should I limit myself to how expressive and productive I can be out of fear > that someone might not be quick to grasp what I've written?
The right balance here will vary depending on the circumstances - there is no single correct answer (but there are many wrong answers).
> >>> There is no such thing as a perfect style. You adopt a style that >>> works for you and measure its success by how buggy the code is (or >>> is not) that you produce and the time that it takes you to produce it. >> >> Yes. >> >>> >>> I suspect if I sat you down with some of these early and one-of-a-kind >>> compilers, you'd spend a helluva lot more time chasing down bugs that >>> the compiler didn't hint at. And, probably run the risk of some >>> creeping >>> into your release (x=y=3) that you'd never caught. >> >> No. >> >> Static checking by the compiler is not a substitute for writing >> correct code. > > Where am I writing "INcorrect code"? It meets the specifications of the > language. It compiles without errors or warnings. It fulfills the goals > of the specification.
There is more to writing good code than that (and I know you know that). Whether you call bad code that happens to work "correct" or "incorrect" is up to you. But my point here was that you seem to imply I write code with little regard for it being correct or incorrect, and then rely on the compiler to find my errors.
> > It's just that *you* don't like my style! Is that what makes it "not > correct"?
I suspect that in the great majority of cases where I don't like your style, then it is nothing more than that. I might think it is not clear or easy to understand, or not as maintainable as it could be, or simply looks ugly and hard to read, or that it is not as efficient as other styles. I can't say for sure, since about the only things I know for sure about your style is that you like to write "if (3 == x)" rather than "if (x == 3)", and that you like function pointers. It takes a lot more than that for me to label code as "incorrect" or "bad" (assuming the final result does the job required).
> If I replaced the identifiers for manifest constants with lowercase > symbols, > would that make it MORE correct? Should I use camelcase identifiers? > Hungarian notation? Embedded underscores? How do any of these make the > code more or less "correct"?
If a change makes code clearer, then it is a good thing. Visually splitting the words in a multi-word identifier makes code clearer - whether that is done using camelCase or underscores is a minor issue. Small letters are easier to read (that's why they exist!), and avoid unnecessary emphasis - that makes them a good choice in most cases. And there is rarely any benefit in indicating that an identifier is a constant or a macro (assuming it is defined and used sensibly) - so there is no point in making such a dramatic distinction.
> >> You make it sound like I program by throwing a bunch of random symbols >> at the >> compiler, then fixing each point it complains about. The checking is >> automated >> confirmation that I am following the rules I use when coding, plus >> convenient >> checking of silly typos that anyone makes on occasion. >> >>> Instead of pushing an update to the customer over the internet, you'd >>> send someone out to their site to install a new set of EPROMs. Or, >>> have them mail the product back to you as their replacement arrived >>> (via USPS). Kind of annoying for the customer when he's got your >>> device on his commercial fishing boat. Or, has to lose a day's >>> production >>> while it's removed from service, updated and then recertified. >> >> I have mailed EPROMs to customers, or had products delivered back for >> program >> updates. But never due to small mistakes of the sort we are >> discussing here. > > I've *never* had to update a product after delivery. Because the cost of > doing so would easily exceed the profit margin in the product! > > I've had clients request modifications to a product; or tweeks for > specific customers. But, those aren't "bug fixes", they're effectively > "new products" that leverage existing hardware.
And those are updates after delivery. There are many perfectly good reasons for updating software after delivery. All I said was that I have provided updates in a variety of ways, and for a variety of reasons - but never for the sort of mistakes that you seem to think you are immune to because you learned to program with limited tools, while you think /I/ make them all the time because I take advantage of modern tool features.
> "No, there will be no upgrades. You will get it right the first time!" >
That is fine for some projects. I have had cards that have been cemented into the ocean floor - upgrades are not physically possible. And on other projects, customers want to be able to have new features or changes at a later date. I think everyone agrees that shipping something that does not work correctly, and updating for bug fixes, is always a bad idea - just /how/ bad it is will vary.
>> The new programmer should learn to take advantage of new tools, and >> concentrate >> efforts on problems that are relevant rather than problems are no >> longer an >> issue (to the extent that they ever /were/ an issue). And if that new >> programmer is faced with poorer tools, then he or she will have to >> learn the >> idiosyncrasies at the time. And the lack of warnings on "if (x = 1)" is >> unlikely to be the most important point. > > Without the warning, the developer is likely to waste a boatload of > time "seeing what he wants to see" and not what the *compiler* sees.
Without decent warnings, developers (especially new ones) are likely to spend a good deal more time chasing small bugs than they would if the compiler or linter helped them out. But why do you think this particular issue is so important? New C programmers are often told how important it is to distinguish between = and ==, so it is something they look out for, and avoid in most cases. And the Yoda rule only helps in /some/ cases where you have comparisons - you still need to get your = and == right everywhere else.
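For completeness, the case in question - any mainstream compiler with warnings enabled (e.g. gcc or clang with -Wall, which includes -Wparentheses) flags the first form whether or not the constant is on the left:

    void demo(int x)
    {
        if (x = 1) {        /* assignment: warned about by -Wall / -Wparentheses  */
            /* ... */
        }
        if (x == 1) {       /* comparison: what was almost certainly intended     */
            /* ... */
        }
        if (1 == x) {       /* "Yoda" form: "1 = x" would be a hard compile error */
            /* ... */
        }
    }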
> >>> There's a reason you can look at preprocessor output and ASM >>> sources. Because they are the only way to understand some nasty >>> bugs that may arise from your misunderstanding of what a tool >>> SHOULD be doing; *or*, from a defective tool! (a broken tool >>> doesn't have to tell you that it's broken!) >> >> I have almost /never/ had occasion to look at pre-processor output. >> But I do >> recommend that embedded programmers should be familiar enough with the >> assembly >> for their targets that they can look at and understand the assembly >> listing. > > You've probably never "abused" the preprocessor in creative ways! :> >
I have abused preprocessors a bit (any use of ## is abuse!), but I haven't had to look at the output directly to debug that abuse. Maybe I haven't been creative enough in my abuses here.
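An example of the sort of ## abuse I mean - a pasted-together register accessor, with invented names standing in for real memory-mapped registers:

    #include <stdint.h>

    /* Token pasting: PORT_REG(2, DIR) expands to the identifier PORT2_DIR.
     * Handy for families of similarly named registers, but the generated
     * names never appear literally in the source - which is exactly why
     * this style gets called preprocessor abuse. */
    #define PORT_REG(port, reg)   PORT##port##_##reg

    volatile uint8_t PORT2_DIR;           /* stand-in for a real hardware register */

    static void make_output(void)
    {
        PORT_REG(2, DIR) = 0xFF;          /* expands to: PORT2_DIR = 0xFF; */
    }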
Reply by September 13, 2016
On Tue, 13 Sep 2016 01:11:09 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote: >> For programming, the most convenient way was to switch the terminal to >> US-ASCII, but unfortunately the keyboard layout also changed. >> >> Digraphs and trigraphs partially solved this problem but was ugly. >> >> With the introduction of the 8 bit ISO Latin, a large number of >> languages and programming could be handled with that single code page, >> greatly simplifying data exchange in Europe, Americas, Africa and >> Australia. > >The problem with Unicode is that it makes the problem space bigger. >Its relatively easy for a developer to decide on appropriate >syntax for file names, etc. with ASCII, Latin1, etc. But, start >allowing for all these other code points and suddenly the >developer needs to be a *linguist* in order to understand what >should/might be an appropriate set of constraints for his particular >needs.
For _file_ names not a problem, stop scanning at next white space (or null in C). Everything in between is the file name, no matter what characters are used. For _path_ specifications, there must be some rules how to separate the node, path, file name, file extension and file version from each other. The separator or other syntactic elements are usually chosen from the original 7 bit ASCII character set. What is between these separators is irrelevant.
>Also, we *tend* to associate meaning with each of the (e.g.) ASCII >code points. So, 16r31 is the *digit* '1'. There's concensus on >that interpretation.
For _numeric_entry_ fields, including the characters 0-9 requires Arabic numbers fallback mapping. As strange as it sounds, the numbers used in Arabic countries differ from those used in Europe.
>However, Unicode just tabulates *glyphs* and deprives many of them >of any particular semantic value. E.g., if I pick a glyph out >of U+[2800,28FF], there's no way a developer would know what my >*intent* was in selecting the glyph at that codepoint.
As long as it is just payload data, why should the programmer worry about it ?
>It could >"mean" many different things (the developer has to IMPOSE a >particular meaning -- like deciding that 'A' is not an alphabetic >but, rather, a "hexadecimal character" IN THIS CONTEXT)
In Unicode, there are code points for hexadecimal 0 to F. Very good idea to separate the dual usage for A to F. Has anyone actually used those code points ?
Reply by Don Y September 13, 2016
On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote:
> For programming, the most convenient way was to switch the terminal to > US-ASCII, but unfortunately the keyboard layout also changed. > > Digraphs and trigraphs partially solved this problem but was ugly. > > With the introduction of the 8 bit ISO Latin, a large number of > languages and programming could be handled with that single code page, > greatly simplifying data exchange in Europe, Americas, Africa and > Australia.
The problem with Unicode is that it makes the problem space bigger. It's relatively easy for a developer to decide on appropriate syntax for file names, etc. with ASCII, Latin1, etc. But, start allowing for all these other code points and suddenly the developer needs to be a *linguist* in order to understand what should/might be an appropriate set of constraints for his particular needs.

Also, we *tend* to associate meaning with each of the (e.g.) ASCII code points. So, 16r31 is the *digit* '1'. There's consensus on that interpretation.

However, Unicode just tabulates *glyphs* and deprives many of them of any particular semantic value. E.g., if I pick a glyph out of U+[2800,28FF], there's no way a developer would know what my *intent* was in selecting the glyph at that codepoint. It could "mean" many different things (the developer has to IMPOSE a particular meaning -- like deciding that 'A' is not an alphabetic but, rather, a "hexadecimal character" IN THIS CONTEXT)
Reply by September 13, 2016
On Mon, 12 Sep 2016 22:52:00 +0200, Hans-Bernhard Bröker
<HBBroeker@t-online.de> wrote:

>On 11.09.2016 at 15:14, upsidedown@downunder.com wrote: >> On Sun, 11 Sep 2016 13:36:59 +0200, Hans-Bernhard Bröker >> <HBBroeker@t-online.de> wrote: > >>> It may appear to provide a solution to the issue: "How to keep non-ASCII >>> characters in a plain char?" > >> Unsigned char was the answer for at least 5-10 years before UCS2. > >It really wasn't, because as a solution it was incoherent, incomplete, >and insular. Anyway, there really weren't that many years between C90 >and UCS2. > >8-bit unsigned char for non-ASCII characters created more problems than >it ever solved. Instead of having one obvious, while painful >restriction, we now had > >*) dozens of conflicting interpretations of the same 256 values, >*) no standard way of knowing which of those applied to given input >*) almost no way of combining multiple such streams internally >*) no useful way of outputting a combined stream
Exactly the same problems existed with the 7 bit ISO 646 from the 1960s/70s. In ISO 646 some of the US-ASCII code points were replaced by a national character, such as @ [ \ ] { | } _ ^ ~ $ #

The replacement characters were typically different for each country, or at least each language. Trying to combine texts from different languages produced a mess. This was a problem especially in countries with multiple official languages. Also writing names of foreign origin caused problems.

Think about Pascal or C programming, with the missing [ \ ] { | } ^ characters, which in one code page might have been displayed as Ä Ö Å ä ö å but as something different in another code page. This was especially problematic for printing with a language-specific printer (e.g. daisy wheel), giving quite different-looking printouts depending on which printer was used.

For programming, the most convenient way was to switch the terminal to US-ASCII, but unfortunately the keyboard layout also changed.

Digraphs and trigraphs partially solved this problem but were ugly.

With the introduction of the 8 bit ISO Latin, a large number of languages and programming could be handled with that single code page, greatly simplifying data exchange in Europe, the Americas, Africa and Australia.
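For those who never met them: the digraph spellings are still valid C today (the trigraphs such as ??< ??> ??( ??) ??= were finally removed in C23), so something like this still compiles:

    %:include <stdio.h>              /* %: is the digraph for #          */

    int main(void)
    <%                               /* <% and %> stand for { and }      */
        int a<:3:> = <%1, 2, 3%>;    /* <: and :> stand for [ and ]      */
        printf("%d\n", a<:1:>);      /* prints 2                         */
        return 0;
    %>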