XC8 novice question

Started by rwood September 11, 2016
On 9/13/2016 3:24 AM, upsidedown@downunder.com wrote:
> On Tue, 13 Sep 2016 01:11:09 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> On 9/13/2016 12:11 AM, upsidedown@downunder.com wrote:
>>> For programming, the most convenient way was to switch the terminal to
>>> US-ASCII, but unfortunately the keyboard layout also changed.
>>>
>>> Digraphs and trigraphs partially solved this problem but were ugly.
>>>
>>> With the introduction of the 8 bit ISO Latin, a large number of
>>> languages and programming could be handled with that single code page,
>>> greatly simplifying data exchange in Europe, the Americas, Africa and
>>> Australia.
>>
>> The problem with Unicode is that it makes the problem space bigger.
>> It's relatively easy for a developer to decide on appropriate
>> syntax for file names, etc. with ASCII, Latin1, etc. But, start
>> allowing for all these other code points and suddenly the
>> developer needs to be a *linguist* in order to understand what
>> should/might be an appropriate set of constraints for his particular
>> needs.
>
> For _file_ names not a problem: stop scanning at the next white space (or
> null in C). Everything in between is the file name, no matter what
> characters are used.
The file system code is easy cuz it doesn't have to impart meaning to
any particular "characters". But, as there are typically human beings
involved who *do* impart meaning to certain glyphs (as well as the
OS itself), the developer can't ignore that expected meaning.

I want to use the eight character name "////\\\\". Or, "::::::::".
Or, "><&&&&><".

How do I differentiate between "RSTUV" and "RSTU<roman numeral 5>"?
(i.e., people already have trouble visually disambiguating between
the '0' and 'O' glyphs, '1' and 'l', etc.)

Do you allow control characters in file/pathnames? Why *not*
whitespace? I have a file called "My To-Do List" and don't have any
problems accessing it, etc.

The developer (not the character set) is responsible for presenting
this information in a form that is useful to the user. E.g., MS
will sort files named "1" through "99" as:
  1, 2, ... 10, 11... 20, 21... 99
whereas UNIX will use an L-R alpha sort:
  1, 10, 11..., 2, 20, 21... 99

Will the "RSTUV" user above expect to see "RSTU<roman numeral 5>"
immediately following "RSTUU"? And, where would "RSTUV" fit in
that scheme?
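(Those two orderings differ only in the comparison function handed to
the sort. A minimal C sketch -- natural_cmp is an illustrative name,
not either system's actual code:)

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compare names treating runs of digits as numbers ("natural" order,
   roughly the Explorer behavior) rather than byte-by-byte (ls behavior). */
static int natural_cmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            char *ea, *eb;
            long na = strtol(a, &ea, 10);
            long nb = strtol(b, &eb, 10);
            if (na != nb)
                return (na < nb) ? -1 : 1;
            a = ea;         /* numbers equal; continue past the digit runs */
            b = eb;
        } else {
            if (*a != *b)
                return (unsigned char)*a - (unsigned char)*b;
            a++;
            b++;
        }
    }
    return (unsigned char)*a - (unsigned char)*b;
}

int main(void)
{
    printf("natural:       %d\n", natural_cmp("2", "10")); /* negative: 2 before 10 */
    printf("lexicographic: %d\n", strcmp("2", "10"));      /* positive: 10 before 2 */
    return 0;
}

(qsort() with the first comparator yields the MS-style order; with
strcmp, the UNIX-style order.)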
> For _path_ specifications, there must be some rules how to separate
> the node, path, file name, file extension and file version from each
> other. The separator or other syntactic elements are usually chosen
> from the original 7 bit ASCII character set. What is between these
> separators is irrelevant.
I parse my namespaces without regard to specific separators.
So, in some objects' identifiers, '/' can be a valid character
while it may have delimited the start of that object name
in the *containing* object. E.g., the single "o/b/j/e/c/t" exists
within the named "container" in the following:

  container/o/b/j/e/c/t
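(A purely hypothetical sketch of how such resolution might look in C --
longest-prefix matching against each container's own name table, so '/'
is only a separator where a container's entries make it one. All names
here are illustrative, not Don's actual code:)

#include <stdio.h>
#include <string.h>

/* Each container owns its own name table; a '/' is only special
   if the container's entries treat it that way. */
struct entry { const char *name; };

static const struct entry top[]   = { { "container" } };
static const struct entry inner[] = { { "o/b/j/e/c/t" } };  /* '/' is just data here */

/* Return the entry whose name is the longest prefix of `path`, or NULL. */
static const struct entry *resolve(const struct entry *tbl, size_t n,
                                   const char *path)
{
    const struct entry *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tbl[i].name);
        if (len > best_len && strncmp(path, tbl[i].name, len) == 0) {
            best = &tbl[i];
            best_len = len;
        }
    }
    return best;
}

int main(void)
{
    const char *path = "container/o/b/j/e/c/t";
    const struct entry *e = resolve(top, 1, path);  /* matches "container" */
    path += strlen(e->name) + 1;   /* skip the one '/' that *this* container
                                      treats as a separator */
    e = resolve(inner, 1, path);   /* matches "o/b/j/e/c/t" as a whole */
    printf("resolved: %s\n", e->name);
    return 0;
}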
>> Also, we *tend* to associate meaning with each of the (e.g.) ASCII
>> code points. So, 16r31 is the *digit* '1'. There's consensus on
>> that interpretation.
>
> For _numeric_entry_ fields, including the characters 0-9 requires
> Arabic numbers fallback mapping.
Why? Shouldn't the glyphs for the Thai digits (or Laotian) also be
recognized as numeric? Likewise for the roman numerals, the *fullwidth*
(arabic) digits, etc.?

A particular braille glyph means different things based on how
the application/user interprets it.

For example, the letter 'A' is denoted by a single dot in the upper
left corner of the cell. 'B' adds the dot immediately below while
'C' adds the dot to the immediate right, instead.

Yet, in music braille, the "note" 'A' is denoted by a cell that
is the union of the letters 'B' and 'C' *less* the letter 'A':

A   B   C
*.  *.  **
..  *.  ..
..  ..  ..

Notes:
A   B   C
.*  .*  **
*.  **  .*
..  ..  ..

I.e., the same glyph means different things. Imagine labeling a file
of sound samples with their corresponding "notes"...
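(For reference, Unicode's Braille Patterns block U+2800..U+28FF encodes
only the raised dots -- dot N is bit N-1 of the offset -- so a code point
carries no hint of which reading was intended. A small C sketch:)

#include <stdio.h>

/* Map a list of raised dots (1..8) to its Braille Patterns code point. */
static unsigned braille_cp(const int *dots, int n)
{
    unsigned cp = 0x2800;
    for (int i = 0; i < n; i++)
        cp |= 1u << (dots[i] - 1);
    return cp;
}

int main(void)
{
    int letter_a[] = { 1 };     /* literary braille letter 'A' */
    int note_a[]   = { 2, 4 };  /* the music-braille note 'A' diagrammed above */
    printf("letter A -> U+%04X\n", braille_cp(letter_a, 1)); /* U+2801 */
    printf("note   A -> U+%04X\n", braille_cp(note_a, 2));   /* U+280A --
        the very same code point as the literary letter 'i' (dots 2,4) */
    return 0;
}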
> As strange as it sounds, the numbers used in Arabic countries differ
> from those used in Europe.
>
>> However, Unicode just tabulates *glyphs* and deprives many of them
>> of any particular semantic value. E.g., if I pick a glyph out
>> of U+[2800,28FF], there's no way a developer would know what my
>> *intent* was in selecting the glyph at that codepoint.
>
> As long as it is just payload data, why should the programmer worry
> about it?
Because the programmer has to deal with the glyph's *meaning* to the
user. Otherwise, why not just list file names and content as 4-digit
Unicode codepoints and eliminate the hassle of rendering fonts,
imparting meaning, etc.?
>> It could
>> "mean" many different things (the developer has to IMPOSE a
>> particular meaning -- like deciding that 'A' is not an alphabetic
>> but, rather, a "hexadecimal character" IN THIS CONTEXT)
>
> In Unicode, there are code points for hexadecimal 0 to F. Very good
> idea to separate the dual usage for A to F.
> Has anyone actually used those code points?
The 10 arabic digits that we are accustomed to exist as
U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are
the 10 arabic-indic digits, 10 extended arabic-indic digits,
10 mongolian digits, 10 laotian digits, etc.

[We'll ignore dingbat digits, circled digits, super/subscripted
digits, etc.]

Then, stacking glyphs (e.g., the equivalent of diacriticals)...

There's just WAY too much effort involved making sense of Unicode
in a way that your *users* will appreciate and consider intuitive.
When faced with the I18N/L10N issues, I found it MUCH easier to
*punt* (if they don't speak English, that's THEIR problem; or, a
task for someone with more patience/resources than me!)
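(The "fallback mapping" suggested earlier is at least mechanical for
the decimal-digit blocks, since each block stores its ten digits
contiguously. A sketch covering just the ranges named in this thread;
a real implementation would consult the full Unicode Nd table:)

#include <stdio.h>

/* Fold a code point from several Unicode decimal-digit blocks down to
   its value 0-9, or return -1 if it isn't a digit we recognize. */
static int digit_value(unsigned cp)
{
    static const struct { unsigned first; const char *name; } blocks[] = {
        { 0x0030, "ASCII" },
        { 0x0660, "Arabic-Indic" },
        { 0x06F0, "Extended Arabic-Indic" },
        { 0x0E50, "Thai" },
        { 0x0ED0, "Lao" },
        { 0x1810, "Mongolian" },
        { 0xFF10, "Fullwidth" },
    };
    for (size_t i = 0; i < sizeof blocks / sizeof blocks[0]; i++)
        if (cp >= blocks[i].first && cp <= blocks[i].first + 9)
            return (int)(cp - blocks[i].first);
    return -1;
}

int main(void)
{
    printf("%d %d %d\n",
           digit_value(0x0037),   /* '7'               -> 7 */
           digit_value(0x0E57),   /* Thai digit seven  -> 7 */
           digit_value(0xFF17));  /* fullwidth '7'     -> 7 */
    return 0;
}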
On Tue, 13 Sep 2016 15:41:24 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 9/13/2016 3:24 AM, upsidedown@downunder.com wrote:

[...]

>The file system code is easy cuz it doesn't have to impart meaning to
>any particular "characters". But, as there are typically human beings
>involved who *do* impart meaning to certain glyphs (as well as the
>OS itself), the developer can't ignore that expected meaning.
>
>I want to use the eight character name "////\\\\". Or, "::::::::".
>Or, "><&&&&><".
>
>How do I differentiate between "RSTUV" and "RSTU<roman numeral 5>"?
Depending on usage. For text rendering, either have a slightly
modified glyph for U+2165 or use the fallback letter 'V' (usable for
normal text sorting as well). However, if the numeric value is of
interest, a sequence of roman numerals like LXV would be handled as
the numeric value 65.
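(A minimal sketch of that numeric interpretation, using ASCII letters
to stand in for the dedicated code points at U+2160 ff.; helper names
are illustrative:)

#include <stdio.h>

/* Value of a single roman-numeral letter (0 = not one). */
static int rv(char c)
{
    switch (c) {
    case 'I': return 1;    case 'V': return 5;
    case 'X': return 10;   case 'L': return 50;
    case 'C': return 100;  case 'D': return 500;
    case 'M': return 1000; default:  return 0;
    }
}

/* "LXV" -> 65; a letter smaller than its successor subtracts (IV = 4). */
static int roman_value(const char *s)
{
    int total = 0;
    for (; *s; s++) {
        int v = rv(*s);
        if (v == 0)
            return -1;                        /* not a roman numeral */
        total += (rv(s[1]) > v) ? -v : v;     /* subtractive pair? */
    }
    return total;
}

int main(void)
{
    printf("LXV = %d\n", roman_value("LXV"));  /* 65 */
    printf("XIV = %d\n", roman_value("XIV"));  /* 14 */
    return 0;
}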
>(i.e., people already have trouble visually disambiguating between
>the '0' and 'O' glyphs, '1' and 'l', etc.)
This is a glyph problem, in which similar-looking symbols generate
different code points.
>Do you allow control characters in file/pathnames?
Such as null? Why not, but the usage might be a bit problematic.
>Why *not* whitespace? I have a file called "My To-Do List" and
>don't have any problems accessing it, etc.
With a purely GUI user interface, not a big problem. However, direct
command line entry or scripts will require all kinds of escape
mechanisms.
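(For instance, the usual POSIX-shell escape is to single-quote the
whole name and splice any embedded single quotes; a sketch:)

#include <stdio.h>

/* Quote a file name for a POSIX shell: wrap it in single quotes and
   replace each embedded ' with '\'' so whitespace and metacharacters
   survive the command line intact. */
static void shell_quote(const char *name, FILE *out)
{
    fputc('\'', out);
    for (; *name; name++) {
        if (*name == '\'')
            fputs("'\\''", out);
        else
            fputc(*name, out);
    }
    fputc('\'', out);
}

int main(void)
{
    shell_quote("My To-Do List", stdout);  /* -> 'My To-Do List' */
    fputc('\n', stdout);
    return 0;
}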
>
>The developer (not the character set) is responsible for presenting
>this information in a form that is useful to the user. E.g., MS
>will sort files named "1" through "99" as:
> 1, 2, ... 10, 11... 20, 21... 99
>whereas UNIX will use an L-R alpha sort:
> 1, 10, 11..., 2, 20, 21... 99
>
>Will the "RSTUV" user above expect to see "RSTU<roman numeral 5>"
>immediately following "RSTUU"? And, where would "RSTUV" fit in
>that scheme?
That depends on the sorting locale. Since sorting locales usually
differ from language to language, you could create your own locale
that sorts exactly the way you want.
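(In C terms, that is the difference between strcmp() and strcoll()
under setlocale(LC_COLLATE, ...); which locale names are installed is
system-dependent:)

#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *a = "apple", *b = "Zebra";

    /* Byte-wise: 'Z' (0x5A) < 'a' (0x61), so "Zebra" sorts first */
    printf("strcmp : %s first\n", strcmp(a, b) < 0 ? a : b);

    /* Locale-aware: en_US collation puts "apple" first.
       setlocale() returns NULL if the locale isn't installed. */
    if (setlocale(LC_COLLATE, "en_US.UTF-8"))
        printf("strcoll: %s first\n", strcoll(a, b) < 0 ? a : b);
    return 0;
}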
[...]

>> For _numeric_entry_ fields, including the characters 0-9 requires
>> Arabic numbers fallback mapping.
>
>Why? Shouldn't the glyphs for the Thai digits (or Laotian) also be
>recognized as numeric? Likewise for the roman numerals, the *fullwidth*
>(arabic) digits, etc.?
Yes, of course: use a fallback character mapping from Thai digit one
to Arabic digit 1. The Roman numerals are more complex, since they do
not use a positional system.
[...]

>The 10 arabic digits that we are accustomed to exist as
>U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are
>the 10 arabic-indic digits, 10 extended arabic-indic digits,
>10 mongolian digits, 10 laotian digits, etc.
Use fallback mapping for numeric entry; use the original code points
for rendering.
>[We'll ignore dingbat digits, circled digits, super/subscripted
>digits, etc.]
Use fallback mapping for those, too.
>Then, stacking glyphs (e.g., the equivalent of diacriticals)...
>
>There's just WAY too much effort involved making sense of Unicode
>in a way that your *users* will appreciate and consider intuitive.
>When faced with the I18N/L10N issues, I found it MUCH easier to
>*punt* (if they don't speak English, that's THEIR problem; or, a
>task for someone with more patience/resources than me!)
Why would a person have to speak English or even know Latin letters
to use a computer such as a cell phone?
On 9/13/2016 11:07 PM, upsidedown@downunder.com wrote:
>> There's just WAY too much effort involved making sense of Unicode
>> in a way that your *users* will appreciate and consider intuitive.
>> When faced with the I18N/L10N issues, I found it MUCH easier to
>> *punt* (if they don't speak English, that's THEIR problem; or, a
>> task for someone with more patience/resources than me!)
>
> Why would a person have to speak English or even know Latin letters
> to use a computer such as a cell phone?
I'm not designing a cell phone. And, both parties (user and device) *speak* to each other. Should I also add learning foreign pronunciation algorithms to my list of design issues? :>
