EmbeddedRelated.com
Forums

Attention: European C/C++/C#/Java Programmers-Call for Input

Started by Paul K. McKneely January 27, 2009
On Thu, 29 Jan 2009 22:48:13 +0100, David Brown
<david.brown@hesbynett.removethisbit.no> wrote:
> Paul K. McKneely wrote:
> A related problem is if you are making identifiers case-insensitive -
> it's hard to figure out cases for non-ASCII characters.
Not only that, but asymmetric mappings (e.g. Greek sigma) and different mappings for different locales make it impossible to define a workable set of rules to govern ambiguities.

--
Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/
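To make the sigma remark concrete, here is a small sketch of how a case round trip can fail to return the original character. It uses only Python's built-in string methods; the `same_identifier` helper is a hypothetical comparison rule, not something proposed in the thread. Python applies the locale-independent default mappings, so locale-specific cases (such as the Turkish dotted and dotless i) are not covered here.

```python
# Case mappings are not always reversible, which is what makes
# case-insensitive identifiers awkward outside ASCII.

final_sigma = "\u03c2"            # GREEK SMALL LETTER FINAL SIGMA
upper = final_sigma.upper()       # U+03A3, capital sigma
round_trip = upper.lower()        # U+03C3, ordinary small sigma

sharp_s = "\u00df"                # LATIN SMALL LETTER SHARP S
sharp_upper = sharp_s.upper()     # becomes the two letters "SS"


def same_identifier(a, b):
    """Hypothetical comparison rule: equal after Unicode case folding."""
    return a.casefold() == b.casefold()


# The round trip does not give back the character we started with.
print(round_trip == final_sigma)
# Upper-casing can change the length of the string.
print(len(sharp_s), len(sharp_upper))
# Case folding treats both spellings of the Greek word as one identifier.
print(same_identifier("\u039f\u0394\u039f\u03a3", "\u03bf\u03b4\u03bf\u03c2"))
```

Case folding gives one consistent answer, but it is still a single fixed rule set, so it cannot satisfy every locale at once.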
On Thu, 29 Jan 2009 16:16:07 -0600, "Paul K. McKneely"
<pkmckneely@sbcglobal.net> wrote:

>"Stephen Pelc" <stephenXXX@mpeforth.com> wrote in message
>news:498200b7.620856320@192.168.0.50...
>> On a PC consider what happens when a program written in English
>> by South Africans (three languages in daily use in the office),
>
>Oh really? Where in RSA and what languages?
Cape Town: English, Afrikaans and Xhosa. Note that RSA has/had 11 official languages when I last looked. Belgium has three and ...

The real problem is that the Development Character Set (DCS), Operating system Character Set (OCS) and Application (run-time) Character Set (ACS) may all be different. Now translate

  "Your balance at %time% on %date% is %value%"

allowing for:
  language order
  date format
  time format
  currency display

and make sure that you can cope with currencies with 6 digits after the primary value. Floating point just doesn't cut it with present ranges. For the new Hong Kong airport, the difference between two estimates for the cost of concrete just to cap the piles was US$ 10 million - the 128 bit integer calculation was *way* better than the floating point one.

My point is that internationalisation of applications is *not* based on character sets, which are but one minor factor.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
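The point about floating point and currencies can be illustrated with a minimal sketch of scaled-integer arithmetic. The scale of six fractional digits is taken from the post; the helper names `to_units` and `format_units` are made up for this example, and Python's arbitrary-precision integers stand in for the 128-bit integers mentioned above.

```python
SCALE = 10**6  # six digits after the primary unit, as in the post


def to_units(text):
    """Parse a plain decimal string such as '12.345678' into integer micro-units."""
    whole, _, frac = text.partition(".")
    frac = (frac + "000000")[:6]          # pad or cut to exactly six digits
    sign = -1 if whole.startswith("-") else 1
    return sign * (abs(int(whole or "0")) * SCALE + int(frac))


def format_units(units):
    """Render integer micro-units back as a decimal string."""
    sign = "-" if units < 0 else ""
    whole, frac = divmod(abs(units), SCALE)
    return f"{sign}{whole}.{frac:06d}"


# Ten payments of 0.1: exact with scaled integers, inexact with binary floats.
int_total = sum(to_units("0.1") for _ in range(10))
float_total = sum(0.1 for _ in range(10))
print(format_units(int_total), float_total)

# Large amounts keep every fractional digit.
big = to_units("123456789012345678901234.000001") + to_units("0.000001")
print(format_units(big))
```

Rounding rules for division and currency conversion are deliberately left out; they are a separate policy decision on top of the representation.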
"Stephen Pelc" <stephenXXX@mpeforth.com> wrote in message 
news:49824b30.639920999@192.168.0.50...
> The real problem is that the Development Character Set (DCS),
> Operating system Character Set (OCS) and Application (run-time)
> Character Set (ACS) may all be different. Now translate
>
> "Your balance at %time% on %date% is %value%"
>
> allowing for:
> language order
> date format
> time format
> currency display
>
> and make sure that you can cope with currencies with 6 digits
> after the primary value. Floating point just doesn't cut it with
> present ranges. For the new Hong Kong airport, the difference
> between two estimates for the cost of concrete just to cap the
> piles was US$ 10 million - the 128 bit integer calculation was
> *way* better than the floating point one.
>
> My point is that internationalisation of applications is *not*
> based on character sets, which are but one minor factor.
Bravo! I am so glad for you to write that. Point well taken. And with all of this burden of internationalisation that people seem to want to put onto my back, I would never even get to first base when it comes to the real substance of what I want to accomplish in the new programming language.
Paul K. McKneely wrote:

> Bravo! I am so glad for you to write that. Point well
> taken. And with all of this burden of internationalisation
> that people seem to want to put onto my back, I would
> never even get to first base when it comes to the real
> substance of what I want to accomplish in the new
> programming language.
Sounds like you are searching for reasons not to start the project :-)

If you use some standard approach, like Unicode, I8N is only a very small part of designing a new language and most I8N topics can be found on the web or in books and are straightforward to implement.

You are writing in this newsgroup, so looks like it is a language targeting embedded systems. Things like the compiler itself, object format, linker, libraries for all kind of hardware and software tasks, IDE, debugger etc., are much more work and much more interesting. I8N is just a small part of the library and maybe some support from the tools for handling Unicode.

What do you want to accomplish in the new programming language? If it is just an incompatible C version, no one would use it.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
"David Brown" <david.brown@hesbynett.removethisbit.no> wrote in message 
news:O_6dnYV7E52xuR_UnZ2dnUVZ8u2dnZ2d@lyse.net...
> I would suggest you start by giving up on all your thoughts of specific
> character sets. Simply make a straight decision now - you will use UTF-8.
> No other encodings - no Latin-1, no UTF-16, no home-made character sets,
> no extra fonts. Take it as a fixed decision and work with it for a few
> days to see how it fits your needs. Look at existing tools and source
> code that supports UTF-8, and see how it can make your work easier and
> give a result that users might actually be able to *use*. If you really
> put in this effort and find that UTF-8 does not fit your needs, what have
> you lost? A couple of days work here is a drop in the ocean compared to
> the man-years it will take to work with your home-made encoding, and you
> will at least have the benefit of a better understanding of your problem.
> You might even be able to explain it to other people in a way that makes
> sense.
I want you to know that I lost most of a good night's sleep over this post. In my anguish my brain mulled it over and I came up with a plan. First, I will give you some background (and a great deal of credit for my suffering :). Original conception for ?Text was circa 1985. Actual development began in 1988. It is basically a superset of ASCII. The ASCII part, as you well know, is not proprietary. But the key point is that ?Text began in 1985 as a byte-endian independent streaming format (as well as a flat 32-bit character format) much like UTF-8 which itself was in flux until late 2003. Both ?Text and UTF-8 use the high-order bit to determine what comes next as escape bytes in their byte stream encoding. Although streaming ?Text is much more all-inclusive than UTF-8, its symbol set is not as large (which is really all there is to UNICODE anyway). So I am really not just starting out as you might have the impression. I have probably 10 full man years already into this. I just started working on the 5th generation of the ?Text editor since about August and have been working on the 2nd generation compiler since about a year ago. It is early enough, I could make the 5th generation editor change its course but I will have to start completely over on the compiler (3rd generation).

I really have a lot more than you probably think at stake. My business partner (in a medical networking device communications business) keeps urging me that I need to think about retiring in about 10 years. If I were to abandon what I have already done, the whole thing would collapse and I would have little more than UNICODE left. Rather than do this, I would just give up and do something completely different.

Want to hear the music album that I wrote from 2004 to 2007? You can get free excerpts at:

http://cdbaby.com/cd/pkmckneely

The theme music (with full synthetic orchestral sound) is based on a science-fiction trilogy that I am writing. To research the stories, my wife and I spent nearly 3 weeks in South Africa to do background research, get experience and go birdwatching. I have over 4 hours of high-definition video from the trip. Yes, I am a videographer as well. Anyway, this is neither here nor there. I was just letting you know that I have plenty of other things to do during my retirement.

Sooooooooooo....... I am not going to start (completely) over. But listen to my plan....

The next generation system will become a binary superset of the UNICODE/HTML suite *instead* of ASCII. Keep in mind that you HAVE to put wrappers around UNICODE to do anything at all. That is data AND program wrappers. Even the UNICODE standard states that it only gives you raw character codes and does not tell you how to process them. The next version of ?Text will be that wrapper. So I have hashed out a way to merge the two and the combination will still be called ?Text which will be its "internal" format.

From ?Edit, you will be able to import either plain old raw UNICODE via UTF-8 or UTF-8/HTML with all of the visual properties of HTML. Or you will be able to *load* and *save* from/to native ?Text. The ?Text files should be considerably smaller and easier to parse than their HTML equivalents. But the compiler will require streaming ?Text files for input because it is far more efficient and much easier to parse than HTML. You can run a converter as the first step in the tool chain if you can tolerate the bloated HTML files as your main source code format. Straight UTF-8 files will be smaller but they will lack any visual enhancements. Either way, you can use your favorite generic UNICODE or HTML editor. But ?Edit will be much more useful and much easier to use for programming in the ? programming language. Plus the saved source files will be much smaller and much easier to parse.

A lexical analyzer might be next in the tool chain which will accept only the ?Text format. Following that is the parser and then the code generator which will target the specific processor architecture. Intermediate code optimizers can be placed after the parser, or generic UNICODE-aware but architecture-specific optimizers can be placed after the code generator (or as part of the code generator). Generic assemblers can be used in the case that the output of the code generator is assembly language. Standard UNICODE-compatible linkers and standard downloaders can be used off-the-shelf.

I am the holder of the domain name <phisystem.net> which will be the central repository for all information (including coding standards) for the ?System. I will need some time to get the website up and running but that is my plan. I have spent literally many years writing embedded operating systems and this whole thing is intended to go in the direction of development of the ?OS operating system which will be sort of a demonstration for how to write operating systems, device drivers and applications in the new language. There will be many parts to this system so contributors will be welcome.

I have to make money somehow (my wife mostly pays the bills and I get a contract job every now and then designing micro-controller based circuit boards along with embedded applications and doing industrial training videos using computer animation) so my company's website will probably be selling "how to" books and training videos to aid developers besides the front-end development tools. I have been a computer animator and musical composer for about 5 years so I will be able to offer quite a few products in support of this system. What I would REALLY like to do is make <phisystem.net> a clearing house for independent software developers (that support the ?System) and give them a 90% royalty of the software that the organization sells for them (much like what CDBaby does for independent musicians, see the CDBABY link I gave above).
>> in his tool chain? That is not to mention that 21-bits
>> (or 32-bits) are already used up in just the character's
>> code.
> I have no clue as to what you are talking about here.
If you look at UTF-8 more closely, it encodes a series of 21-bit "flat characters" (which is the current implementation of UNICODE). In other words, the escape sequences have to be expanded to 21 bits to obtain the flat version of the character in its full implementation. UTF-8 is just a way to stream them out (as to a file) so that the format is no longer byte-endian dependent. It is also inherently very efficient for raw ASCII character storage (just as in ?Text except that it allocates no code space for anything but raw character codes).

According to Wikipedia, UTF-8 was an outgrowth of ISO 10646 which was a 32-bit flat format. UNICODE may grow to 32-bits (given the current trend continues into the future) seeing how the original 16-bit version was found to be wanting.
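For readers who want to see the relationship between a flat code point and its UTF-8 byte stream, here is a short hand-written encoder. The function name `utf8_encode` is made up for this sketch; the bit layout is the standard UTF-8 one, in which the code space stops at U+10FFFF and surrogate values are excluded.

```python
def utf8_encode(code_point):
    """Encode one Unicode scalar value as a UTF-8 byte string."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:                      # 7 bits: plain ASCII, one byte
        return bytes([code_point])
    if code_point < 0x800:                     # up to 11 bits: two bytes
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # up to 16 bits: three bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # up to 21 bits: four bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# 'A', Greek small phi, the euro sign, and a code point beyond 16 bits.
for cp in (0x41, 0x3C6, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" "))
```

The leading byte announces the sequence length and every continuation byte starts with the bits 10, which is why the stream has no byte-order dependence.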
>> The new programming language supports fonts,
>> color (foreground and background), attributes, size etc.
>> Do you think it is a good idea
>> to have to expand these basic character codes to
>> 64/96/128 or even 256 bits in width just to cram it all in?
>> The web people would want to encode it all in ASCII
>> HTML-style tags which I think is a really bad idea.
>
> Are you suggesting that you are including font, colour, etc., directly in
> the source code? And here was me thinking that a proprietary character
> encoding was an "amazingly stupid idea".
>
>> The overwhelming consensus among responders to these
>> threads have voiced that they are not going to use
>> anything beyond ASCII anyway. And with all of
>> this text stuff, you haven't even begun to talk about
>> how you are going to achieve all of the very advanced
>> (and very difficult) stuff in the programming language,
>> (much of which hasn't ever been done before)
>> while carrying this huge load of excess baggage
>
> Who is "you" who are going to achieve all this? Do you mean the
> developers of the tools (i.e., you and your colleagues), or do you mean
> your users? And if it is us potential users, what is this "very advanced
> stuff" you are talking about? If we knew the specific aims of your
> language - what it is that makes it better than existing alternatives - it
> would be easier to advise you.
>
>> on your back. I needed to define some additional
>> characters that weren't in ASCII (and aren't in UNICODE)
>> for the purposes of the programming language (which
>> predates UNICODE and UTF-8 BTW) Additional
>
> First off, you do *not* need to define additional characters. It's
> conceivable that your tools might *benefit* from additional characters
> (although, as I said, we know nothing about your tools). But they don't
> *need* them.
>
> Secondly, Unicode has openings for additional domain-specific characters -
> you can add them without losing all the other benefits of Unicode (of
> course, you'll have to provide a suitable font).
>
>> characters in APL being cited as the downfall for that
>> language is not well founded in light of the fact that,
>> when it came out, you had to put out a couple of
>> thousand dollars for a hard-wired specialized
>> terminal just to program in that language. That is
>> besides the fact that it was not designed for the
>> kinds of things that I want to do with it (such as
>> writing operating systems and device drivers)
>> Do you see my point(s)?
>>
>
> No, I don't see your point at all. It reads as though you are saying
> APL's lack of popularity was not that it had extra characters, but that it
> needed an expensive specialised terminal (which was solely because of its
> special characters).
>
> The main reason for APL's lack of popularity *is* the special characters.
> Even though you don't need special hardware (you use a specialised
> keyboard map and extra fonts), the characters make it impossible to read
> and understand for the non-expert, and extremely slow to enter
> expressions. It is *vastly* easier to write for example "range(R)" than
> "⍳R" because you don't have to find the special character. It is also
> *vastly* easier to read and pronounce, and to understand "range(R)" than
> "⍳R" even if you have never used the language in question (Python). To
> take an example from wikipedia's APL page, here is an expression to give a
> list of prime numbers up to R:
>
> (~R∊R∘.×R)/R←1↓⍳R
>
> The direct Python translation would be:
>
> [p for p in range(2, R+1) if not p in [x*y for x in
> range(2, R+1) for y in range(2, R+1)]]
>
> The APL version is certainly shorter - but nevertheless is slower and
> harder to write. APL's power and conciseness comes from the power of its
> built-in functions, not the fact that most have a single weird symbol
> instead of a multi-character name.
>
>> Simple, lean and mean, but more powerful
>> than anything we have now. That is what I am
>> shooting for. When symbols need to be
>> converted to whatever format when object
>> files are produced, that's where the necessary
>> conversions will be done.
>> This will keep the core of the tools much simpler
>> (and smaller and run faster) so that the whole project
>> won't collapse when I try to do the really difficult
>> things that were the primary goals that I started
>> out to accomplish in the first place.
>>
>>> So the extra memory consumption e.g. in compiler symbol tables are
>>> negligible.
>>>
>>> Regarding linkers, UTF-8 global symbol names should not be a problem,
>>> unless the object language uses the 8th bit for some kind of signaling
>>> (such as end of string) or otherwise limits the valid bit
>>> combinations.
>>>
>>> Of course the UTF-8 encoding may increase the identifier length, but
>>> at least for a linker that usually examines only a specific number of
>>> bytes, such as 32, the only risk is that two identifiers are not
>>> unique within 32 bytes i.e. 16 characters in Greek or Cyrillic or 10
>>> graphs in some East-Asian script.
>>>
>>> Paul
>>
>>
>> I do want you to know that I do very much
>> appreciate your input. This issue about object
>> formats supporting UNICODE is going to be
>> a real help when it comes time to generating
>> machine code.
>>
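The quoted remark about linkers that examine only a fixed number of bytes can be checked with a small sketch. The 32-byte limit comes from the quoted text; the `significant` and `collide` helpers and the sample identifiers are hypothetical and do not model any particular linker.

```python
LIMIT = 32  # significant bytes, the figure used in the quoted text


def significant(name, limit=LIMIT):
    """Return the leading bytes of a UTF-8 encoded identifier that are compared."""
    return name.encode("utf-8")[:limit]


def collide(a, b, limit=LIMIT):
    """True if two different identifiers look identical within the byte limit."""
    return a != b and significant(a, limit) == significant(b, limit)


# Greek letters take two bytes each in UTF-8, so 16 of them fill the limit
# and the distinguishing suffix is never seen.
greek_one = "\u03b1" * 16 + "_one"
greek_two = "\u03b1" * 16 + "_two"

# The same shape in ASCII needs only 20 bytes and stays distinct.
ascii_one = "a" * 16 + "_one"
ascii_two = "a" * 16 + "_two"

print(len(greek_one.encode("utf-8")), collide(greek_one, greek_two))
print(len(ascii_one.encode("utf-8")), collide(ascii_one, ascii_two))
```

A byte-based cut can also land in the middle of a multi-byte character, which is harmless for comparison but worth remembering if truncated names are ever displayed.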
Paul K. McKneely wrote:
> "David Brown" <david.brown@hesbynett.removethisbit.no> wrote in message
> news:O_6dnYV7E52xuR_UnZ2dnUVZ8u2dnZ2d@lyse.net...
>> I would suggest you start by giving up on all your thoughts of specific
>> character sets. Simply make a straight decision now - you will use UTF-8.
>> No other encodings - no Latin-1, no UTF-16, no home-made character sets,
>> no extra fonts. Take it as a fixed decision and work with it for a few
>> days to see how it fits your needs. Look at existing tools and source
>> code that supports UTF-8, and see how it can make your work easier and
>> give a result that users might actually be able to *use*. If you really
>> put in this effort and find that UTF-8 does not fit your needs, what have
>> you lost? A couple of days work here is a drop in the ocean compared to
>> the man-years it will take to work with your home-made encoding, and you
>> will at least have the benefit of a better understanding of your problem.
>> You might even be able to explain it to other people in a way that makes
>> sense.
>
> I want you to know that I lost most of a good night's
> sleep over this post. In my anguish my brain mulled
> it over and I came up with a plan. First, I will give
> you some background (and a great deal of credit
> for my suffering :). Original conception for ?Text
> was circa 1985. Actual development began in 1988.
> It is basically a superset of ASCII. The ASCII part,
> as you well know, is not proprietary. But the key
> point is that ?Text began in 1985 as a byte-endian
> independent streaming format (as well as a flat 32-bit
> character format) much like UTF-8 which itself
> was in flux until late 2003. Both ?Text and UTF-8
> use the high-order bit to determine what comes next
> as escape bytes in their byte stream encoding.
> Although streaming ?Text is much more all-inclusive
> than UTF-8, its symbol set is not as large (which is
> really all there is to UNICODE anyway). So I am
> really not just starting out as you might have the
> impression. I have probably 10 full man years
> already into this. I just started working on the
> 5th generation of the ?Text editor since about
> August and have been working on the 2nd
> generation compiler since about a year ago.
> It is early enough, I could make the 5th generation
> editor change its course but I will have to start
> completely over on the compiler (3rd generation).
>
> I really have a lot more than you probably think
> at stake. My business partner (in a medical
> networking device communications business) keeps
> urging me that I need to think about retiring in about
> 10 years. If I were to abandon what I have
> already done, the whole thing would collapse and
> I would have little more than UNICODE left.
> Rather than do this, I would just give up and
> do something completely different.
>
I'm beginning to get a vague idea of what you are talking about. When you gave the domain name, I was able to guess that the character your newsreader fails to post in "?Text" is a phi, and googling for "phitext" gave me this:

<http://lists.planix.com/pipermail/lout-users/1995q4/000297.html>

It also gave some hits for source code, such as this:

<http://read.pudn.com/downloads62/sourcecode/compiler/215357/compiler/PHITEXT.C__.htm>

As far as I can see, back in 1988 you were interested in producing a general but efficient way to encode a wider range of characters than ASCII. You had slightly different priorities than the Unicode folks, who were starting at the same sort of time. In particular, you have far fewer possible characters (using 11 bits for a total of 1536), but unlike Unicode you use another 21 bits to store visual information such as text styles, weights, fonts, and colour.

To support this system, you have been working on a text editor, a compiler, and an embedded operating system. You are now working with conversion tools so that a programmer could store their source code in Unicode, and translate it into phiText for your compiler.

Am I right so far?

How much of this software is actually developed? How many people are involved in creating it? How many users do you have? Has it actually been used in real systems?

mvh.,

David

<snip rest to save space>
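To picture the kind of 32-bit cell described above, here is a purely illustrative sketch that packs an 11-bit character code and a 21-bit style field into one integer. Only the 11/21 split comes from the post; the placement of the code in the low bits, the treatment of the style field as one opaque number, and the names `pack` and `unpack` are assumptions, not the actual phiText layout.

```python
CODE_BITS = 11                      # character code width mentioned in the post
STYLE_BITS = 21                     # remaining bits for visual information
CODE_MASK = (1 << CODE_BITS) - 1
STYLE_MASK = (1 << STYLE_BITS) - 1


def pack(code, style):
    """Combine a character code and an opaque style value into one 32-bit cell.

    Assumption: the code sits in the low bits and the style in the high bits.
    """
    if not 0 <= code <= CODE_MASK:
        raise ValueError("character code does not fit in 11 bits")
    if not 0 <= style <= STYLE_MASK:
        raise ValueError("style value does not fit in 21 bits")
    return (style << CODE_BITS) | code


def unpack(cell):
    """Split a 32-bit cell back into (code, style)."""
    return cell & CODE_MASK, (cell >> CODE_BITS) & STYLE_MASK


cell = pack(0x41, 0x12345)          # 'A' with an arbitrary style value
print(hex(cell), unpack(cell))
```

A fixed-width cell like this makes random access trivial, at the cost of four bytes per character even for plain ASCII text.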
On Fri, 30 Jan 2009 17:33:51 +0100, Paul K. McKneely
<pkmckneely@sbcglobal.net> wrote:
> "David Brown" <david.brown@hesbynett.removethisbit.no> wrote in message
> news:O_6dnYV7E52xuR_UnZ2dnUVZ8u2dnZ2d@lyse.net...
> According to Wikipedia, UTF-8 was an outgrowth
> of ISO 10646 which was a 32-bit flat format.
ISO 10646 is not a format, it's a standard. UTF-8 is an encoding for Unicode. What many (used to) call "Unicode" was UCS-2 (no escape sequences) which was later followed by UTF-16 (which has escape sequences allowing it to encode all of Unicode).

--
Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/
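The difference between UCS-2 and UTF-16 mentioned here comes down to surrogate pairs, the "escape sequences" of UTF-16. The following sketch computes the 16-bit code units for a code point by hand; the function name `utf16_units` is made up for this example.

```python
def utf16_units(code_point):
    """Return the list of 16-bit code units that encode one Unicode scalar value."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        # Fits in one unit; this is the only case UCS-2 could represent.
        return [code_point]
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 | (offset >> 10)         # top 10 bits go in the high surrogate
    low = 0xDC00 | (offset & 0x3FF)        # bottom 10 bits go in the low surrogate
    return [high, low]


for cp in (0x3C6, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", [f"{unit:04X}" for unit in utf16_units(cp)])
```

Unlike UTF-8, the resulting 16-bit units still have to be written in some byte order, so UTF-16 files need either a byte order mark or an agreed convention.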
Hello Frank,

"Frank Buss" <fb@frank-buss.de> wrote in message 
news:1xgcfwvbuui7r$.1gdmb6tszma7g.dlg@40tude.net...
> Sounds like you are searching for reasons not to start the project :-)
I was at the time. I have followed David Brown's suggestion and I have worked up a plan to merge PhiText with Unicode. PhiText will become a binary superset of Unicode and Unicode/HTML for many reasons. My first software product will be an editor that can load files in several formats and save them back in any one of those formats. This will make it a useful tool for file format conversion. However, its main focus will be for writing source code in the phi Parallel Programming Language. The editor should also be useful for writing source code in any language based on Unicode. The compiler elements will come next. I plan to use generic Unicode compatible software for such things as linker/library managers and debuggers. If I could find a suitable intermediate code format then I might get by with not having to write a code generator. However, I suspect that the advanced features of the programming language might preclude some of the above possibilities.
> If you use some standard approach, like Unicode, I8N is only a very small
> part of designing a new language and most I8N topics can be found on the
> web or in books and are straightforward to implement.
I did some searching for what you were calling 18N and I finally discovered that the term i18n is an abbreviation for internationalization where the '18' stands for the 18 letters that are between 'i' and 'n'. Although my software will need to be internationalized, I am a software technology developer so my customers will probably be the ones actually doing most of the internationalizing of software. Because almost all of my customers will speak English (because software development is inherently anglo-centric) I will probably not have to deal with the issues to the extent that my customers will.
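The abbreviation rule described here is easy to verify mechanically. This is a tiny sketch with a made-up helper name, `numeronym`, that keeps the first and last letters and replaces the letters in between with their count.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word                 # nothing in between to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for word in ("internationalization", "localization"):
    print(word, "->", numeronym(word))
```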
> You are writing in this newsgroup, so looks like it is a language
> targeting embedded systems. Things like the compiler itself, object
> format, linker, libraries for all kind of hardware and software tasks,
> IDE, debugger etc., are much more work and much more interesting. I8N is
> just a small part of the library and maybe some support from the tools
> for handling Unicode.
Yes. You are correct. The language is targeted at making it easy to write operating systems from the ground up. This really targets embedded people because they are the ones who are going to produce the next generation computer platform. In fact, it is their job to do it every day :)

Paul
Paul K. McKneely wrote:

> I did some searching for what you were calling 18N and I finally
> discovered that the term i18n is an abbreviation for internationalization
> where the '18' stands for the 18 letters that are between 'i' and 'n'.
Yes, sorry, was I18N.
> Although my software will need to be internationalized, I am a
> software technology developer so my customers will probably
> be the ones actually doing most of the internationalizing of software.
> Because almost all of my customers will speak English (because
> software development is inherently anglo-centric) I will probably
> not have to deal with the issues to the extent that my customers
> will.
You can help them with libraries, e.g. like Java does it: http://java.sun.com/javase/technologies/core/basic/intl/
> Yes. You are correct. The language is targeted at making
> it easy to write operating systems from the ground up. This
> really targets embedded people because they are the ones
> who are going to produce the next generation computer
> platform. In fact, it is their job to do it every day :)
Your website phisystem.net doesn't show much information. The other link mentioned in this thread,

http://lists.planix.com/pipermail/lout-users/1995q4/000297.html

doesn't explain any details of the interesting parts, the kernel and the language.

In your other postings you wrote that you want to sell "how to" books etc., but without any outline of what your language and system look like, and why it is better than using some Eclipse-based IDE with C and Linux or some other OS, nobody would buy it.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Hi Frank,

"Frank Buss" <fb@frank-buss.de> wrote in message 
news:1ftp9ekz2479q$.145tqhhsymxmc.dlg@40tude.net...
> Paul K. McKneely wrote:
> You can help them with libraries, e.g. like Java does it:
>
> http://java.sun.com/javase/technologies/core/basic/intl/
Thank you.
> Your website phisystem.net doesn't show much information. The other link
> mentioned in this thread,
> http://lists.planix.com/pipermail/lout-users/1995q4/000297.html
> doesn't explain any details of the interesting parts, the kernel and the
> language.
> In your other postings you wrote that you want to sell "how to" books etc.,
> but without any outline of what your language and system look like, and why
> it is better than using some Eclipse-based IDE with C and Linux or some
> other OS, nobody would buy it.
I have been on the news group, not to advertise but to get feedback from European programmers on i18n issues. It is because the products are not ready to be commercialized that I am asking for "early" feedback. This is so that opinions can be used in the development process and not for fixing problems after a product has gone to market.

My frustration has been that people want to know the details about what the products are going to be like and when I give them some details then others come by and read it and start to think the products are ready for sale. I would rather just ask narrow well-focused questions but responders get very impatient because they want to know everything about it and it is not ready to be revealed to the world. The PhiSystem.net website is not even in operation yet but David Brown wanted to know what my future plans were so I told him. Right now, they are just plans and plans take time to implement.

Paul