Attention: European C/C++/C#/Java Programmers-Call for Input

Started by Paul K. McKneely January 27, 2009
On 28 Jan, 00:31, "Paul K. McKneely" <pkmckne...@sbcglobal.net> wrote:

> The English speaking world has used a lot of
> Greek letters for variables during the past
> few centuries. It wouldn't be much of a
> shock for programmers to suddenly be
> able to use ? instead of pi.

It depends on what is faster. Even if the character set has a glyph for pi, in my opinion it will not be used unless the keyboard gives easy access to it. Writing "pi" takes two keys; writing the glyph on a standard keyboard takes at least three (for example ctrl-shift-p). The two-key combinations are already taken (shift-keys for capital letters, ctrl-keys and alt-keys for program functions). So I think it is faster to keep using the transcription.

And for the moment we are talking about Western alphabets (Latin, Greek, ...), but what about Asian scripts? Do you want to encode them too? I would like to see a program with variable names written in Chinese ideograms, where, if I remember correctly, the meaning sometimes depends on which ideogram is next to the one you are reading (or writing).

I think Mr. Dijkstra is right. For program code, use the least complex character set you can find (i.e. ASCII); for comments, variable and function names and so on, English should be the language to use. For strings, use Unicode.

Bye
Jack
Paul K. McKneely wrote:
> Hi,
>
>> I am looking forward to read source-code like this:
>>
>> principal(de_tout arg_compteur, signe *arg_horaire)
>> {
>> ???????? ??????? //
>> kokonaisluku hakemisto;
>> terwijl (de kleinere tellers zeven is) {
>> } "geschwofelte Klammer zu"
>
> Now that IS funny. This is the very thing that the
> programming community doesn't want. Don't
> forget, Arabic and Hebrew are read from right
> to left. Is the above code what an LR parser is for?
> Or should it be called an LR/RL parser?
> What I had in mind is more like ?=3.1415926;
> The English speaking world has used a lot of
> Greek letters for variables during the past
> few centuries. It wouldn't be much of a
> shock for programmers to suddenly be
> able to use ? instead of pi.
>
> Paul

There are times when non-English identifiers such as pi, or the Greek lower-case letters, could be useful. But they are few and far between, mostly restricted to mathematical programming. And if you want to be able to write pi as a single Greek letter identifier, you also want to be able to write identifiers with subscripts, and very soon a wide range of proper mathematical notation. This would lead to chaos very quickly (and it's already been done - it's called APL).

Allow non-ASCII characters in comments and strings. Your choices are to either fix on Latin-1, fix on UTF-8, or allow different encodings with an identifier at the start of the file. I'd go for UTF-8 as a modern choice that works well and allows a very wide range of characters, while working with a great range of existing tools.

Trying to invent your own character set, encodings, and orderings is about as useful to your users as using Esperanto for the documentation "in order to keep it international". It's a sure way to guarantee that your project will fail.
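As a rough illustration of why UTF-8 tends to coexist with existing tools, the following Python sketch compares it with Latin-1 on a line whose only non-ASCII characters sit in a comment. The helper names `byte_lengths` and `ascii_view` and the sample line are made up for this example; they are not part of any proposal in the thread.

```python
def byte_lengths(text):
    """Return the encoded size of text in Latin-1 and in UTF-8."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


def ascii_view(text):
    """Approximate what an ASCII-only tool sees in the UTF-8 bytes:
    ASCII bytes pass through unchanged, every other byte becomes '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in text.encode("utf-8"))


# Hypothetical source line: ASCII code, non-ASCII only in the comment.
line = "x = 1;  // Fläche in m²"

print(byte_lengths(line))   # Latin-1 size, UTF-8 size
print(ascii_view(line))     # the code part is untouched, the comment is damaged
```

The code portion of the line is byte-for-byte identical in both encodings, which is the property that lets ASCII-era compilers and editors keep working; only the comment text costs extra bytes or gets mangled.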
Paul K. McKneely wrote:
> Hi,
>
>> I am looking forward to read source-code like this:
>>
>> principal(de_tout arg_compteur, signe *arg_horaire)
>> {
>> ???????? ??????? //
>> kokonaisluku hakemisto;
>> terwijl (de kleinere tellers zeven is) {
>> } "geschwofelte Klammer zu"
>
> Now that IS funny. This is the very thing that the
> programming community doesn't want. Don't
> forget, Arabic and Hebrew are read from right
> to left.

The line your newsreader messed up is a comment, as "//" is at the *start* of the line ;-)

> It wouldn't be much of a
> shock for programmers to suddenly be
> able to use ? instead of pi.

In a world where even news clients are unable to declare the character set that was used, everything would read ??=3?14... or US?=?*1?3... (3.1415 would be written 3,1415 in German)

Falk
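The kind of damage described above can be reproduced in a few lines of Python. This is only a sketch: the function name `misread` is invented here, and the exact junk a real newsreader shows depends on which character sets it assumes.

```python
def misread(text, sent_as, read_as):
    """Encode text in one charset and decode it as another, roughly what
    happens when a message's charset declaration is missing or wrong."""
    return text.encode(sent_as).decode(read_as, errors="replace")


line = "π=3.1415926;"

# Sent as UTF-8 but read as Latin-1: the two bytes of π become two junk characters.
print(repr(misread(line, "utf-8", "latin-1")))

# Sent as UTF-8 but read as ASCII: each undecodable byte becomes U+FFFD.
print(repr(misread(line, "utf-8", "ascii")))

# With matching charsets the text survives.
print(misread(line, "utf-8", "utf-8"))
```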
On Wed, 28 Jan 2009 01:12:00 +0100, Paul K. McKneely <pkmckneely@sbcglobal.net> wrote:
> "Boudewijn Dijkstra" <boudewijn@indes.com> wrote in message
> news:op.uoe8ixqyy6p7a2@azrael.lan...
>> After reading your post, I must conclude that you are oblivious to key
>> concepts and organizations surrounding internationalization and
>> multilingual co-operations. It is a good thing that you sought advice
>> from an intelligible community before re-inventing the wheel (badly).
>
> Thank you for being so polite and humble.

Don't be so bitter. You were the one trying to cater to Europeans (exclusively) without having a clue about the current situation or the wants and needs of those Europeans.

> Let me say that the
> new language is not about internationalization.
How is "it would be great if European programmers could [write] in their own native languages" not about internationalization?
> It is about providing
> a much more powerful programming environment than is available
> with standard languages.
OK, probably a praiseworthy goal. But what does that have to do with Europeans in particular and their languages in general? And maybe it would be wise to outline the shortcomings of the "standard languages" so that people can better flame you.
> (I know I expect to get a lot of flames
> from that last statement. I understand that there are a lot of
> insecure people in the world who will feel outrage with just about
> anything I have to say. Such is the price for a small amount of
> useful feedback).
>
>> Like Java does?
>> http://java.sun.com/docs/books/jls/third_edition/html/lexical.html
>
> Nobody in their right mind would try to write an operating system
> (or a device driver!) in Java.
I said "like Java", we were talking about the character set of the language, not about the available types and other grammatical elements.
> With no pointers and only signed
> integers, it would be like programming with a straitjacket on.
> And what would happen when an interrupt happened and the
> Java engine decided it was time for garbage-collection in the
> middle of an interrupt service routine?
Read JSR-1. http://jcp.org/en/jsr/detail?id=1
>> Why just Europeans? Lots of software is written by Israeli (Hebrew),
>> North-African (Arabic), Chinese (thousands of ideographs in different
>> families) and Japanese (Katakana) people.
>
> Let me answer your question with your own words:
>> As far as I'm concerned, English is the only language that should be
>> seen in source code elements (except maybe string literals).

So what's the point of using an extended character set if you agree to use English anyway?

>> Not every language sorts the same alphabet in the same way.
> The output of the software development tool chain is
> for programmers only. I don't think everyone else will care if
> the ordinal rules don't conform to every village on the planet.

Why sort users' results differently from programmers' results? It can only be confusing and annoying.

>> There are two other arguments against your proposal:
>
> I didn't propose anything. I asked for input.

Besides asking for input, you apparently created a 'proposed' character set (which I still haven't seen).

> And your
> comments are well taken but do not address my request.

As I have read it, your request was about an extended character set to be used by Europeans to program (partly) in their native language. If this is not so, then please re-phrase your request, as I have misinterpreted your general direction with this 'project'.

>> (I hope that you will learn to appreciate the special marks and symbols
>> used by your Spanish-speaking fellow-Americans (amongst others), before
>> you inadvertently insult one.)
>
> Sort of like the way you started to insult me with your first remarks?

That was not inadvertent. I am European; get used to it. ;)

> I can see that. I'll try not to follow your lead

I am definitely not leading, merely interrogating and questioning your direction and your reasoning.
Hi Falk,

I actually read the Arabic correctly.  It was somewhere in
the reply step where it was changed.  Thank you for
being polite.  Boudewijn Dijkstra is wrong in his
implication that all Europeans are rude jerks.
I remember someone saying one time: "Be polite
and considerate.  You never know who might end
up being your boss."

Paul

"Falk Willberg" <Faweglassenlk@falk-willberg.de> wrote in message 
news:6uai8dFe8elvU1@mid.individual.net...
> Paul K. McKneely wrote:
>> Hi,
>>
>>> I am looking forward to read source-code like this:
>>>
>>> principal(de_tout arg_compteur, signe *arg_horaire)
>>> {
>>> ???????? ??????? //
>>> kokonaisluku hakemisto;
>>> terwijl (de kleinere tellers zeven is) {
>>> } "geschwofelte Klammer zu"
>>
>> Now that IS funny. This is the very thing that the
>> programming community doesn't want. Don't
>> forget, Arabic and Hebrew are read from right
>> to left.
>
> The line, your newsreader messed up, is a comment, as "//" is at the
> *start* of the line ;-)
>
>> It wouldn't be much of a
>> shock for programmers to suddenly be
>> able to use ? instead of pi.
>
> In a world, where even newsclients are unable to declare the
> character-set, that was used, everything would read ??=3?14... or
> US?=?*1?3... (3.1415 would be written 3,1415 in german)
>
> Falk
On Tue, 27 Jan 2009 16:12:23 +0100, "Boudewijn Dijkstra"
<boudewijn@indes.com> wrote:

>As far as I'm concerned, English is the only language that should be seen
>in source code elements (except maybe string literals). It is the
>language of choice for technical terms, the language from which
>programming languages derive their syntax, and overall the best known
>language amongst programmers worldwide. English is one of the few
>languages without accents and with relatively short words, thus allowing
>relatively efficient typing.

OTOH, if the program refers, for instance, to an external record dealing with some purely national entities (such as those defined by national legislation), should the programmer invent some unofficial English translation for these entities, or use the name without accented characters?

However, at least in Finnish, doing the ä=>a and ö=>o translation might turn a name into another word with a completely different meaning. In the worst case, two identifiers in the same record might end up with the same US-ASCII representation.

IMHO, as a former Fortran programmer, 6 bit characters and 6 character identifiers should be enough :-) :-)

Paul
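The collision risk can be checked mechanically. The Python sketch below strips diacritics (ä to a, ö to o) and reports names that become identical afterwards. The function names and the field list are hypothetical; the sample words assume that Finnish "säde" means ray and "sade" means rain.

```python
import unicodedata


def strip_diacritics(name):
    """Drop combining marks after decomposition, so ä -> a and ö -> o."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def find_collisions(names):
    """Group names that become identical once diacritics are stripped."""
    groups = {}
    for name in names:
        groups.setdefault(strip_diacritics(name), []).append(name)
    return {ascii_name: originals
            for ascii_name, originals in groups.items()
            if len(originals) > 1}


# Hypothetical record fields; two of them collapse to the same ASCII name.
fields = ["säde", "sade", "määrä"]
print(find_collisions(fields))
```

Running a check like this over a record definition before transliterating would at least flag the worst case, two fields mapping to one identifier; it cannot detect a single name that merely turns into a different word.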
Paul K. McKneely wrote:
> I actually read the Arabic correctly. It was somewhere in
> the reply step where it was changed. Thank you for
> being polite. Boudewijn Dijkstra is wrong in his
> implication that all Europeans are rude jerks.

Come on, this is Usenet, not kindergarten. And if you're trying to revolutionize the world, you should be able to endure a little sarcasm.

Actually, your post reminded me of something. When I was 15, I tried to revolutionize the world with a new programming system as well. You started with a character set; I started with an object file format. So I defined the object file format that "would be able to store code for all processors on the planet", without having ever seen anything other than a Z80 and an x86. Far call patching? Alignment? Link-time inlining? What's that? But surely everyone has segment registers.

It is similar with your character set. Almost nobody really needs identifiers in their native language, because that messes up interoperability. If I have a printed manual saying I should call function φ, how would I do that if I can't find it on my keyboard?

And a character set that collates nicely (outside A-Z) is rarely of use, too. When I sort things, either I don't care exactly how they are sorted and just want to be able to find things with a binary search, or I want a locale-specific, case-blind sort, which, as I've shown, can differ widely depending on the actual locale used.

Long ago, for a semester project, we tried to use a coding style using our native language. We set the rule: all code we write has to be in German, so we can more easily tell what is our code and what is code imported from the runtime library and the framework. This taught us two things: (1) mixed-language code looks ugly, because half of the accessor functions are called get/set, and the other half is gib/setze. (2) Umlauts break tools. Javadoc refused to generate an index containing umlauts, and all the code metric tools our teacher tried to use crashed and burned on the code. I ultimately hacked up a Perl script to get the metrics.

Stefan
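To make the sorting distinction concrete, here is a small Python sketch with three orderings of the same word list: plain code-point order (enough for a binary search), a case-blind order that treats ä like a, and a case-blind order that places å, ä, ö after z as the Swedish alphabet does. Both key functions are simplified assumptions for illustration, not real locale collation; all names are invented.

```python
import unicodedata

# Assumed alphabet order for the second key: a-z followed by å, ä, ö.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def base_letter_key(word):
    """Case-blind key that ignores diacritics, so ä sorts together with a."""
    folded = unicodedata.normalize("NFD", word.casefold())
    stripped = "".join(c for c in folded if not unicodedata.combining(c))
    return (stripped, word)          # original word breaks ties


def after_z_key(word):
    """Case-blind key that ranks å, ä, ö after z; unknown characters rank first."""
    return ([ALPHABET.find(c) for c in word.casefold()], word)


words = ["Zebra", "apple", "Äpfel", "zoo", "Apfel"]

print(sorted(words))                       # code-point order
print(sorted(words, key=base_letter_key))  # ä treated like a
print(sorted(words, key=after_z_key))      # ä placed after z
```

The first ordering is stable across machines and sufficient for lookup; the other two disagree about where "Äpfel" belongs, which is the locale dependence mentioned above.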
David Brown wrote:
> Paul K. McKneely wrote:
>> What I had in mind is more like ?=3.1415926;

Funny that even today's software cannot post a correct π.

>> The English speaking world has used a lot of
>> Greek letters for variables during the past
>> few centuries. It wouldn't be much of a
>> shock for programmers to suddenly be
>> able to use ? instead of pi.
>
> There are times when non-English identifiers such as pi, or the Greek
> lower-case letters, could be useful. But they are few and far between,
> mostly restricted to mathematical programming.

Mathematicians use Greek letters and funny fonts because they don't have multi-character identifiers. When they write "α", they usually mean "angle". I actually consider it an advantage to be able to write "angle" in my programs. π might be an exception because it's so prominently known, but it's nothing I would design my language around. Especially in embedded/DSP contexts, trig functions often have a period of, say, 64, not 2π :-)

Stefan
Paul Keinanen wrote:
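A sine with a period of 64 usually means a lookup table indexed by an integer angle, where 64 units make one full turn. The following Python sketch shows the idea; the amplitude of 127 (a signed 8-bit range) and the names `PERIOD`, `SCALE`, `SINE_TABLE`, `isin`, and `icos` are assumptions made for this example, not anything specified in the post.

```python
import math

PERIOD = 64   # angle units per full turn (the "say, 64" from the post)
SCALE = 127   # assumed amplitude, fits a signed 8-bit sample

# Precomputed once; on a real target this would typically be a const array.
SINE_TABLE = [round(SCALE * math.sin(2 * math.pi * i / PERIOD))
              for i in range(PERIOD)]


def isin(angle):
    """Integer sine: angle is in units of 1/64 turn, result is in -127..127."""
    # PERIOD is a power of two, so wrapping the angle is a single bit mask.
    return SINE_TABLE[angle & (PERIOD - 1)]


def icos(angle):
    """Integer cosine, implemented as a quarter-turn phase shift of isin."""
    return isin(angle + PERIOD // 4)


print(isin(16), isin(48), icos(0))
```

With this representation π never appears in the application code at all: a quarter turn is simply 16, and angle wraparound costs one AND instruction.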
...
> However, at least in Finnish, doing the ä=>a and ö=>o translation
> might end up into an other word with completely different meaning.

How do you substitute ä/ö/ü? Germans write ae/oe/ue instead.

But Finnish is a good example. Most European languages are successors of Latin or heavily influenced by Latin. So it is possible to understand comments in e.g. Italian.

I am working on C code which is partially commented in Finnish. I can't understand a single word. Luckily all objects are named in English and http://translate.google.com translates to Finnish.

> IMHO, as a former Fortran programmer, 6 bit characters and 6
> characters identifiers should be enough :-) :-)

IMO, as a former BASIC programmer, 6 characters should be the minimum for any instance that is valid over more than two lines :-)

Code should be written in English. Comments, if possible, too.

Falk
On Wed, 28 Jan 2009 19:32:38 +0100, Falk Willberg
<Faweglassenlk@falk-willberg.de> wrote:

>Paul Keinanen wrote:
>...
>> However, at least in Finnish, doing the ä=>a and ö=>o translation
>> might end up into an other word with completely different meaning.
>
>How do you substitute ä/ö/ü? Germans write ae/oe/ue instead.

I have never seen such transliterations in any Finnish programs; we just drop the dots.
>But Finnish is a good example.
Finnish is a Uralic language (such as Estonian and Hungarian).
>Most European languages are successors of
>Latin or heavily influenced by Latin. So it is possible to understand
>comments in e.g. Italian.
Those are Indo-European languages.
>I am working on C-code, which is partially commented in Finnish. I can't
>understand any word. Luckily all objects are named in English and
>http://translate.google.com translates to Finnish.

My comment might be a bit outdated, but from the job security point of view in the 1990's, using national identifier names is a good idea :-)

Paul