On 10/12/2014 1:07 PM, Tonton Th wrote:
> On 2014-10-10, Don Y <this@is.not.me.com> wrote:
>
>> My point was spaces cause issues -- case sensitivity is an "issue"
>> in your book. Why not "fix" the space issue in a manner similar to
>> the case one? Just ignore them! Allow "A Very Little Man"
>> to be treated as "AVERYLITTLEMAN" -- that way the user doesn't have
>> to worry about remembering how *many* spaces or *if* there were
>> spaces!
>
> Space, what space ? What do you do for U+00A0 and sisters ?
> http://en.wikipedia.org/wiki/Whitespace_character

When you start attributing "meaning" to the symbols used to create
names -- and, thus, electing to impose "harmless transformations" on
them (case, etc.), you risk altering the intended "meaning" (which
only the originator of the identifier can define!).

Should U+24B6 and U+24D0 be treated as equivalent? What about U+249C?
And *all* of U+00C0 thru U+00C5? And U+00E0 thru U+00E5? (and even
more "obvious" mappings elsewhere in the codeset)

For even wonkier ideas, should U+2801 and U+2809 be considered
"equivalent"? (why not?) In UBC one could argue that they are. Yet,
if the creator had intended this to be UEB, they would be VASTLY
different! (a vs c)

You can make translation maps for any set of symbols. But, it seems
to be a lot more work and a lot less "robust" in interpretation...
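To make the code points above concrete, here is a minimal Python sketch of one possible "harmless transformation". The function name `fold` and the choice of rules (NFKD decomposition, dropping combining marks and whitespace, then case folding) are assumptions made for illustration only; nothing in the thread proposes this exact mapping.

```python
import unicodedata


def fold(name: str) -> str:
    """Hypothetical 'harmless' transformation of an identifier.

    Assumed rules: compatibility-decompose (NFKD), drop combining marks
    and all whitespace, then case-fold.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and not ch.isspace()
    ]
    return "".join(kept).casefold()


# The code points mentioned in the post.
samples = ["\u24b6", "\u24d0", "\u249c", "\u00c0", "\u00e5", "\u2801", "\u2809"]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):45} -> {fold(ch)!r}")

# A no-break space (U+00A0) vanishes along with the ordinary spaces.
print(fold("A Very\u00a0Little  Man"))
```

Under these assumed rules the circled letters and the accented A's all collapse to a plain "a", the parenthesized letter keeps its parentheses, and the two Braille cells stay distinct because Unicode assigns them no decomposition. Whether any of that matches what the creator of the name meant is exactly the question the post raises.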
Filesystem syntax constraints under Windows
Started by ●October 10, 2014
Reply by ●October 12, 2014
Reply by ●October 13, 2014
David Brown wrote:
> On 12/10/14 12:07, Stefan Reuther wrote:
>> David Brown wrote:
>>> To my knowledge, the only successful use of alternate data streams in an
>>> NTFS file was a way to hide viruses without changing the apparent size
>>> of a file.
>>
>> They are also used to store extended attributes such as a marker "this
>> .exe file was downloaded from the internet, display a scary message when
>> the user tries to run it".
>
> That's interesting to know. (That particular message is more irritating
> than scary - /of course/ I want to run the file, that's why I downloaded
> it in the first place!)

Because you know what an exe file is and why you'd want to download one.
Now think of the grandma who got an email that says she should pay that
invoice she'll find in invoice10298234.doc.zip.exe, and if she doesn't,
someone will come, kill her dog, poison her geraniums, slash her tires
and spray-paint her door :-)

Stefan
Reply by ●October 13, 2014
Dimiter_Popoff wrote:
> On 12.10.2014 г. 13:04, Stefan Reuther wrote:
>> Dimiter_Popoff wrote:
>>> It has worked for a few milennia, whether you like it or not. Just
>>> because a few programmers do not want to be bothered (or are incapable
>>> of) handling the naming conventions we have is no good reason to ask
>>> for a change.
>>
>> In previous millenia, people did not try to build systems that work
>> internationally.
>
> Yeah. And because they do now all the whining unix followers would
> have the millennia old grammar reinvented just to suit the fact they
> have been led by their leader into the wrong corner.
> The fact is they got what they deserved (as does anybody following
> any leader). Tons of defunct software because of a fundamentally
> broken filesystem.

Last time I looked, Unix software worked fine.

As we have learned, defining semantics for "just ignore case" is hard
(if you're in Russia, what reason do you have to prefer the I<>i case
pair over the I<>ı case pair?).

In a mission-critical piece of software like a kernel or a file system,
I don't want code with vaguely-defined or complex semantics. Thus,
"here's a byte string, give me the file which has the same byte string
as its name" sounds like a pretty good plan.

If you want to be case insensitive, do that in the application. Or in a
foundation library for applications to use. The application knows when
it wants to be case insensitive and when not. And, the application knows
what locale it is in, and whether it should access "index.html" or
"ındex.html" when the user wants "INDEX.HTML".

As a bonus advantage, building a case-insensitive file system on top of
a case-sensitive one is easy. The other way round is hard.

>> Your misconception is that you assume there is a thing such as "the
>> alphabet".
>
> I am fluent in only 4 languages, English and German among them (OK,
> fluent might be overstated for my Russian), what do I know about
> alphabets.
> And I have written only one OS with only two filesystems, what do I
> know about these things.

I think we are in comp.arch.embedded, not alt.dick.size.wars, but if you
replace one-and-a-half language with a few unicode algorithms and some
more filesystems, you end up at my stats.

Last project (incomplete): customer wants "hey, let's display an
alphabet next to the word list ... what, there are different alphabets?
Different sort orders depending on the language? Oh."

Stefan
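The "do it in the application" idea can be sketched roughly as follows. This is only an illustration in Python: `ExactStore`, `fold_default`, `fold_turkish` and `find` are hypothetical names invented here, the store is an in-memory stand-in for a byte-exact file system, and the Turkish rule is a deliberately simplified assumption rather than a full locale implementation.

```python
from typing import Callable, Dict, List


class ExactStore:
    """Stand-in for a case-sensitive file system: names are opaque bytes."""

    def __init__(self) -> None:
        self._files: Dict[bytes, bytes] = {}

    def write(self, name: bytes, data: bytes) -> None:
        self._files[name] = data

    def read(self, name: bytes) -> bytes:
        # Exact byte-string match, no interpretation of the name.
        return self._files[name]

    def names(self) -> List[bytes]:
        return list(self._files)


def fold_default(text: str) -> str:
    """Locale-independent folding: pairs I with i."""
    return text.casefold()


def fold_turkish(text: str) -> str:
    """Simplified Turkish-style folding (assumption): I<>ı and İ<>i."""
    return text.replace("I", "\u0131").replace("\u0130", "i").casefold()


def find(store: ExactStore, wanted: str,
         fold: Callable[[str], str]) -> List[bytes]:
    """Case-insensitive lookup layered on top of the exact store."""
    key = fold(wanted)
    return [n for n in store.names() if fold(n.decode("utf-8")) == key]


store = ExactStore()
store.write("index.html".encode("utf-8"), b"dotted")
store.write("\u0131ndex.html".encode("utf-8"), b"dotless")

print(find(store, "INDEX.HTML", fold_default))
print(find(store, "INDEX.HTML", fold_turkish))
```

The store itself never looks inside a name. The application picks the folding rule, so the same request for "INDEX.HTML" resolves to a different file depending on the locale it chooses.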
Reply by ●October 13, 2014
On 13/10/14 19:14, Stefan Reuther wrote:
> David Brown wrote:
>> On 12/10/14 12:07, Stefan Reuther wrote:
>>> David Brown wrote:
>>>> To my knowledge, the only successful use of alternate data streams in an
>>>> NTFS file was a way to hide viruses without changing the apparent size
>>>> of a file.
>>>
>>> They are also used to store extended attributes such as a marker "this
>>> .exe file was downloaded from the internet, display a scary message when
>>> the user tries to run it".
>>
>> That's interesting to know. (That particular message is more irritating
>> than scary - /of course/ I want to run the file, that's why I downloaded
>> it in the first place!)
>
> Because you know what an exe file is and why you'd want to download one.
> Now think of the grandma who got an email that says she should pay that
> invoice she'll find in invoice10298234.doc.zip.exe, and if she doesn't,
> someone will come, kill her dog, poison her geraniums, slash her tires
> and spray-paint her door :-)
>

That's why my mother has a Mac, and my mother-in-law has Linux Mint :-)
Reply by ●October 13, 2014
On 13.10.2014 г. 20:26, Stefan Reuther wrote:
> Dimiter_Popoff wrote:
>> On 12.10.2014 г. 13:04, Stefan Reuther wrote:
>>> Dimiter_Popoff wrote:
>>>> It has worked for a few milennia, whether you like it or not. Just
>>>> because a few programmers do not want to be bothered (or are incapable
>>>> of) handling the naming conventions we have is no good reason to ask
>>>> for a change.
>>>
>>> In previous millenia, people did not try to build systems that work
>>> internationally.
>>
>> Yeah. And because they do now all the whining unix followers would
>> have the millennia old grammar reinvented just to suit the fact they
>> have been led by their leader into the wrong corner.
>> The fact is they got what they deserved (as does anybody following
>> any leader). Tons of defunct software because of a fundamentally
>> broken filesystem.
>
> Last time I looked, Unix software worked fine.

Not really, the relevant part has never worked and still does not work.
See my example in a previous post how many versions of say
"index.htm" Index.htm INDEX.HTM you need.

> As we have learned, defining semantics for "just ignore case" is hard
> (if you're in Russia, what reason do you have to prefer the I<>i case
> pair over the I<>ı case pair?).

How you treat cases when you record a string (be it a filename or not)
is up to the application writing the name - or just the user typing it
in.
Once it is recorded along with the case information you no longer need
to know in which language it is or which alphabet this is, for that.
Just the character set - which may cover a lot of alphabets and their
variations (check upthread my way of doing it in dps, I explained it).

So your above example is completely irrelevant to what we talk about.

Filenames which are for human processing consist of characters, not of
bytes. In unix names are stored as bytes and the human is given to
process bytes, not characters.

>>> Your misconception is that you assume there is a thing such as "the
>>> alphabet". There is not "the alphabet". There are hundreds of
>>> alphabets, ...
>>> ......
>> I am fluent in only 4 languages, English and German among them (OK,
>> fluent might be overstated for my Russian), what do I know about
>> alphabets.
>> And I have written only one OS with only two filesystems, what do I
>> know about these things.
>
> I think we are in comp.arch.embedded, not alt.dick.size.wars, but if you
> replace one-and-a-half language with a few unicode algorithms and some
> more filesystems, you end up at my stats.

You should be more specific if you want me to bother to understand
that. I gave you part of my stats for a good reason you gave me in your
post, you did not get it, fine, I'll live.

What I see is that you do not want to accept obvious facts, like what
is a byte and what is a character, what is a character string and what
is a sentence with meaning. (I assume you do know these?)

I can imagine it can be hard to even think you may have spent years and
years building on a broken basis.

Don't get me wrong, not all is bad about unix of course, but the file
naming in it clearly is just someone's quick hack which has survived for
decades. Not that I expect the devotees to be able to swallow such a
fact, not after seeing the reactions to me just stating the obvious.

Dimiter
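The byte-versus-character distinction argued over here can be shown with a small Python sketch. The file name "café.txt" and the helper names `same_bytes` and `same_characters` are made up for the illustration, and treating NFC-equal strings as "the same characters" is an assumption, not something either poster specified.

```python
import unicodedata

# One visible name, two Unicode spellings (hypothetical example name).
composed = "caf\u00e9.txt"                            # é as one code point
decomposed = unicodedata.normalize("NFD", composed)   # e + combining acute


def same_bytes(a: str, b: str) -> bool:
    """Comparison as a byte-oriented file system would do it (UTF-8 assumed)."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_characters(a: str, b: str) -> bool:
    """Comparison after normalizing both names to NFC (an assumed policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))
print("same bytes:     ", same_bytes(composed, decomposed))
print("same characters:", same_characters(composed, decomposed))
```

Both spellings typically render identically on screen, yet a byte-exact lookup treats them as two different names. A character-based comparison has to pick a normalization policy before it can call them equal.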
Reply by ●October 14, 2014
On 13/10/14 21:00, Dimiter_Popoff wrote:
> On 13.10.2014 г. 20:26, Stefan Reuther wrote:
>> Dimiter_Popoff wrote:
>>> On 12.10.2014 г. 13:04, Stefan Reuther wrote:
>>>> Dimiter_Popoff wrote:
>>>>> It has worked for a few milennia, whether you like it or not. Just
>>>>> because a few programmers do not want to be bothered (or are incapable
>>>>> of) handling the naming conventions we have is no good reason to ask
>>>>> for a change.
>>>>
>>>> In previous millenia, people did not try to build systems that work
>>>> internationally.
>>>
>>> Yeah. And because they do now all the whining unix followers would
>>> have the millennia old grammar reinvented just to suit the fact they
>>> have been led by their leader into the wrong corner.
>>> The fact is they got what they deserved (as does anybody following
>>> any leader). Tons of defunct software because of a fundamentally
>>> broken filesystem.
>>
>> Last time I looked, Unix software worked fine.
>
> Not really, the relevant part has never worked and still does not work.
> See my example in a previous post how may versions of say
> "index.htm" Index.htm INDEX.HTM you need.

The file in question is usually called "index.html", except on systems
based on outdated and limited MSDOS filesystems.

The file's name is "index.html". Not "Index.html", nor "INDEX.HTML",
nor any other mixups. The name uses small letters. There is no
confusion, and people don't have any trouble with it.

Most names in the real world are case-sensitive. My name is written
"David" - not "david" or "DAVID".

>
>> As we have learned, defining semantics for "just ignore case" is hard
>> (if you're in Russia, what reason do you have to prefer the I<>i case
>> pair over the I<>ı case pair?).
>
> How you treat cases when you record a string (be it a filename or not)
> is up to the application writing the name - or just the user typing it
> in.
> Once it is recorded along with the case information you no longer need
> to know in which language it is or which alphabet this is, for that.
> Just the character set - which may cover a lot of alphabets and their
> variations (check upthread my way of doing it in dps, I explained it).
>
> So your above example is completely irrelevant to what we talk about.
>
> Filenames which are for human processing consist of characters, not of
> bytes.

Filenames are mainly for programs to process, not humans - and software
is perfectly capable of getting the case correct and consistent. So are
most humans that I know, except perhaps when names are particularly
inconvenient (such as having double spaces, or unicode glyphs that look
like other glyphs).
Reply by ●October 14, 2014
On 14.10.2014 г. 11:25, David Brown wrote:
> .....
> The file's name is "index.html". Not "Index.html", nor "INDEX.HTML",
> nor any other mixups. The name uses small letters. There is no
> confusion, and people don't have any trouble with it.
> ...
>
> Filenames are mainly for programs to process, not humans - ....
> ...

Sorry David, I don't do religion. When you are able to grasp what
nonsense you have written (see the above) we may have something to talk
about again.

Dimiter
Reply by ●October 14, 2014
On 14/10/14 12:28, Dimiter_Popoff wrote:
> On 14.10.2014 г. 11:25, David Brown wrote:
>> .....
>> The file's name is "index.html". Not "Index.html", nor "INDEX.HTML",
>> nor any other mixups. The name uses small letters. There is no
>> confusion, and people don't have any trouble with it.
>> ...
>>
>> Filenames are mainly for programs to process, not humans - ....
>> ...
>
> Sorry David, I don't do religion. When you are able to grasp what
> nonsense you have written (see the above)we may have what to talk
> about again.
>

I realise you have a wildly different opinion about this case, which you
hold very strongly. I just don't understand it at all.

You have been shown clear linguistic reasons why case independence is
not universal (even when languages share an alphabet) and therefore
should not be part of an OS or filesystem. You have been shown clear
practical reasons why it is better for an OS or filesystem to work
directly with the bytes of the filename rather than imposing an
interpretation on them. You have been shown how real names (both in the
software world and in the "real" world) can often be case-sensitive.
And you have been shown many examples of systems that have
case-sensitive filesystems which work perfectly well.

I don't think there is anything more that can be done here. You have a
solid fixed opinion that is different from most other people's, and it
does not look like you are going to change that soon - nor does it look
like you can explain it in a way others can understand. I'm sure you
have good reasons for your thoughts here, even if I can't appreciate
them - so we must just agree to disagree here.
Reply by ●October 14, 2014
On 14.10.2014 г. 15:37, David Brown wrote:
> On 14/10/14 12:28, Dimiter_Popoff wrote:
>> On 14.10.2014 г. 11:25, David Brown wrote:
>>> .....
>>> The file's name is "index.html". Not "Index.html", nor "INDEX.HTML",
>>> nor any other mixups. The name uses small letters. There is no
>>> confusion, and people don't have any trouble with it.
>>> ...
>>>
>>> Filenames are mainly for programs to process, not humans - ....
>>> ...
>>
>> Sorry David, I don't do religion. When you are able to grasp what
>> nonsense you have written (see the above)we may have what to talk
>> about again.
>>
>
> I realise you have a wildly different opinion about this case, which you
> hold very strongly. I just don't understand it at all.
>
> You have been shown clear linguistic reasons why case independence is
> not universal (even when languages share an alphabet)

Yes, thanks for teaching me the alphabet. If you can suggest a more
moronic sort of effort to explain the obvious than that you are
welcome to share it.

>... and therefore
> should not be part of an OS or filesystem.

Now that has not only not been shown but I showed in clear, irrefutable
terms that it is not the case at all, quite the opposite.

The fact that you have religious views on it speaks only about your
ability to understand issues at this level.

You may want to stop trying, posts like your last two only make you
look not as bright as you would want to be.

Dimiter
Reply by ●October 14, 2014
On 10/14/2014 5:37 AM, David Brown wrote:
> I don't think there is anything more that can be done here. You have a
> solid fixed opinion that is different from most other people's, and it
> does not look like you are going to change that soon - nor does it look
> like you can explain it in a way others can understand. I'm sure you
> have good reasons for your thoughts here, even if I can't appreciate
> them - so we must just agree to disagree here.

I think the difference lies in your expectations of the user's
role in dealing with "names".

E.g., I find it particularly annoying that Windows doesn't use
a strict LR alpha sort. So, I am always looking for "90" to follow "9"
-- not "89". And, "folder" to follow "ezzz" -- instead of appearing
up at the top of the list among the other folders.

This happens in other things, too. E.g., calculator keypad vs
telephone keypad.

In *my* case (as the OP), *names* are primarily (almost exclusively)
used by pieces of code. Getting the identifier EXACTLY correct
is a small price to INSIST UPON in robust software.

And, now I'm late... <frown>
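The two orderings being contrasted can be sketched in Python. `natural_key` and `folders_first` are hypothetical helpers written for this illustration; they only approximate the kind of numeric-aware, folders-first listing the post complains about and are not a description of what Windows actually implements.

```python
import re
from typing import List, Tuple


def natural_key(name: str) -> list:
    """Numeric-aware key (assumed rule): digit runs compare as integers."""
    parts = [p for p in re.split(r"(\d+)", name) if p]
    return [(0, int(p)) if p.isdecimal() else (1, p.casefold()) for p in parts]


def folders_first(entries: List[Tuple[str, bool]]) -> List[str]:
    """Sort (name, is_folder) pairs with folders grouped at the top."""
    ordered = sorted(entries, key=lambda e: (not e[1], natural_key(e[0])))
    return [name for name, _ in ordered]


numbers = ["9", "89", "90"]
print(sorted(numbers))                   # strict left-to-right comparison
print(sorted(numbers, key=natural_key))  # digit runs treated as numbers

entries = [("ezzz.txt", False), ("folder", True), ("alpha.txt", False)]
print(sorted(name for name, _ in entries))  # strict: "folder" after "ezzz.txt"
print(folders_first(entries))               # folder floats to the top
```

With the strict comparison, "90" lands right after "9" and "folder" after "ezzz.txt", which is the order the post expects. The numeric-aware, folders-first variant puts "89" between "9" and "90" and moves the folder to the top.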







