Dimiter_Popoff wrote:
> On 11.10.2014 г. 18:26, David Brown wrote:
>> On 11/10/14 16:13, Dimiter_Popoff wrote:
>>> On 11.10.2014 г. 12:28, Stefan Reuther wrote:
>>>> Dimiter_Popoff wrote:
>>>>> As a side note, "the right way" to treat file names is to preserve
>>>>> the case information and to ignore it during file search (i.e.
>>>>> aaa and AAA locate the same file).
>>>>
>>>> This will not work.
>>>
>>> Well it has worked for quite some time already.
>>
>> It has "worked" in the sense that people live with it despite the
>> inadequacies, inconsistencies and complications such as massive amounts
>> of locale-dependent code.
>
> It has worked for a few millennia, whether you like it or not. Just
> because a few programmers do not want to be bothered (or are incapable
> of) handling the naming conventions we have is no good reason to ask
> for a change.

In previous millennia, people did not try to build systems that work internationally. Even at the beginning of this century, systems that worked in just one region were common. Of course, if you use just Codepage-437 or ISO-8859-1, which do not have the Turkish "ı" letter, you can agree on a unique case mapping. But then your system won't be usable in Turkey.

> The above applies to the rest of your post, I really have no time
> explaining the alphabet. People learn it in primary school, I am sure
> you have been taught that. Just recall it.

Your misconception is that you assume there is such a thing as "the alphabet". There is no "the alphabet". There are hundreds of alphabets, many of which contain common characters, and some of which interpret characters differently than others. And then we have not even started talking about languages that don't use alphabets at all, such as Chinese.

Neither have we started to talk about things like sorting, which isn't even uniquely defined for a language (German has "phonebook" and "dictionary" order), and is totally nontrivial for multiple languages (does a Cyrillic "А" go before or after a Greek "Α", and where do they go in relation to a Latin "A"?).

Stefan
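The Turkish case-mapping point above can be illustrated with a minimal Python sketch. It relies only on Python's built-in `str.upper()` and `str.lower()`, which follow the Unicode default mappings and ignore the locale. The helper `turkish_upper` is a hypothetical name introduced here for illustration; it handles only the two Turkish i-letters and is not a full locale-aware mapping.

```python
# Python's str.upper()/str.lower() use Unicode default case mappings,
# independent of the user's locale.
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def default_upper(name: str) -> str:
    """Uppercase using the locale-independent Unicode default mapping."""
    return name.upper()


def turkish_upper(name: str) -> str:
    """Hypothetical helper: apply the Turkish i-rules, then the default mapping.

    In Turkish, 'i' uppercases to dotted capital I (U+0130) and
    dotless i (U+0131) uppercases to plain 'I'.
    """
    return name.replace("i", DOTTED_CAP_I).replace(DOTLESS_I, "I").upper()


name = "file.txt"
# The two conventions disagree on the "same" name, so a case-insensitive
# lookup has to pick one of them.
print(default_upper(name) == turkish_upper(name))

# The default mapping does not round-trip dotless i:
# U+0131 uppercases to 'I', which lowercases to plain 'i'.
round_trip = DOTLESS_I.upper().lower()
print(round_trip == DOTLESS_I)
```

Both comparisons come out false: the two conventions produce different uppercase names, and the default mapping loses the dotless i on a round trip. Whichever mapping a filesystem hard-codes, names created under the other convention may no longer compare as expected.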
Filesystem syntax constraints under Windows
Started by ●October 10, 2014
Reply by ●October 12, 2014
David Brown wrote:
> On 11/10/14 10:06, upsidedown@downunder.com wrote:
>>> On 10/10/2014 9:42 AM, Stefan Reuther wrote:
>>>> TL;DR, the reserved characters are
>>>> - "\", "/" for the path separator
>>>> - ":" for the drive letter separator, and to separate file names
>>>>   and alternate data streams
>>
>> I had completely forgotten the NTFS alternate data streams, since at
>> least in early NT versions, there were several issues using these
>> alternate streams.
>
> To my knowledge, the only successful use of alternate data streams in an
> NTFS file was a way to hide viruses without changing the apparent size
> of a file.

They are also used to store extended attributes such as a marker "this .exe file was downloaded from the internet, display a scary message when the user tries to run it".

Stefan
Reply by ●October 12, 2014
On 12.10.2014 г. 13:04, Stefan Reuther wrote:
> Dimiter_Popoff wrote:
>> .....
>>
>> It has worked for a few millennia, whether you like it or not. Just
>> because a few programmers do not want to be bothered (or are incapable
>> of) handling the naming conventions we have is no good reason to ask
>> for a change.
>
> In previous millennia, people did not try to build systems that work
> internationally.

Yeah. And because they do now, all the whining unix followers would have the millennia-old grammar reinvented just to suit the fact that they have been led by their leader into the wrong corner. The fact is they got what they deserved (as does anybody following any leader). Tons of defunct software because of a fundamentally broken filesystem.

> Your misconception is that you assume there is a thing such as "the
> alphabet".

I am fluent in only 4 languages, English and German among them (OK, fluent might be overstated for my Russian), what do I know about alphabets. And I have written only one OS with only two filesystems, what do I know about these things. Unix or whatever followers are bound to know better. Really.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Reply by ●October 12, 2014
Hi Dimiter,

On 10/12/2014 5:59 AM, Dimiter_Popoff wrote:
> The fact is they got what they deserved (as does anybody following
> any leader). Tons of defunct software because of a fundamentally
> broken filesystem.

To be clear, what are you considering "broken" about the filesystem? (and, *which* -- FFSv1/UFS, FFSv2, CODA, AFS, PORTAL, UNION, ZFS, Reiser, NFS, etc.)

Are your objections to the features offered? Or, the implementation details? Performance? Or, solely to the "naming conventions" of its content? Or, to the interfaces made available to it? Or, conventions imposed on those interfaces (e.g., I am annoyed that Windows doesn't adhere to L-R alpha sorts. "Gee, let's alphabetize the keys on the keyboard to make it easier for folks to find the key in which they are interested?")

[Recall, I have *no* filesystem in my design as the entities managed are rarely "files" in the traditional sense. Rather, just a "namespace". The only "persistent store" resides on a "smart", composite block device (I'm still sorting out the implementation details, there)]

[Feel free to reply offline, if preferred]
Reply by ●October 12, 2014
On 12.10.2014 г. 18:04, Don Y wrote:
> Hi Dimiter,
>
> On 10/12/2014 5:59 AM, Dimiter_Popoff wrote:
>
>> The fact is they got what they deserved (as does anybody following
>> any leader). Tons of defunct software because of a fundamentally
>> broken filesystem.
>
> To be clear, what are you considering "broken" about the filesystem?

The fact that in order to have, say, "index.htm" in a way usable for humans you need to have also INDEX.HTM, Index.htm, and another few to cover the common cases (to cover all cases you need 256 entries).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
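The figure of 256 quoted above follows from "index.htm" having eight letters, each of which can be upper or lower case (the dot has only one form), giving 2^8 spellings. A small Python sketch that enumerates them, assuming plain ASCII letters; `case_variants` is a name made up for this example:

```python
from itertools import product


def case_variants(name: str) -> list[str]:
    """Return every upper/lower-case spelling of an ASCII file name."""
    choices = []
    for ch in name:
        lower, upper = ch.lower(), ch.upper()
        # Characters without distinct cases (such as '.') contribute one form.
        choices.append((lower, upper) if lower != upper else (ch,))
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("index.htm")
print(len(variants))
```

On a case-sensitive filesystem each of these spellings is a distinct directory entry, which is the cost being objected to.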
Reply by ●October 12, 2014
On 10/12/2014 8:47 AM, Dimiter_Popoff wrote:
> On 12.10.2014 г. 18:04, Don Y wrote:
>> Hi Dimiter,
>>
>> On 10/12/2014 5:59 AM, Dimiter_Popoff wrote:
>>
>>> The fact is they got what they deserved (as does anybody following
>>> any leader). Tons of defunct software because of a fundamentally
>>> broken filesystem.
>>
>> To be clear, what are you considering "broken" about the filesystem?
>
> The fact that in order to have say "index.htm" in a way usable for
> humans you need to have also INDEX.HTM, Index.htm, and another few
> to cover the common cases (to cover all cases you need 256 entries).

But that's something that you can address in the user interface. It doesn't really impact the filesystem, per se. [My question hoped to elicit some comments re: the implementation...]

(e.g., add a layer that maps all incoming filenames to uppercase in CREATION, SEARCH and DELETION and the filesystem can still be designed to preserve and recognize case -- it just happens that all entries in the filesystem have this uncanny consistency of always being in uppercase *in* the filesystem's name tables)

I'm sure MS effectively implements a locale-specific "strncmpi()" when it goes hunting for a match. [Actually, knowing MS's history with buffer overrun issues, I suspect they would use a strcmpi() instead! :-/ ]

I still have to test how the NFS client/server here handle these...

In my case (namespaces), as the names for most objects have been created by the developers -- or, code that they crafted -- it seems an invitation to sloppiness (i.e., bugginess) to allow the developer to refer to "foo" as "Foo", elsewhere in his codebase AND EXPECT THEM TO REFERENCE THE SAME OBJECT.
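The uppercasing layer described above might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the class name `UppercasingLayer`, its methods, and the use of a plain dict to stand in for a case-sensitive name table. It also uses `str.upper()`, which sidesteps the locale questions raised elsewhere in the thread.

```python
class UppercasingLayer:
    """Hypothetical front end over a case-sensitive name table.

    The table itself (a dict here) still distinguishes case; it simply
    never sees anything but uppercase keys.
    """

    def __init__(self):
        self._table = {}  # stands in for the filesystem's name table

    @staticmethod
    def _key(name: str) -> str:
        # ASSUMPTION: locale-independent str.upper(); a real system would
        # need one fixed, documented mapping.
        return name.upper()

    def create(self, name: str, data: bytes) -> None:
        key = self._key(name)
        if key in self._table:
            raise FileExistsError(name)
        self._table[key] = data

    def search(self, name: str):
        # Returns the stored data, or None when no entry matches.
        return self._table.get(self._key(name))

    def delete(self, name: str) -> None:
        del self._table[self._key(name)]

    def entries(self) -> list[str]:
        return sorted(self._table)


fs = UppercasingLayer()
fs.create("index.htm", b"<html></html>")
print(fs.search("INDEX.HTM") is not None)
print(fs.entries())
```

Note that this variant discards the spelling the user typed. Keeping the original spelling alongside the uppercase key would give the "preserve case, ignore it on search" behaviour argued for earlier in the thread.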
Reply by ●October 12, 2014
On 10/12/2014 9:01 AM, Don Y wrote:
> On 10/12/2014 8:47 AM, Dimiter_Popoff wrote:
>> The fact that in order to have say "index.htm" in a way usable for
>> humans you need to have also INDEX.HTM, Index.htm, and another few
>> to cover the common cases (to cover all cases you need 256 entries).
>
> But that's something that you can address in the user interface.
> It doesn't really impact the filesystem, per se.

> (e.g., add a layer that maps all incoming filenames to uppercase
> in CREATION, SEARCH and DELETION and the filesystem can still be
> designed to preserve and recognize case -- it just happens that
> all entries in the filesystem have this uncanny consistency of always
> being in uppercase *in* the filesystem's name tables)

For example, all of the rules in my speech synthesizers are expressed in uppercase. Yet, obviously, text fed *to* the synthesizer can be in ANY case -- including mixed. So, my pattern matching algorithms ignore the case of the input text (but KNOW that the case of the templates will be strictly UPPERcase).

[This allows me to use lowercase in the templates for "special purposes" without fear of "accidentally" matching something in the input text]
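The post above does not say what the lowercase "special purposes" are, so the following Python sketch invents one purely for illustration: a lowercase `v` in a template stands for "any vowel". The function name `matches`, the `VOWELS` set and the fixed-length matching are all assumptions, not the synthesizer's actual rule format.

```python
VOWELS = set("AEIOU")


def matches(template: str, text: str) -> bool:
    """Match input text against an uppercase template, ignoring input case.

    Uppercase template characters must equal the uppercased input character.
    The lowercase 'v' is a hypothetical metasymbol meaning "any vowel".
    """
    if len(template) != len(text):
        return False
    for t, ch in zip(template, text.upper()):
        if t == "v":
            if ch not in VOWELS:
                return False
        elif t != ch:
            return False
    return True


print(matches("CAT", "cAt"))   # literal template, mixed-case input
print(matches("CvT", "cut"))   # metasymbol stands for the vowel
print(matches("CvT", "cvt"))   # a literal 'v' in the input is not a vowel
```

Because the input is uppercased before comparison, a lowercase character in a template can never be matched literally by input text, which is the property the post relies on.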
Reply by ●October 12, 2014
On 12.10.2014 г. 19:01, Don Y wrote:
> On 10/12/2014 8:47 AM, Dimiter_Popoff wrote:
>> On 12.10.2014 г. 18:04, Don Y wrote:
>>> Hi Dimiter,
>>>
>>> On 10/12/2014 5:59 AM, Dimiter_Popoff wrote:
>>>
>>>> The fact is they got what they deserved (as does anybody following
>>>> any leader). Tons of defunct software because of a fundamentally
>>>> broken filesystem.
>>>
>>> To be clear, what are you considering "broken" about the filesystem?
>>
>> The fact that in order to have say "index.htm" in a way usable for
>> humans you need to have also INDEX.HTM, Index.htm, and another few
>> to cover the common cases (to cover all cases you need 256 entries).
>
> But that's something that you can address in the user interface.

Of course you can. Every problem has its solution.

The problem in the above case is the fundamental design of the filesystem. Either you store bytes and do not expose the user to them - but to some text representing these - or you store text and allow the user to consume it. In the unix filesystem they store bytes and feed them for user consumption which has been, is and will be a problem as long as they do not bite the bullet and fix it.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Reply by ●October 12, 2014
On 2014-10-10, Don Y <this@is.not.me.com> wrote:
> My point was spaces cause issues -- case sensitivity is an "issue"
> in your book. Why not "fix" the space issue in a manner similar to
> the case one? Just ignore them! Allow "A Very Little Man"
> to be treated as "AVERYLITTLEMAN" -- that way the user doesn't have
> to worry about remembering how *many* spaces or *if* there were
> spaces!

Space, what space? What do you do for U+00A0 and its sisters?

http://en.wikipedia.org/wiki/Whitespace_character

--
\_/°< coin
http://weblog.mixart-myrys.org/?post/2014/09/Radio-Myrys
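The objection above can be made concrete with a short Python sketch. It assumes that "space" is taken to mean whatever Python's `str.isspace()` reports, which is only one possible definition; `squash` is a name invented for this example.

```python
NBSP = "\u00a0"              # NO-BREAK SPACE
IDEOGRAPHIC_SPACE = "\u3000"
ZERO_WIDTH_SPACE = "\u200b"  # invisible, but not whitespace per str.isspace()


def squash(name: str) -> str:
    """Drop every character str.isspace() calls whitespace, then uppercase."""
    return "".join(ch for ch in name if not ch.isspace()).upper()


plain = "A Very Little Man"
tricky = "A" + NBSP + "Very Little" + IDEOGRAPHIC_SPACE + "Man"

# Removing only the ASCII space leaves the other space characters behind.
print(tricky.replace(" ", "") == plain.replace(" ", ""))

# Treating all Unicode whitespace alike makes the two names collide.
print(squash(tricky) == squash(plain))

# A zero-width space survives even that, so these still differ.
print(squash("A" + ZERO_WIDTH_SPACE + "Very") == squash("A Very"))
```

The first and third comparisons are false and the second is true, so "ignoring spaces" needs a precise definition of which code points count before it can be applied consistently.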
Reply by ●October 12, 2014
On 12/10/14 12:07, Stefan Reuther wrote:
> David Brown wrote:
>> On 11/10/14 10:06, upsidedown@downunder.com wrote:
>>>> On 10/10/2014 9:42 AM, Stefan Reuther wrote:
>>>>> TL;DR, the reserved characters are
>>>>> - "\", "/" for the path separator
>>>>> - ":" for the drive letter separator, and to separate file names
>>>>>   and alternate data streams
>>>
>>> I had completely forgotten the NTFS alternate data streams, since at
>>> least in early NT versions, there were several issues using these
>>> alternate streams.
>>
>> To my knowledge, the only successful use of alternate data streams in an
>> NTFS file was a way to hide viruses without changing the apparent size
>> of a file.
>
> They are also used to store extended attributes such as a marker "this
> .exe file was downloaded from the internet, display a scary message when
> the user tries to run it".
>

That's interesting to know.

(That particular message is more irritating than scary - /of course/ I want to run the file, that's why I downloaded it in the first place!)