Filesystem syntax constraints under Windows
Started by ●October 10, 2014

Hi,

Does anyone know which filesystem naming constraints are imposed in Windows itself vs. the file system layer? Said another way, which constraints are *invariants* regardless of the filesystem?

E.g., can a non-native filesystem redefine '+' to replace "../"? Or, allow support for ':' in names? Or, replace '\' with '/'? Or, ...

Thx,
--don
Reply by ●October 10, 2014
Don Y <this@is.not.me.com> wrote:

> Does anyone know which filesystem naming constraints are imposed
> in Windows itself vs. the file system layer? Said another way,
> which constraints are *invariants* regardless of the filesystem?

> E.g., can a non-native filesystem redefine '+' to replace "../"?
> Or, allow support for ':' in names? Or, replace '\' with '/'?
> Or, ...

Interesting question.

It is well known that the actual system calls accept either / or \ as a separator. The command option processor uses / for options, and so requires \ for the separator.

The more interesting cases come when you use NFS.

I was just reading yesterday that it is possible to have an NFS server on a Windows system that allows for case significant names. (Unlike most that are case preserving.)

The NFS protocols are independent of the actual separator.

The : in drive selection has to be processed pretty early. I am not so sure what to say about : later in the name.

As far as I know, CMD has to keep track of the current subdirectory. That would complicate any other treatment for \ and :.

--
glen
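Glen's observations (either / or \ works as a separator, and the drive-letter colon is peeled off first) can be poked at from any machine with Python's standard `ntpath` module, which implements Windows path conventions in pure Python. This is only a sketch: `ntpath` models the Win32 naming rules as Python understands them, not the kernel or any particular filesystem driver, so it says nothing about which layer actually enforces them. The `describe` helper is hypothetical.

```python
import ntpath

def describe(path: str) -> tuple[str, list[str]]:
    """Hypothetical helper: split a Windows-style path into (drive, components)."""
    drive, rest = ntpath.splitdrive(path)  # the "C:" prefix is split off first
    normalized = ntpath.normpath(rest)     # every "/" is rewritten as "\" here
    parts = [p for p in normalized.split("\\") if p]
    return drive, parts

print(describe("C:/Users\\don/notes.txt"))  # ('C:', ['Users', 'don', 'notes.txt'])
```

Mixing the two separators in one path makes no difference to the result, which matches what glen reports for the system calls.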
Reply by ●October 10, 2014
On 10/10/14 11:06, glen herrmannsfeldt wrote:
> Don Y <this@is.not.me.com> wrote:
>
>> Does anyone know which filesystem naming constraints are imposed
>> in Windows itself vs. the file system layer? Said another way,
>> which constraints are *invariants* regardless of the filesystem?
>
>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>> Or, ...
>
> Interesting question.
>
> It is well known that the actual system calls accept either / or \
> as a separator. The command option processor uses / for options,
> and so requires \ for the separator.
>
> The more interesting cases come when you use NFS.
>
> I was just reading yesterday that it is possible to have an NFS
> server on a Windows system that allows for case significant names.
> (Unlike most that are case preserving.)

I believe you can store case-sensitive named files on NTFS as well - it's part of the posix compliance of the filesystem (harking back to the days when MS were still pretending to cooperate with other OS's).

Don, why are you asking about this? Are you trying to implement a non-native filesystem and want to support as much as Windows allows? If so, then the answer might depend on how that filesystem interacts with Windows. (I am not sure that I can give you more information no matter what you answer - but it might be of some help.)

Even if it is possible to redefine things like directory separators, it might cause a lot more confusion and therefore not be worth implementing (just like using files whose names differ only by letter case).

Also consider path lengths in this - the path length limitations are different within Windows itself and in NTFS.
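One concrete form of the mismatch David mentions: the classic Win32 limit is MAX_PATH (260 characters, including the terminating NUL), while NTFS itself allows paths of roughly 32,767 characters, which the Unicode Win32 APIs accept when the path carries the `\\?\` prefix. The hypothetical helper below sketches how a program might add that prefix. Because `\\?\` paths skip Win32's own normalization, it normalizes first with `ntpath` and only accepts fully qualified paths. It is an illustration under those assumptions, not a complete treatment.

```python
import ntpath

MAX_PATH = 260               # classic Win32 limit, including the terminating NUL
EXTENDED_PREFIX = "\\\\?\\"  # the literal characters \\?\

def to_extended_length(path: str) -> str:
    """Hypothetical helper: rewrite a fully qualified path into extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already extended-length; leave it untouched
    drive, rest = ntpath.splitdrive(path)
    if not drive or not rest.startswith(("\\", "/")):
        raise ValueError("expected a fully qualified path such as C:\\dir\\file")
    # The prefix turns off Win32 normalization, so do it here:
    # "/" becomes "\" and "." / ".." components are resolved.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):  # UNC form: \\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    return EXTENDED_PREFIX + path

def needs_extended_form(path: str) -> bool:
    """True if the path is too long for the classic MAX_PATH-limited APIs."""
    return len(path) >= MAX_PATH

print(to_extended_length("C:/data/../logs/a.txt"))  # \\?\C:\logs\a.txt
```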
Reply by ●October 10, 2014
On 10/10/2014 2:06 AM, glen herrmannsfeldt wrote:
> Don Y <this@is.not.me.com> wrote:
>
>> Does anyone know which filesystem naming constraints are imposed
>> in Windows itself vs. the file system layer? Said another way,
>> which constraints are *invariants* regardless of the filesystem?
>
>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>> Or, ...
>
> Interesting question.
>
> It is well known that the actual system calls accept either / or \
> as a separator. The command option processor uses / for options,
> and so requires \ for the separator.

The problem is, these are just observations from *outside* the system. You (I) don't know if the "system" imposes these conventions... OR, if it relies on the filesystem implementation to impose conventions that are appropriate for that specific file system!

E.g., imagine the "system" invokes a filesystem-specific *method* to "parse pathname". In that case, all the system does is parse enough of a pathname to get to a particular mount point, *notice* which sort of file system is mounted *at* that mount point, then pass the balance of the pathname off to filesystem->parse_pathname().

Ancient versions of MS C had library routines to parse pathnames that hard-coded such separators. But, that still doesn't indicate if this was mimicking a service performed within the filesystems of that era *or* was the sole mechanism for handling pathnames.

> The more interesting cases come when you use NFS.

That's an idea! I have NFS client and server running under Windows. I can mount an external filesystem and see if the Windows client recognizes "FILENAME" and "filename" as two coexisting files. And, if it "does the right thing" when I refer to one or the other.

Similarly, export a portion of the Windows filesystem ("Foo") and verify that it can ONLY be accessed as "Foo" (and not "fOo" or "FOO").

> I was just reading yesterday that it is possible to have an NFS
> server on a Windows system that allows for case significant names.
> (Unlike most that are case preserving.)
>
> The NFS protocols are independent of the actual separator.
>
> The : in drive selection has to be processed pretty early.
> I am not so sure what to say about : later in the name.

Again, it depends on whether the *system* implements this as a "rule". You can think of drive letters as objects at the "root" of the filesystem. Rules for objects at that level may require <letter>':' (among other top-level name conventions). Or, it could be a hardwired prohibition elsewhere in the filesystem hierarchy.

> As far as I know, CMD has to keep track of the current
> subdirectory. That would complicate any other treatment
> for \ and :.

That's only an issue if *it* hardcodes an algorithm for extracting the current directory (instead of calling a filesystem-specific *method* for doing so).
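The arrangement Don imagines, where the core only locates the mount point and each filesystem parses the rest of the name with its own syntax, can be sketched in a few lines. Everything here (`SlashFS`, `ColonFS`, `resolve`, the mount table) is hypothetical and invented for illustration; it makes no claim about how Windows itself resolves names, which is exactly the open question.

```python
from typing import Dict, List, Protocol, Tuple

class FileSystem(Protocol):
    def parse_pathname(self, rest: str) -> List[str]: ...

class SlashFS:
    """Hypothetical filesystem: '/' separates components, ':' is an ordinary character."""
    def parse_pathname(self, rest: str) -> List[str]:
        return [c for c in rest.split("/") if c]

class ColonFS:
    """Hypothetical filesystem: ':' separates components, '/' and '\\' are ordinary."""
    def parse_pathname(self, rest: str) -> List[str]:
        return [c for c in rest.split(":") if c]

def resolve(mounts: Dict[str, FileSystem], path: str) -> Tuple[str, List[str]]:
    # The core only knows about mount points (the longest matching prefix wins)...
    candidates = [m for m in mounts if path.startswith(m)]
    if not candidates:
        raise FileNotFoundError(path)
    mount = max(candidates, key=len)
    # ...and delegates the rest of the name to that filesystem's own syntax.
    return mount, mounts[mount].parse_pathname(path[len(mount):])

mounts: Dict[str, FileSystem] = {"/": SlashFS(), "/objects/": ColonFS()}
print(resolve(mounts, "/usr/local/Garage:Door"))     # ('/', ['usr', 'local', 'Garage:Door'])
print(resolve(mounts, "/objects/Read/\\/\\e:open"))  # ('/objects/', ['Read/\\/\\e', 'open'])
```

Under this design a name like "Read/\/\e" is legal below `/objects/` and meaningless elsewhere, which is the kind of per-filesystem rule Don is asking about.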
Reply by ●October 10, 2014
Hi David,

On 10/10/2014 4:32 AM, David Brown wrote:
> On 10/10/14 11:06, glen herrmannsfeldt wrote:
>> Don Y <this@is.not.me.com> wrote:
>>
>>> Does anyone know which filesystem naming constraints are imposed
>>> in Windows itself vs. the file system layer? Said another way,
>>> which constraints are *invariants* regardless of the filesystem?
>>
>>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>>> Or, ...
>>
>> Interesting question.
>>
>> It is well known that the actual system calls accept either / or \
>> as a separator. The command option processor uses / for options,
>> and so requires \ for the separator.
>>
>> The more interesting cases come when you use NFS.
>>
>> I was just reading yesterday that it is possible to have an NFS
>> server on a Windows system that allows for case significant names.
>> (Unlike most that are case preserving.)
>
> I believe you can store case-sensitive named files on NTFS as well -
> it's part of the posix compliance of the filesystem (harking back to the
> days when MS were still pretending to cooperate with other OS's).

Preserving (and even *enforcing*) case doesn't guarantee that identifiers differing *solely* in case can coexist in the same container! E.g., ReadMe, READme, ReAdMe, etc.

> Don, why are you asking about this? Are you trying to implement a
> non-native filesystem and want to support as much as Windows allows? If
> so, then the answer might depend on how that filesystem interacts with
> Windows. (I am not sure that I can give you more information no matter
> what you answer - but it might be of some help.)

I don't have a filesystem. Rather, the typical filesystem concept is used to manage a universe of (possibly parallel) nested namespaces. Each object defines the rules for the namespace(s) that it exports. I.e., the valid syntax for identifying portions *of* that object.

So, a "directory/folder" (to use a familiar concept) might support objects named:
- ReadMe.txt
- README.TXT
- Read/\/\e
- Garage:Door:Actuator
- OutsideTemperature

The Garage:Door:Opener object might support objects (methods) named "open" and "close" and "current_state".

I want to be able to make any of these named objects accessible under various other environments. If the host OS('s) enforce their own concept of "what constitutes a name", then I either have to adopt names FOR EXPORTED OBJECTS that are compatible with those OS('s) -- i.e., some GCD thereof -- or provide a translation interface (create a parallel namespace for exported objects such that the exported names comply with the rules of the host OS).

OTOH, if name parsing is left to the filesystem implementation (i.e., as it is in my implementation), then all I have to do is port my implementation to each of those host OS's.

> Even if it is possible to redefine things like directory separators, it
> might cause a lot more confusion and therefore not be worth implementing
> (just like using files whose names differ only by letter case).

I'm not keen on letting the tail wag the dog. If Windows has a limitation, that's Windows' problem. I'd be comfortable having Windows users bear the inconvenience of Windows' limitations (just like I wouldn't restrict myself to 8.3 names just to make life easy for DOS users).

> Also consider path lengths in this - the path length limitations are
> different within Windows itself and in NTFS.

Yes. But I think much of that issue inherently "goes away" when addressed as an exported namespace. I.e., ONLY what the (Windows) user needs to see has to be made available to him/her. And, at some convenient "mount point" (I guess that's still "drive letter" in the MS world).

So, the exported namespace can represent:
  some/very/long/traditional/pathname/to/a/file as "file"
  some/other/file as "file2"
  some.particular\method as "verb"
  That(&*^@$%(Fool as "A_Fine_Gentleman"

[note I've tried to show how to accommodate the host's naming rules as well]
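The "translation interface" Don describes amounts to a flat alias table: each exported name maps to an internal object path, and only the aliases have to satisfy the host's rules. A minimal sketch, assuming a hypothetical `ExportTable` class and a deliberately conservative host rule (ASCII letters, digits and `._-+`, compared case-insensitively as a Windows-like host would; the exact character set is an assumption):

```python
import re
from typing import Dict, List

# Assumed host rule for exported aliases: a conservative, Windows-friendly subset.
ALIAS_RE = re.compile(r"[A-Za-z0-9._+-]+")

class ExportTable:
    """Hypothetical flat export: host-legal alias -> internal object path."""

    def __init__(self) -> None:
        self._target: Dict[str, str] = {}    # casefolded alias -> internal path
        self._spelling: Dict[str, str] = {}  # casefolded alias -> alias as exported

    def export(self, internal_path: str, alias: str) -> None:
        if not ALIAS_RE.fullmatch(alias) or alias in (".", ".."):
            raise ValueError(f"alias {alias!r} is not legal on the host")
        key = alias.casefold()  # a case-insensitive host treats "File" and "file" alike
        if key in self._target:
            raise ValueError(f"alias {alias!r} collides with {self._spelling[key]!r}")
        self._target[key] = internal_path
        self._spelling[key] = alias

    def lookup(self, alias: str) -> str:
        return self._target[alias.casefold()]

    def listing(self) -> List[str]:
        # What the host sees: a single flat directory, case preserved.
        return sorted(self._spelling.values())

table = ExportTable()
table.export("some/very/long/traditional/pathname/to/a/file", "file")
table.export("some/other/file", "file2")
table.export("some.particular\\method", "verb")
table.export("That(&*^@$%(Fool", "A_Fine_Gentleman")
print(table.listing())        # ['A_Fine_Gentleman', 'file', 'file2', 'verb']
print(table.lookup("FILE2"))  # some/other/file
```

The internal names never have to be legal on the host; only the aliases are checked, and collisions are caught at export time rather than discovered by a confused user.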
Reply by ●October 10, 2014
On 10/10/2014 7:23 AM, Don Y wrote:
> So, the exported namespace can represent:
>   some/very/long/traditional/pathname/to/a/file as "file"
>   some/other/file as "file2"
>   some.particular\method as "verb"
>   That(&*^@$%(Fool as "A_Fine_Gentleman"

In case it wasn't obvious: to the "external system", this LOOKS like a single "directory" with four names in it. It is *flat*! (despite the fact that the objects all existed at different "places"/levels in the system exporting them)
Reply by ●October 10, 2014
On 10/10/14 16:23, Don Y wrote:
> Hi David,
>
> On 10/10/2014 4:32 AM, David Brown wrote:
>> On 10/10/14 11:06, glen herrmannsfeldt wrote:
>>> Don Y <this@is.not.me.com> wrote:
>>>
>>>> Does anyone know which filesystem naming constraints are imposed
>>>> in Windows itself vs. the file system layer? Said another way,
>>>> which constraints are *invariants* regardless of the filesystem?
>>>
>>>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>>>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>>>> Or, ...
>>>
>>> Interesting question.
>>>
>>> It is well known that the actual system calls accept either / or \
>>> as a separator. The command option processor uses / for options,
>>> and so requires \ for the separator.
>>>
>>> The more interesting cases come when you use NFS.
>>>
>>> I was just reading yesterday that it is possible to have an NFS
>>> server on a Windows system that allows for case significant names.
>>> (Unlike most that are case preserving.)
>>
>> I believe you can store case-sensitive named files on NTFS as well -
>> it's part of the posix compliance of the filesystem (harking back to the
>> days when MS were still pretending to cooperate with other OS's).
>
> Preserving (and even *enforcing*) case doesn't guarantee that
> identifiers differing *solely* in case can coexist in the same
> container! E.g., ReadMe, READme, ReAdMe, etc.

In theory (I haven't tried this), you can create multiple files in an NTFS directory that differ only by case, because it is required for posix compatibility. But even on *nix systems, where this works perfectly well, it is not recommended practice because it can easily confuse people.

>
>> Don, why are you asking about this? Are you trying to implement a
>> non-native filesystem and want to support as much as Windows allows? If
>> so, then the answer might depend on how that filesystem interacts with
>> Windows. (I am not sure that I can give you more information no matter
>> what you answer - but it might be of some help.)
>
> I don't have a filesystem. Rather, the typical filesystem concept
> is used to manage a universe of (possibly parallel) nested namespaces.
> Each object defines the rules for the namespace(s) that it exports.
> I.e., the valid syntax for identifying portions *of* that object.
>
> So, a "directory/folder" (to use a familiar concept) might support
> objects named:
> - ReadMe.txt
> - README.TXT
> - Read/\/\e
> - Garage:Door:Actuator
> - OutsideTemperature
>
> The Garage:Door:Opener object might support objects (methods) named "open"
> and "close" and "current_state".
>
> I want to be able to make any of these named objects accessible under
> various other environments. If the host OS('s) enforce their own
> concept of "what constitutes a name", then I either have to adopt names
> FOR EXPORTED OBJECTS that are compatible with those OS('s) -- i.e.,
> some GCD thereof -- or provide a translation interface (create a
> parallel namespace for exported objects such that the exported names
> comply with the rules of the host OS).
>
> OTOH, if name parsing is left to the filesystem implementation (i.e.,
> as it is in my implementation), then all I have to do is port my
> implementation to each of those host OS's.
>
>> Even if it is possible to redefine things like directory separators, it
>> might cause a lot more confusion and therefore not be worth implementing
>> (just like using files whose names differ only by letter case).
>
> I'm not keen on letting the tail wag the dog. If Windows has a limitation,
> that's Windows' problem. I'd be comfortable having Windows users bear the
> inconvenience of Windows' limitations (just like I wouldn't restrict myself
> to 8.3 names just to make life easy for DOS users).
>
>> Also consider path lengths in this - the path length limitations are
>> different within Windows itself and in NTFS.
>
> Yes. But I think much of that issue inherently "goes away" when addressed
> as an exported namespace. I.e., ONLY what the (Windows) user needs to see
> has to be made available to him/her. And, at some convenient "mount point"
> (I guess that's still "drive letter" in the MS world).
>
> So, the exported namespace can represent:
>   some/very/long/traditional/pathname/to/a/file as "file"
>   some/other/file as "file2"
>   some.particular\method as "verb"
>   That(&*^@$%(Fool as "A_Fine_Gentleman"
>
> [note I've tried to show how to accommodate the host's naming rules as
> well]

If I understand you correctly (I'm not sure I do fully - but that's okay for now), then I suspect your best idea is to restrict your names to plain English alphabet letters, assuming case-preservation but not case-sensitivity, with a few specific punctuation symbols. Treat "/" as a directory separator - it should work fine on almost any reasonable system. Punctuation that usually works without trouble is ".", "_", "-" and "+". Most other symbols will work in some circumstances, but can cause issues in particular cases (such as needing escapes or quotation marks when used from the command line). So if you want a namespace separator, without causing directory changes or other complications, pick one of "._-+".
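David's recommendation can be written down as a checker. A sketch under his stated assumptions: components use plain English letters plus ".", "_", "-" and "+" (digits are added here as an assumption), "/" is the only separator, and names are case-preserving but not case-sensitive, so two names in one directory that differ only by case count as a clash. Both function names are hypothetical.

```python
import re
from typing import Dict, List

# David's suggested alphabet; digits are an added assumption.
COMPONENT_RE = re.compile(r"[A-Za-z0-9._+-]+")

def is_portable_path(path: str) -> bool:
    """True if every '/'-separated component uses only the portable characters.

    Empty components (a leading, trailing or doubled '/') and '.'/'..' are rejected.
    """
    return all(
        COMPONENT_RE.fullmatch(part) is not None and part not in (".", "..")
        for part in path.split("/")
    )

def case_clashes(names: List[str]) -> List[List[str]]:
    """Groups of names that a case-insensitive (but case-preserving) host would merge."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)  # ASCII-only, so lower() suffices
    return [group for group in groups.values() if len(group) > 1]

print(is_portable_path("Garage/Door/Actuator"))  # True
print(is_portable_path("Garage:Door:Actuator"))  # False
print(case_clashes(["ReadMe.txt", "README.TXT", "OutsideTemperature"]))
```

Note that Don's own example names "ReadMe.txt" and "README.TXT" end up in the same clash group, which is exactly the coexistence problem raised earlier in the thread.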
Reply by ●October 10, 2014
On 10.10.2014 17:26, Don Y wrote:
> On 10/10/2014 7:23 AM, Don Y wrote:
>> So, the exported namespace can represent:
>>   some/very/long/traditional/pathname/to/a/file as "file"
>>   some/other/file as "file2"
>>   some.particular\method as "verb"
>>   That(&*^@$%(Fool as "A_Fine_Gentleman"
>
> In case it wasn't obvious: to the "external system", this LOOKS
> like a single "directory" with four names in it. It is *flat*!
> (despite the fact that the objects all existed at different
> "places"/levels in the system exporting them)
>

Hi Don,

as David already suggested, your best chance is to limit what you accept (and thus have to process) as much as practical. The problems go well beyond how this or that character is treated; e.g. at the moment I am struggling with how I process names in _my_ dps scripts using _my_ long-named directories so that they stay compatible with _my_ older 8.4 directories. The worst issue I have is with names containing spaces: handling them in a script makes the word count variable, the number of spaces between words must be preserved, a name that was in quotation marks has to be forwarded after the quotes have been "eaten" by earlier processing, etc. etc. If you can afford to disallow spaces in names, things will be much simpler. That is not really practical, though, as other systems do allow spaces, so we have to handle them as well.

As a side note, "the right way" to treat file names is to preserve the case information and to ignore it during file search (i.e. aaa and AAA locate the same file). Obviously unix machines will have the improper case treatment problem forever but it is their problem after all, they have saved a few minutes of thinking back when they created it. My solution to that was to do the right thing and allow also the wrong thing to be done; i.e. if you compare the textual part of the name (which is stored as upper (or was it lower) case only) you locate the same file, if you want to distinguish also by case you have to compare another few bytes which carry the case bits for the textual part. But I am not sure I even made a call doing the case dependent search, if I do I have never used it so far anyway.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
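Dimiter's scheme (store the name once in a single case, plus a few bytes of case bits) can be sketched for ASCII names as below. The encoding, the bit layout, and the function names are hypothetical and only illustrate the idea; his actual on-disk format is not described in the thread. As the next post argues, the approach gets much harder beyond ASCII.

```python
from typing import Tuple

def encode_name(name: str) -> Tuple[str, int]:
    """Split an ASCII name into (upper-cased text, bitmask of lower-case positions)."""
    if not name.isascii():
        raise ValueError("this sketch only handles ASCII names")
    bits = 0
    for i, ch in enumerate(name):
        if ch.islower():
            bits |= 1 << i  # bit i set means "character i was lower case"
    return name.upper(), bits

def decode_name(text: str, bits: int) -> str:
    """Rebuild the original spelling from the stored text and case bits."""
    return "".join(ch.lower() if (bits >> i) & 1 else ch for i, ch in enumerate(text))

def same_file(a: Tuple[str, int], b: Tuple[str, int], match_case: bool = False) -> bool:
    # The normal search compares only the text; the case-dependent search
    # additionally compares the case bits.
    return a[0] == b[0] and (not match_case or a[1] == b[1])

stored = encode_name("ReadMe.txt")
print(stored[0], decode_name(*stored))  # README.TXT ReadMe.txt
```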
Reply by ●October 10, 2014
Don Y wrote:
> Does anyone know which filesystem naming constraints are imposed
> in Windows itself vs. the file system layer? Said another way,
> which constraints are *invariants* regardless of the filesystem?
>
> E.g., can a non-native filesystem redefine '+' to replace "../"?
> Or, allow support for ':' in names? Or, replace '\' with '/'?
> Or, ...

Here is what Microsoft has to say on the topic:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx

TL;DR, the reserved characters are
- "\", "/" for the path separator
- ":" for the drive letter separator, and to separate file names and alternate data streams
- "?" and "*" because they are used in wildcards
- ">", "<", "|", and """ because they are used in shell syntax

The 'CreateFile' function refuses to create files whose names contain these characters. It might be possible to use some lower-level interface to create such files, but that wouldn't get you much, as no other program would be able to access them.

Thus, for the specific questions: a file system driver could interpret "+" to mean the parent directory. It can also somehow support ":" within file names (at least NTFS manages to do that somehow for alternate data streams). Replacing "\" with "/" is automatic in some sense, because the system already supports both as the directory separator.

Stefan
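Stefan's list translates directly into a check an exporter could run before offering a name to Windows. A sketch, assuming the rules on the Microsoft page he cites: the nine reserved characters, control characters 0-31, and (also from that page, though not in his summary) the reserved device names such as CON and NUL, with or without an extension, plus names ending in a space or period, which that page says the shell does not handle even where the underlying file system allows them. The function name is hypothetical, and it checks a single name component, not a whole path.

```python
# Rules taken from Microsoft's "Naming Files, Paths, and Namespaces" page.
RESERVED_CHARS = set('<>:"/\\|?*')
RESERVED_DEVICES = {"CON", "PRN", "AUX", "NUL",
                    *(f"COM{i}" for i in range(1, 10)),
                    *(f"LPT{i}" for i in range(1, 10))}

def windows_name_problems(name: str) -> list[str]:
    """List reasons a single name component would be rejected or mishandled on Windows."""
    problems = []
    bad = sorted({c for c in name if c in RESERVED_CHARS or ord(c) < 32})
    if bad:
        problems.append("reserved characters: " + " ".join(repr(c) for c in bad))
    stem = name.split(".", 1)[0].upper()  # device names are reserved with any extension
    if stem in RESERVED_DEVICES:
        problems.append(f"reserved device name: {stem}")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems

for n in ["A_Fine_Gentleman", "Garage:Door:Actuator", "That(&*^@$%(Fool", "nul.txt"]:
    print(n, windows_name_problems(n))
```

Run against Don's example names, only the aliases he chose for export come back clean, which supports exporting through a translation layer rather than under the internal names.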
Reply by ●October 10, 2014
On 10/10/14 18:17, Dimiter_Popoff wrote:
> As a side note, "the right way" to treat file names is to preserve
> the case information and to ignore it during file search (i.e.
> aaa and AAA locate the same file). Obviously unix machines will
> have the improper case treatment problem forever but it is
> their problem after all, they have saved a few minutes of
> thinking back when they created it. My solution to that was to
> do the right thing and allow also the wrong thing to be done;
> i.e. if you compare the textual part of the name (which is stored
> as upper (or was it lower) case only) you locate the same file,
> if you want to distinguish also by case you have to compare
> another few bytes which carry the case bits for the textual
> part. But I am not sure I even made a call doing the case dependent
> search, if I do I have never used it so far anyway.
>

This is getting a bit off topic from Don's question, but it is very clearly a matter of opinion as to what is "the right way" to handle file name cases. I would say that the unix way is the only sensible way, because it is transparent and consistent - the system does not care about the characters used, and does not make any artificial and language-specific distinctions. When people started using UTF-8 filenames, the unix way needed no changes, and everything continued as normal.

But if you start trying to say that some characters are equivalent to particular other characters, you have an endless task as soon as you start looking at anything other than plain English. How is the OS, or programs running on it, going to figure out that é and É are the same letter but different capitalisation? And why artificially decide that small and large letters in the English alphabet are equivalent - what about other combinations that are considered the same letter(s) in other languages? In German, you sometimes use "ss" and sometimes "ß" - they should be treated the same. In Norwegian, in some cases you would consider "aa" the same as "å" (this causes great fun when sorting names - "å" comes at the end of the alphabet, but names beginning with "Aa" should be sorted alongside those beginning with "Å"). In Arabic, there are (AFAIUI) up to four different forms for each letter - should these be considered equivalent in file names?

I have no doubt that in a well-designed OS and filesystem, you either have to treat different cases as completely distinct, or you limit the whole system to 7-bit ASCII.
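David's examples are easy to reproduce with Python's built-in Unicode support, and they show that "ignore case" is not one well-defined operation. The snippet uses only standard `str` methods and the `unicodedata` module; the `nfc` helper is a hypothetical convenience wrapper. It illustrates his argument and is not a recommended comparison policy.

```python
import unicodedata

def nfc(s: str) -> str:
    """Normalize to Unicode Normalization Form C (composed characters)."""
    return unicodedata.normalize("NFC", s)

# é and É: a simple case mapping exists, so these fold together.
print("é".upper() == "É")                     # True
# German ß: upper-casing changes the length, and casefold() maps it to "ss".
print("Straße".upper(), "Straße".casefold())  # STRASSE strasse
# The same visible "é" can be one code point or "e" plus a combining accent;
# they differ code point by code point, so a case-insensitive filesystem
# would also have to pick a normalization form.
precomposed, decomposed = "\u00e9", "e\u0301"
print(precomposed == decomposed, nfc(precomposed) == nfc(decomposed))  # False True
# Norwegian "aa" vs "å" is a language convention that no case mapping covers.
print("aa".casefold() == "å".casefold())      # False
```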