Dimiter_Popoff wrote:
> On 15.10.2014 г. 20:57, Hans-Bernhard Bröker wrote:
>> On 14.10.2014 at 12:28, Dimiter_Popoff wrote:
>>> Sorry David, I don't do religion.
>>
>> I have to call BS on that. The way you've reacted to people voicing any
>> different opinion on this issue is a textbook-grade example case of
>> exactly what happens when people have their religious dogmas challenged.
>>
>> So: not only do you do religion quite heavily. This issue evidently
>> _is_ your religion.
>
> Whatever you say. Call again when you can point us to an example where
> I was wrong or doing dogma.

You have carefully ignored examples given, and stressed that things "have never worked and still don't work", without outlining what you would consider "working" and why. That's closer to religion than to technical debate for me. "Earth was never round, and still isn't."

Stefan
Filesystem syntax constraints under Windows
Started by ●October 10, 2014
Reply by ●October 16, 2014
Reply by ●October 16, 2014
On 16.10.2014 г. 19:16, Stefan Reuther wrote:
> Dimiter_Popoff wrote:
>> On 15.10.2014 г. 20:57, Hans-Bernhard Bröker wrote:
>>> On 14.10.2014 at 12:28, Dimiter_Popoff wrote:
>>>> Sorry David, I don't do religion.
>>>
>>> I have to call BS on that. The way you've reacted to people voicing any
>>> different opinion on this issue is a textbook-grade example case of
>>> exactly what happens when people have their religious dogmas challenged.
>>>
>>> So: not only do you do religion quite heavily. This issue evidently
>>> _is_ your religion.
>>
>> Whatever you say. Call again when you can point us to an example where
>> I was wrong or doing dogma.
>
> You have carefully ignored examples given, and stressed that things
> "have never worked and still don't work", without outlining what you
> would consider "working" and why. That's closer to religion than to
> technical debate for me. "Earth was never round, and still isn't."
>
>
> Stefan
>

Your best argument so far is "file names are not for humans but for programs". It does not get a lot more laughable than that; you may want to stop trying to prove the Earth is flat indeed.
Reply by ●October 20, 2014
On 10/10/2014 9:42 AM, Stefan Reuther wrote:
> Don Y wrote:
>> Does anyone know which filesystem naming constraints are imposed
>> in Windows itself vs. the file system layer? Said another way,
>> which constraints are *invariants* regardless of the filesystem?
>>
>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>> Or, ...
>
> Here is what Microsoft has to say on the topic:
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
>
> TL;DR, the reserved characters are
> - "\", "/" for the path separator
> - ":" for the drive letter separator, and to separate file names
>   and alternate data streams
> - "?" and "*" because they are used in wildcards
> - ">", "<", "|", and """ because they are used in shell syntax

C:\> cd \SfU
C:\SfU> mkdir XXX
C:\SfU> cd XXX
C:\SfU\XXX> PATH=..\bin
C:\SfU\XXX> touch AAA
C:\SfU\XXX> touch aaa
C:\SfU\XXX> touch A:a
C:\SfU\XXX> touch A'a
C:\SfU\XXX> touch A`a
C:\SfU\XXX> touch B?b
C:\SfU\XXX> touch C*c
C:\SfU\XXX> ls
A'a A:a AAA A`a B?b C*c aaa

It doesn't seem possible to embed redirection operators in filenames regardless of quoting. (e.g., A>a, A<a, A|a) This differs from UN*X shells. This suggests those operators are processed early in Interix's shell -- before quoting!

Note that Windows Explorer lists these file names as above with the exception that the ':' and '?' characters are presented as a box and "C*c" appears as "Cc". (IIRC, the "boxes" are not identical characters so "A:a" and "A?a" would appear as two entries in the Explorer window yet indicate different files.)

The more interesting issue is how Windows handles these files when you try to manipulate any of them. E.g., attempting to delete "AAA" will prompt you to delete "AAA" and then "AAA" (aka "aaa")!

> The 'CreateFile' function refuses to create files with these names. It
> might be possible to use some lower-level interface to create such
> files, but that wouldn't get you much as no other program would be able
> to access them.

On the contrary, even Windows Explorer can SEE and ACCESS them!

C:\SfU\XXX> ls > foo
C:\SfU\XXX> cat foo
A'a A:a AAA A`a B?b C*c aaa foo
C:\SfU\XXX> cp foo B?b
C:\SfU\XXX> rm foo
C:\SfU\XXX> ls
A'a A:a AAA A`a B?b C*c aaa
C:\SfU\XXX> cat B?b
A'a A:a AAA A`a B?b C*c aaa foo
C:\SfU\XXX> mv B?b B?b.txt
C:\SfU\XXX> ls
A'a A:a AAA A`a B?b.txt C*c aaa

Now, double-click on "B[box]b.txt" in Windows Explorer and see the contents of foo. (B?b.txt)

Unfortunately, the rules Windows (and Interix) seems to follow aren't terribly obvious (on casual inspection).

I should try the same exercise from NFS (client and server) to see how yet another vendor's code behaves under Windows.
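For readers who want to check names against the reserved-character list quoted above, here is a minimal Python sketch. It only encodes the characters Stefan listed from the MSDN page; the function name `find_reserved_chars` is made up for illustration, and the sketch says nothing about what Interix, NFS, or any other non-Win32 layer will accept.

```python
# Characters quoted in the thread as reserved by the Win32 naming rules:
# \ / : ? * > < | "
RESERVED_CHARS = set('\\/:?*><|"')


def find_reserved_chars(name):
    """Return the reserved characters found in name, in order of first appearance.

    Hypothetical helper for illustration; it checks a single path component,
    not a full path.
    """
    found = []
    for ch in name:
        if ch in RESERVED_CHARS and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    # The file names created in the transcript above.
    for name in ["AAA", "aaa", "A:a", "A'a", "A`a", "B?b", "C*c"]:
        print(name, find_reserved_chars(name))
```

Of the names in the transcript, only "A:a", "B?b" and "C*c" contain characters from that list, which matches the three names that Explorer displays differently.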
Reply by ●October 21, 2014
Don Y wrote:
> C:\SfU\XXX> ls
> A'a A:a AAA A`a B?b C*c aaa
>
> It doesn't seem possible to embed redirection operators
> in filenames regardless of quoting. (e.g., A>a, A<a, A|a)
> This differs from UN*X shells. This suggests those operators
> are processed early in Interix's shell -- before quoting!
>
> Note that Windows Explorer lists these file names as above
> with the exception that the ':' and '?' characters are presented
> as a box and "C*c" appears as "Cc".

This seems to me like it is using some Unicode character which looks like ":" or "?" when displayed on the console, but is actually something else. Doing something like 'ls | od -vtx1', or 'ls > list.txt' and examining 'list.txt' with a hex editor might enlighten us.

> The more interesting issue is how Windows handles these
> files when you try to manipulate any of them. E.g.,
> attempting to delete "AAA" will prompt you to delete "AAA"
> and then "AAA" (aka "aaa")!

That would be the result of Explorer using the SHFileOperation function, which internally uses a FindFirstFile/FindNextFile loop to support wildcards. This loop will interpret "aaa" as a pattern which matches "aaa" and "AAA".

> Unfortunately, the rules Windows (and Interix) seems to follow
> aren't terribly obvious (on casual inspection).

Which is precisely my argument against all this character set fiddling in a kernel :)

Stefan
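The hex-dump check suggested above can also be sketched in Python, for anyone without `od` at hand. The helper names `hex_bytes` and `non_ascii_codepoints` are hypothetical, and the UTF-8 encoding is an assumption: what a given tool actually reports depends on the API it uses to read the directory.

```python
import os


def hex_bytes(name, encoding="utf-8"):
    """Return the encoded bytes of name as space-separated hex pairs."""
    return " ".join(format(b, "02x") for b in name.encode(encoding))


def non_ascii_codepoints(name):
    """List (character, 'U+XXXX') pairs for every character outside 7-bit ASCII."""
    return [(ch, "U+%04X" % ord(ch)) for ch in name if ord(ch) > 0x7F]


if __name__ == "__main__":
    # Dump every name in the current directory, flagging look-alike characters.
    for name in sorted(os.listdir(".")):
        print(hex_bytes(name), repr(name), non_ascii_codepoints(name))
```

A plain ASCII "A:a" dumps as `41 3a 61`. A name using a look-alike such as the fullwidth colon (U+FF1A, chosen here purely as an example) would show extra bytes and a non-empty second list.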
Reply by ●October 21, 2014
Hi Don,

On 20.10.2014 г. 19:45, Don Y wrote:
> ...
>
> The more interesting issue is how Windows handles these
> files when you try to manipulate any of them. E.g.,
> attempting to delete "AAA" will prompt you to delete "AAA"
> and then "AAA" (aka "aaa")!

How do you manage to put files with the same name (case being ignored) into a directory? I don't know what and how windows does about this but in dps you can do this only if you hex dump into the directory file. After that obviously if you search for aaa* you will find as many hits as there are in the directory. If you search for just "aaa" the first match will be considered last (i.e. a directory with duplicate file names in it is considered broken) [actually searching non-ambiguous names goes through a different routine which compares on a 32-bit basis, as fast as the CPU would allow this to be done].

I can't see how one can do this through the user interface of windows, either.

If you want to copy files with duplicate names (i.e. coming from a unix filesystem) the only correct way is to rename the file(s), e.g. by appending some unique sequential number or sort of.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Reply by ●October 21, 2014
On 10/21/2014 9:47 AM, Stefan Reuther wrote:
> Don Y wrote:
>> C:\SfU\XXX> ls
>> A'a A:a AAA A`a B?b C*c aaa
>>
>> It doesn't seem possible to embed redirection operators
>> in filenames regardless of quoting. (e.g., A>a, A<a, A|a)
>> This differs from UN*X shells. This suggests those operators
>> are processed early in Interix's shell -- before quoting!
>>
>> Note that Windows Explorer lists these file names as above
>> with the exception that the ':' and '?' characters are presented
>> as a box and "C*c" appears as "Cc".
>
> This seems to me like it is using some Unicode character which looks
> like ":" or "?" when displayed on the console, but is actually something
> else.

No. All 7-bit ASCII codepoints! ":" really *is* ':'...

> Doing something like 'ls | od -vtx1', or 'ls > list.txt' and examining
> 'list.txt' with a hex editor might enlighten us.

That was the point of:

C:\SfU\XXX> ls > foo
C:\SfU\XXX> cat foo
A'a A:a AAA A`a B?b C*c aaa foo

It gets weirder...

C:\SfU\XXX> dir /b
A'a AAA aaa A`a A?a B?b C?c

Note that "DOS" refuses to deal with the ':' and '*' characters and transforms them into '?' (which one would assume it would ALSO refuse to deal with!)

Note, also, the different sort orders (which each differ from Windows Explorer's wacky rules).

>> The more interesting issue is how Windows handles these
>> files when you try to manipulate any of them. E.g.,
>> attempting to delete "AAA" will prompt you to delete "AAA"
>> and then "AAA" (aka "aaa")!
>
> That would be the result of Explorer using the SHFileOperation function,
> which internally uses a FindFirstFile/FindNextFile loop to support
> wildcards. This loop will interpret "aaa" as a pattern which matches
> "aaa" and "AAA".

Yes, what was unexpected was the fact that a *single* file entry had been "selected" prior to invoking delete. I.e., their codebase assumes a "selection" can be non-unique.

>> Unfortunately, the rules Windows (and Interix) seems to follow
>> aren't terribly obvious (on casual inspection).
>
> Which is precisely my argument against all this character set fiddling
> in a kernel :)

Think of how much spaghetti code they must have in each of these "programs" ("commands")! I.e., if DIR sees a character in a name that it doesn't like, it maps it to '?'. If Explorer sees a character that it doesn't like (expect), it maps it to [box] (unless that character is '*' which it maps to 'nil')

Sheesh! Talk about Principle of Least Surprise... :-/

Note, also, that "names" are processed differently depending on where they are encountered in the command line. E.g.,

ls > File:List.txt
Reply by ●October 21, 2014
Hi Dimiter,

On 10/21/2014 3:21 PM, Dimiter_Popoff wrote:
> On 20.10.2014 г. 19:45, Don Y wrote:
>> ...
>>
>> The more interesting issue is how Windows handles these
>> files when you try to manipulate any of them. E.g.,
>> attempting to delete "AAA" will prompt you to delete "AAA"
>> and then "AAA" (aka "aaa")!
>
> How do you manage to put files with the same name (case being ignored)
> into a directory?

I was taking a shortcut to avoid having to remotely mount the Windows disk as an NFS export (in which case, I hypothesized that I could create arbitrary file names from the NFS client).

Instead, I used MS's Interix subsystem (essentially, UN*X tools that run under Windows -- hence "ls" instead of "DIR", "cat" instead of "TYPE"? in my examples).

> I don't know what and how windows does about this but in dps you
> can do this only if you hex dump into the directory file. After that
> obviously if you search for aaa* you will find as many hits as
> there are in the directory. If you search for just "aaa" the first
> match will be considered last (i.e. a directory with duplicate
> file names in it is considered broken) [actually searching non-ambiguous
> names goes through a different routine which compares on a 32-bit
> basis, as fast as the CPU would allow this to be done].
> I can't see how one can do this through the user interface of windows,
> either.

Windows is case preserving but case ignoring. So, IN THE ABSENCE OF ANY FILENAME CONFLICTS, I can create "AaA" and it will appear as "AaA" everywhere -- in Windows Explorer, in a DOS box looking at the folder's contents via DIR, etc.

However, thereafter, any references to "AAA", "aaa", "aAa", etc. will all resolve to this initial "AaA".

Under Interix, you bypass Windows' rules for names and write directly to the file system (disk media). So, I can create a file of an arbitrary name (well, not really... there are still some restrictions like I can't seem to embed '>' in a filename) even when a "case conflicting" filename exists.

So, "touch AaA" creates a file called "AaA" while "touch aaa" creates ANOTHER file -- called "aaa". Windows Explorer is smart enough (dumb enough?) to display these as separate files. And, will know which one to "open" if I select it with mouse.

> If you want to copy files with duplicate names (i.e. coming from a
> unix filesystem) the only correct way is to rename the file(s), e.g.
> by appending some unique sequential number or sort of.

This is exactly the problem I encounter when trying to "manage" large file collections that originate in UN*X *under* Windows. E.g., Makefile and makefile collide in Windows' namespace. So, I end up with one or the other (depends on which order they are REcreated).

Likewise, locore.S and locore.s, etc.

Also, Windows has a trivial file/path name limitation that is regularly exceeded (while working IN windows as well as importing pieces of a file hierarchy from UN*X).

See, also, the other caveats that I posted in my recent reply to Stefan (e.g., DIR silently transforms filenames)

Bottom line, Windows is an annoyance.

"If Microsoft is The Answer, you're asking the wrong Question!"
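One way to spot the Makefile/makefile kind of collision before a tree is copied to a case-ignoring file system is sketched below in Python. The name `find_case_collisions` is hypothetical, and folding with `str.upper()` is only an approximation chosen for this sketch: the real comparison rules are defined by the file system and may differ for non-ASCII names.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by case.

    Returns a list of groups (each a list of the original names, in input
    order) for every folded name that occurs more than once.
    Assumption: str.upper() stands in for the file system's own case folding.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.upper()].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    listing = ["Makefile", "makefile", "locore.S", "locore.s", "README"]
    for group in find_case_collisions(listing):
        print("collides on a case-ignoring file system:", group)
```

Running this over one directory's listing at a time is enough, since two names only collide when they live in the same directory.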
Reply by ●October 21, 2014
Don Y <this@is.not.me.com> wrote:

(snip, someone wrote)
>> How do you manage to put files with the same name
>> (case being ignored) into a directory?

> I was taking a shortcut to avoid having to remotely mount the
> Windows disk as an NFS export (in which case, I hypothesized
> that I could create arbitrary file names from the NFS client).

You might create files that Windows can't read, or doesn't like.

> Instead, I used MS's Interix subsystem (essentially, UN*X tools
> that run under Windows -- hence "ls" instead of "DIR", "cat"
> instead of "TYPE"? in my examples).

(snip)

> Windows is case preserving but case ignoring. So, IN THE ABSENCE OF
> ANY FILENAME CONFLICTS, I can create "AaA" and it will appear as "AaA"
> everywhere -- in Windows Explorer, in a DOS box looking at the folder's
> contents via DIR, etc.

Well, there are more than one file systems used with Windows, and the rules might be different.

FAT traditionally had an 8.3 (eight character name, three character extension) format. When they added longer names, the short names were still there, and might be considered the real name for the file. Some older utilities required them.

For NTFS, I believe the longer names are really part of the file system and directory, not quite the same as for FAT. It might be that NTFS can still supply a short name for programs that require them.

> However, thereafter, any references to "AAA", "aaa", "aAa", etc. will
> all resolve to this initial "AaA".

Even more fun in DOS days were files with names like COM3. I once had one on a disk (from a system with only two COM ports) and then brought the disk to a system with more COM ports. You can't get to the file! Even names like COM3.TXT still don't work.

> Under Interix, you bypass Windows' rules for names and write directly to
> the file system (disk media). So, I can create a file of an arbitrary
> name (well, not really... there are still some restrictions like I can't
> seem to embed '>' in a filename) even when a "case conflicting" filename
> exists.

> So, "touch AaA" creates a file called "AaA" while "touch aaa" creates
> ANOTHER file -- called "aaa". Windows Explorer is smart enough (dumb
> enough?) to display these as separate files. And, will know which
> one to "open" if I select it with mouse.

It might open them some way other than by name.

>> If you want to copy files with duplicate names (i.e. coming from a
>> unix filesystem) the only correct way is to rename the file(s), e.g.
>> by appending some unique sequential number or sort of.

> This is exactly the problem I encounter when trying to "manage"
> large file collections that originate in UN*X *under* Windows.
> E.g., Makefile and makefile collide in Windows' namespace. So,
> I end up with one or the other (depends on which order they are
> REcreated).

> Likewise, locore.S and locore.s, etc.

> Also, Windows has a trivial file/path name limitation that is
> regularly exceeded (while working IN windows as well as importing
> pieces of a file hierarchy from UN*X).

Also, the filename parser allowed either / or \ as separators, while command line DOS commands required \. I believe you don't want either / or \ in the file name. (That is, not a separator.)

-- glen
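The COM3 / COM3.TXT trap can be tested for with a short Python sketch. The device-name list below (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is the one given in Microsoft's file-naming documentation; the function name `is_reserved_device_name` is hypothetical, and treating everything before the first "." as the device part is an assumption made for this sketch.

```python
# Reserved DOS device names as listed in Microsoft's naming documentation.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def is_reserved_device_name(filename):
    """Return True if filename (a single path component) names a DOS device.

    The check ignores case and any extension, so "COM3" and "com3.txt"
    are both reported.
    """
    base = filename.split(".", 1)[0].rstrip(" ").upper()
    return base in RESERVED_DEVICE_NAMES


if __name__ == "__main__":
    for name in ["COM3", "COM3.TXT", "COMMON.TXT", "nul.txt", "COM10"]:
        print(name, is_reserved_device_name(name))
```

Note that the check is independent of how many ports the machine actually has, so it flags COM3 even on a system with only two COM ports.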
Reply by ●October 21, 2014
Hi Don,

On 22.10.2014 г. 01:51, Don Y wrote:
> ...
>
> Under Interix, you bypass Windows' rules for names and write directly to
> the file system (disk media).

Well writing to a directory entry not through a system call would easily break the directory, sure. I don't have to tell you this is not the way you want to go in an end product (instability, unpredictability issues - how will the next OS version treat these invalid entries etc. etc.).

> So, "touch AaA" creates a file called "AaA" while "touch aaa" creates
> ANOTHER file -- called "aaa". Windows Explorer is smart enough (dumb
> enough?) to display these as separate files.

That's not surprising, once you trick the filesystem with an invalid name entry it will not try to do much on it. When it lists a directory it will just go through all the entries and list.

> ... And, will know which
> one to "open" if I select it with mouse.

That is more surprising to me. It means they go through some sideways to locate the clicked file, not by searching for it by name. In DPS, this could be done by using DEN (directory entry number, well it is not a number but pool_no:cluster really), i.e. you list the names on the menu, then for each menu entry you store the DEN and access the file subsequently based on that (possible but impractical). Or you can just keep all the files on the menu open and access not by name but by "registration" (i.e. "handle").

>> If you want to copy files with duplicate names (i.e. coming from a
>> unix filesystem) the only correct way is to rename the file(s), e.g.
>> by appending some unique sequential number or sort of.
>
> This is exactly the problem I encounter when trying to "manage"
> large file collections that originate in UN*X *under* Windows.

Well I figured as much in the meantime :D . That was the underlying reason for your initial post I suppose.

> E.g., Makefile and makefile collide in Windows' namespace. So,
> I end up with one or the other (depends on which order they are
> REcreated).
> .....
> Bottom line, Windows is an annoyance.

It probably is much worse than an annoyance to program under but in the example above I would point the finger at the person who has been shortsighted enough to create duplicate file names in the unix environment first, then on the way the unix filesystem is made to allow duplicate file names being created by users.

I don't see how you can handle this situation without inserting a complete name handling layer between. For example, this is what I did in a similar situation - when one wants to copy * from a longnamed directory into a shortnamed (old, 8.4) one. Files get copied by just using the first up to 8 characters and up to 4 past the last "." character; if such a name has been used already creation will fail (duplicate file name) and the copy code before retrying will modify the destination name by replacing the last 4 name characters (I think) by the text hex. representation of a counter which gets incremented every time it is used.

No other way around it, you either have to maintain the file name data case dependent (human readable) or case independent (in that case 8 bytes per file would be plenty). Some bridging between these two fundamentally different cases will always be necessary if they have to coexist.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Reply by ●October 21, 2014
On 10/21/2014 4:27 PM, glen herrmannsfeldt wrote:
> Don Y <this@is.not.me.com> wrote:
>
> (snip, someone wrote)
>>> How do you manage to put files with the same name
>>> (case being ignored) into a directory?
>
>> I was taking a shortcut to avoid having to remotely mount the
>> Windows disk as an NFS export (in which case, I hypothesized
>> that I could create arbitrary file names from the NFS client).
>
> You might create files that Windows can't read, or doesn't like.

That was the point of opening B?b.txt from within Windows Explorer (double click) to verify that the "correct" file would, in fact, be handed to Notepad.exe and that Notepad would be able to open it!

>> Instead, I used MS's Interix subsystem (essentially, UN*X tools
>> that run under Windows -- hence "ls" instead of "DIR", "cat"
>> instead of "TYPE"? in my examples).
>
> (snip)
>
>> Windows is case preserving but case ignoring. So, IN THE ABSENCE OF
>> ANY FILENAME CONFLICTS, I can create "AaA" and it will appear as "AaA"
>> everywhere -- in Windows Explorer, in a DOS box looking at the folder's
>> contents via DIR, etc.
>
> Well, there are more than one file systems used with Windows, and the
> rules might be different.
>
> FAT traditionally had an 8.3 (eight character name, three character
> extension) format. When they added longer names, the short names
> were still there, and might be considered the real name for the
> file. Some older utilities required them.

Yes. If you look at the raw directory contents, the encoding is fairly obvious -- as well as how it was "backward compatible" with an older DOS system (so you could access files created with LFN's on a machine that just runs DOS 4, etc.).

> For NTFS, I believe the longer names are really part of the file
> system and directory, not quite the same as for FAT. It might be
> that NTFS can still supply a short name for programs that require
> them.

NTFS apparently also stores the "file name character translation tables" in the medium.

>> However, thereafter, any references to "AAA", "aaa", "aAa", etc. will
>> all resolve to this initial "AaA".
>
> Even more fun in DOS days were files with names like COM3. I once
> had one on a disk (from a system with only two COM ports) and then
> brought the disk to a system with more COM ports. You can't get
> to the file! Even names like COM3.TXT still don't work.

MS OS's must be *littered* with "special cases".







