EmbeddedRelated.com
Forums
The 2026 Embedded Online Conference

Filesystem syntax constraints under Windows

Started by Don Y October 10, 2014
On 22/10/14 14:19, Dimiter_Popoff wrote:
> On 22.10.2014 г. 14:11, David Brown wrote: > >> >>> 2. The user is not exposed to the bitstreams the OS stores but to to >>> text which consists of characters which are part of an alphabet, >>> for example the Latin alphabet as used in English has 26 characters. >> >> In other words, Windows mangles the names it is given. > > No. It reproduces the names exactly as the user has entered them. >
Don entered filenames, albeit unusual ones. These filenames were perfectly valid for the filesystem (NTFS), they were perfectly valid for the OS itself (Windows), and they were perfectly valid for the programs used to enter those names. But they got mangled by other parts of the key Windows software when he viewed them. Do you mean that because "a*b" has a non-letter, it is not a valid name? Or do you mean that Don doesn't count as a "user" because he is using filenames that you don't like?
>>> Of course he can hack his way into doing it, whether through some >>> MS written hack which you say is not a hack or otherwise. >> >> The Win32 API allows files to be created or opened using "posix >> semantics" for filenames, including case-sensitive files, characters >> such as ":" and "*" in filenames, and multiple files differing only in >> the case of their names. Even if you want to call the MS-supplied posix >> compatibility layer a "hack", I don't think the standard Win32 API is a >> hack. > > You may think whatever you want but using a low enough level call to > create invalid directory entries is a hack, whoever may have written > the code within the system call. > Non-hack application code does not go that low in order to defeat the > system-wide rules or compromise the system in other ways, there are > always plenty of opportunities to kill a system.
We are talking about the Win32 API "CreateFile" call here! This is how you create files in Windows programming. If that is a "hack", then /all/ windows programming is a "hack".
> >>> The problem remains and will remain, as unix does >>> not output names but file identifiers (names consist of text, remember >>> the alphabet and the character count). The fact that these identifiers >>> have been misused as text for decades does not mean much beyond >>> the expectations of hardcore unix users that the English alphabet >>> will suddenly begin to have 52 characters. >> >> This goes back to your unique idea that files have a sort of colloquial >> human-friendly nick-name that is a different concept from their >> "filename" that everyone else uses. > > Blimey, so it is my unique idea that file names are meant also for human > consumption/processing.
No, it is your unique idea that an OS has to treat filenames using human language rules because they are always for human processing and consumption - and it is your unique idea to distinguish between a "file name" and a "file identifier", where a "file name" is case insensitive and human friendly, while a "file identifier" can store the case and use other characters.
> > Are you sure you are in good health?
Please try to cut down on the insults or implied insults - this is a friendly technical discussion, not a name-calling contest.
> >> >> If we were to accept that idea, then /all/ systems have that "problem" - >> because no system will be happy with a file system that uses approximate >> names instead of concrete identifiers. By that I mean that >> "index.html", "Index.html", "Index.html", "Index", "The index file", and >> "The first page" are perfectly good human-friendly names for the first >> page of a website - but no OS or filesystem would accept them as >> alternatives for a file identified as "index.html". > > And you go further down the path into demonstrating that you are just > flailing madly being unable to accept the simple fact that you said > something stupid (can happen to everyone) and then defend that for > days and days (does not happen to everyone).
I agree that everyone says stupid things once in a while, especially on Usenet - I have done so often enough, and when I realise it, I have posted apologies or thanks as follow-ups. I - and others - have completely disagreed with you concerning filenames and case sensitivity. From where I stand, you came up with some unusual claims that are at odds with the rest of the world of filesystem and OS design, and have repeatedly labelled these as "obvious facts" while denying all the counter-evidence presented. Your argument boiled down to accusing others of religious delusions. Look back in this thread if you don't remember the details. (For your own OS, I assume that /your/ way of treating file names is the most suitable for the OS and its uses. That's fine. What I object to is your belief that it should apply to every OS, that other OS'es are fundamentally flawed because of their treatment of filenames, and that everyone else is "stupid" or "religious" for disagreeing with you.)
> >> I am just trying to correct your (apparent) misunderstanding about what >> Don was doing, and how Windows and NTFS treat filenames. > > Yeah, you always know better than everyone, I know. > Never mind you have no clue what we are talking about really. >
Some things I know, some things I learn. That's because I don't extrapolate the way /I/ do something to assuming it is the only right way to do it.
On 22.10.2014 г. 16:21, David Brown wrote:
> On 22/10/14 14:19, Dimiter_Popoff wrote: >> On 22.10.2014 г. 14:11, David Brown wrote: >> >>> >>>> 2. The user is not exposed to the bitstreams the OS stores but to to >>>> text which consists of characters which are part of an alphabet, >>>> for example the Latin alphabet as used in English has 26 characters. >>> >>> In other words, Windows mangles the names it is given. >> >> No. It reproduces the names exactly as the user has entered them. >> > > Don entered filenames, ...
No, he wrote a program to squeeze these names through the system. A user cannot enter names the system does not allow to be entered.
> Do you mean that because "a*b" has a non-letter, it is not a valid name?
It might be: "*" is typically a reserved symbol for communication between the user interface and the directory search code. If the user interface won't let you do it, chances are it is illegal. If you don't know the answer to that, try to copy an existing file named, say, abc.txt to a*a and see what the error message is. (Sorry to the rest of the group, obviously everyone knows the answers to that, but it is not my fault we go there - and at the moment I don't feel like letting it go just because someone is too pushy with his nonsense.)
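For reference, the default Win32 naming rules that make a copy to a*a fail can be sketched in a few lines of Python. This is a rough check of a single path component, based on my reading of Microsoft's "Naming Files, Paths, and Namespaces" page; the function name is hypothetical, and the sketch says nothing about what the POSIX subsystem accepts.

```python
# Characters the default Win32 naming rules reject in a file or directory name.
RESERVED_CHARS = set('<>:"/\\|?*')

# Legacy device names; they stay reserved even with an extension (e.g. "NUL.txt").
RESERVED_DEVICES = {"CON", "PRN", "AUX", "NUL"}
RESERVED_DEVICES |= {f"COM{i}" for i in range(1, 10)}
RESERVED_DEVICES |= {f"LPT{i}" for i in range(1, 10)}


def is_valid_win32_name(name: str) -> bool:
    """Rough check of one path component against the default Win32 rules."""
    if not name or name in (".", ".."):
        return False
    # Reserved punctuation and control characters 0x00-0x1F are not allowed.
    if any(c in RESERVED_CHARS or ord(c) < 32 for c in name):
        return False
    # Names ending in a space or a period are not supported by the shell/UI.
    if name[-1] in " .":
        return False
    # Device names are matched case-insensitively, ignoring any extension.
    stem = name.split(".", 1)[0].upper()
    return stem not in RESERVED_DEVICES


for candidate in ["abc.txt", "a*a", "A:a", "Readme", "nul.txt"]:
    print(f"{candidate!r:12} -> {is_valid_win32_name(candidate)}")
```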
> Or do you mean that Don doesn't count as a "user" because he is using > filenames that you don't like?
Or do you mean that you would be wrongly labelled mental just because you want the name written on your identity card to consist only of hieroglyphs and special characters? Don is a "user" as long as he uses the user interface. A programmer and a user are not the same thing in that context, and please do not go into more bollocks on obvious definitions like that.
>>>> Of course he can hack his way into doing it, whether through some >>>> MS written hack which you say is not a hack or otherwise. >>> >>> The Win32 API allows files to be created or opened using "posix >>> semantics" for filenames, including case-sensitive files, characters >>> such as ":" and "*" in filenames, and multiple files differing only in >>> the case of their names. Even if you want to call the MS-supplied posix >>> compatibility layer a "hack", I don't think the standard Win32 API is a >>> hack. >> >> You may think whatever you want but using a low enough level call to >> create invalid directory entries is a hack, whoever may have written >> the code within the system call. >> Non-hack application code does not go that low in order to defeat the >> system-wide rules or compromise the system in other ways, there are >> always plenty of opportunities to kill a system. > > We are talking about the Win32 API "CreateFile" call here! This is how > you create files in Windows programming. If that is a "hack", then > /all/ windows programming is a "hack".
While I don't know how this is done under Windows, I think you know even less about it than I do. Your statement above amounts to claiming that every application written for Windows has to prepare the name so that it is a valid one. It is obvious that normal Windows applications do not write their own name validation; if some call allows writing an invalid name, then reading the OS manual will tell you that you must first use another call which validates the name. You should demonstrate at least such basic programming knowledge if you want people to take your posts as something more than the standard "always know better" babble of someone who does not really know what he is talking about.
>> >>>> The problem remains and will remain, as unix does >>>> not output names but file identifiers (names consist of text, remember >>>> the alphabet and the character count). The fact that these identifiers >>>> have been misused as text for decades does not mean much beyond >>>> the expectations of hardcore unix users that the English alphabet >>>> will suddenly begin to have 52 characters. >>> >>> This goes back to your unique idea that files have a sort of colloquial >>> human-friendly nick-name that is a different concept from their >>> "filename" that everyone else uses. >> >> Blimey, so it is my unique idea that file names are meant also for human >> consumption/processing. > > No, it is your unique idea that an OS has to treat filenames using human > language rules because they are always for human processing and > consumption -
Ah, now you are trying to cheat your way out of the hole. No, I never said that. I said that file names are ALSO for human consumption. Try to spell over the phone a file name like "ThIs Is An eXamPle Of A nAMe foR iDioTs". Then come and repeat your claim that file names - or whatever names which are represented in text - are to be compared case-sensitively. Or simply stop posting nonsense. Dimiter
Don Y wrote:
> On 10/21/2014 10:35 PM, George Neuner wrote: >> On Tue, 21 Oct 2014 15:38:20 -0700, Don Y <this@is.not.me.com> wrote: >>> Note that "DOS" refuses to deal with the ':' and '*' characters and >>> transforms them into '?' (which one would assume it would ALSO refuse >>> to deal with!) >> >> ? and * are filename wildcards in DOS and Windows both ... and DOS >> doesn't know about NTFS streams. > > Yes, but DOS sees "A*a" and "A:a" in the folder and maps BOTH of them > to "A?a" in the DIR listing!
"?" is the replacement character when it cannot map a Unicode character to the console code page. Maybe try this: mark the file name in Explorer, copy it into a text file with Notepad, and hex-dump that. Maybe it's the SFU subsystem which translates. This mailing list post https://www.mail-archive.com/linux-cifs@vger.kernel.org/msg09969.html indicates that they indeed map the reserved characters to Unicode characters above 0xF000. Stefan
Dimiter_Popoff wrote:
> On 22.10.2014 г. 16:21, David Brown wrote: >> On 22/10/14 14:19, Dimiter_Popoff wrote: >>> On 22.10.2014 г. 14:11, David Brown wrote: >>>>> 2. The user is not exposed to the bitstreams the OS stores but to to >>>>> text which consists of characters which are part of an alphabet, >>>>> for example the Latin alphabet as used in English has 26 characters. >>>> >>>> In other words, Windows mangles the names it is given. >>> >>> No. It reproduces the names exactly as the user has entered them. >> >> Don entered filenames, ... > > No, he wrote a program to squeeze these names through the system. > A user cannot enter names the system does not allow to be entered.
No, he did not "squeeze these names through the system". He used a program written by Microsoft, for Windows, in the way it is meant to be used, and it obviously let him enter the names. NTFS can (and always could) be configured to be case-sensitive, which the SFU/Interix tools use through the POSIX subsystem. Of course, the Win32 subsystem, which uses a different configuration, isn't particularly happy about that, much in the same way the virtual DOS subsystem isn't particularly happy about names that don't fit into the 8.3 convention. Stefan
On 10/22/2014 9:39 AM, Stefan Reuther wrote:
> Don Y wrote: >> On 10/21/2014 10:35 PM, George Neuner wrote: >>> On Tue, 21 Oct 2014 15:38:20 -0700, Don Y <this@is.not.me.com> wrote: >>>> Note that "DOS" refuses to deal with the ':' and '*' characters and >>>> transforms them into '?' (which one would assume it would ALSO refuse >>>> to deal with!) >>> >>> ? and * are filename wildcards in DOS and Windows both ... and DOS >>> doesn't know about NTFS streams. >> >> Yes, but DOS sees "A*a" and "A:a" in the folder and maps BOTH of them >> to "A?a" in the DIR listing! > > "?" is the replacement character when it cannot map a Unicode character > to the console code page.
Ah, OK. So, the next (theoretical) question would be what fopen(3c) would expect for such a file.
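A minimal Python sketch of why two different names can show up as the same "A?a" in a DIR listing: a console limited to one code page replaces every character it cannot represent with '?'. It assumes code page 437 and assumes the stored names have '*' and ':' shifted into the Private Use Area (U+F02A for '*' is an assumption; U+F03A for ':' matches what Don reports seeing in Explorer). The helper name is hypothetical.

```python
def console_view(name: str, codepage: str = "cp437") -> str:
    """Render a name the way a console limited to `codepage` would show it."""
    # Anything the code page cannot represent is replaced with '?'.
    return name.encode(codepage, errors="replace").decode(codepage)


# Assumed on-disk names: '*' and ':' shifted into the Private Use Area.
star_name = "A\uf02aa"    # "A*a" as stored (U+F02A is an assumption)
colon_name = "A\uf03aa"   # "A:a" as stored (U+F03A, per Don's observation)

print(console_view(star_name), console_view(colon_name))   # prints: A?a A?a
```

Note that the replacement is lossy: from "A?a" alone there is no way to tell which file was meant, which is exactly the question of what fopen() should be given.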
> Maybe try this: mark the file name in Explorer, copy it into a text file > with Notepad, and hex-dump that.
Ah, that's a good idea! Or, a Unicode editor...
> Maybe it's the SFU subsystem which > translates. This mailing list post > https://www.mail-archive.com/linux-cifs@vger.kernel.org/msg09969.html > indicates that they indeed map the reserved characters to Unicode > characters above 0xF000.
That is, in fact, what the [box] characters are in Windows Explorer (U+F03A, iirc, for the ':' and U+F03F for the '?'). Note that all the other characters[1] are displayed properly in Explorer ('`)
[1] The '*' appears to just "disappear" in Explorer -- "A*a" displays as "Aa". Amusing little exercise that just increases the uncertainty that a user will know the "real" name for a file! :-/
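The code points reported here (U+F03A for ':', U+F03F for '?') are consistent with a simple "add 0xF000" mapping into the Private Use Area. A minimal Python sketch of such a mapping and its inverse follows; the exact set of remapped characters is an assumption, and the function names are hypothetical.

```python
PUA_OFFSET = 0xF000
# Assumption: the set of characters that gets shifted; ':' and '?' match the
# code points observed above (U+F03A, U+F03F).
REMAPPED = set('*:?<>|"')


def to_stored(name: str) -> str:
    """Shift reserved characters into the Private Use Area (U+F0xx)."""
    return "".join(chr(PUA_OFFSET + ord(c)) if c in REMAPPED else c for c in name)


def from_stored(name: str) -> str:
    """Undo to_stored(): map U+F0xx back to the reserved ASCII character."""
    out = []
    for c in name:
        original = chr(ord(c) - PUA_OFFSET) if ord(c) >= PUA_OFFSET else None
        out.append(original if original in REMAPPED else c)
    return "".join(out)


print(hex(ord(to_stored(":"))), hex(ord(to_stored("?"))))
```

Any program that does not know this convention just sees Private Use Area characters, which is why Explorer shows boxes.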
On 22.10.2014 at 01:27, glen herrmannsfeldt wrote:

> Well, there are more than one file systems used with Windows, and the > rules might be different.
It's not the rules of the file systems that differ from each other. It's the rules of how other subsystems use the file system(s). The clue to the difference is that current NT-based Windows is actually a three-tier system: the Windows subsystem(s) on top of the NT kernel, on top of a hardware abstraction layer. File systems are part of the Kernel, and they're apparently required to be fully case-respecting. The silly "case-preserving, but not case-sensitive" behaviour must be implemented on top of that, by the Windows subsystem. The "Interix" tools Don Y talks about live in an alternative subsystem, distinct from Windows itself, that works directly with the kernel. So Interix can indeed use the same file systems, but do it differently than the Windows subsystem.
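The layering described here can be modelled in a few lines of Python: a case-respecting directory underneath, and a case-preserving but case-insensitive view on top. Both class names are hypothetical, and Python's casefold() stands in for whatever upcase rules Windows actually applies.

```python
from __future__ import annotations


class CaseRespectingDir:
    """Kernel-level directory in this model: names are matched exactly."""

    def __init__(self) -> None:
        self.entries: dict[str, bytes] = {}

    def create(self, name: str, data: bytes = b"") -> None:
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = data


class CasePreservingView:
    """Win32-style layer: stores the given case, looks names up ignoring case."""

    def __init__(self, directory: CaseRespectingDir) -> None:
        self.directory = directory

    def matches(self, name: str) -> list[str]:
        key = name.casefold()
        return [n for n in self.directory.entries if n.casefold() == key]

    def create(self, name: str, data: bytes = b"") -> None:
        if self.matches(name):
            raise FileExistsError(name)
        self.directory.create(name, data)

    def resolve(self, name: str) -> str:
        hits = self.matches(name)
        if not hits:
            raise FileNotFoundError(name)
        # With two case variants on disk, this layer can only ever reach one.
        return hits[0]


disk = CaseRespectingDir()
win32 = CasePreservingView(disk)
win32.create("Index.html")        # stored as typed: case is preserved
disk.create("index.html")         # a POSIX-style client adds a case variant
print(win32.resolve("INDEX.HTML"), win32.matches("index.html"))
```

The second entry is perfectly legal at the lower layer; it is only the upper layer that cannot address it separately.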
> For NTFS, I believe the longer names are really part of the file > system and directory, not quite the same as for FAT.
The difference isn't really that big. For reasons of compatibility, even NTFS directories have to maintain 8.3-format alias names for all entries whose names don't match that format.
> It might be > that NTFS can still supply a short name for programs that require > them.
It can't just supply one, it has to pick an 8.3 alias name at file creation time, and _store_ it along with the "real" one, because that alias has to remain unchanged for as long as the entry exists. Some truly braindead installers even presume that the path to their installation directory must "of course" be below "c:\progra~1", (just to avoid having to evaluate %ProgramFiles%).
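To make the "~1" aliases concrete, here is a much simplified Python sketch of picking an 8.3 alias at creation time. It is an approximation, not the real NTFS algorithm: as far as I know, the real one accepts more characters and switches to a hash-based scheme after the first few ~N collisions. The function names are hypothetical, and this sketch simply drops non-ASCII characters.

```python
from __future__ import annotations

import string

# Characters this sketch keeps in a short name (the real rules allow more).
SHORT_CHARS = set(string.ascii_uppercase + string.digits + "_-")


def _ok(c: str) -> bool:
    return c.isascii() and c.upper() in SHORT_CHARS


def _clean(part: str) -> str:
    """Uppercase and drop everything this sketch does not allow."""
    return "".join(c.upper() for c in part if _ok(c))


def fits_8_3(name: str) -> bool:
    """True if the name already has the BASE(1-8).EXT(0-3) shape."""
    base, _, ext = name.partition(".")
    if not base or "." in ext or len(base) > 8 or len(ext) > 3:
        return False
    return all(_ok(c) for c in base + ext)


def short_alias(name: str, existing: set[str]) -> str | None:
    """Pick a ~N alias for a long name, or None if no alias is needed."""
    if fits_8_3(name):
        return None
    base, dot, ext = name.rpartition(".")
    if not dot or not base:        # no usable extension: whole name is the base
        base, ext = name, ""
    base, ext = _clean(base), _clean(ext)[:3]
    for n in range(1, 10):
        stem = f"{base[:6]}~{n}"
        alias = f"{stem}.{ext}" if ext else stem
        if alias not in existing:
            return alias
    raise RuntimeError("this sketch only tries ~1 to ~9")


print(short_alias("Program Files", set()), short_alias("My Document.html", set()))
```

Since the alias depends on which other aliases already exist, it has to be stored with the entry; recomputing it later could give a different answer.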
On Wed, 22 Oct 2014 11:34:46 -0700, Don Y <this@is.not.me.com> wrote:

> >> Maybe try this: mark the file name in Explorer, copy it into a text file >> with Notepad, and hex-dump that. > >Ah, that's a good idea! Or, a Unicode editor...
Notepad is a Unicode editor. In fact it was nearly the only program in Win NT 3.51 that supported Unicode :-). Install the "Arial Unicode MS" font (several tens of megabytes), select it in Notepad, and Notepad will show the file names correctly.
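If the pasted name is saved from Notepad as "Unicode" (UTF-16LE), the hex dump makes the shifted characters obvious. A minimal Python equivalent, with a hypothetical helper name; it leaves out the FF FE byte-order mark Notepad writes at the start of the file.

```python
def utf16le_hex(name: str) -> str:
    """Hex-dump a name as UTF-16LE code units, without a byte-order mark."""
    return name.encode("utf-16-le").hex(" ")


print(utf16le_hex("A:a"))        # 41 00 3a 00 61 00  (plain ASCII colon)
print(utf16le_hex("A\uf03aa"))   # 41 00 3a f0 61 00  (colon shifted to U+F03A)
```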
On Wed, 22 Oct 2014 21:00:03 +0200, Hans-Bernhard Bröker <HBBroeker@t-online.de> wrote:

>On 22.10.2014 at 01:27, glen herrmannsfeldt wrote: > >> Well, there are more than one file systems used with Windows, and the >> rules might be different. > >It's not the rules of the file systems that differ from each other. >It's the rules of how other subsystems use the file system(s). > >The clue to the difference is that current NT-based Windows is actually >a three-tier system: the Windows subsystem(s) on top of the NT kernel, >on top of a hardware abstraction layer. > >File systems are part of the Kernel, and they're apparently required to >be fully case-respecting. The silly "case-preserving, but not >case-sensitive" behaviour must be implemented on top of that, by the >Windows subsystem. > >The "Interix" tools Don Y talks about live in an alternative subsystem, >distinct from Windows itself, that works directly with the kernel. So >Interix can indeed use the same file systems, but do it differently than >the Windows subsystem. > >> For NTFS, I believe the longer names are really part of the file >> system and directory, not quite the same as for FAT. > >The difference isn't really that big. For reasons of compatibility, >even NTFS directories have to maintain 8.3-format alias names for all >entries whose names don't match that format. > >> It might be >> that NTFS can still supply a short name for programs that require >> them. > >It can't just supply one, it has to pick an 8.3 alias name at file >creation time, and _store_ it along with the "real" one, because that >alias has to remain unchanged for as long as the entry exists. > >Some truly braindead installers even presume that the path to their >installation directory must "of course" be below "c:\progra~1", (just >to avoid having to evaluate %ProgramFiles%).
I still do not understand what this thread is all about. If the intention is to mount foreign file systems directly (local drive) or over the network, you really have to use features that all systems support. There is not much point in trying to map the most awkward features of each system to every other file system. Some Linux-based systems support Unicode file names in UTF-8, while Windows NTFS is UTF-16 based (but the supported characters might be different), and some Unix systems are Latin-1 based or support just 7-bit ASCII (upper and lower case). Some older 6-bit systems only supported upper case letters, or just 40 symbols. IMHO, it is pointless to try to make very special mappings. In order to co-operate, you really have to forget any "purity" claims and try to find what is common in different systems.
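One way to act on the "find what is common" advice is a lowest-common-denominator check before exchanging files. A minimal Python sketch, assuming the POSIX "Portable Filename Character Set" (ASCII letters, digits, '.', '_', '-') as the common subset; the function names are hypothetical, and the simple lower() used for the collision check is itself one of the debatable case rules discussed in this thread.

```python
from __future__ import annotations

import string
from collections import defaultdict

# POSIX "Portable Filename Character Set": letters, digits, '.', '_', '-'.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable(name: str) -> bool:
    """True if the name uses only portable characters."""
    return name not in ("", ".", "..") and all(c in PORTABLE_CHARS for c in name)


def case_collisions(names: list[str]) -> list[list[str]]:
    """Groups of names that would clash on a case-insensitive filesystem."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["readme.txt", "README.TXT", "A*a", "A:a", "Gr\u00fc\u00dfe.txt"]
print([n for n in names if not is_portable(n)])
print(case_collisions(names))
```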
On 22/10/14 16:26, Dimiter_Popoff wrote:
> On 22.10.2014 г. 16:21, David Brown wrote: >> On 22/10/14 14:19, Dimiter_Popoff wrote: >>> On 22.10.2014 г. 14:11, David Brown wrote: >>> >>>> >>>>> 2. The user is not exposed to the bitstreams the OS stores but to to >>>>> text which consists of characters which are part of an alphabet, >>>>> for example the Latin alphabet as used in English has 26 characters. >>>> >>>> In other words, Windows mangles the names it is given. >>> >>> No. It reproduces the names exactly as the user has entered them. >>> >> >> Don entered filenames, ... > > No, he wrote a program to squeeze these names through the system. > A user cannot enter names the system does not allow to be entered.
He typed "touch C*c". "touch" is an extremely common command on all posix systems, including the various versions of MS's "unix services for windows" throughout the history of Windows NT. So no, Don did not "write a program" or "hack the system" - he used a standard command line utility available to users.
> >> Do you mean that because "a*b" has a non-letter, it is not a valid name? > > It might be, "*" is typically a reserved symbol for communication > between the user interface and the directory search code. > If the user interface won't let you do it chances are it is illegal.
The point is that /some/ interfaces on Windows let you use this symbol (and other symbols, and filenames differing only in case), while other interfaces disallow entering such filenames, and mangle the view of such files. This is all using programs and utilities that come out of the box with Windows (at least the server and "ultimate" versions).
> If you don't know the answer to that try to copy an existing file > named say abc.txt to a*a, see what the error message will be. > (Sorry to the rest of the group, obviously everyone knows the answers > to that but it is not my fault we go there - and at the moment I > don't feel like letting it go just because someone is too pushy > with his nonsense). > >> Or do you mean that Don doesn't count as a "user" because he is using >> filenames that you don't like? > > Or do you mean that you would be wrongly labelled mental just because > you want to have the name written on your identity card to consist > only of hieroglyphs and special characters.
There are plenty of languages which are written in "hieroglyphs" or have special characters as part of their written form. I'm lucky - my name uses only characters from the 7-bit ASCII character set. But other people could certainly want their names written in non-ASCII characters.
> > Don is a "user" as long as he uses the user interface. A programmer > and a user are not the same thing in that contest and please do not > go into more bollocks on obvious definitions like that.
See above for a note on the "touch" user program.
> >>>>> Of course he can hack his way into doing it, whether through some >>>>> MS written hack which you say is not a hack or otherwise. >>>> >>>> The Win32 API allows files to be created or opened using "posix >>>> semantics" for filenames, including case-sensitive files, characters >>>> such as ":" and "*" in filenames, and multiple files differing only in >>>> the case of their names. Even if you want to call the MS-supplied >>>> posix >>>> compatibility layer a "hack", I don't think the standard Win32 API is a >>>> hack. >>> >>> You may think whatever you want but using a low enough level call to >>> create invalid directory entries is a hack, whoever may have written >>> the code within the system call. >>> Non-hack application code does not go that low in order to defeat the >>> system-wide rules or compromise the system in other ways, there are >>> always plenty of opportunities to kill a system. >> >> We are talking about the Win32 API "CreateFile" call here! This is how >> you create files in Windows programming. If that is a "hack", then >> /all/ windows programming is a "hack". > > While I don't know how this is done under windows I think you know that > even less than I do. Your above means claim that every application > written for windows has to prepare the name such that it will be a > valid one. > It is obvious that normal windows applications do not write their own > name validations, if some call allows writing an invalid name then > reading the OS manual will tell you that you must use another call > prior to it which validates the name.
Most programs would simply pass on the filename to the Win32 API (CreateFile, which by MS logic is also used to open existing files), and if the function returns an error about an invalid filename, that will be passed on to the user. Programs would normally only do their own validation if they had reason to be extra fussy. Note that the Win32 API supports two different semantics for file name validity, which you can choose - you can use "posix semantics", allowing case sensitive operation (and some extra characters), or "default windows semantics" with case insensitive operation. Either way, the function will check the validity of the operation requested.
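For the CreateFile point, a Python/ctypes sketch of choosing between the two semantics via the FILE_FLAG_POSIX_SEMANTICS flag. The constant values are the ones from the Windows SDK headers; the helper names are hypothetical. As far as I know, on Windows XP and later the flag only changes lookup behaviour if the system-wide kernel case-insensitivity setting (the obcaseinsensitive registry value) has been turned off, and this sketch says nothing about which characters are accepted.

```python
import ctypes
import sys

# Values from the Windows SDK headers.
GENERIC_WRITE = 0x40000000
CREATE_NEW = 1                      # fail if the file already exists
FILE_ATTRIBUTE_NORMAL = 0x00000080
FILE_FLAG_POSIX_SEMANTICS = 0x01000000


def creation_flags(posix_semantics: bool) -> int:
    """Build the dwFlagsAndAttributes argument for CreateFileW."""
    flags = FILE_ATTRIBUTE_NORMAL
    if posix_semantics:
        flags |= FILE_FLAG_POSIX_SEMANTICS
    return flags


def create_new_file(path: str, posix_semantics: bool = False) -> None:
    """Create an empty file through CreateFileW (Windows only)."""
    if sys.platform != "win32":
        raise OSError("CreateFileW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    create = kernel32.CreateFileW
    create.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32,
                       ctypes.c_void_p, ctypes.c_uint32, ctypes.c_uint32,
                       ctypes.c_void_p]
    create.restype = ctypes.c_void_p
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]
    handle = create(path, GENERIC_WRITE, 0, None, CREATE_NEW,
                    creation_flags(posix_semantics), None)
    # CreateFileW signals failure with INVALID_HANDLE_VALUE, i.e. (HANDLE)-1.
    if handle == ctypes.c_void_p(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    kernel32.CloseHandle(handle)
```

Either way the validation happens inside the call: an invalid name comes back as an error code, which the program passes on to the user.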
> You should demonstrate at least such basic programming knowledge > if you want people to take your posts as something more than standard > "always know better" babble by someone who does not really know what > he is talking about.
Let me demonstrate my basic programming knowledge by my ability to look up CreateFile with google to find the MSDN page: <http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx> Looking up the documentation for the "touch" program, to see that it is a common user program and not a "hack" or a special program written by Don, is left as an exercise to the reader.
> >>> >>>>> The problem remains and will remain, as unix does >>>>> not output names but file identifiers (names consist of text, remember >>>>> the alphabet and the character count). The fact that these identifiers >>>>> have been misused as text for decades does not mean much beyond >>>>> the expectations of hardcore unix users that the English alphabet >>>>> will suddenly begin to have 52 characters. >>>> >>>> This goes back to your unique idea that files have a sort of colloquial >>>> human-friendly nick-name that is a different concept from their >>>> "filename" that everyone else uses. >>> >>> Blimey, so it is my unique idea that file names are meant also for human >>> consumption/processing. >> >> No, it is your unique idea that an OS has to treat filenames using human >> language rules because they are always for human processing and >> consumption - > > Ah, now you are trying to cheat your way out of the hole. No, I never > said that. I said that file names are ALSO for human consumption.
And it has been pointed out that humans use wildly different rules for how they write according to their language, alphabet (or non-alphabetic writing system), and even their personal habits - thus the OS should not try to second-guess them.
> > Try to spell over the phone a file name like "ThIs Is An eXamPle Of A > nAMe foR iDioTs". > Then come and repeat your claim that file names - or whatever names > which are represented in text - are to be compared case sensitive.
Filenames that are meant to be typed by humans (or read over the phone) should be chosen to make sense - but most humans will do that automatically. And there is /no/ requirement to be case insensitive for that - /I/ certainly have no problem saying "readme dot txt, all small letters" or "readme dot txt, with a capital r". No case-insensitive system is going to protect someone from a file name like "This is an exampel of a name for idiots". Case-insensitivity made sense when there were computers with 6-bit characters. And it is at least consistent when you are limited to 8.3 capital letters purely from the 7-bit ASCII set. (And it can still make sense on small, niche OS'es where you need a simple and limited system.) But on a modern multi-lingual general-purpose OS and filesystem? At best you get an inordinately complex, inconsistent system that works for some languages and not others - that's Windows' solution.
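A small Python illustration of why "case-insensitive" is not one well-defined rule once you leave ASCII: three plausible folding rules agree on "Index.html" but disagree on a single German word. The "1:1 upcase" rule is a hypothetical stand-in for a fixed per-character upcase table, roughly like the one NTFS keeps on each volume as far as I know; it cannot map 'ß' to "SS" because that changes the length.

```python
def simple_upcase(s: str) -> str:
    """Per-character uppercasing that never changes the length; characters
    whose uppercase form is longer (e.g. 'ß' -> 'SS') are left as they are."""
    out = []
    for c in s:
        u = c.upper()
        out.append(u if len(u) == 1 else c)
    return "".join(out)


RULES = {
    "lower()": str.lower,
    "casefold()": str.casefold,
    "1:1 upcase": simple_upcase,
}


def same_name(a: str, b: str, rule: str) -> bool:
    fold = RULES[rule]
    return fold(a) == fold(b)


for rule in RULES:
    print(f"{rule:12} Index.html/INDEX.HTML: {same_name('Index.html', 'INDEX.HTML', rule)}"
          f"  straße/STRASSE: {same_name('straße', 'STRASSE', rule)}")
```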
> > Or simply stop posting nonsense. > > Dimiter >
On 22.10.2014 г. 19:46, Stefan Reuther wrote:
> Dimiter_Popoff wrote: >> On 22.10.2014 г. 16:21, David Brown wrote: >>> On 22/10/14 14:19, Dimiter_Popoff wrote: >>>> On 22.10.2014 г. 14:11, David Brown wrote: >>>>>> 2. The user is not exposed to the bitstreams the OS stores but to to >>>>>> text which consists of characters which are part of an alphabet, >>>>>> for example the Latin alphabet as used in English has 26 characters. >>>>> >>>>> In other words, Windows mangles the names it is given. >>>> >>>> No. It reproduces the names exactly as the user has entered them. >>> >>> Don entered filenames, ... >> >> No, he wrote a program to squeeze these names through the system. >> A user cannot enter names the system does not allow to be entered. > > No, he did not "squeeze these names through the system". He used a > program written by Microsoft, for Windows, in the way it is meant to be > used, and it obviously let him enter the names. > > NTFS can (and always could) be configured to be case-sensitive, which > the SFU/Interix tools use through the POSIX subsystem. Of course, the > Win32 subsystem which uses a different configuration isn't particularily > happy about that, much in the same way the virtual DOS subsystem isn't > particularily happy about names that don't fit into the 8.3 convention.
Apparently so. Though MS seem to be somewhat more than "particularly unhappy" about case-sensitive name searches: in the link David looked up while demonstrating his programmer's skills, I stumbled across their text saying "do not assume case sensitivity", "can be used to but" etc. That is basically what I say about DPS, which is also perfectly capable of case-dependent compares and does these, as well as the case-independent ones, likely faster than the rest, since it compares 32-bit words. But I expect one would run into problems when venturing into such little-charted territory, just as Don did with MS. Dimiter
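How DPS does its 32-bit compares is not described here. Purely as an illustration of the general idea, here is one word-at-a-time (SWAR) way to fold ASCII case for four bytes at once, sketched in Python with plain ints standing in for 32-bit registers. The function names are hypothetical, and only the ASCII letters A-Z are folded.

```python
ONES = 0x01010101
HIGH_BITS = 0x80808080
LOW7 = 0x7F7F7F7F


def fold_word(w: int) -> int:
    """Lowercase the ASCII letters A-Z in all four bytes of a 32-bit word."""
    low = w & LOW7                                # clear bit 7 so no byte carries
    ge_a = low + (0x80 - ord("A")) * ONES         # bit 7 set where byte >= 'A'
    gt_z = low + (0x80 - ord("Z") - 1) * ONES     # bit 7 set where byte > 'Z'
    # Uppercase letter: >= 'A', not > 'Z', and the original byte was ASCII.
    is_upper = ge_a & ~gt_z & ~w & HIGH_BITS
    return w | (is_upper >> 2)                    # 0x80 >> 2 == 0x20, the case bit


def iequal_ascii(a: bytes, b: bytes) -> bool:
    """Case-insensitive equality of two ASCII names, 4 bytes at a time."""
    if len(a) != len(b):
        return False
    pad = b"\0" * (-len(a) % 4)
    a, b = a + pad, b + pad
    for i in range(0, len(a), 4):
        wa = int.from_bytes(a[i:i + 4], "little")
        wb = int.from_bytes(b[i:i + 4], "little")
        if fold_word(wa) != fold_word(wb):
            return False
    return True


print(iequal_ascii(b"README.TXT", b"readme.txt"), iequal_ascii(b"[x]", b"{x}"))
```

The neighbours '[' and '{' (or '@' and '`') differ only in the case bit, which is why a plain "OR with 0x20" is not enough and the range check is needed.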