Hi Don,

> But that's the difference: other OS's have "file systems". I don't.
> I have "object namespaces". There are no "files" in my system.

OK, I get that - in DPS this is the concept of runtime "objects", which have a 16 byte "name" but it is not for user consumption. I use text in it but this is just a matter of convenience when I code. When the system searches for an object it just compares bytes (well longwords in reality, it does that with filenames too actually :-) ).

But we were talking of a "file system", and in a file system one does have files which do have names some of which are created by humans and supposed to be read/memorized by humans. At some level we do need the text for the name stored and searchable. If we just store the name text as bytes we end up needing twice the search overhead to do it case independent; which is why I think the unix makers back then left it case dependent, did not want to be bothered. It takes only a little - a bit per character - to do it the way it is done in DPS, and you have the best of both worlds. Case-free name information followed by the respective case bitstream. Somewhat more demanding to code but completely doable, was that for me anyway.

> I only need a bridge to other OS's (Windows in this example) to get
> things into or out of the system. E.g., if a user wants to print a
> diagnostic log from process 345345, he needs a "handle" to access
> that (or, a mechanism that effectively/implicitly provides that handle).

Well if you do not need to reproduce the names you get back you can simply hash the incoming names into what, 64 or maybe just 32 bits and you are done. If you want to reproduce them forget it, just storing them as you got them is the only sensible way to go. Which does not preclude you from "hashing" (in fact you can just use the addresses of the stored names or sort of) for your internal purposes, of course.

> OTOH, why should the developer have to pick a unique N-bit number
> to refer to "the task that runs the air conditioner"? And, another
> one to refer to the diagnostic log generated by that task? etc.
>
> How many "objects" exist inside your cell phone for which NO "user
> visible name" exists? Or, your "smart TV"/media tank?

Oh come on Don, we all know the alphabet here, let's not go over it again.

> But, EVERY object has a name -- including those that are intended
> to be (or potentially) visible to the user. The problem that I am
> trying to address is how to make those names visible in a particular
> environment (e.g., Windows) *without* impacting the choice that the
> developer has to make in creating their "native names".
>
> It looks like the only way to accomplish this -- given Windows'
> limitations/constraints -- is to create another "exported" namespace
> that maps the developer's names to names that Windows will tolerate.

Yes, for the dps objects I wrote earlier about I do that sort of thing by making them do "listname" (or whatever the action was called). The plainest of objects just paste as text ("paste" at an address in memory, that is... :-) ) their 16 byte ID, more sophisticated ones which must be shown to the user have something better to paste (which may be static or not). But generally there is not much else you can do, if you need two different names for a thing you need two separate names, what can you do.

OTOH common file systems pose very little restrictions which will be in your way when you invent names during programming so I am not sure at all you have a real issue here.
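The "case-free name information followed by the respective case bitstream" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the ASCII-only restriction, the upper-case key, and the bit order are all made up here, since the post does not describe the actual DPS on-disk layout.

```python
def encode_name(name):
    """Split an ASCII name into a case-free key and a packed case bitmap.

    Assumption: upper case is the stored form and a set bit marks a
    character that was originally lower case. The real DPS layout is not
    given in the thread.
    """
    if not name.isascii():
        raise ValueError("this sketch handles ASCII names only")
    key = name.upper().encode("ascii")        # case-free part, used for searching
    bits = bytearray((len(name) + 7) // 8)    # one bit per character
    for i, ch in enumerate(name):
        if ch.islower():
            bits[i // 8] |= 1 << (i % 8)
    return key, bytes(bits)


def decode_name(key, bits):
    """Rebuild the name exactly as it was originally typed."""
    chars = []
    for i, ch in enumerate(key.decode("ascii")):
        was_lower = (bits[i // 8] >> (i % 8)) & 1
        chars.append(ch.lower() if was_lower else ch)
    return "".join(chars)


directory = {}    # hypothetical directory: case-free key -> case bitmap


def create(name):
    key, bits = encode_name(name)
    directory[key] = bits


def lookup(name):
    """Case-independent search: a single comparison on the case-free key."""
    key, _ = encode_name(name)
    bits = directory.get(key)
    return None if bits is None else decode_name(key, bits)
```

With this layout, `create("ReadMe.txt")` followed by `lookup("README.TXT")` finds the entry and returns the name with its original capitalisation, at a storage cost of one extra bit per character.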
Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Filesystem syntax constraints under Windows
Started by ●October 10, 2014
Reply by ●October 10, 2014
Reply by ●October 10, 2014
Hi Dimiter,

On 10/10/2014 2:18 PM, Dimiter_Popoff wrote:

>> But that's the difference: other OS's have "file systems". I don't.
>> I have "object namespaces". There are no "files" in my system.
>
> OK, I get that - in DPS this is the concept of runtime "objects",
> which have a 16 byte "name" but it is not for user consumption.

Exactly. In my case, names are arbitrary length -- *like* file names in a modern OS. The cost of this is insignificant as a process typically has a *small* namespace. I.e., it knows nothing of objects that it is NOT SUPPOSED TO ACCESS!

When the process tries to access ("open") an object, initially, the process's namespace is the ONLY place that is examined for the object name provided. If a match is not found, there, then the object does not exist (in the context of that process).

In other words, if you don't want a process to be able to access an object, don't give the process any way of *referencing* the object in the first place. And, since the process's creator (parent) can only reference objects that exist in *its* namespace, once you remove any references to an object from one process's namespace, it is inaccessible by any of that process's offspring!

There's no concept of a unified "global" namespace. So, you're never walking "long" paths from some "system root node".

> I use text in it but this is just a matter of convenience when

Exactly. "stdout" is far more meaningful than "1299". And, exactly *what* that "stdout" is bound to (in some "global" sense) is immaterial. A process never knows.

> I code. When the system searches for an object it just compares
> bytes (well longwords in reality, it does that with filenames
> too actually :-) ).
>
> But we were talking of a "file system", and in a file system
> one does have files which do have names some of which are created
> by humans and supposed to be read/memorized by humans.

The filesystem analogy is the only thing that "others" could relate to. My namespaces are disjoint. I.e., how does something running on one of your netmca's reference an object on *another* netmca? There's no sense of "global naming" that each is aware of.

The "user" isn't creating objects (directly) in the same sense that a user creates "files" in a "filesystem". The user's *actions* result in objects being created. But, the user typically doesn't know -- or care -- what these are called.

> At some
> level we do need the text for the name stored and searchable.
> If we just store the name text as bytes we end up needing
> twice the search overhead to do it case independent; which is
> why I think the unix makers back then left it case dependent,
> did not want to be bothered. It takes only a little - a bit
> per character - to do it the way it is done in DPS, and you
> have the best of both worlds. Case-free name information followed
> by the respective case bitstream. Somewhat more demanding to code but
> completely doable, was that for me anyway.
>
>> I only need a bridge to other OS's (Windows in this example) to get
>> things into or out of the system. E.g., if a user wants to print a
>> diagnostic log from process 345345, he needs a "handle" to access
>> that (or, a mechanism that effectively/implicitly provides that handle).
>
> Well if you do not need to reproduce the names you get back you
> can simply hash the incoming names into what, 64 or maybe just
> 32 bits and you are done. If you want to reproduce them forget
> it, just storing them as you got them is the only sensible way
> to go.

There are very few things that the user "injects" and may later want to "remove". I.e., few cases where the user needs to *pick* a name -- and, later, remember it!

OTOH, there are places where the user (esp a developer-type) may want to inquire as to what's happening at some place in the system. Having to remember that "1299" is the error log for a particular process is tedious. Easier to give the process a unique name WHEN YOU DESIGNED IT and the error log that *it* creates an equally recognizable name WITHIN THAT CONTEXT so you can find it later without having to examine the equivalent of a "link map".

> Which does not preclude you from "hashing" (in fact you can
> just use the addresses of the stored names or sort of) for your
> internal purposes, of course.

As namespaces tend to be small (consider how many objects one of *your* processes encounters in its LIMITED scope of operation), you can adopt simple schemes for maintaining "handles" on those objects. E.g., one of my OS structures carries a name that is identical to its location in memory! Sure makes it easy to *find* it! :>

>> OTOH, why should the developer have to pick a unique N-bit number
>> to refer to "the task that runs the air conditioner"? And, another
>> one to refer to the diagnostic log generated by that task? etc.
>>
>> How many "objects" exist inside your cell phone for which NO "user
>> visible name" exists? Or, your "smart TV"/media tank?
>
> Oh come on Don, we all know the alphabet here, let's not go over
> it again.
>
>> But, EVERY object has a name -- including those that are intended
>> to be (or potentially) visible to the user. The problem that I am
>> trying to address is how to make those names visible in a particular
>> environment (e.g., Windows) *without* impacting the choice that the
>> developer has to make in creating their "native names".
>>
>> It looks like the only way to accomplish this -- given Windows'
>> limitations/constraints -- is to create another "exported" namespace
>> that maps the developer's names to names that Windows will tolerate.
>
> Yes, for the dps objects I wrote earlier about I do that sort of
> thing by making them do "listname" (or whatever the action was called).
> The plainest of objects just paste as text ("paste" at an address
> in memory, that is... :-) ) their 16 byte ID, more sophisticated
> ones which must be shown to the user have something better to paste
> (which may be static or not).
> But generally there is not much else you can do, if you need two
> different names for a thing you need two separate names, what can you
> do.

Yes. In my case, a particular "physical" object (bad choice of words) can have a bunch of names -- each different (or the same!) but in different namespaces (even multiple references from within a single namespace! e.g., stdout and stderr can resolve to the same "physical" object -- which can only be accessed by *this* process through these two names!)

> OTOH common file systems pose very little restrictions which will
> be in your way when you invent names during programming so I am
> not sure at all you have a real issue here.

Modern file systems impose naming constraints that are essentially arbitrary.

Why can't I use '>' in a name? Oh, because some APPLICATION considers it a special symbol!

Why can't I use ' ' in a name? Oh, because whitespace has historically delimited tokens.

Why does the file name have to be "short"? Oh, because there is an arbitrarily low limit on total pathname length and you never know if some application (shell) will be called on to try to copy that file to a point in the hierarchy that has a long path prefix. (recall, each process in my world has its own "root" node to its private namespace).

Etc.
Reply by ●October 11, 2014
On Fri, 10 Oct 2014 13:43:31 -0700, Don Y <this@is.not.me.com> wrote:

>On 10/10/2014 9:42 AM, Stefan Reuther wrote:
>> Don Y wrote:
>>> Does anyone know which filesystem naming constraints are imposed
>>> in Windows itself vs. the file system layer? Said another way,
>>> which constraints are *invariants* regardless of the filesystem?
>>>
>>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>>> Or, ...
>>
>> Here is what Microsoft has to say on the topic:
>> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
>>
>> TL;DR, the reserved characters are
>> - "\", "/" for the path separator
>> - ":" for the drive letter separator, and to separate file names
>> and alternate data streams

I had completely forgotten the NTFS alternate data streams, since at least in early NT versions, there were several issues using these alternate streams.

There are similar issues with file systems supporting multiple versions of a file, such as VMS with multiple versions with the same name but different version numbers in the same directory. Version control software also saves multiple versions of a file. These can be problematic, when trying to map these to a foreign system.

>> - "?" and "*" because they are used in wildcards
>> - ">", "<", "|", and """ because they are used in shell syntax
>>
>> The 'CreateFile' function refuses to create files with these names. It
>> might be possible to use some lower-level interface to create such
>> files, but that wouldn't get you much as no other program would be able
>> to access them.
>
>This suggests I just come up with some klunky OBVIOUS algorithm to translate
>my names to names compatible with Windows. And, push the problem of sorting
>that mess out onto the Windows user -- in much the same way that the 8.3
>user has to guess at the shortened forms of LFN's.

Sorting file names for human consumption is a very culture specific issue, even with ISO/IEC 8859-x not to mention Unicode. The sorting for display needs to be done at the user machine at the language selected by the currently logged in user preference. For internal data processing, strict binary sorting order could be used, but for user interaction, the cultural aspect should be noted.

One aspect that I haven't seen discussed in this thread is that a "file" does not necessarily have a single "name". While there may be physical allocations of blocks on a disk and there might be some kind of index file entry for those, there can be multiple directory entries with different file or multiple entries in multiple directories pointing to the same physical file. Various links and directory entries are used e.g. for user, protection or language specific views to create multiple views of a file.
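One possible shape for the "klunky OBVIOUS algorithm" quoted above is a reversible escape of the reserved characters. The sketch below is an assumption, not anything proposed in the thread: the `%XX` scheme, the function names, and the choice of `%` as the escape character are hypothetical, and it only covers the characters in the quoted list (it does not deal with other Windows rules such as reserved device names).

```python
RESERVED = set('\\/:?*><|"')    # the characters from the quoted summary
ESCAPE = "%"                    # hypothetical escape character


def to_windows(name):
    """Replace each reserved character (and the escape itself) with %XX."""
    out = []
    for ch in name:
        if ch in RESERVED or ch == ESCAPE:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def from_windows(name):
    """Invert to_windows(): turn every %XX back into the original character."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == ESCAPE:
            out.append(chr(int(name[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(name[i])
            i += 1
    return "".join(out)
```

Because the escape character is itself escaped, the mapping is one-to-one, so a name exported to Windows can be translated back to the native name without keeping a separate table. The price is that the exported names are ugly, which is the "guessing" burden the quoted text pushes onto the Windows user.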
Reply by ●October 11, 2014
On 10/11/2014 1:06 AM, upsidedown@downunder.com wrote:

> On Fri, 10 Oct 2014 13:43:31 -0700, Don Y <this@is.not.me.com> wrote:
>>> The 'CreateFile' function refuses to create files with these names. It
>>> might be possible to use some lower-level interface to create such
>>> files, but that wouldn't get you much as no other program would be able
>>> to access them.
>>
>> This suggests I just come up with some klunky OBVIOUS algorithm to translate
>> my names to names compatible with Windows. And, push the problem of sorting
>> that mess out onto the Windows user -- in much the same way that the 8.3
>> user has to guess at the shortened forms of LFN's.
>
> Sorting file names for human consumption is a very culture specific
> issue, even with ISO/IEC 8859-x not to mention Unicode. The sorting
> for display needs to be done at the user machine at the language
> selected by the currently logged in user preference. For internal data
> processing, strict binary sorting order could be used, but for user
> interaction, the cultural aspect should be noted.
>
> One aspect that I haven't seen discussed in this thread is that a
> "file" does not necessarily have a single "name". While there may be
> physical allocations of blocks on a disk and there might be some kind
> of index file entry for those, there can be multiple directory entries
> with different file or multiple entries in multiple directories
> pointing to the same physical file. Various links and directory
> entries are used e.g. for user, protection or language specific views
> to create multiple views of a file.

Recall that I am only using the concept of "files" to relate this to a conventional OS mechanism.

In my case, each *name* (in a namespace) is backed by a particular server. Names that would be the equivalent of "files" (persistent data on some sort of medium) would be backed by a "file server".

But, other names may be bound to things like dynamic kernel or process structures, system variables, hardware devices, etc. E.g., "time_of_day" may provide the current time of day (in some particular format) when read. "GarageDoor" may cause the garage door to open when the string "open" is written to it; close when "close" is supplied.

The names in a namespace can be bound to a variety of different types of objects. As well as multiple instances of the *same* object. E.g., "time_of_day" and "now" can both be bound to the same object. Or, to different accessors on a single object.

I think this is the solution to my problem: I can freely create ANOTHER namespace that I populate with "Windows compatible names" and export *this* to the Windows host. So, I could bind the name "ReadMe" in the exported namespace to the same object that the local system has named "Read/\/\e" (which would upset Windows' notion of a "proper" name to reside in the file system interface).

At the same time, that same object can be referenced in YET ANOTHER namespace as "README" for export to a system that expects 8.3 names.
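The arrangement described above (several disjoint namespaces whose names are bound to the same underlying object) can be modelled in a few lines. This is a toy illustration only: the `Namespace` class and its `bind`/`open` methods are hypothetical names invented here, not the API of the system being discussed.

```python
class Namespace:
    """A private name -> object table; there is no global root to fall back on."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, obj):
        self._bindings[name] = obj

    def open(self, name):
        # Only this namespace is searched. A name that was never bound here
        # simply does not exist from the holder's point of view.
        if name not in self._bindings:
            raise LookupError("%r does not exist in this namespace" % name)
        return self._bindings[name]


readme = object()            # stands in for whatever server-backed object is named

native = Namespace()         # names chosen freely by the developer
win_export = Namespace()     # names Windows will tolerate
dos_export = Namespace()     # 8.3-style names

native.bind("Read/\\/\\e", readme)
win_export.bind("ReadMe", readme)
dos_export.bind("README", readme)
```

All three `open` calls return the same object, while `win_export.open("Read/\\/\\e")` fails: the exported namespace never contains a name the foreign system would reject, and the native name is unaffected by what the exports require.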
Reply by ●October 11, 2014
Dimiter_Popoff wrote:

> As a side note, "the right way" to treat file names is to preserve
> the case information and to ignore it during file search (i.e.
> aaa and AAA locate the same file).

This will not work.

Do "i" and "I" name the same file? (In Turkish, the upper-case version of "i" is "İ"; the lower-case version of "I" is "ı".)

Do "MASSE" and "Maße" name the same file? And what about "MASZE"? (In German, the proper upper-case version of "ß" is "SS", with "SZ" being allowed when it would otherwise be ambiguous; this being the case in the example, where both "Masse" ("mass/weight") and "Maße" ("measurements/dimensions") are proper words. An upper-case version of "ß" has been recently introduced, but is not in wide use.)

The "I" ambiguity leads to a number of interesting bugs, such as this one: <https://bugs.php.net/bug.php?id=35050>

I wouldn't want complex, environment-dependant code like that in any system I have to depend on, such as a kernel or file system. We already have it in mission-critical systems like the Domain Name System, and I'm not entirely happy with that.

Stefan
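The cases raised above can be tried with Python's built-in string methods, which apply the default, locale-independent Unicode case mappings. This is a small sketch for illustration; `same_name` is a hypothetical helper, and caseless comparison via `str.casefold()` is just one possible policy, not something any file system in this thread is said to use.

```python
def same_name(a, b):
    """Caseless comparison using the default Unicode case folding."""
    return a.casefold() == b.casefold()


# German sharp s: upper-casing expands one letter into two.
print("Maße".upper())                 # MASSE
print(same_name("Maße", "MASSE"))     # True: "ß" folds to "ss"
print(same_name("Maße", "MASZE"))     # False

# Turkish dotted/dotless i under the default (non-Turkish) mappings.
print("I".lower() == "i")             # True, which is wrong for Turkish text
print("ı".upper() == "I")             # True: dotless i upper-cases to plain I
print(len("İ".lower()))               # 2: "i" followed by a combining dot above
```

Under these default rules "Masse" and "Maße" collide even though they are different German words, and the Turkish pairs do not round-trip. Getting the Turkish behaviour right requires locale-specific tailoring, which is the kind of environment-dependent code the post objects to having in a kernel or file system.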
Reply by ●October 11, 2014
On 11.10.2014 г. 12:28, Stefan Reuther wrote:

> Dimiter_Popoff wrote:
>> As a side note, "the right way" to treat file names is to preserve
>> the case information and to ignore it during file search (i.e.
>> aaa and AAA locate the same file).
>
> This will not work.

Well it has worked for quite some time already.

> Do "i" and "I" name the same file? (In Turkish, the upper-case version
> of "i" is "İ"; the lower-case version of "I" is "ı".)

Yes, I and i name the same file. No, I with two dots above it and I with one dot do not, these are different characters.

> Do "MASSE" and "Maße" name the same file?

No, they do not.

> And what about "MASZE"? (In
> German,

Naming is not language specific, it is alphabet specific only. Various languages may have various alphabets. I would of course prefer if we all just used the Latin alphabet plain, as it is used in English, but there is no problem at all with the capitalization for its variations. So there is no problem storing the file case information the right way. Then if you want to store some caseless hieroglyphs you can do it by just leaving the case information blank (e.g. in DPS you have up to 255 bytes for text and as many bits for the corresponding case data).

Language processing is something else and has nothing to do with names. Similar to the way we deal with a person's name, we do not translate it but we do spell it correctly whenever the alphabet we use would allow it.

> The "I" ambiguity leads to a number of interesting bugs, such as this
> one: <https://bugs.php.net/bug.php?id=35050>

There is no ambiguity at all in using the Latin alphabet. If one chooses to introduce one he has to live with it. Your example is not related to how we store and process case information.

> I wouldn't want complex, environment-dependant code like that in any
> system I have to depend on, such as a kernel or file system.

Like it or not file systems deal with files and files are named and the names are for human consumption to a great part. So the way the names are stored and searched for belongs there, together with the alphabet rules which apply to reading/writing these names.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Reply by ●October 11, 2014
On 11.10.2014 г. 11:06, upsidedown@downunder.com wrote:

>....
> One aspect that I haven't seen discussed in this thread is that a
> "file" does not necessarily have a single "name". While there may be
> physical allocations of blocks on a disk and there might be some kind
> of index file entry for those, there can be multiple directory entries
> with different file or multiple entries in multiple directories
> pointing to the same physical file. Various links and directory
> entries are used e.g. for user, protection or language specific views
> to create multiple views of a file.

In DPS I treat this as an error. It can happen, one could even reproduce it from a command line (e.g. by copying a directory to another file and by leaving the destination copy file type being that of a directory; will take just a little more typing than normal copy). But the "repair" function will capture that and will report an error, the only way around which would be to delete one of the directory entries pointing to the same (or overlapping) disk areas. [Repair walks all the directories and builds a new CAT (cluster allocation table)].

Sometimes this can occur inadvertently, say the system gets reset before the latest CAT has been updated and some newly allocated file stays "unallocated". Then upon boot some other file gets allocated over it (quite a mess really), the fix is to delete one of the two files.

But copying the directory file can be useful, I have used it during some rescue missions. Then there is no problem to copy the directory file as a non-directory type as a sort of backup, repair will not analyze it then.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
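A repair pass of the kind described (walk every directory entry, rebuild the allocation table, report entries that claim the same disk area) might look roughly like this. It is a generic sketch under loud assumptions: the post gives no details of the DPS structures, so the entry format (a name plus a list of cluster numbers) and the function name `rebuild_cat` are invented for illustration.

```python
def rebuild_cat(entries, cluster_count):
    """Rebuild a cluster allocation table from directory entries.

    entries: iterable of (name, list_of_cluster_numbers) pairs. This is a
             hypothetical format, not the real DPS directory layout.
    Returns (cat, conflicts), where cat[c] is the name owning cluster c
    (None if free) and conflicts lists (cluster, first_owner, second_claimant).
    """
    cat = [None] * cluster_count
    conflicts = []
    for name, clusters in entries:
        for c in clusters:
            if cat[c] is None:
                cat[c] = name                       # first claim wins
            else:
                conflicts.append((c, cat[c], name)) # cross-linked cluster
    return cat, conflicts
```

For example, `rebuild_cat([("a", [0, 1]), ("b", [1, 2])], 4)` reports cluster 1 as claimed by both "a" and "b". In the scheme described above, the way out of such a report would be to delete one of the two directory entries.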
Reply by ●October 11, 2014
On 11/10/14 16:13, Dimiter_Popoff wrote:

> On 11.10.2014 г. 12:28, Stefan Reuther wrote:
>> Dimiter_Popoff wrote:
>>> As a side note, "the right way" to treat file names is to preserve
>>> the case information and to ignore it during file search (i.e.
>>> aaa and AAA locate the same file).
>>
>> This will not work.
>
> Well it has worked for quite some time already.

It has "worked" in the sense that people live with it despite the inadequacies, inconsistencies and complications such as massive amounts of locale-dependent code.

>> Do "i" and "I" name the same file? (In Turkish, the upper-case version
>> of "i" is "İ"; the lower-case version of "I" is "ı".)
>
> Yes, I and i name the same file. No, I with two dots above it and I with
> one dot do not, these are different characters.

Did you fail to read what Stefan wrote? In Turkish, I and i are not the same letter. I with two dots is a different case altogether - in some languages, a glyph like "ï" is considered a letter completely distinct from "i" or "ı", while in other languages it might be considered an accented form of a normal "i".

>> Do "MASSE" and "Maße" name the same file?
>
> No, they do not.

Why should they not, when "MASSE" is the capitalised version of "Maße" and you want capitalised versions to refer to the same file? To a German speaker, it is exactly the same as "Readme" and "README" being the same.

>> And what about "MASZE"? (In
>> German,
>
> Naming is not language specific, it is alphabet specific only.

That is completely and utterly incorrect, and is perhaps the basis for your misunderstandings here. Different languages can use the same alphabet in different ways, and it is not uncommon for them to have variations (such as accents or additional letters) that are treated in wildly different ways from others who use the same accents or letters. There are many languages and alphabets where the glyphs used for particular letters vary according to their position in a word or sentence - the appearance is different and yet they are the same "letter".

> Various languages may have various alphabets. I would of course
> prefer if we all just used the Latin alphabet plain, as it is used
> in English, but there is no problem at all with the capitalization
> for its variations. So there is no problem storing the file case
> information the right way. Then if you want to store some caseless
> hieroglyphs you can do it by just leaving the case information
> blank (e.g. in DPS you have up to 255 bytes for text and as many
> bits for the corresponding case data).

There are only two possible ways to handle naming consistently and rationally in an operating system. You can restrict everything to the plain ANSI character set, in which case you can choose to make names case independent if you want. Or you can make it completely transparent and provide no interpretation beyond a minimum number of "special" characters such as "/". That way you leave it up to applications or libraries to decide how to deal with capitalisation, sorting, etc. - it is not part of the basic OS or filesystem.

> Language processing is something else and has nothing to do with names.

Naming is highly language-dependent.

> Similar to the way we deal with a person's name, we do not translate
> it but we do spell it correctly whenever the alphabet we use would
> allow it.

People translate their names all the time. Usually they are translated into something roughly similar but which can be pronounced and written in the other language - occasionally people choose to translate more significantly. Different languages handle names in different ways - in some languages names are declined according to how they are used, leading to even more variety.

>> The "I" ambiguity leads to a number of interesting bugs, such as this
>> one: <https://bugs.php.net/bug.php?id=35050>
>
> There is no ambiguity at all in using the Latin alphabet. If one chooses
> to introduce one he has to live with it. Your example is not related to
> how we store and process case information.

So people who choose to be born in Turkey, and choose to be given names containing an "i" or an "ı", have only themselves to blame - and have to live with the consequences?

>> I wouldn't want complex, environment-dependant code like that in any
>> system I have to depend on, such as a kernel or file system.
>
> Like it or not file systems deal with files and files are named and the
> names are for human consumption to a great part. So the way the names
> are stored and searched for belongs there, together with the alphabet
> rules which apply to reading/writing these names.
>
> Dimiter

For someone with a clearly Russian/Eastern European name and background, yet with a perfect grasp of English, you are remarkably provincial and demonstrate a serious lack of knowledge and understanding about language, alphabets, and names in an international context.
Reply by ●October 11, 2014
On 11/10/14 10:06, upsidedown@downunder.com wrote:

> On Fri, 10 Oct 2014 13:43:31 -0700, Don Y <this@is.not.me.com> wrote:
>
>> On 10/10/2014 9:42 AM, Stefan Reuther wrote:
>>> Don Y wrote:
>>>> Does anyone know which filesystem naming constraints are imposed
>>>> in Windows itself vs. the file system layer? Said another way,
>>>> which constraints are *invariants* regardless of the filesystem?
>>>>
>>>> E.g., can a non-native filesystem redefine '+' to replace "../"?
>>>> Or, allow support for ':' in names? Or, replace '\' with '/'?
>>>> Or, ...
>>>
>>> Here is what Microsoft has to say on the topic:
>>> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
>>>
>>> TL;DR, the reserved characters are
>>> - "\", "/" for the path separator
>>> - ":" for the drive letter separator, and to separate file names
>>> and alternate data streams
>
> I had completely forgotten the NTFS alternate data streams, since at
> least in early NT versions, there were several issues using these
> alternate streams.

To my knowledge, the only successful use of alternate data streams in an NTFS file was a way to hide viruses without changing the apparent size of a file.

> One aspect that I haven't seen discussed in this thread is that a
> "file" does not necessarily have a single "name". While there may be
> physical allocations of blocks on a disk and there might be some kind
> of index file entry for those, there can be multiple directory entries
> with different file or multiple entries in multiple directories
> pointing to the same physical file. Various links and directory
> entries are used e.g. for user, protection or language specific views
> to create multiple views of a file.

In *nix systems, it is normal for there to be a layer of indirection - directory entries contain names and point to inodes, and inodes contain metadata (ownership, access dates, security flags, etc.) and point to lists of datablocks. It is therefore perfectly normal to have multiple directory entries pointing to the same data, and each directory entry has equal "status" as the "name" of the file.

Systems can also have additional methods of connecting names to files, such as symbolic links. And on some file systems (such as btrfs), there is another layer of indirection beneath inodes so that different files that coincidentally share the same data can share the same data blocks, with copy-on-write mechanisms used to keep them logically independent.
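The directory-entry/inode indirection described above can be observed from user space with hard links. The following is a minimal sketch using only the Python standard library; the file names and the `hard_link_demo` helper are arbitrary choices for the example, and it assumes a file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo():
    """Create one file, give it a second name, then remove the first name."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "report.txt")
        second = os.path.join(d, "alias.txt")
        with open(first, "w") as f:
            f.write("hello\n")

        os.link(first, second)      # second directory entry for the same inode

        a, b = os.stat(first), os.stat(second)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        link_count = a.st_nlink     # number of directory entries for this inode

        os.remove(first)            # removes one name; the data is still reachable
        with open(second) as f:
            survived = f.read()
    return same_inode, link_count, survived
```

Neither name is the "real" one: after the first is removed, the contents are still available through the second, and the inode is only freed once its link count drops to zero and no process has the file open.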
Reply by ●October 11, 2014
On 11.10.2014 г. 18:26, David Brown wrote:

> On 11/10/14 16:13, Dimiter_Popoff wrote:
>> On 11.10.2014 г. 12:28, Stefan Reuther wrote:
>>> Dimiter_Popoff wrote:
>>>> As a side note, "the right way" to treat file names is to preserve
>>>> the case information and to ignore it during file search (i.e.
>>>> aaa and AAA locate the same file).
>>>
>>> This will not work.
>>
>> Well it has worked for quite some time already.
>
> It has "worked" in the sense that people live with it despite the
> inadequacies, inconsistencies and complications such as massive amounts
> of locale-dependent code.

It has worked for a few millennia, whether you like it or not. Just because a few programmers do not want to be bothered with (or are incapable of) handling the naming conventions we have is no good reason to ask for a change.

The above applies to the rest of your post, I really have no time explaining the alphabet. People learn it in primary school, I am sure you have been taught that. Just recall it.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/







