Hi Dimiter,

On 5/22/2015 2:32 PM, Dimiter_Popoff wrote:
> On 22.5.2015 г. 21:06, George Neuner wrote:
>> On Fri, 22 May 2015 14:57:24 +0300, Dimiter_Popoff <dp@tgi-sci.com>
>> wrote:
>>
>>> On 21.5.2015 г. 14:11, George Neuner wrote:
>>>>
>>>>>> However, this has implications - new directory entry type (so the
>>>>>> data gets deallocated only for one of the directory entries) or just
>>>>>> making a "link" type directory entry which may easily turn out
>>>>>> to point to nothing (user deletes the file it points to). Solvable
>>>>>> issues but adding complexity and no serious benefit.
>>>>
>>>> That's why Unix indirects through the inode - the complexity is
>>>> necessary to correct operation.
>>>
>>> Yes, like I said solvable issues - but the added expense just does not
>>> justify the benefit this gives you. The inode must link back to all
>>> directory entries pointing to it for this to work - they probably do
>>> that - which has further implications, e.g. do they store the path
>>> name as text - this must then be resolved for every access - or do
>>> they store medium specific (LBN etc.) data - this will take special
>>> processing when copying (my guess is they have opted for the first).
>>
>> No. Links are one way: from directory entries to the inode. The
>> inode is reference counted and persists as long as there is at least
>> one directory entry which references it.
>
> Thanks for the explanation. This way it would be impractical to have
> directory entries from different directories pointing to one inode,
> or are they doing it? Would take a lot of directory digging before
> an inode is deleted so my guess is obviously "no", but it is only
> a guess.

You don't delete an inode; you unlink directory entries from it (this is
the operation that is performed when you "delete" a file... you sever the
linkage between a particular *name* and the inode that it references).

Each "unlink()" decreases the reference count as the reference is removed.
When the last reference is unlink()-ed, you now have no way to reference
the inode so it is *then* deleted.

A consequence of this is that you can only "link" (i.e., add another
*name* that references a particular inode) names residing on the same
volume together. E.g., if:

  /some/particular/pathname
  /some/long/name

each reside on the same volume, then:

  /some/particular/pathname/file1
  /some/particular/pathname/file2
  /some/particular/pathname/file3
  /some/long/name/fileA
  /some/long/name/fileB

can all reference the same "physical" file. (These are called "hard
links" -- see below)

If, however:

  /another/point/in/the/filesystem

is on a *different* volume, then you can't link to the file referenced
above -- because the other volume has no ties to the first one. For that
reason, there are "soft links" (symbolic links) that can be located
anywhere and, effectively, encode the referenced pathname *in* the
symlink. So, the inode for:

  /another/point/in/the/filesystem/othername

would actually "contain" the equivalent of:

  "/some/particular/pathname/file1"

When /another/point/in/the/filesystem/othername is referenced, it is seen
to be such a structure and its contents are used to find the actual file.
Note that a symlink can reference another symlink, etc. So, this
resolution process can be extended arbitrarily.

A consequence of this symlink implementation is that the *pathname* which
the symlink references can be *deleted* and the symlink is never
notified. E.g., if:

  /some/particular/pathname/file1

is unlinked ("deleted"), then any later attempt to access
/another/point/in/the/filesystem/othername will result in "file not
found".
This is because the *name* "/another/point/in/the/filesystem/othername"
persists but the name that it *references*,
"/some/particular/pathname/file1", doesn't -- even though the *file* that
it was intended to reference is still present as:

  /some/particular/pathname/file2
  /some/particular/pathname/file3
  /some/long/name/fileA
  /some/long/name/fileB

I.e., there's no way of knowing how many symlinks reference a particular
name when that name is "unlinked"! Nor of sorting out where they reside
in the file system(s) -- some of which may not even be *mounted* at the
present time!

>> The inode maintains the structure and security information for the
>> file. Directory entries are just a name and an inode reference.
>>
>> Inodes for open files are cached in memory and updates are lazily
>> written back to disk [unless you deliberately (f)sync]. There is a
>> lock on the in-memory cached inode, but it is used only for updates to
>> the inode itself to coordinate sync flushes. File content locks are
>> handled separately through a filesystem service - content locks are
>> neither in nor on the inode.
>
> In DPS there is no separate lock for the RIB - since I do not allow
> multiple directory references to the same data. Even if I would opt
> to implement it, I would make a different directory entry type (the
> entry type is 5 or 6 bits I think (it is a byte and I don't remember
> how many bits of it I used for flags). This directory entry can easily
> point to the "original" entry which points to the "data" (RIB or no
> RIB), or could use some level of indirection etc.
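The dangling-symlink behavior described a few paragraphs up can be
demonstrated with a minimal sketch (Python on a POSIX host; the file
names are placeholders, not the actual paths from the examples above):

```python
import os
import tempfile

# A symlink stores the *pathname* of its target, not a reference to the
# target's inode -- so it is never notified when the target goes away.
d = tempfile.mkdtemp()
target = os.path.join(d, "file1")
alias = os.path.join(d, "othername")

with open(target, "w") as f:
    f.write("important message")

os.symlink(target, alias)
assert os.readlink(alias) == target   # the symlink "contains" the pathname

# Unlink ("delete") the name the symlink points at...
os.unlink(target)

# ...and the symlink itself still exists, but now dangles:
assert os.path.islink(alias)          # the link is still there
assert not os.path.exists(alias)      # but resolving it fails
try:
    open(alias)
except FileNotFoundError:
    pass                              # "file not found", as described
```

Note the asymmetry: the *name* survives, the *resolution* fails.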
> Not that I plan
> to do it, I see no use for that sort of thing myself (I imagine they
> implemented it in the 70s to save space for the multiple commands,
> which would have a negligible effect today if one just uses a file
> per command - or, if space allocation conscious, would put
> all the commands in a file with a "disk" image having a small cluster
> size (say 32 bytes or even 1 byte)).

Links (hard and soft) have value in many ways. E.g., I can put a "name"
in a particular place in the filesystem hierarchy (remember, the
filesystem is just a big, unified namespace!). Then, place links to the
actual file in different places for convenience or access control.

E.g., links in every user's home directory that point to "important
message" allow everyone to read their apparently *local* "copy" of that
"important message" while, in fact, only referencing that one copy. When
you've read it, you can freely delete your "copy" of it (unlink the
*name* that resides in your home directory) yet still have the actual
file persist for the other users who may not, yet, have read it.

>>> In DPS the inode is called RIB (.... this is what it was called on the
>>> first OS I had contact with - MDOS, Motorola's OS on their Exorciser,
>>> Retrieve Information Block - so I just reused a name I remembered well).
>>> It does not link back to the directory entry - and for files split
>>> in up to 2 pieces it can be unused, thus saving one disk access per
>>> file open - but I could easily expand that if I wanted to.
>>> For an application it would be trivial which name was used so it
>>> was invoked as iocb-s contain the directory entry name.
>>> But I can't see how this buys me anything of real value.
>>
>> In Unix the inode directly links to some small number of data blocks
>> before indirecting through index blocks.
>> The size of a file that can
>> be addressed directly depends on the logical block size of the media,
>> but it typically is 40-64KB for a desktop filesystem and may be
>> megabytes in a server filesystem.
>
> Does that mean the inode is part of the directory file? That would
> be very similar to the way I do it - only I have put
> "segment descriptors" (two of them) in the directory entry; if there are
> more than 2 segments the first one of the two points to the RIB (which
> is not inside the directory file).

inodes are used for multiple purposes in the filesystem. For a regular
file, they contain most of the metadata associated with the actual file
(its size, owner, access permissions, timestamps, etc. -- and, pointers
to the actual disk blocks that contain the file's data, for small
files... inodes are cascaded to multiple levels of indirection to support
bigger files; e.g., the first inode contains pointers to "indirect"
inodes which then reference other inodes -- or, the actual data blocks
themselves)

A directory is actually a special file whose contents are (name, inode)
pairs.

[Disk blocks can contain multiple inodes, so the INITIAL size of a
directory varies with the underlying format of the filesystem; bigger
blocks create bigger directories before the directory starts to need to
reference other disk blocks... just like regular files grow!]
Locking semantics
Started by ●May 19, 2015
Reply by ●May 22, 2015
Reply by ●May 22, 2015
On 5/22/2015 5:44 PM, Don Y wrote:
> On 5/22/2015 2:32 PM, Dimiter_Popoff wrote:
>> Thanks for the explanation. This way it would be impractical to have
>> directory entries from different directories pointing to one inode,
>> or are they doing it? Would take a lot of directory digging before
>> an inode is deleted so my guess is obviously "no", but it is only
>> a guess.
>
> You don't delete an inode; you unlink directory entries from it
> (this is the operation that is performed when you "delete" a file...
> you sever the linkage between a particular *name* and the inode
> that it references).
>
> Each "unlink()" decreases the reference count as the reference is
> removed. When the last reference is unlink()-ed, you now have no
> way to reference the inode so it is *then* deleted.

Grrr... the inode is never "deleted". Rather, it is marked as "free" and
reused for some other filesystem use.
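The reference counting described here is directly visible via `st_nlink`
on any POSIX system. A minimal sketch (the file names are illustrative):

```python
import os
import tempfile

# Hard-link reference counting: st_nlink is the inode's link count,
# and unlink() merely decrements it.
d = tempfile.mkdtemp()
a = os.path.join(d, "file1")
b = os.path.join(d, "file2")

with open(a, "w") as f:
    f.write("data")
assert os.stat(a).st_nlink == 1       # one name references the inode

os.link(a, b)                         # add a second name, same inode
assert os.stat(a).st_ino == os.stat(b).st_ino
assert os.stat(a).st_nlink == 2

os.unlink(a)                          # "delete" one name...
assert os.stat(b).st_nlink == 1       # ...the inode lives on under the other
```

Only when the count reaches zero (and no process holds the file open)
does the filesystem mark the inode free for reuse.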
Reply by ●May 23, 2015
On 5/22/2015 5:44 PM, Don Y wrote:
>> Does that mean the inode is part of the directory file? That would
>> be very similar to the way I do it - only I have put
>> "segment descriptors" (two of them) in the directory entry, if there are
>> more than 2 segments the first one of the two points to the RIB (which
>> is not inside the directory file).
>
> inodes are used for multiple purposes in the filesystem. For a regular
> file, they contain most of the metadata associated with the actual file
> (its size, owner, access permissions, timestamps, etc -- and, pointers
> to the actual disk blocks that contain the file's data, for small
> files... inodes are cascaded to multiple levels of indirection to support
> bigger files; e.g., the first inode contains pointers to "indirect"
> inodes which then reference other inodes -- or, the actual data blocks
> themselves)
>
> A directory is actually a special file whose contents are (name,inode)
> pairs.

(sigh) Yet another source of confusion... Directories are (name, inode
REFERENCE) pairs -- much the same as my namespaces contain (name, object
handle) pairs.

When a filesystem is built, some number of disk blocks are set aside to
contain inodes. The system uses these to build the infrastructure for
the file system. E.g., each directory entry "points to" an inode. That
inode contains metadata for the file in question (OR, a symlink to a name
that does!). In addition to the metadata, it contains pointers to the
first N disk blocks that contain the file's data (I think N=12?). So,
for "short" files (< 12 blocks), a single inode can encode the position
of the entire file's content.

If the file is longer than this, then another pointer in the inode points
to a second level of decoding -- another inode that references additional
blocks. Etc.

As different directory entries can point to this first inode, it is
possible to have different "names" for the same "physical" file.
And, because the metadata for the file is stored *in* that inode (and NOT
in the directory!), every name that references a particular "physical
file" will show the exact same metadata (size, owner, access time, etc.)
regardless of how the file is accessed.

> [Disk blocks can contain multiple inodes, so the INITIAL size of a
> directory varies with the underlying format of the filesystem; bigger
> blocks create bigger directories before the directory starts to need to
> reference other disk blocks... just like regular files grow!]

A directory is just like a regular file but is composed entirely of
(name, inode #) pairs. So, when the first disk block is allocated for
the directory, some number of entries can be referenced in it (which
depends on how large that block is -- as dictated by the format of the
filesystem). As the directory grows, additional blocks are logically
appended to it -- just like a regular file growing.

inodes are a fixed size. So, as the underlying filesystem format
increases the size of the disk blocks, the number of inodes in any given
block increases. (when you build a file system, you decide how large the
blocks should be; larger blocks more efficiently handle larger files --
and larger volumes!)
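The direct/indirect layout described above makes the size arithmetic
easy to sketch. The constants below (12 direct pointers, 4 KB blocks,
4-byte block pointers) are illustrative ext2-style assumptions, not
figures taken from the thread:

```python
# Back-of-the-envelope file-size limits for the classic inode layout:
# N direct pointers, then single- and double-indirect blocks.
BLOCK = 4096          # bytes per disk block (assumed)
PTR = 4               # bytes per block pointer (assumed)
DIRECT = 12           # direct block pointers in the inode (assumed)

ptrs_per_block = BLOCK // PTR            # 1024 pointers per indirect block

direct_bytes = DIRECT * BLOCK                    # addressable directly
single_indirect = ptrs_per_block * BLOCK         # via one indirection level
double_indirect = ptrs_per_block**2 * BLOCK      # via two indirection levels

assert direct_bytes == 48 * 1024                 # 48 KB
assert single_indirect == 4 * 1024 * 1024        # +4 MB
assert double_indirect == 4 * 1024 * 1024 * 1024 # +4 GB
```

This is why "short" files (under the direct limit) never touch an
indirect block, and why larger filesystem block sizes push that limit up.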
Reply by ●May 24, 2015
On Fri, 22 May 2015 11:12:15 -0700, Don Y <this@is.not.me.com> wrote:
>On 5/22/2015 6:57 AM, George Neuner wrote:
>> The time to reserve the name is when the module is compiled.
>
>That requires everyone to cooperate on name choices. Or,
>everything to have independent namespaces (\Program Files\HP)
>to make this possible. Entities then can't effectively
>share things.

No. It means only that names must be guaranteed unique within their
scope. The easiest way to enforce that is to guarantee that they are
unique in any scope.

I wouldn't call that "cooperation" ... it's a condition that needs to be
met to be part of the system.

>By putting the reservations in the IDL, I am hoping to force the
>conflict issue to be more visible. "Hello, Support Desk? I
>just purchased your ABC module and it won't install: 'name
>conflict'..."
>
>Like someone deciding to name their program "/kernel" and
>*expecting* that name in their code!
>
>>> Having been successfully installed, this (e.g., "disc") reservation
>>> now further constrains *future* installs ("Hmmm... 'disc' is in
>>> use; let's call the disk object 'bob'!"). This works because
>>> installs are serial (and seldom).
>>
>> Maybe it works. Maybe it doesn't.
>
>If it doesn't, it is because the developer hasn't played by the
>rules.

What rules? So far you have been arguing against rules. You can't have
it both ways - you'll end up like Fred and Barney trying to name the
boat.

>> I'm currently thinking the best approach is to compute the Wilson mean
>> for each variable - substituting range median for nulls - and then
>> treat the ordered values as vector components of a polyline.
>
>I think your biggest exposure will be if the "numbers" don't get/stay
>large. I.e., the "fad factor" (but, I can't see how you can work
>around this -- you need some meat in order to make sense of anything).
>At some point, even "incentives" lose their appeal.

That's why Wilson - it takes into account the number of "good" samples vs
the total number of sample attempts.

>> From there a number of interesting comparisons are possible, but I
>> think the most meaningful is the ratio of the volume of 2 N-balls
>> centered on the point where all components are minimized: ball with
>> "radius" the distance to the endpoint of the polyline vs ball with
>> "radius" the distance to the point where all components are maximized.
>
>I'd have to think about whether volume or radius would be the better
>comparative metric. (too early in the day to do stuff like that!)

There are problems with every approach I can think of. Every measure is
subject to relative displacements on different axes canceling each other.
Weighting the components doesn't help.

>> Dunno. Noodles are still wet.
>
>Throw against wall. If they *stick*, consider them done! :>
>(but don't eat those samples, regardless!)

These are just out of the press. I used to test cooked pasta with my dog
Calvin ... he wouldn't eat a noodle unless it was done.

George
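The "Wilson" George refers to is presumably the Wilson score interval; a
minimal sketch of its lower bound, which is what gives a "mean with
confidence" by penalizing small sample counts (the function name and the
95% default are my choices, not from the thread):

```python
from math import sqrt

def wilson_lower(pos, n, z=1.96):
    """Wilson score lower bound for a proportion: pos 'good' samples out
    of n attempts, at ~95% confidence for z = 1.96."""
    if n == 0:
        return 0.0
    p = pos / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom

# A perfect 2-of-2 scores *below* a 90-of-100: few samples are
# distrusted even when their raw ratio is higher.
assert wilson_lower(2, 2) < wilson_lower(90, 100)
```

This is why the simple mean misleads when response counts vary: the
bound folds "how many responded" into the score itself.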
Reply by ●May 24, 2015
On 5/24/2015 12:35 PM, George Neuner wrote:
> On Fri, 22 May 2015 11:12:15 -0700, Don Y <this@is.not.me.com> wrote:
>> On 5/22/2015 6:57 AM, George Neuner wrote:
>
>>> The time to reserve the name is when the module is compiled.
>>
>> That requires everyone to cooperate on name choices. Or,
>> everything to have independent namespaces (\Program Files\HP)
>> to make this possible. Entities then can't effectively
>> share things.
>
> No. It means only that names must be guaranteed unique within their
> scope. The easiest way to enforce that is to guarantee that they are
> unique in any scope.

But objects can have multiple names and in multiple contexts. E.g.,
"stdin" means different things to different processes.

How do you design so that you can "connect" to vendorA's globally unique
named object *or* vendorB's *equivalent* yet differently named object?

> I wouldn't call that "cooperation" ... it's a condition that needs to
> be met to be part of the system.

Competing processes manage to share *memory* -- they don't each *expect*
to be using memory address "27". The system provides mechanisms so their
individual needs are met without them realizing there is a conflict.

>> By putting the reservations in the IDL, I am hoping to force the
>> conflict issue to be more visible. "Hello, Support Desk? I
>> just purchased your ABC module and it won't install: 'name
>> conflict'..."
>>
>> Like someone deciding to name their program "/kernel" and
>> *expecting* that name in their code!
>>
>>>> Having been successfully installed, this (e.g., "disc") reservation
>>>> now further constrains *future* installs ("Hmmm... 'disc' is in
>>>> use; let's call the disk object 'bob'!"). This works because
>>>> installs are serial (and seldom).
>>>
>>> Maybe it works. Maybe it doesn't.
>>
>> If it doesn't, it is because the developer hasn't played by the
>> rules.
>
> What rules?
> So far you have been arguing against rules.

The rule is: you place a reservation (on a resource: memory, CPU cycles,
names, etc.) and, if that reservation can't be met (just like malloc
returning NULL), you address it -- or aren't allowed to run/install.

So, developers who are lazy and assume their needs will be met end up
producing products that tend to be seen as "problems" instead of those
products produced by folks who play by those rules.

> You can't have it both ways - you'll end up like Fred and Barney
> trying to name the boat.
>
>>> I'm currently thinking the best approach is to compute the Wilson
>>> mean for each variable - substituting range median for nulls - and
>>> then treat the ordered values as vector components of a polyline.
>>
>> I think your biggest exposure will be if the "numbers" don't get/stay
>> large. I.e., the "fad factor" (but, I can't see how you can work
>> around this -- you need some meat in order to make sense of anything).
>> At some point, even "incentives" lose their appeal.
>
> That's why Wilson - it takes into account the number of "good" samples
> vs total number of sample attempts.

But how do you decide what is truly "good"?

>>> From there a number of interesting comparisons are possible, but I
>>> think the most meaningful is the ratio of the volume of 2 N-balls
>>> centered on the point where all components are minimized: ball with
>>> "radius" the distance to the endpoint of the polyline vs ball with
>>> "radius" the distance to the point where all components are maximized.
>>
>> I'd have to think about whether volume or radius would be the better
>> comparative metric. (too early in the day to do stuff like that!)
>
> There are problems with every approach I can think of. Every measure

That's why it's called "Engineering": "choosing the LEAST BAD solution"!

> is subject to relative displacements on different axes canceling each
> other. Weighting the components doesn't help.
>
>>> Dunno. Noodles are still wet.
>>
>> Throw against wall.
>> If they *stick*, consider them done! :>
>> (but don't eat those samples, regardless!)
>
> These are just out of the press. I used to test cooked with my dog
> Calvin ... he wouldn't eat a noodle unless it was done.

I don't make "thin" pasta (well, sometimes capellini). I much prefer
homemade cavatelli -- a heartier pasta. But, far too much effort
considering how much/fast I eat it! :-/

Sunday lunch. Finestkind. (C annoyed that I slept late :> )
Reply by ●May 25, 2015
On Sun, 24 May 2015 13:54:41 -0700, Don Y <this@is.not.me.com> wrote:
>On 5/24/2015 12:35 PM, George Neuner wrote:

>But objects can have multiple names and in multiple contexts.
>E.g., "stdin" means different things to different processes.
>
>How do you design so that you can "connect" to vendorA's
>globally unique named object *or* vendorB's *equivalent*
>yet differently named object?

Just add one or more top level name services. You're already walking
paths by handing them off to different objects ... what's wrong with a
vendorA object?

  :
  stdin = open( "/vendorA/stdin" )
  :

Why does the string "stdin" have to be unique in the entire system? If
there may be more than one, then make it so a process can query for all
the "stdin" that exist. Or turn it around and have the console connect
to the process and tell it the meaning of "stdin".

>> I wouldn't call that "cooperation" ... it's a condition that needs to
>> be met to be part of the system.
>
>Competing processes manage to share *memory* -- they don't
>each *expect* to be using memory address "27". The system
>provides mechanisms so their individual needs are met without
>them realizing there is a conflict.

Memory is not the "end all" of sharing examples: in a virtual address
system, every process may indeed have an address "27". And they all
might be multiplexed on the real address "27".

Leaving aside virtual addressing, how many memory blocks have an *offset*
"27"? There always is some kind of name/reference conflict if you
ridiculously include disjoint scopes.

The compiler doesn't use your name "foo" - it maps "foo" to something
unique in the scope(s) where "foo" is used. It may complain that "foo"
conflicts, but really it is not the name that conflicts but the mapping
of the name in the intended scope.
That distinction is important.

>>>> I'm currently thinking the best approach is to compute the Wilson
>>>> mean for each variable - substituting range median for nulls - and
>>>> then treat the ordered values as vector components of a polyline.
>>>
>>> I think your biggest exposure will be if the "numbers" don't get/stay
>>> large. I.e., the "fad factor" (but, I can't see how you can work
>>> around this -- you need some meat in order to make sense of anything).
>>> At some point, even "incentives" lose their appeal.
>>
>> That's why Wilson - it takes into account the number of "good" samples
>> vs total number of sample attempts.
>
>But how do you decide what is truly "good"?

"good" = "exists". Users don't have to answer survey questions: out of W
surveys taken, there may be X responses to Q1, Y responses to Q2, no
responses to Q3, Z responses to Q4 ...

The potential for null responses invalidates the simple mean - you need
to take into account the number of responses vs the number of times the
question has been asked. That's the mean with confidence.

>> There are problems with every approach I can think of.
>
>That's why it's called "Engineering": "choosing the LEAST BAD solution"!

8-)

I'm looking for a way to compare sets/tuples of values for degree of
similarity. I need to be able to "score" them in a simple way to be able
to say, e.g., S1 is 92% wrt S2. The computation of the score needs to be
not *too* difficult to explain to a layman.

It's important that every element/component contribute noticeably to the
score - which rules out simple sums and products because displacements of
different components may cancel. Ditto simple things like minimum or
average of components. And "scores" have to be directly comparable,
which rules out hashing because hashes are inherently unordered.

AFAICS, that leaves some kind of geometric comparison.
There are a number of possibilities there, but all have some sticking
points: either the computations are complex, or they lose data (as with
sum and product), [or both], or direct comparisons are difficult, or
impossible in N space, or are spatially "location" dependent. Etc. ad
nauseam.

Hyperplanes would be at different "angles" and so there is no good notion
of "distance" between them. The end points of differing vectors form an
unordered cloud which makes directly comparing end points meaningless.
The lengths of the vectors (or of the component paths) suffer from
displacement cancellation, as does volume wrt origin.

What seems like it has potential is comparing N-balls whose origins are
the end point of the minimum component vector, and whose radii are the
distances to the end points of other, non-minimum vectors.

AFAICS, every unique path should result in a different end point, so this
incorporates all component information, it is not terribly difficult to
compute, and it has the attraction that 2D and 3D notions of it can be
illustrated easily.

???

George
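A toy sketch of the N-ball comparison described above, using the standard
n-ball volume formula V_n(r) = pi^(n/2) / Gamma(n/2 + 1) * r^n. The
normalization of components to [0, 1] and the choice of the origin as the
all-minimized point are my assumptions, not details from the thread:

```python
from math import pi, gamma, dist

def nball_volume(r, n):
    """Volume of an n-ball of radius r."""
    return pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

def similarity(components):
    """Hypothetical score: ratio of the volume of the ball reaching this
    point to the volume of the ball reaching the all-maximized corner,
    both centered at the all-minimized point (here, the origin).
    Assumes components are normalized to [0, 1]."""
    n = len(components)
    origin = [0.0] * n
    corner = [1.0] * n
    r_point = dist(origin, components)   # "radius" to this polyline endpoint
    r_max = dist(origin, corner)         # "radius" to the maximized corner
    return nball_volume(r_point, n) / nball_volume(r_max, n)

# The maximized corner scores 1.0; the origin scores 0.0.
assert abs(similarity([1.0, 1.0, 1.0]) - 1.0) < 1e-9
assert similarity([0.0, 0.0, 0.0]) == 0.0
```

Note the volume prefactors cancel, so this reduces to (r_point/r_max)^n:
the volume formulation and the plain radius ratio differ only in how
steeply the score falls off, which bears on the volume-vs-radius question
raised earlier.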
Reply by ●May 25, 2015
Hi George,

On 5/25/2015 5:18 AM, George Neuner wrote:
> On Sun, 24 May 2015 13:54:41 -0700, Don Y <this@is.not.me.com> wrote:
>> On 5/24/2015 12:35 PM, George Neuner wrote:
>
>> But objects can have multiple names and in multiple contexts.
>> E.g., "stdin" means different things to different processes.
>>
>> How do you design so that you can "connect" to vendorA's
>> globally unique named object *or* vendorB's *equivalent*
>> yet differently named object?
>
> Just add one or more top level name services. You're already walking
> paths by handing them off to different objects ... what's wrong with a
> vendorA object?
>
> :
> stdin = open( "/vendorA/stdin" )
> :

First, who picks those names? See my comment re: "Hugh's Pizza" opting
to choose "HP" as the name of its namespace. :> I.e., how would Hugh
know that someone else hadn't already expected /tmp/HP to be of exclusive
use to *it*?

Also, it doesn't address the problem. That (presumably) makes everything
under "/vendorA" *private*. I.e., vendorA's app can now create stdin,
foo, baz, etc. and know that they are unique -- because the namespace
below /vendorA hasn't been "shared" with anyone.

The goal is to be able to share a name with another process. I.e., so
"/vendorA" is, in fact, shared and accessible by more than one process.
Once that happens, what's to stop one of those other processes from
binding "stdin" (your example) to a device that is incompatible with
vendorA's application's intended use of that name?

/vendorA's app needs to be able to say: "'stdin' is a name I have chosen
to be 'significant'; don't let anyone else use this name in this SHARED
namespace!
And, in the case where someone has already reserved this name (or, it is
already in use), let me know so I can pick a new name that I will be
ASSURED will be unique when I eventually execute"

I can do this by adding a reserve() method that can be applied to any
"directory object" and forcing the installer to invoke it based on
parameters specified in the IDL for a particular module/group of modules.
So, when <anyone> invokes a method that effectively adds a name to that
(shared (portion of)) namespace, the directory server sees the registered
reservation and blocks the operation ("E_NAME_CONFLICT") unless the
request is being issued by the reserving entity.

> Why does the string "stdin" have to be unique in the entire system?

It *isn't*. "stdin" may be something like a traditional file to
processA, a pipe name to processB, etc. But, it might deliberately
appear in some portion of a namespace. If other actors can create names
in that (shared portion of) namespace, then any of them could create
stdin and bind it to anything they choose. When processX eventually
runs, it discovers that stdin is already in use in the namespace that it
was designed to use. (it would have to create a private portion of the
namespace like "./HP" and create stdin there)

> If there may be more than one, then make it so a process can query for
> all the "stdin" that exist. Or turn it around and have the console
> connect to the process and tell it the meaning of "stdin".

The goal is to allow multiple objects to be named "stdin" (or <whatever>)
in much the same way that every (UNIX) process has a "stdin" object. So,
each process wants to *know* that it can have a stdin (or a "fizzle")
regardless of which other processes have access to *its* namespace.

>>> I wouldn't call that "cooperation" ... it's a condition that needs to
>>> be met to be part of the system.
>>
>> Competing processes manage to share *memory* -- they don't
>> each *expect* to be using memory address "27".
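A toy model of the reserve()/E_NAME_CONFLICT scheme described above.
Only reserve() and the error name come from the discussion; the class
names, the bind() semantics, and the "vendorA/vendorB" owners are
hypothetical illustration:

```python
class NameConflict(Exception):
    """Stands in for the "E_NAME_CONFLICT" result described above."""

class DirectoryServer:
    """Installer reserves names in a shared namespace; later attempts
    to bind a reserved name by anyone other than the reserving entity
    are refused."""
    def __init__(self):
        self.bindings = {}        # name -> bound object
        self.reservations = {}    # name -> reserving owner

    def reserve(self, name, owner):
        # Reservation fails if the name is in use OR already reserved --
        # so the installer learns about the conflict at install time.
        if name in self.bindings or name in self.reservations:
            raise NameConflict(name)
        self.reservations[name] = owner

    def bind(self, name, obj, owner):
        if name in self.bindings:
            raise NameConflict(name)
        if self.reservations.get(name, owner) != owner:
            raise NameConflict(name)   # reserved by someone else
        self.bindings[name] = obj

ns = DirectoryServer()
ns.reserve("stdin", owner="vendorA")         # done at install time
try:
    ns.bind("stdin", object(), owner="vendorB")
except NameConflict:
    pass                                     # other actors are blocked
ns.bind("stdin", object(), owner="vendorA")  # reserving entity succeeds
```

The key property is that the conflict surfaces at install time (the
reserve() call), not at some later run time when the name is suddenly
found to be taken.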
>> The system
>> provides mechanisms so their individual needs are met without
>> them realizing there is a conflict.
>
> Memory is not the "end all" of sharing examples: in a virtual address
> system, every process may indeed have an address "27". And they all
> might be multiplexed on the real address "27".
>
> Leaving aside virtual addressing, how many memory blocks have an
> *offset* "27"? There always is some kind of name/reference conflict
> if you ridiculously include disjoint scopes.

A VM system grants you the ability to have these (apparent) conflicts
because it ensures each *is* disjoint. No other process can access
"your" 27, by design. You have to explicitly share it. Once it is
shared, those that have access to it can opt to use it for whatever OTHER
reasons they choose.

> The compiler doesn't use your name "foo" - it maps "foo" to something
> unique in the scope(s) where "foo" is used. It may complain that
> "foo" conflicts, but really it is not the name that conflicts but the
> mapping of the name in the intended scope.
>
> That distinction is important.

It doesn't matter to the actor using the name. As a UN*X-ish example,
imagine trying to code a "string" in your application for the name of a
file in /tmp through which you will exchange data with another process
that has access to /tmp's contents. You'd have to create the name at
install time (so it is available for you when you want/need it), then set
the group permissions on the "file" so that only members of a particular
group could access it, then ensure you and that other process are the
only members of that group.

If you postpone these actions to run-time, you may discover that some
other process has created a file (under /tmp) with exactly that name and
locked *you* out of it.
Or, worse, NOT locked you out but is using it to store entirely different
data -- your use of that data would be erroneous (it's not what you think
it is!); or, your alteration of it would crash a preexisting, functioning
application.

>>>>> I'm currently thinking the best approach is to compute the Wilson
>>>>> mean for each variable - substituting range median for nulls - and
>>>>> then treat the ordered values as vector components of a polyline.
>>>>
>>>> I think your biggest exposure will be if the "numbers" don't get/stay
>>>> large. I.e., the "fad factor" (but, I can't see how you can work
>>>> around this -- you need some meat in order to make sense of anything).
>>>> At some point, even "incentives" lose their appeal.
>>>
>>> That's why Wilson - it takes into account the number of "good" samples
>>> vs total number of sample attempts.
>>
>> But how do you decide what is truly "good"?
>
> "good" = "exists".

<frown> I think that may be wishful thinking. OTOH, I'm not sure you can
do much better than that as the participants self-select.

> Users don't have to answer survey questions: out of W surveys taken,
> there may be X responses to Q1, Y responses to Q2, no responses to Q3,
> Z responses to Q4 ...

Unless you structure the presentation so they see each question
individually. Note that I complained about that form of presentation: it
hides the next questions from the respondent (which I personally dislike)
and makes the interaction more tedious (unless you preload the entire
survey and just client-script its presentation!)

Regardless, that would just ensure |Q1| >= |Q2| >= |Q3| >= ... which
doesn't significantly improve your situation (unless you push the more
important questions to the head of the survey)

> The potential for null responses invalidates the simple mean - you
> need to take into account the number of responses vs the number of
> times the question has been asked. That's the mean with confidence.
> >>> There are problems with every approach I can think of. >> >> That's why it's called "Engineering": "choosing the LEAST BAD solution"! > > 8-)

That's what makes it *fun*! Digging a ditch is simple: these are the desired dimensions; if you encounter stone, buried human remains, etc. that doesn't change the end result!

> I'm looking for a way to compare sets/tuples of values for degree of > similarity. I need to be able to "score" them in a simple way to be > able to say, e.g., S1 is 92% wrt S2. The computation of the score > needs to be not *too* difficult to explain to a layman.

So, you're using it as an automated alternative to my "equivalence tables" suggestion?

> It's important that every element/component contribute noticeably to > the score - which rules out simple sums and products because > displacements of different components may cancel. Ditto simple things > like minimum or average of components. And "scores" have to be > directly comparable, which rules out hashing because hashes are > inherently unordered. > > AFAICS, that leaves some kind of geometric comparison. There are a > number of possibilities there, but all have some sticking points: > either the computations are complex, or they lose data (as with sum > and product), [or both], or direct comparisons are difficult, or > impossible in N space, or are spatially "location" dependent. > Etc. ad nauseam. > > Hyperplanes would be at different "angles" and so there is no good > notion of "distance" between them. The end points of differing vectors > form an unordered cloud which makes directly comparing end points > meaningless.
The lengths of the vectors (or of the component paths) > suffer from displacement cancellation, as does volume wrt origin.

And, as you say below, they tend to have the "whoosh" effect (sound of something flying over the heads of those to whom you are trying to explain).

> What seems like it has potential is comparing N-balls whose origins > are the end point of the minimum component vector, and whose radii are > the distances to the end points of other, non-minimum vectors. > > AFAICS, every unique path should result in a different end point, so > this incorporates all component information, it is not terribly > difficult to compute, and it has the attraction that 2D and 3D notions > of it can be illustrated easily.

I think you'll end up with people *thinking* "well, that makes sense"... but, still, not really relating (grok-ing) to the concept. OTOH, if it is enough to get them past the "why is S1Q4 treated the same as S83Q7 when they appear to be different" then that will be enough.

How'd the noodles end up?

Chores...
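For concreteness, here is one way a geometric comparison could be sketched in Python (the normalization to known ranges and the percentage formula are my assumptions, simpler than the N-ball construction described above): map each tuple into the unit hypercube and score a pair by their Euclidean distance relative to the cube's diagonal. Because the component differences are squared, no displacement can cancel another, and every component contributes to the score.

```python
import math

def normalize(values, ranges):
    """Map each component into [0, 1] using its known (lo, hi) range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]

def similarity_pct(s1, s2, ranges):
    """Score s1 vs s2 as a percentage: 100% when identical, 0% when
    the tuples sit at opposite corners of the value space."""
    d = math.dist(normalize(s1, ranges), normalize(s2, ranges))
    return 100.0 * (1.0 - d / math.sqrt(len(ranges)))
```

For example, with both variables ranging over (0, 10), `similarity_pct((5, 5), (5, 5), ...)` is 100.0 and opposite corners score 0.0, which gives the layman-friendly "S1 is 92% wrt S2" kind of statement.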
Reply by ●May 26, 2015
On Mon, 25 May 2015 10:42:56 -0700, Don Y <this@is.not.me.com> wrote:

>On 5/25/2015 5:18 AM, George Neuner wrote:

>> Just add one or more top level name services. You're already walking >> paths by handing them off to different objects ... what's wrong with a >> vendorA object? >> >> : >> stdin = open( "/vendorA/stdin" ) >> : > >Also, it doesn't address the problem. That (presumably) makes everything >under "/vendorA" *private*. I.e., vendorA's app can now create stdin, >foo, baz, etc. and know that they are unique -- because the namespace >below /vendorA hasn't been "shared" with anyone.

Not exactly. The namespace can be shared immutably or can exist with various protections against unwanted modification.

>/vendorA's app needs to be able to say: > "'stdin' is a name I have chosen to be 'significant'; don't let anyone > else use this name in this SHARED namespace! And, in the case where > someone has already reserved this name (or, it is already in use), > let me know so I can pick a new name that I will be ASSURED will be > unique when I eventually execute" > >I can do this by adding a reserve() method that can be applied to any >"directory object" and forcing the installer to invoke it based on >parameters specified in the IDL for a particular module/group of modules. > >So, when <anyone> invokes a method that effectively adds a name to >that (shared (portion of)) namespace, the directory server sees the >registered reservation and blocks the operation ("E_NAME_CONFLICT") >unless the request is being issued by the reserving entity.

But it doesn't require any kind of reservation system. Think of it in terms of objects and accessors - if there is no set() for a binding, then it can't be modified. Or you can make namespaces extensible with the rule that existing bindings can't be shadowed by new ones. If some process wants to change a binding, it has to make a copy of the namespace.
With suitable structure and "functional" copying, that doesn't have to be very expensive.

>As a UN*X-ish example, imagine trying to code a "string" in your application >for the name of a file in /tmp through which you will exchange data with >another process that has access to /tmp's contents. You'd have to >create the name at install time (so it is available for you when you want/need >it), then set the group permissions on the "file" so that only members of >a particular group could access it, then ensure you and that other process >are the only members of that group. > >If you postpone these actions to run-time, you may discover that some >other process has created a file (under /tmp) with exactly that name >and locked *you* out of it. Or, worse, NOT locked you out but is >using it to store entirely different data -- your use of that data >would be erroneous (it's not what you think it is!); or, your alteration >of it would crash a preexisting, functioning application.

That's why functions like tmpfile() and tmpnam() exist. The whole point is that names should be system generated - guaranteed non-conflicting - and only aliases for them should be permitted to be "friendly".

>> Users don't have to answer survey questions: out of W surveys taken, >> there may be X responses to Q1, Y responses to Q2, no responses to Q3, >> Z responses to Q4 ... > >Unless you structure the presentation so they see each question individually >Note that I complained about that form of presentation: it hides the >next questions from the respondent (which I personally dislike) and makes >the interaction more tedious (unless you preload the entire survey and >just client-script its presentation!) > >Regardless, that would just ensure |Q1| >= |Q2| >= |Q3| >= ... >which doesn't significantly improve your situation (unless you push the >more important questions to the head of the survey)

Right.

>> I'm looking for a way to compare sets/tuples of values for degree of >> similarity.
I need to be able to "score" them in a simple way to be >> able to say, e.g., S1 is 92% wrt S2. The computation of the score >> needs to be not *too* difficult to explain to a layman. > >So, you're using it as an automated alternative to my "equivalence >tables" suggestion?

No. Regardless of any notions of equivalence, there has to be a doorknob *simple* way for a layman to compare result sets. A tuple of values is too complicated - it requires understanding what the values represent.

[Don't start about "needing to understand" ... Americans vote all the time without understanding anything of the premises, the people, the politics, the problems, the proposals or the potential consequences of their votes.]

My idea is predicated on each variable's value falling within a known range. Seen as a vector tuple, a set of variables/values is a path to a point in N-space. The components of the path can be all minimized, all maximized, or somewhere in between.

The hard part is that the points associated with all possible paths form a cloud confined within a ball in N-space. In 3-space you can think of it as a sphere with the minimum and maximum paths designating the poles, and all other combinations ending at some interior point. Same principle - more dimensions. With discrete values for the variables you can think of it as a rectangular solid inscribed within a sphere.

The object then is to describe the "distance" of a point from the poles. In 3-space it's obvious and mostly intuitive ... in N-space not so much. A "ball" isn't spherical - or any approximation thereof - its "shape" is the union of hypersolids that result from all subset combinations of its basis ranges.

IOW: you know that 2 different points represent different paths, but it isn't necessarily obvious how the components of the paths relate to one another.

That's what I would like to remedy - to take the "cloud" of points and squeeze it into a line.
Not possible I know.

>How'd the noodles end up?

My grandmother made noodles ... I make bad noodle jokes. I shouldn't even think about noodles.

George
Reply by ●May 27, 2015
Hi George,

On 5/26/2015 3:12 PM, George Neuner wrote:

> On Mon, 25 May 2015 10:42:56 -0700, Don Y <this@is.not.me.com> wrote: > >> On 5/25/2015 5:18 AM, George Neuner wrote: > >>> Just add one or more top level name services. You're already walking >>> paths by handing them off to different objects ... what's wrong with a >>> vendorA object? >>> >>> : >>> stdin = open( "/vendorA/stdin" ) >>> : >> >> Also, it doesn't address the problem. That (presumably) makes everything >> under "/vendorA" *private*. I.e., vendorA's app can now create stdin, >> foo, baz, etc. and know that they are unique -- because the namespace >> below /vendorA hasn't been "shared" with anyone. > > Not exactly. The namespace can be shared immutably or can exist with > various protections against unwanted modification.

Relatively little in the system is "persistent". Objects come (and go) as applications demand services to come on-line, etc. So, *new* objects are the primary occupants of namespaces. A new process gets little more than a namespace handle provided *to* it. From this, the process must locate everything that it needs to provide its function and/or service.

"Live" objects don't need to exploit a namespace -- for anything other than an initial communication channel. ProcessA can create an object (of <whatever> type) and directly pass a handle to that object to some other process -- that it locates via the namespace with which it was created. These "anonymous" objects never need to be named unless they need to be located through some *other* process's namespace.

Given that a process may draw on resources and services provided by many other processes, there is need for (portions of) *shared* namespaces. So, processX can create a resource that processA will need (giving it a particular name) while processY creates some other resource that processA will also need (giving it another name), etc.
As potentially many actors can create and consume names in that namespace, there is no *guarantee* that a name will be available for a particular use (e.g., another actor comes along and *thinks* some particular name is appropriate for *its* use -- without realizing that some other actor had already planned on using that name... whenever it got around to doing so!)

>> /vendorA's app needs to be able to say: >> "'stdin' is a name I have chosen to be 'significant'; don't let anyone >> else use this name in this SHARED namespace! And, in the case where >> someone has already reserved this name (or, it is already in use), >> let me know so I can pick a new name that I will be ASSURED will be >> unique when I eventually execute" >> >> I can do this by adding a reserve() method that can be applied to any >> "directory object" and forcing the installer to invoke it based on >> parameters specified in the IDL for a particular module/group of modules. >> >> So, when <anyone> invokes a method that effectively adds a name to >> that (shared (portion of)) namespace, the directory server sees the >> registered reservation and blocks the operation ("E_NAME_CONFLICT") >> unless the request is being issued by the reserving entity. > > But it doesn't require any kind of reservation system. Think of it in > terms of objects and accessors - if there is no set() for a binding, > then it can't be modified.

The directory object (or equivalent) is the object that is being acted upon. If multiple actors have rights to create() names in that object, then any of them can create the name that processZ was expecting to use! A reservation effectively says: "You can invoke the create() method on this object -- but, in addition to the names that are already in use, representing current objects in that namespace, you can not use any of THESE reserved names: .... Doing so will result in a FAIL."
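A minimal sketch of that reservation scheme, assuming a toy in-memory directory object (the class layout is hypothetical; reserve(), create() and the E_NAME_CONFLICT error are the names used in the discussion):

```python
class NameConflict(Exception):
    """Stands in for the discussion's E_NAME_CONFLICT error."""

class Directory:
    """Toy shared-namespace directory object with name reservation."""

    def __init__(self):
        self._entries = {}    # name -> bound object
        self._reserved = {}   # name -> actor holding the reservation

    def reserve(self, name, actor):
        # A name already bound, or already reserved, cannot be reserved.
        if name in self._entries or name in self._reserved:
            raise NameConflict(name)
        self._reserved[name] = actor

    def create(self, name, obj, actor):
        # Anyone may create() -- but a reserved name may only be
        # bound by the entity that reserved it.
        if name in self._entries:
            raise NameConflict(name)
        if self._reserved.get(name, actor) != actor:
            raise NameConflict(name)
        self._reserved.pop(name, None)
        self._entries[name] = obj
```

Here vendorA's installer could reserve "stdin" at install time; any later create("stdin", ...) by a different actor fails with NameConflict, while vendorA's own create() succeeds and consumes the reservation.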
An analogy with a system heap would be: "You can request memory from the heap -- but, can never request more than a particular quota reserved for your use... even if there are gobs of memory available!"

> Or you can make namespaces extensible with the rule that existing > binding can't be shadowed by new ones. If some process wants to > change a binding, it has to make a copy of the namespace.

A process can always augment *its* namespace and deny others access to it. But, the process must be "live". And, the name that it uses to bind the "augmentation" into the namespace must be "available". E.g., if that namespace is shared, then some other process could opt to create an object called "augment" (or "subdir" or "private", etc.) and the namespace's owner would effectively be prohibited from making this binding.

> With suitable structure and "functional" copying, that doesn't have to > be very expensive. > >> As a UN*X-ish example, imagine trying to code a "string" in your application >> for the name of a file in /tmp through which you will exchange data with >> another process that has access to /tmp's contents. You'd have to >> create the name at install time (so it is available for you when you want/need >> it), then set the group permissions on the "file" so that only members of >> a particular group could access it, then ensure you and that other process >> are the only members of that group. >> >> If you postpone these actions to run-time, you may discover that some >> other process has created a file (under /tmp) with exactly that name >> and locked *you* out of it. Or, worse, NOT locked you out but is >> using it to store entirely different data -- your use of that data >> would be erroneous (it's not what you think it is!); or, your alteration >> of it would crash a preexisting, functioning application. > > That's why functions like tmpfile() and tmpnam() exist.
The whole > point is that names should be system generated - guaranteed > non-conflicting - and only aliases for them should be permitted to be > "friendly".

Temporary files are seldom "shared" -- except directly to child processes, etc. E.g., if two users did:

$ cd /tmp
$ cc ~/myfile.c

At least one of them would be unpleasantly surprised! And, there is no way to ensure a particular one "wins" that race!

>>> I'm looking for a way to compare sets/tuples of values for degree of >>> similarity. I need to be able to "score" them in a simple way to be >>> able to say, e.g., S1 is 92% wrt S2. The computation of the score >>> needs to be not *too* difficult to explain to a layman. >> >> So, you're using it as an automated alternative to my "equivalence >> tables" suggestion? > > No. Regardless of any notions of equivalence, there has to be a > doorknob *simple* way for a layman to compare result sets. A tuple of > values is too complicated - it requires understanding what the values > represent. > > [Don't start about "needing to understand" ... Americans vote all the > time without understanding anything of the premises, the people, the > politics, the problems, the proposals or the potential consequences of > their votes.]

I just think letting a *person* make that decision will lead to better data (summaries). Even if they have to hire someone to create and maintain these "equivalence relations". E.g., someone who is happy that they could *return* a product is different from someone who is happy with a product's *purchase*.

> My idea is predicated on each variable's value falling within a known > range. Seen as a vector tuple, a set of variables/values is a path to > a point in N-space. The components of the path can be all minimized, > all maximized, or somewhere in between. > > The hard part is that the points associated with all possible paths > form a cloud confined within a ball in N-space.
In 3-space you can > think of it as a sphere with the minimum and maximum paths designating > the poles, and all other combinations ending at some interior point. > Same principle - more dimensions. With discrete values for the > variables you can think of it as a rectangular solid inscribed within > a sphere.

Understood.

> The object then is to describe the "distance" of a point from the > poles. In 3-space it's obvious and mostly intuitive ... in N-space > not so much. A "ball" isn't spherical - or any approximation thereof > - its "shape" is the union of hypersolids that result from all subset > combinations of its basis ranges. > > IOW: you know that 2 different points represent different paths, but > it isn't necessarily obvious how the components of the paths relate to > one another. > > That's what I would like to remedy - to take the "cloud" of points and > squeeze it into a line. Not possible I know.

I think you're setting yourself up for some "conclusions" that end up appearing incompatible -- yet mathematically "correct". <frown>

>> How'd the noodles end up? > > My grandmother made noodles ... I make bad noodle jokes. > I shouldn't even think about noodles.

Ah. Now you've got me thinking about them and how yummy some cavatelli would be... Of course, I'll probably be cursing you when/if I actually make them ("Cripes! I forgot how long this takes!!")

I need more automation: a bigger pasta machine; a bigger ice cream churn; a hydraulic lift in the garage floor; etc. You know... "life's little essentials!" ;-)
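As an illustration of why "squeezing the cloud into a line" loses information (my sketch, assuming each component has a known (lo, hi) range): projecting a normalized point onto the diagonal running from the all-minimum pole to the all-maximum pole yields a single, ordered scalar, but distinct paths collapse onto the same value, which is exactly the displacement cancellation objected to earlier in the thread.

```python
def pole_position(values, ranges):
    """Scalar in [0, 1]: 0 at the all-minimum pole, 1 at the
    all-maximum pole.  Equivalent to projecting the normalized
    point onto the min->max diagonal, i.e. the mean of the
    normalized components -- simple and ordered, but lossy."""
    norm = [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]
    return sum(norm) / len(norm)
```

With both components ranging over (0, 10), the tuples (10, 0) and (0, 10) both land at 0.5 even though they are different paths, which is why the cloud cannot be squeezed into a line without losing the per-component information.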







