EmbeddedRelated.com

Managing "capabilities" for security

Started by Don Y November 1, 2013
Hi Tom,

On 11/5/2013 4:03 PM, Tom Gardner wrote:
> On 05/11/13 22:19, Don Y wrote:
>>> The new processor architecture will, it is claimed,
>>> work well with existing code, with roughly an order
>>> of magnitude speedup. They've managed to get DSP
>>> performance!
>>
>> I can't see how this speedup is a consequence of the capabilities
>> themselves -- "with existing code".
>
> Correct, it isn't. CAP is a topic that came up as
> part of the non-objectives of the new architecture.
Ah, OK.
> Example of just how different the architecture is:
> it doesn't have registers and isn't a stack machine.
> Internal micro-ops work with a use-it-or-lose-it
> "belt" where a "register" address is of the form
> "the fifth-to-last arithmetic result".
Some of the old Burroughs machines were wonky like that. Makes you wonder why those mechanisms disappeared over time... (illusion that increased speed renders cleverness less important?)
On 05/11/13 23:31, Don Y wrote:
> Hi Tom,
>
> On 11/5/2013 4:03 PM, Tom Gardner wrote:
>> On 05/11/13 22:19, Don Y wrote:
>>>> The new processor architecture will, it is claimed,
>>>> work well with existing code, with roughly an order
>>>> of magnitude speedup. They've managed to get DSP
>>>> performance!
>>>
>>> I can't see how this speedup is a consequence of the capabilities
>>> themselves -- "with existing code".
>>
>> Correct, it isn't. CAP is a topic that came up as
>> part of the non-objectives of the new architecture.
>
> Ah, OK.
>
>> Example of just how different the architecture is:
>> it doesn't have registers and isn't a stack machine.
>> Internal micro-ops work with a use-it-or-lose-it
>> "belt" where a "register" address is of the form
>> "the fifth-to-last arithmetic result".
>
> Some of the old Burroughs machines were wonky like that.
> Makes you wonder why those mechanisms disappeared over time...
> (illusion that increased speed renders cleverness less important?)
Ivan Godard wrote the Burroughs DCAlgol compiler :)

Have a look at http://ootbcomp.com/docs/index.html

"The Mill is a new CPU architecture designed for very high single-thread
performance within a very small power envelope. It achieves DSP-like
power/performance on general purpose codes, without reprogramming. The
Mill is a wide-issue, statically scheduled design with exposed pipeline.
High-end Mills can decode, issue, and execute over thirty MIMD
operations per cycle, sustained. The pipeline is very short, with a
mispredict penalty of only four cycles."
Hi Tom,

On 11/5/2013 6:09 PM, Tom Gardner wrote:
> On 05/11/13 23:31, Don Y wrote:
>>> Example of just how different the architecture is:
>>> it doesn't have registers and isn't a stack machine.
>>> Internal micro-ops work with a use-it-or-lose-it
>>> "belt" where a "register" address is of the form
>>> "the fifth-to-last arithmetic result".
>>
>> Some of the old Burroughs machines were wonky like that.
>> Makes you wonder why those mechanisms disappeared over time...
>> (illusion that increased speed renders cleverness less important?)
>
> Ivan Godard wrote the Burroughs DCAlgol compiler :)
OK, then his mind is already "sufficiently warped" in this regard.
> Have a look at http://ootbcomp.com/docs/index.html
>
> "The Mill is a new CPU architecture designed for very high single-thread
> performance within a very small power envelope. It achieves DSP-like
> power/performance on general purpose codes, without reprogramming. The
> Mill is a wide-issue, statically scheduled design with exposed pipeline.
> High-end Mills can decode, issue, and execute over thirty MIMD
> operations per cycle, sustained. The pipeline is very short, with a
> mispredict penalty of only four cycles."
Ah, that explains all the posts (that I "killed" :< ) mentioning "Mills"! :< I will have to see how to "unkill" them in Tbird...
On 11/5/13, 3:58 PM, Don Y wrote:
> Hi Richard,
>
> [attrs elided]
>
> On 11/4/2013 9:45 PM, Richard Damon wrote:
>> All questions to be decided at design phase, with no "generic answer".
>> Presumably, if there is a deadline for when the acknowledgement can be
>> given, then presumably this spec is applied when designing such a real
>> time system.
>
> But that's the problem. When is the design phase "over" for an open
> system? Someone (third party) adds a "feature" a year after product
> release. Does he get to claim the design phase extended to a
> period MONTHS after "initial release" -- because that was when *he*
> was working on the design of *his* feature?
>
Hopefully the design phase for the first release is over before the first release is released! (How else can you see if it is correct?) And, yes, the third party can, if he wishes, open back up the spec and change it, but then HE takes on the responsibility to verify that all previous code can work with the new spec, and if he can't then he can't change the spec! Generally later modifications want to be backwards compatible to avoid this problem.
> [of course not]
>
> At some point, you say, "this is the environment for which you have to
> design". Every mechanism that you make available is a mechanism that
> has to be maintained and utilized. And, also acts as a *constraint*
> on the system and its evolution: "Crap! I have to notify each Holder
> of a pending capability revocation 100ms before revocation. But, my
> satellite transmission path is twice that! I guess I just can't use
> satellites (or, can't revoke capabilities)"
>
Actually, in this case there isn't really a problem for the revoker. If the spec is that I need to give the Holder 100ms to reply, and it takes 200ms to send (and presumably receive) a message, then I just need to send the notice 500ms before I actually revoke the privilege; that gives the Holder the required time to respond.

And yes, if a system is built on certain assumptions, trying to move to a less capable environment often requires looking at lots of the system. What you really want to do is think, when you first make the assumptions, about what you really need as assumptions and what you don't need to assume. In this case, I would likely have made the time allowed to notify something configurable/negotiated; if the grantor really needs that 100ms, then yes, you can't use the satellites, but if it doesn't, then perhaps the grantor can be told that the link is slow so it needs to be patient.
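The arithmetic above can be sketched as a tiny helper (the function name is invented for illustration; it isn't part of any real capability API):

```python
def revocation_lead_time_ms(transport_ms: int, reply_window_ms: int) -> int:
    """How far ahead of the actual revocation the notice must be sent:
    one transport delay to deliver the notice, the Holder's guaranteed
    reply window, and one transport delay for the reply to come back."""
    return transport_ms + reply_window_ms + transport_ms

# 200 ms satellite hop each way, 100 ms reply window:
# the notice must go out 500 ms before the revocation takes effect.
print(revocation_lead_time_ms(200, 100))  # 500
```

The point being that a slow transport doesn't forbid notified revocation; it only pushes the notice earlier, *if* the revoker can predict the revocation that far in advance.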
> E.g., I handle physical resource revocation asynchronously BECAUSE
> I HAVE NO CONTROL OVER EXTERNAL EVENTS. If I wrap the resources
> in a capability, now I suddenly have to provide different semantics?
> ("Hey, you can't revoke the 'sunlight' capability!")
First, I never said that all revocations need to have notification! It sounds like in your case here, because there is a real chance of resources going away asynchronously from external causes, asynchronously removing permissions should not cause significant issues. My comment was just that this is not always the case, so there are some situations where asynchronous revocation is not the right way to do things.
>
>>> So, as you acknowledge below, your app design must be able to handle
>>> this case -- which is essentially the asynchronous case.
>>>
>>> I currently manage *physical* resources asynchronously (though with
>>> notification after the fact) -- because they *can* disappear even
>>> without my explicit control (e.g., power failure, drop in water
>>> pressure, etc.). So, this same sort of reasoning would at least
>>> be *consistent*.
>>>
>>> I.e., do an operation and *check* to see if it completed as expected
>>> (just like checking return value of malloc).
>>
>> Some operations do not make checking at each operation so easy.
>
> Life isn't guaranteed to be easy! :>
>
>> What if
>> the resource is access to some memory, do you check for an "error" after
>> every access? This presumes that the system even gives you an
>> application level ability to continue past this sort of error. What do
>> you do about cooperative "authorization" to access parts of structures
>> for things like synchronization where there isn't a hardware/OS
>> capability to stop you?
>
> If "backing store" could go away while it was being used, then
> your "system" would obviously need a way of detecting that and
> informing the "holder" of that resource that this has, in fact,
> happened. The holder would also need to be aware of what resources
> could "disappear" and code to accommodate those possibilities.
>
> If I am driving a motor, power to the motor driver/translator
> could fail while I am in the middle of an operation. Even if I
> have a backup power supply, the motor driver itself could fail.
> Even if I have a redundant motor driver, the *motor* could
> fail. Or, a gearbox, mechanism, *sensor*, etc.
>
> Shit Happens.
>
> If you don't plan to accommodate the (likely/consequential)
> failures, you have a bug.
>
I wasn't talking about the memory physically going away, but some process first granting another process the right to access some chunk of memory and then suddenly, and without warning, revoking that permission and removing the access rights. Since the normal result of this would be aborting that process, this can be very bad.

The only way that process can reliably operate would be to use some form of operation that atomically checks the rights, does the access, and returns an error flag that needs to be tested. This will very likely greatly slow down the process defending itself from privilege revocation, just because the grantor is unwilling to first send notice and wait a reasonable time before actually revoking the right.
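The checked-access pattern being described might look like this sketch (all names are hypothetical; a real system would do the check in hardware or the kernel, not a user-level lock):

```python
import threading

class GuardedRegion:
    """Sketch of the pattern above: every access atomically re-checks the
    (revocable) permission and reports failure via a flag, instead of
    trusting a permission that was checked once up front."""
    def __init__(self, data):
        self._data = data
        self._granted = True
        self._lock = threading.Lock()

    def revoke(self):
        # the grantor can yank the right at any moment, without notice
        with self._lock:
            self._granted = False

    def read(self, index):
        # permission check and access are one atomic step under the lock
        with self._lock:
            if not self._granted:
                return None, False      # (value, ok) -- right was revoked
            return self._data[index], True

region = GuardedRegion([10, 20, 30])
print(region.read(1))    # (20, True)
region.revoke()
print(region.read(1))    # (None, False) -- caller must test the flag
```

The per-access locking and flag-testing is exactly the overhead Richard is objecting to: it taxes every access to defend against a revocation that polite notice would have made cheap.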
>> In your case, since the operations do have the
>> capability of suddenly starting to fail, an asynchronous revocation
>> likely doesn't cause problems that you didn't need to handle anyway, as
>> long as the system is structured to allow it.
>
> That's the point! You (developer) know shit CAN happen. Anything that
> you are "holding" can be revoked. Plan on it. (Heck, I can "kill -9"
> *you* without giving any advanced warning! Gee, *then* what?)
Many things are very unlikely to "just happen" at random. Presumably the grantor of the privilege is doing so because there is a reason to grant the privilege. It doesn't make sense to burden the process being granted the privilege with unneeded problems.
>
>>>> Yes, sometimes just doing an asynchronous revocation may make sense, and
>>>> in many cases having it as a fall back if the cooperation method fails
>>>> to complete in a needed time is needed, but that doesn't mean that
>>>> asynchronous is generally preferred.
>>>>
>>>> As to the transitive granting, the same method could be used to relay
>>>> the request to revoke.
>>>
>>> This is a tougher call (though I think I have a solution that addresses
>>> these issues). Who does the relaying? The actor who delegated
>>> the capability? (what if he is now a zombie?) Or, does the kernel
>>> track "derived capabilities" and treat them as part of the original
>>> capability?
>>
>> I would generally say that the actor who was given a permission is
>> responsible for relaying the revocations to those it relayed to. If it
>> has shared a right that it might have revoked from it, it needs to
>> maintain a way to do that.
>
> The actor may be gone! BY DESIGN! I.e., he has done <whatever> *he*
> needed to do (with "greater privilege") and is now leaving *you* to
> clean up (with some reduced capability).
Generally if you grant a privilege to an actor, and it is subject to a revocation request, they will reply back that they are done with the privilege (a "self revoke"), perhaps because there may be a limit to how many people this privilege will be given to at a time. You also can learn that they aren't there anymore when you signal them that you are preparing to revoke.
>
> E.g., he can turn motor on, set direction and turn off. He starts
> motor in right direction, then delegates the "off" capability to you
> (your role being to watch a limit switch and turn off the motor at
> that time -- or, when some timeout is exceeded) and exits. (no need
> for him to hang around consuming ALL the resources that he originally
> needed to determine how the motor should be operated)
>
> However, since my capabilities reside in the kernel, I can opt to
> have the kernel track derivations and cascade revocations. But, this
> means all derived capabilities must come from a single "parent"
>
>>> As I began my original post:
>>> "... i.e., how best to differentiate the examples where
>>> X should be allowed vs X should be prohibited."
>>> you can come up with examples where /each/ approach is "right"
>>> and the others *wrong*. :<
>>>
>>> Engineering: finding the least wrong solution to a problem.
>>>
>>> <frown> But, at least its interesting! :>
>>
>> This is why I object to the statement that it SHOULD ALWAYS be
>> asynchronous. The only real answer is that "it depends", and lists can
>> be made of what it depends on. Some examples include:
>>
>> Is the authorization even remotely revokable? (Sometimes it isn't)
>
> You obviously can't revoke authorization for a fait accompli.
> But, what other authorizations, once granted, can't be rescinded?
> Some may leave you in a predicament (e.g., never being able
> to turn off the power) but expecting the capability system to
> know about these sorts of dependencies is, I think, too much.
You normally can't revoke access to a file once the other process has opened it. Many times, privilege is managed not by force of the kernel but by cooperation of the actors (this presumes that the system can be assumed free of hostile actors). Actors ask for permission, not because they couldn't do the operation without it, but because the permission is needed to do it correctly.

Of course, there are catastrophic conditions, like loss of power, where the crashing of a given task is minor compared to the other effects that are happening and many normal promises aren't going to be met; hopefully, the emergency recovery system will work to minimize the damage.
>
>> What is the effect on the requesting task if the authorization goes away
>> unexpectedly?
>
> The designer of the holding task would have to consider that in how
> the task's actions and recoveries are structured. What would it have
> done had the authorization not been granted in the first place?
>
If the holder really needed the permission, then it would have waited until it got it. Many operations get MUCH more complicated if they have to worry continuously about every possible failure mode. Casually converting an error condition that normally would be indicative of a major hardware failure (and thus a major software failure isn't unreasonable) into something that really might happen and must be dealt with REALLY makes programs much harder to write correctly, and even harder to test to make sure they are correct. All this because the designer figures it is OK to define that authorization carries no promise that it will continue?
>> What is the effect of delaying the revocation?
>
> The big problem with "being considerate" is that it encourages
> others to be exploitative. There is no downside to their
> "selfishness" so, "why not?"! "Heads, I win; tails, you lose"
>
I pity your team if this is how you think of them. First, you should only be granting permission for things that you are willing to give it for.

If the system is theirs, they have the right to be greedy, and if it causes problems, it is their problem. If the system is yours, why are you giving them permission in the first place? If they aren't giving you the value you want, then kill them. If they are paying for the access, make sure you charge them for their usage (and shame on you for not requiring them to meet the design requirements for their pieces).
> OTOH, if you take a heavy-handed approach (unilaterally revoking
> capabilities) then sloppy coders pay a price -- by having their code
> *crash* (presumably, users will then opt to avoid applications from
> those "developers")
>
> [There's no other pressure I can bring to bear on them to "do the
> right thing"]
Then you also should consider that you are making your "friends" bear much higher costs to do what you want them to.
Hi Richard,

On 11/5/2013 9:58 PM, Richard Damon wrote:
> On 11/5/13, 3:58 PM, Don Y wrote:
>> On 11/4/2013 9:45 PM, Richard Damon wrote:
>>> All questions to be decided at design phase, with no "generic answer".
>>> Presumably, if there is a deadline for when the acknowledgement can be
>>> given, then presumably this spec is applied when designing such a real
>>> time system.
>>
>> But that's the problem. When is the design phase "over" for an open
>> system? Someone (third party) adds a "feature" a year after product
>> release. Does he get to claim the design phase extended to a
>> period MONTHS after "initial release" -- because that was when *he*
>> was working on the design of *his* feature?
>
> Hopefully the design phase for the first release is over before the
> first release is released! (How else can you see if it is correct?)
That was my point! The design phase *is* done (from your standpoint) before the third party starts adding that new feature. From *his* standpoint, he would like to think the system can accommodate *his* goals, as well. *He* wants the design phase to overlap *his* activities so he has a "say". (sorry, too late. sooner or later, you've got to "shoot the engineer")
> And, yes, the third party can, if he wishes, open back up the spec and
> change it, but then HE takes on the responsibility to verify that all
> previous code can work with the new spec, and if he can't then he can't
> change the spec! Generally later modifications want to be backwards
> compatible to avoid this problem.
In reality, that's not practical. Will Apple let you revise iOS to suit your needs? Or, will they say, "sorry, too late"? So, you want to address *likely* needs without dragging in the kitchen sink (only to discover that no one *uses* the sink anyways!)
>> At some point, you say, "this is the environment for which you have to
>> design". Every mechanism that you make available is a mechanism that
>> has to be maintained and utilized. And, also acts as a *constraint*
>> on the system and its evolution: "Crap! I have to notify each Holder
>> of a pending capability revocation 100ms before revocation. But, my
>> satellite transmission path is twice that! I guess I just can't use
>> satellites (or, can't revoke capabilities)"
>
> Actually, in this case there isn't really a problem for the revoker. If
> the spec is that I need to give the Holder 100ms to reply, and it takes
> 200 ms to send (and presumably receive) a message, then I just need to
> send the notice 500ms before I actually revoke the privilege, that will
> give the Holder the required time to respond.
That assumes *you* can delay when you revoke it! Or, can tell in advance when you will need to so you can give the early warning. It also assumes you *know* that the transport delay will be as long as it happens to be -- which you might not be aware of until after the message is actually sent (the route a network message takes can vary over time based on availability, bandwidth, error conditions, etc.)
> And yes, if a system is built on certain assumptions, trying to move to
> a less capable environment often requires looking at lots of the system.
> What you really want to do is think when you first make the assumptions
> as to what you really need as assumptions, and what you don't need to
Exactly. What can you (reasonably) *expect* your users/developers to accommodate -- given the other design criteria that you have to address.
> assume. In this case, I would likely have made the time allowed to
> notify as something configurable/negotiated, and if the grantor really
> needs that 100ms, then yes, you can't use the satellites, but if it
> doesn't then perhaps the grantor can be told that the link is slow so it
> needs to be patient.
But all this adds complexity. And, at the end of the day, the holder will still have to be able to deal with the case where his authorizations "don't work" (e.g., what if the object is deleted??) See what I'm saying? If you're going to have to deal with this possibility anyway, then why complicate things with other mechanisms that might not work?

Mach has a concept called a "port". It's a communication mechanism that is probably the single most important item/concept in the design. I.e., it's not some "afterthought" shoehorned in at the 11th hour.

Mach includes a provision whereby you can request a given port be "renamed". Sort of like saying, "I want file descriptor 2 to hereafter be called 73". There are some potential advantages to allowing a client to have such a change made on its behalf. But, the request isn't guaranteed to be handled. It may simply not be possible -- today.

As a result, the client has to be able to live with the port having its original "name" (presumably, there was a reason the client did NOT want to do this!). So, all of the code that is in place to exploit accessing *renamed* ports sits idle -- and the code to handle unrenamed ports remains in play. THEN WHY IS THIS FACILITY IMPLEMENTED?
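The "request a rename, but live with refusal" pattern might be sketched like this. This is NOT the real Mach API (Mach's actual call is a kernel operation on port rights); the names here are invented to show the shape of the client's obligation:

```python
class PortTable:
    """Hypothetical per-task name table: small integer names -> ports."""
    def __init__(self):
        self._names = {}
        self._next = 1

    def allocate(self, port):
        name = self._next
        self._next += 1
        self._names[name] = port
        return name

    def rename(self, old, new):
        """Best-effort: fails if the desired name is taken or old is gone."""
        if new in self._names or old not in self._names:
            return False
        self._names[new] = self._names.pop(old)
        return True

table = PortTable()
name = table.allocate(object())
# The client asks for its preferred name, but MUST keep working with the
# original name if the request is refused -- so both code paths exist.
if table.rename(name, 73):
    name = 73
print(name)
```

Note that the fallback branch (rename refused) is exactly the code Don says "remains in play": the client can never delete it, which is his argument against offering the facility at all.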
>> E.g., I handle physical resource revocation asynchronously BECAUSE
>> I HAVE NO CONTROL OVER EXTERNAL EVENTS. If I wrap the resources
>> in a capability, now I suddenly have to provide different semantics?
>> ("Hey, you can't revoke the 'sunlight' capability!")
>
> First, I never said that all revocations need to have notification! It
> sounds like in your case here, because there is a real chance of
> resources going away asynchronously from external causes, asynchronously
> removing permissions should not cause significant issues. My comment was
> just that this is not always the case, so there are some situations
> where asynchronous revocation is not the right way to do things.
There's never a perfect fit for all cases. But, if you try to include provisions that make *every* case "easy", you end up with a more complex system/implementation. And, questionable "returns".

E.g., the Mach system call is a privileged operation. It's more expensive to implement things "in the kernel" (kernel gets bigger, mistakes can have dramatic consequences, etc.). If you can't be guaranteed that it will be usable, why go to this effort?
>>> What if
>>> the resource is access to some memory, do you check for an "error" after
>>> every access? This presumes that the system even gives you an
>>> application level ability to continue past this sort of error. What do
>>> you do about cooperative "authorization" to access parts of structures
>>> for things like synchronization where there isn't a hardware/OS
>>> capability to stop you?
>>
>> If "backing store" could go away while it was being used, then
>> your "system" would obviously need a way of detecting that and
>> informing the "holder" of that resource that this has, in fact,
>> happened. The holder would also need to be aware of what resources
>> could "disappear" and code to accommodate those possibilities.
>>
>> If I am driving a motor, power to the motor driver/translator
>> could fail while I am in the middle of an operation. Even if I
>> have a backup power supply, the motor driver itself could fail.
>> Even if I have a redundant motor driver, the *motor* could
>> fail. Or, a gearbox, mechanism, *sensor*, etc.
> I wasn't talking about the memory physically going away, but some
> process first granting another process the right to access some chunk
> of memory and then suddenly and without warning revoking that permission
> and removing the access rights. Since the normal result of this would be
> aborting that process, this can be very bad.
Why does that have to be the "normal result"? The consumer could check to see if his operation "succeeded" at the end of his use of that region. Or not. Then, roll back whatever part of his activities has (may have been) compromised in the process.

If you are dealing with something as basic as memory, then you would presumably have hardware supporting memory objects. If it's just a mutex governing a shared object, that's below the granularity of what I am discussing. How long you hold a lock isn't the same issue.

If, OTOH, I opt to protect all of *my* files and deny you access to them (filesystem analogy), I should be able to enforce this restriction even if you currently have several of them "open" (i.e., the permission need not ONLY be implemented "on open()" but on *any* reference to the object). I could, conceivably, remove the *image* of each open file that you are actually operating on at the time (e.g., un-mmap() them)
> The only way that process
> can reliably operate would be to use some form of operation that
> atomically checks the rights, and does the access and returns an error
> flag that needs to be tested. This will very likely greatly slow down
> the process defending itself from privilege revocation just because the
> grantor is unwilling to first send notice and wait a reasonable time
> before actually revoking the right.
Why does it have to "check the rights"? Just *do* what you intended to do. Your request will either succeed (which implicitly tells you that you *held* a valid capability and that the capability was still valid while your request was being processed) or fail (which tells you that you either had a bad capability *or* that it was revoked before your request could be completed).

[Remember, you have to present the capability in order to perform *any* operation on the object. Everything is mediated by the object's "Handler"]
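The "just *do* it" model might be sketched like this (the Handler class and error names are invented for illustration; they aren't from Don's actual system):

```python
class Revoked(Exception):
    """Raised when an operation is attempted with a revoked capability."""
    pass

class Handler:
    """Every operation on the object is mediated here: presenting the
    capability *is* the check, so the client never does a separate
    'validate' step before acting."""
    def __init__(self):
        self._valid = {"cap-1"}

    def revoke(self, cap):
        self._valid.discard(cap)

    def write(self, cap, data):
        if cap not in self._valid:
            raise Revoked(cap)
        return len(data)          # success implies the cap was still valid

h = Handler()
print(h.write("cap-1", b"hello"))   # 5 -- success, so the cap was valid
h.revoke("cap-1")
try:
    h.write("cap-1", b"again")
except Revoked:
    print("revoked")   # failure is reported at the operation; caller rolls back
```

The design choice here is that validity is only ever reported as a side effect of attempting the operation, so there is no window between "check" and "use" for the revocation to slip through.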
>>> In your case, since the operations do have the
>>> capability of suddenly starting to fail, an asynchronous revocation
>>> likely doesn't cause problems that you didn't need to handle anyway, as
>>> long as the system is structured to allow it.
>>
>> That's the point! You (developer) know shit CAN happen. Anything that
>> you are "holding" can be revoked. Plan on it. (Heck, I can "kill -9"
>> *you* without giving any advanced warning! Gee, *then* what?)
>
> Many things are very unlikely to "just happen" at random. Presumably the
> grantor of the privilege is doing so because there is a reason to grant
> the privilege. It doesn't make sense to burden the process being granted
> the privilege with unneeded problems.
But a privilege (capability) can be granted *hours* or days before it is ever used! There's no "freshness seal" imposed on capabilities. Surely you don't want the holder to have to periodically check to see that the capability is "still valid". Likewise, if you force a client to defer requesting a capability until just before it is needed, then the client risks a delay in beginning his activity as the capabilities are negotiated, etc. Often, a task is spawned with the knowledge of what it is intended to do. It makes sense to endow it with the capabilities that it is going to eventually need when it is created -- instead of having it remain connected to its parent *just* so it can request those capabilities when they are needed.
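The endow-at-creation idea might be sketched like this (the helper name is hypothetical; in the real system the kernel would transfer the capabilities, not a function argument):

```python
import threading

def spawn_with_caps(target, caps):
    """Endow a task with the capabilities it will eventually need at
    creation time, so it need not stay tethered to its parent (or
    re-negotiate with a grantor) to request them later."""
    t = threading.Thread(target=target, args=(caps,))
    t.start()
    return t

results = []

def motor_watcher(caps):
    # The child acts using only what it was endowed with; by now the
    # parent may have exited -- it has no further role to play.
    results.append("off" in caps)

t = spawn_with_caps(motor_watcher, {"off"})
t.join()
print(results[0])   # True -- the "off" capability was granted at spawn
```

This is the scenario from the motor example: the parent delegates the "off" capability at spawn and exits, rather than hanging around as a "cheerleader" to relay revocations.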
>>> I would generally say that the actor who was given a permission is
>>> responsible for relaying the revocations to those it relayed to. If it
>>> has shared a right that it might have revoked from it, it needs to
>>> maintain a way to do that.
>>
>> The actor may be gone! BY DESIGN! I.e., he has done <whatever> *he*
>> needed to do (with "greater privilege") and is now leaving *you* to
>> clean up (with some reduced capability).
>
> Generally if you grant a privilege to an actor, and it is subject to a
> revocation request, they will reply back that they are done with the
> privilege (a "self revoke"), perhaps because there may be a limit to how
> many people this privilege will be given to at a time. You also can
> learn that they aren't there anymore when you signal them that you are
> preparing to revoke.
That's not an assumption you can make. You are assuming a parent always hangs around to watch its children die. It might, instead, create its offspring and *then* die -- knowing they have the tools that they need to perform their tasks (what other role does the parent have -- cheerleader?)
>>> This is why I object to the statement that it SHOULD ALWAYS be
>>> asynchronous. The only real answer is that "it depends", and lists can
>>> be made of what it depends on. Some examples include:
>>>
>>> Is the authorization even remotely revokable? (Sometimes it isn't)
>>
>> You obviously can't revoke authorization for a fait accompli.
>> But, what other authorizations, once granted, can't be rescinded?
>> Some may leave you in a predicament (e.g., never being able
>> to turn off the power) but expecting the capability system to
>> know about these sorts of dependencies is, I think, too much.
>
> You normally can't revoke access to a file once the other process has
> opened it.
In the systems *you* may be familiar with. That doesn't mean it's a *rule*! (see above example) As all accesses to the object (file) have to involve the capability/ticket/key/Handle, I can choose to not let you read/write another sector/byte. If I maintain the backing store for the file system, I can opt to replace that page with a page full of 0x00.

The *file server* defines the contract for the files that it handles. Clients use its services with this in mind.
> Many times, privilege is managed not by force of the kernel but by
> cooperation of the actors (this presumes that the system can be assumed
> free of hostile actors). Actors ask for permission, not because they
> couldn't do the operation without it, but because the permission is
> needed to do it correctly.
I'm using capabilities for the express purpose of preventing rogue/malfunctioning actors from "doing things they shouldn't". That includes "doing things they have been TRICKED into doing". See my email_address_t example.
> Of course, there are catastrophic conditions, like loss of power, where
> the crashing of a given task is minor compared to the other effects that
> are happening and many normal promises aren't going to be met;
> hopefully, the emergency recovery system will work to minimize the damage.
>
>>> What is the effect on the requesting task if the authorization goes away
>>> unexpectedly?
>>
>> The designer of the holding task would have to consider that in how
>> the task's actions and recoveries are structured. What would it have
>> done had the authorization not been granted in the first place?
>
> If the holder really needed the permission, then it would have waited
> until it got it. Many operations get MUCH more complicated if they have
> to worry continuously about every possible failure mode. Casually
> converting an error condition that normally would be indicative of a
> major hardware failure (and thus a major software failure isn't
> unreasonable) into something that really might happen and must be dealt
> with REALLY makes programs much harder to write correctly, and even
> harder to test to make sure they are correct. All this because the
> designer figures it is OK to define that authorization carries no
> promise that it will continue?
Let the holder wait until he needs it. He asks. And is told "no". Now what?

He is told *yes*, presents the capability for his first access to the resource. All is well. A moment later, he presents the same capability for a second access and the request *fails*. (capability revoked; resource deleted; service unavailable; etc.) Now what?

You have to expect these sorts of failures -- especially in complex systems. "Network is down; try again later"
>>> What is the effect of delaying the revocation?
>>
>> The big problem with "being considerate" is that it encourages
>> others to be exploitative. There is no downside to their
>> "selfishness" so, "why not?"! "Heads, I win; tails, you lose"
>
> I pity your team if this is how you think of them. First, you should
> only be granting permission for things that you are willing to give it
> for.
Not my "team". Rather, folks who will maintain this after me. If there is an easy and a right way to do things, I'm willing to bet "easy" is going to win out. And, all it has to do is win out *once* and it will invariably have consequences that make lots of other "right" decisions harder. It's *really* hard going into an existing "mess" after-the-fact and trying to fix it... especially when you've been tasked with doing something else, entirely! My goal is to make the "easy" way the *right* way.
> If the system is theirs, they have the right to be greedy, and if it > causes problems, it is their problems.
No! It's not *theirs* any more than the systems you design belong to *you*!
> If the system is yours, why are you giving them permission in the first > place, if they aren't giving you the value you want, then kill them.
Shooting people is frowned upon in the FOSS world -- just because someone's code isn't up to snuff. :>
> If they are paying for the access, make sure you charge them for their > usage (and shame on you for not requiring them to meet the design > requirements for their pieces). > >> OTOH, if you take a heavy-handed approach (unilaterally revoking >> capabilities) then sloppy coders pay a price -- by having their code >> *crash* (presumably, users will then opt to avoid applications from >> those "developers") >> >> [There's no other pressure I can bring to bear on them to "do the >> right thing"] > > Then you also should consider that you are making your "friends" bear > much higher costs to do what you want them to.
"Friends" can see having *two* ways of doing something as "extra work": "Oh, great! Now I have to code for the early notification case *and* the asynchronous revocation case..." You can't force people to be good designers. But, you can put tools in place that make it a lot more likely that you get the results that you want. SWMBO tracked construction expenses at a large local hospital. "The Guys" (construction/maintenance staff) would complain that when they needed "a few things", they had to fill out a lot of paperwork which took a lot of time: "The bathroom is flooded *now*! We can't wait for purchasing to approve the supplies to repair the leak!" So, they created a policy whereby they could use credit cards issued in the hospital's name. Suddenly, there are no more formal purchase orders -- even for the long term, *big* projects! And, everyone has thousands of dollars on their credit cards each month. Needless to say, the folks in purchasing are pissed -- cuz they have been cut out of the loop; accounting is pissed because they have no control over the monies; management is pissed because they have no idea what sort of progress and budgetary constraints have been applied. The only guys who are happy are The Guys (construction/maintenance). Hmmm... can't disallow the credit cards as there will always be "emergencies". So, take their initial complaints and fold them back on themselves: "OK, you can sidestep the paperwork process and the delays that are associated with it by using the credit cards. However, you have to file the paperwork *after* the purchase in order for *your* card to be paid off -- forget to file, and your credit is automatically turned off. And, so that 'we' know why you chose to use the expedited credit card approach instead of the normal purchasing procedure, we need you to prepare an analysis of the factors that went into this decision and include that with the abovementioned paperwork. Surely, that ALL makes sense, right?" 
Suddenly, credit cards see a lot less usage -- construction workers, plumbers, electricians, etc., aren't real keen on writing up "reports". Much easier to just fill out a purchase order for "normal stuff" and let it go through normal channels! The "right" way is now the *easy* way!
A followup...

On 11/6/2013 3:45 AM, Don Y wrote:

> Suddenly, credit cards see a lot less usage -- construction workers, > plumbers, electricians, etc., aren't real keen on writing up > "reports". Much easier to just fill out a purchase order for > "normal stuff" and let it go through normal channels! The > "right" way is now the *easy* way!
One of the goals of the automation system I am deploying here is to address the needs of folks with (various) "disabilities" (whatever that means). So, the UI is very abstract. Unlike most systems, it isn't implicitly assumed to be "visual" (my preferred means of interacting with it is aural -- so I can keep my eyes and hands free to do other things! I'd hate to have to set down something I was carrying *just* so I could pick up a "display" to ask for the lights in the room to be turned on!). But, I expect others will write more (user specific) code than I. I've invested a lot in the infrastructure and core services (along with hardware/firmware). How do I "entice" others to embrace this same "neutral UI" approach that I have adopted? I suspect their first inclination is to "draw" some pretty control panel... then figure out how they will resize it for different output devices, etc. Along the way, support for other non-visual interfaces will go away. Because folks tend to focus on *their* needs first (and often "move on" thereafter). Will they "back fill" the support for audio interfaces? haptic ones? etc. (wanna bet the answer is "I don't have the time... besides, the visual one is really COOL looking! I've even got flying toasters in the background!!") In my case, I don't provide *any* tools that make visual displays easy to create. Instead, the user interface is just a set of available commands for any particular situation that are presented in whatever output modality is selected. I.e., for a visual display, this may just be a bunch of large rectangular buttons with text legends. For an aural display, it may be a spoken menu. etc. I can't *prevent* someone from developing a fancy GUI. But, they'll then find that the rest of the system doesn't "fit" into that. So, they'll also have to reengineer the existing UI's. etc. 
Easier to just follow the (code) templates that I create and *know* that the system will present them to the user in whatever form the user needs! The "easy" way is the "right" way. Exploit laziness.
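The "set of available commands presented in whatever output modality is selected" idea might look like this in miniature (purely illustrative, not the actual system's API):

```python
# The application publishes abstract commands; a presenter renders them
# for whichever modality the *user* selected -- visual, aural, etc.
# The application never assumes a display exists.

def present(commands, modality):
    if modality == "visual":
        # e.g. large rectangular buttons with text legends
        return [f"[ {c} ]" for c in commands]
    if modality == "aural":
        # e.g. a spoken menu, read to the user in order
        return [f"Say '{i + 1}' for: {c}" for i, c in enumerate(commands)]
    raise ValueError(f"unsupported modality: {modality}")
```

Because the application only ever supplies the command list, adding a haptic or braille presenter later requires no change to any application.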
Don,
I started to do a point by point rebuttal, but realized that we were
losing the forest by classifying every tree.

My complaint was with your statement that the ONLY proper way to revoke a
permission is asynchronously. My position is that you can't make such a
blanket statement, that you need to apply design to the situation, and that
in some conditions revocation should be preceded by a defined notification
before the permission actually goes away.

Let me put forward a real world example: generally, one's right to drive a
vehicle is a privilege granted by the government, and the government has
the power to revoke this privilege; but if it does so, there are
notification requirements so that you do know your privilege is being
revoked. This means that once you have gone through the procedures to
get the privilege to drive, you can safely do so, knowing that if for
some reason your privilege is revoked, you will be given sufficient
notice so that you don't get in trouble.

Imagine instead that the government reserved the right to revoke your
privilege without notice (but did give you a way to check whether
your right had been revoked), that checkpoints were established at
random to verify that you DO have the current privilege to drive, and that
driving without the privilege was a capital offense. Would you want to
drive? If you did have to, I bet you would spend a lot of effort
checking that you hadn't been revoked.

This is exactly like the case that can arise for some forms of
privilege, like access to shared memory. If this sort of access is to be
revoked asynchronously, it generally means that the process doing the
accessing will be aborted, or that the process must not treat it as plain
shared memory but instead use some sort of kernel call to check the
permission and perform the access atomically (instead of just accessing
the memory directly).
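That "check the permission and do the access atomically" alternative can be sketched as a toy model (all names invented; a real kernel would do this across a syscall boundary):

```python
import threading

class MemoryGate:
    """Toy model of kernel-mediated shared-memory access: the permission
    check and the access happen under one lock, so an asynchronous
    revocation can never slip in between them."""
    def __init__(self):
        self._lock = threading.Lock()
        self._granted = set()
        self._memory = {}

    def grant(self, task):
        with self._lock:
            self._granted.add(task)

    def revoke(self, task):
        with self._lock:
            self._granted.discard(task)

    def write(self, task, key, value):
        with self._lock:
            if task not in self._granted:
                raise PermissionError("access revoked")
            self._memory[key] = value

    def read(self, task, key):
        with self._lock:
            if task not in self._granted:
                raise PermissionError("access revoked")
            return self._memory[key]
```

The cost, as the post notes, is a kernel call per access instead of a bare load/store.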

I agree that in SOME cases asynchronous revocation is a good
model, but not in all. In most cases where a notification/cooperative
revocation system makes sense, a backup asynchronous method also makes
sense for reliability, to allow you to revoke a malfunctioning
process; but at that point, since the process is already malfunctioning
(it didn't complete the cooperative revocation), the problems imposed
on the task are likely reasonable. This doesn't mean that the
cooperative method was worthless.
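The hybrid scheme (cooperative first, asynchronous as backup) reduces to a few lines; the holder here is just a callback, and everything is illustrative:

```python
def revoke(holder_release, attempts=3):
    """Ask the holder to release; force-revoke only if it never complies.

    holder_release: callable returning True once the holder has cleanly
    let go of the resource (e.g. flushed state, dropped references).
    """
    for _ in range(attempts):
        if holder_release():
            return "released cooperatively"   # normal, well-behaved path
    return "force-revoked"                    # backup: holder malfunctioning
```

A well-behaved task only ever sees the cooperative path; the asynchronous path exists so one stuck task can't hold the system hostage.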

Also, non-backwards-compatible specification changes ARE expensive. That
is just the way things work, at least if you want to be able to talk
about software having correctness. This does mean that you want to
put some effort into defining your requirements: put into them the
things you need in order to verify/prove correctness, but not things you
don't need that add unneeded future limitations.


Hi Richard,

On 11/6/2013 10:43 PM, Richard Damon wrote:

> My complaint was to your statement that the ONLY proper way to revoke a > permission is asynchronously.
Sorry, that wasn't my intent. What I was trying to address was the *practical* aspect of all this. *I* have to create the mechanisms that will ultimately be used throughout the system. Run the thought experiment(s): -Imagine I make a system that notifies, waits "some" time, then revokes. -Imagine I make a system that just revokes -- and notifies after the fact. I then tried to present possible scenarios for what *might* happen in each case. I.e., the notification gets lost/delayed/ignored -- or the holder was "blocked" while the notification came in. In each case, how does that affect the eventual actions of the "holder"? Ans: he has to deal with NOT having the capability when he opts to use it. I.e., he can't blindly *assume* it will "work". So, he has to code for both cases: that he received the notification and is going to try to comply in an orderly fashion; and, that he didn't have enough warning (or *any* warning) when the notification arrived and has effectively *lost* the resource before/during its intended use. My *opinion* is that this extra complexity -- both "in the system" and in the "applications" -- will end up wasted. To be effective, it would require even *more* mechanism than we have discussed (e.g., negotiating an "early warning" interval, deciding how to handle the case when that interval can't be met, etc.). Given that "holders" will have to tolerate the case of the capability "going away", it seems easier to just handle that case and make folks aware of it in the API. Remember, these are "exceptional conditions". You *expect* to be able to hold a capability that you have requested and been granted. I'm just not willing to make that a *guarantee*. So, I need a way to "change my mind" -- BECAUSE I HAVE A GOOD REASON FOR DOING SO, NOW (just like I can preempt your execution if I have a good reason!).
> My position is that you can't make such a > statement and that you need to apply design to the situation, and that > in some conditions revocation should have a defined notification before > it is to be revoked.
What if I can't *guarantee* that notification arrives sufficiently early for you to do anything about it? If you are able to cope with this (by implementing your algorithm differently -- even if it requires a complete rollback), then why shouldn't I opt for this as the normal behavior? If you *insist* on advance notification, then I may need other concessions from you to ensure the required level of performance is met. E.g., maybe you can only hold a capability for a fixed period of time -- that way, *I* know all I have to do is wait and I get it back automatically? But, this complicates your work in other ways...
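One way to make "hold a capability for a fixed period of time" concrete is a lease. A minimal sketch (names invented for illustration):

```python
import time

class Lease:
    """Capability valid for a fixed duration. The grantor never needs to
    notify anyone: it knows it gets the resource back just by waiting
    out the clock, and the holder knows exactly how long it is safe."""
    def __init__(self, duration_s):
        self.expires = time.monotonic() + duration_s

    def valid(self):
        return time.monotonic() < self.expires
```

The trade-off noted in the post: the holder's code is now complicated in a different way, since it must either finish within the lease or renew it.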
> Let me put a real world example, generally ones right to drive a vehicle > is a privilege granted by the government, and the government has the > power to revoke this privilege, but if it does so, there are > notification requirements so that you do know your privilege is being > revoked. This means that once your have gone through the procedures to > get the privilege to drive, you can safely do so, knowing that if for > some reason your privilege is revoked, you will be given sufficient > notice so that you don't get in trouble.
Ah, but they will only *try* to give you notification! If that notification doesn't make it to you (you've moved, were out of town, etc.) and you later encounter a police officer, you're in the same situation as if you *had* been notified and chose to ignore it. [Sure, you could go to court and hope you get a rational judge but it's not The State's responsibility to ensure you have been notified -- only that they "made a concerted attempt".]
> Imagine instead, that the government reserved the right to revoke your > privilege without notice (but did give you a way for you to check if > your right has been revoked), also check points were established at > random to check that you DO have current privilege to drive, and that > driving without privilege was a capital offense. Would you want to > drive? IF you did have to, I bet you would want to spend a lot of effort > checking that you haven't been revoked.
But you're assuming there *is* some "really bad consequence". What if you rarely drive? What if there are few police officers in the parts of town that you frequent? What if you are approached by a cop while *walking* and he asks for an ID? He sees that your license is expired and confiscates it. Or, you go to cash a check at a bank and the bank officer does this on behalf of The State? I.e., any time you would *normally* use that credential you run the risk of it NOT being honored -- even if you aren't "punished" for this (your "punishment" is not being allowed to USE it).
> This is exactly like the case that can happen for some forms of > privilege, like access to shared memory, if this sort of access is to be > revoked asynchronously, it generally means that process doing it will be > aborted, or the process needs to not treat it as shared memory but use > some sort of kernel call to check the permission and do the access > atomically (instead of just accessing the memory).
Protect shared memory with a mutex. Hold it as long as you want. If I want to control that with a capability, I can wrap the mutex access with the capability: so, you can't *take* the lock without permission but, once held, I can't interfere with your holding the lock. It's a capability. I can make it "control" whatever I choose. And, implement whatever *else* I choose to ensure that this control makes sense. E.g., if I revoke access to a piece of memory, I could opt to *suspend* your process at the same time. Then, make a copy of the memory while someone else accesses it. Then, restore the original before resuming your process (and restoring your capability). I.e., you are *always* at the mercy of the kernel. I just have to ensure that I uphold any contracts that I have agreed to with you. And vice versa (of course, if *you* cheat, I can bitch-slap you! :> )
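The capability-wrapped mutex described above -- the capability gates *taking* the lock, but once held, revocation can't interfere -- might look like this toy sketch (invented names, single-process stand-in for a kernel object):

```python
import threading

class GatedLock:
    """A mutex whose acquisition is gated by a capability check. Once a
    task holds the lock, revoking its capability does not disturb the
    held lock; it only prevents *future* acquisitions."""
    def __init__(self):
        self._lock = threading.Lock()
        self._allowed = set()

    def allow(self, task):
        self._allowed.add(task)

    def revoke(self, task):
        self._allowed.discard(task)

    def acquire(self, task):
        if task not in self._allowed:
            raise PermissionError("no capability to take this lock")
        self._lock.acquire()

    def release(self):
        self._lock.release()
```

This is one concrete example of "the capability controls whatever I choose": here it controls entry, not tenure.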
> I agree, that in SOME cases, the asynchronous revocation is a good > model, but not all. In most cases where a notification/cooperative > revocation system makes sense, for reliability concerns, a backup > asynchronous method make sense, to allow you to revoke a malfunctioning > process, but at that point, since it is already malfunctioning (since it > didn't complete the cooperative revocation method), the problems imposed > on the task are likely reasonable. This doesn't mean that the > cooperative method was worthless.
I'm not claiming "one is good" and "the other is bad". I'm just trying to look at the realistic consequences of each approach. How to balance complexity, resources, etc. against "convenience" (for want of a better word :< ) I suspect most folks will just code as if they could lose a resource prior to using it or *while* using it. I imagine the result code from accessing the service/resource will be *all* they look at. And, that any signal handler for "resource revocation" will simply be undefined. It's just the least effort approach (it should be obvious that I expect folks to be lazy in their implementations!). When faced with this sort of condition, I *also* expect these folks to just report "FAIL" for their activities and not even *try* to get things "right" (i.e., "as good as possible in the circumstances")
> Also, non-backwards compatible specification changes ARE expensive. That > is just the way things work, at least if you want to be able to talk > about software having correctness. This does mean that you do want to > put some effort into defining your requirements, to put into them the > things you need to verify/prove correctness, but not things that you > don't need that add unneeded future limitations.
This is why I am spending the effort *now* considering how various scenarios are likely to be handled. I don't want to have to make a change down the road because I "discovered" something that "can't work". I'm not vain enough to think I can come up with the Right way to handle every situation. But, I *do* think I can come up with a practical way that handles most situations economically and *all* situations "properly", even if not efficiently. I can always decide *not* to revoke a capability! Then, *none* of the mechanism gets invoked.
Hi Don,

On Tue, 05 Nov 2013 12:54:15 -0700, Don Y <This.is@not.Me> wrote:


>Amoeba's "ticket" is far more efficient than my approach. It can be >copied, moved, etc. "for the cost of a long long" (IIRC).
In the original version yes ... later they went to a 256-bit ticket to include more end-point information and better crypto-signing.
>In my case, a trap to the kernel is required for each operation on a >"Handle" -- because it's a kernel structure that is being manipulated >(or referenced). > >I can still give user-land services the final say in what a Handle >*means* (along with the "authorities" that it conveys to its bearer). >But, you have to go *through* the kernel to get back to userland. > >A subtle difference [vs Amoeba]: if "task" (again, forgetting lexicon >differences) A decides to manipulate object H backed by service B, >in Amoeba's case, >B does all the work for each attempt A makes. >EVEN IF THE ATTEMPT IS DISALLOWED by H's authorizations. >B's resources are consumed even though A has no authority to use B's >object (H)!
In your case, kernel resources are consumed. 6 of one ... And unless you can prevent A from even connecting to B there will be "wasted" effort on B's part anyway. I may be misunderstanding, but ISTM that you're trying to pack too much into the meaning of capabilities [or possibly too much stock into prior authorization]. Regardless of how capabilities are implemented (user vs kernel), every system I have read about would divide the credentials and authorizations involved in this problem among multiple capabilities: - X(H) is a legal operation on H - B administers H - A can perform X(H) - A can connect to B - B can perform X(H) as a proxy - B can perform X(H) as proxy for A etc. It seems as if you want to go straight to the final one - but the question is: how do you get there? Who grants to A that final capability that implies all the others? To get that capability presumes that A can talk to B (or some other granting authority) in the first place ... which you seem to want to prevent. Obviously, B can tell the kernel that B administers H ... but how does the kernel know what A wants with B? How can A try to access H directly? "URN: A doesn't know about B." Ok, but then can B act as a proxy for anyone, or just for "authorized" users? Who decides A is authorized for H? B? How does B (or anyone else) know A wants access to H if A can't even ask? Amoeba and others solve the problem by letting B administrate. A connects to B, asks for access to H. A can present a ticket for H if it has one, or B can issue a ticket to A if A is allowed but doesn't have one. [Amoeba servers have a public access API which anyone can connect to ask for a ticket granting specific access to a managed object. After first getting the ticket, they can connect to actually perform the allowed operations. Getting access then is a 2 step process.] None of this requires free roaming user-space capabilities ... 
it all can be done with handles referencing secure capabilities kept by the kernel or by another credential server (the Kerberos model).
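Amoeba's scheme -- a ticket carries a check field that only the issuing server can regenerate -- can be approximated like this (a deliberately simplified sketch using HMAC; the real Amoeba layout and check-field algorithm differed):

```python
import hmac, hashlib, os

# Server-private key used to sign tickets; never leaves the server.
SECRET = os.urandom(16)

def issue_ticket(obj, rights):
    """Step 1: client asks the server for a ticket granting specific
    rights on a managed object; server returns a signed triple."""
    check = hmac.new(SECRET, f"{obj}:{rights}".encode(),
                     hashlib.sha256).hexdigest()
    return (obj, rights, check)

def verify_ticket(ticket):
    """Step 2: on each operation the client presents the ticket; the
    server recomputes the check field and rejects forgeries."""
    obj, rights, check = ticket
    good = hmac.new(SECRET, f"{obj}:{rights}".encode(),
                    hashlib.sha256).hexdigest()
    return hmac.compare_digest(check, good)
```

Note the property under discussion: a ticket is just bits, so the client can copy it freely and the server has no idea how many copies exist -- a forged *rights* field, though, fails verification.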
>In my case, if A tries to use one of B's resources (H), it first must >truly *be* one of B's resources (not just a long long that A *claims* >is managed by B). If not, the kernel disallows the transaction.
How does the kernel know H belongs to B? How does A know to ask for H in the first place?
>If H truly *is* backed ("handled") by B, then the kernel allows the >transaction -- calling on B to enforce any finer grained authorities >(besides "access"). I.e., B knows which authorities are available >*in* H and can verify that the action requested is one of those allowed.
What "transaction"? The set of possible objects and the actions that might need to be performed on them both are unbounded. A generic "do-it" kernel API that can evaluate every possible action on any object is a major bottleneck and a PITA to work with. Even if the high level programmer has a sweet wrapper API, the low level programmer has to deal with absolutely anything that can be pushed through the nonspecific interface. For decades, Unix has been moving toward more verbose APIs and away from trying to cram everything into ioctl(). [How many options do sockets have now? And how many different parameter blocks?] Linux, OTOH, went back-ass-wards with its new driver model in which every operation is performed by reading/writing some special file.
>(File systems are bad examples because they are so commonly used to >implement namespaces and not just "files")
A common directory service is fine, but I'm not particularly a fan of uniform "file" interfaces. I rather like the idea of being able to ask an object (or its managing proxy) what functions are available. Unfortunately, doing this generically is a PITA (so no one does it). If you are familiar with COM or Corba, it amounts to the server returning an IDL specification, and the program [somehow] being able to interpret/use the IDL spec to make specific requests.
>> ... revoking a master capability must also revoke any other >> capabilities derived from it [even if located on another host]. > >This means "something" must track history/relationships.
Yes. However it is necessary. If you no longer trust Q, then, by transitivity, you no longer trust anyone Q may have delegated to.
>It also says nothing about *when* the revocation takes place >(effectively) and when notification of that event occurs.
Yes. But as you said to someone else, every program must deal with the possibility of permission being denied. Under those circumstances, notification can be deferred until attempted use. System-wide synchronous revocation is impractical, but revocation can be done asynchronously if master capabilities are versioned and derived capabilities indicate which version of the master was in force when they were issued. It suffices for the owner/manager to be able to say "all capabilities for H [or better, X(H)] issued prior to CurVer(H) are no good". It also can be done with time stamping, but that presupposes a system wide synchronized notion of time. In practice, versioning is simpler.
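George's versioning scheme can be sketched in a few lines (names invented for illustration):

```python
class Master:
    """Versioned master capability: bumping the version invalidates every
    derived capability issued under an earlier version -- 'all
    capabilities issued prior to CurVer(H) are no good'."""
    def __init__(self):
        self.version = 1

    def derive(self):
        return {"issued_under": self.version}

    def revoke_all(self):
        self.version += 1   # asynchronous, no per-holder notification

    def honors(self, derived):
        return derived["issued_under"] >= self.version
```

No holder is ever tracked or notified; each deferred check at time of use is enough, which is why this works even when holders are offline.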
>I.e., in Amoeba's case, the kernel never knows who is holding which >(copies!) of a particular ticket (derived from some other ticket, etc.). >So, there is no way for it to know who to notify AT THE TIME OF >REVOCATION. Instead, it has to rely on the Holder(s) noticing that >fact when they *eventually* try to use their capabilities >(tickets/keys).
So? In your system host kernel's exchange capabilities and proxy for one another. How are you going to notify a host that's powered down?
>And, you are never sure when every ticket has been "discovered" to be >voided -- a task can have a copy of a ticket (you can hold multiple >copies of any ticket!) that he just hasn't got around to trying! > >Sort of like finding a bunch of keys in a desk drawer and not discarding >them because you're not quite sure you *want* to discard them (maybe >they still FIT something!)
The analogy is semi-flawed: capabilities shouldn't be thought of as student key cards that open some subset of the doors on campus. Properly a capability opens only one lock [i.e. addresses one object]. A rejected capability is known to be useless, so there's no point to keeping it. The "one lock" principle is applicable to replicated services: every instance of a particular service should answer to the same set of capabilities. Obviously a capability system *can* provide key card functionality, but you need to look at the situation in the opposite way: i.e. the student's key card doesn't open a group of locks, but rather a group of locks share capability to admit the card. Semantics ... but important semantics.
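The semantic inversion -- each lock holds the capability to admit the card, rather than the card "opening a group of locks" -- in miniature (illustrative names):

```python
class DoorLock:
    """Each lock independently records which card-capabilities it honors.
    The card itself carries no list of doors."""
    def __init__(self):
        self.admits = set()

    def grant(self, card_id):
        self.admits.add(card_id)

    def revoke(self, card_id):
        self.admits.discard(card_id)

    def opens_for(self, card_id):
        return card_id in self.admits
```

The payoff of the inversion: revoking at one lock is local and complete, with no need to hunt down or rewrite the card.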
>In my case, kernels are the only things that *hold* capabilities. >So, all kernels can be notified that a particular capability has been >revoked and they all *are* revoked. Just like if your kernel >chooses to delete a file descriptor (remembering that it is now >a zombie), any future references by you (the task) to that fd can >throw an error assuming you ignore the signal sent to notify you >that it has been destroyed).
But hosts may be offline: powered down or network partitioned. How long do you keep the record that a capability has been revoked? It just clutters up your store. At some point, you have to accept that a remote host may try to use a capability the resource's host no longer honors.
>Yes. My "factory" publishes Handles for key services that tasks may >want to avail themselves of. These are accessed by a single "Service >Locator" Handle that is given to each task (task == process == resource >container) as the task is created. [Conceivably, the Handle for this >service given to Task A can differ from Task B if the authorizations >between A and B are to be different!]. > : >The task can then contact the Handler behind that Handle -- i.e., the >service in question -- and make whatever requests it is authorized >to make (based on its Handle).
But who decides what permissions A and B have wrt the service?
>More importantly, the creating task can do all of this for the "child" >cramming the appropriate Handles for the Objects (incl Services) that >the child will need AND THEN DELETING THAT INSTANCE OF THE SERVICE >LOCATOR handle to effectively sandbox the child. I.e., these are >the resources you can use and operate on -- nothing more!
That's a nice feature. Amoeba didn't have this, but other capability systems did.
>If I were to tag Handles with "rightful owners", then proxies would >be more apparent. But, how do you validate a proxy's request for a >Handle on behalf of another? ("Please give me Bob's door keys...")
Again, this is a scenario of replicated service: local proxies should be considered an instance of the remote service. The user's capability to access the service lets it access the proxy. The proxy itself should have a separate capability to access the remote service so that the chain of trust remains valid.
>--don
George
Hi George,

[eliding a lot for fear of hitting upper message length limit]

On 11/7/2013 2:27 PM, George Neuner wrote:
> In the original version yes ... later they went to a 256-bit ticket to > include more end-point information and better crypto-signing.
OK. But that just changes the size of the copy. It still allows you to create as many copies as you want -- without anyone knowing about them. And, makes "a certain bit pattern" effectively the same as another copy of that capability!
>> A subtle difference [vs Amoeba]: if "task" (again, forgetting lexicon >> differences) A decides to manipulate object H backed by service B, >> in Amoeba's case, B does all the work for each attempt A makes. >> EVEN IF THE ATTEMPT IS DISALLOWED by H's authorizations. >> B's resources are consumed even though A has no authority to use B's >> object (H)! > > In your case, kernel resources are consumed. 6 of one ...
Yes. No free lunch. *Big* limitation but, I'm hoping, one with worthwhile tradeoffs!
> And unless you can prevent A from even connecting to B there will be > "wasted" effort on B's part anyway. > > I may be misunderstanding, but ISTM that you're trying to pack too > much into the meaning of capabilities [or possibly too much stock into > prior authorization].
A user (task) somehow gets a set of "authorizations" to a particular object (an object may actually be a service, another task/thread, etc.). This could come from a "parent" task handing the authorizations and object reference -- together called a Handle, in my lexicon -- to the task. Or, from the task requesting that (object,authorization) from some chain of "directory" services -- ultimately terminating at a service that is responsible (and capable!) of satisfying this request. The user then wants to invoke a method supported by that object. The Handle (which indicates the object and the authorizations thereof FOR THIS INSTANCE OF THE HANDLE) is presented to the kernel in an IPC/RPC request (wrapper for the method to be invoked). If the user doesn't have the *right* to connect to the "service" that implements that object, then the RPC fails before it gets started. I.e., a task can't talk to anything that it doesn't have the *right* to talk to (this is a more fundamental "permission" than the "authorizations" implemented in the capability/Handle). I.e., I can disconnect your Handle from the service that backs it and you're just a spoiled brat crying in a sandbox. Nothing you can do about it -- even if you *had* the authorizations to do grand and wonderful things! I've just "unplugged" the cable tying you to that service. Once the kernel has decided that you *can* "talk" to that service (the one that backs the object in question), the IPC/RPC proceeds (marshal arguments, push the message across the comm media, await reply, etc.). On the receiving end, the service sees your request come in. Knows the object to which it applies (because of which "wire" it came in on), identifies the action you want to perform (because of the IPC/RPC payload) and *decides* if you have been allowed to do that! It does so by noting what permissions it has *recorded* for your Handle when it *gave* you that Handle (or, when someone else gave it to you on its behalf). 
If the recorded permissions/authorizations allow the action that you have requested to proceed, then the service implements those actions and completes the IPC/RPC accordingly (possibly returning ERROR if some OTHER, non-permission-related aspect of the action fails). As the Handler makes the *final* determination as to whether or not it wants to *do* whatever you've asked it to do to the referenced object, it is free to define any number of such actions -- and any number of arbitrary constraints on them! E.g., it may let *you* write numbers into a file but someone else can only write *letters* -- to that same file! (I have no idea why this would be important :> ) So, unlike Amoeba and other ticket-based systems, the number of "authorizations" isn't defined by a bitfield *in* the "ticket/key". Rather, it's whatever the Handler considers to be important. "I'll let you send a message to this email_address_t -- but, it has to be a short one." "I'll let you send a message to this email_address_t -- but it can't have any attachments!" "I'll let you send a message to this email_address_t -- but it can't contain any profanity" "I'll..." Much of the implementation is Mach-inspired. Think of Handles as port+authorizations. Handles that don't implicitly have *send* rights to the receiving port (which is held by the "Handler") can't reference it (remembering that send rights can be revoked. I.e., the holding task can be "disconnected" if the Handler decides he is being abusive, etc.)
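A toy model of the flow just described -- the kernel checks only the right to talk (the "wire"), while the Handler makes the final, fine-grained decision. All names here are invented for illustration:

```python
class Handler:
    """Service backing an object: records per-handle authorizations and
    makes the FINAL decision on each request."""
    def __init__(self):
        self.auth = {}                  # handle id -> allowed methods

    def register(self, hid, methods):
        self.auth[hid] = set(methods)

    def invoke(self, hid, method):
        if method not in self.auth.get(hid, ()):
            return "DENIED"             # Handler-level authorization check
        return f"done: {method}"

class Kernel:
    """Kernel only routes: it checks that the handle still has a 'send
    right' to its Handler; it never interprets the request itself."""
    def __init__(self):
        self.wires = {}                 # handle id -> Handler

    def connect(self, hid, handler):
        self.wires[hid] = handler

    def unplug(self, hid):
        self.wires.pop(hid, None)       # "unplug the cable"

    def rpc(self, hid, method):
        handler = self.wires.get(hid)
        if handler is None:
            return "NO ROUTE"           # crying in the sandbox
        return handler.invoke(hid, method)
```

The two layers fail differently: an unplugged wire fails before the service ever sees the request; a connected wire with insufficient authorization fails inside the Handler.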
> Regardless of how capabilities are implemented (user vs kernel), every > system I have read about would divide the credentials and > authorizations involved in this problem among multiple capabilities: > > - X(H) is a legal operation on H
I.e., there is an IDL for X(H)
> - B administers H
... and task B holds the receive rights for the port that references H (so, any references to H USING THAT HANDLE will end up in B's lap)
> - A can perform X(H)
... because "someone" told B to allow those permissions for requests coming in on the port assigned (given) to A by which it can access object H
> - A can connect to B
... because A (still) holds a send right to the port for which B is the receiver
> - B can perform X(H) as a proxy
... because it is B's job to implement X on H (or, to know how to get *other* agents to perform portions of that operation) A doesn't know *how* to "read a file", "turn on a motor", etc. I.e., the methods associated with H
> - B can perform X(H) as proxy for A
As above.
> It seems as if you want to go straight to the final one - but the > question is: how do you get there? > > Who grants to A that final capability that implies all the others? To > get that capability presumes that A can talk to B (or some other > granting authority) in the first place ... which you seem to want to > prevent.
In the Beginning, ... :>
> Obviously, B can tell the kernel that B administers H ... but how does > the kernel know what A wants with B?
Kernel doesn't *care* what A's intentions are! Doesn't *want* to care! It wants *H* to determine what can be done -- on H! Expects "someone" (task) to implement those actions -- call him B, Q or Elephant. All kernel does is let these two parties talk to each other. And, prevent others from talking that don't have the "right" (deliberate choice of word) to talk to each other. The Handler for an object ultimately implements the permission(s) and actions ("Sorry, I don't want to do that for you and you can't make me!")
> How can A try to access H directly?
A has no knowledge of who is "backing" H. A starts with a *name* for an object (assuming it isn't trying to *create* a yet-to-be-named object). It consults a namespace (another Object that has been created for it and, to which, it has been given access "authorizations" -- of some degree) that has been created for its use. Only things that are referenced in that namespace "exist", as far as A is concerned! Think of it as chroot($HOME) -- /etc/passwd doesn't exist in that context unless *you* happen to have coincidentally created your own "object" and named it such.

The namespace, like any other object, is "backed" (handled) by some active entity. When you use the Handle that you have been pre-endowed with (by init?) to access (and operate on!) that namespace, you can ask the namespace to resolve a name... however "names" are defined in your namespace (e.g., names might be simple integers, or 8000 character strings, or binary numbers, or...).

You obviously must have some agreed upon convention WITH THE ENTITY THAT CREATED YOUR NAMESPACE about how names are defined -- and possibly *used* -- in that namespace. That convention may be different for some other namespace -- even if that other namespace is handled by the same active entity! All that matters is the agreed upon syntax of the API -- as evidenced in the IDL for that "method" -- and the conventions you agree to (when your code was written).

When you "lookup" a name, the namespace service (for that namespace, yada yada yada) gives you a Handle to the *object* that is paired with the name you provided. Or, "ERROR_NOT_FOUND", etc. Again, by convention, you know the type of the object that you have just been granted a "reference" to. So, you know what methods you can *potentially* ask to be performed by that "object" on your behalf. The Handler that backs that object (referenced in your Handle), holds the receive right (Mach-speak) for that "port". (You now hold a *send* right to it.)
When that Handle is used in an IPC/RPC, the identifier of the particular IPC/RPC "method" of interest, along with any arguments involved, will be delivered to the Handler holding that receive right FOR THAT PORT (meaning the *object* associated with that port/Handle).

If, for example, "H" is the file system, then you might be asking B to "create a new file" in that filesystem. Where in the *real* filesystem it actually resides may be hidden from you. All you care is that you will subsequently be able to access it using the name "foo" -- that you provided (presumably avoiding any conflict with other names IN YOUR NAMESPACE -- because the Handler for your namespace won't let you create a "new name" that conflicts with an "old name"; part of the convention that you adhere to when you interact with a Namespace object!)

Presumably, you will put something in this file. Or, perhaps not. Maybe your role was just to create it, prevent its deletion and place it into a *new* namespace that you will pass onto one of your "offspring" -- so *it* can fill it with content!
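The lookup-and-bind behavior described above can be sketched very compactly. Again, this is an illustrative toy (names like `Namespace`, `bind`, `resolve` are mine, not the actual IDL): the only things that "exist" for a task are the bindings in its namespace, and the namespace's Handler refuses conflicting names.

```python
# Illustrative per-task namespace: name -> opaque Handle.
# Only names bound here "exist" for the task holding this namespace.

class Namespace:
    def __init__(self):
        self.bindings = {}  # name -> Handle (an opaque reference)

    def bind(self, name, handle):
        # the Handler won't let a "new name" clash with an "old name"
        if name in self.bindings:
            return "ERROR_EXISTS"
        self.bindings[name] = handle
        return "OK"

    def resolve(self, name):
        # anything not bound here simply doesn't exist for this task
        return self.bindings.get(name, "ERROR_NOT_FOUND")
```

Two tasks given two different `Namespace` objects can use the same name for entirely different objects -- or have no names in common at all, which is the sandboxing property.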
> "URN: A doesn't know about B." Ok, but then can B act as a > proxy for anyone, or just for "authorized" users? Who decides A is > authorized for H? B? How does B (or anyone else) know A wants access > to H if A can't even ask?
Who decides that UID "don" can access ~don but not ~george?
> Amoeba and others solve the problem by letting B administrate. A > connects to B, asks for access to H. A can present a ticket for H if > it has one, or B can issue a ticket to A if A is allowed but doesn't > have one.
Same thing, here.
> [Amoeba servers have a public access API which anyone can connect to > ask for a ticket granting specific access to a managed object. After > first getting the ticket, they can connect to actually perform the > allowed operations. Getting access then is a 2 step process.]
Same sort of approach. But, the kernel has no explicit knowledge of what that "specific access" entails. It just routes messages between endpoints after ensuring that you have the "right" to use a particular endpoint!
> None of this requires free roaming user-space capabilities ... it all > can be with handles referencing secure capabilities kept by the kernel > or another credential server (Kerberos model).
User-space capabilities allow the kernel to get out of the loop. But it means that the kernel can't *do* anything to control the proliferation of copies, etc.
>> In my case, if A tries to use one of B's resources (H), it first must >> truly *be* one of B's resources (not just a long long that A *claims* >> is managed by B). If not, the kernel disallows the transaction. > > How does the kernel know H belongs to B?
It doesn't. It just pushes a message down that "pipe" and... Gee, look, B is suddenly READY to execute, again! How'd that happen? :>
> How does A know to ask for H in the first place?
Convention. How do you know to ask for ~/.profile when a user logs in? Why not /foo/biguns?
>> If H truly *is* backed ("handled") by B, then the kernel allows the >> transaction -- calling on B to enforce any finer grained authorities >> (besides "access"). I.e., B knows which authorities are available >> *in* H and can verify that the action requested is one of those allowed. > > What "transaction"? The set of possible objects and the actions that > might need to be performed on them both are unbounded.
Yes. Kernel cares not about *what* A is asking B to do on H. Does your UNIX box care if you push "ABCD" down a particular named pipe to some random process on the other end? All it does is make the mechanism available to you as an AUTHORIZED USER of that mechanism. The fact that ABCD causes the receiving process to erase every odd byte on /dev/rdsk is no concern of the kernel!
> A generic "do-it" kernel API that can evaluate every possible action > on any object is a major bottleneck and a PITA to work with. Even if > the high level programmer has a sweet wrapper API, the low level > programmer has to deal with absolutely anything that can be pushed > through the nonspecific interface.
Handlers and Holders conspire as to what actions they want/need to support. If you want to be able to erase every odd byte on the raw disk device, then *someone* has to write the code to do that! If you want to ensure this action isn't casually initiated, then someone has to enforce some "rules" as to who can use it -- and even *how*/when (e.g., you might have authorization to do this, but the Handler only lets it happen on Fridays at midnight). Let the Handler and Holder decide what makes sense to them!

I wanted to keep the kernel out of the "policy" issues and just let it provide/enforce "mechanism". Unfortunately, it makes the kernel a bottleneck as all IPC/RPC has to be authenticated there. But, it gives me a stranglehold on "who can do what". It also gives Handlers the ability to decide what constitutes abuse of privilege -- *its* privilege! And, provides far more refined ideas of what those privileges actually *are*.

E.g., the email example (that I seem to have become obsessed with). I can have "something" put textual representations of email addresses in the RDBMS. Something (else?) can pull them out, wrap them in a "method" and hand them to "consumers". Those consumers can invoke the method (".sendmail") on the object (address) and never anything more. If I later want to ensure they can't continue to use that object (email address), I can revoke their "authorization" to use that method on that instance of that object. (Or, I can "unwire" the Handle completely -- so, any future operation throws an error.)
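The "Fridays at midnight" example boils down to a two-gate check that lives entirely in the Handler. A hypothetical sketch (function and action names are mine): first gate is "does this Handle carry the authorization at all?", second is whatever extra policy the Handler chooses to impose.

```python
# Hypothetical Handler policy: authorization alone is necessary
# but not sufficient; the Handler adds its own rules on top.

def handler_allows(holder_auths, action, weekday, hour):
    # gate 1: does this Handle carry the authorization at all?
    if action not in holder_auths:
        return False
    # gate 2: Handler-imposed rule beyond mere authorization --
    # destructive ops only on Fridays (weekday 4) at midnight
    if action == "erase_odd_bytes":
        return weekday == 4 and hour == 0
    return True
```

The kernel authenticated the connection and delivered the request; everything above is policy that only the Handler and Holder agreed upon.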
> For decades, Unix has been moving toward more verbose APIs and away > from trying to cram everything into ioctl(). [How many options do > sockets have now? And how many different parameter blocks?]
My approach is more like pushing untyped data through a function interface and knowing that the thing on the other end will make sense of it. The IDL lets "humans" agree on just what any particular set of data on a particular interface are LIKELY to mean!
> Linux, OTOH, went back-ass-wards with its new driver model in which > every operation is performed by reading/writing some special file.
This is the Inferno way, as well. In some aspects, its nice. But, its also tedious.
>> (File systems are bad examples because they are so commonly used to >> implement namespaces and not just "files") > > A common directory service is fine, but I'm not particularly a fan of > uniform "file" interfaces. I rather like the idea of being able to > ask an object (or its managing proxy) what functions are available.
I don't have a filesystem. I have *namespaces*. *Multiple* namespaces. Filesystems traditionally bound names (and containers) to "magnetic domains on a medium". Then, to "drivers" for particular devices. In my case, a namespace binds a name to a Handler. What that Handler does and how it does it can have absolutely nothing in common with any other Handler in the system. The *namespace* "object" has operations that can be performed on it (methods defined in the IDL that can be applied to any Handle that references that particular *flavor* of namespace). E.g., resolve(), create(), delete(), etc. But, it has no sense of reading/writing *to* the Handles that it manages.
> Unfortunately, doing this generically is a PITA (so no one does it). > If you are familiar with COM or Corba, it amounts to the server > returning an IDL specification, and the program [somehow] being able > to interpret/use the IDL spec to make specific requests.
I don't implement a full-fledged factory. Rather, I assume you know everything there is to know about the objects with which you are interacting. That you and their Handlers have conspired beforehand to some set of agreed upon methods (abilities? trying to avoid using the word "capabilities"). So, when you decide to revoke the "move motor left at high speed" authorization from a Handle that previously *had* that authorization, *you* and the Handler know what this means. The kernel doesn't care! If, tomorrow, you decided to implement a "reduce motor operating current until full stall" authorization, so be it. Kernel never changes. None of the other "tasks" change. Just users of that IDL (and, specifically, this new method added to it)
>> It also says nothing about *when* the revocation takes place >> (effectively) and when notification of that event occurs. > > Yes. But as you said to someone else, every program must deal with > the possibility of permission being denied. Under those > circumstances, notification can be deferred until attempted use.
I'm trying to find a middle ground. I don't want a Holder to have to "poll" to see if an authorization is still valid (or, that even the *object* to which that authorization applied still exists!). Nor do I want to prenotify before revoking authorizations (or deleting objects or unwiring connections or...).

I figured the best compromise (noun: a situation where EVERYONE gets screwed) is to allow asynchronous revocation but provide a notification ex post facto. I.e., if they haven't *yet* tried to exercise the authorization, they get notified. If they are in the process of using it, they may or may not succeed (depends on how the race is won). And, if they don't *care*, they can ignore the notification and wait until they try to use the authorization, later!

<shrug> It *seems* like the most bang for the least buck.
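That compromise -- revoke immediately, notify after the fact -- can be shown in a few lines. A toy sketch (the class and method names are illustrative): revocation takes effect the instant the kernel acts, a notification is queued for the Holder, and any later attempt to use the stale Handle fails regardless of whether the Holder has read the notification yet.

```python
# Toy model of asynchronous revocation with ex post facto notification.
# Revocation is immediate at the kernel; the Holder learns afterwards.

class Kernel:
    def __init__(self):
        self.live = set()    # Handles currently "wired"
        self.pending = []    # notifications queued for Holders

    def create(self, handle_id):
        self.live.add(handle_id)

    def revoke(self, handle_id):
        # takes effect now; the Holder is told after the fact
        self.live.discard(handle_id)
        self.pending.append(("REVOKED", handle_id))

    def use(self, handle_id):
        # a Holder may try a stale Handle any time; the kernel knows better
        return "OK" if handle_id in self.live else "ERROR_REVOKED"
```

A Holder that ignores `pending` just discovers the revocation lazily, on its next `use()` -- which is exactly the "wait until they try to use it" path described above.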
> System-wide synchronous revocation is impractical, but revocation can > be done asynchronously if master capabilities are versioned and > derived capabilities indicate which version of the master was in force > when they were issued.
No need for versioning. Handles are unique -- not "reused" (until all references to it are known to be gone). As they can't be duplicated (without the kernel's involvement), it knows when it is safe to reuse a stale Handle. (A task can *try* to hold onto it but the kernel that serves that task *knows* it doesn't exist anymore. "File descriptor 27 is no longer attached to a file -- regardless of what you may *think*!")
>> I.e., in Amoeba's case, the kernel never knows who is holding which >> (copies!) of a particular ticket (derived from some other ticket, etc.). >> So, there is no wy for it to know who to notify AT THE TIME OF >> REVOCATION. Instead, it has to rely on the Holder(s) noticing that >> fact when they *eventually* try to use their capabilities >> (tickets/keys). > > So? In your system host kernel's exchange capabilities and proxy for > one another. How are you going to notify a host that's powered down?
The tasks running on that host (whose Handles are held *in* that host!) are dead. They can't access anything even if they wanted to!

The handles in *other* hosts that reference objects *backed* by tasks in that host are told that the other end has come unplugged. So, all of *those* Handles cease to exist (and they are notified). If tasks on the down host referenced objects on these "up" hosts, the Handlers for each of those objects are told that the connection is broken and they need no longer expect requests on those Handles.

The problem is more one of *recovery* after the fact. How do you rebuild these connections? I currently have no notion of persistence in the system. Once it goes down, it reboots from scratch -- anything in progress is lost (unless the agents doing the work deliberately elected to create persistent objects from which they could resume operations).
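The "unplug everything wired to a dead host" step is mechanical. A hypothetical sketch of the surviving host's bookkeeping (names are mine): every local Handle records which remote host backs it, and a host-down event invalidates exactly that set, returning it so the Holders can be notified.

```python
# Illustrative bookkeeping on a surviving host: when a remote host
# dies, every Handle it backed ceases to exist, and we learn which
# Holders need an after-the-fact notification.

class LocalKernel:
    def __init__(self):
        self.wired = {}  # handle_id -> name of the backing host

    def wire(self, handle_id, host):
        self.wired[handle_id] = host

    def host_down(self, host):
        # invalidate every Handle backed by the dead host
        dead = [h for h, backer in self.wired.items() if backer == host]
        for h in dead:
            del self.wired[h]
        return dead  # notify the Holders of these Handles
```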
>> And, you are never sure when every ticket has been "discovered" to be >> voided -- a task can have a copy of a ticket (you can hold multiple >> copies of any ticket!) that he just hasn't got around to trying! >> >> Sort of like finding a bunch of keys in a desk drawer and not discarding >> them because you're not quite sure you *want* to discard them *maybe >> they still FIT something!) > > The analogy is semi-flawed: capabilities shouldn't be thought of as > student key cards that open some subset of the doors on campus. > > Properly a capability opens only one lock [i.e. addresses one object].
Yes. "Set of keys" implies "set of locks". If keys can be freely copied, there is no way to know where every copy resides. No way to *notify* the holder that a particular key no longer works: "The lock has been changed"
> A rejected capability is known to be useless, so there's no point to > keeping it.
Assumes you have *tried* the Handle and discovered it to be useless. Or, been notified (see above) that it has been revoked (rendered useless).

My point was that a set of 64 (or 256) bit values in memory tells you nothing about whether you should keep them -- or not. You'd have to go around "trying your keys" to see which ones are worth keeping. Much like finding a set of keys in a desk drawer: you try them on every lock you can think of. The ones that work, you set aside. The ones that don't, you decide if they are worth discarding (Hmmm... are there any locks I have forgotten to test??)

OTOH, if you don't want to test them (now), the only "safe bet" is to hold onto them -- just in case!
> The "one lock" principle is applicable to replicated services: every > instance of a particular service should answer to the same set of > capabilities. > > Obviously a capability system *can* provide key card functionality, > but you need to look at the situation in the opposite way: i.e. the > student's key card doesn't open a group of locks, but rather a group > of locks share capability to admit the card.
The kernel doesn't care about this. It's up to the Handler for the objects in question to make his implementation choice. E.g., two Handles (in the same or different tasks) can map onto the same object. A Handle can map onto multiple objects -- if a proxy handling the Handle acts on your behalf ("The phone only rings in one location. If you want to be able to call two people, you need two phone numbers and the ability to dial both/either.") Two file descriptors in different (or same) processes can reference the same file. If you want to reference *two* files, you need to have a proxy that knows how to interpret your request(s) for each file (said proxy having two file descriptors). Or, do it yourself as two fd's.
> Semantics ... but important semantics. > >> In my case, kernels are the only things that *hold* capabilities. >> So, all kernels can be notified that a particular capability has been >> revoked and they all *are* revoked. Just like if your kernel >> chooses to delete a file descriptor (remembering that it is now >> a zombie), any future references by you (the task) to that fd can >> throw an error assuming you ignore the signal sent to notify you >> that it has been destroyed). > > But hosts may be offline: powered down or network partitioned. How > long do you keep the "expiration" of a capability? That just clutters > up your store.
When the host comes back up, the local Handle doesn't exist. Memory is empty. The local kernel has no knowledge of what happened before the lights went out. If you are incommunicado for "too long" (whatever that means), others come to the conclusion that you are powered off. Anything "wired" into you is invalidated. Come back on-line and *claim* you've been running all this time regardless of how it looks? "Gee, that's too bad. We thought you had moved out and sold all your stuff..."
> At some point, you have to accept that a remote host may try to use a > capability the resource's host no longer honors. > >> Yes. My "factory" publishes Handles for key services that tasks may >> want to avail themselves of. These are accessed by a single "Service >> Locator" Handle that is given to each task (task == process == resource >> container) as the task is created. [Conceivably, the Handle for this >> service given to Task A can differ from Task B if the authorizations >> between A and B are to be different!]. >> : >> The task can then contact the Handler behind that Handle -- i.e., the >> service in question -- and make whatever requests it is authorized >> to make (based on its Handle). > > But who decides what permissions A and B have wrt the service?
How do you decide that task A should be able to turn the motor on but not task B? You MAKE THAT DECISION and then you put it in the code. Unless the code gets rewritten (or bug), B simply never thinks about talking to the motor.
>> More importantly, the creating task can do all of this for the "child" >> cramming the appropriate Handles for the Objects (incl Services) that >> the child will need AND THEN DELETING THAT INSTANCE OF THE SERVICE >> LOCATOR handle to effectively sandbox the child. I.e., these are >> the resources you can use and operate on -- nothing more! > > That's a nice feature. Amoeba didn't have this, but other capability > systems did.
I think it is important for things like init -- to be able to go away (free up its resources AND ITS UTMOST PRIVILEGE LEVELS!)
>> If I were to tag Handles with "rightful owners", then proxies would >> be more apparent. But, how do you validate a proxy's request for a >> Handle on behalf of another? ("Please give me Bob's door keys...") > > Again, this is a scenario of replicated service: local proxies should > be considered an instance of the remote service. The user's > capability to access the service lets it access the proxy. The proxy > itself should have a separate capability to access the remote service > so that the chain of trust remains valid.
Exactly. A on host 1 doesn't talk to the Handle for B on host 2. A, instead, talks to a proxy on host 1. The kernels have conspired to wire this proxy to another proxy (actually, a part of the remote kernel) on host 2 that, in turn, connects to B.

So, when host 2 dies, the proxy on host 1 sees that (because the kernel on 1 loses contact with kernel 2 -- anything that is "wired" to that remote kernel is now notified of the failure). That in turn is propagated up to A, et al.

Never instantaneous. But, anything "in the works" when the host goes down fails to see a completion code so knows it has been unceremoniously aborted "in progress". (See why I think async notifications ex post facto are the only realistic solutions?)

Now, to see if news server bellyaches about length of this post... <cringe>