EmbeddedRelated.com

Managing "capabilities" for security

Started by Don Y November 1, 2013
Hi Tom,

On 11/5/2013 4:03 PM, Tom Gardner wrote:
> On 05/11/13 22:19, Don Y wrote:
>>> The new processor architecture will, it is claimed,
>>> work well with existing code, with roughly an order
>>> of magnitude speedup. They've managed to get DSP
>>> performance!
>>
>> I can't see how this speedup is a consequence of the capabilities
>> themselves -- "with existing code".
>
> Correct, it isn't. CAP is a topic that came up as
> part of the non-objectives of the new architecture.
Ah, OK.
> Example of just how different the architecture is:
> it doesn't have registers and isn't a stack machine.
> Internal micro-ops work with a use-it-or-lose-it
> "belt" where a "register" address is of the form
> "the fifth-to-last arithmetic result".
Some of the old Burroughs machines were wonky like that. Makes you wonder why those mechanisms disappeared over time... (illusion that increased speed renders cleverness less important?)
On 05/11/13 23:31, Don Y wrote:
> Hi Tom,
>
> On 11/5/2013 4:03 PM, Tom Gardner wrote:
>> On 05/11/13 22:19, Don Y wrote:
>>>> The new processor architecture will, it is claimed,
>>>> work well with existing code, with roughly an order
>>>> of magnitude speedup. They've managed to get DSP
>>>> performance!
>>>
>>> I can't see how this speedup is a consequence of the capabilities
>>> themselves -- "with existing code".
>>
>> Correct, it isn't. CAP is a topic that came up as
>> part of the non-objectives of the new architecture.
>
> Ah, OK.
>
>> Example of just how different the architecture is:
>> it doesn't have registers and isn't a stack machine.
>> Internal micro-ops work with a use-it-or-lose-it
>> "belt" where a "register" address is of the form
>> "the fifth-to-last arithmetic result".
>
> Some of the old Burroughs machines were wonky like that.
> Makes you wonder why those mechanisms disappeared over time...
> (illusion that increased speed renders cleverness less important?)
Ivan Godard wrote the Burroughs DCAlgol compiler :)

Have a look at http://ootbcomp.com/docs/index.html

"The Mill is a new CPU architecture designed for very high single-thread
performance within a very small power envelope. It achieves DSP-like
power/performance on general purpose codes, without reprogramming. The
Mill is a wide-issue, statically scheduled design with exposed pipeline.
High-end Mills can decode, issue, and execute over thirty MIMD
operations per cycle, sustained. The pipeline is very short, with a
mispredict penalty of only four cycles."
Hi Tom,

On 11/5/2013 6:09 PM, Tom Gardner wrote:
> On 05/11/13 23:31, Don Y wrote:
>>> Example of just how different the architecture is:
>>> it doesn't have registers and isn't a stack machine.
>>> Internal micro-ops work with a use-it-or-lose-it
>>> "belt" where a "register" address is of the form
>>> "the fifth-to-last arithmetic result".
>>
>> Some of the old Burroughs machines were wonky like that.
>> Makes you wonder why those mechanisms disappeared over time...
>> (illusion that increased speed renders cleverness less important?)
>
> Ivan Godard wrote the Burroughs DCAlgol compiler :)
OK, then his mind is already "sufficiently warped" in this regard.
> Have a look at http://ootbcomp.com/docs/index.html
>
> "The Mill is a new CPU architecture designed for very high single-thread
> performance within a very small power envelope. It achieves DSP-like
> power/performance on general purpose codes, without reprogramming. The
> Mill is a wide-issue, statically scheduled design with exposed pipeline.
> High-end Mills can decode, issue, and execute over thirty MIMD
> operations per cycle, sustained. The pipeline is very short, with a
> mispredict penalty of only four cycles."
Ah, that explains all the posts (that I "killed" :< ) mentioning "Mills"! :< I will have to see how to "unkill" them in Tbird...
On 11/5/13, 3:58 PM, Don Y wrote:
> Hi Richard,
>
> [attrs elided]
>
> On 11/4/2013 9:45 PM, Richard Damon wrote:
>> All questions to be decided at design phase, with no "generic answer".
>> Presumably, if there is a deadline for when the acknowledgement can be
>> given, then presumably this spec is applied when designing such a real
>> time system.
>
> But that's the problem. When is the design phase "over" for an open
> system? Someone (third party) adds a "feature" a year after product
> release. Does he get to claim the design phase extended to a
> period MONTHS after "initial release" -- because that was when *he*
> was working on the design of *his* feature?
>
Hopefully the design phase for the first release is over before the first release is released! (How else can you see if it is correct?) And, yes, the third party can, if he wishes, open back up the spec and change it, but then HE takes on the responsibility to verify that all previous code can work with the new spec, and if he can't then he can't change the spec! Generally later modifications want to be backwards compatible to avoid this problem.
> [of course not]
>
> At some point, you say, "this is the environment for which you have to
> design". Every mechanism that you make available is a mechanism that
> has to be maintained and utilized. And, also acts as a *constraint*
> on the system and its evolution: "Crap! I have to notify each Holder
> of a pending capability revocation 100ms before revocation. But, my
> satellite transmission path is twice that! I guess I just can't use
> satellites (or, can't revoke capabilities)"
>
Actually, in this case there isn't really a problem for the revoker. If the spec is that I need to give the Holder 100ms to reply, and it takes 200ms to send (and presumably receive) a message, then I just need to send the notice 500ms before I actually revoke the privilege; that gives the Holder the required time to respond.

And yes, if a system is built on certain assumptions, trying to move to a less capable environment often requires looking at lots of the system. What you really want to do is think, when you first make the assumptions, about what you really need as assumptions and what you don't need to assume. In this case, I would likely have made the time allowed to notify something configurable/negotiated; if the grantor really needs that 100ms, then yes, you can't use the satellites, but if it doesn't, then perhaps the grantor can be told that the link is slow so it needs to be patient.
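The arithmetic above can be sketched as a tiny helper (the function name is invented for illustration; it isn't part of any real capability API):

```python
def revocation_lead_time_ms(transport_ms: int, reply_window_ms: int) -> int:
    """How far ahead of the actual revocation the notice must be sent:
    one transport delay to deliver the notice, the Holder's guaranteed
    reply window, and one transport delay for the reply to come back."""
    return transport_ms + reply_window_ms + transport_ms

# 200 ms satellite hop each way, 100 ms reply window:
# the notice must go out 500 ms before the revocation takes effect.
print(revocation_lead_time_ms(200, 100))  # 500
```

The point being that a slow transport doesn't forbid notified revocation; it only pushes the notice earlier, *if* the revoker can predict the revocation that far in advance.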
> E.g., I handle physical resource revocation asynchronously BECAUSE
> I HAVE NO CONTROL OVER EXTERNAL EVENTS. If I wrap the resources
> in a capability, now I suddenly have to provide different semantics?
> ("Hey, you can't revoke the 'sunlight' capability!")
First, I never said that all revocations need to have notification! It sounds like in your case here, because there is a real chance of resources going away asynchronously from external causes, asynchronously removing permissions should not cause significant issues. My comment was just that this is not always the case, so there are some situations where asynchronous revocation is not the right way to do things.
>
>>> So, as you acknowledge below, your app design must be able to handle
>>> this case -- which is essentially the asynchronous case.
>>>
>>> I currently manage *physical* resources asynchronously (though with
>>> notification after the fact) -- because they *can* disappear even
>>> without my explicit control (e.g., power failure, drop in water
>>> pressure, etc.). So, this same sort of reasoning would at least
>>> be *consistent*.
>>>
>>> I.e., do an operation and *check* to see if it completed as expected
>>> (just like checking return value of malloc).
>>
>> Some operations do not make checking at each operation so easy.
>
> Life isn't guaranteed to be easy! :>
>
>> What if
>> the resource is access to some memory, do you check for an "error" after
>> every access? This presumes that the system even gives you an
>> application level ability to continue past this sort of error. What do
>> you do about cooperative "authorization" to access parts of structures
>> for things like synchronization where there isn't a hardware/OS
>> capability to stop you?
>
> If "backing store" could go away while it was being used, then
> your "system" would obviously need a way of detecting that and
> informing the "holder" of that resource that this has, in fact,
> happened. The holder would also need to be aware of what resources
> could "disappear" and code to accommodate those possibilities.
>
> If I am driving a motor, power to the motor driver/translator
> could fail while I am in the middle of an operation. Even if I
> have a backup power supply, the motor driver itself could fail.
> Even if I have a redundant motor driver, the *motor* could
> fail. Or, a gearbox, mechanism, *sensor*, etc.
>
> Shit Happens.
>
> If you don't plan to accommodate the (likely/consequential)
> failures, you have a bug.
>
I wasn't talking about the memory physically going away, but some process first granting another process the right to access some chunk of memory and then suddenly, and without warning, revoking that permission and removing the access rights. Since the normal result of this would be aborting that process, this can be very bad.

The only way that process can reliably operate would be to use some form of operation that atomically checks the rights, does the access, and returns an error flag that needs to be tested. This will very likely greatly slow down the process defending itself from privilege revocation, just because the grantor is unwilling to first send notice and wait a reasonable time before actually revoking the right.
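The checked-access pattern being described might look like this sketch (all names are hypothetical; a real system would do the check in hardware or the kernel, not a user-level lock):

```python
import threading

class GuardedRegion:
    """Sketch of the pattern above: every access atomically re-checks the
    (revocable) permission and reports failure via a flag, instead of
    trusting a permission that was checked once up front."""
    def __init__(self, data):
        self._data = data
        self._granted = True
        self._lock = threading.Lock()

    def revoke(self):
        # the grantor can yank the right at any moment, without notice
        with self._lock:
            self._granted = False

    def read(self, index):
        # permission check and access are one atomic step under the lock
        with self._lock:
            if not self._granted:
                return None, False      # (value, ok) -- right was revoked
            return self._data[index], True

region = GuardedRegion([10, 20, 30])
print(region.read(1))    # (20, True)
region.revoke()
print(region.read(1))    # (None, False) -- caller must test the flag
```

The per-access locking and flag-testing is exactly the overhead Richard is objecting to: it taxes every access to defend against a revocation that polite notice would have made cheap.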
>> In your case, since the operations do have the
>> capability of suddenly starting to fail, an asynchronous revocation
>> likely doesn't cause problems that you didn't need to handle anyway, as
>> long as the system is structured to allow it.
>
> That's the point! You (developer) know shit CAN happen. Anything that
> you are "holding" can be revoked. Plan on it. (Heck, I can "kill -9"
> *you* without giving any advanced warning! Gee, *then* what?)
Many things are very unlikely to "just happen" at random. Presumably the grantor of the privilege is doing so because there is a reason to grant the privilege. It doesn't make sense to burden the process being granted the privilege with unneeded problems.
>
>>>> Yes, sometimes just doing an asynchronous revocation may make sense, and
>>>> in many cases having it as a fall back if the cooperation method fails
>>>> to complete in a needed time is needed, but that doesn't mean that
>>>> asynchronous is generally preferred.
>>>>
>>>> As to the transitive granting, the same method could be used to relay
>>>> the request to revoke.
>>>
>>> This is a tougher call (though I think I have a solution that addresses
>>> these issues). Who does the relaying? The actor who delegated
>>> the capability? (what if he is now a zombie?) Or, does the kernel
>>> track "derived capabilities" and treat them as part of the original
>>> capability?
>>
>> I would generally say that the actor who was given a permission is
>> responsible for relaying the revocations to those it relayed to. If it
>> has shared a right that it might have revoked from it, it needs to
>> maintain a way to do that.
>
> The actor may be gone! BY DESIGN! I.e., he has done <whatever> *he*
> needed to do (with "greater privilege") and is now leaving *you* to
> clean up (with some reduced capability).
Generally if you grant a privilege to an actor, and it is subject to a revocation request, they will reply back that they are done with the privilege (a "self revoke"), perhaps because there may be a limit to how many people this privilege will be given to at a time. You also can learn that they aren't there anymore when you signal them that you are preparing to revoke.
>
> E.g., he can turn motor on, set direction and turn off. He starts
> motor in right direction, then delegates the "off" capability to you
> (your role being to watch a limit switch and turn off the motor at
> that time -- or, when some timeout is exceeded) and exits. (no need
> for him to hang around consuming ALL the resources that he originally
> needed to determine how the motor should be operated)
>
> However, since my capabilities reside in the kernel, I can opt to
> have the kernel track derivations and cascade revocations. But, this
> means all derived capabilities must come from a single "parent"
>
>>> As I began my original post:
>>> "... i.e., how best to differentiate the examples where
>>> X should be allowed vs X should be prohibited."
>>> you can come up with examples where /each/ approach is "right"
>>> and the others *wrong*. :<
>>>
>>> Engineering: finding the least wrong solution to a problem.
>>>
>>> <frown> But, at least its interesting! :>
>>
>> This is why I object to the statement that it SHOULD ALWAYS be
>> asynchronous. The only real answer is that "it depends", and lists can
>> be made of what it depends on. Some examples include:
>>
>> Is the authorization even remotely revokable? (Sometimes it isn't)
>
> You obviously can't revoke authorization for a fait accompli.
> But, what other authorizations, once granted, can't be rescinded?
> Some may leave you in a predicament (e.g., never being able
> to turn off the power) but expecting the capability system to
> know about these sorts of dependencies is, I think, too much.
You normally can't revoke access to a file once the other process has opened it. Many times, privilege is managed not by force of the kernel but by cooperation of the actors (this presumes that the system can be assumed free of hostile actors). Actors ask for permission, not because they couldn't do the operation without it, but because the permission is needed to do it correctly.

Of course, there are catastrophic conditions, like loss of power, where the crashing of a given task is minor compared to the other effects that are happening and many normal promises aren't going to be met; hopefully, the emergency recovery system will work to minimize the damage.
>
>> What is the effect on the requesting task if the authorization goes away
>> unexpectedly?
>
> The designer of the holding task would have to consider that in how
> the task's actions and recoveries are structured. What would it have
> done had the authorization not been granted in the first place?
>
If the holder really needed the permission, then it would have waited until it got it. Many operations get MUCH more complicated if they have to worry continuously about every possible failure mode. Casually converting an error condition that normally would be indicative of a major hardware failure (and thus a major software failure isn't unreasonable) into something that really might happen and must be dealt with REALLY makes programs much harder to write correctly, and even harder to test to make sure they are correct. All this because the designer figures it is OK to define that authorization carries no promise that it will continue?
>> What is the effect of delaying the revocation?
>
> The big problem with "being considerate" is that it encourages
> others to be exploitative. There is no downside to their
> "selfishness" so, "why not?"! "Heads, I win; tails, you lose"
>
I pity your team if this is how you think of them. First, you should only be granting permission for things that you are willing to give it for.

If the system is theirs, they have the right to be greedy, and if it causes problems, it is their problem. If the system is yours, why are you giving them permission in the first place? If they aren't giving you the value you want, then kill them. If they are paying for the access, make sure you charge them for their usage (and shame on you for not requiring them to meet the design requirements for their pieces).
> OTOH, if you take a heavy-handed approach (unilaterally revoking
> capabilities) then sloppy coders pay a price -- by having their code
> *crash* (presumably, users will then opt to avoid applications from
> those "developers")
>
> [There's no other pressure I can bring to bear on them to "do the
> right thing"]
Then you also should consider that you are making your "friends" bear much higher costs to do what you want them to.
Hi Richard,

On 11/5/2013 9:58 PM, Richard Damon wrote:
> On 11/5/13, 3:58 PM, Don Y wrote:
>> On 11/4/2013 9:45 PM, Richard Damon wrote:
>>> All questions to be decided at design phase, with no "generic answer".
>>> Presumably, if there is a deadline for when the acknowledgement can be
>>> given, then presumably this spec is applied when designing such a real
>>> time system.
>>
>> But that's the problem. When is the design phase "over" for an open
>> system? Someone (third party) adds a "feature" a year after product
>> release. Does he get to claim the design phase extended to a
>> period MONTHS after "initial release" -- because that was when *he*
>> was working on the design of *his* feature?
>
> Hopefully the design phase for the first release is over before the
> first release is released! (How else can you see if it is correct?)
That was my point! The design phase *is* done (from your standpoint) before the third party starts adding that new feature. From *his* standpoint, he would like to think the system can accommodate *his* goals, as well. *He* wants the design phase to overlap *his* activities so he has a "say". (sorry, too late. sooner or later, you've got to "shoot the engineer")
> And, yes, the third party can, if he wishes, open back up the spec and
> change it, but then HE takes on the responsibility to verify that all
> previous code can work with the new spec, and if he can't then he can't
> change the spec! Generally later modifications want to be backwards
> compatible to avoid this problem.
In reality, that's not practical. Will Apple let you revise iOS to suit your needs? Or, will they say, "sorry, too late"? So, you want to address *likely* needs without dragging in the kitchen sink (only to discover that no one *uses* the sink anyways!)
>> At some point, you say, "this is the environment for which you have to
>> design". Every mechanism that you make available is a mechanism that
>> has to be maintained and utilized. And, also acts as a *constraint*
>> on the system and its evolution: "Crap! I have to notify each Holder
>> of a pending capability revocation 100ms before revocation. But, my
>> satellite transmission path is twice that! I guess I just can't use
>> satellites (or, can't revoke capabilities)"
>
> Actually, in this case there isn't really a problem for the revoker. If
> the spec is that I need to give the Holder 100ms to reply, and it takes
> 200 ms to send (and presumably receive) a message, then I just need to
> send the notice 500ms before I actually revoke the privilege, that will
> give the Holder the required time to respond.
That assumes *you* can delay when you revoke it! Or, can tell in advance when you will need to so you can give the early warning. It also assumes you *know* that the transport delay will be as long as it happens to be -- which you might not be aware of until after the message is actually sent (the route a network message takes can vary over time based on availability, bandwidth, error conditions, etc.)
> And yes, if a system is built on certain assumptions, trying to move to
> a less capable environment often requires looking at lots of the system.
> What you really want to do is think when you first make the assumptions
> as to what you really need as assumptions, and what you don't need to
Exactly. What can you (reasonably) *expect* your users/developers to accommodate -- given the other design criteria that you have to address.
> assume. In this case, I would likely have made the time allowed to
> notify as something configurable/negotiated, and if the grantor really
> needs that 100ms, then yes, you can't use the satellites, but if it
> doesn't then perhaps the grantor can be told that the link is slow so it
> needs to be patient.
But all this adds complexity. And, at the end of the day, the holder will still have to be able to deal with the case where his authorizations "don't work" (e.g., what if the object is deleted??) See what I'm saying? If you're going to have to deal with this possibility anyway, then why complicate things with other mechanisms that might not work?

Mach has a concept called a "port". It's a communication mechanism that is probably the single most important item/concept in the design. I.e., it's not some "afterthought" shoehorned in at the 11th hour.

Mach includes a provision whereby you can request a given port be "renamed". Sort of like saying, "I want file descriptor 2 to hereafter be called 73". There are some potential advantages to allowing a client to have such a change made on its behalf. But, the request isn't guaranteed to be handled. It may simply not be possible -- today.

As a result, the client has to be able to live with the port having its original "name" (presumably, there was a reason the client did NOT want to do this!). So, all of the code that is in place to exploit accessing *renamed* ports sits idle -- and the code to handle unrenamed ports remains in play. THEN WHY IS THIS FACILITY IMPLEMENTED?
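The "request a rename, but live with refusal" pattern might be sketched like this. This is NOT the real Mach API (Mach's actual call is a kernel operation on port rights); the names here are invented to show the shape of the client's obligation:

```python
class PortTable:
    """Hypothetical per-task name table: small integer names -> ports."""
    def __init__(self):
        self._names = {}
        self._next = 1

    def allocate(self, port):
        name = self._next
        self._next += 1
        self._names[name] = port
        return name

    def rename(self, old, new):
        """Best-effort: fails if the desired name is taken or old is gone."""
        if new in self._names or old not in self._names:
            return False
        self._names[new] = self._names.pop(old)
        return True

table = PortTable()
name = table.allocate(object())
# The client asks for its preferred name, but MUST keep working with the
# original name if the request is refused -- so both code paths exist.
if table.rename(name, 73):
    name = 73
print(name)
```

Note that the fallback branch (rename refused) is exactly the code Don says "remains in play": the client can never delete it, which is his argument against offering the facility at all.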
>> E.g., I handle physical resource revocation asynchronously BECAUSE
>> I HAVE NO CONTROL OVER EXTERNAL EVENTS. If I wrap the resources
>> in a capability, now I suddenly have to provide different semantics?
>> ("Hey, you can't revoke the 'sunlight' capability!")
>
> First, I never said that all revocations need to have notification! It
> sounds like in your case here, because there is a real chance of
> resources going away asynchronously from external causes, asynchronously
> removing permissions should not cause significant issues. My comment was
> just that this is not always the case, so there are some situations
> where asynchronous revocation is not the right way to do things.
There's never a perfect fit for all cases. But, if you try to include provisions that make *every* case "easy", you end up with a more complex system/implementation. And, questionable "returns".

E.g., the Mach system call is a privileged operation. It's more expensive to implement things "in the kernel" (kernel gets bigger, mistakes can have dramatic consequences, etc.). If you can't be guaranteed that it will be usable, why go to this effort?
>>> What if
>>> the resource is access to some memory, do you check for an "error" after
>>> every access? This presumes that the system even gives you an
>>> application level ability to continue past this sort of error. What do
>>> you do about cooperative "authorization" to access parts of structures
>>> for things like synchronization where there isn't a hardware/OS
>>> capability to stop you?
>>
>> If "backing store" could go away while it was being used, then
>> your "system" would obviously need a way of detecting that and
>> informing the "holder" of that resource that this has, in fact,
>> happened. The holder would also need to be aware of what resources
>> could "disappear" and code to accommodate those possibilities.
>>
>> If I am driving a motor, power to the motor driver/translator
>> could fail while I am in the middle of an operation. Even if I
>> have a backup power supply, the motor driver itself could fail.
>> Even if I have a redundant motor driver, the *motor* could
>> fail. Or, a gearbox, mechanism, *sensor*, etc.
> I wasn't talking about the memory physically going away, but some
> process first granting another process the right to access some chunk
> of memory and then suddenly and without warning revoking that permission
> and removing the access rights. Since the normal result of this would be
> aborting that process, this can be very bad.
Why does that have to be the "normal result"? The consumer could check to see if his operation "succeeded" at the end of his use of that region. Or not. Then, roll back whatever part of his activities has (may have been) compromised in the process.

If you are dealing with something as basic as memory, then you would presumably have hardware supporting memory objects. If it's just a mutex governing a shared object, that's below the granularity of what I am discussing. How long you hold a lock isn't the same issue.

If, OTOH, I opt to protect all of *my* files and deny you access to them (filesystem analogy), I should be able to enforce this restriction even if you currently have several of them "open" (i.e., the permission need not ONLY be implemented "on open()" but on *any* reference to the object). I could, conceivably, remove the *image* of each open file that you are actually operating on at the time (e.g., un-mmap() them)
> The only way that process
> can reliably operate would be to use some form of operation that
> atomically checks the rights, and does the access and returns an error
> flag that needs to be tested. This will very likely greatly slow down
> the process defending itself from privilege revocation just because the
> grantor is unwilling to first send notice and wait a reasonable time
> before actually revoking the right.
Why does it have to "check the rights"? Just *do* what you intended to do. Your request will either succeed (which implicitly tells you that you *held* a valid capability and that the capability was still valid while your request was being processed) or fail (which tells you that you either had a bad capability *or* that it was revoked before your request could be completed).

[Remember, you have to present the capability in order to perform *any* operation on the object. Everything is mediated by the object's "Handler"]
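The "just *do* it" model might be sketched like this (the Handler class and error names are invented for illustration; they aren't from Don's actual system):

```python
class Revoked(Exception):
    """Raised when an operation is attempted with a revoked capability."""
    pass

class Handler:
    """Every operation on the object is mediated here: presenting the
    capability *is* the check, so the client never does a separate
    'validate' step before acting."""
    def __init__(self):
        self._valid = {"cap-1"}

    def revoke(self, cap):
        self._valid.discard(cap)

    def write(self, cap, data):
        if cap not in self._valid:
            raise Revoked(cap)
        return len(data)          # success implies the cap was still valid

h = Handler()
print(h.write("cap-1", b"hello"))   # 5 -- success, so the cap was valid
h.revoke("cap-1")
try:
    h.write("cap-1", b"again")
except Revoked:
    print("revoked")   # failure is reported at the operation; caller rolls back
```

The design choice here is that validity is only ever reported as a side effect of attempting the operation, so there is no window between "check" and "use" for the revocation to slip through.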
>>> In your case, since the operations do have the
>>> capability of suddenly starting to fail, an asynchronous revocation
>>> likely doesn't cause problems that you didn't need to handle anyway, as
>>> long as the system is structured to allow it.
>>
>> That's the point! You (developer) know shit CAN happen. Anything that
>> you are "holding" can be revoked. Plan on it. (Heck, I can "kill -9"
>> *you* without giving any advanced warning! Gee, *then* what?)
>
> Many things are very unlikely to "just happen" at random. Presumably the
> grantor of the privilege is doing so because there is a reason to grant
> the privilege. It doesn't make sense to burden the process being granted
> the privilege with unneeded problems.
But a privilege (capability) can be granted *hours* or days before it is ever used! There's no "freshness seal" imposed on capabilities. Surely you don't want the holder to have to periodically check to see that the capability is "still valid". Likewise, if you force a client to defer requesting a capability until just before it is needed, then the client risks a delay in beginning his activity as the capabilities are negotiated, etc. Often, a task is spawned with the knowledge of what it is intended to do. It makes sense to endow it with the capabilities that it is going to eventually need when it is created -- instead of having it remain connected to its parent *just* so it can request those capabilities when they are needed.
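The endow-at-creation idea might be sketched like this (the helper name is hypothetical; in the real system the kernel would transfer the capabilities, not a function argument):

```python
import threading

def spawn_with_caps(target, caps):
    """Endow a task with the capabilities it will eventually need at
    creation time, so it need not stay tethered to its parent (or
    re-negotiate with a grantor) to request them later."""
    t = threading.Thread(target=target, args=(caps,))
    t.start()
    return t

results = []

def motor_watcher(caps):
    # The child acts using only what it was endowed with; by now the
    # parent may have exited -- it has no further role to play.
    results.append("off" in caps)

t = spawn_with_caps(motor_watcher, {"off"})
t.join()
print(results[0])   # True -- the "off" capability was granted at spawn
```

This is the scenario from the motor example: the parent delegates the "off" capability at spawn and exits, rather than hanging around as a "cheerleader" to relay revocations.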
>>> I would generally say that the actor who was given a permission is
>>> responsible for relaying the revocations to those it relayed to. If it
>>> has shared a right that it might have revoked from it, it needs to
>>> maintain a way to do that.
>>
>> The actor may be gone! BY DESIGN! I.e., he has done <whatever> *he*
>> needed to do (with "greater privilege") and is now leaving *you* to
>> clean up (with some reduced capability).
>
> Generally if you grant a privilege to an actor, and it is subject to a
> revocation request, they will reply back that they are done with the
> privilege (a "self revoke"), perhaps because there may be a limit to how
> many people this privilege will be given to at a time. You also can
> learn that they aren't there anymore when you signal them that you are
> preparing to revoke.
That's not an assumption you can make. You are assuming a parent always hangs around to watch its children die. It might, instead, create its offspring and *then* die -- knowing they have the tools that they need to perform their tasks (what other role does the parent have -- cheerleader?)
>>> This is why I object to the statement that it SHOULD ALWAYS be
>>> asynchronous. The only real answer is that "it depends", and lists can
>>> be made of what it depends on. Some examples include:
>>>
>>> Is the authorization even remotely revokable? (Sometimes it isn't)
>>
>> You obviously can't revoke authorization for a fait accompli.
>> But, what other authorizations, once granted, can't be rescinded?
>> Some may leave you in a predicament (e.g., never being able
>> to turn off the power) but expecting the capability system to
>> know about these sorts of dependencies is, I think, too much.
>
> You normally can't revoke access to a file once the other process has
> opened it.
In the systems *you* may be familiar with. That doesn't mean it's a *rule*! (see above example) As all accesses to the object (file) have to involve the capability/ticket/key/Handle, I can choose to not let you read/write another sector/byte. If I maintain the backing store for the file system, I can opt to replace that page with a page full of 0x00.

The *file server* defines the contract for the files that it handles. Clients use its services with this in mind.
> Many times, privilege is managed not by force of the kernel but by
> cooperation of the actors (this presumes that the system can be assumed
> free of hostile actors). Actors ask for permission, not because they
> couldn't do the operation without it, but because the permission is
> needed to do it correctly.
I'm using capabilities for the express purpose of preventing rogue/malfunctioning actors from "doing things they shouldn't". That includes "doing things they have been TRICKED into doing". See my email_address_t example.
> Of course, there are catastrophic conditions, like loss of power, where
> the crashing of a given task is minor compared to the other effects that
> are happening and many normal promises aren't going to be met;
> hopefully, the emergency recovery system will work to minimize the damage.
>
>>> What is the effect on the requesting task if the authorization goes away
>>> unexpectedly?
>>
>> The designer of the holding task would have to consider that in how
>> the task's actions and recoveries are structured. What would it have
>> done had the authorization not been granted in the first place?
>
> If the holder really needed the permission, then it would have waited
> until it got it. Many operations get MUCH more complicated if they have
> to worry continuously about every possible failure mode. Casually
> converting an error condition that normally would be indicative of a
> major hardware failure (and thus a major software failure isn't
> unreasonable) into something that really might happen and must be dealt
> with REALLY makes programs much harder to write correctly, and even
> harder to test to make sure they are correct. All this because the
> designer figures it is OK to define that authorization carries no
> promise that it will continue?
Let the holder wait until he needs it. He asks. And is told "no". Now what?

He is told *yes*, presents the capability for his first access to the resource. All is well. A moment later, he presents the same capability for a second access and the request *fails*. (capability revoked; resource deleted; service unavailable; etc.) Now what?

You have to expect these sorts of failures -- especially in complex systems. "Network is down; try again later"
>>> What is the effect of delaying the revocation?
>>
>> The big problem with "being considerate" is that it encourages
>> others to be exploitative. There is no downside to their
>> "selfishness" so, "why not?"! "Heads, I win; tails, you lose"
>
> I pity your team if this is how you think of them. First, you should
> only be granting permission for things that you are willing to give it
> for.
Not my "team". Rather, folks who will maintain this after me. If there is an easy and a right way to do things, I'm willing to bet "easy" is going to win out. And, all it has to do is win out *once* and it will invariably have consequences that make lots of other "right" decisions harder. It's *really* hard going into an existing "mess" after-the-fact and trying to fix it... especially when you've been tasked with doing something else, entirely! My goal is to make the "easy" way the *right* way.
> If the system is theirs, they have the right to be greedy, and if it > causes problems, it is their problems.
No! It's not *theirs* any more than the systems you design belong to *you*!
> If the system is yours, why are you giving them permission in the first > place, if they aren't giving you the value you want, then kill them.
Shooting people is frowned upon in the FOSS world -- just because someone's code isn't up to snuff. :>
> If they are paying for the access, make sure you charge them for their > usage (and shame on you for not requiring them to meet the design > requirements for their pieces). > >> OTOH, if you take a heavy-handed approach (unilaterally revoking >> capabilities) then sloppy coders pay a price -- by having their code >> *crash* (presumably, users will then opt to avoid applications from >> those "developers") >> >> [There's no other pressure I can bring to bear on them to "do the >> right thing"] > > Then you also should consider that you are making your "friends" bear > much higher costs to do what you want them to.
"Friends" can see having *two* ways of doing something as "extra work": "Oh, great! Now I have to code for the early notification case *and* the asynchronous revocation case..." You can't force people to be good designers. But, you can put tools in place that make it a lot more likely that you get the results that you want. SWMBO tracked construction expenses at a large local hospital. "The Guys" (construction/maintenance staff) would complain that when they needed "a few things", they had to fill out a lot of paperwork which took a lot of time: "The bathroom is flooded *now*! We can't wait for purchasing to approve the supplies to repair the leak!" So, they created a policy whereby they could use credit cards issued in the hospital's name. Suddenly, there are no more formal purchase orders -- even for the long term, *big* projects! And, everyone has thousands of dollars on their credit cards each month. Needless to say, the folks in purchasing are pissed -- cuz they have been cut out of the loop; accounting is pissed because they have no control over the monies; management is pissed because they have no idea what sort of progress and budgetary constraints have been applied. The only guys who are happy are The Guys (construction/maintenance). Hmmm... can't disallow the credit cards as there will always be "emergencies". So, take their initial complaints and fold them back on themselves: "OK, you can sidestep the paperwork process and the delays that are associated with it by using the credit cards. However, you have to file the paperwork *after* the purchase in order for *your* card to be paid off -- forget to file, and your credit is automatically turned off. And, so that 'we' know why you chose to use the expedited credit card approach instead of the normal purchasing procedure, we need you to prepare an analysis of the factors that went into this decision and include that with the abovementioned paperwork. Surely, that ALL makes sense, right?" 
Suddenly, credit cards see a lot less usage -- construction workers, plumbers, electricians, etc., aren't real keen on writing up "reports". Much easier to just fill out a purchase order for "normal stuff" and let it go through normal channels! The "right" way is now the *easy* way!
A followup...

On 11/6/2013 3:45 AM, Don Y wrote:

> Suddenly, credit cards see a lot less usage -- construction workers, > plumbers, electricians, etc., aren't real keen on writing up > "reports". Much easier to just fill out a purchase order for > "normal stuff" and let it go through normal channels! The > "right" way is now the *easy* way!
One of the goals of the automation system I am deploying here is to address the needs of folks with (various) "disabilities" (whatever that means). So, the UI is very abstract. Unlike most systems, it isn't implicitly assumed to be "visual" (my preferred means of interacting with it is aural -- so I can keep my eyes and hands free to do other things! I'd hate to have to set down something I was carrying *just* so I could pick up a "display" to ask for the lights in the room to be turned on!). But, I expect others will write more (user specific) code than I. I've invested a lot in the infrastructure and core services (along with hardware/firmware). How do I "entice" others to embrace this same "neutral UI" approach that I have adopted? I suspect their first inclination is to "draw" some pretty control panel... then figure out how they will resize it for different output devices, etc. Along the way, support for other non-visual interfaces will go away. Because folks tend to focus on *their* needs first (and often "move on" thereafter). Will they "back fill" the support for audio interfaces? haptic ones? etc. (wanna bet the answer is "I don't have the time... besides, the visual one is really COOL looking! I've even got flying toasters in the background!!") In my case, I don't provide *any* tools that make visual displays easy to create. Instead, the user interface is just a set of available commands for any particular situation that are presented in whatever output modality is selected. I.e., for a visual display, this may just be a bunch of large rectangular buttons with text legends. For an aural display, it may be a spoken menu. etc. I can't *prevent* someone from developing a fancy GUI. But, they'll then find that the rest of the system doesn't "fit" into that. So, they'll also have to reengineer the existing UI's. etc. 
Easier to just follow the (code) templates that I create and *know* that the system will present them to the user in whatever form the user needs! The "easy" way is the "right" way. Exploit laziness.
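The "set of available commands presented in whatever output modality is selected" idea might look like this in miniature (purely illustrative, not the actual system's API):

```python
# The application publishes abstract commands; a presenter renders them
# for whichever modality the *user* selected -- visual, aural, etc.
# The application never assumes a display exists.

def present(commands, modality):
    if modality == "visual":
        # e.g. large rectangular buttons with text legends
        return [f"[ {c} ]" for c in commands]
    if modality == "aural":
        # e.g. a spoken menu, read to the user in order
        return [f"Say '{i + 1}' for: {c}" for i, c in enumerate(commands)]
    raise ValueError(f"unsupported modality: {modality}")
```

Because the application only ever supplies the command list, adding a haptic or braille presenter later requires no change to any application.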
Don,
I started to do a point by point rebuttal, but realized that we were
losing the forest by classifying every tree.

My complaint was with your statement that the ONLY proper way to revoke a
permission is asynchronously. My position is that you can't make such a
blanket statement, that you need to apply design to the situation, and that
in some conditions revocation should be preceded by a defined notification
before the permission actually goes away.

Let me put forward a real world example: generally, one's right to drive a
vehicle is a privilege granted by the government, and the government has
the power to revoke this privilege; but if it does so, there are
notification requirements so that you do know your privilege is being
revoked. This means that once you have gone through the procedures to
get the privilege to drive, you can safely do so, knowing that if for
some reason your privilege is revoked, you will be given sufficient
notice so that you don't get in trouble.

Imagine instead that the government reserved the right to revoke your
privilege without notice (but did give you a way to check whether
your right had been revoked), that checkpoints were established at
random to verify that you DO have the current privilege to drive, and that
driving without the privilege was a capital offense. Would you want to
drive? If you did have to, I bet you would spend a lot of effort
checking that you hadn't been revoked.

This is exactly like the case that can arise for some forms of
privilege, like access to shared memory. If this sort of access is to be
revoked asynchronously, it generally means that the process doing the
accessing will be aborted, or that the process must not treat it as plain
shared memory but instead use some sort of kernel call to check the
permission and perform the access atomically (instead of just accessing
the memory directly).
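That "check the permission and do the access atomically" alternative can be sketched as a toy model (all names invented; a real kernel would do this across a syscall boundary):

```python
import threading

class MemoryGate:
    """Toy model of kernel-mediated shared-memory access: the permission
    check and the access happen under one lock, so an asynchronous
    revocation can never slip in between them."""
    def __init__(self):
        self._lock = threading.Lock()
        self._granted = set()
        self._memory = {}

    def grant(self, task):
        with self._lock:
            self._granted.add(task)

    def revoke(self, task):
        with self._lock:
            self._granted.discard(task)

    def write(self, task, key, value):
        with self._lock:
            if task not in self._granted:
                raise PermissionError("access revoked")
            self._memory[key] = value

    def read(self, task, key):
        with self._lock:
            if task not in self._granted:
                raise PermissionError("access revoked")
            return self._memory[key]
```

The cost, as the post notes, is a kernel call per access instead of a bare load/store.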

I agree that in SOME cases asynchronous revocation is a good
model, but not in all. In most cases where a notification/cooperative
revocation system makes sense, a backup asynchronous method also makes
sense for reliability, to allow you to revoke a malfunctioning
process; but at that point, since the process is already malfunctioning
(it didn't complete the cooperative revocation), the problems imposed
on the task are likely reasonable. This doesn't mean that the
cooperative method was worthless.
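The hybrid scheme (cooperative first, asynchronous as backup) reduces to a few lines; the holder here is just a callback, and everything is illustrative:

```python
def revoke(holder_release, attempts=3):
    """Ask the holder to release; force-revoke only if it never complies.

    holder_release: callable returning True once the holder has cleanly
    let go of the resource (e.g. flushed state, dropped references).
    """
    for _ in range(attempts):
        if holder_release():
            return "released cooperatively"   # normal, well-behaved path
    return "force-revoked"                    # backup: holder malfunctioning
```

A well-behaved task only ever sees the cooperative path; the asynchronous path exists so one stuck task can't hold the system hostage.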

Also, non-backwards-compatible specification changes ARE expensive. That
is just the way things work, at least if you want to be able to talk
about software having correctness. This does mean that you want to
put some effort into defining your requirements: put into them the
things you need in order to verify/prove correctness, but not things you
don't need that add unneeded future limitations.


Hi Richard,

On 11/6/2013 10:43 PM, Richard Damon wrote:

> My complaint was to your statement that the ONLY proper way to revoke a > permission is asynchronously.
Sorry, that wasn't my intent. What I was trying to address was the *practical* aspect of all this. *I* have to create the mechanisms that will ultimately be used throughout the system. Run the thought experiment(s): -Imagine I make a system that notifies, waits "some" time, then revokes. -Imagine I make a system that just revokes -- and notifies after the fact. I then tried to present possible scenarios for what *might* happen in each case. I.e., the notification gets lost/delayed/ignored -- or the holder was "blocked" while the notification came in. In each case, how does that affect the eventual actions of the "holder"? Ans: he has to deal with NOT having the capability when he opts to use it. I.e., he can't blindly *assume* it will "work". So, he has to code for both cases: that he received the notification and is going to try to comply in an orderly fashion; and, that he didn't have enough warning (or *any* warning) when the notification arrived and has effectively *lost* the resource before/during its intended use. My *opinion* is that this extra complexity -- both "in the system" and in the "applications" -- will end up wasted. To be effective, it would require even *more* mechanism than we have discussed (e.g., negotiating an "early warning" interval, deciding how to handle the case when that interval can't be met, etc.). Given that "holders" will have to tolerate the case of the capability "going away", it seems easier to just handle that case and make folks aware of it in the API. Remember, these are "exceptional conditions". You *expect* to be able to hold a capability that you have requested and been granted. I'm just not willing to make that a *guarantee*. So, I need a way to "change my mind" -- BECAUSE I HAVE A GOOD REASON FOR DOING SO, NOW (just like I can preempt your execution if I have a good reason!).
> My position is that you can't make such a > statement and that you need to apply design to the situation, and that > in some conditions revocation should have a defined notification before > it is to be revoked.
What if I can't *guarantee* that notification arrives sufficiently early for you to do anything about it? If you are able to cope with this (by implementing your algorithm differently -- even if it requires a complete rollback), then why shouldn't I opt for this as the normal behavior? If you *insist* on advance notification, then I may need other concessions from you to ensure the required level of performance is met. E.g., maybe you can only hold a capability for a fixed period of time -- that way, *I* know all I have to do is wait and I get it back automatically? But, this complicates your work in other ways...
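One way to make "hold a capability for a fixed period of time" concrete is a lease. A minimal sketch (names invented for illustration):

```python
import time

class Lease:
    """Capability valid for a fixed duration. The grantor never needs to
    notify anyone: it knows it gets the resource back just by waiting
    out the clock, and the holder knows exactly how long it is safe."""
    def __init__(self, duration_s):
        self.expires = time.monotonic() + duration_s

    def valid(self):
        return time.monotonic() < self.expires
```

The trade-off noted in the post: the holder's code is now complicated in a different way, since it must either finish within the lease or renew it.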
> Let me put a real world example, generally ones right to drive a vehicle > is a privilege granted by the government, and the government has the > power to revoke this privilege, but if it does so, there are > notification requirements so that you do know your privilege is being > revoked. This means that once your have gone through the procedures to > get the privilege to drive, you can safely do so, knowing that if for > some reason your privilege is revoked, you will be given sufficient > notice so that you don't get in trouble.
Ah, but they will only *try* to give you notification! If that notification doesn't make it to you (you've moved, were out of town, etc.) and you later encounter a police officer, you're in the same situation as if you *had* been notified and chose to ignore it. [Sure, you could go to court and hope you get a rational judge but it's not The State's responsibility to ensure you have been notified -- only that they "made a concerted attempt".]
> Imagine instead, that the government reserved the right to revoke your > privilege without notice (but did give you a way for you to check if > your right has been revoked), also check points were established at > random to check that you DO have current privilege to drive, and that > driving without privilege was a capital offense. Would you want to > drive? IF you did have to, I bet you would want to spend a lot of effort > checking that you haven't been revoked.
But you're assuming there *is* some "really bad consequence". What if you rarely drive? What if there are few police officers in the parts of town that you frequent? What if you are approached by a cop while *walking* and he asks for an ID? He sees that your license is expired and confiscates it. Or, you go to cash a check at a bank and the bank officer does this on behalf of The State? I.e., any time you would *normally* use that credential you run the risk of it NOT being honored -- even if you aren't "punished" for this (your "punishment" is not being allowed to USE it).
> This is exactly like the case that can happen for some forms of > privilege, like access to shared memory, if this sort of access is to be > revoked asynchronously, it generally means that process doing it will be > aborted, or the process needs to not treat it as shared memory but use > some sort of kernel call to check the permission and do the access > atomically (instead of just accessing the memory).
Protect shared memory with a mutex. Hold it as long as you want. If I want to control that with a capability, I can wrap the mutex access with the capability: so, you can't *take* the lock without permission but, once held, I can't interfere with your holding the lock. It's a capability. I can make it "control" whatever I choose. And, implement whatever *else* I choose to ensure that this control makes sense. E.g., if I revoke access to a piece of memory, I could opt to *suspend* your process at the same time. Then, make a copy of the memory while someone else accesses it. Then, restore the original before resuming your process (and restoring your capability). I.e., you are *always* at the mercy of the kernel. I just have to ensure that I uphold any contracts that I have agreed to with you. And vice versa (of course, if *you* cheat, I can bitch-slap you! :> )
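The capability-wrapped mutex described above -- the capability gates *taking* the lock, but once held, revocation can't interfere -- might look like this toy sketch (invented names, single-process stand-in for a kernel object):

```python
import threading

class GatedLock:
    """A mutex whose acquisition is gated by a capability check. Once a
    task holds the lock, revoking its capability does not disturb the
    held lock; it only prevents *future* acquisitions."""
    def __init__(self):
        self._lock = threading.Lock()
        self._allowed = set()

    def allow(self, task):
        self._allowed.add(task)

    def revoke(self, task):
        self._allowed.discard(task)

    def acquire(self, task):
        if task not in self._allowed:
            raise PermissionError("no capability to take this lock")
        self._lock.acquire()

    def release(self):
        self._lock.release()
```

This is one concrete example of "the capability controls whatever I choose": here it controls entry, not tenure.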
> I agree, that in SOME cases, the asynchronous revocation is a good > model, but not all. In most cases where a notification/cooperative > revocation system makes sense, for reliability concerns, a backup > asynchronous method make sense, to allow you to revoke a malfunctioning > process, but at that point, since it is already malfunctioning (since it > didn't complete the cooperative revocation method), the problems imposed > on the task are likely reasonable. This doesn't mean that the > cooperative method was worthless.
I'm not claiming "one is good" and "the other is bad". I'm just trying to look at the realistic consequences of each approach. How to balance complexity, resources, etc. against "convenience" (for want of a better word :< ) I suspect most folks will just code as if they could lose a resource prior to using it or *while* using it. I imagine the result code from accessing the service/resource will be *all* they look at. And, that any signal handler for "resource revocation" will simply be undefined. It's just the least effort approach (it should be obvious that I expect folks to be lazy in their implementations!). When faced with this sort of condition, I *also* expect these folks to just report "FAIL" for their activities and not even *try* to get things "right" (i.e., "as good as possible in the circumstances")
> Also, non-backwards compatible specification changes ARE expensive. That > is just the way things work, at least if you want to be able to talk > about software having correctness. This does mean that you do want to > put some effort into defining your requirements, to put into them the > things you need to verify/prove correctness, but not things that you > don't need that add unneeded future limitations.
This is why I am spending the effort *now* considering how various scenarios are likely to be handled. I don't want to have to make a change down the road because I "discovered" something that "can't work". I'm not vain enough to think I can come up with the Right way to handle every situation. But, I *do* think I can come up with a practical way that handles most situations economically and *all* situations "properly", even if not efficiently. I can always decide *not* to revoke a capability! Then, *none* of the mechanism gets invoked.
Hi Don,

On Tue, 05 Nov 2013 12:54:15 -0700, Don Y <This.is@not.Me> wrote:


>Amoeba's "ticket" is far more efficient than my approach. It can be >copied, moved, etc. "for the cost of a long long" (IIRC).
In the original version yes ... later they went to a 256-bit ticket to include more end-point information and better crypto-signing.
>In my case, a trap to the kernel is required for each operation on a >"Handle" -- because it's a kernel structure that is being manipulated >(or referenced). > >I can still give user-land services the final say in what a Handle >*means* (along with the "authorities" that it conveys to its bearer). >But, you have to go *through* the kernel to get back to userland. > >A subtle difference [vs Amoeba]: if "task" (again, forgetting lexicon >differences) A decides to manipulate object H backed by service B, >in Amoeba's case, >B does all the work for each attempt A makes. >EVEN IF THE ATTEMPT IS DISALLOWED by H's authorizations. >B's resources are consumed even though A has no authority to use B's >object (H)!
In your case, kernel resources are consumed. 6 of one ... And unless you can prevent A from even connecting to B there will be "wasted" effort on B's part anyway. I may be misunderstanding, but ISTM that you're trying to pack too much into the meaning of capabilities [or possibly too much stock into prior authorization]. Regardless of how capabilities are implemented (user vs kernel), every system I have read about would divide the credentials and authorizations involved in this problem among multiple capabilities: - X(H) is a legal operation on H - B administers H - A can perform X(H) - A can connect to B - B can perform X(H) as a proxy - B can perform X(H) as proxy for A etc. It seems as if you want to go straight to the final one - but the question is: how do you get there? Who grants to A that final capability that implies all the others? To get that capability presumes that A can talk to B (or some other granting authority) in the first place ... which you seem to want to prevent. Obviously, B can tell the kernel that B administers H ... but how does the kernel know what A wants with B? How can A try to access H directly? "URN: A doesn't know about B." Ok, but then can B act as a proxy for anyone, or just for "authorized" users? Who decides A is authorized for H? B? How does B (or anyone else) know A wants access to H if A can't even ask? Amoeba and others solve the problem by letting B administrate. A connects to B, asks for access to H. A can present a ticket for H if it has one, or B can issue a ticket to A if A is allowed but doesn't have one. [Amoeba servers have a public access API which anyone can connect to ask for a ticket granting specific access to a managed object. After first getting the ticket, they can connect to actually perform the allowed operations. Getting access then is a 2 step process.] None of this requires free roaming user-space capabilities ... 
it all can be done with handles referencing secure capabilities kept by the kernel or by another credential server (the Kerberos model).
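Amoeba's scheme -- a ticket carries a check field that only the issuing server can regenerate -- can be approximated like this (a deliberately simplified sketch using HMAC; the real Amoeba layout and check-field algorithm differed):

```python
import hmac, hashlib, os

# Server-private key used to sign tickets; never leaves the server.
SECRET = os.urandom(16)

def issue_ticket(obj, rights):
    """Step 1: client asks the server for a ticket granting specific
    rights on a managed object; server returns a signed triple."""
    check = hmac.new(SECRET, f"{obj}:{rights}".encode(),
                     hashlib.sha256).hexdigest()
    return (obj, rights, check)

def verify_ticket(ticket):
    """Step 2: on each operation the client presents the ticket; the
    server recomputes the check field and rejects forgeries."""
    obj, rights, check = ticket
    good = hmac.new(SECRET, f"{obj}:{rights}".encode(),
                    hashlib.sha256).hexdigest()
    return hmac.compare_digest(check, good)
```

Note the property under discussion: a ticket is just bits, so the client can copy it freely and the server has no idea how many copies exist -- a forged *rights* field, though, fails verification.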
>In my case, if A tries to use one of B's resources (H), it first must >truly *be* one of B's resources (not just a long long that A *claims* >is managed by B). If not, the kernel disallows the transaction.
How does the kernel know H belongs to B? How does A know to ask for H in the first place?
>If H truly *is* backed ("handled") by B, then the kernel allows the >transaction -- calling on B to enforce any finer grained authorities >(besides "access"). I.e., B knows which authorities are available >*in* H and can verify that the action requested is one of those allowed.
What "transaction"? The set of possible objects and the actions that might need to be performed on them both are unbounded. A generic "do-it" kernel API that can evaluate every possible action on any object is a major bottleneck and a PITA to work with. Even if the high level programmer has a sweet wrapper API, the low level programmer has to deal with absolutely anything that can be pushed through the nonspecific interface. For decades, Unix has been moving toward more verbose APIs and away from trying to cram everything into ioctl(). [How many options do sockets have now? And how many different parameter blocks?] Linux, OTOH, went back-ass-wards with its new driver model in which every operation is performed by reading/writing some special file.
>(File systems are bad examples because they are so commonly used to >implement namespaces and not just "files")
A common directory service is fine, but I'm not particularly a fan of uniform "file" interfaces. I rather like the idea of being able to ask an object (or its managing proxy) what functions are available. Unfortunately, doing this generically is a PITA (so no one does it). If you are familiar with COM or Corba, it amounts to the server returning an IDL specification, and the program [somehow] being able to interpret/use the IDL spec to make specific requests.
>> ... revoking a master capability must also revoke any other >> capabilities derived from it [even if located on another host]. > >This means "something" must track history/relationships.
Yes. However it is necessary. If you no longer trust Q, then, by transitivity, you no longer trust anyone Q may have delegated to.
>It also says nothing about *when* the revocation takes place >(effectively) and when notification of that event occurs.
Yes. But as you said to someone else, every program must deal with the possibility of permission being denied. Under those circumstances, notification can be deferred until attempted use. System-wide synchronous revocation is impractical, but revocation can be done asynchronously if master capabilities are versioned and derived capabilities indicate which version of the master was in force when they were issued. It suffices for the owner/manager to be able to say "all capabilities for H [or better, X(H)] issued prior to CurVer(H) are no good". It also can be done with time stamping, but that presupposes a system wide synchronized notion of time. In practice, versioning is simpler.
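George's versioning scheme can be sketched in a few lines (names invented for illustration):

```python
class Master:
    """Versioned master capability: bumping the version invalidates every
    derived capability issued under an earlier version -- 'all
    capabilities issued prior to CurVer(H) are no good'."""
    def __init__(self):
        self.version = 1

    def derive(self):
        return {"issued_under": self.version}

    def revoke_all(self):
        self.version += 1   # asynchronous, no per-holder notification

    def honors(self, derived):
        return derived["issued_under"] >= self.version
```

No holder is ever tracked or notified; each deferred check at time of use is enough, which is why this works even when holders are offline.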
>I.e., in Amoeba's case, the kernel never knows who is holding which >(copies!) of a particular ticket (derived from some other ticket, etc.). >So, there is no way for it to know who to notify AT THE TIME OF >REVOCATION. Instead, it has to rely on the Holder(s) noticing that >fact when they *eventually* try to use their capabilities >(tickets/keys).
So? In your system host kernel's exchange capabilities and proxy for one another. How are you going to notify a host that's powered down?
>And, you are never sure when every ticket has been "discovered" to be >voided -- a task can have a copy of a ticket (you can hold multiple >copies of any ticket!) that he just hasn't got around to trying! > >Sort of like finding a bunch of keys in a desk drawer and not discarding >them because you're not quite sure you *want* to discard them (maybe >they still FIT something!)
The analogy is semi-flawed: capabilities shouldn't be thought of as student key cards that open some subset of the doors on campus. Properly a capability opens only one lock [i.e. addresses one object]. A rejected capability is known to be useless, so there's no point to keeping it. The "one lock" principle is applicable to replicated services: every instance of a particular service should answer to the same set of capabilities. Obviously a capability system *can* provide key card functionality, but you need to look at the situation in the opposite way: i.e. the student's key card doesn't open a group of locks, but rather a group of locks share capability to admit the card. Semantics ... but important semantics.
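The semantic inversion -- each lock holds the capability to admit the card, rather than the card "opening a group of locks" -- in miniature (illustrative names):

```python
class DoorLock:
    """Each lock independently records which card-capabilities it honors.
    The card itself carries no list of doors."""
    def __init__(self):
        self.admits = set()

    def grant(self, card_id):
        self.admits.add(card_id)

    def revoke(self, card_id):
        self.admits.discard(card_id)

    def opens_for(self, card_id):
        return card_id in self.admits
```

The payoff of the inversion: revoking at one lock is local and complete, with no need to hunt down or rewrite the card.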
>In my case, kernels are the only things that *hold* capabilities. >So, all kernels can be notified that a particular capability has been >revoked and they all *are* revoked. Just like if your kernel >chooses to delete a file descriptor (remembering that it is now >a zombie), any future references by you (the task) to that fd can >throw an error assuming you ignore the signal sent to notify you >that it has been destroyed).
But hosts may be offline: powered down or network partitioned. How long do you keep the record that a capability has been revoked? It just clutters up your store. At some point, you have to accept that a remote host may try to use a capability the resource's host no longer honors.
>Yes. My "factory" publishes Handles for key services that tasks may >want to avail themselves of. These are accessed by a single "Service >Locator" Handle that is given to each task (task == process == resource >container) as the task is created. [Conceivably, the Handle for this >service given to Task A can differ from Task B if the authorizations >between A and B are to be different!]. > : >The task can then contact the Handler behind that Handle -- i.e., the >service in question -- and make whatever requests it is authorized >to make (based on its Handle).
But who decides what permissions A and B have wrt the service?
>More importantly, the creating task can do all of this for the "child" >cramming the appropriate Handles for the Objects (incl Services) that >the child will need AND THEN DELETING THAT INSTANCE OF THE SERVICE >LOCATOR handle to effectively sandbox the child. I.e., these are >the resources you can use and operate on -- nothing more!
That's a nice feature. Amoeba didn't have this, but other capability systems did.
>If I were to tag Handles with "rightful owners", then proxies would >be more apparent. But, how do you validate a proxy's request for a >Handle on behalf of another? ("Please give me Bob's door keys...")
Again, this is a scenario of replicated service: local proxies should be considered an instance of the remote service. The user's capability to access the service lets it access the proxy. The proxy itself should have a separate capability to access the remote service so that the chain of trust remains valid.
>--don
George
Hi George,

[eliding a lot for fear of hitting upper message length limit]

On 11/7/2013 2:27 PM, George Neuner wrote:
> In the original version yes ... later they went to a 256-bit ticket to > include more end-point information and better crypto-signing.
OK. But that just changes the size of the copy. It still allows you to create as many copies as you want -- without anyone knowing about them. And, makes "a certain bit pattern" effectively the same as another copy of that capability!
>> A subtle difference [vs Amoeba]: if "task" (again, forgetting lexicon >> differences) A decides to manipulate object H backed by service B, >> in Amoeba's case, B does all the work for each attempt A makes. >> EVEN IF THE ATTEMPT IS DISALLOWED by H's authorizations. >> B's resources are consumed even though A has no authority to use B's >> object (H)! > > In your case, kernel resources are consumed. 6 of one ...
Yes. No free lunch. *Big* limitation but, I'm hoping, one with worthwhile tradeoffs!
> And unless you can prevent A from even connecting to B there will be > "wasted" effort on B's part anyway. > > I may be misunderstanding, but ISTM that you're trying to pack too > much into the meaning of capabilities [or possibly too much stock into > prior authorization].
A user (task) somehow gets a set of "authorizations" to a particular object (an object may actually be a service, another task/thread, etc.). This could come from a "parent" task handing the authorizations and object reference -- together called a Handle, in my lexicon -- to the task. Or, from the task requesting that (object,authorization) from some chain of "directory" services -- ultimately terminating at a service that is responsible (and capable!) of satisfying this request. The user then wants to invoke a method supported by that object. The Handle (which indicates the object and the authorizations thereof FOR THIS INSTANCE OF THE HANDLE) is presented to the kernel in an IPC/RPC request (wrapper for the method to be invoked). If the user doesn't have the *right* to connect to the "service" that implements that object, then the RPC fails before it gets started. I.e., a task can't talk to anything that it doesn't have the *right* to talk to (this is a more fundamental "permission" than the "authorizations" implemented in the capability/Handle). I.e., I can disconnect your Handle from the service that backs it and you're just a spoiled brat crying in a sandbox. Nothing you can do about it -- even if you *had* the authorizations to do grand and wonderful things! I've just "unplugged" the cable tying you to that service. Once the kernel has decided that you *can* "talk" to that service (the one that backs the object in question), the IPC/RPC proceeds (marshal arguments, push the message across the comm media, await reply, etc.). On the receiving end, the service sees your request come in. Knows the object to which it applies (because of which "wire" it came in on), identifies the action you want to perform (because of the IPC/RPC payload) and *decides* if you have been allowed to do that! It does so by noting what permissions it has *recorded* for your Handle when it *gave* you that Handle (or, when someone else gave it to you on its behalf). 
If the recorded permissions/authorizations allow the action that you have requested to proceed, then the service implements those actions and completes the IPC/RPC accordingly (possibly returning ERROR if some OTHER, non-permission-related aspect of the action fails). As the Handler makes the *final* determination as to whether or not it wants to *do* whatever you've asked it to do to the referenced object, it is free to define any number of such actions -- and any number of arbitrary constraints on them! E.g., it may let *you* write numbers into a file but someone else can only write *letters* -- to that same file! (I have no idea why this would be important :> ) So, unlike Amoeba and other ticket-based systems, the number of "authorizations" isn't defined by a bitfield *in* the "ticket/key". Rather, it's whatever the Handler considers to be important. "I'll let you send a message to this email_address_t -- but, it has to be a short one." "I'll let you send a message to this email_address_t -- but it can't have any attachments!" "I'll let you send a message to this email_address_t -- but it can't contain any profanity" "I'll..." Much of the implementation is Mach-inspired. Think of Handles as port+authorizations. Handles that don't implicitly have *send* rights to the receiving port (which is held by the "Handler") can't reference it (remembering that send rights can be revoked. I.e., the holding task can be "disconnected" if the Handler decides he is being abusive, etc.)
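A toy model of the flow just described -- the kernel checks only the right to talk (the "wire"), while the Handler makes the final, fine-grained decision. All names here are invented for illustration:

```python
class Handler:
    """Service backing an object: records per-handle authorizations and
    makes the FINAL decision on each request."""
    def __init__(self):
        self.auth = {}                  # handle id -> allowed methods

    def register(self, hid, methods):
        self.auth[hid] = set(methods)

    def invoke(self, hid, method):
        if method not in self.auth.get(hid, ()):
            return "DENIED"             # Handler-level authorization check
        return f"done: {method}"

class Kernel:
    """Kernel only routes: it checks that the handle still has a 'send
    right' to its Handler; it never interprets the request itself."""
    def __init__(self):
        self.wires = {}                 # handle id -> Handler

    def connect(self, hid, handler):
        self.wires[hid] = handler

    def unplug(self, hid):
        self.wires.pop(hid, None)       # "unplug the cable"

    def rpc(self, hid, method):
        handler = self.wires.get(hid)
        if handler is None:
            return "NO ROUTE"           # crying in the sandbox
        return handler.invoke(hid, method)
```

The two layers fail differently: an unplugged wire fails before the service ever sees the request; a connected wire with insufficient authorization fails inside the Handler.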
> Regardless of how capabilities are implemented (user vs kernel), every > system I have read about would divide the credentials and > authorizations involved in this problem among multiple capabilities: > > - X(H) is a legal operation on H
I.e., there is an IDL for X(H)
> - B administers H
... and task B holds the receive rights for the port that references H (so, any references to H USING THAT HANDLE will end up in B's lap)
> - A can perform X(H)
... because "someone" told B to allow those permissions for requests coming in on the port assigned (given) to A by which it can access object H
> - A can connect to B
... because A (still) holds a send right to the port for which B is the receiver
> - B can perform X(H) as a proxy
... because it is B's job to implement X on H (or, to know how to get *other* agents to perform portions of that operation) A doesn't know *how* to "read a file", "turn on a motor", etc. I.e., the methods associated with H
> - B can perform X(H) as proxy for A
As above.
> It seems as if you want to go straight to the final one - but the > question is: how do you get there? > > Who grants to A that final capability that implies all the others? To > get that capability presumes that A can talk to B (or some other > granting authority) in the first place ... which you seem to want to > prevent.
In the Beginning, ... :>
> Obviously, B can tell the kernel that B administers H ... but how does > the kernel know what A wants with B?
Kernel doesn't *care* what A's intentions are! Doesn't *want* to care! It wants *H* to determine what can be done -- on H! Expects "someone" (task) to implement those actions -- call him B, Q or Elephant. All kernel does is let these two parties talk to each other. And, prevent others from talking that don't have the "right" (deliberate choice of word) to talk to each other. The Handler for an object ultimately implements the permission(s) and actions ("Sorry, I don't want to do that for you and you can't make me!")
> How can A try to access H directly?
A has no knowledge of who is "backing" H. A starts with a *name* for an object (assuming it isn't trying to *create* a yet-to-be-named object). It consults a namespace (another Object that has been created for it and, to which, it has been given access "authorizations" -- of some degree) that has been created for its use. Only things that are referenced in that namespace "exist", as far as A is concerned! Think of it as chroot($HOME) -- /etc/passwd doesn't exist in that context unless *you* happen to have coincidentally created your own "object" and named it such.

The namespace, like any other object, is "backed" (handled) by some active entity. When you use the Handle that you have been pre-endowed with (by init?) to access (and operate on!) that namespace, you can ask the namespace to resolve a name... however "names" are defined in your namespace (e.g., names might be simple integers, or 8000 character strings, or binary numbers, or...).

You obviously must have some agreed upon convention WITH THE ENTITY THAT CREATED YOUR NAMESPACE about how names are defined -- and possibly *used* -- in that namespace. That convention may be different for some other namespace -- even if that other namespace is handled by the same active entity! All that matters is the agreed upon syntax of the API -- as evidenced in the IDL for that "method" -- and the conventions you agree to (when your code was written).

When you "lookup" a name, the namespace service (for that namespace, yada yada yada) gives you a Handle to the *object* that is paired with the name you provided. Or, "ERROR_NOT_FOUND", etc. Again, by convention, you know the type of the object that you have just been granted a "reference" to. So, you know what methods you can *potentially* ask to be performed by that "object" on your behalf. The Handler that backs that object (referenced in your Handle), holds the receive right (Mach-speak) for that "port". (You now hold a *send* right to it.)
When that Handle is used in an IPC/RPC, the identifier of the particular IPC/RPC "method" of interest, along with any arguments involved, will be delivered to the Handler holding that receive right FOR THAT PORT (meaning the *object* associated with that port/Handle).

If, for example, "H" is the file system, then you might be asking B to "create a new file" in that filesystem. Where in the *real* filesystem it actually resides may be hidden from you. All you care is that you will subsequently be able to access it using the name "foo" -- that you provided (presumably avoiding any conflict with other names IN YOUR NAMESPACE -- because the Handler for your namespace won't let you create a "new name" that conflicts with an "old name"; part of the convention that you adhere to when you interact with a Namespace object!)

Presumably, you will put something in this file. Or, perhaps not. Maybe your role was just to create it, prevent its deletion and place it into a *new* namespace that you will pass onto one of your "offspring" -- so *it* can fill it with content!
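The lookup-and-bind behavior described above can be sketched very compactly. Again, this is an illustrative toy (names like `Namespace`, `bind`, `resolve` are mine, not the actual IDL): the only things that "exist" for a task are the bindings in its namespace, and the namespace's Handler refuses conflicting names.

```python
# Illustrative per-task namespace: name -> opaque Handle.
# Only names bound here "exist" for the task holding this namespace.

class Namespace:
    def __init__(self):
        self.bindings = {}  # name -> Handle (an opaque reference)

    def bind(self, name, handle):
        # the Handler won't let a "new name" clash with an "old name"
        if name in self.bindings:
            return "ERROR_EXISTS"
        self.bindings[name] = handle
        return "OK"

    def resolve(self, name):
        # anything not bound here simply doesn't exist for this task
        return self.bindings.get(name, "ERROR_NOT_FOUND")
```

Two tasks given two different `Namespace` objects can use the same name for entirely different objects -- or have no names in common at all, which is the sandboxing property.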
> "URN: A doesn't know about B." Ok, but then can B act as a > proxy for anyone, or just for "authorized" users? Who decides A is > authorized for H? B? How does B (or anyone else) know A wants access > to H if A can't even ask?
Who decides that UID "don" can access ~don but not ~george?
> Amoeba and others solve the problem by letting B administrate. A > connects to B, asks for access to H. A can present a ticket for H if > it has one, or B can issue a ticket to A if A is allowed but doesn't > have one.
Same thing, here.
> [Amoeba servers have a public access API which anyone can connect to > ask for a ticket granting specific access to a managed object. After > first getting the ticket, they can connect to actually perform the > allowed operations. Getting access then is a 2 step process.]
Same sort of approach. But, the kernel has no explicit knowledge of what that "specific access" entails. It just routes messages between endpoints after ensuring that you have the "right" to use a particular endpoint!
> None of this requires free roaming user-space capabilities ... it all > can be with handles referencing secure capabilities kept by the kernel > or another credential server (Kerberos model).
User-space capabilities allow the kernel to get out of the loop. But it means that the kernel can't *do* anything to control the proliferation of copies, etc.
>> In my case, if A tries to use one of B's resources (H), it first must >> truly *be* one of B's resources (not just a long long that A *claims* >> is managed by B). If not, the kernel disallows the transaction. > > How does the kernel know H belongs to B?
It doesn't. It just pushes a message down that "pipe" and... Gee, look, B is suddenly READY to execute, again! How'd that happen? :>
> How does A know to ask for H in the first place?
Convention. How do you know to ask for ~/.profile when a user logs in? Why not /foo/biguns?
>> If H truly *is* backed ("handled") by B, then the kernel allows the >> transaction -- calling on B to enforce any finer grained authorities >> (besides "access"). I.e., B knows which authorities are available >> *in* H and can verify that the action requested is one of those allowed. > > What "transaction"? The set of possible objects and the actions that > might need to be performed on them both are unbounded.
Yes. Kernel cares not about *what* A is asking B to do on H. Does your UNIX box care if you push "ABCD" down a particular named pipe to some random process on the other end? All it does is make the mechanism available to you as an AUTHORIZED USER of that mechanism. The fact that ABCD causes the receiving process to erase every odd byte on /dev/rdsk is no concern of the kernel!
> A generic "do-it" kernel API that can evaluate every possible action > on any object is a major bottleneck and a PITA to work with. Even if > the high level programmer has a sweet wrapper API, the low level > programmer has to deal with absolutely anything that can be pushed > through the nonspecific interface.
Handlers and Holders conspire as to what actions they want/need to support. If you want to be able to erase every odd byte on the raw disk device, then *someone* has to write the code to do that! If you want to ensure this action isn't casually initiated, then someone has to enforce some "rules" as to who can use it -- and even *how*/when (e.g., you might have authorization to do this, but the Handler only lets it happen on Fridays at midnight). Let the Handler and Holder decide what makes sense to them!

I wanted to keep the kernel out of the "policy" issues and just let it provide/enforce "mechanism". Unfortunately, it makes the kernel a bottleneck as all IPC/RPC has to be authenticated there. But, it gives me a stranglehold on "who can do what". It also gives Handlers the ability to decide what constitutes abuse of privilege -- *its* privilege! And, provides far more refined ideas of what those privileges actually *are*.

E.g., the email example (that I seem to have become obsessed with). I can have "something" put textual representations of email addresses in the RDBMS. Something (else?) can pull them out, wrap them in a "method" and hand them to "consumers". Those consumers can invoke the method (".sendmail") on the object (address) and never anything more. If I later want to ensure they can't continue to use that object (email address), I can revoke their "authorization" to use that method on that instance of that object. (Or, I can "unwire" the Handle completely -- so, any future operation throws an error.)
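The "Fridays at midnight" example boils down to a two-gate check that lives entirely in the Handler. A hypothetical sketch (function and action names are mine): first gate is "does this Handle carry the authorization at all?", second is whatever extra policy the Handler chooses to impose.

```python
# Hypothetical Handler policy: authorization alone is necessary
# but not sufficient; the Handler adds its own rules on top.

def handler_allows(holder_auths, action, weekday, hour):
    # gate 1: does this Handle carry the authorization at all?
    if action not in holder_auths:
        return False
    # gate 2: Handler-imposed rule beyond mere authorization --
    # destructive ops only on Fridays (weekday 4) at midnight
    if action == "erase_odd_bytes":
        return weekday == 4 and hour == 0
    return True
```

The kernel authenticated the connection and delivered the request; everything above is policy that only the Handler and Holder agreed upon.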
> For decades, Unix has been moving toward more verbose APIs and away > from trying to cram everything into ioctl(). [How many options do > sockets have now? And how many different parameter blocks?]
My approach is more like pushing untyped data through a function interface and knowing that the thing on the other end will make sense of it. The IDL lets "humans" agree on just what any particular set of data on a particular interface are LIKELY to mean!
> Linux, OTOH, went back-ass-wards with its new driver model in which > every operation is performed by reading/writing some special file.
This is the Inferno way, as well. In some aspects, its nice. But, its also tedious.
>> (File systems are bad examples because they are so commonly used to >> implement namespaces and not just "files") > > A common directory service is fine, but I'm not particularly a fan of > uniform "file" interfaces. I rather like the idea of being able to > ask an object (or its managing proxy) what functions are available.
I don't have a filesystem. I have *namespaces*. *Multiple* namespaces. Filesystems traditionally bound names (and containers) to "magnetic domains on a medium". Then, to "drivers" for particular devices. In my case, a namespace binds a name to a Handler. What that Handler does and how it does it can have absolutely nothing in common with any other Handler in the system. The *namespace* "object" has operations that can be performed on it (methods defined in the IDL that can be applied to any Handle that references that particular *flavor* of namespace). E.g., resolve(), create(), delete(), etc. But, it has no sense of reading/writing *to* the Handles that it manages.
> Unfortunately, doing this generically is a PITA (so no one does it). > If you are familiar with COM or Corba, it amounts to the server > returning an IDL specification, and the program [somehow] being able > to interpret/use the IDL spec to make specific requests.
I don't implement a full-fledged factory. Rather, I assume you know everything there is to know about the objects with which you are interacting. That you and their Handlers have conspired beforehand to some set of agreed upon methods (abilities? trying to avoid using the word "capabilities"). So, when you decide to revoke the "move motor left at high speed" authorization from a Handle that previously *had* that authorization, *you* and the Handler know what this means. The kernel doesn't care! If, tomorrow, you decided to implement a "reduce motor operating current until full stall" authorization, so be it. Kernel never changes. None of the other "tasks" change. Just users of that IDL (and, specifically, this new method added to it)
>> It also says nothing about *when* the revocation takes place >> (effectively) and when notification of that event occurs. > > Yes. But as you said to someone else, every program must deal with > the possibility of permission being denied. Under those > circumstances, notification can be deferred until attempted use.
I'm trying to find a middle ground. I don't want a Holder to have to "poll" to see if an authorization is still valid (or, that even the *object* to which that authorization applied still exists!). Nor do I want to prenotify before revoking authorizations (or deleting objects or unwiring connections or...).

I figured the best compromise (noun: a situation where EVERYONE gets screwed) is to allow asynchronous revocation but provide a notification ex post facto. I.e., if they haven't *yet* tried to exercise the authorization, they get notified. If they are in the process of using it, they may or may not succeed (depends on how the race is won). And, if they don't *care*, they can ignore the notification and wait until they try to use the authorization, later!

<shrug> It *seems* like the most bang for the least buck.
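That compromise -- revoke immediately, notify after the fact -- can be shown in a few lines. A toy sketch (the class and method names are illustrative): revocation takes effect the instant the kernel acts, a notification is queued for the Holder, and any later attempt to use the stale Handle fails regardless of whether the Holder has read the notification yet.

```python
# Toy model of asynchronous revocation with ex post facto notification.
# Revocation is immediate at the kernel; the Holder learns afterwards.

class Kernel:
    def __init__(self):
        self.live = set()    # Handles currently "wired"
        self.pending = []    # notifications queued for Holders

    def create(self, handle_id):
        self.live.add(handle_id)

    def revoke(self, handle_id):
        # takes effect now; the Holder is told after the fact
        self.live.discard(handle_id)
        self.pending.append(("REVOKED", handle_id))

    def use(self, handle_id):
        # a Holder may try a stale Handle any time; the kernel knows better
        return "OK" if handle_id in self.live else "ERROR_REVOKED"
```

A Holder that ignores `pending` just discovers the revocation lazily, on its next `use()` -- which is exactly the "wait until they try to use it" path described above.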
> System-wide synchronous revocation is impractical, but revocation can > be done asynchronously if master capabilities are versioned and > derived capabilities indicate which version of the master was in force > when they were issued.
No need for versioning. Handles are unique -- not "reused" (until all references to it are known to be gone). As they can't be duplicated (without the kernel's involvement), it knows when it is safe to reuse a stale Handle. (A task can *try* to hold onto it but the kernel that serves that task *knows* it doesn't exist anymore. "File descriptor 27 is no longer attached to a file -- regardless of what you may *think*!")
>> I.e., in Amoeba's case, the kernel never knows who is holding which >> (copies!) of a particular ticket (derived from some other ticket, etc.). >> So, there is no wy for it to know who to notify AT THE TIME OF >> REVOCATION. Instead, it has to rely on the Holder(s) noticing that >> fact when they *eventually* try to use their capabilities >> (tickets/keys). > > So? In your system host kernel's exchange capabilities and proxy for > one another. How are you going to notify a host that's powered down?
The tasks running on that host (whose Handles are held *in* that host!) are dead. They can't access anything even if they wanted to!

The handles in *other* hosts that reference objects *backed* by tasks in that host are told that the other end has come unplugged. So, all of *those* Handles cease to exist (and they are notified). If tasks on the down host referenced objects on these "up" hosts, the Handlers for each of those objects are told that the connection is broken and they need no longer expect requests on those Handles.

The problem is more one of *recovery* after the fact. How do you rebuild these connections? I currently have no notion of persistence in the system. Once it goes down, it reboots from scratch -- anything in progress is lost (unless the agents doing the work deliberately elected to create persistent objects from which they could resume operations).
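The "unplug everything wired to a dead host" step is mechanical. A hypothetical sketch of the surviving host's bookkeeping (names are mine): every local Handle records which remote host backs it, and a host-down event invalidates exactly that set, returning it so the Holders can be notified.

```python
# Illustrative bookkeeping on a surviving host: when a remote host
# dies, every Handle it backed ceases to exist, and we learn which
# Holders need an after-the-fact notification.

class LocalKernel:
    def __init__(self):
        self.wired = {}  # handle_id -> name of the backing host

    def wire(self, handle_id, host):
        self.wired[handle_id] = host

    def host_down(self, host):
        # invalidate every Handle backed by the dead host
        dead = [h for h, backer in self.wired.items() if backer == host]
        for h in dead:
            del self.wired[h]
        return dead  # notify the Holders of these Handles
```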
>> And, you are never sure when every ticket has been "discovered" to be >> voided -- a task can have a copy of a ticket (you can hold multiple >> copies of any ticket!) that he just hasn't got around to trying! >> >> Sort of like finding a bunch of keys in a desk drawer and not discarding >> them because you're not quite sure you *want* to discard them *maybe >> they still FIT something!) > > The analogy is semi-flawed: capabilities shouldn't be thought of as > student key cards that open some subset of the doors on campus. > > Properly a capability opens only one lock [i.e. addresses one object].
Yes. "Set of keys" implies "set of locks". If keys can be freely copied, there is no way to know where every copy resides. No way to *notify* the holder that a particular key no longer works: "The lock has been changed"
> A rejected capability is known to be useless, so there's no point to > keeping it.
Assumes you have *tried* the Handle and discovered it to be useless. Or, been notified (see above) that it has been revoked (rendered useless).

My point was that a set of 64 (or 256) bit values in memory tells you nothing about whether you should keep them -- or not. You'd have to go around "trying your keys" to see which ones are worth keeping. Much like finding a set of keys in a desk drawer: you try them on every lock you can think of. The ones that work, you set aside. The ones that don't, you decide if they are worth discarding (Hmmm... are there any locks I have forgotten to test??)

OTOH, if you don't want to test them (now), the only "safe bet" is to hold onto them -- just in case!
> The "one lock" principle is applicable to replicated services: every > instance of a particular service should answer to the same set of > capabilities. > > Obviously a capability system *can* provide key card functionality, > but you need to look at the situation in the opposite way: i.e. the > student's key card doesn't open a group of locks, but rather a group > of locks share capability to admit the card.
The kernel doesn't care about this. It's up to the Handler for the objects in question to make his implementation choice. E.g., two Handles (in the same or different tasks) can map onto the same object. A Handle can map onto multiple objects -- if a proxy handling the Handle acts on your behalf ("The phone only rings in one location. If you want to be able to call two people, you need two phone numbers and the ability to dial both/either.") Two file descriptors in different (or same) processes can reference the same file. If you want to reference *two* files, you need to have a proxy that knows how to interpret your request(s) for each file (said proxy having two file descriptors). Or, do it yourself as two fd's.
> Semantics ... but important semantics. > >> In my case, kernels are the only things that *hold* capabilities. >> So, all kernels can be notified that a particular capability has been >> revoked and they all *are* revoked. Just like if your kernel >> chooses to delete a file descriptor (remembering that it is now >> a zombie), any future references by you (the task) to that fd can >> throw an error assuming you ignore the signal sent to notify you >> that it has been destroyed). > > But hosts may be offline: powered down or network partitioned. How > long do you keep the "expiration" of a capability? That just clutters > up your store.
When the host comes back up, the local Handle doesn't exist. Memory is empty. The local kernel has no knowledge of what happened before the lights went out. If you are incommunicado for "too long" (whatever that means), others come to the conclusion that you are powered off. Anything "wired" into you is invalidated. Come back on-line and *claim* you've been running all this time regardless of how it looks? "Gee, that's too bad. We thought you had moved out and sold all your stuff..."
> At some point, you have to accept that a remote host may try to use a > capability the resource's host no longer honors. > >> Yes. My "factory" publishes Handles for key services that tasks may >> want to avail themselves of. These are accessed by a single "Service >> Locator" Handle that is given to each task (task == process == resource >> container) as the task is created. [Conceivably, the Handle for this >> service given to Task A can differ from Task B if the authorizations >> between A and B are to be different!]. >> : >> The task can then contact the Handler behind that Handle -- i.e., the >> service in question -- and make whatever requests it is authorized >> to make (based on its Handle). > > But who decides what permissions A and B have wrt the service?
How do you decide that task A should be able to turn the motor on but not task B? You MAKE THAT DECISION and then you put it in the code. Unless the code gets rewritten (or bug), B simply never thinks about talking to the motor.
>> More importantly, the creating task can do all of this for the "child" >> cramming the appropriate Handles for the Objects (incl Services) that >> the child will need AND THEN DELETING THAT INSTANCE OF THE SERVICE >> LOCATOR handle to effectively sandbox the child. I.e., these are >> the resources you can use and operate on -- nothing more! > > That's a nice feature. Amoeba didn't have this, but other capability > systems did.
I think it is important for things like init -- to be able to go away (free up its resources AND ITS UTMOST PRIVILEGE LEVELS!)
>> If I were to tag Handles with "rightful owners", then proxies would >> be more apparent. But, how do you validate a proxy's request for a >> Handle on behalf of another? ("Please give me Bob's door keys...") > > Again, this is a scenario of replicated service: local proxies should > be considered an instance of the remote service. The user's > capability to access the service lets it access the proxy. The proxy > itself should have a separate capability to access the remote service > so that the chain of trust remains valid.
Exactly. A on host 1 doesn't talk to the Handle for B on host 2. A, instead, talks to a proxy on host 1. The kernels have conspired to wire this proxy to another proxy (actually, a part of the remote kernel) on host 2 that, in turn, connects to B.

So, when host 2 dies, the proxy on host 1 sees that (because the kernel on 1 loses contact with kernel 2 -- anything that is "wired" to that remote kernel is now notified of the failure). That in turn is propagated up to A, et al.

Never instantaneous. But, anything "in the works" when the host goes down fails to see a completion code so knows it has been unceremoniously aborted "in progress". (See why I think async notifications ex post facto are the only realistic solutions?)

Now, to see if news server bellyaches about length of this post... <cringe>