> One I can think of is a form of dynamic linking. You don't load in all
> the shared libraries at the beginning, the branch instructions simply
> point to pages not backed by physical RAM. When you take a branch you
> trap, and then the dynamic linker goes in, loads the library, and fixes up
> all the branch instructions to point to the actual symbols in the dynamic
> library. That's roughly demand-paging but with additional fixing up afterwards.
The A5 world!
Reply by Don Y●July 17, 2020
On 7/15/2020 11:02 PM, George Neuner wrote:
> On Tue, 14 Jul 2020 11:48:14 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> On 7/14/2020 3:21 AM, George Neuner wrote:
>>> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
>>> <blockedofcourse@foo.invalid> wrote:
>>>> The ONE question I am posing here is "are there any?"
>>>
>>> I'm not sure what you're getting at here: what would be a "legal"
>>> attempt to execute NX protected memory?
>>
>> The same sort of thing that would be a "legal" attempt to read
>> "no read" memory. In THAT case, deliberately exploiting the protection
>> mechanism to detect references to memory that hasn't yet been mapped!
>>
>> A similar case exists for trapping writes to "no write" memory (e.g., CoW).
>>
>> Both are examples of "legal" accesses that have been preempted to
>> implement FEATURES. So, code that relies on these hooks would have to
>> <panic> if the hooks were not available.
>
> Not really. There are legitimate reasons to read-protect readable
> memory: to enable hardware assisted GC for instance.
You're just reinforcing my distinction of cases where a particular protection
is REQUIRED by the code vs. DESIRED by the developer. If your code RELIED
on hardware assist for GC (note my emphasis on RELIED!), then, if the hosting
processor didn't provide that support, your code would fail/panic.
> But you don't GC executable code ... at least not *while* it still is
> executable ... so the analogy to protecting readable data doesn't
> hold.
Marking a part of memory as "no execute" -- KNOWING that you will
eventually want to execute <something> that (will) reside there -- is
comparable to marking a part of memory as "no read/access" KNOWING that
you will eventually want to read something that WILL reside there,
after it is "loaded" (or "something else") from some other mechanism
(which may or may not be backing store).
Marking it as "no execute" because it's an embedded JPEG image is entirely
different -- your code SHOULDN'T be trying to execute a JPEG; you
wouldn't rely on this mechanism for your code's correctness.
> The idea that you might try to verify "right to execute" doesn't wash
> either, because once a code is unprotected anyone can execute it - not
> just one process that somehow has been "verified".
It's not a question of "verifying right to execute" -- any more than
your GC example is "verifying right to read". You develop the algorithm
(GC in your case) with a reliance (or not) on having a particular
mechanism at your disposal. In the absence of that mechanism (note my
"ONE question"!), you either <panic> or fall back on an alternative
algorithm (i.e., if you don't want to code and maintain an alternative
algorithm -- or, if that algorithm doesn't satisfy the same requirements
as the first -- then you can only <panic>).
And, you can only execute code that is made available to you (process).
UNavailable means "not in YOUR address space" *or* "not marked as
executable". Just like data that isn't in your address space (or is
marked as no-write) isn't writable even if someone ELSE can write it!
[I can't prevent individual THREADS in the same process container as the
INTENDED thread from executing that code. But, that's not a loss of
control as the "intended" thread could just as easily decide to do
whatever the other threads would!]
> Doing this kind of
> thing requires segmentation (or real capabilities), not just page NX
> (unless the protection somehow is virtualized individually for all
> processes - which is not the case with any existing CPU).
Huh? I can set a "no execute" permission at page granularity.
And, set that protection ONLY in the translation table for process A.
Process B's page table may define it as R/W. And, process C's may
define it as "no access", at all! So, process B could, potentially,
be building the executable image while A is potentially tied up in a
trap (if it had tried to execute the page's contents before they were
ready).
Do you expect ALL processes to share a single address space?
I'm designing the SPARC implementation, presently. The smallest
page size is a bit larger than I'd like -- but, then again, I don't
plan on deploying a SPARC box (but, they are excellent for testing code
portability, endianness tolerance, on-the-fly data format conversions,
etc.)
It also illustrates the problems encountered when designing the "protection"
API -- SPARC only lets me support R/O, R/W, R/X, R/W/X, X/O and "no access".
After that, I'll tackle the ARM port.
I'm hoping to demo a heterogeneous system for the upcoming off-site.
Showing real-time redundancy support among dissimilar hosting nodes
should be impressive! (i.e., unplugging a SPARC and watching the
services that it was supplying migrate to an ARM or x86 -- and
/vice versa/ -- while still satisfying any timeliness constraints
that were in effect at the time of the interruption)
[Beyond that, as I said to Theo, up-thread, I'm not keen on discussing what
I'm doing or how I'm doing it! The magic is left as an exercise for the
reader -- or, someone sufficiently interested to explore how to ACTUALLY
do it (beyond an armchair interest)]
Reply by George Neuner●July 16, 2020
On Tue, 14 Jul 2020 11:48:14 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
>On 7/14/2020 3:21 AM, George Neuner wrote:
>> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
>> <blockedofcourse@foo.invalid> wrote:
>>
>>> For no-execute, I don't see any use beyond that of enhancing
>>> confidence in the codebase; I don't see any *features* that
>>> rely on detecting "legal" attempts to execute particular
>>> regions of memory (assuming execute-access is treated as
>>> a special case of read-access).
>>>
>>> The ONE question I am posing here is "are there any?"
>>
>> I'm not sure what you're getting at here: what would be a "legal"
>> attempt to execute NX protected memory?
>
>The same sort of thing that would be a "legal" attempt to read
>"no read" memory. In THAT case, deliberately exploiting the protection
>mechanism to detect references to memory that hasn't yet been mapped!
>
>A similar case exists for trapping writes to "no write" memory (e.g., CoW).
>
>Both are examples of "legal" accesses that have been preempted to
>implement FEATURES. So, code that relies on these hooks would have to
><panic> if the hooks were not available.
Not really. There are legitimate reasons to read-protect readable
memory: to enable hardware assisted GC for instance.
But you don't GC executable code ... at least not *while* it still is
executable ... so the analogy to protecting readable data doesn't
hold.
The idea that you might try to verify "right to execute" doesn't wash
either, because once a code is unprotected anyone can execute it - not
just one process that somehow has been "verified". Doing this kind of
thing requires segmentation (or real capabilities), not just page NX
(unless the protection somehow is virtualized individually for all
processes - which is not the case with any existing CPU).
>OTOH, if you use the protections to detect errant accesses ("can't happen"),
>then you are relying on it to reinforce your belief that your code is robust.
>The absence of the hooks might be troubling (if you lack confidence in
>your testing regime) but shouldn't interfere with your code's proper operation.
>
>> Obviously something like a JIT compiler needs to be able to switch off
>> NX for the code buffer so a program can execute it ... but there's no
>> point to allowing access before the code is ready
>
>Unless you *truly* JIT the code "as needed" (i.e., a memory region at
>a time) *or* are JIT'ing in parallel with execution. The executing thread
>could be accessing "already JIT'ed" portions of the code without having
>to wait for the JIT compiler to "finish".
Demand JIT is no problem, even with multiple threads, as long as it is
done function by function, and function calls are indirected through a
jump table. The initial table entry traps into the runtime so callers
get blocked until the compile finishes, then the table entry is
changed to point to the new code [and waiting callers are unblocked].
Runtime/OS trap, yes. MMU fault, no!
There are parallel JIT systems that initially compile with little/no
optimization, and then monitor demand for the native code. When a
code path is found to be in "high" demand (for some definition) then
it gets recompiled with better optimization.
But that requires allowing multiple versions of the code to exist
simultaneously, so as not to disrupt programs that may be USING the
existing version while a new one is being created. Doing this sort of
thing requires significant resources in the host.
And in some cases, such systems retain ALL versions of the code and
each process is allowed to trap individually into optimized code
paths. This lets the runtime focus on high demand code paths and not
waste resources optimizing low demand paths, e.g., low frequency
exception handlers.
>[Think about how a code image would be built, in detail, if it spanned
>multiple memory allocation units AND YOU DIDN'T HAVE A BACKING STORE!]
I certainly would *NOT* try to compile in parallel with execution
under those circumstances: regardless of whether the "distribution"
source image remains available somewhere else, I would do a batch
compile to create the native execution image.
And I certainly would not overwrite the source image as I compiled it.
At best, I would compile it in convenient chunks, building up the
execution image piece by piece, and discarding source chunk by chunk
only when no longer needed.
Android has done this for years, since v5 "Lollipop" when they ditched
the Dalvik machine for the ART runtime. When you install an app now,
the bytecode "distribution" version immediately gets translated into
native code. The native version is stored on the device and the
bytecode version is discarded.
George
Reply by Don Y●July 14, 2020
On 7/14/2020 3:21 AM, George Neuner wrote:
> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> For no-execute, I don't see any use beyond that of enhancing
>> confidence in the codebase; I don't see any *features* that
>> rely on detecting "legal" attempts to execute particular
>> regions of memory (assuming execute-access is treated as
>> a special case of read-access).
>>
>> The ONE question I am posing here is "are there any?"
>
> I'm not sure what you're getting at here: what would be a "legal"
> attempt to execute NX protected memory?
The same sort of thing that would be a "legal" attempt to read
"no read" memory. In THAT case, deliberately exploiting the protection
mechanism to detect references to memory that hasn't yet been mapped!
A similar case exists for trapping writes to "no write" memory (e.g., CoW).
Both are examples of "legal" accesses that have been preempted to
implement FEATURES. So, code that relies on these hooks would have to
<panic> if the hooks were not available.
OTOH, if you use the protections to detect errant accesses ("can't happen"),
then you are relying on it to reinforce your belief that your code is robust.
The absence of the hooks might be troubling (if you lack confidence in
your testing regime) but shouldn't interfere with your code's proper operation.
> Obviously something like a JIT compiler needs to be able to switch off
> NX for the code buffer so a program can execute it ... but there's no
> point to allowing access before the code is ready
Unless you *truly* JIT the code "as needed" (i.e., a memory region at
a time) *or* are JIT'ing in parallel with execution. The executing thread
could be accessing "already JIT'ed" portions of the code without having
to wait for the JIT compiler to "finish".
[Think about how a code image would be built, in detail, if it spanned
multiple memory allocation units AND YOU DIDN'T HAVE A BACKING STORE!]
Reply by Don Y●July 14, 2020
On 7/14/2020 2:40 AM, Theo wrote:
> Don Y <blockedofcourse@foo.invalid> wrote:
>> On 7/13/2020 1:30 PM, Don Y wrote:
>>> The ONE question I am posing here is "are there any?"
>>
>> A colleague has come up with a few "non-contrived" examples
>> so the answer is "yes", apparently! OK, so, that will make
>> things quite a bit easier...
>
> Do you feel like sharing them?
One (or more) threads WITHIN a task (sharing the SAME logical
address space as the thread that may, eventually, try to EXECUTE
"marked" code) can be building/modifying that code image asynchronously
wrt the thread that wants to execute it. Think of the no execute
provision as a form of implicit lock -- instead of having to
(prematurely and possibly unnecessarily!) EXPLICITLY block the thread(s)
that might want to fetch opcodes from that region.
The same sorts of things apply inside the (multithreaded) kernel.
I.e., don't restrict your thinking to kernel vs. user activities
in a given memory region.
[Note that this can include activities beyond JIT'ing -- but I don't care to
expound on those]
As all threads (in a task/kernel) share the same address mappings/protections,
there's no way to ensure that thread X can diddle with the page while thread
Y can't (yet) execute its contents (unless you modify the protections on a
per THREAD basis AND have a uniprocessor).
OTOH, if you exploit the fact that X's accesses will be "read/write" (diddling)
while Y accesses as "execute" -- different permissions enable/restrict the
threads implicitly.
My interest is in the "protection" API... NOT wanting to have to special-case
execute permissions differently than read or write. If the developer's code
can tolerate NOT having a read or write permission that he had "hoped for",
then he should be able to tolerate not having an execute permission, too!
Conversely, if the developer RELIED on having read or write traps then he
should be able to express that dependency in the execute permission of the API.
I.e., if your *desire* for a particular set of permission(s) at a point in
your code is just to enhance reliability, then you can <shrug> off not having
ANY of the desired set. OTOH, if your desire stems out of a NEED for a
particular set of permissions -- cuz your algorithm relies on them -- then
you will have to <panic>. The API has to expose these different cases to
the developer.
[Or, more properly, your task won't be admitted to a host that doesn't
advertise those capabilities as being available; you'll have to be
installed/migrated onto a host that DOES! Just like if your code
RELIES on floating point hardware being present (because the emulation
is too slow)]
Reply by Stefan Reuther●July 14, 2020
Am 14.07.2020 um 12:21 schrieb George Neuner:
> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
>> For no-execute, I don't see any use beyond that of enhancing
>> confidence in the codebase; I don't see any *features* that
>> rely on detecting "legal" attempts to execute particular
>> regions of memory (assuming execute-access is treated as
>> a special case of read-access).
>>
>> The ONE question I am posing here is "are there any?"
>
> I'm not sure what you're getting at here: what would be a "legal"
> attempt to execute NX protected memory?
>
> Obviously something like a JIT compiler needs to be able to switch off
> NX for the code buffer so a program can execute it ... but there's no
> point to allowing access before the code is ready.
The "+1" of a JIT compiler would be a virtualisation solution that has
paged in the whole guest system already, but wants to analyze or
redirect attempts to execute code.
As I understand, older virtualisation solutions for x86 worked that way
because x86 has some instructions that you want to virtualize but that
do not trap on older processors (e.g. sgdt, str). You had to play some
extra tricks because x86 originally didn't have per-page NX bits.
Stefan
Reply by George Neuner●July 14, 2020
On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
>For no-execute, I don't see any use beyond that of enhancing
>confidence in the codebase; I don't see any *features* that
>rely on detecting "legal" attempts to execute particular
>regions of memory (assuming execute-access is treated as
>a special case of read-access).
>
>The ONE question I am posing here is "are there any?"
I'm not sure what you're getting at here: what would be a "legal"
attempt to execute NX protected memory?
Obviously something like a JIT compiler needs to be able to switch off
NX for the code buffer so a program can execute it ... but there's no
point to allowing access before the code is ready.
George
Reply by Theo●July 14, 2020
Don Y <blockedofcourse@foo.invalid> wrote:
> On 7/13/2020 1:30 PM, Don Y wrote:
> > The ONE question I am posing here is "are there any?"
>
> A colleague has come up with a few "non-contrived" examples
> so the answer is "yes", apparently! OK, so, that will make
> things quite a bit easier...
Do you feel like sharing them?
I'm assuming you're discounting all the usual demand-paging things that
happen with read access - ie you're interested in execute rather than things
that would happen when reading the same page.
One I can think of is a form of dynamic linking. You don't load in all the
shared libraries at the beginning, the branch instructions simply point to
pages not backed by physical RAM. When you take a branch you trap, and then
the dynamic linker goes in, loads the library, and fixes up all the branch
instructions to point to the actual symbols in the dynamic library.
That's roughly demand-paging but with additional fixing up afterwards.
Another is if you want to enforce some kind of preconditions when executing
a particular piece of code. Perhaps we want to make sure you have the
secret token in a register before you're allowed to execute, or have called
the code from a permitted context. So the kernel maps the pages
non-executable and, when the trap is taken, it checks the preconditions
before deciding whether to allow execution to proceed.
A final one might be a form of JIT, for example in binary translation. You
don't want to translate the whole executable in one go, so you don't bother
translating anything pointed to by a branch, you just write a branch to
nowhere. If the branch to nowhere is taken, you then translate a new block
of code and fill in the branch to point to it. This might apply to higher
level language translation too - you can put 'branches to nowhere' to point
anywhere there's a route of control flow that you haven't yet handled, and
fix them up if that route ever gets taken. Or maybe you could decide to
interpret a particular code path instead of translating it - perhaps (for
whatever reason) you decide latency is more important than speed.
I'm sure we could think of some more...
Theo
Reply by Don Y●July 14, 2020
On 7/13/2020 1:30 PM, Don Y wrote:
> The ONE question I am posing here is "are there any?"
A colleague has come up with a few "non-contrived" examples
so the answer is "yes", apparently! OK, so, that will make
things quite a bit easier...
Reply by Don Y●July 13, 2020
Conceptually, memory (or a memory region) can be thought of as
being readable, writeable or executable -- and any combination
of these. Hardware *may* enforce these protections -- to
varying degrees.
Usually, you opt to PREVENT some set of these operations
(i.e., no write, no read, no execute). In many cases, this
acts as an invariant -- "this is data; nothing should be
mucking with it so mark it not-writeable; furthermore, I
know that it is not intended to be representative of
program opcodes so mark it as non-executable... and tell
me if something has been overlooked, at run-time, that
causes one of these types of accesses to transpire!"
So, it's a belts-and-braces mechanism.
But, for no-read and no-write, there are real uses
that rely on the hardware protections for successful
implementation (e.g., CoW, zero-copy semantics, etc.).
A developer expecting to use these hooks simply to
give enhanced confidence in his code (i.e., nothing
ever overflowing the stack) COULD, possibly, live
without them.
OTOH, a developer relying on them to implement specific
*features* REQUIRES their presence.
So, if, at run-time, you requested a particular set of
protections on a region of memory and they weren't
granted (e.g., because they aren't supported on this
processor instance), you might <shrug> in some cases
and <panic> in others.
For no-execute, I don't see any use beyond that of enhancing
confidence in the codebase; I don't see any *features* that
rely on detecting "legal" attempts to execute particular
regions of memory (assuming execute-access is treated as
a special case of read-access).
The ONE question I am posing here is "are there any?"