
Memory Protections

Started by Don Y July 13, 2020
Conceptually, memory (or a memory region) can be thought of as
being readable, writeable or executable -- and any combination
of these.  Hardware *may* enforce these protections -- to
varying degrees.

Usually, you opt to PREVENT some set of these operations
(i.e., no write, no read, no execute).  In many cases, this
acts as an invariant -- "this is data; nothing should be
mucking with it so mark it not-writeable; furthermore, I
know that it is not intended to be representative of
program opcodes so mark it as non-executable... and tell
me if something has been overlooked, at run-time, that
causes one of these types of accesses to transpire!"

So, it's a belts-and-braces mechanism.
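To illustrate that belt-and-braces use, a minimal sketch (POSIX assumed; mmap()/mprotect()
stand in for whatever protection interface the host exposes, and are not the API under
discussion): initialize a table while it is writable, then lock it down so any overlooked
write (or attempted execute) faults at run-time instead of silently corrupting it.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    /* "data" region: readable/writable while we initialize it... */
    unsigned char *table = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED) { perror("mmap"); return 1; }
    memset(table, 0xA5, pagesz);

    /* ...then locked down: no write, no execute.  Anything that was
       "overlooked" now traps instead of mucking with the table. */
    if (mprotect(table, pagesz, PROT_READ) != 0) {
        perror("mprotect");                 /* protection not granted */
        return 1;
    }

    printf("table[0] = %02x\n", table[0]);  /* reads are still fine */
    /* table[0] = 0;   <-- would now fault (SIGSEGV) */
    return 0;
}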

But, for no-read and no-write, there are real uses
that rely on the hardware protections for successful
implementation (e.g., CoW, zero-copy semantics, etc.).

A developer expecting to use these hooks simply to
give enhanced confidence in his code (i.e., nothing
ever overflowing the stack) COULD, possibly, live
without them.

OTOH, a developer relying on them to implement specific
*features* REQUIRES their presence.

So, if, at run-time, you requested a particular set of
protections on a region of memory and they weren't
granted (e.g., because they aren't supported on this
processor instance), you might <shrug> in some cases
and <panic> in others.
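A minimal sketch of that split (POSIX mprotect() standing in for whatever the host
actually offers; the wrapper and its 'required' flag are inventions for illustration,
not the API under discussion):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* 'required' says whether the algorithm *relies* on the trap (e.g., CoW)
   or merely benefits from it (belt-and-braces). */
static void request_protection(void *region, size_t len, int prot,
                               bool required)
{
    if (mprotect(region, len, prot) == 0)
        return;                                       /* granted */

    if (required) {
        perror("mprotect (algorithm depends on this)");
        abort();                                      /* <panic> */
    }
    perror("mprotect (only wanted for confidence)");  /* <shrug>, carry on */
}

int main(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return 1;

    request_protection(page, pagesz, PROT_READ, false);  /* nice to have */
    request_protection(page, pagesz, PROT_NONE, true);   /* must have    */
    return 0;
}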

For no-execute, I don't see any use beyond that of enhancing
confidence in the codebase; I don't see any *features* that
rely on detecting "legal" attempts to execute particular
regions of memory (assuming execute-access is treated as
a special case of read-access).

The ONE question I am posing here is "are there any?"
On 7/13/2020 1:30 PM, Don Y wrote:
> The ONE question I am posing here is "are there any?"
A colleague has come up with a few "non-contrived" examples
so the answer is "yes", apparently!  OK, so, that will make
things quite a bit easier...
Don Y <blockedofcourse@foo.invalid> wrote:
> On 7/13/2020 1:30 PM, Don Y wrote:
> > The ONE question I am posing here is "are there any?"
>
> A colleague has come up with a few "non-contrived" examples
> so the answer is "yes", apparently!  OK, so, that will make
> things quite a bit easier...
Do you feel like sharing them?

I'm assuming you're discounting all the usual demand-paging things that
happen with read access - i.e. you're interested in execute rather than
things that would happen when reading the same page.

One I can think of is a form of dynamic linking.  You don't load in all
the shared libraries at the beginning; the branch instructions simply
point to pages not backed by physical RAM.  When you take a branch you
trap, and then the dynamic linker goes in, loads the library, and fixes up
all the branch instructions to point to the actual symbols in the dynamic
library.  That's roughly demand-paging but with additional fixing up
afterwards.

Another is if you want to enforce some kind of preconditions when
executing a particular piece of code.  Perhaps we want to make sure you
have the secret token in a register before you're allowed to execute, or
have called the code from a permitted context.  So the kernel maps the
pages non-executable and, when the trap is taken, it checks the
preconditions before deciding whether to allow execution to proceed.

A final one might be a form of JIT, for example in binary translation.
You don't want to translate the whole executable in one go, so you don't
bother translating anything pointed to by a branch - you just write a
branch to nowhere.  If the branch to nowhere is taken, you then translate
a new block of code and fill in the branch to point to it.  This might
apply to higher-level language translation too - you can put 'branches to
nowhere' anywhere there's a route of control flow that you haven't yet
handled, and fix them up if that route ever gets taken.  Or maybe you
could decide to interpret a particular code path instead of translating
it - perhaps (for whatever reason) you decide latency is more important
than speed.

I'm sure we could think of some more...

Theo
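A rough user-space caricature of that lazy-linking idea (a patched function-pointer
slot stands in for the faulting branch; "libm.so.6" and "cos" are only example names,
the library path is Linux-specific, and older glibc needs -ldl at link time):

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

static double cos_resolver(double x);

/* the "unresolved branch": first use lands in the resolver */
static double (*cos_slot)(double) = cos_resolver;

static double cos_resolver(double x)
{
    void *lib = dlopen("libm.so.6", RTLD_NOW);   /* load the library on demand */
    if (lib == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        exit(1);
    }
    cos_slot = (double (*)(double))dlsym(lib, "cos");  /* fix up the slot */
    return cos_slot(x);                          /* complete the original call */
}

int main(void)
{
    printf("%f\n", cos_slot(0.0));   /* resolves libm on first use            */
    printf("%f\n", cos_slot(1.0));   /* subsequent calls go straight to libm  */
    return 0;
}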
On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>For no-execute, I don't see any use beyond that of enhancing
>confidence in the codebase; I don't see any *features* that
>rely on detecting "legal" attempts to execute particular
>regions of memory (assuming execute-access is treated as
>a special case of read-access).
>
>The ONE question I am posing here is "are there any?"
I'm not sure what you're getting at here: what would be a "legal"
attempt to execute NX protected memory?

Obviously something like a JIT compiler needs to be able to switch off
NX for the code buffer so a program can execute it ... but there's no
point to allowing access before the code is ready.

George
Am 14.07.2020 um 12:21 schrieb George Neuner:
> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
>> For no-execute, I don't see any use beyond that of enhancing
>> confidence in the codebase; I don't see any *features* that
>> rely on detecting "legal" attempts to execute particular
>> regions of memory (assuming execute-access is treated as
>> a special case of read-access).
>>
>> The ONE question I am posing here is "are there any?"
>
> I'm not sure what you're getting at here: what would be a "legal"
> attempt to execute NX protected memory?
>
> Obviously something like a JIT compiler needs to be able to switch off
> NX for the code buffer so a program can execute it ... but there's no
> point to allowing access before the code is ready.
The "+1" of a JIT compiler would be a virtualisation solution that has paged in the whole guest system already, but wants to analyze or redirect attempts to execute code. As I understand, older virtualisation solutions for x86 worked that way because x86 has some instructions that you want to virtualize but that do not trap on older processors (e.g. sgdt, str). You had to play some extra tricks because x86 originally didn't have per-page NX bits. Stefan
On 7/14/2020 2:40 AM, Theo wrote:
> Don Y <blockedofcourse@foo.invalid> wrote:
>> On 7/13/2020 1:30 PM, Don Y wrote:
>>> The ONE question I am posing here is "are there any?"
>>
>> A colleague has come up with a few "non-contrived" examples
>> so the answer is "yes", apparently!  OK, so, that will make
>> things quite a bit easier...
>
> Do you feel like sharing them?
One (or more) threads WITHIN a task (sharing the SAME logical address
space as the thread that may, eventually, try to EXECUTE "marked" code)
can be building/modifying that code image asynchronously wrt the thread
that wants to execute it.

Think of the no-execute provision as a form of implicit lock -- instead
of having to (prematurely and possibly unnecessarily!) EXPLICITLY block
the thread(s) that might want to fetch opcodes from that region.

The same sorts of things apply inside the (multithreaded) kernel.  I.e.,
don't restrict your thinking to kernel vs. user activities in a given
memory region.

[Note that this can include activities beyond JIT'ing -- but I don't
care to expound on those]

As all threads (in a task/kernel) share the same address
mappings/protections, there's no way to ensure that thread X can diddle
with the page while thread Y can't (yet) execute its contents (unless
you modify the protections on a per-THREAD basis AND have a
uniprocessor).  OTOH, if you exploit the fact that X's accesses will be
"read/write" (diddling) while Y's accesses will be "execute" --
different permissions enable/restrict the threads implicitly.

My interest is in the "protection" API... NOT wanting to have to
special-case execute permissions differently than read or write.

If the developer's code can tolerate NOT having a read or write
permission that he had "hoped for", then he should be able to tolerate
not having an execute permission, too!  Conversely, if the developer
RELIED on having read or write traps, then he should be able to express
that dependency in the execute permission of the API.

I.e., if your *desire* for a particular set of permission(s) at a point
in your code is just to enhance reliability, then you can <shrug> off
not having ANY of the desired set.  OTOH, if your desire stems out of a
NEED for a particular set of permissions -- cuz your algorithm relies on
them -- then you will have to <panic>.  The API has to expose these
different cases to the developer.

[Or, more properly, your task won't be admitted to a host that doesn't
advertise those capabilities as being available; you'll have to be
installed/migrated onto a host that DOES!  Just like if your code RELIES
on floating point hardware being present (because the emulation is too
slow)]
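A minimal, single-threaded sketch of the "build under no-execute, then flip" sequence
(POSIX and x86-64 assumed; the trap taken by a thread that jumps in too early, and the
arbitration around it, are omitted):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    /* writable -- but NOT executable -- while the image is being built */
    unsigned char *buf = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;

    /* x86-64 encoding of "mov eax, 42 ; ret" */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    memcpy(buf, code, sizeof code);

    /* image complete: drop write, grant execute */
    if (mprotect(buf, pagesz, PROT_READ | PROT_EXEC) != 0) return 1;

    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());                  /* prints 42 */
    return 0;
}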
On 7/14/2020 3:21 AM, George Neuner wrote:
> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> For no-execute, I don't see any use beyond that of enhancing
>> confidence in the codebase; I don't see any *features* that
>> rely on detecting "legal" attempts to execute particular
>> regions of memory (assuming execute-access is treated as
>> a special case of read-access).
>>
>> The ONE question I am posing here is "are there any?"
>
> I'm not sure what you're getting at here: what would be a "legal"
> attempt to execute NX protected memory?
The same sort of thing that would be a "legal" attempt to read
"no read" memory.  In THAT case, deliberately exploiting the protection
mechanism to detect references to memory that hasn't yet been mapped!

A similar case exists for trapping writes to "no write" memory (e.g., CoW).

Both are examples of "legal" accesses that have been preempted to
implement FEATURES.  So, code that relies on these hooks would have to
<panic> if the hooks were not available.

OTOH, if you use the protections to detect errant accesses ("can't happen"),
then you are relying on it to reinforce your belief that your code is robust.
The absence of the hooks might be troubling (if you lack confidence in
your testing regime) but shouldn't interfere with your code's proper operation.
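A toy sketch of the "no read" trap exploited as a feature (POSIX assumed; real demand
paging and CoW live in the kernel, and strict async-signal-safety is glossed over here):
reserve a region with no access at all, then populate pages lazily from the fault handler.

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char  *region;
static size_t region_len;
static size_t pagesz;

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = si->si_addr;
    if (addr < region || addr >= region + region_len)
        abort();                                    /* a *real* errant access */

    char *page = (char *)((uintptr_t)addr & ~(uintptr_t)(pagesz - 1));
    mprotect(page, pagesz, PROT_READ | PROT_WRITE); /* "map" the page...      */
    memset(page, 'X', pagesz);                      /* ...and "load" it       */
}

int main(void)
{
    pagesz     = (size_t)sysconf(_SC_PAGESIZE);
    region_len = 4 * pagesz;
    region     = mmap(NULL, region_len, PROT_NONE,  /* no access yet          */
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return 1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags     = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    /* the faulting load restarts after the handler populates the page */
    printf("%c\n", region[pagesz + 10]);            /* prints 'X' */
    return 0;
}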
> Obviously something like a JIT compiler needs to be able to switch off
> NX for the code buffer so a program can execute it ... but there's no
> point to allowing access before the code is ready
Unless you *truly* JIT the code "as needed" (i.e., a memory region at
a time) *or* are JIT'ing in parallel with execution.  The executing thread
could be accessing "already JIT'ed" portions of the code without having
to wait for the JIT compiler to "finish".

[Think about how a code image would be built, in detail, if it spanned
multiple memory allocation units AND YOU DIDN'T HAVE A BACKING STORE!]
On Tue, 14 Jul 2020 11:48:14 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 7/14/2020 3:21 AM, George Neuner wrote:
>> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
>> <blockedofcourse@foo.invalid> wrote:
>>
>>> For no-execute, I don't see any use beyond that of enhancing
>>> confidence in the codebase; I don't see any *features* that
>>> rely on detecting "legal" attempts to execute particular
>>> regions of memory (assuming execute-access is treated as
>>> a special case of read-access).
>>>
>>> The ONE question I am posing here is "are there any?"
>>
>> I'm not sure what you're getting at here: what would be a "legal"
>> attempt to execute NX protected memory?
>
>The same sort of thing that would be a "legal" attempt to read
>"no read" memory.  In THAT case, deliberately exploiting the protection
>mechanism to detect references to memory that hasn't yet been mapped!
>
>A similar case exists for trapping writes to "no write" memory (e.g., CoW).
>
>Both are examples of "legal" accesses that have been preempted to
>implement FEATURES.  So, code that relies on these hooks would have to
><panic> if the hooks were not available.
Not really.  There are legitimate reasons to read-protect readable
memory: to enable hardware assisted GC for instance.

But you don't GC executable code ... at least not *while* it still is
executable ... so the analogy to protecting readable data doesn't hold.

The idea that you might try to verify "right to execute" doesn't wash
either, because once code is unprotected anyone can execute it - not
just one process that somehow has been "verified".  Doing this kind of
thing requires segmentation (or real capabilities), not just page NX
(unless the protection somehow is virtualized individually for all
processes - which is not the case with any existing CPU).
>OTOH, if you use the protections to detect errant accesses ("can't happen"),
>then you are relying on it to reinforce your belief that your code is robust.
>The absence of the hooks might be troubling (if you lack confidence in
>your testing regime) but shouldn't interfere with your code's proper operation.
>
>> Obviously something like a JIT compiler needs to be able to switch off
>> NX for the code buffer so a program can execute it ... but there's no
>> point to allowing access before the code is ready
>
>Unless you *truly* JIT the code "as needed" (i.e., a memory region at
>a time) *or* are JIT'ing in parallel with execution.  The executing thread
>could be accessing "already JIT'ed" portions of the code without having
>to wait for the JIT compiler to "finish".
Demand JIT is no problem, even with multiple threads, as long as it is
done function by function, and function calls are indirected through a
jump table.  The initial table entry traps into the runtime so callers
get blocked until the compile finishes, then the table entry is changed
to point to the new code [and waiting callers are unblocked].
Runtime/OS trap, yes.  MMU fault, no!

There are parallel JIT systems that initially compile with little/no
optimization, and then monitor demand for the native code.  When a code
path is found to be in "high" demand (for some definition) then it gets
recompiled with better optimization.  But that requires allowing
multiple versions of the code to exist simultaneously, so as not to
disrupt programs that may be USING the existing version while a new one
is being created.  Doing this sort of thing requires significant
resources in the host.

And in some cases, such systems retain ALL versions of the code and each
process is allowed to trap individually into optimized code paths.  This
lets the runtime focus on high demand code paths and not waste resources
optimizing low demand paths, e.g., low frequency exception handlers.
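A bare-bones sketch of that jump-table arrangement (the "compiler" is stubbed out and
every name here is invented for illustration; the point is that the first call is a
plain runtime trap, not an MMU fault, and later callers block on the lock until the
table entry has been patched):

#include <pthread.h>
#include <stdio.h>

typedef int (*fn_t)(int);

static pthread_mutex_t jit_lock = PTHREAD_MUTEX_INITIALIZER;

static int native_double(int x) { return 2 * x; }  /* stands in for JIT output */

static int double_stub(int x);                     /* forward declaration */

static fn_t jit_table[] = { double_stub };         /* entry 0: not yet compiled */

static fn_t compile(int index)
{
    /* placeholder for the real work: translate the function, emit native
       code into an executable buffer, return its entry point */
    (void)index;
    return native_double;
}

static int double_stub(int x)
{
    pthread_mutex_lock(&jit_lock);            /* concurrent callers block here */
    if (jit_table[0] == double_stub)          /* still uncompiled?             */
        jit_table[0] = compile(0);            /* patch the table entry         */
    pthread_mutex_unlock(&jit_lock);
    return jit_table[0](x);                   /* proceed through the new code  */
}

int main(void)
{
    printf("%d\n", jit_table[0](21));         /* first call compiles: 42       */
    printf("%d\n", jit_table[0](21));         /* later calls go direct: 42     */
    return 0;
}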
>[Think about how a code image would be built, in detail, if it spanned
>multiple memory allocation units AND YOU DIDN'T HAVE A BACKING STORE!]
I certainly would *NOT* try to compile in parallel with execution under
those circumstances: regardless of whether the "distribution" source
image remains available somewhere else, I would do a batch compile to
create the native execution image.  And I certainly would not overwrite
the source image as I compiled it.  At best, I would compile it in
convenient chunks, building up the execution image piece by piece, and
discarding the source chunk by chunk only when no longer needed.

Android has done this for years, since v5 "Lollipop" when they ditched
the Dalvik machine for the ART runtime.  When you install an app now,
the bytecode "distribution" version immediately gets translated into
native code.  The native version is stored on the device and the
bytecode version is discarded.

George
On 7/15/2020 11:02 PM, George Neuner wrote:
> On Tue, 14 Jul 2020 11:48:14 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> On 7/14/2020 3:21 AM, George Neuner wrote:
>>> On Mon, 13 Jul 2020 13:30:16 -0700, Don Y
>>> <blockedofcourse@foo.invalid> wrote:
>>>> The ONE question I am posing here is "are there any?"
>>>
>>> I'm not sure what you're getting at here: what would be a "legal"
>>> attempt to execute NX protected memory?
>>
>> The same sort of thing that would be a "legal" attempt to read
>> "no read" memory.  In THAT case, deliberately exploiting the protection
>> mechanism to detect references to memory that hasn't yet been mapped!
>>
>> A similar case exists for trapping writes to "no write" memory (e.g., CoW).
>>
>> Both are examples of "legal" accesses that have been preempted to
>> implement FEATURES.  So, code that relies on these hooks would have to
>> <panic> if the hooks were not available.
>
> Not really.  There are legitimate reasons to read-protect readable
> memory: to enable hardware assisted GC for instance.
You're just reinforcing my distinction of cases where a particular protection is REQUIRED by the code vs. DESIRED by the developer. If your code RELIED on hardware assist for GC (note my emphasis on RELIED!), then, if the hosting processor didn't provide that support, your code would fail/panic.
> But you don't GC executable code ... at least not *while* it still is
> executable ... so the analogy to protecting readable data doesn't
> hold.
Marking a part of memory as "no execute" -- KNOWING that you will
eventually want to execute <something> that (will) reside there -- is
comparable to marking a part of memory as "no read/access" KNOWING that
you will eventually want to read something that WILL reside there, after
it is "loaded" (or "something else") from some other mechanism (which
may or may not be backing store).

Marking it as "no execute" because it's an embedded JPEG image is
entirely different -- your code SHOULDN'T be trying to execute a JPEG;
you wouldn't rely on this mechanism for your code's correctness.
> The idea that you might try to verify "right to execute" doesn't wash
> either, because once code is unprotected anyone can execute it - not
> just one process that somehow has been "verified".
It's not a question of "verifying right to execute" -- any more than
your GC example is "verifying right to read".

You develop the algorithm (GC in your case) with a reliance (or not) on
having a particular mechanism at your disposal.  In the absence of that
mechanism (note my "ONE question"!), you either <panic> or fall back on
an alternative algorithm (i.e., if you don't want to code and maintain
an alternative algorithm -- or, if that algorithm doesn't satisfy the
same requirements as the first -- then you can only <panic>).

And, you can only execute code that is made available to you (process).
UNavailable means "not in YOUR address space" *or* "not marked as
executable".  Just like data that isn't in your address space (or is
marked as no-write) isn't writable even if someone ELSE can write it!

[I can't prevent individual THREADS in the same process container as the
INTENDED thread from executing that code.  But, that's not a loss of
control as the "intended" thread could just as easily decide to do
whatever the other threads would!]
> Doing this kind of thing requires segmentation (or real capabilities),
> not just page NX (unless the protection somehow is virtualized
> individually for all processes - which is not the case with any
> existing CPU).
Huh?  I can set a "no execute" permission at page granularity.  And, set
that protection ONLY in the translation table for process A.  Process
B's page table may define it as R/W.  And, process C's may define it as
"no access", at all!

So, process B could, potentially, be building the executable image while
A is potentially tied up in a trap (if it had tried to execute the
page's contents before they were ready).

Do you expect ALL processes to share a single address space?

I'm designing the SPARC implementation, presently.  The smallest page
size is a bit larger than I'd like -- but, then again, I don't plan on
deploying a SPARC box (but, they are excellent for testing code
portability, endianness tolerance, on-the-fly data format conversions,
etc.)  It also illustrates the problems encountered when designing the
"protection" API -- SPARC only lets me support R/O, R/W, R/X, R/W/X, X/O
and "no access".

After that, I'll tackle the ARM port.  I'm hoping to demo a heterogeneous
system for the upcoming off-site.  Showing real-time redundancy support
among dissimilar hosting nodes should be impressive!  (i.e., unplugging a
SPARC and watching the services that it was supplying migrate to an ARM
or x86 -- and /vice versa/ -- while still satisfying any timeliness
constraints that were in effect at the time of the interruption)

[Beyond that, as I said to Theo, up-thread, I'm not keen on discussing
what I'm doing or how I'm doing it!  The magic is left as an exercise
for the reader -- or, someone sufficiently interested to explore how to
ACTUALLY do it (beyond an armchair interest)]
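A sketch of the "same physical page, different protections in different translation
tables" point above (POSIX shared memory assumed; for brevity both views live in one
process, but two cooperating processes mapping the same object with different PROT_
flags behave the same way -- "/demo_views" is an arbitrary name, and a hardened system
may refuse the executable mapping):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = shm_open("/demo_views", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, 4096) != 0) { perror("shm"); return 1; }

    /* "process B's" mapping: read/write -- the image builder's view    */
    unsigned char *build = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    /* "process A's" mapping: read/execute -- writes through it trap    */
    unsigned char *run = mmap(NULL, 4096, PROT_READ | PROT_EXEC,
                              MAP_SHARED, fd, 0);
    if (build == MAP_FAILED || run == MAP_FAILED) { perror("mmap"); return 1; }

    build[0] = 0xC3;                 /* write through the builder's view...   */
    printf("%02x\n", run[0]);        /* ...and it shows up in the other view  */

    shm_unlink("/demo_views");
    return 0;
}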
> One I can think of is a form of dynamic linking.  You don't load in all
> the shared libraries at the beginning; the branch instructions simply
> point to pages not backed by physical RAM.  When you take a branch you
> trap, and then the dynamic linker goes in, loads the library, and fixes up
> all the branch instructions to point to the actual symbols in the dynamic
> library.  That's roughly demand-paging but with additional fixing up
> afterwards.
The A5 world!
