
64-bit embedded computing is here and now

Started by James Brakefield June 7, 2021
On 09.06.2021 10:40, David Brown wrote:
> On 09/06/2021 06:16, George Neuner wrote:
>> Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
>> onto the front of what essentially is a load/store RISC.
... and at about that time they also abandoned the last traces of their original von-Neumann architecture. The actual core is quite strictly Harvard now, treating the external RAM banks more like mass storage devices than an actual combined code+data memory.
> Absolutely. But from the user viewpoint, it is the ISA that matters -
That depends rather a lot on who gets to be called the "user". x86 are quite strictly limited to the PC ecosystem these days: boxes and laptops built for Mac OS or Windows, some of them running Linux instead. There the "user" is somebody buying hardware and software from completely unrelated suppliers. I.e. unlike in the embedded world we discuss here, the persons writing software for those things had no say at all what type of CPU is used. They're thus not really the "user." If they were, they probably wouldn't be using an x86. ;-) The actual x86 users couldn't care less about the ISA --- the overwhelming majority of them haven't the slightest idea what an ISA even is. Some of them used to have a vague idea that there was some 32bit vs. a 64bit whatchamacallit somewhere in there, but even that has surely faded away by now, as users no longer even face the decision between them.
On 6/9/2021 12:58 PM, Paul Rubin wrote:
> Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> writes:
>> But if you're using a RasPi or Beaglebone or something like that, you
>> need a reasonably well-upholstered Linux distro, which has to be
>> patched regularly. At very least it'll need a kernel, and kernel
>> patches affecting security are not exactly rare.
>
> You're in the same situation with almost anything else connected to the
> internet. Think of the notorious "smart light bulbs".
No, that's only if you didn't adequately prepare for such "exposure".

How many Linux/Windows boxes are running un-NEEDED services? Have ports open that shouldn't be? How much emphasis was spent on eking out a few percent extra performance from the network stack that could have, instead, been spent on making it more robust? How many folks RUNNING something like Linux/Windows in their product actually know much of anything about what's under the hood? Do they even know how to BUILD a kernel, let alone sort out what it's doing (wrong)?

Exposed to the 'net, you are always at the mercy of DoS attacks consuming your inbound bandwidth (assuming you have no control of upstream traffic/routing). But even a saturated network connection doesn't have to crash your device. OTOH, if your box is dutifully trying to respond to incoming packets that may be malicious, then you'd better hope that response is "correct" (or at least SAFE) in EVERY case.

For any of these mainstream OSs, an adversary can play with an exact copy of yours 24/7/365 to determine its vulnerabilities before ever approaching your device. And even dig through the sources (of some) to see how a potential attack could unfold. Your device will likely advertise exactly what version of the kernel (and network stack) it is running.

[An adversary can also BUY one of YOUR devices and do the same off-line analysis -- but that analysis will only apply to YOUR device (if you have a proprietary OS/stack) and not a multitude of other exposed devices.]
> On the other hand, you are in reasonable shape if the raspberry pi
> running your fish tank is only reachable through a LAN or VPN.
> Non-networked low end linux boards are also a thing.
Exactly. But that limits utility/accessibility.

If you only need moderate/occasional access, you can implement a "stealth mode" that lets the server hide, "unprotected". Or, require all accesses to be initiated from that server (*to* the remote client) -- similar to a call-back modem (see the sketch below). And, of course, you can place constraints on what can be done over that connection instead of just treating it as "God Mode".

[No, you can't set the heat to 105 degrees in the summer time; I don't care if you happen to have appropriate credentials! And, no, you can't install an update without my verifying you and the update through other mechanisms...]
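For illustration, a minimal sketch of that call-back pattern, assuming a POSIX socket stack; host name and port are hypothetical placeholders and the authentication step is elided. The device never listens -- it dials OUT to a known manager and only then accepts commands over the connection *it* initiated:

#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/* Dial OUT to a known management host; no listening socket ever exists. */
static int dial_out(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                     /* session is device-initiated */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;   /* caller authenticates the peer, then accepts commands */
}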
On 6/9/2021 10:34 AM, Paul Rubin wrote:
> Theo <theom+news@chiark.greenend.org.uk> writes:
>>> Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
>>> remote web browser. There's your 64 bit embedded system.
>> I suppose there's a question of what embedded tasks intrinsically require
>> >4GiB RAM, and those that do so because it makes programmers' lives easier?
>
> You can buy a Raspberry Pi 4 with up to 8gb of ram, but the most common
> configuration is 2gb. The cpu is 64 bit anyway because why not?
Exactly. Are they going to give you a *discount* for a 32b version? (Here, you can have this one for half of 'FREE'...)
>> There are obviously plenty of computer systems doing that, but the
>> question I don't know is what applications can be said to be
>> 'embedded' but need that kind of RAM.
>
> Lots of stuff is using 32 bit cpus with a few KB of ram these days. 32
> bits is displacing 8 bits in the MCU world.
>
> Is 64 bit displacing 32 bit in application processors like the Raspberry
> Pi, even when less than 4GB of ram is involved? I think yes, at least
> to some extent, and it will continue. My fairly low end mobile phone
> has 2GB of ram and a 64 bit 4-core processor, I think.
>
> Will 64 bit MCU's displace 32 bit MCUs? I don't know, maybe not.
Some due to need but, I suspect, most due to pricing or other features not available in the 32b world. Just like you don't find PMMUs on 8/16b devices nor in-built NICs.
> Are application processors displacing MCU's in embedded systems? Not
> much in portable and wearable stuff (other than phones) at least for
> now, but in larger devices I think yes, at least somewhat for now, and
> probably more going forward. Even if you're not using networking, it
> makes software and UI development a heck of a lot easier.
This -------------------------------^^^^^^^^^^^^^^^^^^^^^^

Elbow room always takes some of the stress out of design. You don't worry (as much) about bumping into limits and, instead, concentrate on solving the problem at hand.

The idea of packing 8 'bools' into a byte (cuz I only had a hundred or so of them available) is SO behind me, now! Just use something "more convenient"... eight of them!

I pass pages between processes as an efficiency hack -- even if I'm only using a fraction of the page. In smaller processors, I'd be "upset" by this blatant "waste". Instead, I shrug it off and note that it gives me a uniform way of moving data around (instead of having to tweak interfaces to LIMIT the amount of data that I move; or "massage" the data JUST for transport).

My "calculator service" uses BigRationals -- because it's easier than trying to explain to users writing scripts that arithmetic can overflow, suffer rounding errors, that order of operations is important, etc.
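For what it's worth, the two styles side by side -- a toy C sketch, flag names invented:

#include <stdbool.h>
#include <stdint.h>

/* RAM-starved style: eight flags share one byte */
#define FLAG_MOTOR_ON   (1u << 0)
#define FLAG_DOOR_OPEN  (1u << 1)

static uint8_t packed;

static inline void set_flag(uint8_t m)  { packed |= m; }
static inline bool test_flag(uint8_t m) { return (packed & m) != 0; }

/* "Elbow room" style: one bool apiece -- eight times the RAM, but
   simpler code and no read-modify-write on shared bits */
static bool motor_on, door_open;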
On 6/9/2021 10:07 AM, Theo wrote:
> Paul Rubin <no.email@nospam.invalid> wrote:
>> James Brakefield <jim.brakefield@ieee.org> writes:
>>> Am trying to puzzle out what a 64-bit embedded processor should look like.
>>
>> Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
>> remote web browser. There's your 64 bit embedded system.
>
> I suppose there's a question of what embedded tasks intrinsically require
> >4GiB RAM, and those that do so because it makes programmers' lives easier?
>
> In other words, you /can/ write a function to detect if your fish tank is
> hot or cold in Javascript that runs in a web app on top of Chromium on top
> of Linux. Or you could make it out of a 6502, or a pair of logic gates.
>
> That's complexity that's not fundamental to the application. OTOH
> maintaining a database that's larger than 4GB physically won't work without
> that amount of memory (or storage, etc).
>
> There are obviously plenty of computer systems doing that, but the question
> I don't know is what applications can be said to be 'embedded' but need
> that kind of RAM.
Transcoding multiple video sources (for concurrent clients) in a single appliance?

I have ~30 cameras, here. Had I naively designed with them all connected to a "camera processor", I suspect memory would be the least of my concerns (motion and scene recognition in 30 places simultaneously?). Instead, it was "easier" to give each camera its own processor. And, gain extended "remotability" as part of the process.

Remember, the 32b address space has to simultaneously hold EVERYTHING that will need to be accessible to your application -- the OS, its memory requirements, the application(s)' tasks, the stacks/heaps for the threads they contain, the data to be processed (in and out), the memory-mapped I/Os consumed by the SoC itself, etc.

When you HAVE a capability/resource, it somehow ALWAYS gets used! ;-)
On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:
> On 6/9/2021 4:29, Don Y wrote:
>> On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
>>
>>>> Am trying to puzzle out what a 64-bit embedded processor should look like.
>>>> At the low end, yeah, a simple RISC processor. And support for complex
>>>> arithmetic using 32-bit floats? And support for pixel alpha blending
>>>> using quad 16-bit numbers?
>>>> 32-bit pointers into the software?
>>>
>>> The real value in 64 bit integer registers and 64 bit address space is
>>> just that, having an orthogonal "endless" space (well I remember some
>>> 30 years ago 32 bits seemed sort of "endless" to me...).
>>>
>>> Not needing to assign overlapping logical addresses to anything
>>> can make a big difference to how the OS is done.
>>
>> That depends on what you expect from the OS. If you are
>> comfortable with the possibility of bugs propagating between
>> different subsystems, then you can live with a logical address
>> space that exactly coincides with a physical address space.
>
> So how does the linear 64 bit address space get in the way of
> any protection you want to implement? Pages are still 4 k and
> each has its own protection attributes governed by the OS,
> it is like that with 32 bit processors as well (I talk power, I am
> not interested in half baked stuff like ARM, risc-v etc., I don't
> know if there could be a problem like that with one of these).
With a linear address space, you typically have to link EVERYTHING as a single image to place each thing in its own piece of memory (or use segment based addressing).

I can share code between tasks without conflicting addressing; the "data" for one instance of the app is isolated from other instances while the code is untouched -- the code doesn't even need to know that it is being invoked on different "data" from one timeslice to the next. In a flat address space, you'd need the equivalent of a "context pointer" that you'd have to pass to the "shared code" (see the sketch below). And, have to hope that all of your context could be represented in a single such reference! (I can rearrange physical pages so they each appear "where expected" to a bit of const CODE.)

Similarly, the data passed (or shared) from one task (process) to another can "appear" at entirely different logical addresses "at the same time" as befitting the needs of each task WITHOUT CONCERN (or awareness) of the existence of the other task. Again, I don't need to pass a pointer to the data; the address space has been manipulated to make sure it's where it should be.

The needs of a task can be met by resources "harvested" from some other task. E.g., where is the stack for your TaskA? How large is it? How much of it is in-use *now*? How much can it GROW before it bumps into something (because that something occupies space in "its" address space)?

I start a task (thread) with a single page of stack. And, a limit on how much it is allowed to consume during its execution. Then, when it pushes something "off the end" of that page, I fault a new page in and map it at the faulting address. This continues as the task's stack needs grow. When I run out of available pages, I do a GC cycle to reclaim pages from (other?) tasks that are no longer using them.

In this way, I can effectively SHARE a stack (or heap) between multiple tasks -- without having to give any consideration for where, in memory, they (or the stacks!) reside. I can move a page from one task (full of data) to another task at some place that the destination task finds "convenient". I can import a page from another network device or export one *to* another device.

Because each task's address space is effectively empty/sparse, mapping a page doesn't require much effort to find a "free" place for it. I can put constraints on each such mapping -- and then runtime checks to ensure "things are as I expect": "Why is this NIC buffer residing in this particular portion of the address space?"

With a task bound to a semicontiguous portion of memory, it can deal with that region as if it were a smaller virtual region. I can store 32b pointers to things if I know that my addresses are based from 0x000 and the task never extends beyond a 4GB region. If available, I can exploit "shorter" addressing modes.
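To make the contrast concrete, here is a minimal sketch of the context-pointer style that a flat address space forces on shared code (the struct contents are invented):

#include <stddef.h>

struct instance_ctx {     /* everything one instance of the app needs */
    int  count;
    char name[16];
};

/* One copy of the code serves many instances, but EVERY caller (and
   every callee below it) must thread the context through explicitly. */
static void step(struct instance_ctx *ctx)
{
    ctx->count++;
}

With per-task mapping, step() could instead reference a fixed logical address whose physical backing differs per task -- no parameter at all.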
> There is *nothing* to gain on a 64 bit machine from segmentation, assigning
> overlapping address spaces to tasks etc.
What do you gain by NOT using it? You're still dicking with the MMU.

(If you aren't, then what value is the MMU in your "logical" space? Map each physical page to a corresponding logical page and never talk to the MMU again; store const page tables and let your OS just tweak the base pointer for the TLBs to use for THIS task.)

You still have to "position" physical resources in particular places (and you have to deal with the constraints of all tasks, simultaneously, instead of just those constraints imposed by the "current task").
> Notice I am talking *logical* addresses, I was explicit about
> that.
On 09/06/2021 22:52, Hans-Bernhard Bröker wrote:
> On 09.06.2021 10:40, David Brown wrote:
>> On 09/06/2021 06:16, George Neuner wrote:
>
>>> Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
>>> onto the front of what essentially is a load/store RISC.
>
> ... and at about that time they also abandoned the last traces of their
> original von-Neumann architecture. The actual core is quite strictly
> Harvard now, treating the external RAM banks more like mass storage
> devices than an actual combined code+data memory.
>
>> Absolutely. But from the user viewpoint, it is the ISA that matters -
>
> That depends rather a lot on who gets to be called the "user".
>
I meant "the person using the ISA" - i.e., the programmer. And even then, I meant low-level programmers who have to understand things like memory models, cache thrashing, coding for vectors and SIMD, etc. These are the people who see the ISA. I was not talking about the person wiggling the mouse and watching youtube!
On 6/10/2021 3:12, Don Y wrote:
> On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:
>> On 6/9/2021 4:29, Don Y wrote:
>>> On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
>>>
>>>>> Am trying to puzzle out what a 64-bit embedded processor should
>>>>> look like.
>>>>> At the low end, yeah, a simple RISC processor. And support for
>>>>> complex arithmetic using 32-bit floats? And support for pixel
>>>>> alpha blending using quad 16-bit numbers?
>>>>> 32-bit pointers into the software?
>>>>
>>>> The real value in 64 bit integer registers and 64 bit address space is
>>>> just that, having an orthogonal "endless" space (well I remember some
>>>> 30 years ago 32 bits seemed sort of "endless" to me...).
>>>>
>>>> Not needing to assign overlapping logical addresses to anything
>>>> can make a big difference to how the OS is done.
>>>
>>> That depends on what you expect from the OS. If you are
>>> comfortable with the possibility of bugs propagating between
>>> different subsystems, then you can live with a logical address
>>> space that exactly coincides with a physical address space.
>>
>> So how does the linear 64 bit address space get in the way of
>> any protection you want to implement? Pages are still 4 k and
>> each has its own protection attributes governed by the OS,
>> it is like that with 32 bit processors as well (I talk power, I am
>> not interested in half baked stuff like ARM, risc-v etc., I don't
>> know if there could be a problem like that with one of these).
>
> With a linear address space, you typically have to link EVERYTHING
> as a single image to place each thing in its own piece of memory
> (or use segment based addressing).
Nothing could be further from the truth. What kind of crippled environment can make you think that? Code can be position independent on processors which are not dead by design nowadays.

When I started dps some 27 years ago I allowed program modules to demand a fixed address at which they would reside. This exists to this day and has been used 0 (zero) times. Same about object descriptors, program library modules etc.

The first system call I wrote is called "allocm$", allocate memory. You request a number of bytes and you get back an address and the actual number of bytes you were given (it comes rounded up by the memory cluster size, typically 4k, a page). This was the *first* thing I did. And yes, all allocation is done using a worst-fit strategy, sometimes enhanced worst fit - things the now popular OS-s have yet to get to; they still have to defragment their disks, LOL.
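As a rough illustration (this is NOT dps code -- just a toy with the same shape: ask for n bytes, get back an address plus the cluster-rounded size, carved worst-fit so the leftover fragments stay as large as possible):

#include <stddef.h>
#include <stdint.h>

#define CLUSTER 4096u                /* cluster size per the description */

struct free_blk {
    size_t           size;           /* bytes, multiple of CLUSTER */
    struct free_blk *next;
};

static struct free_blk *free_list;   /* seeded elsewhere with all free RAM */

void *alloc_worst_fit(size_t want, size_t *got)
{
    size_t need = (want + CLUSTER - 1) & ~(size_t)(CLUSTER - 1);
    struct free_blk **pp, **worst = NULL;

    /* worst fit: satisfy the request from the LARGEST suitable block */
    for (pp = &free_list; *pp != NULL; pp = &(*pp)->next)
        if ((*pp)->size >= need &&
            (worst == NULL || (*pp)->size > (*worst)->size))
            worst = pp;
    if (worst == NULL)
        return NULL;                 /* no block big enough */

    struct free_blk *blk = *worst;
    *got = need;
    if (blk->size == need) {         /* exact fit: unlink the whole block */
        *worst = blk->next;
        return blk;
    }
    blk->size -= need;               /* otherwise carve the tail off */
    return (uint8_t *)blk + blk->size;
}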
>
> I can share code between tasks without conflicting addressing;
> the "data" for one instance of the app is isolated from other
> instances while the code is untouched -- the code doesn't even
> need to know that it is being invoked on different "data"
> from one timeslice to the next. In a flat address space,
> you'd need the equivalent of a "context pointer" that you'd
> have to pass to the "shared code". And, have to hope that
> all of your context could be represented in a single such
> reference! (I can rearrange physical pages so they each
> appear "where expected" to a bit of const CODE).
>
> Similarly, the data passed (or shared) from one task (process) to
> another can "appear" at entirely different logical addresses
> "at the same time" as befitting the needs of each task WITHOUT
> CONCERN (or awareness) of the existence of the other task.
> Again, I don't need to pass a pointer to the data; the address
> space has been manipulated to make sure it's where it should be.
So how do you pass the offset from the page beginning if you do not pass an address? And how is page manipulation simpler and/or safer than just passing an address? Sounds like a recipe for quite a mess to me.

In a 64 bit address space there is nothing stopping you from passing addresses or not passing them, and allowing access to areas you want to while disallowing it elsewhere.

Other than that there is nothing to be gained by a 64 bit architecture really; on 32 bit machines you do have FPUs, vector units etc. doing calculation probably faster than the integer unit of a 64 bit processor. The *whole point* of a 64 bit core is the 64 bit address space.
>
> The needs of a task can be met by resources "harvested" from
> some other task. E.g., where is the stack for your TaskA?
> How large is it? How much of it is in-use *now*? How much
> can it GROW before it bumps into something (because that something
> occupies space in "its" address space).
This is the beauty of 64 bit logical address space. You allocate enough logical memory and then you allocate physical on demand, this is what MMUs are there for. If you want to grow your stack indefinitely - the messy C style - you can just allocate it a few gigabytes of logical memory and use the first few kilobytes of it at no waste of resources. Of course there are much slicker ways to deal with memory allocation.
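That reserve-logical/commit-physical-on-demand pattern, sketched with POSIX mmap on a 64-bit hosted system (MAP_NORESERVE is a Linux-ism; the numbers are illustrative):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t reserve = 4UL << 30;          /* 4 GiB of *logical* space */

    char *region = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    region[reserve - 1] = 42;            /* first touch commits ONE page */
    printf("4 GiB reserved, roughly one page actually resident\n");

    munmap(region, reserve);
    return 0;
}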
>
> I start a task (thread) with a single page of stack. And, a
> limit on how much it is allowed to consume during its execution.
> Then, when it pushes something "off the end" of that page,
> I fault a new page in and map it at the faulting address.
> This continues as the task's stack needs grow.
This is called "allocate on demand" and has been around for times immemorial, check my former paragraph.
>
> When I run out of available pages, I do a GC cycle to
> reclaim pages from (other?) tasks that are no longer using
> them.
This is called "memory swapping", also for times immemorial. For the case when there is no physical memory to reclaim, that is. The first version of dps - some decades ago - ran on a CPU32 (a 68340). It had no MMU so I implemented "memory blocks", a task can declare a piece a swap-able block and allow/disallow its swapping. Those blocks would then be shared or written to disk when more memory was needed etc., memory swapping without an MMU. Worked fine, must be still working for code I have not touched since on my power machines, all those decades later.
>
> In this way, I can effectively SHARE a stack (or heap)
> between multiple tasks -- without having to give any
> consideration for where, in memory, they (or the stacks!)
> reside.
You can do this in a linear address space, too - this is what the MMU is for.
>
> I can move a page from one task (full of data) to another
> task at some place that the destination task finds "convenient".
> I can import a page from another network device or export
> one *to* another device.
So instead of simply passing an address you have to switch page translation entries, adjust them on each task switch, flush and sync whatever it takes - does not sound very efficient to me.
>
> Because each task's address space is effectively empty/sparse,
> mapping a page doesn't require much effort to find a "free"
> place for it.
This is the beauty of having the 64 bit address space, you always
have enough logical memory. The "64 bit address space per task"
buys you *nothing*.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

[attrs elided]

>>>>> Not needing to assign overlapping logical addresses to anything
>>>>> can make a big difference to how the OS is done.
>>>>
>>>> That depends on what you expect from the OS. If you are
>>>> comfortable with the possibility of bugs propagating between
>>>> different subsystems, then you can live with a logical address
>>>> space that exactly coincides with a physical address space.
>>>
>>> So how does the linear 64 bit address space get in the way of
>>> any protection you want to implement? Pages are still 4 k and
>>> each has its own protection attributes governed by the OS,
>>> it is like that with 32 bit processors as well (I talk power, I am
>>> not interested in half baked stuff like ARM, risc-v etc., I don't
>>> know if there could be a problem like that with one of these).
>>
>> With a linear address space, you typically have to link EVERYTHING
>> as a single image to place each thing in its own piece of memory
>> (or use segment based addressing).
>
> Nothing could be further from the truth. What kind of crippled
> environment can make you think that? Code can be position
> independent on processors which are not dead by design nowadays.
>
> When I started dps some 27 years ago I allowed program modules
> to demand a fixed address at which they would reside. This exists
> to this day and has been used 0 (zero) times. Same about object
> descriptors, program library modules etc.
>
> The first system call I wrote is called "allocm$", allocate memory.
> You request a number of bytes and you get back an address and the
> actual number of bytes you were given (it comes rounded up by the
> memory cluster size, typically 4k, a page). This was the *first*
> thing I did. And yes, all allocation is done using a worst-fit
> strategy, sometimes enhanced worst fit - things the now popular
> OS-s have yet to get to; they still have to defragment their
> disks, LOL.
You missed my point -- possibly because this issue was raised BEFORE pointing out how much DYNAMIC management of the MMU (typically an OS-delegated activity) "buys you":

  "That depends on what you expect from the OS."

If you can ignore the MMU *completely*, then the OS is greatly simplified. YOU (developer) take on the responsibilities of remembering what is where, etc. EVERYTHING is visible to EVERYONE and at EVERY time. The OS doesn't have to get involved in the management of objects/tasks/etc. That's YOUR responsibility to ensure your taskA doesn't go dicking around with taskB's resources. Welcome to the 8/16b world!

The next step up is to statically deploy the MMU. You build a SINGLE logical address space to suit your liking. Then, map the underlying physical resources to it as best fits. And, this never needs to change -- memory doesn't "move around", it doesn't change characteristics (readable, writeable, executable, accessible-by-X, etc.)! But, you can't then change permissions based on which task is executing -- unless you want to dick with the MMU dynamically (or swap between N discrete sets of STATIC page tables that define the many different ways M tasks can share permissions).

So, you *just* use the MMU as a Memory Protection Unit; you mark sections of memory that have CODE in them as no-write, you mark regions with DATA as no-execute, and everything else as no-access. And that's the way it stays for EVERY task! This lets you convert RAM to ROM and prevents "fetches" from "DATA" memory. It ensures your code is never overwritten, that the processor never tries to execute out of "data memory", and that NOTHING tries to access address regions that are "empty"! You've implemented a 1980's vintage protection scheme (this is how we designed arcade pieces, back then, as you wanted your CODE and FRAME BUFFER to occupy the same limited range of addresses). <yawn>

Once you start using the MMU to dynamically *manage* memory (which includes altering protections and re-mapping), then the cost of the OS increases -- because these are typically things that are delegated *to* the OS. Whether you have overlapping address spaces or a single flat address space is immaterial -- you need to dynamically manage separate page tables for each task in either scheme.

You can't argue that the OS doesn't need to dick with the MMU "because it's a flat address space" -- unless you forfeit those abilities (that I illustrated in my post). If you want to compare a less-able OS to one that is more featured, then it's disingenuous to blame that on overlapping address spaces; the real "blame" lies in the support of more advanced features.

The goal of an OS should be to make writing *correct* code easier by providing features as enhancements. It's why the OS typically reads disk files instead of replicating that file system and driver code into each task that needs to do so. Or, why it implements delays/timers -- so each task doesn't reinvent the wheel (with its own unique set of bugs).

You can live without an OS. But, typically only for a trivial application. And, you're not likely to use a 64b processor just to count characters received on a serial port! Or as an egg timer!
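That "set it once and forget it" MPU-style configuration amounts to a couple of calls; a hedged sketch using POSIX mprotect, with the usual (toolchain-dependent, names illustrative) linker-script section symbols:

#include <stddef.h>
#include <sys/mman.h>

/* Section boundaries supplied by the linker script; assumed page-aligned. */
extern char __text_start[], __text_end[];
extern char __data_start[], __data_end[];

static void lock_down_once(void)
{
    /* code: read + execute, never writable again ("RAM becomes ROM") */
    mprotect(__text_start, (size_t)(__text_end - __text_start),
             PROT_READ | PROT_EXEC);

    /* data: read + write, never executable */
    mprotect(__data_start, (size_t)(__data_end - __data_start),
             PROT_READ | PROT_WRITE);

    /* anything left unmapped stays no-access: "empty" addresses fault */
}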
>> I can share code between tasks without conflicting addressing;
>> the "data" for one instance of the app is isolated from other
>> instances while the code is untouched -- the code doesn't even
>> need to know that it is being invoked on different "data"
>> from one timeslice to the next. In a flat address space,
>> you'd need the equivalent of a "context pointer" that you'd
>> have to pass to the "shared code". And, have to hope that
>> all of your context could be represented in a single such
>> reference! (I can rearrange physical pages so they each
>> appear "where expected" to a bit of const CODE).
>>
>> Similarly, the data passed (or shared) from one task (process) to
>> another can "appear" at entirely different logical addresses
>> "at the same time" as befitting the needs of each task WITHOUT
>> CONCERN (or awareness) of the existence of the other task.
>> Again, I don't need to pass a pointer to the data; the address
>> space has been manipulated to make sure it's where it should be.
>
> So how do you pass the offset from the page beginning if you do
> not pass an address?
YOU pass an object to the OS and let the OS map it where *it* wants, with possible hints from the targeted task (logical address space). I routinely pass multiple-page-sized objects around the system:

"Here's a 20MB telephone recording, memory mapped (to wherever YOU, its recipient, want it). Because it is memory mapped and has its own pager, the actual amount of physical memory that is in use at any given time can vary -- based on the resource allocation you've been granted and the current resource availability in the system. E.g., there may be as little as one page of physical data present at any given time -- and that page may "move" to back a different logical address based on WHERE you are presently looking! Go through and sort out when Bob is speaking and when Tom is speaking. "Return" an object of UNKNOWN length that lists each of these time intervals along with the speaker assumed to be talking in each. Tell me where you (the OS) decided it would best fit into my logical address space, after consulting the hint I provided (but that you may not have been able to honor because the result ended up *bigger* than the "hole" I had imagined it fitting into). No need to tell me how big it really is, as I will be able to parse it (cuz I know how you will have built that list) and the OS will track the memory that it uses, so all I have to do is free() it (it may be built out of 1K pages, 4K pages, 16MB pages)!"

How is this HARDER to do when a single task has an entire 64b address space instead of when it has to SHARE *a* single address space among all tasks/objects?
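On a hosted system the nearest off-the-shelf analogue is POSIX shared memory: the sender publishes an object, the recipient maps it wherever suits the recipient, and the address "hint" is purely advisory. A sketch (the object name is invented):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Recipient side: map the passed object where it suits THIS task. */
void *receive_object(const char *name, size_t len, void *hint)
{
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0)
        return NULL;

    /* 'hint' is advisory; the kernel may place the mapping elsewhere */
    void *p = mmap(hint, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping outlives the descriptor */
    return (p == MAP_FAILED) ? NULL : p;
}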
> And how is page manipulation simpler and/or safer than just passing
> an address? Sounds like a recipe for quite a mess to me.
The MMU has made that mapping a "permanent" part of THIS task's address space. It isn't visible to any other task -- why *should* it be? Why does the pointer need to indirectly reflect the fact that portions of that SINGLE address space are ineligible to contain said object because of OTHER unrelated (to this task) objects??
> In a 64 bit address space there is nothing stopping you from
> passing addresses or not passing them, and allowing access to areas
> you want to while disallowing it elsewhere.
And I can't do that in N overlapping 64b address spaces? The only "win" you get is by exposing everything to everyone. That's not the way software is evolving. Compartmentalization (to protect from other actors), opacity (to hide implementation details), accessors (instead of exposing actual data), etc. This comes at a cost -- in performance as well as OS design. But, *seems* to be worth the effort, given how "mainstream" development is heading.
> Other than that there is nothing to be gained by a 64 bit architecture
> really; on 32 bit machines you do have FPUs, vector units etc.
> doing calculation probably faster than the integer unit of a
> 64 bit processor.
> The *whole point* of a 64 bit core is the 64 bit address space.
No, the whole point of a 64b core is the 64b registers. You can package a 64b CPU so that only 20 (!) address lines are bonded out. This limits the physical address space to 20b. What value is there in making the logical address space bigger -- so you can leave gaps for expansion between objects??
>> The needs of a task can be met by resources "harvested" from
>> some other task. E.g., where is the stack for your TaskA?
>> How large is it? How much of it is in-use *now*? How much
>> can it GROW before it bumps into something (because that something
>> occupies space in "its" address space).
>
> This is the beauty of 64 bit logical address space. You allocate
> enough logical memory and then you allocate physical on demand,
> this is what MMUs are there for. If you want to grow your stack
> indefinitely - the messy C style - you can just allocate it
> a few gigabytes of logical memory and use the first few kilobytes
> of it at no waste of resources. Of course there are much slicker
> ways to deal with memory allocation.
Again, how is this any harder with "overlapping" 64b address spaces? Or, how is it EASIER with nonoverlap?
>> I start a task (thread) with a single page of stack. And, a
>> limit on how much it is allowed to consume during its execution.
>> Then, when it pushes something "off the end" of that page,
>> I fault a new page in and map it at the faulting address.
>> This continues as the task's stack needs grow.
>
> This is called "allocate on demand" and has been around
> since time immemorial; check my former paragraph.
I'm not trying to be "novel". Rather, I'm showing that these features come from the MMU -- not a "nonoverlapping" (or overlapping!) address space. I.e., the takeaway from all this is that the MMU is the win AND the cost for the OS. Without it, the OS gets simpler... and less capable!
>> When I run out of available pages, I do a GC cycle to
>> reclaim pages from (other?) tasks that are no longer using
>> them.
>
> This is called "memory swapping", also since time immemorial.
> For the case when there is no physical memory to reclaim, that
> is.
> The first version of dps - some decades ago - ran on a CPU32
> (a 68340). It had no MMU so I implemented "memory blocks":
> a task can declare a piece of memory a swappable block and
> allow/disallow its swapping. Those blocks would then be shared
> or written to disk when more memory was needed etc. - memory
> swapping without an MMU. Worked fine; must be still working for
> code I have not touched since on my power machines, all those
> decades later.
There's no disk involved. The amount of physical memory is limited to what's on-board (unless I try to move resources to another node or -- *gack* -- use a scratch table in the RDBMS as a backing store).

Recovering "no longer in use" portions of stack is "low hanging fruit"; look at the task's stack pointer and you know how much allocated stack is no longer in use. Try to recover it (of course, the task may immediately fault another page back into play, but that's an optimization issue).

If there is no "low hanging fruit", then I ask tasks to voluntarily relinquish memory. Some tasks may have requested "extra" memory in order to precompute results for future requests/activities. If it was available -- and if the task wanted to "pay" for it -- then the OS would grant the allocation (knowing that it could eventually revoke it!). They could relinquish those resources at the expense of having to recompute those things at a later date ("on demand" *or* when memory is again available).

If I can't recover enough resources "voluntarily", then I *take* memory away from a (selected) task and inform it (raise an exception that it will handle as soon as it gets a timeslice) of that "theft". It will either recover from the loss (because it was being greedy and didn't elect to forfeit excess memory that it had allocated when I asked, earlier) *or* it will crash. <shrug> When you run out of resources, SOMETHING has to give (and the OS is better suited to determining WHAT than the individual tasks are... they ALL think *they* are important!)

Again, "what do you expect from your OS?"
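That "low hanging fruit" pass really is small. A hedged sketch for a downward-growing stack (madvise stands in for whatever the kernel's own page recycler would actually do):

#include <stdint.h>
#include <sys/mman.h>

#define PAGE 4096UL

/* Stack occupies [limit, base) and grows DOWN toward 'limit'; pages
   wholly below the one holding SP were pushed into once but are dead now. */
void reclaim_dead_stack(uintptr_t limit, uintptr_t sp)
{
    uintptr_t live = sp & ~(PAGE - 1);     /* page containing SP: keep it */

    if (live > limit)
        madvise((void *)limit, live - limit, MADV_DONTNEED);
}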
>> In this way, I can effectively SHARE a stack (or heap)
>> between multiple tasks -- without having to give any
>> consideration for where, in memory, they (or the stacks!)
>> reside.
>
> You can do this in a linear address space, too - this is what
> the MMU is for.
Yes, see? There's nothing special about a flat address space!
>> I can move a page from one task (full of data) to another
>> task at some place that the destination task finds "convenient".
>> I can import a page from another network device or export
>> one *to* another device.
>
> So instead of simply passing an address you have to switch page
> translation entries, adjust them on each task switch, flush and
> sync whatever it takes - does not sound very efficient to me.
It's not intended to be fast/efficient. It's intended to ensure that the recipient -- AND ONLY THE RECIPIENT -- is *now* granted access to that page's contents. Depending on semantics, it can create a copy of an object or "move" the object, leaving a "hole" in the original location.

[I.e., with move semantics, the original owner shouldn't be trying to access something that he's "given away"! Any access, by him, to that memory region should signal a fatal exception!]

If you don't care who sees what, then you don't need the MMU! And we're back to my initial paragraph of this reply! :>
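The revocation half of those move semantics is a one-liner on a hosted system -- a sketch, assuming a page-aligned region:

#include <stddef.h>
#include <sys/mman.h>

/* After the hand-off, strip the donor's rights: any later touch by the
   donor takes the fatal exception described above. */
int revoke_donor_access(void *region, size_t len)
{
    return mprotect(region, len, PROT_NONE);
}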
>> Because each task's address space is effectively empty/sparse,
>> mapping a page doesn't require much effort to find a "free"
>> place for it.
>
> This is the beauty of having the 64 bit address space, you always
> have enough logical memory. The "64 bit address space per task"
> buys you *nothing*.
If "always having enough logical memory" is such a great thing, isn't having MORE logical memory (because you've moved other things into OVERLAPPING portions of that memory space) an EVEN BETTER thing? Again, what does your flat addressing BUY the OS in terms of complexity reduction? (your initial assumption) "...a big difference to how the OS is done"
On 6/10/2021 16:55, Don Y wrote:
> On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
>
> [attrs elided]
Don, this becomes way too lengthy and repetitive.

You keep on saying that a linear 64 bit address space means exposing everything to everybody, after I explained this is not true at all.

You keep on claiming this or that about how I do things without bothering to understand what I said - like your claim that I use the MMU for "protection only". NO, this is not true either. On 32 bit machines - as mine in production are - mapping 4G of logical space into, say, 128M of physical memory goes all the way through page translation, block translation for regions where page translation would be impractical, etc. You sound the way I would have sounded before I had written, and built on for years, what is now dps. The devil is in the detail :-).

You pass "objects", pages etc. Well guess what, it *always* boils down to an *address* for the CPU. The rest is generic talk. And if you choose to have overlapping address spaces, when you pass a pointer from one task to another the OS has to deal with this at a significant cost. In a linear address space, you pass the pointer *as is*, so the OS does not have to deal with anything except access restrictions. In dps, you can send a message to another task - the message being data the OS will copy into that task's memory, the data being perfectly able to be an address of something in another task's memory. If a task accesses an address it is not supposed to, the user is notified and allowed to press CR to kill that task. Then there are common data sections for groups of tasks etc.; it is pretty huge really.

The concept "one entire address space to all tasks" is from the 60s if not earlier (I just don't know and don't care to check now) and it has done a good job while it was necessary, mostly on 16 bit CPUs. For today's processors this means just making them run with the handbrake on; *nothing* is gained because of that - no more security (please don't repeat that "expose everything" nonsense), just burning more CPU power, constantly having to remap addresses etc.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
> On 6/10/2021 16:55, Don Y wrote:
>> On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
>>
>> [attrs elided]
>
> Don, this becomes way too lengthy and repetitive.
>
> You keep on saying that a linear 64 bit address space means exposing
> everything to everybody, after I explained this is not true at all.
TaskA has built a structure -- a page worth of data residing at 0x123456. It wants to pass this to TaskB so that TaskB can perform some operations on it.

Can TaskB access the data at 0x123456 *before* TaskA has told it to do so?

Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?

Can TaskA alter the data at 0x123456 *after* it has "passed it along" to TaskB -- possibly while TaskB is still using it?
> You keep on claiming this or that about how I do things without
> bothering to understand what I said - like your claim that I use the MMU
> for "protection only".
I didn't say that YOU did that. I said that to be able to ignore the MMU after setting it up, you can ONLY use it to protect code from alteration, data from execution, etc. The "permissions" that it applies have to be invariant over the execution time of ALL of the code.

So, if you DON'T use it "for protection only", then you are admitting to having to dynamically tweak it. *THIS* is the cost that the OS incurs -- and having a flat address space doesn't make it any easier! If you aren't incurring that cost, then you're not protecting something.
> NO, this is not true either. On 32 bit machines - as mine in
> production are - mapping 4G of logical space into, say, 128M of
> physical memory goes all the way through page translation, block
> translation for regions where page translation would be impractical,
> etc. You sound the way I would have sounded before I had written,
> and built on for years, what is now dps. The devil is in the
> detail :-).
>
> You pass "objects", pages etc. Well guess what, it *always* boils
> down to an *address* for the CPU. The rest is generic talk.
Yes, the question is "who manages the protocol for sharing". Since forever, you could pass pointers around and let anyone access anything they wanted. You could impose -- but not ENFORCE -- schemes that ensured data was shared properly (e.g., so YOU wouldn't be altering data that *I* was using).

[Monitors can provide some structure to that sharing, but are costly when you consider the number of things that may potentially need to be shared. And, you can still poke directly at the data being shared, bypassing the monitor, if you want to (or have a bug).]

But, you had to rely on programming discipline to ensure this worked. Just like you have to rely on discipline to ensure code is "bugfree" (how's that worked for the industry?)
> And if you choose to have overlapping address spaces, when you
> pass a pointer from one task to another the OS has to deal with this
> at a significant cost.
How does your system handle the above example? How do you "pass" the pointer from TaskA to TaskB -- if not via the OS? Do you expose a shared memory region that both tasks can use to exchange data and hope they follow some rules? Always use synchronization primitives for each data exchange? RELY on the developer to get it right? ALWAYS?

Once you've passed the pointer, how does TaskB access that data WITHOUT having to update the MMU? Or, has TaskB had access to the data all along? What happens when B wants to pass the modified data to C? Does the MMU have to be updated (C's tables) to grant that access? Or, like B, has C had access all along? And, has C had to remain disciplined enough not to go mucking around with that region of memory until A *and* B are done modifying it?

I don't allow anyone to see anything -- until the owner of that thing explicitly grants access. If you try to access something before it's been made available for your access, the OS traps and aborts your process -- you've violated the discipline and the OS is going to enforce it! In an orderly manner that doesn't penalize other tasks that have behaved properly.
> In a linear address space, you pass the pointer *as is*, so the OS does
> not have to deal with anything except access restrictions.
> In dps, you can send a message to another task - the message being
> data the OS will copy into that task's memory, the data being
> perfectly able to be an address of something in another task's
So, you don't use the MMU to protect TaskA's resources from TaskB (or TaskC!) access. You expect LESS from your OS.
> memory. If a task accesses an address it is not supposed to,
> the user is notified and allowed to press CR to kill that task.
What are the addresses "it's not supposed to" access? Some *subset* of the addresses that "belong" to other tasks? Perhaps I can access a buffer that belongs to TaskB but not TaskB's code? Or, some OTHER buffer that TaskB doesn't want me to see?

Do you explicitly have to locate ("org") each buffer so that you can place SOME in protected portions of the address space and others in shared areas? How do you change these distinctions dynamically -- or, do you do a lot of data copying from "protected" space to "shared" space?
> Then there are common data sections for groups of tasks etc.;
> it is pretty huge really.
Again, you expose things by default -- even if only a subset of things. You create shared memory regions where there are no protections and then rely on your application to behave and not access data (that has been exposed for its access) until it *should*. Everybody does this. And everyone has bugs as a result.

You are relying on the developer to *repeatedly* implement the sharing protocol -- instead of relying on the OS to enforce that for you. It's like putting tons of globals in your application -- to make data sharing easier (and, thus, more prone to bugs). You expect less of your OS.

My tasks are free to do whatever they want in their own protection domain. They KNOW that nothing can SEE the data they are manipulating *or* observe HOW they are manipulating it or *influence* their manipulation of it. Until they want to expose that data. And, then, only to those entities that they think SHOULD see it.

They can give (hand off) data to another entity -- much like call-by-value semantics -- and have the other entity know that NOTHING the original "donor" can do AFTER that handoff will affect the data that has been "passed" to them. Yet, they can still manipulate that data -- update it or reuse that memory region -- for the next "client". The OS enforces these guarantees. Much more than just passing along a pointer to the data!

Trying to track down the donor's alteration of data while the recipient is concurrently accessing it (multiple tasks, multiple cores, multiple CPUs) is a nightmare proposition. And, making an *unnecessary* copy of it is a waste of resources (esp. if the two parties actually ARE well-behaved).
> The concept "one entire address space to all tasks" is from the 60s
> if not earlier (I just don't know and don't care to check now) and it
> has done a good job while it was necessary, mostly on 16 bit CPUs.
> For today's processors this means just making them run with the
> handbrake on; *nothing* is gained because of that - no more security
> (please don't repeat that "expose everything" nonsense), just
> burning more CPU power, constantly having to remap addresses etc.
Remapping is done in hardware. The protection overhead is a matter of updating page table entries.

*You* gain nothing by creating a flat address space because *you* aren't trying to compartmentalize different tasks and subsystems. You likely protect the kernel's code/data from direct interference from "userland" (U/S bit) but want the costs of sharing between tasks to be low -- at the expense of forfeiting protections between them.

*Most* of the world consists of imperfect coders. *Most* of us have to deal with colleagues (of varying abilities) before, after and during our tenure running code on the same CPU as our applications. "The bug is (never!) in my code! So, it MUST be in YOURS!" You can either stare at each other, confident in the correctness of your own code. Or, find the bug IN THE OTHER GUY'S CODE (you can't prove yours is correct any more than he can; so you have to find the bug SOMEWHERE to make your point), effectively doing his debugging *for* him.

Why do you think desktop OSs go to such lengths to compartmentalize applications? Aren't the coders of application A just as competent as those who coded application B? Why would you think application A might stomp on some resource belonging to application B? Wouldn't that be a violation of DISCIPLINE (and outright RUDE)?

You've been isolated from this for far too long. So, you don't see what it's like to have to deal with another(s)' code impacting the same product that *you* are working on. Encapsulation and opacity are the best ways to ensure all interactions with your code/data are through permitted interfaces.

"Who overwrote my location 0x123456? I know *I* didn't..."
"Who turned on power to the motor? I'm the only one who should do so!"
"Who deleted the log file?"

There's a reason we eschew globals!

I can ensure TaskB can't delete the log file -- by simply denying him access to logfile.delete(). But, letting him use logfile.append() as much as he wants! At the same time, allowing TaskA to delete or logfile.rollover() as it sees fit -- because I've verified that TaskA does this appropriately as part of its contract. And, there's no NEED for TaskB to ever do so -- it's not B's responsibility (so why allow him the opportunity to ERRONEOUSLY do so -- and then have to chase down how this happened?)

If TaskB *tries* to access logfile.delete(), I can trap to make his violation obvious: "Reason for process termination: illegal access". And, I don't need to do this with pointers or hardware protection of the pages in which logfile.delete() resides! I just don't let him invoke *that* method!

I *expect* my OS to provide these mechanisms to the developer to make his job easier AND the resulting code more robust. There is a cost to all this. But, *if* something misbehaves, it leaves visible evidence of its DIRECT actions; you don't have to wonder WHEN (in the past) some datum was corrupted that NOW manifests as an error in some, possibly unrelated, manner.

Of course, you don't need any of this if you're a perfect coder.

You don't expose the internals of your OS to your tasks, do you? Why? Don't you TRUST them to observe proper discipline in their interactions with it? You trust them to observe those same disciplines when interacting with each other... Why can't TaskA see the preserved state for TaskB? Don't you TRUST it to only modify it if it truly knows what it's doing? Not the result of resolving some errant pointer?

Welcome to the 70's!
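One lightweight way to get that per-task method granting, sketched in C (all names invented): hand each task its own capability table and simply leave out what it wasn't granted.

#include <stddef.h>

/* Implemented elsewhere (hypothetical) */
int log_append(const char *msg);
int log_rollover(void);
int log_delete(void);

struct logfile_cap {
    int (*append)(const char *msg);
    int (*rollover)(void);        /* NULL: not granted */
    int (*del)(void);             /* NULL: not granted */
};

/* TaskA's contract was audited: full grant */
static const struct logfile_cap taskA_cap =
    { log_append, log_rollover, log_delete };

/* TaskB only ever needs to append -- so that's ALL it can do;
   an attempt to use a NULL slot is where the "illegal access" trap goes */
static const struct logfile_cap taskB_cap = { log_append, NULL, NULL };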
