64-bit embedded computing is here and now

Started by James Brakefield June 7, 2021
On 6/8/2021 23:18, David Brown wrote:
> On 08/06/2021 16:46, Theo wrote: >> ...... > >> Memory bus/cache width > > No, that is not a common way to measure cpu "width", for many reasons. > A chip is likely to have many buses outside the cpu core itself (and the > cache(s) may or may not be considered part of the core). It's common to > have 64-bit wide buses on 32-bit processors, it's also common to have > 16-bit external databuses on a microcontroller. And the cache might be > 128 bits wide.
I agree with your points and those of Theo, but isn't the cache basically as wide as the registers? Logically, that is; a cache *line* is several times that, which is probably what you are referring to. Not that it changes the fact that 64-bit data buses/registers in an MCU (apart from FPU registers; 32-bit FPUs are useless to me) are unlikely to attract much interest; there is nothing of significance to be gained, as you said.

64-bit CPUs are of interest to me, of course, and thankfully there are some available, but that goes somewhat past what we call "embedded".

Not long ago, in a chat with a guy who knew some ARM 64-bit, I gathered there is some real mess with their out-of-order execution: one needs to issue a "sync" (a memory barrier, whatever they call it) all the time, and there is a huge performance cost because of that. Has anybody heard anything about it? (I only know what I was told.)

Dimiter
On Tuesday, June 8, 2021 at 3:11:24 PM UTC-5, David Brown wrote:
> On 08/06/2021 21:38, James Brakefield wrote: > > Could you explain your background here, and what you are trying to get > at? That would make it easier to give you better answers. > > The only thing that will take more than 4GB is video or a day's worth of photos. > No, video is not the only thing that takes 4GB or more. But it is, > perhaps, one of the more common cases. Most embedded systems don't need > anything remotely like that much memory - to the nearest percent, 100% > of embedded devices don't even need close to 4MB of memory (ram and > flash put together). > > So there is likely to be some embedded aps that need a > 32-bit address space. > Some, yes. Many, no. > > Cost, size or storage capacity are no longer limiting factors. > Cost and size (and power) are /always/ limiting factors in embedded systems. > > > > Am trying to puzzle out what a 64-bit embedded processor should look like. > There are plenty to look at. There are ARMs, PowerPC, MIPS, RISC-V. > And of course there are some x86 processors used in embedded systems. > > At the low end, yeah, a simple RISC processor. > Pretty much all processors except x86 and brain-dead old-fashioned 8-bit > CISC devices are RISC. Not all are simple. > > And support for complex arithmetic > > using 32-bit floats? > A 64-bit processor will certainly support 64-bit doubles as well as > 32-bit floats. Complex arithmetic is rarely needed, except perhaps for > FFT's, but is easily done using real arithmetic. You can happily do > 32-bit complex arithmetic on an 8-bit AVR, albeit taking significant > code space and run time. I believe the latest gcc for the AVR will do > 64-bit doubles as well - using exactly the same C code you would on any > other processor. > > And support for pixel alpha blending using quad 16-bit numbers? > You would use a hardware 2D graphics accelerator for that, not the > processor. > > 32-bit pointers into the software? > > > With 64-bit processors you usually use 64-bit pointers.
|> Could you explain your background here, and what you are trying to get at? Am familiar with embedded systems, image processing and scientific applications. Have used a number of 8, 16, 32 and ~64bit processors. Have also done work in FPGAs. Am semi-retired and when working was always trying to stay ahead of new opportunities and challenges. Some of my questions/comments belong over at comp.arch
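On the complex-arithmetic point David raised: done with real arithmetic it is just four multiplies and two adds per complex multiply. A minimal C99 sketch using 32-bit floats (cplx32 and cmul32 are invented names, purely for illustration; both versions compile to the same few real operations):

#include <complex.h>

typedef struct { float re, im; } cplx32;

static cplx32 cmul32(cplx32 a, cplx32 b)
{
    cplx32 r = { a.re * b.re - a.im * b.im,    /* real part      */
                 a.re * b.im + a.im * b.re };  /* imaginary part */
    return r;
}

/* C99's built-in complex type generates the same real multiplies/adds. */
static float complex cmul32_c99(float complex a, float complex b)
{
    return a * b;
}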
On 6/8/2021 22:38, James Brakefield wrote:
> On Tuesday, June 8, 2021 at 2:39:29 AM UTC-5, Don Y wrote: >> On 6/7/2021 10:59 PM, David Brown wrote: >>> 8-bit microcontrollers are still far more common than 32-bit devices in >>> the embedded world (and 4-bit devices are not gone yet). At the other >>> end, 64-bit devices have been used for a decade or two in some kinds of >>> embedded systems. >> I contend that a good many "32b" implementations are really glorified >> 8/16b applications that exhausted their memory space. I still see lots >> of designs build on a small platform (8/16b) and augment it -- either >> with some "memory enhancement" technology or additional "slave" >> processors to split the binaries. Code increases in complexity but >> there doesn't seem to be a need for the "work-per-unit-time" to. >> >> [This has actually been the case for a long time. The appeal of >> newer CPUs is often in the set of peripherals that accompany the >> processor, not the processor itself.] >>> We'll see 64-bit take a greater proportion of the embedded systems that >>> demand high throughput or processing power (network devices, hard cores >>> in expensive FPGAs, etc.) where the extra cost in dollars, power, >>> complexity, board design are not a problem. They will probably become >>> more common in embedded Linux systems as the core itself is not usually >>> the biggest part of the cost. And such systems are definitely on the >>> increase. >>> >>> But for microcontrollers - which dominate embedded systems - there has >>> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little >> I disagree. The "cost" (barrier) that I see clients facing is the >> added complexity of a 32b platform and how it often implies (or even >> *requires*) a more formal OS underpinning the application. Where you >> could hack together something on bare metal in the 8/16b worlds, >> moving to 32 often requires additional complexity in managing >> mechanisms that aren't usually present in smaller CPUs (caches, >> MMU/MPU, DMA, etc.) Developers (and their organizations) can't just >> play "coder cowboy" and coerce the hardware to behaving as they >> would like. Existing staff (hired with the "bare metal" mindset) >> are often not equipped to move into a more structured environment. >> >> [I can hack together a device to meet some particular purpose >> much easier on "development hardware" than I can on a "PC" -- simply >> because there's too much I have to "work around" on a PC that isn't >> present on development hardware.] >> >> Not every product needs a filesystem, network stack, protected >> execution domains, etc. Those come with additional costs -- often >> in the form of a lack of understanding as to what the ACTUAL >> code in your product is doing at any given time. (this isn't the >> case in the smaller MCU world; it's possible for a developer to >> have written EVERY line of code in a smaller platform) >>> cost. There is almost nothing to gain from a move to 64-bit, but the >>> cost would be a good deal higher. >> Why is the cost "a good deal higher"? Code/data footprints don't >> uniformly "double" in size. The CPU doesn't slow down to handle >> bigger data. >> >> The cost is driven by where the market goes. Note how many 68Ks found >> design-ins vs. the T11, F11, 16032, etc. My first 32b design was >> physically large, consumed a boatload of power and ran at only a modest >> improvement (in terms of system clock) over 8b processors of its day. 
>> Now, I can buy two orders of magnitude more horsepower PLUS a >> bunch of built-in peripherals for two cups of coffee (at QTY 1) >>> So it is not going to happen - at >>> least not more than a very small and very gradual change. >> We got 32b processors NOT because the embedded world cried out for >> them but, rather, because of the influence of the 32b desktop world. >> We've had 32b processors since the early 80's. But, we've only had >> PCs since about the same timeframe! One assumes ubiquity in the >> desktop world would need to happen before any real spillover to embedded. >> (When the "desktop" was an '11 sitting in a back room, it wasn't seen >> as ubiquitous.) >> >> In the future, we'll see the 64b *phone* world drive the evolution >> of embedded designs, similarly. (do you really need 32b/64b to >> make a phone? how much code is actually executing at any given >> time and in how many different containers?) >> >> [The OP suggests MCus with radios -- maybe they'll be cell phone >> radios and *not* wifi/BLE as I assume he's thinking! Why add the >> need for some sort of access point to a product's deployment if >> the product *itself* can make a direct connection??] >> >> My current design can't fill a 32b address space (but, that's because >> I've decomposed apps to the point that they can be relatively small). >> OTOH, designing a system with a 32b limitation seems like an invitation >> to do it over when 64b is "cost effective". The extra "baggage" has >> proven to be relatively insignificant (I have ports of my codebase >> to SPARC as well as Atom running alongside a 32b ARM) >>> The OP sounds more like a salesman than someone who actually works with >>> embedded development in reality. >> Possibly. Or, just someone that wanted to stir up discussion... > > |> I contend that a good many "32b" implementations are really glorified > |> 8/16b applications that exhausted their memory space. > > The only thing that will take more than 4GB is video or a day's worth of photos. > So there is likely to be some embedded aps that need a > 32-bit address space. > Cost, size or storage capacity are no longer limiting factors. > > Am trying to puzzle out what a 64-bit embedded processor should look like. > At the low end, yeah, a simple RISC processor. And support for complex arithmetic > using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers? > 32-bit pointers into the software? >
The real value in 64-bit integer registers and a 64-bit address space is just that: having an orthogonal, "endless" space (well, I remember some 30 years ago 32 bits seemed sort of "endless" to me...).

Not needing to assign overlapping logical addresses to anything can make a big difference to how the OS is done.

A 32-bit FPU seems useless to me; 64-bit is OK. Although 32-bit FP *numbers* can be quite useful for storing/passing data.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 6/8/2021 7:46 AM, Theo wrote:
> David Brown <david.brown@hesbynett.no> wrote: >> But for microcontrollers - which dominate embedded systems - there has >> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little >> cost. There is almost nothing to gain from a move to 64-bit, but the >> cost would be a good deal higher. So it is not going to happen - at >> least not more than a very small and very gradual change. > > I think there will be divergence about what people mean by an N-bit system: > > Register size > Unit of logical/arithmetical processing > Memory address/pointer size > Memory bus/cache width
(General) Register size is the primary driver. A processor can have very different "size" subcomponents.

E.g., a Z80 is an 8b processor -- registers are nominally 8b. However, it supports 16b operations -- on register PAIRs (an implicit acknowledgement that the REGISTER is smaller than the register pair). This is common on many smaller processors. The address space is 16b -- with a separate 16b address space for I/Os. The Z180 extends the PHYSICAL address space to 20b but the logical address space remains unchanged at 16b (if you want to specify a physical address, you must use 20+ bits to represent it -- and invoke a separate mechanism to access it!). And the ALU is *4* bits wide.

Cache? Which one? I or D? L1/2/3? What about the oddballs -- 12b? 1b?
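To make the register-pair point concrete, here is roughly what a 16-bit add looks like when synthesized from 8-bit pieces -- an illustration in C of what the hardware/compiler does under the hood (a Z80's ADD HL,DE hides the same trick):

#include <stdint.h>

/* Sketch: a 16-bit add built from byte-wide adds plus a carry, i.e.
 * what "16b operations on an 8b processor" amounts to. */
static uint16_t add16_from_bytes(uint8_t ah, uint8_t al,
                                 uint8_t bh, uint8_t bl)
{
    uint8_t lo    = (uint8_t)(al + bl);
    uint8_t carry = (uint8_t)(lo < al);         /* carry out of the low byte */
    uint8_t hi    = (uint8_t)(ah + bh + carry);
    return (uint16_t)((hi << 8) | lo);
}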
> I think we will increasingly see parts which have different sizes on one > area but not the other. > > For example, for doing some kinds of logical operations (eg crypto), having > 64-bit registers and ALU makes sense, but you might only need kilobytes of > memory so only have <32 address bits.
That depends on the algorithm chosen and the hardware support available.
> For something else, like a microcontroller that's hung off the side of a > bigger system (eg the MCU on a PCIe card) you might want the ability to > handle 64 bit addresses but don't need to pay the price for 64-bit > registers. > > Or you might operate with 16 or 32 bit wide external RAM chip, but your > cache could extend that to a wider word width. > > There are many permutations, and I think people will pay the cost where it > benefits them and not where it doesn't.
But you don't buy MCUs with a-la-carte pricing. How much does an extra timer cost me? What if I want it to also serve as a *counter*? What cost for 100K of internal ROM? 200K? [It would be an interesting exercise to try to do a linear analysis of product prices with an idea of trying to tease out the "costs" (to the developer) for each feature in EXISTING products!] Instead, you see a *price* that is reflective of how widely used the device happens to be, today. You are reliant on the preferences of others to determine which is the most cost effective product -- for *you*. E.g., most of my devices have no "display" -- yet, the MCU I've chosen has hardware support for same. It would obviously cost me more to select a device WITHOUT that added capability -- because most purchasers *want* a display (and *they* drive the production economies). I could, potentially, use a 2A03 for some applications. But, the "TCO" of such an approach would exceed that of a 32b (or larger) processor! [What a crazy world!]
> This is not a new phenomenon, of course. But for a time all these numbers > were in the range between 16 and 32 bits, which made 32 simplest all round. > Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit > address) I think we're going to see more 32/64 hybrids. > > Theo >
On 6/8/2021 12:38 PM, James Brakefield wrote:

> |> I contend that a good many "32b" implementations are really glorified > |> 8/16b applications that exhausted their memory space. > > The only thing that will take more than 4GB is video or a day's worth of photos.
That's not true. For example, I rely on a "PC" in my current design to support the RDBMS. Otherwise, I would have to design a "special node" (I have a distributed system) that had the resources necessary to process multiple concurrent queries in a timely fashion; I can put 100GB of RAM in a PC (whereas my current nodes only have 256MB). The alternative is to rely on secondary (disk) storage -- which is even worse!

And "video" is incredibly nondescript. It conjures ideas of STBs. Instead, I see a wider range of applications in terms of *vision*. E.g., let your doorbell camera "notice motion", recognize that motion as indicative of someone/thing approaching it (e.g., a visitor), recognize the face/features of the visitor and alert you to its presence (if desired). No need to involve a cloud service to do this.

[My "doorbell" is a camera/microphone/speaker. *If* I want to know that you are present, *it* will tell me. Or, if told to do so, will grant you access to the house (even in my absence). For "undesirables", I'm mounting a coin mechanism adjacent to the entryway (our front door is protected by a gated porch area): "Deposit 25c to ring bell. If we want to talk to you, your deposit will be refunded. If *not*, consider that the *cost* of pestering us!"]

There are surveillance cameras discreetly placed around the exterior of the house (don't want the place to look like a frigging *bank*!). One of them has a clear view of the mailbox (our mail is delivered via letter carriers riding in mail trucks). Same front door camera hardware. But, now: detect motion; detect motion STOPPING proximate to the mailbox (for a few seconds or more); detect motion resuming; signal "mail available". Again, no need to involve a cloud service to accomplish this. And, when not watching for mail delivery, it's performing "general" surveillance -- mail detection is a "free bonus"!

Imagine designing a vision-based inspection system where you "train" the CAMERA -- instead of some box that the camera connects to. And, the CAMERA signals accept/reject directly.

[I use a boatload of cameras, here; they are cheap sensors -- the "cost" lies in the signal processing!]
> So there is likely to be some embedded aps that need a > 32-bit address space. > Cost, size or storage capacity are no longer limiting factors.
No, cost, size and storage are ALWAYS limiting factors!

E.g., each of my nodes derives power from the wired network connection. That puts a practical limit of ~12W on what a node can dissipate. That has to support the processing core plus any local I/Os! Note that dissipated power == heat. So, one also has to be conscious of how that heat will affect the devices' environs. (Yes, there are schemes to increase this to ~100W, but now the cost of providing power -- and BACKUP power -- to a remote device starts to be a sizeable portion of the product's cost and complexity.)

My devices are intended to be "invisible" to the user -- so, they have to hide *inside* something (most commonly, the walls or ceiling -- in standard Jboxes for accessibility and Code compliance). So, that limits their size/volume (mine are about the volume of a standard duplex receptacle -- 3 cu in -- so fit in even the smallest of 1G boxes... even pancake boxes!)

They have to be inexpensive so I can justify using LOTS of them (I will have 240 deployed, here; my industrial beta site will have over 1000; the commercial beta site almost a similar number). Not only is the cost of initial acquisition of concern, but also the *perceived* cost of maintaining the hardware in a functional state (the customer doesn't want to have $10K of spares on hand for rapid incident response, plus staff able to diagnose and repair/replace "on demand").

In my case, I sidestep the PERSISTENT storage issue by relegating that to the RDBMS. In *that* domain, I can freely add spinning rust or an SSD without complicating the design of the rest of the nodes. So, "storage" becomes:
- how much do I need for a secure bootstrap
- how much do I need to contain a downloaded (from the RDBMS!) binary
- how much do I need to keep "local runtime resources"
- how much can I exploit surplus capacity *elsewhere* in the system to address transient needs

Imagine what it would be like having to replace "worn" SD cards at some frequency in hundreds of devices scattered around hundreds of "invisible" places! Almost as bad as replacing *batteries* in those devices! [Have you ever had an SD card suddenly write-protect itself?]
> Am trying to puzzle out what a 64-bit embedded processor should look like.
"Should"? That depends on what you expect it to do for you. The nonrecurring cost of development will become an ever-increasing portion of the device's "cost". If you sell 10K units but spend 500K on development (over its lifetime), you've justification for spending a few more dollars on recurring costs *if* you can realize a reduction in development/maintenance costs (because the development is easier, bugs are fewer/easier to find, etc.) Developers (and silicon vendors, as Good Business Practice) will look at their code and see what's "hard" to do, efficiently. Then, consider mechanisms that could make that easier or more effective. I see the addition of hardware features that enhance the robustness of the software development *process*. E.g., allowing for compartmentalizing applications and subsystems more effectively and *efficiently*. [I put individual objects into their own address space containers to ensure Object A can't be mangled by Client B (or Object C). As a result, talking to an object is expensive because I have to hop back and forth across that protection boundary. It's even worse when the targeted object is located on some other physical node (as now I have the transport cost to contend with).] Similarly, making communications more robust. We already see that with crypto accelerators. The idea of device "islands" is obsolescent. Increasingly, devices will interact with other devices to solve problems. More processing will move to the edge simply because of scaling issues (I can add more CPUs far more effectively than I can increase the performance of a "centralized" CPU; add another sense/control point? let *it* bring some processing abilities along with it!). And, securing the product from tampering/counterfeiting; it seems like most approaches, to date, have some hidden weakness. It's hard to believe hardware can't ameliorate that. The fact that "obscurity" is still relied upon by silicon vendors suggests an acknowledgement of their weaknesses. Beyond that? Likely more DSP-related support in the "native" instruction set (so you can blend operations between conventional computing needs and signal processing related issues). And, graphics acceleration as many applications implement user interfaces in the appliance. There may be some other optimizations that help with hashing or managing large "datasets" (without them being considered formal datasets). Power management (and measurement) will become increasingly important (I spend almost as much on the "power supply" as I do on the compute engine). Developers will want to be able to easily ascertain what they are consuming as well as why -- so they can (dynamically) alter their strategies. In addition to varying CPU clock frequency, there may be mechanisms to automatically (!) power down sections of the die based on observed instruction sequences (instead of me having to explicitly do so). [E.g., I shed load when I'm running off backup power. This involves powering down nodes as well as the "fields" on selective nodes. How do I decide *which* load to shed to gain the greatest benefit?] Memory management (in the conventional sense) will likely see more innovation. Instead of just "settling" for a couple of page sizes, we might see "adjustable" page sizes. Or, the ability to specify some PORTION of a *particular* page as being "valid" -- instead of treating the entire page as such. Scheduling algorithms will hopefully get additional hardware support. E.g., everything is deadline driven in my design ("real-time"). 
So, schedulers are concerned with evaluating the deadlines of "ready" tasks -- which can vary over time, and may need further qualification based on other criteria (e.g., least-slack-time scheduling).

Everything in my system is an *opaque* object on which a set of POSSIBLE methods can be invoked. But, each *Client* of that object (an Actor may be multiple Clients if it possesses multiple different Handles to the Object) is constrained as to which methods can be invoked via a particular Handle.

So, I can (e.g.) create an Authenticator object that has methods like "set_passphrase", "test_passphrase" and "invalidate_passphrase". Yet, no "disclose_passphrase" method (for obvious reasons). I can create an Interface to one privileged Client that allows it to *set* a new passphrase. And, all other Interfaces (to that Client as well as others!) may be restricted to only *testing* the passphrase ("Is it 'foobar'?"). And, I can limit the number of attempts a Client can make to invoke a particular method over a particular Interface, so the OS does the enforcement instead of relying on the Server to do so.

[What's to stop a Client from hammering on the Server (Authenticator Object) repeatedly -- invoking test_passphrase with full knowledge that it doesn't know the correct passphrase: "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?" The Client has been enabled to do this; that doesn't mean it can't or won't abuse it! Note that unlimited access means the Server has to respond to each of those method invocations. By contrast, putting a limit on them means the OS can block the invocation from ever reaching the Object (instead of needlessly tying up the Object's resources). A capabilities-based system that relies on encrypted tokens means the Server has to decrypt a token in order to determine that it is invalid; the Server's resources are consumed instead of the Client's.]

It takes effort (in the kernel) to verify that a Client *can* access a particular Object (i.e., has a Handle to it) AND that the Client can invoke THAT particular Method on that Object via this Handle (bound to a particular Object *Interface*), as well as verifying the format of the data and converting it to a format suitable for the targeted Object (which may use a different representational structure) for a particular Version of the Interface... I can either skimp on performing some of these checks (and rely on other mechanisms to ensure the security and reliability of the codebase -- in the presence of unvetted Actors) or hope that some hardware mechanism in the processor makes these a bit easier.
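A very rough sketch of that Handle/Interface gating, purely illustrative -- the struct layout, method names and the kernel_gate() helper are invented here, not the actual system:

#include <stdbool.h>
#include <stdint.h>

enum { M_SET_PASSPHRASE, M_TEST_PASSPHRASE, M_INVALIDATE_PASSPHRASE, M_COUNT };

struct handle {
    void    *object;            /* the opaque Object this Handle is bound to */
    uint32_t allowed;           /* bitmask: methods this Interface permits   */
    uint16_t budget[M_COUNT];   /* remaining invocations allowed per method  */
};

/* Kernel-side check made *before* the invocation ever reaches the Object:
 * wrong method, wrong Interface or exhausted quota is rejected at the
 * Client's expense, not the Server's. */
static bool kernel_gate(struct handle *h, unsigned method)
{
    if (method >= M_COUNT)              return false;
    if (!(h->allowed & (1u << method))) return false;  /* not in this Interface */
    if (h->budget[method] == 0)         return false;  /* attempts used up      */
    h->budget[method]--;
    return true;
}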
> At the low end, yeah, a simple RISC processor. And support for complex arithmetic > using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers? > 32-bit pointers into the software?
I doubt complex arithmetic will have much play. There might be support for *building* larger data types (e.g., I use BigRationals which are incredibly inefficient). But, the bigger bang will be for operators that allow tedious/iterative solutions to be implemented in constant time. This, for example, is why a hardware multiply (or other FPU capabilities) is such a win -- consider the amount of code that is replaced by a single op-code! Ditto things like "find first set bit", etc. Why stick with 32b floats when you can likely implement doubles with a bit more microcode (surely faster than trying to do wider operations built from narrower ones)? There's an entirely different mindset when you start thinking in terms of "bigger processors". I.e., the folks who see 32b processors as just *wider* 8/16b processors have typically not made this adjustment. It's like trying to "sample the carry" in a HLL (common in ASM) instead of concentrating on what you REALLY want to do and letting the language make it easier for you to express that. Expect to see people making leaps forward in terms of what they expect from the solutions they put forth. Anything that you could do with a PC, before, can now be done *in* a handheld flashlight!
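As a concrete example of the "find first set bit" point: without hardware help you iterate; with it, the whole loop collapses into a single instruction (GCC/Clang builtin shown, purely as an illustration):

#include <stdint.h>

/* The portable way: up to 32 iterations. */
static int ffs_loop(uint32_t x)
{
    for (int i = 0; i < 32; i++)
        if (x & (1u << i))
            return i + 1;            /* 1-based bit index */
    return 0;                        /* no bits set */
}

/* With hardware support the compiler emits one CTZ/CLZ-style op. */
static int ffs_fast(uint32_t x)
{
    return x ? __builtin_ctz(x) + 1 : 0;
}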
On 6/8/2021 4:04 AM, David Brown wrote:
> On 08/06/2021 09:39, Don Y wrote: >> On 6/7/2021 10:59 PM, David Brown wrote: >>> 8-bit microcontrollers are still far more common than 32-bit devices in >>> the embedded world (and 4-bit devices are not gone yet). At the other >>> end, 64-bit devices have been used for a decade or two in some kinds of >>> embedded systems. >> >> I contend that a good many "32b" implementations are really glorified >> 8/16b applications that exhausted their memory space. > > Sure. Previously you might have used 32 kB flash on an 8-bit device, > now you can use 64 kB flash on a 32-bit device. The point is, you are > /not/ going to find yourself hitting GB limits any time soon. The step
I don't see the "problem" with 32b devices as one of address space limits (except devices utilizing VMM with insanely large page sizes). As I said, in my application, task address spaces are really just a handful of pages. I *do* see (flat) address spaces that find themselves filling up with stack-and-heap-per-task, big chunks set aside for "onboard" I/Os, *partial* address decoding for offboard I/Os, etc. (i.e., you're not likely going to fully decode a single address to access a set of DIP switches as the decode logic is disproportionately high relative to the functionality it adds) How often do you see a high-order address line used for kernel/user? (gee, now your "user" space has been halved)
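For readers who haven't met the trick: the kernel/user split on a high-order address line is literally a one-bit test, and it costs you half the logical space (the 0x80000000 split below is just a common convention, not any particular part's memory map):

#include <stdbool.h>
#include <stdint.h>

#define KERNEL_BASE 0x80000000u      /* top bit set => kernel space (illustrative) */

static inline bool is_user_address(uint32_t addr)
{
    return addr < KERNEL_BASE;       /* user code gets only the lower 2 GB */
}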
> from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the > system - the step from 32-bit to 64-bit is totally pointless for 99.99% > of embedded systems. (Even for most embedded Linux systems, you usually > only have a 64-bit cpu because you want bigger and faster, not because > of memory limitations. It is only when you have a big gui with fast > graphics that 32-bit address space becomes a limitation.)
You're assuming there has to be some "capacity" value to the 64b move. You might discover that the ultralow power devices (for phones!) are being offered in the process geometries targeted for the 64b devices. Or, that some integrated peripheral "makes sense" for phones (but not MCUs targeting motor control applications). Or, that there are additional power management strategies supported in the hardware. In my mind, the distinction brought about by "32b" was more advanced memory protection/management -- even if not used in a particular application. You simply didn't see these sorts of mechanisms in 8/16b offerings. Likewise, floating point accelerators. Working in smaller processors meant you had to spend extra effort to bullet-proof your code, economize on math operators, etc. So, if you wanted the advantages of those (hardware) mechanisms, you "upgraded" your design to 32b -- even if it didn't need gobs of address space or generic MIPS. It just wasn't economical to bolt on an AM9511 or practical to build a homebrew MMU.
> A 32-bit microcontroller is simply much easier to work with than an > 8-bit or 16-bit with "extended" or banked memory to get beyond 64 K > address space limits.
There have been some 8b processors that could seamlessly (in an HLL) handle extended address spaces. The Z180s were delightfully easy to use that way. You just had to keep in mind that a "call" to a different bank was more expensive than a "local" call (though there were no syntactic differences; the linkage editor and runtime package made this invisible to the developer).

We were selling products with 128K of DRAM on Z80s back in 1981 -- because it was easier to design THAT hardware than to step up to a 68K, for example (as well as leveraging our existing codebase).

The "video game era" was built on hybridized 8b systems -- even though you could buy 32b hardware at the time. You would be surprised at the ingenuity of many of those systems in offloading costly (time consuming) operations from the processor to make the device appear more powerful than it actually was.
>>> We'll see 64-bit take a greater proportion of the embedded systems that >>> demand high throughput or processing power (network devices, hard cores >>> in expensive FPGAs, etc.) where the extra cost in dollars, power, >>> complexity, board design are not a problem. They will probably become >>> more common in embedded Linux systems as the core itself is not usually >>> the biggest part of the cost. And such systems are definitely on the >>> increase. >>> >>> But for microcontrollers - which dominate embedded systems - there has >>> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little >> >> I disagree. The "cost" (barrier) that I see clients facing is the >> added complexity of a 32b platform and how it often implies (or even >> *requires*) a more formal OS underpinning the application. > > Yes, that is definitely a cost in some cases - 32-bit microcontrollers > are usually noticeably more complicated than 8-bit ones. How > significant the cost is depends on the balances of the project between > development costs and production costs, and how beneficial the extra > functionality can be (like moving from bare metal to RTOS, or supporting > networking).
I see most 32b designs operating without the benefits that a VMM system can apply (even if you discount demand paging). They just want to have a big address space and not have to dick with "segment registers", etc. They plow through the learning effort required to configure the device to move the "extra capabilities" out of the way. Then, just treat it like a bigger 8/16 processor. You can "bolt on" a simple network stack even with a rudimentary RTOS/MTOS. Likewise, a web server. Now, you remove the need for graphics and other UI activities hosted *in* the device. And, you likely don't need to support multiple concurrent clients. If you want to provide those capabilities, do that *outside* the device (let it be someone else's problem). And, you gain "remote access" for free. Few such devices *need* (or even WANT!) ARP caches, inetd, high performance stack, file systems, etc. Given the obvious (coming) push for enhanced security in devices, anything running on your box that you don't need (or UNDERSTAND!) is likely going to be pruned off as a way to reduce the attack surface. "Why is this port open? What is this process doing? How robust is the XXX subsystem implementation to hostile actors in an *unsupervised* setting?"
>>> cost. There is almost nothing to gain from a move to 64-bit, but the >>> cost would be a good deal higher. >> >> Why is the cost "a good deal higher"? Code/data footprints don't >> uniformly "double" in size. The CPU doesn't slow down to handle >> bigger data. > > Some parts of code and data /do/ double in size - but not uniformly, of > course. But your chip is bigger, faster, requires more power, has wider > buses, needs more advanced memories, has more balls on the package, > requires finer pitched pcb layouts, etc.
And has been targeted to a market that is EXTREMELY power sensitive (phones!). It is increasingly common for manufacturing technologies to be moving away from "casual development". The days of owning your own wave and doing in-house manufacturing at a small startup are gone. If you want to limit yourself to the kinds of products that you CAN (easily) assemble, you will find yourself operating with a much poorer selection of components available. I could fab a PCB in-house and build small runs of prototypes using the wave and shake-and-bake facilities that we had on hand. Harder to do so, nowadays. This has always been the case. When thru-hole met SMT, folks had to either retool to support SMT, or limit themselves to components that were available in thru-hole packages. As the trend has always been for MORE devices to move to newer packaging technologies, anyone who spent any time thinking about it could read the writing on the wall! (I bought my Leister in 1988? Now, I prefer begging favors from colleagues to get my prototypes assembled!) I suspect this is why we now see designs built on COTS "modules" increasingly. Just like designs using wall warts (so they don't have to do the testing on their own, internally designed supplies). It's one of the reasons FOSH is hampered (unlike FOSS, you can't roll your own copy of a hardware design!)
> In theory, you /could/ make a microcontroller in a 64-pin LQFP and > replace the 72 MHz Cortex-M4 with a 64-bit ARM core at the same clock > speed. The die would only cost two or three times more, and take > perhaps less than 10 times the power for the core. But it would be so > utterly pointless that no manufacturer would make such a device.
This is specious reasoning: "You could take the die out of a 68K and replace it with a 64 bit ARM." Would THAT core cost two or three times more (do you recall how BIG 68K die were?) and consume 10 times the power? (it would consume considerably LESS). The market will drive the cost (power, size, $$$, etc.) of 64b cores down as they will find increasing use in devices that are size and power constrained. There's far more incentive to make a cheap, low power 64b ARM than there is to make a cheap, low power i686 (or 68K) -- you don't see x86 devices in phones (laptops have bigger power budgets so less pressure on efficiency). There's no incentive to making thru-hole versions of any "serious" processor, today. Just like you can't find any fabs for DTL devices. Or 10 & 12" vinyl. (yeah, you can buy vinyl, today -- at a premium. And, I suspect you can find someone to package an ARM on a DIP carrier. But, each of those are niche markets, not where the "money lies")
> So a move to 64-bit in practice means moving from a small, cheap, > self-contained microcontroller to an embedded PC. Lots of new > possibilities, lots of new costs of all kinds.
How do you come to that conclusion? I have a 32b MCU on a board. And some FLASH and DRAM. How is that going to change when I move to a 64b processor? The 64b devices are also SoCs so it's not like you suddenly have to add address decoding logic, a clock generator, interrupt controller, etc. Will phones suddenly become FATTER to accommodate the extra hardware needed? Will they all need bolt on battery boosters?
> Oh, and the cpu /could/ be slower for some tasks - bigger cpus that are > optimised for throughput often have poorer latency and more jitter for > interrupts and other time-critical features.
You're cherry picking. They can also be FASTER for other tasks and likely will be optimized to justify/exploit those added abilities; a vendor isn't going to offer a product that is LESS desirable than his existing products. An IPv6 stack on a 64b processor is a bit easier to implement than on a 32b one. (Remember, ARM is in a LOT of fabs! That speaks to how ubiquitous it is!)
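One small illustration of the IPv6 point: a 128-bit address compare is two register-wide operations on a 64-bit core versus four on a 32-bit one (sketch only; the memcpy sidesteps alignment/aliasing issues and typically compiles away):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool ipv6_addr_equal(const uint8_t a[16], const uint8_t b[16])
{
    uint64_t a0, a1, b0, b1;
    memcpy(&a0, a, 8);  memcpy(&a1, a + 8, 8);
    memcpy(&b0, b, 8);  memcpy(&b1, b + 8, 8);
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;   /* two wide compares, no loop */
}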
>>> So it is not going to happen - at >>> least not more than a very small and very gradual change. >> >> We got 32b processors NOT because the embedded world cried out for >> them but, rather, because of the influence of the 32b desktop world. >> We've had 32b processors since the early 80's. But, we've only had >> PCs since about the same timeframe! One assumes ubiquity in the >> desktop world would need to happen before any real spillover to embedded. >> (When the "desktop" was an '11 sitting in a back room, it wasn't seen >> as ubiquitous.) > > I don't assume there is any direct connection between the desktop world > and the embedded world - the needs are usually very different. There is > a small overlap in the area of embedded devices with good networking and > a gui, where similarity to the desktop world is useful.
The desktop world inspires the embedded world. You see what CAN be done for "reasonable money". In the 70's, we put i4004's into products because we knew the processing that was required was "affordable" (at several kilobucks) -- because we had our own '11 on site. We leveraged the in-house '11 to compute "initialization constants" for the needs of specific users (operating the i4004-based products). We didn't hesitate to migrate to i8080/85 when they became available -- because the price point was largely unchanged (from where it had been with the i4004) AND we could skip the involvement of the '11 in computing those initialization constants! I watch the prices of the original 32b ARM I chose fall and see that as an opportunity -- to UPGRADE the capabilities (and future-safeness of the design). If I'd assumed $X was a tolerable price, before, then it likely still is!
> We have had 32-bit microcontrollers for decades. I used a 16-bit > Windows system when working with my first 32-bit microcontroller. But > at that time, 32-bit microcontrollers cost a lot more and required more > from the board (external memories, more power, etc.) than 8-bit or > 16-bit devices. That has gradually changed with an almost total > disregard for what has happened in the desktop world.
I disagree. I recall having to put lots of "peripherals" into an 8/16b system, external address decoding logic, clock generators, DRAM controllers, etc. And, the cost of entry was considerably higher. Development systems used to cost tens of kilodollars (Intellec MDS, Zilog ZRDS, Moto EXORmacs, etc.) I shared a development system with several other developers in the 70's -- because the idea of giving each of us our own was anathema, at the time. For 35+ years, you could put one on YOUR desk for a few kilobucks. Now, it's considerably less than that. You'd have to be blind to NOT think that the components that are "embedded" in products haven't -- and won't continue -- to see similar reductions in price and increases in performance. Do you think the folks making the components didn't anticipate the potential demand for smaller/faster/cheaper chips? We've had TCP/IP for decades. Why is it "suddenly" more ubiquitous in product offerings? People *see* what they can do with a technology in one application domain (e.g., desktop) and extrapolate that to other, similar application domains (embedded). I did my first full custom 30+ years ago. Now, I can buy an off-the-shelf component and "program" it to get similar functionality (without involving a service bureau). Ideas that previously were "gee, if only..." are now commonplace.
> Yes, the embedded world /did/ cry out for 32-bit microcontrollers for an > increasing proportion of tasks. We cried many tears when then > microcontroller manufacturers offered to give more flash space to their > 8-bit devices by having different memory models, banking, far jumps, and > all the other shit that goes with not having a big enough address space. > We cried out when we wanted to have Ethernet and the microcontroller > only had a few KB of ram. I have used maybe 6 or 8 different 32-bit > microcontroller processor architectures, and I used them because I > needed them for the task. It's only in the past 5+ years that I have > been using 32-bit microcontrollers for tasks that could be done fine > with 8-bit devices, but the 32-bit devices are smaller, cheaper and > easier to work with than the corresponding 8-bit parts.
But that's because your needs evolve and the tools you choose to use have, as well. I wanted to build a little line frequency clock to see how well it could discipline my NTPd. I've got all these PCs, single board PCs, etc. lying around. It was *easier* to hack together a small 8b processor to do the job -- less hardware to understand, no OS to get in the way, really simple to put a number on the interrupt latency that I could expect, no uncertainties about the hardware that's on the PC, etc. OTOH, I have a network stack that I wrote for the Z180 decades ago. Despite being written in a HLL, it is a bear to deploy and maintain owing to the tools and resources available in that platform. My 32b stack was a piece of cake to write, by comparison!
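That whole line-frequency clock really is just an ISR and a counter. A minimal sketch along those lines, assuming an ATmega-class part with the zero-cross signal on INT0 and a 1 Hz output pulse on PB0 for the NTP host to timestamp (the pin choices and the 60 Hz constant are assumptions for illustration, not a description of the actual gadget):

#include <avr/io.h>
#include <avr/interrupt.h>

#define CYCLES_PER_SECOND 60u        /* 60 Hz mains assumed; use 50u elsewhere */

static volatile uint8_t cycles;

ISR(INT0_vect)                       /* one interrupt per mains cycle */
{
    if (++cycles >= CYCLES_PER_SECOND) {
        cycles = 0;
        PORTB ^= _BV(PB0);           /* edge for the host to timestamp */
    }
}

int main(void)
{
    DDRB  |= _BV(PB0);                       /* pulse output        */
    EICRA |= _BV(ISC01) | _BV(ISC00);        /* INT0 on rising edge */
    EIMSK |= _BV(INT0);
    sei();
    for (;;) { }                             /* nothing else to do  */
}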
>> In the future, we'll see the 64b *phone* world drive the evolution >> of embedded designs, similarly. (do you really need 32b/64b to >> make a phone? how much code is actually executing at any given >> time and in how many different containers?) > > We will see that on devices that are, roughly speaking, tablets - > embedded systems with a good gui, a touchscreen, networking. And that's > fine. But these are a tiny proportion of the embedded devices made.
Again, I disagree. You've already admitted to using 32b processors where 8b could suffice. What makes you think you won't be using 64b processors when 32b could suffice?

It's just as hard for me to prototype a 64b SoC as it is a 32b SoC. The boards are essentially the same size. "System" power consumption is almost identical. Cost is the sole differentiating factor, today. History tells us it will be less so, tomorrow. And, the innovations that will likely come in that offering will likely exceed the capabilities (or perceived market needs) of smaller processors. To say nothing of the *imagined* uses that future developers will envision!

I can make a camera that "reports to google/amazon" to do motion detection, remote access, etc. Or, for virtually the same (customer) dollars, I can provide that functionality locally. Would a customer want to add an "unnecessary" dependency to a solution? "Tired of being dependent on Big Brother for your home security needs? ..." Imagine a 64b SoC with a cellular radio: "I'll *call* you when someone comes to the door..." (or SMS)

I have cameras INSIDE my garage that assist with my parking and tell me if I've forgotten to close the garage door. Should I have google/amazon perform those value-added tasks for me? Will they tell me if I've left something in the car's path before I run over it? Will they turn on the light to make it easier for me to see? Should I, instead, tether all of those cameras to some "big box" that does all of that signal processing? What happens to those resources when the garage is "empty"?

The "electric eye" (interrupter) that guards against closing the garage door on a toddler/pet/item in its path does nothing to protect me if I leave some portion of the vehicle in the path of the door (but ABOVE the detection range of the interrupter). Locating a *camera* on the side of the doorway lets me detect if ANYTHING is in the path of the door, regardless of how high above the old interrupter's position it may be located. How *many* camera interfaces should the SoC *directly* support?

The number (and type) of applications that can be addressed with ADDITIONAL *local* smarts/resources is almost boundless. And, folks don't have to wait for a cloud supplier (off-site processing) to decide to offer them. "Build it and they will come."

[Does your thermostat REALLY need all of that horsepower -- two processors! -- AND google's server in order to control the HVAC in your home? My god, how did that simple bimetallic strip ever do it?!]

If you move into the commercial/industrial domains, the opportunities are even more diverse! (E.g., build a camera that does component inspection *in* the camera and interfaces to a go/no-go gate or labeller.)

Note that none of these applications need a display, touch panel, etc. What they likely need is low power, small size, connectivity, MIPS and memory. The same sorts of things that are common in phones.
>>> The OP sounds more like a salesman than someone who actually works with >>> embedded development in reality. >> >> Possibly. Or, just someone that wanted to stir up discussion... > > Could be. And there's no harm in that!
On that, we agree. Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!
James Brakefield <jim.brakefield@ieee.org> writes:
> Am trying to puzzle out what a 64-bit embedded processor should look like.
Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a remote web browser. There's your 64 bit embedded system.
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

>> Am trying to puzzle out what a 64-bit embedded processor should look like. >> At the low end, yeah, a simple RISC processor. And support for complex >> arithmetic >> using 32-bit floats? And support for pixel alpha blending using quad 16-bit >> numbers? >> 32-bit pointers into the software? > > The real value in 64 bit integer registers and 64 bit address space is > just that, having an orthogonal "endless" space (well I remember some > 30 years ago 32 bits seemed sort of "endless" to me...). > > Not needing to assign overlapping logical addresses to anything > can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are comfortable with the possibility of bugs propagating between different subsystems, then you can live with a logical address space that exactly coincides with a physical address space. But, consider how life was before Windows used compartmentalized applications (and OS). How easily it is for one "application" (or subsystem) to cause a reboot -- unceremoniously. The general direction (in software development, and, by association, hardware) seems to be to move away from unrestrained access to the underlying hardware in an attempt to limit the amount of damage that a "misbehaving" application can cause. You see this in languages designed to eliminate dereferencing pointers, pointer arithmetic, etc. Languages that claim to ensure your code can't misbehave because it can only do exactly what the language allows (no more injecting ASM into your HLL code). I think that because you are the sole developer in your application, you see a distorted vision of what the rest of the development world encounters. Imagine handing your codebase to a third party. And, *then* having to come back to it and fix the things that "got broken". Or, in my case, allowing a developer to install software that I have to "tolerate" (for some definition of "tolerate") without impacting the software that I've already got running. (i.e., its ok to kill off his application if it is broken; but he can't cause *my* portion of the system to misbehave!)
> 32 bit FPU seems useless to me, 64 bit is OK. Although 32 FP > *numbers* can be quite useful for storing/passing data.
32-bit numbers have appeal if your registers are 32b; they "fit nicely". Ditto 64b values in 64b registers.
On 6/8/2021 1:39 PM, Dimiter_Popoff wrote:

> Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered > there is some real mess with their out of order execution, one needs to > do... hmmmm.. "sync", whatever they call it, all the time and there is > a huge performance cost because of that. Anybody heard anything about > it? (I only know what I was told).
Many processors support instruction reordering (and many compilers will reorder the code they generate). In each case, the reordering is supposed to preserve semantics. If the code "just runs" (and is never interrupted nor synchronized with something else), the result should be the same. If you want to be able to arbitrarily interrupt an instruction sequence, then you need to take special measures. This is why we have barriers, the ability to flush caches, etc. For "generic" code, the developer isn't involved with any of this. Inside the kernel (or device drivers), its often a different story...
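In C11 terms, the "special measures" are acquire/release ordering; the compiler and CPU then insert whatever barrier the architecture needs (a DMB on ARM, for instance). A minimal producer/consumer sketch:

#include <stdatomic.h>
#include <stdbool.h>

static int payload;                     /* plain data       */
static atomic_bool ready;               /* publication flag */

void producer(int value)
{
    payload = value;                                    /* ordinary store */
    atomic_store_explicit(&ready, true,
                          memory_order_release);        /* publish        */
}

bool consumer(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;     /* guaranteed to observe the producer's store */
        return true;
    }
    return false;
}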
On Tue, 8 Jun 2021 22:11:18 +0200, David Brown
<david.brown@hesbynett.no> wrote:


>Pretty much all processors except x86 and brain-dead old-fashioned 8-bit >CISC devices are RISC...
It certainly is correct to say of the x86 that its legacy, programmer-visible instruction set is CISC ... but it is no longer correct to say that the chip design is CISC. Since (at least) the Pentium 4, x86 chips really are a CISC decoder bolted onto the front of what essentially is a load/store RISC.

"Complex" x86 instructions (in RAM and/or the $I cache) are dynamically translated into equivalent short sequences[*] of RISC-like, wide-format instructions, which are what actually gets executed. Those sequences also are stored into a special trace cache in case they will be used again soon - e.g., in a loop - so they (hopefully) will not have to be translated again.

[*] Actually, a great many x86 instructions map 1:1 to internal RISC instructions - only a small percentage of complex x86 instructions require "emulation" via a sequence of RISC instructions.
>... Not all [RISC] are simple.
Correct. Every successful RISC CPU has supported a suite of complex instructions. Of course, YMMV. George
