
Speaking of Multiprocessing...

Started by rickman March 23, 2017
On Thu, 23 Mar 2017 17:49:39 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 3/23/2017 4:47 PM, Tim Wescott wrote: >> On Thu, 23 Mar 2017 16:26:46 -0700, Don Y wrote: >> >>> On 3/23/2017 4:19 PM, Tim Wescott wrote: >>>> On Thu, 23 Mar 2017 18:38:13 -0400, rickman wrote: >>>> >>>>> I recall a discussion about the design of an instruction set >>>>> architecture where someone was saying an instruction was required to >>>>> test and set a bit or word as an atomic operation if it was desired to >>>>> support multiple processors. Is this really true? Is this a function >>>>> that can't be emulated with other operations including the disabling >>>>> of interrupts? >>>> >>>> AFAIK as long as you surround your "test and set" with an interrupt >>>> disable and an interrupt enable then you're OK. At least, you're OK >>>> unless you have a processor that treats interrupts really strangely. >>> >>> Rethink that for the case of SMP... (coincidentally, "Support Multiple >>> Processors" :> ) >> >> D'oh. Atomic to the common memory, not to each individual processor, yes. > >Yes. If the processor supports a RMW memory cycle AND the memory >arbiter honors that contract, then any competing processors would >explicitly be held off from accessing the location in question >until the RMW cycle terminated.
The R/M/W cycle was popular a few decades ago, when core memory was much slower than the processor. The read operation in core memory is destructive, so you have to write back the original value. This is usually done within the memory controller, but the same or a modified value could also be written back by the processor, so you get the R/M/W sequence practically for "free". In modern systems, things get complicated, since you may have to read a full 64 bit memory word, bypassing caches on both read and write, while keeping the RAS active through the whole sequence.
On 23/03/17 23:38, rickman wrote:
> I recall a discussion about the design of an instruction set > architecture where someone was saying an instruction was required to > test and set a bit or word as an atomic operation if it was desired to > support multiple processors. Is this really true? Is this a function > that can't be emulated with other operations including the disabling of > interrupts? >
There are many, many ways to implement synchronisation between threads, processors, whatever. In theory, they are mostly equivalent in that any one can be used to implement the others. In practice, there can be a lot of differences in the overheads of the hardware implementation, and in the speed achieved. Typical implementations are "compare-and-swap" instructions (x86 uses these) and load-linked/store-conditional (common on RISC systems where instructions either load /or/ store, not both). And of course, on single processor systems there is always the "disable all interrupts" method.

But if you can use dedicated hardware, there are many other methods. The XMOS devices have hardware support for pipelines and message passing. On a dual-core PPC device I used, there is a hardware block of semaphores. Each semaphore is a pair of 16-bit ID, 16-bit value that you can only access as a 32-bit read or write. You can write to it if the current ID is 0, or if the ID you are writing matches that of the semaphore. There is plenty of scope for variation based on that theme.
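As a concrete illustration of the compare-and-swap approach mentioned above, here is a minimal spinlock sketch in portable C11 (not code from the thread). On x86 the compare-exchange typically compiles to a LOCK CMPXCHG; on most RISC targets it becomes a load-linked/store-conditional retry loop.

#include <stdatomic.h>

/* Minimal spinlock sketch using C11 atomics.  Initialise with = {0}.
 * Real code would add back-off and think about priority inversion. */
typedef struct { atomic_int state; } spinlock_t;   /* 0 = free, 1 = held */

static void spin_lock(spinlock_t *l)
{
    int expected = 0;
    /* Atomically: if state == 0, set it to 1; otherwise retry. */
    while (!atomic_compare_exchange_weak_explicit(
               &l->state, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;   /* a failed CAS overwrites 'expected' with the current value */
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}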
On 3/24/2017 12:48 AM, upsidedown@downunder.com wrote:
> On Thu, 23 Mar 2017 17:49:39 -0700, Don Y > <blockedofcourse@foo.invalid> wrote: > >> On 3/23/2017 4:47 PM, Tim Wescott wrote: >>> On Thu, 23 Mar 2017 16:26:46 -0700, Don Y wrote: >>> >>>> On 3/23/2017 4:19 PM, Tim Wescott wrote: >>>>> On Thu, 23 Mar 2017 18:38:13 -0400, rickman wrote: >>>>> >>>>>> I recall a discussion about the design of an instruction set >>>>>> architecture where someone was saying an instruction was required to >>>>>> test and set a bit or word as an atomic operation if it was desired to >>>>>> support multiple processors. Is this really true? Is this a function >>>>>> that can't be emulated with other operations including the disabling >>>>>> of interrupts? >>>>> >>>>> AFAIK as long as you surround your "test and set" with an interrupt >>>>> disable and an interrupt enable then you're OK. At least, you're OK >>>>> unless you have a processor that treats interrupts really strangely. >>>> >>>> Rethink that for the case of SMP... (coincidentally, "Support Multiple >>>> Processors" :> ) >>> >>> D'oh. Atomic to the common memory, not to each individual processor, yes. >> >> Yes. If the processor supports a RMW memory cycle AND the memory >> arbiter honors that contract, then any competing processors would >> explicitly be held off from accessing the location in question >> until the RMW cycle terminated. > > The R/M/W popular a few decades ago when the core memory was much > slower than the processor. The read operation in core memory is > destructive, so you have to write back the original value. This is > usually done within the memory controller, the same or modified value > could also be written back by the processor, so you get the R/M/W > sequence practically for "free".
On many "microprocessors", there are hints as to when RMW cycles are undertaken. E.g., the m68k would issue a single address strobe spanning the "two phase" RMW cycle (a consequence of the TAS opcode). But this requires the memory arbiter (for closely coupled coprocessors) to monitor /AS and not attempt early (read vs write) cycle termination (which is a potential performance hack in a shared memory system) by just watching the individual data strobes.

Other legacy processors usually had exploits that could be leveraged to deduce when RMW-ish cycles were in effect -- at the cost of a bit of external logic (e.g., decoding opcode fetch cycles to provide cleaner arbitration points for memory sharing). However, as bus interface units have become increasingly decoupled from execution units, it has become harder to reliably infer what is ACTUALLY happening in the CPU just by watching the bus.
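For comparison with the TAS discussion above: the same test-and-set semantics can be expressed portably with C11's atomic_flag, which on the m68k is the kind of operation the TAS instruction (with its indivisible RMW bus cycle) provides directly. A minimal sketch, not taken from the thread:

#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void)
{
    /* Atomically read the old value and set the flag; spin until the
       old value was clear, i.e. we were the one who set it. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;   /* busy-wait */
}

void release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}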
> In modern systems, things get complicated, since you may have to read > a full 64 bit memory word, bypassing caches on both read and write > while keeping the RAS active through the whole sequence.
With SoCs, there's very little you can do to second-guess the processor, so you have to rely on it to perform this sort of access (especially as the memory in question might be entirely "internal" to the processor).
On 24/03/17 08:17, David Brown wrote:
> But if you can use dedicated hardware, there are many other methods. > The XMOS devices have hardware support for pipelines and message > passing. On a dual-core PPC device I used, there is a hardware block of > semaphores. Each semaphore is a pair of 16-bit ID, 16-bit value that > you can only access as a 32-bit read or write. You can write to it if > the current ID is 0, or if the ID you are writing matches that of the > semaphore. There is plenty of scope for variation based on that theme.
I received my first XMOS board from Digi-Key a couple of days ago, and I'm looking forward to using it for some simple experiments. I /feel/ that many low-level things will be much simpler, with fewer potential nasties lurking in the undergrowth. (I felt the same with the Transputer, for obvious reasons, but never had a suitable problem at that time.)

With your experience, did you find any undocumented gotchas and any pleasant or unpleasant surprises?
On 24/03/17 10:47, Tim Wescott wrote:
> On Thu, 23 Mar 2017 16:26:46 -0700, Don Y wrote: > >> On 3/23/2017 4:19 PM, Tim Wescott wrote: >>> On Thu, 23 Mar 2017 18:38:13 -0400, rickman wrote: >>> >>>> I recall a discussion about the design of an instruction set >>>> architecture where someone was saying an instruction was required to >>>> test and set a bit or word as an atomic operation if it was desired to >>>> support multiple processors. Is this really true? Is this a function >>>> that can't be emulated with other operations including the disabling >>>> of interrupts? >>> >>> AFAIK as long as you surround your "test and set" with an interrupt >>> disable and an interrupt enable then you're OK. At least, you're OK >>> unless you have a processor that treats interrupts really strangely. >> >> Rethink that for the case of SMP... (coincidentally, "Support Multiple >> Processors" :> ) > > D'oh. Atomic to the common memory, not to each individual processor, yes. > > Although it wouldn't have to be an instruction per se: you could have it > be an "instruction" to whatever hardware is controlling the common > memory, to hold off the other processors while it does a read/modify/ > write cycle.
Yes. But when you have multi-level caching, perhaps some with write-back semantics, it needs to force write-through, and be bus-locked all the way to the common memory. X86 has a LOCK prefix which acts on certain following instructions to make this happen, and SMP and multi-CPU architectures honor it. Clifford Heath.
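As a small illustration of the LOCK prefix mentioned above (a sketch assuming GCC-style inline assembly on x86, not code from the thread): the prefix turns an ordinary read-modify-write instruction into one performed atomically with respect to other bus agents. In portable code the same effect is obtained with __atomic_fetch_add() or C11 atomics.

/* Atomic increment on x86 via an explicit LOCK prefix (GCC inline asm).
 * Without "lock", two CPUs could both read the old value and one
 * increment would be lost. */
static inline void atomic_inc(volatile int *p)
{
    __asm__ __volatile__("lock; incl %0" : "+m"(*p) : : "memory");
}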
On 24/03/17 10:28, Tom Gardner wrote:
> On 24/03/17 08:17, David Brown wrote: >> But if you can use dedicated hardware, there are many other methods. >> The XMOS devices have hardware support for pipelines and message >> passing. On a dual-core PPC device I used, there is a hardware block of >> semaphores. Each semaphore is a pair of 16-bit ID, 16-bit value that >> you can only access as a 32-bit read or write. You can write to it if >> the current ID is 0, or if the ID you are writing matches that of the >> semaphore. There is plenty of scope for variation based on that theme. > > I received my first XMOS board from Digi-Key a couple of days > ago, and I'm looking forward to using it for some simple > experiments. I /feel/ that many low-level things will be > much simpler and with fewer potential nasties lurking in > the undergrowth. (I felt the same with the Transputer, for > obvious reasons, but never had a suitable problem at that > time) > > With your experience, did you find any undocumented gotchas > and any pleasant or unpleasant surprises? >
Before saying anything else, I would first note that my work with XMOS systems was about four years ago, when they first started getting popular. I believe many things that bugged me most have been improved since then, both in the hardware and software, but some may remain.

I think the devices themselves are a really neat idea. You have very fast execution, very efficient hardware multi-threading, very predictable timings, and a variety of inter-thread and inter-process communication methods.

Their "XC" programming language was also a neat idea, based on C with additional primitives to support the hardware features and multi-threading stuff, and an attempt to make some aspects of C safer (real arrays, control of when you can access variables, etc.).

However, IMHO the whole thing suffered from a number of serious flaws that limit the possibilities for the chips. Sure, they would work well in some circumstances - but I was left with the feeling that "if only they had done /this/, the devices would be so much better and could be used for so many more purposes". It is a little unfair to concentrate on the shortcomings rather than the innovations and features, but that is how I felt when using them. And again, I know that at least some issues here have been greatly improved since I last used them.

An obvious flaw with the chips is lack of memory. The basic device with one cpu and 8 threads had 64K ram that was for program memory and run-time data. There was no flash - you had to use an external SPI flash which used valuable pins (messing up the use of blocks of 8, 16 or 32 pins), and used up a thread if you wanted to be able to access the flash at run-time. And while you could implement an Ethernet MAC or a 480 Mbps USB 2.0 interface on the chip, there was nowhere near enough ram for buffering or to do anything useful with the interface. Adding external memory was ridiculously expensive in terms of pins, threads, and run-time inefficiency.

The hardware threading is great, and provides a really easy model for all sorts of things. To make a UART transmitter, you have a thread that waits for data coming in on a pipe. To transmit a bit, you set a pin, wait for a bit time (using hardware timers), then move on to the next bit. The code is simple and elegant. A UART receiver is not much harder. There is lots of example code in this style.

Then you realise that to implement a UART, you have used a quarter of the chip's resources. Your elegant flashing light is another thread, as is your PWM output. Suddenly you find you are using a 500 MIPS chip to do the work of a $0.50 microcontroller, and you only have a thread or two left for the actual application.

And you end up trying to run FreeRTOS on one of your threads, or make your own scheduler to multiplex several PWM channels in one thread. Much of the elegance quickly disappears for real-world applications.

Then there is the software. The XC language lets you write code that starts tasks in parallel, automatically allocates channels for communication, lets you declare timers and wait on them. That's all great in theory - but it quickly gets confusing when you try to figure out the details of when you can pass these around, when they get allocated and deallocated, or when you can have a thread create new threads. XC carefully tracks threads and data accesses, spotting and blocking all sorts of possible race conditions. If a variable is written by one thread, then it can't be accessed from another. You can work with arrays safely, but you can't take addresses. Data gets passed between threads using communication channels that are safe from race conditions and nicely synchronised.

And then you realise that to actually make the thing work, you would need far more channels than there are on the device, and they would need to be far faster - all you really wanted was for two threads to share a circular buffer, and you know in your application code when it is safe to use it. But you can't do that in XC - the language and the tools won't let you. So you have to write that code in C, with calls back and forth with the XC code that handles the multi-threading stuff.

And then you realise that from within the C, you need to access some hardware resources like timers, that can't be expressed properly in C, and you can't get back to the XC code at the time. So you end up with inline assembly.

Then there are the libraries and examples. These were written in such a wide variety of styles that it was impossible to figure out what was going on. A typical example project would involve a USB interface and, for example, SPDIF channels for a USB audio interface. The Eclipse-based IDE was fine, but the example did not come as a project - it came as a collection of interdependent projects. Some bits referred to files in different projects. Some bits merely required other projects to be compiled. Some bits of the code in one project would use assembly for hardware resources, others would use XC, others would use C intrinsic functions, and others would use a sort of XML file that defines the setup for your chip resources. If you change values in one file in one project (say, the USB vendor ID), you have to figure out which sub-projects need to be manually forced to re-build in order for it to take effect consistently throughout the project. Some parts use a fairly obvious configuration file - a header with defines that let you control things like IDs, number of channels, pins, etc. Except they don't - only /some/ of the sub-projects read and use the configuration file, other parts are hard-coded or use values from elsewhere. It was a complete mess.

Now, I know that newer XMOS devices have more resources, built-in flash, proper hardware peripherals for the devices that are most demanding or popular, and so on. And I can only hope that the language and tools have been improved to the point where inline assembly is not required, and that the examples and libraries have matured to the point that the libraries are usable as-is, and the examples show practical ways to develop code.

I really hope XMOS does well here - it is so good to see a company that thinks in a very different way and brings in these new ideas. So if your experience with modern XMOS devices and tools is good, I would love to hear about it.
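To make the "one thread per peripheral" style concrete, here is a rough C rendering of the UART transmitter described above. It is not XC and not XMOS code; pin_write(), timer_now() and wait_until() are hypothetical helpers standing in for XC's port and timer primitives.

#include <stdint.h>

/* Hypothetical hardware helpers -- in XC these would be a 1-bit output
   port and a hardware timer used with "when timerafter(t)". */
extern void     pin_write(unsigned level);
extern uint32_t timer_now(void);
extern void     wait_until(uint32_t t);

#define BIT_TICKS (100000000u / 115200u)   /* assumed 100 MHz timebase, 115200 baud */

void uart_tx_byte(uint8_t byte)
{
    uint32_t t = timer_now();

    pin_write(0); t += BIT_TICKS; wait_until(t);      /* start bit */
    for (int i = 0; i < 8; i++) {                     /* data bits, LSB first */
        pin_write((byte >> i) & 1u);
        t += BIT_TICKS; wait_until(t);
    }
    pin_write(1); t += BIT_TICKS; wait_until(t);      /* stop bit */
}

The point of the sketch is the structure: each bit time is paced by the timer, and the whole transmitter occupies one hardware thread.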
On 24/03/17 10:19, David Brown wrote:
> On 24/03/17 10:28, Tom Gardner wrote: >> On 24/03/17 08:17, David Brown wrote: >>> But if you can use dedicated hardware, there are many other methods. >>> The XMOS devices have hardware support for pipelines and message >>> passing. On a dual-core PPC device I used, there is a hardware block of >>> semaphores. Each semaphore is a pair of 16-bit ID, 16-bit value that >>> you can only access as a 32-bit read or write. You can write to it if >>> the current ID is 0, or if the ID you are writing matches that of the >>> semaphore. There is plenty of scope for variation based on that theme. >> >> I received my first XMOS board from Digi-Key a couple of days >> ago, and I'm looking forward to using it for some simple >> experiments. I /feel/ that many low-level things will be >> much simpler and with fewer potential nasties lurking in >> the undergrowth. (I felt the same with the Transputer, for >> obvious reasons, but never had a suitable problem at that >> time) >> >> With your experience, did you find any undocumented gotchas >> and any pleasant or unpleasant surprises? >> > > Before saying anything else, I would first note that my work with XMOS > systems was about four years ago, when they first started getting > popular. I believe many things that bugged me most have been improved > since then, both in the hardware and software, but some may remain. > > I think the devices themselves are a really neat idea. You have very > fast execution, very efficient hardware multi-threading, very > predictable timings, and a variety of inter-thread and inter-process > communication methods. > > Their "XC" programming language was also a neat idea, based on C with > additional primitives to support the hardware features and > multi-threading stuff, and an attempt to make some aspects of C safer > (real arrays, control of when you can access variables, etc.). > > However, IMHO the whole thing suffered from a number of serious flaws > that limit the possibilities for the chips. Sure, they would work well > in some circumstances - but I was left with the feeling that "if only > they had done /this/, the devices would be so much better and could be > used for so many more purposes". It is a little unfair to concentrate > on the shortcomings rather than the innovations and features, but that > is how I felt when using them. And again, I know that at least some > issues here have been greatly improved since I last used them. > > > A obvious flaw with the chips is lack of memory. The basic device with > one cpu and 8 threads had 64K ram that was for program memory and > run-time data. There was no flash - you had to use an external SPI > flash which used valuable pins (messing up the use of blocks of 8, 16 or > 32 pins), and used up a thread if you wanted to be able to access the > flash at run-time. And while you could implement an Ethernet MAC or a > 480 Mbps USB 2.0 interface on the chip, there was nowhere near enough > ram for buffering or to do anything useful with the interface. Adding > external memory was ridiculously expensive in terms of pins, threads, > and run-time inefficiency. > > The hardware threading is great, and provides a really easy model for > all sorts of things. To make a UART transmitter, you have a thread that > waits for data coming in on a pipe. To transmit a bit, you set a pin, > wait for a bit time (using hardware timers), then move on to the next > bit. The code is simple and elegant. A UART receiver is not much > harder. There is lots of example code in this style. 
> > Then you realise that to implement a UART, you have used a quarter of > the chip's resources. Your elegant flashing light is another thread, as > is your PWM output. Suddenly you find you are using a 500 MIPS chip to > do the work of a $0.50 microcontroller, and you only have a thread or > two left for the actual application. > > And you end up trying to run FreeRTOS on one of your threads, or make > your own scheduler to multiplex several PWM channels in one thread. > Much of the elegance quickly disappears for real-world applications. > > > Then there is the software. The XC language lets you write code that > starts tasks in parallel, automatically allocates channels for > communication, lets you declare timers and wait on them. That's all > great in theory - but it quickly gets confusing when you try to figure > out the details of when you can pass these around, when they get > allocated and deallocated, or when you can have a thread create new > threads. XC carefully tracks threads and data accesses, spotting and > blocking all sorts of possible race conditions. If a variable is > written by one thread, then it can't be accessed from another. You can > work with arrays safely, but you can't take addresses. Data gets passed > between threads using communication channels that are safe from race > conditions and nicely synchronised. > > And then you realise that to actually make the thing work, you would > need far more channels than there are on the device, and they would need > to be far faster - all you really wanted was for two threads to share a > circular buffer, and you know in your application code when it is safe > to use it. But you can't do that in XC - the language and the tools > won't let you. So you have to write that code in C, with calls back and > forth with the XC code that handles the multi-threading stuff. > > And then you realise that from within the C, you need to access some > hardware resources like timers, that can't be expressed properly in C, > and you can't get back to the XC code at the time. So you end up with > inline assembly. > > > Then there are the libraries and examples. These were written in such a > wide variety of styles that it was impossible to figure out what was > going on. A typical example project would involve a USB interface and, > for example, SPDIF channels for an USB audio interface. The > Eclipse-based IDE was fine, but the example did not come as a project - > it came as a collection of interdependent projects. Some bits referred > to files in different projects. Some bits merely required other > projects to be compiled. Some bits of the code in one project would use > assembly for hardware resources, others would use XC, others would use C > intrinsic functions, and others would use a sort of XML file that > defines the setup for your chip resources. If you change values in one > file in one project (say, the USB vendor ID), you have to figure out > which sub-projects need to be manually forced to re-build in order for > it to take effect consistently throughout the project. Some parts use a > fairly obvious configuration file - a header with defines that let you > control things like IDs, number of channels, pins, etc. Except they > don't - only /some/ of the sub-projects read and use the configuration > file, other parts are hard-coded or use values from elsewhere. It was a > complete mess. 
> > > Now, I know that newer XMOS devices have more resources, built-in flash, > proper hardware peripherals for the devices that are most demanding or > popular, and so on. And I can only hope that the language and tools > have been improved to the point where inline assembly is not required, > and that the examples and libraries have matured to the point that the > libraries are usable as-is, and the examples show practical ways to > develop code. > > I really hope XMOS does well here - it is so good to see a company that > thinks in a very different way and brings in these new ideas. So if > your experience with modern XMOS devices and tools is good, I would love > to hear about it.
Thanks for a speedy, comprehensive response. I'll re-read and digest it properly later.

My initial gut feel is that many of your points were valid and probably still are - because they /ought/ to still be valid.

The issues that most interest me relate to where you found it necessary to step outside the toolchain. Part of me thinks (hopes, really) that it is merely because your problem wasn't well suited to the device's strengths (esp. guaranteed timing), and/or was too big, and/or importing existing code/thinking led to friction, and/or the tools were immature.

I expect I'll end up agreeing with many of your observations, but I'll have fun finding that out :)
On 24/03/17 12:06, Tom Gardner wrote:
> On 24/03/17 10:19, David Brown wrote:
>> I really hope XMOS does well here - it is so good to see a company that >> thinks in a very different way and brings in these new ideas. So if >> your experience with modern XMOS devices and tools is good, I would love >> to hear about it. > > Thanks for a speedy, comprehensive response. I'll re-read > and digest it properly later. > > My initial gut feel is that many of your points were > valid and probably are still valid - because they > /ought/ to still be valid.
I know that at least some of my points are no longer an issue, or at least not as much of an issue - XMOS have devices with flash, USB hardware, etc. At least some of the toolchain issues should be fixable. And the mess of the examples and libraries is certainly fixable - at least, if one disregards the time and effort it would involve!
> > The issues that most interest me relate to where you found > it necessary to step outside the toolchain. Part of me thinks > (hopes, really) that it is merely because your problem > wasn't well suited to the devices strengths (esp. guaranteed > timing), and/or were too big, and/or importing existing > code/thinking lead to friction, and/or the tools were immature.
The existing code was mainly XMOS's own examples, libraries and reference designs... I do agree that much of their USB stuff was poorly suited to the devices and too big for them, and that probably made things worse - but it was XMOS's own code. With newer devices with hardware USB peripherals, I expect fewer such issues. I will go along with your hope - expectation, even - that the tools have matured and improved over time.
> > I expect I'll end up agreeing with many of your observations, > but I'll have fun finding that out :)
On 2017-03-23, Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Thu, 23 Mar 2017 18:38:13 -0400, rickman wrote: > >> I recall a discussion about the design of an instruction set >> architecture where someone was saying an instruction was required to >> test and set a bit or word as an atomic operation if it was desired to >> support multiple processors. Is this really true? Is this a function >> that can't be emulated with other operations including the disabling of >> interrupts? > > AFAIK as long as you surround your "test and set" with an interrupt > disable and an interrupt enable then you're OK.
How does disabling interrupts prevent another processor from messing up your "atomic" operation?

--
Grant Edwards    grant.b.edwards at gmail.com
On 24/03/17 10:19, David Brown wrote:
> On 24/03/17 10:28, Tom Gardner wrote: >> On 24/03/17 08:17, David Brown wrote: >>> But if you can use dedicated hardware, there are many other methods. >>> The XMOS devices have hardware support for pipelines and message >>> passing. On a dual-core PPC device I used, there is a hardware block of >>> semaphores. Each semaphore is a pair of 16-bit ID, 16-bit value that >>> you can only access as a 32-bit read or write. You can write to it if >>> the current ID is 0, or if the ID you are writing matches that of the >>> semaphore. There is plenty of scope for variation based on that theme. >> >> I received my first XMOS board from Digi-Key a couple of days >> ago, and I'm looking forward to using it for some simple >> experiments. I /feel/ that many low-level things will be >> much simpler and with fewer potential nasties lurking in >> the undergrowth. (I felt the same with the Transputer, for >> obvious reasons, but never had a suitable problem at that >> time) >> >> With your experience, did you find any undocumented gotchas >> and any pleasant or unpleasant surprises? >> > > Before saying anything else, I would first note that my work with XMOS > systems was about four years ago, when they first started getting > popular. I believe many things that bugged me most have been improved > since then, both in the hardware and software, but some may remain. > > I think the devices themselves are a really neat idea. You have very > fast execution, very efficient hardware multi-threading, very > predictable timings, and a variety of inter-thread and inter-process > communication methods. > > Their "XC" programming language was also a neat idea, based on C with > additional primitives to support the hardware features and > multi-threading stuff, and an attempt to make some aspects of C safer > (real arrays, control of when you can access variables, etc.).
Yes, those are precisely the aspects that interest me. I'm particularly interested in easy-to-implement hard realtime systems.

As far as I am concerned, caches and interrupts make it difficult to guarantee hard realtime performance, and C's explicit avoidance of multiprocessing biases C away from "easy-to-implement". Yes, I know about libraries and modern compilers that may or may not compile your code in the way you expect! I'd far rather build on a solid foundation than have to employ (language) lawyers to sort out the mess ;)

Besides, I want to use Occam++ :)
> However, IMHO the whole thing suffered from a number of serious flaws > that limit the possibilities for the chips. Sure, they would work well > in some circumstances - but I was left with the feeling that "if only > they had done /this/, the devices would be so much better and could be > used for so many more purposes". It is a little unfair to concentrate > on the shortcomings rather than the innovations and features, but that > is how I felt when using them. And again, I know that at least some > issues here have been greatly improved since I last used them. > > > A obvious flaw with the chips is lack of memory. The basic device with > one cpu and 8 threads had 64K ram that was for program memory and > run-time data. There was no flash - you had to use an external SPI > flash which used valuable pins (messing up the use of blocks of 8, 16 or > 32 pins), and used up a thread if you wanted to be able to access the > flash at run-time. And while you could implement an Ethernet MAC or a > 480 Mbps USB 2.0 interface on the chip, there was nowhere near enough > ram for buffering or to do anything useful with the interface. Adding > external memory was ridiculously expensive in terms of pins, threads, > and run-time inefficiency.
Yes, those did strike me as limitations, to the extent that I'm skeptical about networking connectivity. But maybe an XMOS device plus an ESP8266 would be worth considering for some purposes.
> The hardware threading is great, and provides a really easy model for > all sorts of things. To make a UART transmitter, you have a thread that > waits for data coming in on a pipe. To transmit a bit, you set a pin, > wait for a bit time (using hardware timers), then move on to the next > bit. The code is simple and elegant. A UART receiver is not much > harder. There is lots of example code in this style. > > Then you realise that to implement a UART, you have used a quarter of > the chip's resources. Your elegant flashing light is another thread, as > is your PWM output. Suddenly you find you are using a 500 MIPS chip to > do the work of a $0.50 microcontroller, and you only have a thread or > two left for the actual application.
Yes. However I don't care about wasting some resources if it makes the design easier/faster, so long as it isn't too expensive in terms of power and money.
> And you end up trying to run FreeRTOS on one of your threads, or make > your own scheduler to multiplex several PWM channels in one thread. > Much of the elegance quickly disappears for real-world applications.
Needing to run a separate RTOS would be a code smell. That's where the CSP+multicore approach /ought/ to be sufficient. For practicality, I exclude peripheral libraries and networking code from that statement.
> Then there is the software. The XC language lets you write code that > starts tasks in parallel, automatically allocates channels for > communication, lets you declare timers and wait on them. That's all > great in theory - but it quickly gets confusing when you try to figure > out the details of when you can pass these around, when they get > allocated and deallocated, or when you can have a thread create new > threads. XC carefully tracks threads and data accesses, spotting and > blocking all sorts of possible race conditions. If a variable is > written by one thread, then it can't be accessed from another. You can > work with arrays safely, but you can't take addresses. Data gets passed > between threads using communication channels that are safe from race > conditions and nicely synchronised.
That's the kind of thing I'm interested in exploring.
> And then you realise that to actually make the thing work, you would > need far more channels than there are on the device, and they would need > to be far faster - all you really wanted was for two threads to share a > circular buffer, and you know in your application code when it is safe > to use it. But you can't do that in XC - the language and the tools > won't let you. So you have to write that code in C, with calls back and > forth with the XC code that handles the multi-threading stuff.
That's the kind of thing I'm interested in exploring.
> And then you realise that from within the C, you need to access some > hardware resources like timers, that can't be expressed properly in C, > and you can't get back to the XC code at the time. So you end up with > inline assembly.
At which point many advantages would have been lost.
> Then there are the libraries and examples. These were written in such a > wide variety of styles that it was impossible to figure out what was > going on. A typical example project would involve a USB interface and, > for example, SPDIF channels for an USB audio interface. The > Eclipse-based IDE was fine, but the example did not come as a project - > it came as a collection of interdependent projects. Some bits referred > to files in different projects. Some bits merely required other > projects to be compiled. Some bits of the code in one project would use > assembly for hardware resources, others would use XC, others would use C > intrinsic functions, and others would use a sort of XML file that > defines the setup for your chip resources. If you change values in one > file in one project (say, the USB vendor ID), you have to figure out > which sub-projects need to be manually forced to re-build in order for > it to take effect consistently throughout the project. Some parts use a > fairly obvious configuration file - a header with defines that let you > control things like IDs, number of channels, pins, etc. Except they > don't - only /some/ of the sub-projects read and use the configuration > file, other parts are hard-coded or use values from elsewhere. It was a > complete mess.
Irritating, but not fundamental, and as you point out, they could be fixed with application of time and money.
> Now, I know that newer XMOS devices have more resources, built-in flash, > proper hardware peripherals for the devices that are most demanding or > popular, and so on. And I can only hope that the language and tools > have been improved to the point where inline assembly is not required, > and that the examples and libraries have matured to the point that the > libraries are usable as-is, and the examples show practical ways to > develop code. > > I really hope XMOS does well here - it is so good to see a company that > thinks in a very different way and brings in these new ideas. So if > your experience with modern XMOS devices and tools is good, I would love > to hear about it.
I think we are largely in violent agreement.

I suspect my "hello world" program will be for their £10 StartKIT: echo anything it receives on the USB line, with a second task flipping uppercase to lowercase. That should give me a feel for low-end resource usage.

Then I'd like to make a reciprocal frequency counter to see how far I can push individual threads and their SERDES-like IO primitives. And I'll probably do some bitbashing to create analogue outputs that draw pretty XY pictures on an analogue scope.
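For reference, the arithmetic behind a reciprocal counter (a sketch, not anything from the thread): count whole input periods and the reference-clock ticks spanning exactly those periods, then divide.

#include <stdint.h>

#define F_REF_HZ 100000000.0           /* assumed 100 MHz reference timebase */

/* n_in  : number of complete input periods captured
   n_ref : reference-clock ticks spanning exactly those periods */
static double reciprocal_freq(uint32_t n_in, uint32_t n_ref)
{
    return (double)n_in * F_REF_HZ / (double)n_ref;
}

For example, 1000 input periods spanning 2,000,000 reference ticks gives 1000 * 1e8 / 2e6 = 50 kHz, with the resolution set by the reference clock rather than by the gate time.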
