EmbeddedRelated.com
Forums

Custom CPU Designs

Started by Rick C April 16, 2020
On 17/04/2020 23:43, upsidedown@downunder.com wrote:
> On Fri, 17 Apr 2020 15:44:15 +0200, David Brown
> <david.brown@hesbynett.no> wrote:
>
>>> As you say, the XMOS /ecosystem/ is far more compelling,
>>> partly because it has excellent /integration/ between the
>>> hardware, the software and the toolchain. The latter two
>>> are usually missing.
>
> The xCORE architecture is nice, if you are comfortable implementing
> whole applications mainly with interrupts. Just assign one core for
> each ISR. When an external HW signal ("interrupt") occurs, restart the
> program in that core. The program needs to be executed before the next
> "interrupt" occurs. Thus implementing e.g. an audio or video sampled
> system is nice: the core's task just needs to be fast enough to handle
> one sample.
I have made some microcontroller programs where the main loop is nothing more than a "sleep until interrupt" opcode. I'm fine with such designs.
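For concreteness, here is that whole pattern, simulated in Python rather than on real hardware (the names are invented; a threading.Event stands in for the sleep-until-interrupt opcode, and a plain list for the ISR's queue):

```python
import threading

wakeup = threading.Event()   # stands in for the CPU's sleep-until-interrupt opcode
pending = []                 # samples queued by the (simulated) ISR

def isr(sample):
    """Simulated interrupt handler: queue work and wake the main loop."""
    pending.append(sample)
    wakeup.set()

def main_loop(n_samples):
    """The entire main loop: sleep until an interrupt, then drain the queue."""
    processed = []
    while len(processed) < n_samples:
        wakeup.wait()        # the "sleep until interrupt" opcode
        wakeup.clear()
        while pending:
            processed.append(pending.pop(0) * 2)   # arbitrary per-sample work
    return processed
```

All the application logic lives in the handler; the main loop does nothing but sleep.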
>> Agreed.  And the XMOS folk have learned and improved.  With the first
>> chips, they proudly showed off that you could make a 100 MBit Ethernet
>> controller in software on an XMOS chip.  Then it was pointed out to them
>> that - impressive achievement though it was - it was basically useless
>> because you didn't have the resources left to use it for much, and
>> hardware Ethernet controllers were much cheaper.  So they brought out
>> new XMOS chips with hardware Ethernet controllers.  The same thing
>> happened with USB.
>
> On the Ethernet, the minimum MAC frame is 64 bytes, thus new short
> frames may appear every 6.4 µs, and the "ISR" must execute in less than
> 6.4 µs. If that is not possible, let one task just split the frame into
> header and actual payload, and use separate cores e.g. to handle the
> MAC and IP headers. Still, 8 cores sounds like quite a large number.
8 cores is more than the XMOS documentation says, as far as I can see. I think it was 3 or 4, but it can depend on the features you want. And I'm sure that at least some of these could be combined with other tasks if you don't mind the loss of modularity. Your timings are a little off - you need to include the preamble, start of frame delimiter and interpacket gap, so the minimum packet time is 6.72 µs.
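The framing arithmetic is easy to sanity-check (64-byte minimum frame, plus 7 bytes of preamble, 1 byte SFD and a 12-byte inter-packet gap, at 100 Mbit/s):

```python
# Minimum on-the-wire cost of one Ethernet frame at 100 Mbit/s.
MIN_FRAME = 64        # bytes: minimum MAC frame (headers + payload + FCS)
PREAMBLE_SFD = 8      # bytes: 7 preamble + 1 start-of-frame delimiter
IPG = 12              # bytes: inter-packet gap (96 bit times)
BIT_RATE = 100e6      # bits per second

def min_packet_time_us():
    """Shortest possible time between the starts of back-to-back frames."""
    total_bits = (MIN_FRAME + PREAMBLE_SFD + IPG) * 8
    return total_bits / BIT_RATE * 1e6

# 84 bytes = 672 bit times, i.e. 6.72 µs between back-to-back minimum frames
```

That 6.72 µs is the hard deadline any software MAC has to meet for every minimum-size frame.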
On 18/04/20 14:06, David Brown wrote:
> On 17/04/2020 18:49, Tom Gardner wrote:
> [snip discussion of early XMOS software Ethernet/USB]
>>>> It looks like a USB controller needs ~8 cores, which isn't
>>>> a problem on a 16 core device :)
>>>
>>> I've had another look, and I was mistaken - these devices only have the
>>> USB and Ethernet PHYs, not the MACs, and thus require a lot of processor
>>> power, pins, memory and other resources.  It doesn't need 8 cores, but
>>> the whole thing just seems so inefficient.  No one is going to spend the
>>> extra cost for an XMOS with a USB PHY, so why not put a hardware USB
>>> controller on the chip?  The silicon costs would surely be minor, and it
>>> would save a lot of development effort and release resources that are
>>> useful for other tasks.  The same goes for Ethernet.  Just because you
>>> /can/ make these things in software on the XMOS devices, does not make
>>> it a good idea.
>>
>> Oh I agree!
>> However, being able to do it in software is a
>> good demonstration of the device's unique characteristics,
>> and that "you aren't in Kansas anymore".
>
> Indeed.
>
> The real power comes when you want to do something that is /not/
> standard, or at least not common.
>
> Implementing a standard UART in a couple of XMOS cores is a pointless
> waste of silicon.  Implementing a UART that uses Manchester encoding for
> the UART signals, so that you can use it on a balanced line without
> keeping track of which line is which - /then/ you've got something that
> can be done just as easily on an XMOS and is a big pain to do on a
> standard microcontroller.
>
> Implementing an Ethernet MAC on an XMOS is pointless.  Implementing an
> EtherCAT slave is not going to be much harder for the XMOS than a normal
> Ethernet MAC, but is impossible on any microcontroller without
> specialised peripherals.
Agreed.
>>> Overall, the thing that bugs me about XMOS is that you can write very
>>> simple, elegant tasks for the cores to do various tasks.  But when you
>>> do that, you run out of cores almost immediately.  So you have to write
>>> your code in a way that implements your own scheduler, losing a major
>>> part of the point of the whole system.  Or you use the XMOS FreeRTOS
>>> port on one of the virtual cores - in which case you could just switch
>>> to a Cortex-M microcontroller with hardware USB, Ethernet, PWM, UART,
>>> etc. and a fraction of the price.
>>
>> I didn't know they had a FreeRTOS port, and it sounds
>> like having a dog and barking :) Sounds like it would
>> combine the disadvantages and negate the advantages!
>
> Some real-time stuff needs microsecond or sub-microsecond precision -
> XMOS lets you do that in software on a core, while normally you'd do it
> in dedicated peripherals on a microcontroller.  But a lot needs
> millisecond or sub-second precision, and FreeRTOS is absolutely fine for
> that.  (As are other methods, such as software timers.)
>
>> Having said that, they did have a chip where one of the
>> processors was an ARM. Perhaps it was intended that the
>> ARM run FreeRTOS?
>
> I haven't seen such a chip.  Do you have a link?  It could be an
> interesting device.
Not offhand, and I don't think it is made anymore. I pegged it as "seems nice at first glance, but /two/ toolchains and what are the limitations on the ARM?" -- and decided I didn't want to bother.
>>> If the XMOS devices and software had a way of neatly multi-tasking
>>> /within/ a single virtual core, while keeping the same kind of
>>> inter-task communication and other benefits, then they would have
>>> something I could see being very nice.
>>
>> There is a half-way house.
>>
>> If you adopt a certain coding style, the IDE will combine
>> several processes to run on a single processor.
>>
>> Basically it is equivalent to appending all the processes'
>> "startup" code into a single block, and all the "forever
>> loop" code into a single block. The key bit is combining
>> all the processes' select statements into a single select
>> statement.
>>
>> With that understanding, the coding style requirements
>> become obvious, not onerous, and they are checked by the
>> compiler.
>
> I understand the principle, but you'd lose some of the modularity here.
> How well does it work if you want to have your UART, your CAN, your PWM,
> etc., defined in different files - and then you want to put them on the
> same core?  I guess it will be possible.
Yup, see bottom of post, and for a different optimisation: distributable functions.
>>>> To me a large part of the attraction is that you can
>>>> /predict/ the /worst/ case latency and jitter (and hence
>>>> throughput), in a way that is difficult in a standard MCU
>>>> and easy in an FPGA.
>>>
>>> For standard MCUs, you aim to do this by using hardware peripherals
>>> (timers, PWM blocks, communication controllers, etc.) for the most
>>> timing-critical stuff.  Then you don't need it in the software.
>>
>> Yebbut, the toolset won't analyse and predict worst case
>> performance. So you are back to "run it and hope we stumble
>> upon the worst case".
>
> The peripherals are independent, and specified in documentation.  If the
> PWM timer block can do 16-bit precision at 120 MHz, then you know its
> limits - and it doesn't matter how many UARTs you use or how fast you
> want your SPI bus to run.  You don't need to analyse the timings - that
> was done when the chip was designed.
>
> You need to check the speeds of your high-level software that uses the
> modules, but that applies to XMOS code too.
>
>> Yes, that is sufficient in many cases, but it is /inelegant/,
>> dammit!
>
> There is always room for improvement and extra tools!
And that's recursive, iterative and fractal, and probably another few woo-words as well :)
>>>> To that extent it allows FPGA-like performance with "traditional"
>>>> software development tools and methodologies. Plus a little
>>>> bit of "thinking parallel" that everybody will soon /have/ to
>>>> be doing :)
>>>
>>> It's a nice idea, and I'm sure XMOS has some good use-cases.  But I
>>> can't help feeling they have something that is /almost/ a good system -
>>> with a bit more, they could be very much more useful.
>>
>> There's no doubt it is niche.
>>
>> The world will go parallel. A major difficulty will be finding
>> people that can think that way. (Just look at how difficult
>> softies find it when they try to "program" VHDL/Verilog)
>>
>> We need all the tools and concepts we can muster; my fear
>> is that CSP is the best! :)
>
> I'd like a good way to do CSP stuff in C or C++.  I've been looking at
> passing std::variant types in message queues in FreeRTOS, but I'm not
> happy with the results yet.  I think I'll have to make my own alternative
> to std::variant, to make it more efficient for the task.
It always amazed me that Boehm had to write his paper indicating that you couldn't implement threads as a library. I presume that is no longer the case.
> If I ever get time, I will dig out my old XMOS kit and see how the
> current tools are looking.  If I can find a neat way to get your
> "half-way house" system working, it will increase a good deal in its
> attractiveness for me.
From my copy of The Programming Guide version F (there may be a later
variant). That guide has the usual succinct and relevant examples and
pictures illustrating what is happening. If only all documentation was as
short and sweet.

*2.3 Creating tasks for flexible placement*

xC programs are built up from several tasks running in parallel. These
tasks can be of several different types that can be used in different
ways. The following [table] shows the different types:

  Normal tasks run on a logical core and run independently of other
  tasks. The tasks have predictable running time and can respond very
  efficiently to external events.

  Combinable tasks can be combined to have several tasks running on the
  same logical core. The core swaps context based on cooperative
  multitasking between the tasks, driven by the compiler.

  Distributable tasks can run over several cores, running when required
  by the tasks connected to them.

Using these different task types you can maximize the resource usage of
the device depending on the form and timing requirements of your tasks.

*2.3.1 Combinable functions*

If a task ends in a never-ending loop containing a select statement, it
represents a task that continually reacts to events:

  void task1(args) {
      ... initialization ...
      while (1) {
          select {
              case ... : break;
              case ... : break;
              ...
          }
      }
  }

If a function complies with this format then it can be marked as
combinable by adding the combinable attribute: [[combinable]]

*2.3.2 Distributable functions*

Sometimes tasks contain state and provide services to other tasks, but do
not need to react to any external events on their own. These kinds of
tasks only run any code when communicating with other tasks. As such they
do not need a core of their own but can share the logical cores of the
tasks they communicate with (as shown in Figure 12). More formally, a
task can be marked as distributable if:

- It satisfies the conditions to be combinable (i.e. ends in a
  never-ending loop containing a select)
- The cases within that select only respond to interface transactions
...

A distributable task can be implemented very efficiently if all the tasks
it connects to are on the same tile. In this case the compiler will not
allocate it a logical core of its own.
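The combination the compiler performs can be mimicked by hand. A rough sketch in Python (invented names; a shared queue stands in for the xC event system): each task contributes its private state and its select cases, and one merged loop serves all of them:

```python
import queue

def make_uart_task(rx_source):
    """A 'combinable' task: init runs here, then it only exists as select cases."""
    received = []                                   # the task's private state
    cases = {rx_source: lambda byte: received.append(byte)}
    return received, cases

def make_timer_task(tick_source):
    ticks = []
    cases = {tick_source: lambda t: ticks.append(t)}
    return ticks, cases

def combined_loop(events, cases, n_events):
    """One loop serving several tasks: sleep until any event, dispatch its case.

    This is the single merged select the compiler builds from the
    individual tasks' never-ending select loops."""
    for _ in range(n_events):
        source, payload = events.get()   # blocks: the core "sleeps" here
        cases[source](payload)
```

Each task keeps its own state and handlers in its own function (so the code can live in separate files), but only one loop - one logical core - runs them all cooperatively.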
David Brown <david.brown@hesbynett.no> wrote:
> The real power comes from when you want to do something that is /not/
> standard, or at least not common.
>
> Implementing a standard UART in a couple of XMOS cores is a pointless
> waste of silicon.  Implementing a UART that uses Manchester encoding for
> the UART signals so that you can use it on a balanced line without
> keeping track of which line is which - /then/ you've got something that
> can be done just as easily on an XMOS and is a big pain to do on a
> standard microcontroller.
>
> Implementing an Ethernet MAC on an XMOS is pointless.  Implementing an
> EtherCAT slave is not going to be much harder for the XMOS than a normal
> Ethernet MAC, but is impossible on any microcontroller without
> specialised peripherals.
The Cypress PSoC has an interesting take on this. You can specify (with
the GUI) that you want a component. If you specify a simple component
(let's say I2C slave) there's a hard IP for that. But if you specify
something that's more complicated (say I2C master and slave on the same
pins) it builds it with the existing IP plus some of its FPGA-like logic.
Takes more resources but allows you to do many more things than they put
in as hard cores.

Unfortunately they don't provide a vast quantity of cells for that logic,
so it's fine if you want to add just a few unusual bits to the regular
microcontroller, but not a big system. (PSoC also has the programmable
analogue stuff, which presumably constrains the process they can use)

It would be quite interesting to combine that with the XMOS approach -
more fluid boundaries between hardware and software.

I'm a bit surprised XMOS don't provide 'soft realtime' virtual cores -
lock down the cores running a task that absolutely needs to be
bounded-latency, and then multitask the remaining tasks across the other
cores. If that was provided as an integrated service then it wouldn't
need messing about running a scheduler.

After all, there must be applications with a lot of do-this-once-a-second
tasks, that would be wasted using a whole core? Do they have a scheduler
where you can tell a task to sleep until a particular time? Or is
FreeRTOS intended for that? In which case you presumably have to write
the code in a different way compared to the hard tasks?

Theo
On 18/04/20 16:51, Theo wrote:
> David Brown <david.brown@hesbynett.no> wrote:
> [snip]
>
> I'm a bit surprised XMOS don't provide 'soft realtime' virtual cores -
> lock down the cores running a task that absolutely needs to be
> bounded-latency, and then multitask the remaining tasks across the other
> cores.  If that was provided as an integrated service then it wouldn't
> need messing about running a scheduler.
What scheduler? To a very useful approximation, an RTOS's functions are
encoded in hardware.

The "select" statement is like a "switch" statement, except that the core
sleeps until one of the "case" conditions becomes true. In effect the
case conditions are events. Events can be inputs arriving, outputs
completing, timeouts, etc.

N.B. as far as a core is concerned, receiving input from a pin is the
same as receiving a message from another task/core. Ditto output to a pin
and sending a message to another task/core. Everything is via the "port"
abstraction, and that extends down into the hardware and i/o system.

For a short intro to the hardware, software, and concepts see the
architecture flyer.
https://www.xmos.com/file/xcore-architecture-flyer/

FFI see the XMOS programming guide, which is beautifully written,
succinct, and clear.
https://www.xmos.com/file/xmos-programming-guide/

I wish all documentation was as good!
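The "pin edge and inter-core message look identical" point is worth a sketch. A toy Python model (invented names; a queue stands in for the hardware event system):

```python
import queue

class Port:
    """Unified event source: a pin edge and a message from another core
    are delivered the same way, so the receiving task cannot tell them apart."""
    def __init__(self, name, events):
        self.name = name
        self.events = events
    def deliver(self, value):
        # called by the "hardware" (pin edge) or by another task (message)
        self.events.put((self.name, value))

def select(events):
    """Like a switch, but the core sleeps here until some case fires.
    No polling loop, no software scheduler."""
    return events.get()          # blocks until an event arrives
```

Whether `deliver` was called by a simulated pin or a simulated channel, the task's select loop handles it identically - which is the whole point of the port abstraction.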
> After all, there must be applications with a lot of do-this-once-a-second
> tasks, that would be wasted using a whole core?  Do they have a scheduler
> where you can tell a task to sleep until a particular time?
See "combinable tasks" in my post in response to David Brown's post.
> Or is FreeRTOS intended for that?  In which case you presumably have to
> write the code in a different way compared to the hard tasks?
I really don't see the point of FreeRTOS running inside an xCORE chip.

To be over-simplistic, RTOSes are designed to schedule multiple tasks on
a single processor and to connect i/o with tasks (typically via
interrupts). All of that is done in hardware in the xCORE ecosystem.
On Saturday, April 18, 2020 at 9:06:57 AM UTC-4, David Brown wrote:
> [snip]
>
> Implementing a standard UART in a couple of XMOS cores is a pointless
> waste of silicon.  Implementing a UART that uses Manchester encoding for
> the UART signals so that you can use it on a balanced line without
> keeping track of which line is which - /then/ you've got something that
> can be done just as easily on an XMOS and is a big pain to do on a
> standard microcontroller.
I take your point even if I don't agree with your example. I don't follow
your comment about "which line is which". Manchester encoding uses the
polarity of transitions for the data being transmitted. To set up for the
correct polarity of data transition there are extra transitions which
should be ignored. To properly decode the data requires having the
correct polarity of the signal at the input. Swap lines and you get wrong
data.

It's not hard to receive Manchester encoded data on a typical MCU. They
virtually all have transition-triggered interrupts on I/O pins. Enable
the interrupts and on each transition grab a timer value. It would even
be easy to auto-detect the bit rate, as the intervals between transitions
should cluster into two primary modes. Why do you say this is a hard
thing to do???
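Both points - decoding from edge timestamps, and polarity mattering - are easy to demonstrate. A hedged sketch (invented names, IEEE 802.3 polarity assumed: a low-to-high mid-bit transition is a 1; the edge list stands in for timestamped pin interrupts):

```python
def encode(bits, T=8, t0=0, level=0):
    """Produce (timestamp, new_level) edges for Manchester-encoded bits.
    A low->high transition at mid-bit is a 1, high->low is a 0."""
    edges, t = [], t0
    for b in bits:
        start = b ^ 1                      # the cell must start at the opposite level
        if level != start:
            edges.append((t, start))       # extra "setup" transition at the boundary
            level = start
        level ^= 1
        edges.append((t + T // 2, level))  # mid-bit transition carries the data
        t += T
    return edges

def decode(edges, T=8):
    """Recover bits from (timestamp, new_level) edge interrupts.
    Assumes the first edge is a mid-bit transition (a real receiver
    would use a preamble to guarantee alignment)."""
    t_prev, level = edges[0]
    bits = [level]                 # level after a mid-bit edge IS the bit
    half = False
    for t, level in edges[1:]:
        dt, t_prev = t - t_prev, t
        if dt > 3 * T // 4:        # a full bit period since the last mid-bit edge
            bits.append(level)
        elif half:                 # second half-period edge: back at mid-bit
            bits.append(level)
            half = False
        else:                      # cell-boundary "setup" edge: ignore for data
            half = True
    return bits
```

The decoder is exactly the "grab a timer value on each transition" scheme: classify each interval as a half or full bit period. And swapping the balanced pair inverts every level, so every decoded bit comes out complemented - wrong data, as described above.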
> Implementing an Ethernet MAC on an XMOS is pointless.  Implementing an
> EtherCAT slave is not going to be much harder for the XMOS than a normal
> Ethernet MAC, but is impossible on any microcontroller without
> specialised peripherals.
That is exactly the point of the MCU market: the huge variety of
combinations of hard peripherals available these days. Nearly every maker
of MCUs will have an almost perfect match for your needs. Peripherals may
use some real estate on a chip, but the lion's share is simply the
memory. These days MCU chips are more memory chips with built-in CPUs and
peripherals than CPU chips. So your point about adding an external
Ethernet MAC to MCUs is a red herring.

I just noticed you said "EtherCAT", which I was not aware of. It seems
this is a different protocol from standard Ethernet. Ok, maybe that will
be just as easy on either type of device since it is a relatively niche
application. Once the market grows you can be assured MCUs will be
commonly available with EtherCAT in hard peripherals.
> [snip]
>
> Some real-time stuff needs microsecond or sub-microsecond precision -
> XMOS lets you do that in software on a core, while normally you'd do it
> in dedicated peripherals on a microcontroller.  But a lot needs
> millisecond or sub-second precision, and FreeRTOS is absolutely fine for
> that.  (As are other methods, such as software timers.)
We have hashed this out many times. XMOS is a better solution for a very
small range of applications in a niche area between FPGAs and MCUs. It is
a small area because MCUs are very popular, well understood and available
with a very wide range of memory and peripherals to suit nearly every
application where the speed of the processor is not the limitation. Once
you exceed that range there is only a small increment before the
requirements outpace the XMOS capabilities and FPGAs are much better
solutions.

One big advantage FPGAs have over XMOS is flexibility. Need one small
processor plus fast logic? Boom. Need lots of small processors and not so
much logic? Boom. Need really fast processors plus lots of logic, tightly
integrated? Boom. The XMOS shoe really only fits one size of feet.
> [snip]
> >>>> There is a lot to like about XMOS devices and tools, but they still
> >>>> strike me as a solution in search of a problem.  An elegant
> >>>> solution, perhaps, but still missing a problem.  We used them for a
> >>>> project many years ago for a USB Audio Class 2 device.  There simply
> >>>> were no realistic alternatives at the time, but I can't say the XMOS
> >>>> solution was a good one.  The device has far too little memory to
> >>>> make sensible buffers (this still applies to XMOS devices, last I
> >>>> looked), and the software at the time was painful (this I believe
> >>>> has improved significantly).  If we were making a new version of the
> >>>> product, we'd drop the XMOS device in an instant and use an
> >>>> off-the-shelf chip instead.
> >>>
> >>> I certainly wouldn't want to comment on your use case.
> >>
> >> As I said, it was a while ago, when XMOS were relatively new - I
> >> assume the software, libraries and examples are better now than at
> >> that time. But for applications like ours, you can just get a CMedia
> >> chip and wire it up - no matter how good XMOS tools have become, they
> >> don't beat that.
> >>
> >> (And then all the development budget can be spent on trying to get
> >> drivers to work on idiotic Windows systems...)
> >
> > What is this "Windows" of which you speak?
>
> Something some customers have.  It is a system designed to be as
> inconvenient for developers as humanly possible.
>
> >>> To me a large part of the attraction is that you can
> >>> /predict/ the /worst/ case latency and jitter (and hence
> >>> throughput), in a way that is difficult in a standard MCU
> >>> and easy in an FPGA.
> >>
> >> For standard MCUs, you aim to do this by using hardware peripherals
> >> (timers, PWM blocks, communication controllers, etc.) for the most
> >> timing-critical stuff.  Then you don't need it in the software.
> > Yebbut, the toolset won't analyse and predict worst case
> > performance. So you are back to "run it and hope we stumble
> > upon the worst case".
>
> The peripherals are independent, and specified in documentation.  If the
> PWM timer block can do 16-bit precision at 120 MHz, then you know its
> limits - and it doesn't matter how many UARTs you use or how fast you
> want your SPI bus to run.  You don't need to analyse the timings - that
> was done when the chip was designed.
>
> You need to check the speeds of your high-level software that uses the
> modules, but that applies to XMOS code too.
This is an area where FPGAs excel. Timing is relatively easy to verify
and the details of coordinating the various processes are trivial. It's
an area where MCUs can be very difficult to analyze.

-- 
Rick C.
++- Get 1,000 miles of free Supercharging
++- Tesla referral code - https://ts.la/richard11209
David Brown <david.brown@hesbynett.no> writes:
> I'd like a good way to do CSP stuff in C or C++.
https://seastar.io ?
On 18/04/20 21:09, Paul Rubin wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> I'd like a good way to do CSP stuff in C or C++.
>
> https://seastar.io ?
We are thinking of Tony Hoare's Communicating Sequential Processes (CSP)
calculus. It doesn't look like that is what Seastar offers.

There are many, many frameworks that split jobs across cores. They are
trivial in Java, and I implemented a highly performant one for telecoms
applications 16 years ago (gulp). The half-sync/half-async design pattern
works very well in such applications.
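For flavour, the essence of a CSP channel - send and receive block until both sides are ready - fits in a few lines. A thread-based Python sketch (invented names; real CSP implementations do much more, e.g. choice over several channels):

```python
import queue
import threading

class Channel:
    """Synchronous (unbuffered) channel: send blocks until recv takes the value."""
    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)
    def send(self, value):
        self._data.put(value)
        self._ack.get()              # rendezvous: wait until the receiver has it
    def recv(self):
        value = self._data.get()
        self._ack.put(None)          # release the blocked sender
        return value

def doubler(cin, cout, n):
    """A CSP-ish process: repeatedly receive, transform, send."""
    for _ in range(n):
        cout.send(cin.recv() * 2)
```

The rendezvous is what distinguishes this from an ordinary message queue: communication is also synchronisation, which is exactly what the xC channel operations give you in hardware.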
On Saturday, April 18, 2020 at 10:57:55 AM UTC-4, David Brown wrote:
> [snip]
>
> > Too bad the XMOS doesn't have more CPUs, like maybe 144 of them?
>
> 144 cpus is far more than would be useful in practice - as long as you
> have cores that can do useful work in a flexible way (like XMOS cores).
> When you have very limited cores that can barely do anything themselves,
> you need lots of them.
The point of the XMOS concept is to avoid multitasking on a single core. As soon as the number of tasks exceeds the number of cores, you are back to multitasking on a single core. No one ever complains about having too many resources. The design process for many small CPUs is not the same as for a few larger CPUs, but the point is that it can be simpler, more like designing hardware. Atmel had an FPGA with a simple logic element equivalent to a 3-input LUT and FF (done with a small number of gates rather than the larger LUT structure). It had relatively little routing, connecting logic elements almost exclusively to adjacent logic elements. The theory was that the simplicity of the logic elements meant you could have a lot more of them for a given amount of real estate/$$$, so you could afford to use some of them for routing. In practice they never kept up with state-of-the-art process technology, so this didn't pan out. But that is the concept of the many small cores: you don't have to focus on being highly efficient with your CPU resources since you have so many.
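The "write your own scheduler" situation from the quoted post, once tasks outnumber cores, can be sketched with a tiny round-robin cooperative scheduler built on Python generators (purely illustrative; each `yield` plays the role of a voluntary context switch):

```python
from collections import deque

def blink(log):
    # Each yield is a voluntary context switch, as in a cooperative RTOS.
    for state in ("on", "off", "on"):
        log.append(f"led {state}")
        yield

def poll_uart(log):
    for ch in "ok":
        log.append(f"uart {ch}")
        yield

def run(tasks):
    # Round-robin: run each task until its next yield; drop it when done.
    log = []
    ready = deque(task(log) for task in tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)   # still runnable, requeue at the back
        except StopIteration:
            pass                 # task finished, drop it
    return log

print(run([blink, poll_uart]))
```

Two "tasks" thus interleave on one core; with one hardware core per task (the XMOS ideal), none of this machinery is needed.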
> (Arguably that's what you have on an FPGA - lots of tiny bits that do > very little on their own. But the key difference is the tools - FPGA's > would be a lot less popular if you had to code each LU individually, do > placement manually by numbering them, and write a routing file by hand.)
Huh? Why can't any of that be automated in the many-CPU chip? Certainly the issue is not so important with only 144 processors, but the same size chip would have many thousands of processors in a more modern technology. I think the GA144 is 180 nm. Bring that down to 15 nm and you have 15,000 nodes on a chip just 5 mm square! That will require better tools for sure. Some of your use of language is interesting: "lots of tiny bits that do very little on their own". That sounds like the way people write software. They decompose a large, complex design into lots of tiny routines that individually do little on their own. How can you manage all that complexity??? Yes, I think Zeno's paradox must apply, so that no large program can ever be finished. That certainly happens sometimes.
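The node-count claim can be sanity-checked with back-of-the-envelope area scaling. Idealized shrink gives core count growing with the square of the linear feature-size ratio; the 15,000 figure above is plausibly the ideal number discounted for wires, pads, and other parts that don't scale:

```python
# Idealized area scaling: core count grows with the square of the
# linear feature-size ratio. Real shrinks gain less than this.
old_node_nm = 180
new_node_nm = 15
cores_at_180nm = 144

density_ratio = (old_node_nm / new_node_nm) ** 2   # 12^2 = 144x
ideal_cores = cores_at_180nm * density_ratio
print(int(ideal_cores))  # 20736 in the ideal case
```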
> >> If the XMOS devices and software had a way of neatly multi-tasking > >> /within/ a single virtual core, while keeping the same kind of > >> inter-task communication and other benefits, then they would have > >> something I could see being very nice. > > > > Is a "virtual core" one CPU? Multitasking a single CPU is the thing > > the XMOS is supposed to eliminate, no? Why bring it back? Oh, > > because there aren't enough CPUs on the XMOS for some applications! > > So it's back to the fast ARM processors and multitasking. > > > > When you are writing multi-tasking code, you often want a lot of tasks. > More than 8. (Sometimes, just to be more awkward, you want them to be > created dynamically.) Any specific limit is an inconvenient limit. > > There are always ways to get round this - like using an RTOS on one > virtual core of the XMOS. But you lose some of the symmetry and > convenience that way.
"Some of the symmetry and convenience"? That's supposed to be the whole point!
> (I understand why the XMOS is designed the way it is - any system is > going to be a compromise between what the hardware designers can do > practically, and what the software designers want.) > > > I seem to recall there being asymmetric multicores from various ARM > > makers with one fast CPU for multitasking and a smaller CPU for > > handling the lesser real time tasks without interference. > > > > Yes - you have a Cortex-A cpu that can handle high throughput, perhaps > with several cores, combined with a Cortex-M device that is more > deterministic and can give more real-time control.
That's also the idea behind CPUs in FPGAs.
> > That's a good combination, but again, a more specific target market. > > It seems that is how the CPU market has gone. The volumes are so > > high there are many niche areas justifying their own type of SoC to > > address it. > > > > These are becoming increasingly common - they are no longer niche. If > you have a system that needs the processing power of a bigger cpu (for > screens, image handling, embedded Linux, the convenience and low > development costs of high-level languages and off-the-shelf libraries, > etc.) then having a small cpu for handling ADC's, timers, PWM, UARTs, > power management, keys, and that kind of thing is a big win. > > Even combinations of fast M7 or M4 cores with an M0 core are common, > especially if you have a specific task for the small core (like running > a Bluetooth stack for an embedded wireless device).
I was talking in general, not just about asymmetric multiprocessors: peripherals, memory, I/O...
> >>>> There is a lot to like about XMOS devices and tools, but they > >>>> still strike me as a solution in search of a problem. An > >>>> elegant solution, perhaps, but still missing a problem. We > >>>> used them for a project many years ago for a USB Audio Class 2 > >>>> device. There simply were no realistic alternatives at the > >>>> time, but I can't say the XMOS solution was a good one. The > >>>> device has far too little memory to make sensible buffers (this > >>>> still applies to XMOS devices, last I looked), and the software > >>>> at the time was painful (this I believe has improved > >>>> significantly). If we were making a new version of the > >>>> product, we'd drop the XMOS device in an instant and use an > >>>> off-the-shelf chip instead. > >>> > >>> I certainly wouldn't want to comment on your use case. > >> > >> As I said, it was a while ago, when XMOS were relatively new - I > >> assume the software, libraries and examples are better now than at > >> that time. But for applications like ours, you can just get a > >> CMedia chip and wire it up - no matter how good XMOS tools have > >> become, they don't beat that. > >> > >> (And then all the development budget can be spent on trying to get > >> drivers to work on idiotic Windows systems...) > >> > >>> > >>> To me a large part of the attraction is that you can /predict/ > >>> the /worst/ case latency and jitter (and hence throughput), in a > >>> way that is difficult in a standard MCU and easy in an FPGA. > >> > >> For standard MCU's, you aim to do this by using hardware > >> peripherals (timers, PWM blocks, communication controllers, etc.) > >> for the most timing-critical stuff. Then you don't need it in the > >> software. > > > > And all that hardware costs chip space which you may or may not use. > > That's why they have so many flavors, to give the "perfect" > > combination of memory, peripherals and analog to minimize cost for > > each project. 
> > The silicon costs of basic peripherals like timers and UARTs is tiny - > generally close to irrelevant. Ethernet is a bit more costly, and some > kinds of peripherals have additional cost such as royalties (this used > to be the case for CAN controllers until the patents ran out). Analogue > parts can be most costly in silicon space, especially if they need > calibrating in some way. Much of the cost of peripherals is in the IO > blocks and drivers, and the multiplexing and signal routing to support them. > > Memory blocks - while simple - usually take up a much bigger part of the > die area.
Yes, memory is the big dog. Peripherals aren't quite irrelevant, though; in total they are often as large as or larger than the CPU.
> > FPGAs have a cost overhead which is fading into the background as > > they become more and more efficient. For many designs an FPGA > > provides a good trade off between cost and flexibility. In many > > cases it also provides a functionality that can't be duplicated > > elsewhere. > > Sure, FPGAs have their uses - including areas where they are the only > sensible solution, and areas of overlap where either microcontrollers or > FPGAs could do the job. > > When looking at the cost of making these choices, there are three main > parts. The development costs, the production costs, and the lifetime > costs. How you balance these will depend on the type of product, the > quantities you make, the use of the product, and its expected lifetime. > So no single answer is ever going to be "right". But one thing is > very clear - for developers and companies that have done a lot of FPGA > development, the costs of developing a new FPGA-based device will be far > smaller than for a company that has not done such systems before. Don't > assume that because FPGA design is cheap for /you/ to do, that it is > necessarily cheap for others. > > The opposite is true as well - there are plenty of boards made where an > FPGA (or other programmable logic) would make things simpler and > cheaper, but is not seriously considered because programmable logic is > often viewed as expensive and difficult.
Yeah, that's why I get work. I make it easy. -- Rick C. +++ Get 1,000 miles of free Supercharging +++ Tesla referral code - https://ts.la/richard11209
On Saturday, April 18, 2020 at 11:51:55 AM UTC-4, Theo wrote:
> David Brown <david.brown@hesbynett.no> wrote: > > The real power comes from when you want to do something that is /not/ > > standard, or at least not common. > > > > Implementing a standard UART in a couple of XMOS cores is a pointless > > waste of silicon. Implementing a UART that uses Manchester encoding for > > the UART signals so that you can use it on a balanced line without > > keeping track of which line is which - /then/ you've got something that > > can be done just as easily on an XMOS and is a big pain to do on a > > standard microcontroller. > > > > Implementing an Ethernet MAC on an XMOS is pointless. Implementing an > > EtherCAT slave is not going to be much harder for the XMOS than a normal > > Ethernet MAC, but is impossible on any microcontroller without > > specialised peripherals. > > The Cypress PSoC has an interesting take on this. You can specify (with the > GUI) that you want a component. If you specify a simple component (let's > say I2C slave) there's a hard IP for that. But if you specify something > that's more complicated (say I2C master and slave on the same pins) it > builds it with the existing IP plus some of its FPGA-like logic. Takes more > resources but allows you to do many more things than they put in as hard > cores.
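The Manchester-encoded UART point quoted above can be sketched. Each bit becomes a mid-cell transition, so the line is self-clocking and swapping the two wires of a balanced pair merely inverts every decoded bit consistently, which framing can detect and undo. This sketch uses one of the two common conventions (IEEE 802.3 style: 0 is high-then-low, 1 is low-then-high) and is illustrative only:

```python
def manchester_encode(bits):
    # IEEE 802.3 convention: 0 -> high-then-low, 1 -> low-then-high.
    # Every bit cell contains a mid-cell transition, so the signal is
    # DC-balanced and self-clocking.
    half_bits = []
    for b in bits:
        half_bits.extend((1, 0) if b == 0 else (0, 1))
    return half_bits

def manchester_decode(half_bits):
    # The bit value is recovered from the direction of the transition.
    return [0 if pair == (1, 0) else 1
            for pair in zip(half_bits[::2], half_bits[1::2])]

data = [1, 0, 1, 1, 0]
line = manchester_encode(data)
assert manchester_decode(line) == data

# Swapped wires invert every half-bit; the result is a consistent
# complement of the data, which a known preamble lets the receiver fix:
inverted = [1 - s for s in line]
assert manchester_decode(inverted) == [1 - b for b in data]
```

This is exactly the kind of nonstandard serial format that is natural in software on an XMOS core and painful on a fixed-function UART peripheral.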
It is hard to talk about the PSoC devices because they come in so many very different flavors, each differing in significant ways. Some devices don't actually have any programmable logic at all, just a couple of flexible serial ports that can be configured as a variety of serial devices. Others have logic, but at a lower level of complexity and typically very different from FPGAs, so the comparison is not very useful. The main difference is that while they may support Verilog, I'm pretty sure it is not a very complete implementation, in that you can't describe arbitrary logic with it. I don't know this for certain, but the devices are so limited in the logic they contain that it would be hard to do much with them. I'd like to see an FSM implementation of something bigger than a stop light or elevator.
> Unfortunately they don't provide a vast quantity of cells for that logic, so > it's fine if you want to add just a few unusual bits to the regular > microcontroller, but not a big system. (PSoC also has the programmable > analogue stuff, which presumably constrains the process they can use)
I believe that is correct for a pretty wide range of "big". I don't think the analog constrains the programmable logic so much as the digital makes it hard to include very good analog. Programmable logic is not silicon-efficient, which is typically dealt with by using the most advanced processes and smallest feature sizes. That's not compatible with Flash.
Rick C <gnuarm.deletethisbit@gmail.com> writes:
> Some of your use of language is interesting. "lots of tiny bits that > do very little on their own". That sounds like the way people write > software. They decompose a large, complex design into lots of tiny > routines that individually do little on their own. How can you manage > all that complexity???
Use abstractions to reason about large chunks of the code, and encode the abstractions and the reasoning into the code itself so that the compiler can check the reasoning's validity. Checking reasoning is basically what a static type system does. There is something of a "wheel of karma" going on, where bigger abstractions make reasoning simpler, decreasing the amount of compiler checking needed. So unchecked languages like assembler and Forth led to checked ones like C, but then Python delivered bigger abstractions which got rid of the need for some checking, but then Python programs got bigger so now Python has grown a static checking feature ("mypy"). After that is probably something like Wolfram Alpha, which I haven't used.
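The point about Python growing static checking can be made concrete. With type hints, a checker such as mypy flags a bad call before the program runs, while plain CPython treats the annotations as documentation and only fails at runtime (the function and values here are illustrative):

```python
def average(samples: list[float]) -> float:
    # The annotation encodes the reasoning: callers must pass floats.
    # mypy checks this; CPython itself does not enforce it.
    return sum(samples) / len(samples)

# Well-typed call, accepted by both mypy and CPython:
print(average([1.0, 2.0, 3.0]))  # 2.0

# mypy would reject the following call at check time; plain CPython
# would only raise a TypeError once it runs:
# average(["a", "b"])
```

That division of labor, with abstractions carrying the reasoning and the checker verifying it, is the "compiler checks the reasoning" idea above.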
> Yes, I think the Zeno paradox must apply so that no large program can > ever be finished. That certainly happens sometimes.
The saying I'm used to is that no program is ever finished until the last user is dead. ;)