
Custom CPU Designs

Started by Rick C April 16, 2020
Grant Edwards <invalid@invalid.invalid> wrote:
> Once I got a UART working so I could print messages, I just gave up on
> the JTAG BS. Another interesting quirk was that the Altera USB JTAG
> interface only worked right with a few specific models of powered USB
> hubs.
I've spent months working around such problems :( We have an application
that pushes gigabytes through JTAG UARTs and have learnt all about it...
There's a pile of specific issues:

- The USB 1.1 JTAG is an FT245 chip which basically bitbangs JTAG; it
  sends a byte containing 4 bits for the 4 JTAG wires. The software is
  literally saying "clock high, clock low, clock high, clock low" etc.
  Timing of that is not reliable. Newer development boards have a USB 2.0
  programmer where things are a bit better, but it's still bitbanging
  (the encoding is sketched just below this post).

- Being USB 1.1, if you have a cheap USB 2.0 hub it may only have a
  single transaction translator (single-TT), which means all USB 1.1
  peripherals share 12Mbps of bandwidth. In our case we have 16 FPGAs all
  trying to chat over that shared 12Mbps. Starvation occurs and nobody
  makes any progress. A better hub with multiple transaction translators
  (multi-TT) will allow multiple 12Mbps streams to share the 480Mbps USB
  2.0 bandwidth. Unfortunately, when you buy a hub this is never
  advertised or explained.

- The software daemon that generates the bitbanging data is called jtagd
  and it's single threaded. It can max out a CPU core bitbanging, and
  that can lead to unreliability. I had an Atom where it was unusable. I
  now install i7s in servers with FPGAs, purely to push bits down the
  JTAG wire.

- To parallelise downloads to multiple FPGAs, I've written some horrible
  containerisation scripts that lie to each jtagd that there's only one
  FPGA in the system. Then I can launch 16 jtagds and use all 16 cores in
  my system to push traffic through the JTAG UARTs.

- Did I mention that programming an FPGA takes about 700MB? So I need to
  fit at least 8GB of RAM to avoid memory starvation when doing parallel
  programming (if the system swaps, the bitbanging stalls and the FPGA
  programming fails).

- There's some trouble with jtagd and libudev.so.0 - if you don't have
  it, things seem to work but get unreliable. I just symlink
  libudev.so.1 on Ubuntu and that seems to fix it.

- The register-level interface of the JTAG UART can't read the state of
  the input FIFO without also dequeuing the data on it, which makes
  writing reliable device drivers almost impossible. I have a version
  that wraps the UART in a 16550 register interface to avoid this
  problem.

- If the FPGA is failing timing, the producer/consumer of the UART can
  break in interesting ways, which look a lot like a problem with the
  USB hub or similar.

It's a very precarious pile of hardware and software that falls over in
numerous ways if pushed at all hard :(

Theo

[adding comp.arch.fpga since this is relevant to those folks]
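[Editor's note: for a concrete feel of why bitbanged JTAG is slow, here
is a minimal C++ sketch of the encoding described above. The bit
positions are illustrative assumptions, not the actual Altera cable
pinout.]

    #include <cstdint>
    #include <vector>

    // Illustrative bit assignments -- assumed for this sketch, not
    // the real USB-Blaster pinout.
    constexpr uint8_t TCK = 1 << 0;
    constexpr uint8_t TMS = 1 << 1;
    constexpr uint8_t TDI = 1 << 2;

    // Shift one byte out on TDI, LSB first, TMS held low: each data
    // bit becomes two bytes on the USB wire, one with TCK low and
    // one with TCK high.
    std::vector<uint8_t> bitbang_byte(uint8_t data) {
        std::vector<uint8_t> out;
        for (int i = 0; i < 8; ++i) {
            uint8_t tdi = ((data >> i) & 1) ? TDI : 0;
            out.push_back(tdi);        // TCK low: set up TDI
            out.push_back(tdi | TCK);  // TCK high: bit is sampled
        }
        return out;  // 16 bytes of USB traffic per payload byte
    }

That 16x expansion, generated byte by byte in user space, is why jtagd
can saturate a CPU core and why the timing is at the mercy of the host
scheduler.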
On 17/04/20 17:15, David Brown wrote:
> On 17/04/2020 16:23, Tom Gardner wrote:
>> On 17/04/20 14:44, David Brown wrote:
>>> On 17/04/2020 11:49, Tom Gardner wrote:
>>>> On 17/04/20 09:02, David Brown wrote:
>
>>>> As you say, the XMOS /ecosystem/ is far more compelling,
>>>> partly because it has excellent /integration/ between the
>>>> hardware, the software and the toolchain. The latter two
>>>> are usually missing.
>>>
>>> Agreed. And the XMOS folk have learned and improved. With the first chips,
>>> they proudly showed off that you could make a 100 MBit Ethernet controller in
>>> software on an XMOS chip. Then it was pointed out to them that - impressive
>>> achievement though it was - it was basically useless because you didn't have
>>> the resources left to use it for much, and hardware Ethernet controllers were
>>> much cheaper. So they brought out new XMOS chips with hardware Ethernet
>>> controllers. The same thing happened with USB.
>>
>> It looks like a USB controller needs ~8 cores, which isn't
>> a problem on a 16 core device :)
>>
>
> I've had another look, and I was mistaken - these devices only have the USB and
> Ethernet PHYs, not the MACs, and thus require a lot of processor power, pins,
> memory and other resources. It doesn't need 8 cores, but the whole thing just
> seems so inefficient. No one is going to spend the extra cost for an XMOS with
> a USB PHY, so why not put a hardware USB controller on the chip? The silicon
> costs would surely be minor, and it would save a lot of development effort and
> release resources that are useful for other tasks. The same goes for Ethernet.
> Just because you /can/ make these things in software on the XMOS devices, does
> not make it a good idea.
Oh I agree! However, being able to do it in software is a good demonstration of the device's unique characteristics, and that "you aren't in Kansas anymore".
> Overall, the thing that bugs me about XMOS is that you can write very simple,
> elegant tasks for the cores to do various tasks. But when you do that, you run
> out of cores almost immediately. So you have to write your code in a way that
> implements your own scheduler, losing a major part of the point of the whole
> system. Or you use the XMOS FreeRTOS port on one of the virtual cores - in
> which case you could just switch to a Cortex-M microcontroller with hardware
> USB, Ethernet, PWM, UART, etc. and a fraction of the price.
I didn't know they had a FreeRTOS port, and it sounds like having a dog and barking :) Sounds like it would combine the disadvantages and negate the advantages! Having said that, they did have a chip where one of the processors was an ARM. Perhaps it was intended that the ARM run FreeRTOS?
> If the XMOS devices and software had a way of neatly multi-tasking /within/ a
> single virtual core, while keeping the same kind of inter-task communication and
> other benefits, then they would have something I could see being very nice.
There is a half-way house. If you adopt a certain coding style, the IDE will combine several processes to run on a single processor. Basically it is equivalent to appending all the processes' "startup" code into a single block, and all the "forever loop" code into a single block. The key bit is combining all the processes' select statements into a single select statement. With that understanding, the coding style requirements become obvious, not onerous, and they are checked by the compiler. A sketch of the transformation follows.
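[Editor's note: a plain C++ paraphrase of the merge Tom describes. The
task names and event sources are hypothetical stand-ins for XC channels
and ports; in XC the compiler performs this combination itself for
tasks written in the required style.]

    #include <cstdio>

    // Two logical processes, each split into "startup" code and a
    // handler for one event.
    struct UartTask {
        void init() { std::puts("uart ready"); }
        bool event_ready() { return false; }   // would poll the hardware
        void handle() { /* one select-case body */ }
    };

    struct PwmTask {
        void init() { std::puts("pwm ready"); }
        bool event_ready() { return false; }
        void handle() { /* another select-case body */ }
    };

    int main() {
        UartTask uart;
        PwmTask pwm;
        uart.init();  // all the "startup" code, appended into one block...
        pwm.init();
        // ...and the two forever loops merged into one loop with a
        // single combined select (bounded here only so the demo exits;
        // the real loop never terminates).
        for (int spin = 0; spin < 1000; ++spin) {
            if (uart.event_ready()) uart.handle();
            if (pwm.event_ready())  pwm.handle();
        }
    }

The constraint (one never-ending select loop per process) is exactly
what makes this merge mechanical enough for a compiler to check.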
>>> There is a lot to like about XMOS devices and tools, but they still strike me
>>> as a solution in search of a problem. An elegant solution, perhaps, but
>>> still missing a problem. We used them for a project many years ago for a USB
>>> Audio Class 2 device. There simply were no realistic alternatives at the
>>> time, but I can't say the XMOS solution was a good one. The device has far
>>> too little memory to make sensible buffers (this still applies to XMOS
>>> devices, last I looked), and the software at the time was painful (this I
>>> believe has improved significantly). If we were making a new version of the
>>> product, we'd drop the XMOS device in an instant and use an off-the-shelf
>>> chip instead.
>>
>> I certainly wouldn't want to comment on your use case.
>
> As I said, it was a while ago, when XMOS were relatively new - I assume the
> software, libraries and examples are better now than at that time. But for
> applications like ours, you can just get a CMedia chip and wire it up - no
> matter how good XMOS tools have become, they don't beat that.
>
> (And then all the development budget can be spent on trying to get drivers to
> work on idiotic Windows systems...)
What is this "Windows" of which you speak?
>> To me a large part of the attraction is that you can
>> /predict/ the /worst/ case latency and jitter (and hence
>> throughput), in a way that is difficult in a standard MCU
>> and easy in an FPGA.
>
> For standard MCU's, you aim to do this by using hardware peripherals (timers,
> PWM blocks, communication controllers, etc.) for the most timing-critical
> stuff. Then you don't need it in the software.
Yebbut, the toolset won't analyse and predict worst case performance. So you are back to "run it and hope we stumble upon the worst case". Yes, that is sufficient in many cases, but it is /inelegant/, dammit!
>> To that extent it allows FPGA-like performance with "traditional"
>> software development tools and methodologies. Plus a little
>> bit of "thinking parallel" that everybody will soon /have/ to
>> be doing :)
>
> It's a nice idea, and I'm sure XMOS has some good use-cases. But I can't help
> feeling they have something that is /almost/ a good system - with a bit more,
> they could be very much more useful.
There's no doubt it is niche. The world will go parallel. A major difficulty will be finding people that can think that way. (Just look at how difficult softies find it when they try to "program" VHDL/Verilog) We need all the tools and concepts we can muster; my fear is that CSP is the best! :)
On Friday, April 17, 2020 at 12:15:37 PM UTC-4, David Brown wrote:
> On 17/04/2020 16:23, Tom Gardner wrote:
> > On 17/04/20 14:44, David Brown wrote:
> >> On 17/04/2020 11:49, Tom Gardner wrote:
> >>> On 17/04/20 09:02, David Brown wrote:
>
> >>> As you say, the XMOS /ecosystem/ is far more compelling,
> >>> partly because it has excellent /integration/ between the
> >>> hardware, the software and the toolchain. The latter two
> >>> are usually missing.
> >>
> >> Agreed. And the XMOS folk have learned and improved. With the first
> >> chips, they proudly showed off that you could make a 100 MBit Ethernet
> >> controller in software on an XMOS chip. Then it was pointed out to
> >> them that - impressive achievement though it was - it was basically
> >> useless because you didn't have the resources left to use it for much,
> >> and hardware Ethernet controllers were much cheaper. So they brought
> >> out new XMOS chips with hardware Ethernet controllers. The same thing
> >> happened with USB.
> >
> > It looks like a USB controller needs ~8 cores, which isn't
> > a problem on a 16 core device :)
> >
>
> I've had another look, and I was mistaken - these devices only have the
> USB and Ethernet PHYs, not the MACs, and thus require a lot of processor
> power, pins, memory and other resources. It doesn't need 8 cores, but
> the whole thing just seems so inefficient. No one is going to spend the
> extra cost for an XMOS with a USB PHY, so why not put a hardware USB
> controller on the chip? The silicon costs would surely be minor, and it
> would save a lot of development effort and release resources that are
> useful for other tasks. The same goes for Ethernet. Just because you
> /can/ make these things in software on the XMOS devices, does not make
> it a good idea.
>
> Overall, the thing that bugs me about XMOS is that you can write very
> simple, elegant tasks for the cores to do various tasks. But when you
> do that, you run out of cores almost immediately. So you have to write
> your code in a way that implements your own scheduler, losing a major
> part of the point of the whole system. Or you use the XMOS FreeRTOS
> port on one of the virtual cores - in which case you could just switch
> to a Cortex-M microcontroller with hardware USB, Ethernet, PWM, UART,
> etc. and a fraction of the price.
Too bad the XMOS doesn't have more CPUs, like maybe 144 of them?
> If the XMOS devices and software had a way of neatly multi-tasking
> /within/ a single virtual core, while keeping the same kind of
> inter-task communication and other benefits, then they would have
> something I could see being very nice.
Is a "virtual core" one CPU? Multitasking a single CPU is the thing the XMOS is supposed to eliminate, no? Why bring it back? Oh, because there aren't enough CPUs on the XMOS for some applications! So it's back to the fast ARM processors and multitasking. I seem to recall there being asymmetric multicores from various ARM makers with one fast CPU for multitasking and a smaller CPU for handling the lesser real time tasks without interference. That's a good combination, but again, a more specific target market. It seems that is how the CPU market has gone. The volumes are so high there are many niche areas justifying their own type of SoC to address it.
> >> There is a lot to like about XMOS devices and tools, but they still
> >> strike me as a solution in search of a problem. An elegant solution,
> >> perhaps, but still missing a problem. We used them for a project many
> >> years ago for a USB Audio Class 2 device. There simply were no
> >> realistic alternatives at the time, but I can't say the XMOS solution
> >> was a good one. The device has far too little memory to make sensible
> >> buffers (this still applies to XMOS devices, last I looked), and the
> >> software at the time was painful (this I believe has improved
> >> significantly). If we were making a new version of the product, we'd
> >> drop the XMOS device in an instant and use an off-the-shelf chip instead.
> >
> > I certainly wouldn't want to comment on your use case.
>
> As I said, it was a while ago, when XMOS were relatively new - I assume
> the software, libraries and examples are better now than at that time.
> But for applications like ours, you can just get a CMedia chip and wire
> it up - no matter how good XMOS tools have become, they don't beat that.
>
> (And then all the development budget can be spent on trying to get
> drivers to work on idiotic Windows systems...)
>
> > To me a large part of the attraction is that you can
> > /predict/ the /worst/ case latency and jitter (and hence
> > throughput), in a way that is difficult in a standard MCU
> > and easy in an FPGA.
>
> For standard MCU's, you aim to do this by using hardware peripherals
> (timers, PWM blocks, communication controllers, etc.) for the most
> timing-critical stuff. Then you don't need it in the software.
And all that hardware costs chip space which you may or may not use. That's why they have so many flavors, to give the "perfect" combination of memory, peripherals and analog to minimize cost for each project. FPGAs have a cost overhead which is fading into the background as they become more and more efficient. For many designs an FPGA provides a good trade off between cost and flexibility. In many cases it also provides a functionality that can't be duplicated elsewhere.
> > To that extent it allows FPGA-like performance with "traditional"
> > software development tools and methodologies. Plus a little
> > bit of "thinking parallel" that everybody will soon /have/ to
> > be doing :)
>
> It's a nice idea, and I'm sure XMOS has some good use-cases. But I
> can't help feeling they have something that is /almost/ a good system -
> with a bit more, they could be very much more useful.
It's good, it just has its own niche of applications where it is the best solution. Nothing wrong with that!

--
Rick C.
Rick C <gnuarm.deletethisbit@gmail.com> writes:
> A single, fast CPU is harder to program than many, fast CPUs.
> Programmers have to learn a lot in order to perform multitasking on a
> single CPU.
Really it's the other way around. A typical programmer these days might not know how to implement a multitasker or OS on a bare machine, but they do know how to spawn processes and use them on a machine with an OS. Organizing a parallel or distributed program is much harder.
On Friday, April 17, 2020 at 3:17:55 PM UTC-4, Paul Rubin wrote:
> Rick C <gnuarm.deletethisbit@gmail.com> writes:
> > A single, fast CPU is harder to program than many, fast CPUs.
> > Programmers have to learn a lot in order to perform multitasking on a
> > single CPU.
>
> Really it's the other way around. A typical programmer these days might
> not know how to implement a multitasker or OS on a bare machine, but
> they do know how to spawn processes and use them on a machine with an
> OS. Organizing a parallel or distributed program is much harder.
Really? Multitasking is a lot more complex than just spawning tasks. There are potential conditions that can lock up the computer or the tasks. Managing task priorities can be a very complex issue, and learning how to do that correctly is an important part of multitasking. In real-time systems it becomes potentially the hardest part of a project.

Breaking a design down to assign tasks to various processors is a much simpler matter. It's much like hardware design, where you dedicate hardware to perform various actions and simply don't have the many problems of sharing a single CPU among many tasks.

Do I have it wrong? Is multitasking actually simple, and do the various articles I've read about the complexities overstate the matter?

--
Rick C.
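[Editor's note: an aside on the "conditions that can lock up the
tasks" mentioned above. The classic example is two tasks taking two
locks in opposite order; the C++ sketch below shows the hazard and the
standard fix. The task bodies are hypothetical.]

    #include <mutex>
    #include <thread>

    std::mutex a, b;

    // Classic lock-order inversion: if task1 took a then b while
    // task2 took b then a, each could end up holding one lock and
    // waiting forever for the other.  std::scoped_lock acquires both
    // locks with a deadlock-avoidance algorithm, so this version
    // always terminates.
    void task1() { std::scoped_lock lk(a, b); /* critical section */ }
    void task2() { std::scoped_lock lk(b, a); /* critical section */ }

    int main() {
        std::thread t1(task1), t2(task2);
        t1.join();
        t2.join();
    }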
On Fri, 17 Apr 2020 15:44:15 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>
>>
>> As you say, the XMOS /ecosystem/ is far more compelling,
>> partly because it has excellent /integration/ between the
>> hardware, the software and the toolchain. The latter two
>> are usually missing.
The xCORE architecture is nice, if you are comfortable implementing whole applications mainly with interrupts. Just assign one core to each ISR. When an external HW signal (an "interrupt") occurs, restart the program on that core. The program needs to finish executing before the next "interrupt" occurs. This makes implementing e.g. an audio or video sampled-data system pleasant: the core's task just needs to be fast enough to handle one sample.
> Agreed. And the XMOS folk have learned and improved. With the first
> chips, they proudly showed off that you could make a 100 MBit Ethernet
> controller in software on an XMOS chip. Then it was pointed out to them
> that - impressive achievement though it was - it was basically useless
> because you didn't have the resources left to use it for much, and
> hardware Ethernet controllers were much cheaper. So they brought out
> new XMOS chips with hardware Ethernet controllers. The same thing
> happened with USB.
On Ethernet, the minimum MAC frame is 64 bytes, so new short frames may appear every 6.4 us and the "ISR" must execute in less than 6.4 us. If that is not possible, let one task just split the frame into header and actual payload, and use separate cores e.g. to handle the MAC and IP headers. Still, 8 cores sounds like quite a large number.
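[Editor's note: a cross-check on that wire-time budget. At 100 Mbit/s
one bit is 10 ns; a minimum frame alone is 512 bit times, and adding
preamble/SFD and the inter-frame gap gives the back-to-back arrival
rate, so the 6.4 us figure above is the right order of magnitude.]

    #include <cstdio>

    constexpr double ns_per_bit = 10.0;   // 100 Mbit/s
    constexpr int min_frame_bytes = 64;   // minimum MAC frame
    constexpr int gap_bytes = 8 + 12;     // preamble/SFD + inter-frame gap

    int main() {
        std::printf("frame alone:  %.2f us\n",
                    min_frame_bytes * 8 * ns_per_bit / 1000.0);   // 5.12 us
        std::printf("back-to-back: %.2f us\n",
                    (min_frame_bytes + gap_bytes) * 8
                        * ns_per_bit / 1000.0);                   // 6.72 us
    }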
Rick C <gnuarm.deletethisbit@gmail.com> writes:
> Do I have it wrong? Is multitasking actually simple and the various
> articles I've read about the complexities overstate the matter?
Multitasking isn't exactly simple, but we (programmers) are used to it by now. The stuff you read about lock hazards is mostly from multi-threading in a single process. If you have processes communicating through channels, there are still ways to mess up, but it's usually simpler than dealing with threads and locks.
On Friday, April 17, 2020 at 5:48:50 PM UTC-4, Paul Rubin wrote:
> Rick C <gnuarm.deletethisbit@gmail.com> writes:
> > Do I have it wrong? Is multitasking actually simple and the various
> > articles I've read about the complexities overstate the matter?
>
> Multitasking isn't exactly simple, but we (programmers) are used to it
> by now. The stuff you read about lock hazards is mostly from
> multi-threading in a single process. If you have processes
> communicating through channels, there are still ways to mess up, but
> it's usually simpler than dealing with threads and locks.
Exactly, the mindset is to use multitasking... but it can still be complex. That's my point: what you are used to is what you use, even when it's not the best approach. Splitting a design to run on independent processors is just as easy, if not more so, because of the lack of sharing issues.

The stuff you are thinking of with distributed processing is when your application doesn't suit multitasking and it needs to be distributed over a lot of processors to speed it up. That's not the same issue at all as simply getting the job done. That's the sort of stuff they have problems with on supercomputers.

I think we've been down this road before.

--
Rick C.
On 17/04/2020 18:49, Tom Gardner wrote:
> On 17/04/20 17:15, David Brown wrote:
>> On 17/04/2020 16:23, Tom Gardner wrote:
>>> On 17/04/20 14:44, David Brown wrote:
>>>> On 17/04/2020 11:49, Tom Gardner wrote:
>>>>> On 17/04/20 09:02, David Brown wrote:
>>
>>>>> As you say, the XMOS /ecosystem/ is far more compelling,
>>>>> partly because it has excellent /integration/ between the
>>>>> hardware, the software and the toolchain. The latter two
>>>>> are usually missing.
>>>>
>>>> Agreed. And the XMOS folk have learned and improved. With the
>>>> first chips, they proudly showed off that you could make a 100 MBit
>>>> Ethernet controller in software on an XMOS chip. Then it was
>>>> pointed out to them that - impressive achievement though it was - it
>>>> was basically useless because you didn't have the resources left to
>>>> use it for much, and hardware Ethernet controllers were much
>>>> cheaper. So they brought out new XMOS chips with hardware Ethernet
>>>> controllers. The same thing happened with USB.
>>>
>>> It looks like a USB controller needs ~8 cores, which isn't
>>> a problem on a 16 core device :)
>>>
>>
>> I've had another look, and I was mistaken - these devices only have
>> the USB and Ethernet PHYs, not the MACs, and thus require a lot of
>> processor power, pins, memory and other resources. It doesn't need 8
>> cores, but the whole thing just seems so inefficient. No one is going
>> to spend the extra cost for an XMOS with a USB PHY, so why not put a
>> hardware USB controller on the chip? The silicon costs would surely
>> be minor, and it would save a lot of development effort and release
>> resources that are useful for other tasks. The same goes for
>> Ethernet. Just because you /can/ make these things in software on the
>> XMOS devices, does not make it a good idea.
>
> Oh I agree! However, being able to do it in software is a
> good demonstration of the device's unique characteristics,
> and that "you aren't in Kansas anymore"
>
Indeed. The real power comes from when you want to do something that is /not/ standard, or at least not common. Implementing a standard UART in a couple of XMOS cores is a pointless waste of silicon. Implementing a UART that uses Manchester encoding for the UART signals so that you can use it on a balanced line without keeping track of which line is which - /then/ you've got something that can be done just as easily on an XMOS and is a big pain to do on a standard microcontroller. Implementing an Ethernet MAC on an XMOS is pointless. Implementing an EtherCAT slave is not going to be much harder for the XMOS than a normal Ethernet MAC, but is impossible on any microcontroller without specialised peripherals.
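[Editor's note: a sketch of why Manchester coding suits a
polarity-agnostic link. Every bit is sent as a transition, the line
carries no DC bias, and a known preamble or start bit lets the receiver
resolve which polarity it is seeing. IEEE 802.3 convention assumed
here; the inverse convention also exists.]

    #include <cstdint>
    #include <cstdio>

    // Manchester-encode one byte, LSB first (IEEE 802.3 convention:
    // 0 -> low-then-high, 1 -> high-then-low).  Each data bit becomes
    // two half-bit symbols, so one byte yields 16 line bits.
    uint16_t manchester_encode(uint8_t b) {
        uint16_t line = 0;
        for (int i = 0; i < 8; ++i) {
            uint16_t symbol = ((b >> i) & 1) ? 0b10 : 0b01;
            line |= symbol << (2 * i);
        }
        return line;
    }

    int main() {
        std::printf("0x55 -> 0x%04x\n", manchester_encode(0x55));  // 0x6666
    }

Transmitting at twice the bit rate from a timed port loop is natural on
an XMOS core; a conventional UART's fixed NRZ framing gives you no such
hook, which is David's point.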
>
>> Overall, the thing that bugs me about XMOS is that you can write very
>> simple, elegant tasks for the cores to do various tasks. But when you
>> do that, you run out of cores almost immediately. So you have to
>> write your code in a way that implements your own scheduler, losing a
>> major part of the point of the whole system. Or you use the XMOS
>> FreeRTOS port on one of the virtual cores - in which case you could
>> just switch to a Cortex-M microcontroller with hardware USB, Ethernet,
>> PWM, UART, etc. and a fraction of the price.
>
> I didn't know they had a FreeRTOS port, and it sounds
> like having a dog and barking :) Sounds like it would
> combine the disadvantages and negate the advantages!
>
Some real-time stuff needs microsecond or sub-microsecond precision - XMOS lets you do that in software on a core, while normally you'd do it in dedicated peripherals on a microcontroller. But a lot needs millisecond or sub-second precision, and FreeRTOS is absolutely fine for that. (As are other methods, such as software timers.)
> Having said that, they did have a chip where one of the
> processors was an ARM. Perhaps it was intended that the
> ARM run FreeRTOS?
I haven't seen such a chip. Do you have a link? It could be an interesting device.
>
>
>> If the XMOS devices and software had a way of neatly multi-tasking
>> /within/ a single virtual core, while keeping the same kind of
>> inter-task communication and other benefits, then they would have
>> something I could see being very nice.
>
> There is a half-way house.
>
> If you adopt a certain coding style, the IDE will combine
> several processes to run on a single processor.
>
> Basically it is equivalent to appending all the processes'
> "startup" code into a single block, and all the "forever
> loop" code into a single block. The key bit is combining
> all the processes' select statements into a single select
> statement.
>
> With that understanding, the coding style requirements
> become obvious, not onerous, and they are checked by the
> compiler.
I understand the principle, but you'd lose some of the modularity here. How well does it work if you want to have your UART, your CAN, your PWM, etc., defined in different files - and then you want to put them on the same core? I guess it will be possible.
>
>
>>>> There is a lot to like about XMOS devices and tools, but they still
>>>> strike me as a solution in search of a problem. An elegant
>>>> solution, perhaps, but still missing a problem. We used them for a
>>>> project many years ago for a USB Audio Class 2 device. There simply
>>>> were no realistic alternatives at the time, but I can't say the XMOS
>>>> solution was a good one. The device has far too little memory to
>>>> make sensible buffers (this still applies to XMOS devices, last I
>>>> looked), and the software at the time was painful (this I believe
>>>> has improved significantly). If we were making a new version of the
>>>> product, we'd drop the XMOS device in an instant and use an
>>>> off-the-shelf chip instead.
>>>
>>> I certainly wouldn't want to comment on your use case.
>>
>> As I said, it was a while ago, when XMOS were relatively new - I
>> assume the software, libraries and examples are better now than at
>> that time. But for applications like ours, you can just get a CMedia
>> chip and wire it up - no matter how good XMOS tools have become, they
>> don't beat that.
>>
>> (And then all the development budget can be spent on trying to get
>> drivers to work on idiotic Windows systems...)
>
> What is this "Windows" of which you speak?
>
Something some customers have. It is a system designed to be as inconvenient for developers as humanly possible.
>
>>> To me a large part of the attraction is that you can
>>> /predict/ the /worst/ case latency and jitter (and hence
>>> throughput), in a way that is difficult in a standard MCU
>>> and easy in an FPGA.
>>
>> For standard MCU's, you aim to do this by using hardware peripherals
>> (timers, PWM blocks, communication controllers, etc.) for the most
>> timing-critical stuff. Then you don't need it in the software.
>
> Yebbut, the toolset won't analyse and predict worst case
> performance. So you are back to "run it and hope we stumble
> upon the worst case".
>
The peripherals are independent, and specified in documentation. If the PWM timer block can do 16-bit precision at 120 MHz, then you know its limits - and it doesn't matter how many UARTs you use or how fast you want your SPI bus to run. You don't need to analyse the timings - that was done when the chip was designed. You need to check the speeds of your high-level software that uses the modules, but that applies to XMOS code too.
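[Editor's note: to make the "you know its limits" claim concrete, the
datasheet figures translate directly into guarantees. A sketch of the
arithmetic for the 16-bit/120 MHz example:]

    #include <cstdio>

    constexpr double clk_hz = 120e6;  // PWM block clock from the datasheet

    int main() {
        // slowest PWM frequency at full 16-bit resolution:
        std::printf("min freq: %.0f Hz\n", clk_hz / 65536);       // ~1831 Hz
        // duty-cycle steps available at a 20 kHz PWM frequency:
        std::printf("steps at 20 kHz: %.0f\n", clk_hz / 20e3);    // 6000
    }

No amount of UART or SPI traffic elsewhere on the chip changes these
numbers, which is the point being made above.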
> Yes, that is sufficient in many cases, but it is /inelegant/,
> dammit!
>
There is always room for improvement and extra tools!
>
>>> To that extent it allows FPGA-like performance with "traditional"
>>> software development tools and methodologies. Plus a little
>>> bit of "thinking parallel" that everybody will soon /have/ to
>>> be doing :)
>>
>> It's a nice idea, and I'm sure XMOS has some good use-cases. But I
>> can't help feeling they have something that is /almost/ a good system
>> - with a bit more, they could be very much more useful.
>
> There's no doubt it is niche.
>
> The world will go parallel. A major difficulty will be finding
> people that can think that way. (Just look at how difficult
> softies find it when they try to "program" VHDL/Verilog)
>
> We need all the tools and concepts we can muster; my fear
> is that CSP is the best! :)
I'd like a good way to do CSP stuff in C or C++. I've been looking at passing std::variant types in message queues in FreeRTOS, but I'm not happy with the results yet. I think I'll have to make my own alternative to std::variant, to make it more efficient for the task. It's fun. A different kind of fun from playing with the XMOS, but fun. If I ever get time, I will dig out my old XMOS kit and see how the current tools are looking. If I can find a neat way to get your "half-way house" system working, it will increase a good deal in its attractiveness for me.
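[Editor's note: a host-side sketch of the pattern David describes - a
CSP-style channel whose messages are a closed set of types in a
std::variant, with std::visit playing the role of a select over message
kinds. On FreeRTOS the queue would be a fixed-size xQueue; std::mutex
and std::condition_variable stand in for it here. The naive version
also shows the inefficiency he mentions: every slot is padded to the
size of the largest alternative.]

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <variant>

    using Msg = std::variant<int, float>;  // the closed set of messages

    class Channel {
        std::queue<Msg> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void send(Msg v) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
            cv_.notify_one();
        }
        Msg receive() {  // blocks until a message arrives
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            Msg v = std::move(q_.front());
            q_.pop();
            return v;
        }
    };

    int main() {
        Channel ch;
        std::thread producer([&] { ch.send(42); ch.send(3.14f); });
        for (int i = 0; i < 2; ++i) {
            // std::visit dispatches on whichever alternative arrived.
            std::visit([](auto v) { std::printf("got %g\n", double(v)); },
                       ch.receive());
        }
        producer.join();
    }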
On 17/04/2020 19:48, Rick C wrote:
> On Friday, April 17, 2020 at 12:15:37 PM UTC-4, David Brown wrote:
>> On 17/04/2020 16:23, Tom Gardner wrote:
>>> On 17/04/20 14:44, David Brown wrote:
>>>> On 17/04/2020 11:49, Tom Gardner wrote:
>>>>> On 17/04/20 09:02, David Brown wrote:
>>
>>>>> As you say, the XMOS /ecosystem/ is far more compelling,
>>>>> partly because it has excellent /integration/ between the
>>>>> hardware, the software and the toolchain. The latter two are
>>>>> usually missing.
>>>>
>>>> Agreed. And the XMOS folk have learned and improved. With the
>>>> first chips, they proudly showed off that you could make a 100
>>>> MBit Ethernet controller in software on an XMOS chip. Then it
>>>> was pointed out to them that - impressive achievement though it
>>>> was - it was basically useless because you didn't have the
>>>> resources left to use it for much, and hardware Ethernet
>>>> controllers were much cheaper. So they brought out new XMOS
>>>> chips with hardware Ethernet controllers. The same thing
>>>> happened with USB.
>>>
>>> It looks like a USB controller needs ~8 cores, which isn't a
>>> problem on a 16 core device :)
>>>
>>
>> I've had another look, and I was mistaken - these devices only have
>> the USB and Ethernet PHYs, not the MACs, and thus require a lot of
>> processor power, pins, memory and other resources. It doesn't need
>> 8 cores, but the whole thing just seems so inefficient. No one is
>> going to spend the extra cost for an XMOS with a USB PHY, so why
>> not put a hardware USB controller on the chip? The silicon costs
>> would surely be minor, and it would save a lot of development
>> effort and release resources that are useful for other tasks. The
>> same goes for Ethernet. Just because you /can/ make these things
>> in software on the XMOS devices, does not make it a good idea.
>>
>> Overall, the thing that bugs me about XMOS is that you can write
>> very simple, elegant tasks for the cores to do various tasks. But
>> when you do that, you run out of cores almost immediately. So you
>> have to write your code in a way that implements your own
>> scheduler, losing a major part of the point of the whole system.
>> Or you use the XMOS FreeRTOS port on one of the virtual cores - in
>> which case you could just switch to a Cortex-M microcontroller with
>> hardware USB, Ethernet, PWM, UART, etc. and a fraction of the
>> price.
>
> Too bad the XMOS doesn't have more CPUs, like maybe 144 of them?
>
144 CPUs is far more than would be useful in practice - as long as you have cores that can do useful work in a flexible way (like XMOS cores). When you have very limited cores that can barely do anything themselves, you need lots of them. (Arguably that's what you have on an FPGA - lots of tiny bits that do very little on their own. But the key difference is the tools - FPGAs would be a lot less popular if you had to code each LUT individually, do placement manually by numbering them, and write a routing file by hand.)
>
>> If the XMOS devices and software had a way of neatly multi-tasking
>> /within/ a single virtual core, while keeping the same kind of
>> inter-task communication and other benefits, then they would have
>> something I could see being very nice.
>
> Is a "virtual core" one CPU? Multitasking a single CPU is the thing
> the XMOS is supposed to eliminate, no? Why bring it back? Oh,
> because there aren't enough CPUs on the XMOS for some applications!
> So it's back to the fast ARM processors and multitasking.
>
When you are writing multi-tasking code, you often want a lot of tasks. More than 8. (Sometimes, just to be more awkward, you want them to be created dynamically.) Any specific limit is an inconvenient limit. There are always ways to get round this - like using an RTOS on one virtual core of the XMOS. But you lose some of the symmetry and convenience that way. (I understand why the XMOS is designed the way it is - any system is going to be a compromise between what the hardware designers can do practically, and what the software designers want.)
> I seem to recall there being asymmetric multicores from various ARM
> makers with one fast CPU for multitasking and a smaller CPU for
> handling the lesser real time tasks without interference.
>
Yes - you have a Cortex-A CPU that can handle high throughput, perhaps with several cores, combined with a Cortex-M device that is more deterministic and can give more real-time control.
> That's a good combination, but again, a more specific target market.
> It seems that is how the CPU market has gone. The volumes are so
> high there are many niche areas justifying their own type of SoC to
> address it.
>
These are becoming increasingly common - they are no longer niche. If you have a system that needs the processing power of a bigger CPU (for screens, image handling, embedded Linux, the convenience and low development costs of high-level languages and off-the-shelf libraries, etc.), then having a small CPU for handling ADCs, timers, PWM, UARTs, power management, keys, and that kind of thing is a big win. Even combinations of fast M7 or M4 cores with an M0 core are common, especially if you have a specific task for the small core (like running a Bluetooth stack for an embedded wireless device).
>
>>>> There is a lot to like about XMOS devices and tools, but they
>>>> still strike me as a solution in search of a problem. An
>>>> elegant solution, perhaps, but still missing a problem. We
>>>> used them for a project many years ago for a USB Audio Class 2
>>>> device. There simply were no realistic alternatives at the
>>>> time, but I can't say the XMOS solution was a good one. The
>>>> device has far too little memory to make sensible buffers (this
>>>> still applies to XMOS devices, last I looked), and the software
>>>> at the time was painful (this I believe has improved
>>>> significantly). If we were making a new version of the
>>>> product, we'd drop the XMOS device in an instant and use an
>>>> off-the-shelf chip instead.
>>>
>>> I certainly wouldn't want to comment on your use case.
>>
>> As I said, it was a while ago, when XMOS were relatively new - I
>> assume the software, libraries and examples are better now than at
>> that time. But for applications like ours, you can just get a
>> CMedia chip and wire it up - no matter how good XMOS tools have
>> become, they don't beat that.
>>
>> (And then all the development budget can be spent on trying to get
>> drivers to work on idiotic Windows systems...)
>>
>>>
>>> To me a large part of the attraction is that you can /predict/
>>> the /worst/ case latency and jitter (and hence throughput), in a
>>> way that is difficult in a standard MCU and easy in an FPGA.
>>
>> For standard MCU's, you aim to do this by using hardware
>> peripherals (timers, PWM blocks, communication controllers, etc.)
>> for the most timing-critical stuff. Then you don't need it in the
>> software.
>
> And all that hardware costs chip space which you may or may not use.
> That's why they have so many flavors, to give the "perfect"
> combination of memory, peripherals and analog to minimize cost for
> each project.
The silicon cost of basic peripherals like timers and UARTs is tiny - generally close to irrelevant. Ethernet is a bit more costly, and some kinds of peripherals have additional costs such as royalties (this used to be the case for CAN controllers until the patents ran out). Analogue parts can be the most costly in silicon space, especially if they need calibrating in some way. Much of the cost of peripherals is in the IO blocks and drivers, and the multiplexing and signal routing to support them. Memory blocks - while simple - usually take up a much bigger part of the die area.
>
> FPGAs have a cost overhead which is fading into the background as
> they become more and more efficient. For many designs an FPGA
> provides a good trade off between cost and flexibility. In many
> cases it also provides a functionality that can't be duplicated
> elsewhere.
Sure, FPGAs have their uses - including areas where they are the only sensible solution, and areas of overlap where either microcontrollers or FPGAs could do the job.

When looking at the cost of making these choices, there are three main parts: the development costs, the production costs, and the lifetime costs. How you balance these will depend on the type of product, the quantities you make, the use of the product, and its expected lifetime. So no single answer is ever going to be "right".

But one thing is very clear - for developers and companies that have done a lot of FPGA development, the costs of developing a new FPGA-based device will be far smaller than for a company that has not done such systems before. Don't assume that because FPGA design is cheap for /you/ to do, it is necessarily cheap for others. The opposite is true as well - there are plenty of boards made where an FPGA (or other programmable logic) would make things simpler and cheaper, but is not seriously considered because programmable logic is often viewed as expensive and difficult.

About the only thing you can be sure of in embedded development is that there are many possible answers. And for any serious project, by the time you have a finished product there will be new devices and new answers that could have made the whole thing cheaper!
>
>
>>> To that extent it allows FPGA-like performance with
>>> "traditional" software development tools and methodologies. Plus
>>> a little bit of "thinking parallel" that everybody will soon
>>> /have/ to be doing :)
>>
>> It's a nice idea, and I'm sure XMOS has some good use-cases. But
>> I can't help feeling they have something that is /almost/ a good
>> system - with a bit more, they could be very much more useful.
>
> It's good, it just has its own niche of applications where it is the
> best solution. Nothing wrong with that!
>
Indeed. I think what bugs me most is that those niches haven't yet turned up in the projects my customers are asking for!