EmbeddedRelated.com

Custom CPU Designs

Started by Rick C April 16, 2020
On 18/04/2020 17:08, Tom Gardner wrote:
> On 18/04/20 14:06, David Brown wrote:
>> I'd like a good way to do CSP stuff in C or C++.  I've been looking at
>> passing std::variant types in message queues in FreeRTOS, but I'm not
>> happy with the results yet.  I think I'll have to make my own
>> alternative to std::variant, to make it more efficient for the task.
>
> It always amazed me that Boehm had to write his paper
> indicating that you couldn't write threads as a library.
It always seemed obvious to me that you can't write thread support purely in standard C (C11 provides threading functions, but these can't be implemented in pure standard C).
> I presume that is no longer the case.
It is still the case. With C11 (or C++11) you can write code that /uses/ threads in pure C (or C++), but you can't write the implementation of the threads.
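To illustrate the difference, here is a minimal sketch in plain standard C++ (C++17 for std::variant, and purely hypothetical message types rather than the FreeRTOS code I mentioned) of using threads and variant messages in a CSP-ish way. Everything here merely /uses/ the standard library; it is the implementation of std::thread and std::mutex underneath that needs non-portable support.

// Minimal sketch: CSP-style message passing with std::variant, using only
// the standard library.  Hypothetical illustration - not the FreeRTOS-based
// code discussed above.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <variant>

struct Temperature { double celsius; };
struct ButtonPress { int id; };
struct Shutdown    {};

using Message = std::variant<Temperature, ButtonPress, Shutdown>;

// A simple blocking channel.  Using it is pure standard C++; writing the
// thread/mutex primitives themselves is what cannot be done portably.
class Channel {
public:
    void send(Message m) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(m));
        cv_.notify_one();
    }
    Message receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        Message m = std::move(queue_.front());
        queue_.pop();
        return m;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Message> queue_;
};

int main() {
    Channel ch;

    std::thread consumer([&] {
        for (;;) {
            Message m = ch.receive();
            if (std::holds_alternative<Shutdown>(m))
                break;
            std::visit([](auto&& msg) {
                using T = std::decay_t<decltype(msg)>;
                if constexpr (std::is_same_v<T, Temperature>)
                    std::printf("temperature: %.1f\n", msg.celsius);
                else if constexpr (std::is_same_v<T, ButtonPress>)
                    std::printf("button: %d\n", msg.id);
            }, m);
        }
    });

    ch.send(Temperature{21.5});
    ch.send(ButtonPress{3});
    ch.send(Shutdown{});
    consumer.join();
    return 0;
}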
>
>
>> If I ever get time, I will dig out my old XMOS kit and see how the
>> current tools are looking.  If I can find a neat way to get your
>> "half-way house" system working, it will increase a good deal in its
>> attractiveness for me.
>
> From my copy of The Programming Guide version F (there
> may be a later variant).
>
> That guide has the usual succinct and relevant examples
> and pictures illustrating what is happening. If only
> all documentation was as short and sweet.
>
> *2.3 Creating tasks for flexible placement*
>
<snip details>

Thank you for that.  This is a significant new feature since I used XMOS, and one that I see as a major step forward.

Your posts in this thread have renewed my interest in XMOS.  If XMOS run a pyramid scheme for evangelism, you have definitely earned yourself a few more brownie points.

Now all I need to do is find a customer with an application that fits...

(If there were an EtherCAT slave peripheral available for XMOS, I might have had an application.)
On 18/04/2020 17:51, Theo wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> The real power comes from when you want to do something that is /not/
>> standard, or at least not common.
>>
>> Implementing a standard UART in a couple of XMOS cores is a pointless
>> waste of silicon. Implementing a UART that uses Manchester encoding for
>> the UART signals so that you can use it on a balanced line without
>> keeping track of which line is which - /then/ you've got something that
>> can be done just as easily on an XMOS and is a big pain to do on a
>> standard microcontroller.
>>
>> Implementing an Ethernet MAC on an XMOS is pointless. Implementing an
>> EtherCAT slave is not going to be much harder for the XMOS than a normal
>> Ethernet MAC, but is impossible on any microcontroller without
>> specialised peripherals.
>
> The Cypress PSoC has an interesting take on this. You can specify (with the
> GUI) that you want a component. If you specify a simple component (let's
> say I2C slave) there's a hard IP for that. But if you specify something
> that's more complicated (say I2C master and slave on the same pins) it
> builds it with the existing IP plus some of its FPGA-like logic. Takes more
> resources but allows you to do many more things than they put in as hard
> cores.
>
> Unfortunately they don't provide a vast quantity of cells for that logic, so
> it's fine if you want to add just a few unusual bits to the regular
> microcontroller, but not a big system. (PSoC also has the programmable
> analogue stuff, which presumably constrains the process they can use)
>
My impression with PSoC's has always been that they have too few digital blocks. It's been a good many years since I looked at them (I try to keep up with different architectures beyond those that I use, but there are so many!), and no doubt there are bigger devices now. But at the time I tried to figure out if they could work for a couple of boards I was working on, and they would have needed several times as many digital blocks to work.
> It would be quite interesting to combine that with the XMOS approach - more > fluid boundaries between hardware and software.
XMOS does have some things in hardware - for the IO ports, they have shift registers with parallel/serial conversion and buffering, and a whole lot of timers. (Tom's posts answer the rest of your points.)
On 20/04/20 14:13, David Brown wrote:
> On 18/04/2020 17:08, Tom Gardner wrote:
>>> If I ever get time, I will dig out my old XMOS kit and see how the current
>>> tools are looking.  If I can find a neat way to get your "half-way house"
>>> system working, it will increase a good deal in its attractiveness for me.
>>
>> From my copy of The Programming Guide version F (there
>> may be a later variant).
>>
>> That guide has the usual succinct and relevant examples
>> and pictures illustrating what is happening. If only
>> all documentation was as short and sweet.
>>
>> *2.3 Creating tasks for flexible placement*
>>
>
> <snip details>
>
> Thank you for that.  This is a significant new feature since I used XMOS, and
> one that I see as a major step forward.
Arguably it is merely syntactic sugar, but clarity isn't to be underestimated.
> Your posts in this thread have renewed my interest in XMOS.  If XMOS run a
> pyramid scheme for evangelism, you have definitely earned yourself a few more
> brownie points.
Regrettably not; I'm merely a fanboy :)
> Now all I need to do is find a customer with an application that fits...
>
> (If there were an EtherCAT slave peripheral available for XMOS, I might have had
> an application.)
Yes, and I'm sure there is a reason you couldn't fall back on the old ways, and use an external peripheral.
On 19/04/2020 12:33, Tom Gardner wrote:
> On 19/04/20 11:19, upsidedown@downunder.com wrote:
>> On Sun, 19 Apr 2020 10:11:18 +0100, Tom Gardner
>> <spamjunk@blueyonder.co.uk> wrote:
>>
>>> On 18/04/20 22:53, Rick C wrote:
>>>> On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote:
>>
>> <clip>
>>
>>>>> For a short intro to the hardware, software, and concepts see the
>>>>> architecture flyer.
>>>>> https://www.xmos.com/file/xcore-architecture-flyer/
>>>>>
>>>>> FFI see the XMOS programming guide, which is beautifully written,
>>>>> succinct, and clear.
>>>>> https://www.xmos.com/file/xmos-programming-guide/
>>>>>
>>>>> I wish all documentation was as good!
>>
>> This information seems to be a few years old.
>
> In one sense that is good: it means there have been no
> fundamental changes recently. Stability (of such things)
> is good :)
>
> Detailed implementations, OTOH, should change regularly.
>
>
>>>> The tricky part is dealing with the real time requirements of
>>>> handling the I/O events.  Which task must be done first, which tasks
>>>> can be interrupted to deal with other tasks, what resources are
>>>> required to process an event, etc.  This is what makes real time
>>>> multitasking difficult.
>>>
>>> Those problems mostly disappear with sufficient
>>> independent cores.
>>
>> Unfortunately, the xCore tile has only 8 logical cores.
>>
>> It seems hard to find out from XMOS web pages, how many tiles are
>> integrated on recent chips.
>
> They do seem to be less "open" than before, which I don't like.
> However, although I haven't checked recently, I presume a free
> registration continues to be all that is needed.
>
>
>> If there are only 1 or 2 tiles, there
>> would be only 8-16 logical cores on a chip.
>
> 4 tiles.
There are a maximum of 8 logical cores on a tile. There are XMOS devices with different numbers of cores and tiles. A 16-core device will have 2 tiles, a 32-core device will have 4 tiles. AFAIUI, anyway.
> I /believe/ multiple chips can be interconnected
> so as to transparently extend the internal switch matrix,
> albeit with an increased latency. NUMA anyone :)
Yes, that has always been part of the idea.  And the interconnections can use 1, 2 or 4 wires IIRC.  They work exactly the same between virtual cores, between tiles, between chips - but perhaps with different throughputs and latencies.  (When I worked with them, the latency for transfers between virtual cores on the same tile was very much worse than what you could get with simple sharing of memory - but XC would not let you share the memory.  So you had a messy system with inline assembly mixed with XC in order to get fast buffers.  I presume this has improved since then.)
>
>
> See what is stocked by DigiKey...
>
> https://www.digikey.co.uk/products/en/integrated-circuits-ics/embedded-microcontrollers/685?k=xmos&k=&pkeyword=xmos&sv=0&sf=0&FV=-8%7C685&quantity=&ColumnSort=-143&page=1&stock=1&nstock=1&pageSize=25
>
> Example:
>
> XE232-1024-FB374 Features
> Multicore Microcontroller with Advanced Multi-Core RISC Architecture
> • 32 real-time logical cores on 4 xCORE tiles
> • Cores share up to 2000 MIPS
>   - Up to 4000 MIPS in dual issue mode
> • Each logical core has:
>   - Guaranteed throughput of between 1/5 and 1/8 of tile MIPS
>   - 16x32bit dedicated registers
> • 167 high-density 16/32-bit instructions
>   - All have single clock-cycle execution (except for divide)
>   - 32x32→64-bit MAC instructions for DSP, arithmetic and user-definable
>     cryptographic functions
On 20/04/20 14:20, David Brown wrote:
> On 18/04/2020 17:51, Theo wrote:
>> David Brown <david.brown@hesbynett.no> wrote:
>>> The real power comes from when you want to do something that is /not/
>>> standard, or at least not common.
>>>
>>> Implementing a standard UART in a couple of XMOS cores is a pointless
>>> waste of silicon.  Implementing a UART that uses Manchester encoding for
>>> the UART signals so that you can use it on a balanced line without
>>> keeping track of which line is which - /then/ you've got something that
>>> can be done just as easily on an XMOS and is a big pain to do on a
>>> standard microcontroller.
>>>
>>> Implementing an Ethernet MAC on an XMOS is pointless.  Implementing an
>>> EtherCAT slave is not going to be much harder for the XMOS than a normal
>>> Ethernet MAC, but is impossible on any microcontroller without
>>> specialised peripherals.
>>
>> The Cypress PSoC has an interesting take on this.  You can specify (with the
>> GUI) that you want a component.  If you specify a simple component (let's
>> say I2C slave) there's a hard IP for that.  But if you specify something
>> that's more complicated (say I2C master and slave on the same pins) it
>> builds it with the existing IP plus some of its FPGA-like logic.  Takes more
>> resources but allows you to do many more things than they put in as hard
>> cores.
>>
>> Unfortunately they don't provide a vast quantity of cells for that logic, so
>> it's fine if you want to add just a few unusual bits to the regular
>> microcontroller, but not a big system.  (PSoC also has the programmable
>> analogue stuff, which presumably constrains the process they can use)
>>
>
> My impression with PSoC's has always been that they have too few digital
> blocks.  It's been a good many years since I looked at them (I try to keep up
> with different architectures beyond those that I use, but there are so many!),
> and no doubt there are bigger devices now.  But at the time I tried to figure
> out if they could work for a couple of boards I was working on, and they would
> have needed several times as many digital blocks to work.
>
>> It would be quite interesting to combine that with the XMOS approach - more
>> fluid boundaries between hardware and software.
>
> XMOS does have some things in hardware - for the IO ports, they have shift
> registers with parallel/serial conversion and buffering, and a whole lot of
> timers.
>
> (Tom's posts answer the rest of your points.)
FFI about ports, see
https://www.xmos.com/download/Introduction-to-XS1-ports(1.0).pdf

Overview...

Unbuffered ports:
 - standard clocked: up to 250MHz, independent rate from processor clocks
 - timed: occurs at a specific counter count (each port has its own counter)
 - timestamped: records when i/o actually occurred
 - conditional input: ignore unless == or != a value
 - 1/4/8/16/32 bits wide

Buffered ports:
 - as above plus single level of buffering
 - SERDES

Strobed ports:
 - master/slave/bidirectional

The i/o setup is also pleasingly simple and orthogonal, effectively
declarative. No mucking about with obscure registers :)
On 18/04/2020 21:38, Rick C wrote:
> On Saturday, April 18, 2020 at 9:06:57 AM UTC-4, David Brown wrote:
>> On 17/04/2020 18:49, Tom Gardner wrote:
>>> On 17/04/20 17:15, David Brown wrote:
>>>> On 17/04/2020 16:23, Tom Gardner wrote:
>>>>> On 17/04/20 14:44, David Brown wrote:
>>>>>> On 17/04/2020 11:49, Tom Gardner wrote:
>>>>>>> On 17/04/20 09:02, David Brown wrote:
>>>>
>>>>>>> As you say, the XMOS /ecosystem/ is far more compelling,
>>>>>>> partly because it has excellent /integration/ between
>>>>>>> the hardware, the software and the toolchain. The latter
>>>>>>> two are usually missing.
>>>>>>
>>>>>> Agreed. And the XMOS folk have learned and improved. With
>>>>>> the first chips, they proudly showed off that you could
>>>>>> make a 100 MBit Ethernet controller in software on an XMOS
>>>>>> chip. Then it was pointed out to them that - impressive
>>>>>> achievement though it was - it was basically useless
>>>>>> because you didn't have the resources left to use it for
>>>>>> much, and hardware Ethernet controllers were much cheaper.
>>>>>> So they brought out new XMOS chips with hardware Ethernet
>>>>>> controllers. The same thing happened with USB.
>>>>>
>>>>> It looks like a USB controller needs ~8 cores, which isn't a
>>>>> problem on a 16 core device :)
>>>>>
>>>>
>>>> I've had another look, and I was mistaken - these devices only
>>>> have the USB and Ethernet PHYs, not the MACs, and thus require
>>>> a lot of processor power, pins, memory and other resources. It
>>>> doesn't need 8 cores, but the whole thing just seems so
>>>> inefficient. No one is going to spend the extra cost for an
>>>> XMOS with a USB PHY, so why not put a hardware USB controller
>>>> on the chip? The silicon costs would surely be minor, and it
>>>> would save a lot of development effort and release resources
>>>> that are useful for other tasks. The same goes for Ethernet.
>>>> Just because you /can/ make these things in software on the
>>>> XMOS devices, does not make it a good idea.
>>>
>>> Oh I agree! However, being able to do it in software is a good
>>> demonstration of the device's unique characteristics, and that
>>> "you aren't in Kansas anymore"
>>>
>>
>> Indeed.
>>
>> The real power comes from when you want to do something that is
>> /not/ standard, or at least not common.
>>
>> Implementing a standard UART in a couple of XMOS cores is a
>> pointless waste of silicon. Implementing a UART that uses
>> Manchester encoding for the UART signals so that you can use it on
>> a balanced line without keeping track of which line is which -
>> /then/ you've got something that can be done just as easily on an
>> XMOS and is a big pain to do on a standard microcontroller.
>
> I take your point even if I don't agree with your example.
>
> I don't follow your comment about "which line is which". Manchester
> encoding uses the polarity of transitions for the data being
> transmitted. To set up for the correct polarity data transition
> there are extra transitions which should be ignored. To properly
> decode the data requires having the correct polarity of the signal at
> the input. Swap lines and you get wrong data.
Sorry, it is /differential/ Manchester encoding that is polarity independent. But as you say, it is the principle of the point that matters, rather than the details.
>
> It's not hard to receive Manchester encoded data on a typical MCU.
> They virtually all have transition based interrupts on I/O pins.
> Enable the interrupts and on each transition grab a timer value.  It
> would even be easy to have it auto-baud rate detect as the values
> should have two primary modes.  Why do you say this is a hard thing
> to do???
>
Of course it can all be done in software on an MCU, using timers and interrupts.  But the more timing-critical things you do in pure software, the harder it gets to keep everything within specification in the worst case.  The way the XMOS is built up, with its IO ports connected to timers and buffers, it's easy to make such features with extremely low jitter - even if you have many of them running at high speed.  And more importantly, the software tools are designed so that they can check that your code stays within its timing specifications for this kind of thing.
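To make the edge-timestamp approach concrete, here is a minimal sketch - plain, hypothetical C++ rather than MCU-specific interrupt code - of decoding /differential/ Manchester from captured transition times, assuming a known nominal bit period and the common convention that a mid-bit transition encodes a 0 and its absence a 1.  Only the spacing of transitions matters, which is why the scheme is polarity independent.

// Minimal sketch: decoding differential Manchester from edge timestamps.
// Hypothetical illustration - the timestamps would come from pin-change
// interrupts capturing a free-running timer, as described above.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Decode edge timestamps (timer ticks) into bits.  'bit_period' is the
// nominal bit time in ticks.  Convention assumed here: every bit starts
// with a transition; a second transition in mid-bit means 0, none means 1.
std::vector<int> decode_diff_manchester(const std::vector<uint32_t>& edges,
                                        uint32_t bit_period)
{
    std::vector<int> bits;
    const uint32_t three_quarters = (3 * bit_period) / 4;

    std::size_t i = 0;                       // edges[i] is a bit-boundary edge
    while (i + 1 < edges.size()) {
        uint32_t gap = edges[i + 1] - edges[i];
        if (gap < three_quarters) {
            bits.push_back(0);               // mid-bit transition: a 0;
            i += 2;                          // next boundary is two edges away
        } else {
            bits.push_back(1);               // no mid-bit transition: a 1;
            i += 1;                          // next edge is the boundary
        }
    }
    return bits;
}

int main() {
    // Edges for the bit pattern 0,1,0 with a bit period of 100 ticks
    // (boundary at 0, mid-bit at 50, boundaries at 100, 200, mid-bit at 250).
    std::vector<uint32_t> edges = {0, 50, 100, 200, 250, 300};
    for (int b : decode_diff_manchester(edges, 100))
        std::printf("%d", b);
    std::printf("\n");
    return 0;
}

A real decoder also needs idle/timeout handling and tolerance windows for jitter, which is exactly where hardware support (or the XMOS timed ports) earns its keep.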
>
>> Implementing an Ethernet MAC on an XMOS is pointless. Implementing
>> an EtherCAT slave is not going to be much harder for the XMOS than
>> a normal Ethernet MAC, but is impossible on any microcontroller
>> without specialised peripherals.
>
> That is exactly the point of the MCU market, the huge variety of
> combinations of hard peripherals available these days. Nearly every
> maker of MCUs will have an almost perfect match for your needs.
I need an MCU with 4 EtherCAT slave channels. There are exactly 0 on the market. There are only two or three in total - from all manufacturers together - with even /one/ EtherCAT slave. If there were an XMOS EtherCAT slave peripheral, perhaps I could have used one of their devices. (There are FPGA EtherCAT cores available, but the price of the cores, the price of the required FPGAs, and other complications rule that out.)
> Peripherals may use some real estate on a chip, but the lion's share
> is simply the memory. These days MCU chips are more memory chips
> with built in CPUs and peripherals than CPU chips.
I agree entirely, for "standard" peripherals.  But some peripherals are too unusual to turn up on MCUs.  (There is, of course, no clear definition of what a "standard peripheral" might be.)
>
> So your point of adding an external Ethernet MAC to MCUs is a red
> herring.
>
> I just noticed you said "EtherCAT" which I was not aware of. Seems
> this is some other protocol than Ethernet.
Ah, that explains things. Roughly speaking, an EtherCAT slave takes Ethernet packets as they come in, reads and writes to them as they go past, and sends the packets on towards the next slave on the bus with the least possible delay (they do not wait for the packet to be fully received). The master side can run on a normal Ethernet MAC, but the slave side requires special hardware.
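As a rough illustration of that on-the-fly behaviour - purely hypothetical code, not the real EtherCAT frame format or any vendor's API - think of a slave as a stream processor that forwards every byte immediately while reading and overwriting a fixed window of the frame as it passes:

// Minimal sketch of on-the-fly frame processing, EtherCAT-slave style:
// each byte is forwarded as soon as it arrives, and a configured window
// of the frame is read (inputs) and overwritten (outputs) as it passes.
// Hypothetical illustration only - real EtherCAT datagrams, addressing
// and working-counter handling are more involved.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

struct StreamSlave {
    std::size_t window_offset;            // where "our" data sits in the frame
    std::vector<uint8_t> output_data;     // data we insert into the frame
    std::vector<uint8_t> captured_input;  // data we read out of the frame

    // Called once per received byte; 'forward' sends the byte onwards, so
    // the added latency is a fraction of a byte time, not a frame time.
    void on_byte(std::size_t index, uint8_t byte,
                 const std::function<void(uint8_t)>& forward)
    {
        if (index >= window_offset &&
            index < window_offset + output_data.size()) {
            captured_input.push_back(byte);             // read as it goes past
            byte = output_data[index - window_offset];  // write as it goes past
        }
        forward(byte);                                  // pass it on immediately
    }
};

int main() {
    StreamSlave slave{4, {0xAA, 0xBB}, {}};
    std::vector<uint8_t> wire_out;
    auto forward = [&](uint8_t b) { wire_out.push_back(b); };

    const std::vector<uint8_t> frame = {1, 2, 3, 4, 5, 6, 7, 8};
    for (std::size_t i = 0; i < frame.size(); ++i)
        slave.on_byte(i, frame[i], forward);   // byte-by-byte, forwarded at once

    // wire_out now carries {1,2,3,4,0xAA,0xBB,7,8}; captured_input holds {5,6}.
    std::printf("forwarded %zu bytes, captured %zu input bytes\n",
                wire_out.size(), slave.captured_input.size());
    return 0;
}

A real slave also has to patch the frame checksum and working counter as the bytes stream through, which is a large part of why dedicated hardware is needed to keep the per-node forwarding delay so small.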
> Ok, maybe that will be
> just as easy on either type of MCU since it is a relatively niche
> application. Once the market grows you can be assured MCUs will be
> commonly available with EtherCAT in hard peripherals.
>
EtherCAT has been around for a long time - it is not new.  But it is quite niche.

<snip>
>>
>> You need to check the speeds of your high-level software that uses
>> the modules, but that applies to XMOS code too.
>
> This is an area where FPGAs excel. Timing is relatively easy to
> verify and the details of coordinating the various processes is
> trivial.
>
Yes, especially for the fine details.  But it gets harder again if operations can take a varying number of clock cycles.  You do have the big advantage with FPGAs that the timing of different parts is usually independent, unlike on MCUs.
> It's an area where MCUs can be very difficult to analyze.
>
Absolutely. You could say that XMOS devices and tools are a middle ground here.
On 20/04/20 14:26, David Brown wrote:
> (When I worked with them, the latency for transfers between virtual cores on the
> same tile was very much worse than what you could get with simple sharing of
> memory - but XC would not let you share the memory.  So you had a messy system
> with inline assembly mixed with XC in order to get fast buffers.  I presume this
> has improved since then.)
I believe so, but I haven't investigated in detail.  The programming guide / tutorial has a double buffering example where two tasks swap pointers to in-memory buffers.
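The idea, sketched below in plain C++ as a hypothetical stand-in (not the XC code from the guide): two threads meet at a rendezvous and exchange buffer pointers, so one fills a buffer while the other processes the previously filled one, and no data is ever copied.

// Minimal sketch of double buffering by pointer swapping between two
// threads - a stand-in for the XC example in the XMOS programming guide,
// not a copy of it.  Only pointers are exchanged, never the buffer data.
#include <array>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <thread>

constexpr std::size_t kBufSize = 256;
using Buffer = std::array<int, kBufSize>;

// A rendezvous: each side offers its buffer and receives the other side's.
class SwapPoint {
public:
    Buffer* swap_as_producer(Buffer* full) {
        std::unique_lock<std::mutex> lock(m_);
        from_producer_ = full;
        cv_.notify_all();
        cv_.wait(lock, [this] { return from_consumer_ != nullptr; });
        Buffer* empty = from_consumer_;
        from_consumer_ = nullptr;
        return empty;
    }
    Buffer* swap_as_consumer(Buffer* empty) {
        std::unique_lock<std::mutex> lock(m_);
        from_consumer_ = empty;
        cv_.notify_all();
        cv_.wait(lock, [this] { return from_producer_ != nullptr; });
        Buffer* full = from_producer_;
        from_producer_ = nullptr;
        return full;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    Buffer* from_producer_ = nullptr;
    Buffer* from_consumer_ = nullptr;
};

int main() {
    Buffer a{}, b{};
    SwapPoint swap;
    constexpr int kBlocks = 4;

    std::thread producer([&] {
        Buffer* mine = &a;                      // producer starts owning 'a'
        for (int block = 0; block < kBlocks; ++block) {
            for (std::size_t i = 0; i < kBufSize; ++i)
                (*mine)[i] = block;             // fill my buffer...
            mine = swap.swap_as_producer(mine); // ...then trade it for an empty one
        }
    });

    Buffer* mine = &b;                          // consumer starts owning 'b'
    for (int block = 0; block < kBlocks; ++block) {
        mine = swap.swap_as_consumer(mine);     // get the freshly filled buffer
        long sum = 0;
        for (int v : *mine) sum += v;           // process it while the other fills
        std::printf("block %d sum %ld\n", block, sum);
    }
    producer.join();
    return 0;
}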
On 20/04/2020 15:24, Tom Gardner wrote:
> On 20/04/20 14:13, David Brown wrote:
>>
>> (If there were an EtherCAT slave peripheral available for XMOS, I
>> might have had an application.)
>
> Yes, and I'm sure there is a reason you couldn't fall
> back on the old ways, and use an external peripheral.
There is no reason at all - external peripherals are /exactly/ the solution we have used!
On 18/04/2020 23:21, Rick C wrote:
> On Saturday, April 18, 2020 at 10:57:55 AM UTC-4, David Brown wrote:
>> On 17/04/2020 19:48, Rick C wrote:
>>> On Friday, April 17, 2020 at 12:15:37 PM UTC-4, David Brown
>>> wrote:
>>>> On 17/04/2020 16:23, Tom Gardner wrote:
>>>>> On 17/04/20 14:44, David Brown wrote:
>>>>>> On 17/04/2020 11:49, Tom Gardner wrote:
>>>>>>> On 17/04/20 09:02, David Brown wrote:
>>>>
>>>>>>> As you say, the XMOS /ecosystem/ is far more compelling,
>>>>>>> partly because it has excellent /integration/ between
>>>>>>> the hardware, the software and the toolchain. The latter
>>>>>>> two are usually missing.
>>>>>>
>>>>>> Agreed. And the XMOS folk have learned and improved. With
>>>>>> the first chips, they proudly showed off that you could
>>>>>> make a 100 MBit Ethernet controller in software on an XMOS
>>>>>> chip. Then it was pointed out to them that - impressive
>>>>>> achievement though it was - it was basically useless
>>>>>> because you didn't have the resources left to use it for
>>>>>> much, and hardware Ethernet controllers were much cheaper.
>>>>>> So they brought out new XMOS chips with hardware Ethernet
>>>>>> controllers. The same thing happened with USB.
>>>>>
>>>>> It looks like a USB controller needs ~8 cores, which isn't a
>>>>> problem on a 16 core device :)
>>>>>
>>>>
>>>> I've had another look, and I was mistaken - these devices only
>>>> have the USB and Ethernet PHYs, not the MACs, and thus require
>>>> a lot of processor power, pins, memory and other resources. It
>>>> doesn't need 8 cores, but the whole thing just seems so
>>>> inefficient. No one is going to spend the extra cost for an
>>>> XMOS with a USB PHY, so why not put a hardware USB controller
>>>> on the chip? The silicon costs would surely be minor, and it
>>>> would save a lot of development effort and release resources
>>>> that are useful for other tasks. The same goes for Ethernet.
>>>> Just because you /can/ make these things in software on the
>>>> XMOS devices, does not make it a good idea.
>>>>
>>>> Overall, the thing that bugs me about XMOS is that you can
>>>> write very simple, elegant tasks for the cores to do various
>>>> tasks. But when you do that, you run out of cores almost
>>>> immediately. So you have to write your code in a way that
>>>> implements your own scheduler, losing a major part of the point
>>>> of the whole system. Or you use the XMOS FreeRTOS port on one
>>>> of the virtual cores - in which case you could just switch to a
>>>> Cortex-M microcontroller with hardware USB, Ethernet, PWM,
>>>> UART, etc. and a fraction of the price.
>>>
>>> Too bad the XMOS doesn't have more CPUs, like maybe 144 of them?
>>>
>>
>> 144 cpus is far more than would be useful in practice - as long as
>> you have cores that can do useful work in a flexible way (like XMOS
>> cores). When you have very limited cores that can barely do
>> anything themselves, you need lots of them.
>
> The point of the XMOS concept is to not have to do multitasking on a
> single core. As soon as your task number exceeds the number of cores
> you have to do multitasking on a single core. No one ever complains
> about having too many resources.
Agreed. That is why I have been pleased to hear Tom's description of how you can deal with multitasking within a core on XMOS, for when you have more tasks than cores.
>
> The design process for many small CPUs is not the same as a few
> larger CPUs.
Agreed.
> But the point is it can be simpler, more like designing
> hardware.
Disagreed. Well, sort-of disagreed. /Some/ things can be easier on small and simple cpus - other things are harder. But when a cpu is small enough and limited enough, almost everything gets harder. And no, hardware design is /not/ simpler. It is /different/.
>
> Atmel had an FPGA with a simple logic element equivalent to a 3 input
> LUT and FF (done with a small number of gates rather than the larger
> LUT structure). It had relatively little routing, connecting logic
> elements to adjacent logic elements almost as the only routing. The
> theory was the simplicity of the logic elements meant you could have
> a lot more of them for a given amount of real estate/$$$ so you could
> afford to use them for routing. In practice they never kept up with
> state of the art process technology, so this didn't pan out. But
> that is the concept of the many small cores. You don't have to focus
> on being highly efficient with your CPU resources since you have so
> many.
You equate "having more of resource X than you need" with "it's easy". That simply isn't the case. It doesn't matter how many cpus you have if they can't do what you need - or if it takes a ridiculous effort to make them do what you need. Ten thousand mice weigh less than a horse - they will eat less, provide redundancy, and have a much higher combined strength. Which do you think is better for pulling a cart?
>
>
>> (Arguably that's what you have on an FPGA - lots of tiny bits that
>> do very little on their own. But the key difference is the tools -
>> FPGA's would be a lot less popular if you had to code each LU
>> individually, do placement manually by numbering them, and write a
>> routing file by hand.)
>
> Huh? Why can't any of that be automated in the many CPU chip?
That's a very good question. And if it /were/ automated for the GA144, maybe it would be a much more useful chip.
> Certainly the issue is not so important with only 144 processors, but
> the same size chip would have many thousands of processors in a more
> modern technology. I think the GA144 is 180 nm. Bring that down to
> 15 nm and you have 15,000 nodes on a chip just 5 mm square!!! That
> will require better tools for sure.
Tools are critical. I think the biggest failing of the GA144 is the tools. With a completely different philosophy of toolset, maybe the device would become a realistic and exciting new architecture.
>
> Some of your use of language is interesting. "lots of tiny bits that
> do very little on their own". That sounds like the way people write
> software. They decompose a large, complex design into lots of tiny
> routines that individually do little on their own. How can you
> manage all that complexity??? Yes, I think the Zeno paradox must
> apply so that no large program can ever be finished. That certainly
> happens sometimes.
In normal (serial, or multi-threaded) software, the programming language you use lets you combine the bits. It checks at least some aspects of how they fit together, it lets you use the different bits from where you want, it lets you split them up or combine them, write big bits and small bits. The same applies to FPGA design with Verilog, VHDL, or higher level languages. The GA144 - judging from the application notes and examples - requires you to figure out the tiniest details and split things up manually. It is a level of micromanagement beyond what is needed for assembly programming. It's the tools.
>>
>> Yes - you have a Cortex-A cpu that can handle high throughput,
>> perhaps with several cores, combined with a Cortex-M device that is
>> more deterministic and can give more real-time control.
>
> That's also the idea of CPUs in FPGAs.
>
Yes.
On 17/04/2020 22:22, Rick C wrote:
> On Friday, April 17, 2020 at 3:17:55 PM UTC-4, Paul Rubin wrote:
>> Rick C <gnuarm.deletethisbit@gmail.com> writes:
>>> A single, fast CPU is harder to program than many, fast CPUs.
>>> Programmers have to learn a lot in order to perform multitasking
>>> on a single CPU.
>>
>> Really it's the other way around. A typical programmer these days
>> might not know how to implement a multitasker or OS on a bare
>> machine, but they do know how to spawn processes and use them on a
>> machine with an OS. Organizing a parallel or distributed program
>> is much harder.
>
> Really? Multitasking is a lot more complex than just spawning tasks.
> There are potential conditions that can lock up the computer or the
> tasks. Managing task priorities can be a very complex issue and
> learning how to do that correctly is an important part of
> multitasking. In real time systems it becomes potentially the
> hardest part of a project.
>
> Breaking a design down to assign tasks on various processors is a
> much simpler matter. It's much like hardware design where you
> dedicate hardware to perform various actions and simply don't have
> the many problems of sharing a single CPU among many tasks.
>
> Do I have it wrong? Is multitasking actually simple and the various
> articles I've read about the complexities overstate the matter?
>
Nah, it's mostly the same whether you have one processor or many.  /If/ you follow one basic rule, that is.

One thing that people often get wrong is they try to control something or set a variable from different places in the code.  They get this wrong with single-tasking software, and lose their overview of the control flow and data flow (often without knowing that it is lost).  It becomes impractical or impossible to see that the code is actually correct, and you usually can't find your problems in testing.  A common "solution" to this is to ban global variables.  This is a mere band-aid, and usually unhelpful - the problem exists whether you set variable "foo" directly or call "set_foo()".  And when you have multiple threads or tasks, the problem is bigger - it is not just different bits of the program that can get mixed up, they can be in different contexts too, and can be interrupted in the middle.

The answer is to think like a hardware designer - think of separate modules that communicate by signals.  If you have different hardware modules that all access a shared resource, you need a multiplexer or prioritising system, possibly with locks or gates.  The same applies in software.  A single hardware output can drive many inputs, but an input can only be driven by one output - again you need a multiplexer, combination gate, or other selection system to do otherwise.  The same applies in software.  Bidirectional or tristate signals can be useful to cut down on resources, but are much harder to get right.  The same in software.

If you have individually compiled programs for different cpus, it is harder to get this wrong - you have no choice but to have a clear interface between the parts.  And if you use XC on XMOS, you have an advantage too - the tools won't let you set the same data from different virtual cores.  (This used to be an irritating limit for cases when the programmer knows better than the tools about what is safe, but I believe this has improved.)  And of course FPGA tools won't let you drive a signal from multiple sources.

Beyond that, you have mostly the same issues.  Deadlock, livelock, synchronisation - they are all something you have to consider whether you are making an FPGA design, multi-tasking on one cpu, or running independent tasks on independent processors.

Task prioritising is an important issue.  But it is not just for multitasking on a single cpu.  If you have a high priority task A that sometimes has to wait for the results from a low priority task B, you have an issue to deal with.  That applies whether they are on the same cpu or different ones.  On a single cpu, you have the solution of bumping up the priority for task B for a bit (priority inheritance) - on different cpus, you just have to wait.
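To put the "multiplexer in software" idea in concrete terms, here is a minimal sketch in plain C++ (hypothetical names, not XC or any particular RTOS API): one owner thread is the only code that ever touches the shared data, and everything else sends it requests through a queue instead of writing to the data directly.

// Minimal sketch of the "multiplexer in software" rule: a single owner
// thread serialises all access to a shared resource, and other threads
// talk to it only through a message queue.  Hypothetical illustration.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Request { int amount; };             // clients "drive the input" by message only

class Mailbox {
public:
    void push(Request r) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(r);
        cv_.notify_one();
    }
    Request pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Request r = q_.front();
        q_.pop();
        return r;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Request> q_;
};

int main() {
    Mailbox mailbox;
    long counter = 0;                        // the shared resource, owned by ONE thread

    // The "multiplexer": the only thread that ever reads or writes 'counter'.
    std::thread owner([&] {
        for (int handled = 0; handled < 8; ++handled)
            counter += mailbox.pop().amount;
    });

    // Several clients; none of them touch 'counter' directly.
    std::vector<std::thread> clients;
    for (int id = 1; id <= 4; ++id)
        clients.emplace_back([&, id] {
            mailbox.push({id});
            mailbox.push({id});
        });

    for (auto& c : clients) c.join();
    owner.join();
    std::printf("counter = %ld\n", counter); // 2*(1+2+3+4) = 20, with no data race
    return 0;
}

The same structure falls out naturally on an FPGA (an arbiter in front of a shared block), on XMOS (a server task behind channels), or across separate processors (a request/response protocol) - which is the point: the discipline, not the scheduler, is what keeps it correct.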