EmbeddedRelated.com
Forums

Custom CPU Designs

Started by Rick C April 16, 2020
On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote:
> On 18/04/20 16:51, Theo wrote:
> > David Brown <david.brown@hesbynett.no> wrote:
> >> The real power comes from when you want to do something that is /not/
> >> standard, or at least not common.
> >>
> >> Implementing a standard UART in a couple of XMOS cores is a pointless
> >> waste of silicon. Implementing a UART that uses Manchester encoding for
> >> the UART signals so that you can use it on a balanced line without
> >> keeping track of which line is which - /then/ you've got something that
> >> can be done just as easily on an XMOS and is a big pain to do on a
> >> standard microcontroller.
> >>
> >> Implementing an Ethernet MAC on an XMOS is pointless. Implementing an
> >> EtherCAT slave is not going to be much harder for the XMOS than a normal
> >> Ethernet MAC, but is impossible on any microcontroller without
> >> specialised peripherals.
> >
> > The Cypress PSoC has an interesting take on this. You can specify (with the
> > GUI) that you want a component. If you specify a simple component (let's
> > say I2C slave) there's a hard IP for that. But if you specify something
> > that's more complicated (say I2C master and slave on the same pins) it
> > builds it with the existing IP plus some of its FPGA-like logic. Takes more
> > resources but allows you to do many more things than they put in as hard
> > cores.
> >
> > Unfortunately they don't provide a vast quantity of cells for that logic, so
> > it's fine if you want to add just a few unusual bits to the regular
> > microcontroller, but not a big system. (PSoC also has the programmable
> > analogue stuff, which presumably constrains the process they can use)
> >
> > It would be quite interesting to combine that with the XMOS approach - more
> > fluid boundaries between hardware and software.
> >
> > I'm a bit surprised XMOS don't provide 'soft realtime' virtual cores - lock
> > down the cores running a task that absolutely needs to be bounded-latency,
> > and then multitask the remaining tasks across the other cores. If that was
> > provided as an integrated service then it wouldn't need messing about
> > running a scheduler.
>
> What scheduler?
>
> To a very useful approximation an RTOS's functions are
> encoded in hardware.
>
> The "select" statement is like a "switch" statement, except
> that the core sleeps until one of the "case" conditions
> becomes true. In effect the case conditions are events.
> Events can be inputs arriving, outputs completing, timeouts, etc.
Isn't that what interrupts are about? Many processors have a "halt and wait for interrupt" instruction.
> N.B. as far as a core is concerned, receiving input from a
> pin is the same as receiving a message from another task/core.
> Ditto output to a pin and sending a message to another task/core.
> Everything is via the "port" abstraction, and that extends down
> into the hardware and i/o system.
That sounds a lot like the Transputer.
> For a short intro to the hardware, software, and concepts see the
> architecture flyer.
> https://www.xmos.com/file/xcore-architecture-flyer/
>
> FFI see the XMOS programming guide, which is beautifully written,
> succinct, and clear.
> https://www.xmos.com/file/xmos-programming-guide/
>
> I wish all documentation was as good!
>
> > After all, there must be applications with a lot of do-this-once-a-second
> > tasks, that would be wasted using a whole core? Do they have a scheduler
> > where you can tell a task to sleep until a particular time?
>
> See "combinable tasks" in my post in response to David Brown's
> post.
>
> > Or is FreeRTOS intended for that? In which case you presumably have to
> > write the code in a different way compared to the hard-tasks?
>
> I really don't see the point of FreeRTOS running inside an
> xCORE chip. To be over simplistic, RTOSs are designed to
> schedule multiple tasks on a single processor and to connect
> i/o with tasks (typically via interrupts).
>
> All that is done in hardware in the xCORE ecosystem.
That's true of most processors, no? An I/O event happens and an interrupt
invokes a task called an interrupt handler.

The tricky part is dealing with the real time requirements of handling the
I/O events. Which task must be done first, which tasks can be interrupted to
deal with other tasks, what resources are required to process an event, etc.
This is what makes real time multitasking difficult.

--
Rick C.
---+ Get 1,000 miles of free Supercharging
---+ Tesla referral code - https://ts.la/richard11209
On 18/04/20 22:53, Rick C wrote:
> On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote:
>> On 18/04/20 16:51, Theo wrote:
<clip>
>>
>> What scheduler?
>>
>> To a very useful approximation an RTOS's functions are encoded in
>> hardware.
>>
>> The "select" statement is like a "switch" statement, except that the core
>> sleeps until one of the "case" conditions becomes true. In effect the case
>> conditions are events. Events can be inputs arriving, outputs completing,
>> timeouts, etc.
>
> Isn't that what interrupts are about? Many processors have a "halt and wait
> for interrupt" instruction.
Hardware on its own is trivial and has been done many times over the
decades.

XMOS have integrated the hardware with the software and the conceptual
tools, and have provided excellent development tools.

That has not been achieved elsewhere.
>> N.B. as far as a core is concerned, receiving input from a pin is the same
>> as receiving a message from another task/core. Ditto output to a pin and
>> sending a message to another task/core. Everything is via the "port"
>> abstraction, and that extends down into the hardware and i/o system.
>
> That sounds a lot like the Transputer.
Exactly. And xC is a modern successor to Occam, with CSP a core abstraction. Some of the original Transputer team are in XMOS.
>> For a short intro to the hardware, software, and concepts see the
>> architecture flyer. https://www.xmos.com/file/xcore-architecture-flyer/
<clip>
>> I really don't see the point of FreeRTOS running inside an xCORE chip. To
>> be over simplistic, RTOSs are designed to schedule multiple tasks on a
>> single processor and to connect i/o with tasks (typically via interrupts).
>>
>> All that is done in hardware in the xCORE ecosystem.
>
> That's true of most processors, no? An I/O event happens and an interrupt
> invokes a task called an interrupt handler.
>
> The tricky part is dealing with the real time requirements of handling the
> I/O events. Which task must be done first, which tasks can be interrupted to
> deal with other tasks, what resources are required to process an event, etc.
> This is what makes real time multitasking difficult.
Those problems mostly disappear with sufficient independent cores. What remains is the logical dependencies which can result in livelock or deadlock. No implementation technology can solve those, since they are inherent in either the problem or the architectural solution.
On Sun, 19 Apr 2020 10:11:18 +0100, Tom Gardner
<spamjunk@blueyonder.co.uk> wrote:

>On 18/04/20 22:53, Rick C wrote:
>> On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote:
<clip>
>>> For a short intro to the hardware, software, and concepts see the
>>> architecture flyer. https://www.xmos.com/file/xcore-architecture-flyer/
>>>
>>> FFI see the XMOS programming guide, which is beautifully written, succinct,
>>> and clear. https://www.xmos.com/file/xmos-programming-guide/
>>>
>>> I wish all documentation was as good!
This information seems to be a few years old.

<clip>
>> The tricky part is dealing with the real time requirements of handling the
>> I/O events. Which task must be done first, which tasks can be interrupted to
>> deal with other tasks, what resources are required to process an event, etc.
>> This is what makes real time multitasking difficult.
>
>Those problems mostly disappear with sufficient
>independent cores.
Unfortunately, the xCore tile has only 8 logical cores.

It seems hard to find out from the XMOS web pages how many tiles are
integrated on recent chips. If there are only 1 or 2 tiles, there
would be only 8-16 logical cores on a chip.
On 19/04/20 11:19, upsidedown@downunder.com wrote:
> On Sun, 19 Apr 2020 10:11:18 +0100, Tom Gardner
> <spamjunk@blueyonder.co.uk> wrote:
>
>> On 18/04/20 22:53, Rick C wrote:
>>> On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote:
>
> <clip>
>
>>>> For a short intro to the hardware, software, and concepts see the
>>>> architecture flyer. https://www.xmos.com/file/xcore-architecture-flyer/
>>>>
>>>> FFI see the XMOS programming guide, which is beautifully written, succinct,
>>>> and clear. https://www.xmos.com/file/xmos-programming-guide/
>>>>
>>>> I wish all documentation was as good!
>
> This information seems to be a few years old.
In one sense that is good: it means there have been no fundamental
changes recently. Stability (of such things) is good :)

Detailed implementations, OTOH, should change regularly.
>>> The tricky part is dealing with the real time requirements of handling the
>>> I/O events. Which task must be done first, which tasks can be interrupted to
>>> deal with other tasks, what resources are required to process an event, etc.
>>> This is what makes real time multitasking difficult.
>>
>> Those problems mostly disappear with sufficient
>> independent cores.
>
> Unfortunately, the xCore tile has only 8 logical cores.
>
> It seems hard to find out from XMOS web pages, how many tiles are
> integrated on recent chips.
They do seem to be less "open" than before, which I don't like.

However, although I haven't checked recently, I presume a free
registration continues to be all that is needed.
> If there are only 1 or 2 tiles, there
> would be only 8-16 logical cores on a chip.
4 tiles.

I /believe/ multiple chips can be interconnected so as to transparently
extend the internal switch matrix, albeit with an increased latency.
NUMA anyone :)

See what is stocked by DigiKey...
https://www.digikey.co.uk/products/en/integrated-circuits-ics/embedded-microcontrollers/685?k=xmos&k=&pkeyword=xmos&sv=0&sf=0&FV=-8%7C685&quantity=&ColumnSort=-143&page=1&stock=1&nstock=1&pageSize=25

Example: XE232-1024-FB374

Features: Multicore Microcontroller with Advanced Multi-Core RISC
Architecture
- 32 real-time logical cores on 4 xCORE tiles
- Cores share up to 2000 MIPS
  - Up to 4000 MIPS in dual issue mode
- Each logical core has:
  - Guaranteed throughput of between 1/5 and 1/8 of tile MIPS
  - 16 x 32-bit dedicated registers
- 167 high-density 16/32-bit instructions
  - All have single clock-cycle execution (except for divide)
  - 32x32 -> 64-bit MAC instructions for DSP, arithmetic and
    user-definable cryptographic functions
On Thu, 16 Apr 2020 17:13:41 -0700, Paul Rubin wrote:

> Grant Edwards <invalid@invalid.invalid> writes:
>> Definitely. The M-class parts are so cheap, there's not much point in
>> thinking about doing it in an FPGA.
>
> Well I think the idea is already you have other stuff in the FPGA, so
> you save a package and some communications by dropping in a softcore
> rather than using an external MCU. I'm surprised that only high end
> FPGA's currently have hard MCU's already there. Just like they have DSP
> blocks, ram blocks, SERDES, etc., they might as well put in some CPU
> blocks.
Maybe RISC-V will catch on. The design is FOSS, as is the toolchain (GDB
and LLVM have had RISC-V backends for a while now), and the simple
versions take very few gates.
https://github.com/SpinalHDL/VexRiscv
https://hackaday.com/2019/11/19/emulating-risc-v-on-an-fpga/
On 17/04/2020 18:11, Rick C wrote:
> On Friday, April 17, 2020 at 4:03:06 AM UTC-4, David Brown wrote:
>> On 17/04/2020 03:37, Rick C wrote:
>>> On Thursday, April 16, 2020 at 8:13:45 PM UTC-4, Paul Rubin wrote:
<clip>
>>> There's a chip that goes the other direction. The GA144 has 144
>>> very fast, very tiny CPUs in an FPGA fashion with no FPGA
>>> fabric.
>>>
>>> It's not a popular chip because most people are invested in the
>>> single CPU, lots of memory paradigm and focusing their efforts
>>> on making a single CPU work like it's many CPUs doing many
>>> different things. Using this chip requires a very different
>>> mindset because of the memory size limitations which are inherent
>>> in the CPU design.
>>
>> Could it be that the chip is not popular because it is not a good
>> fit for many applications? A single fast core is more flexible and
>> useful than many small cores, while an FPGA can do far more than
>> the tiny CPUs in the GA144.
>
> You are the poster child for my statement about mindset.
You don't think that I /do/ understand that a different mindset is needed for a chip like the GA144? I think it is a design that will be of little use in most cases, and a poor fit for many applications. That is not because I don't understand how to work with such a device, or how it is different from other devices, or that I can't get into the right "mindset". Please stop assuming that the only reason someone might disagree with you is that they are ignorant.
> A single, fast CPU is harder to program than many fast CPUs.
Even if this were true, it is irrelevant, because the GA144 cores are not
"fast CPUs". They can each do almost nothing. It really doesn't matter if
they can do it quickly. Clearly these cores can do /some/ things, but they
are not comparable to a normal cpu when they have such limited memory.

A single very fast CPU can do single-tasking things very quickly. It can
also do many small and simple tasks, because it can run a multitasking
system. But if a task can't be split up into parallel or pipelined
activities, then smaller or slower cpus can't do it as well, no matter how
many you have.
> Programmers have to learn a lot in order to perform multitasking on a
> single CPU.
Sure. But if they want to perform multitasking with more than one CPU, they have to learn the same things - and then more. And if they need to do manual partitioning of tasks on many parts, they have even more to learn.
> It's a difficult and error prone design exercise. Transition that to
> many CPUs and the vast majority of those problems go away.
No, they don't. (Let's be clear here - the majority of programmers /use/ multitasking systems rather than designing them.)
> I recall one designer complaining there was not nearly enough I/O
> capability to keep all the CPUs supplied with data, completely
> missing the idea that the CPUs are no longer a precious resource that
> you must squeeze every last drop of performance from.
You almost always have one resource that is the limit first - it may be cpu power, or ram, or flash, or IO, or any other kind of resource. And it's fair to say that a lot of designers and programmers are not particularly good at judging what that resource might be - resulting in a lot of effort wasted in optimising for one resource when others are more significant.
>> You need outstanding benefits of something like the GA144 before it
>> makes sense to use it - and that's not something I have seen when I
>> have looked through the website and read the information. It is
>> not enough to simply be an interesting idea - it is not even enough
>> to be useful or better than alternatives. In order to capture
>> customers, it has to be so hugely better that it is worth the cost
>> for people to learn a very different type of architecture, a very
>> different programming language, and a very different way of looking
>> at problems.
>
> Yes, you are agreeing with me I think. Designers are comfortable
> with the present paradigm and have trouble even conceiving of how to
> use this device. That's not the same thing as the device not being
> suitable for applications... although it is definitely not a shoe
> that fits every foot.
In this case, it appears to be a shoe that fits almost no foot.

A good indication of this is that the whole company apparently stopped in
2012, judging by their website - only a short time after they got the
whole thing going. By my reckoning (from what I see on their site), they
spent a year or two after making the chip producing examples and
application notes and trying to sell it, and then collapsed through lack
of interest.
> However, there are many, many uses for it. One area where it fits
> very well is a signal processing app where data flows through the
> chip. This would be a nearly perfect match to the architecture
> allowing even a designer with little imagination to use the device.
Where would be the advantage?

Let's assume a company's design team are happy to spend the time learning
the programming language (which would be a leap backwards in time compared
to modern Forths), the tools from a pre-DOS era, and the unusual system
architecture. And we'll assume that their management are happy investing
the time and effort with a unique product from a dead company run by a guy
who was undoubtedly a pioneer and a genius in his time, but has been
increasingly disconnected from reality for several decades.

What can this chip do that an FPGA can't do better? What can it do that a
DSP can't do better? You have all these processing elements that can work
fairly fast, but with almost no resources and very limited communication.
You can't connect freely across the chip, like you can with an FPGA. You
can't use much memory - you couldn't do anything but the smallest of
filters. The cores are weak - they may be able to run quickly, but any
multiplication needs to be written out manually as loops of additions and
shifts. A DSP or FPGA will be capable of handling massively more data in
and out of the chips. And a normal cpu with SIMD will probably handle most
jobs as well.

Don't get me wrong here - I am not saying there are no applications for
such a device, or no applications for which it is a good fit. I am not
even saying there are no applications for which it is a /better/ fit -
faster, smaller, cheaper or lower power than alternatives. I am saying
that its suitable use-cases are so small, and its disadvantages so big,
that it is not going to be an appropriate choice in more than a tiny
handful of systems.

The GA144 is not the first "big array of tiny processors" chip. I have
heard of several others over the years. I haven't heard of any that have
been big successes. (That does not, by any means, prove that there
/haven't/ been big successes.)
I think it is a good thing that people come along with some wildly
different architectures, or totally out-of-the-box ideas. But if it is
ever going to be a success, such ideas need a "killer application".

If the GA144 really is useful for DSP work, then the company should have
been making demo boards that did something amazing in DSP. They should
have looked at a Texas Instruments DSP running 720p30 MPEG encoding in
real time (this was a decade ago), made a GA144 board that did 1080p60
encoding with a tenth of the power consumption, and shown how you could
scale it with four GA144's to do 2160p60. /Then/ they would have had
something they could sell - something that would give companies a reason
to spend multiple man-years teaching their development team a new way of
designing.

Instead, they offered an expensive and incomprehensible way to make an
Ethernet MAC at speeds that hadn't been seen for a decade.
> I considered some apps for this device. One was an attached
> oscilloscope. I believe a design would suit this device pretty well.
> My preference would be to ship the data over USB to a PC for display
> which would require a USB interface which no one has yet developed,
> but connecting to an attached display would be simple. An external
> memory would only be needed if advanced processing functions were
> required. Most could be done right on the chip.
>
>> And frankly, the GA144 is not impressive or better at all. Look at
>> the application notes and examples - you've got a 10 Mbit Ethernet
>> controller that requires a third of the hardware of the GA144. An
>> MD5 hash takes over a tenth of the hardware and has a speed similar
>> to a cheaper microcontroller from the year the GA144 was made,
>> 2012. And it all requires programming in a language where the
>> /colour/ of the words is semantically and syntactically critical.
>
> Yes, very clearly showing your bias and lack of imagination. I
> remember the first time I used an IDE for code development. The damn
> thing barely worked and was horribly slow. I was happy with manually
> tracing code on printouts. lol
>
>> It is nice to see people coming up with new ideas and new kinds of
>> architectures, but there has to be big benefits if it is going to
>> be anything more than an academic curiosity. GA144 is not it.
>>
>> (The XMOS is far more realistic and with greater potential - that's
>> a family of devices that people can, and do, use.)
>
> Ah, yes, the ever so niched XMOS. An expensive replacement for the
> many faster and simpler devices that everyone is comfortable with.
> These devices have the same issues of user comfort the GA144 has and
> on top of that are only suited to a small niche where they have any
> real advantage over CPUs or FPGAs.
I wonder if you are thinking of the same things.

The GA144 cores do very little each, and are programmed with all the user
friendliness of a Sinclair ZX81, complete with hand assembling the machine
code and manually counting addresses. This is primarily a result of the
fanaticism of the people behind the device and tools - the insistence that
it all be able to run on this chip. And it has a user community consisting
of a dozen or so disciples of Chuck Moore who apparently haven't talked to
anyone else since the 1980's.

XMOS takes some changed ideas and new ways of thinking to get the best
from them. But the (virtual) cores are solid devices, programmed in decent
modern languages (after some very frustrating limitations in their early
days) with a good IDE and tools, and a large community of users well
supported by the company. There are some things about the chips that I
think are downright silly limitations, and I think a bit less "purism"
would help, but they are a world apart from the GA144.
> I believe we've had this discussion before.
>
>>> That said, FPGA makers are rather stuck in the -larger is better-
>>> rut mostly because their bread and butter customer base, the
>>> comms vendors, want large FPGAs with fast ARM processors if any.
>>
>> It is those customers that pay for the development of the whole
>> thing. Small customers then get the benefits of that.
>
> Not if FPGAs that suit them are never developed.
>
> That's the problem. The two large FPGA makers aren't interested in
> the (I assume) smaller markets which require smaller devices in
> smaller, easy to use packages, potentially combined with CPUs such as
> ARM CM4 or RISC-V.
>
> I have seen devices from Microchip with a RISC-V but they are
> typically far too pricey to end up in any of my designs. This device
> will be out later this year.
I think we can all agree that every customer would be happier with lower prices here! Still, there are devices from Efinix (I know almost nothing about the company) and Lattice for a dollar or so. Yes, these are fine-pitch BGA packages, but the range of companies that can handle these is much greater than it used to be (even if it is by outsourcing the board production).
On Sunday, April 19, 2020 at 5:11:22 AM UTC-4, Tom Gardner wrote:
> On 18/04/20 22:53, Rick C wrote:
> > On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote:
<clip>
> >> The "select" statement is like a "switch" statement, except that the core
> >> sleeps until one of the "case" conditions becomes true. In effect the case
> >> conditions are events. Events can be inputs arriving, outputs completing,
> >> timeouts, etc.
> >
> > Isn't that what interrupts are about? Many processors have a "halt and wait
> > for interrupt" instruction.
>
> Hardware on its own is trivial and has been done many times
> over the decades.
>
> XMOS have integrated the hardware with the software with
> the conceptual tools, and have provided excellent development
> tools.
>
> That has not been achieved elsewhere.
Not sure what your point is. XMOS doesn't have programmable hardware, so
it's not clear what you mean by "integrating". The only company I've seen
integrate the hardware and software is Cypress, and that is fairly crude.

What exactly is special or unique about XMOS other than the fact they
have N processors with a shared memory on a single chip? The point is, if
this is not the Goldilocks combination for you, the XMOS is not so great
a chip.
> >> N.B. as far as a core is concerned, receiving input from a pin is the same > >> as receiving a message from another task/core. Ditto output to a pin and > >> sending a message to another task/core. Everything is via the "port" > >> abstraction, and that extends down into the hardware and i/o system. > > > > That sounds a lot like the Transputer. > > Exactly. > > And xC is a modern successor to Occam, with CSP a core abstraction. > > Some of the original Transputer team are in XMOS. > > > > > >> For a short intro to the hardware, software, and concepts see the > >> architecture flyer. https://www.xmos.com/file/xcore-architecture-flyer/ > >> > >> FFI see the XMOS programming guide, which is beautifully written, succinct, > >> and clear. https://www.xmos.com/file/xmos-programming-guide/ > >> > >> I wish all documentation was as good! > >> > >> > >>> After all, there must be applications with a lot of > >>> do-this-once-a-second tasks, that would be wasted using a whole core? Do > >>> they have a scheduler where you can tell a task to sleep until a > >>> particular time? > >> > >> See "combinable tasks" in my post in response to David Brown's post. > >> > >> > >>> Or is FreeRTOS intended for that? In which case you presumably have to > >>> write the code in a different way compared to the hard-tasks? > >> > >> I really don't see the point of FreeRTOS running inside an xCORE chip. To > >> be over simplistic, RTOSs are designed to schedule multiple tasks on a > >> single processor and to connect i/o with tasks (typically via interrupts). > >> > >> All that is done in hardware in the xCORE ecosystem. > > > > That's true of most processors, no? An I/O event happens and an interrupt > > invokes a task called an interrupt handler. > > > > The tricky part is dealing with the real time requirements of handling the > > I/O events. 
> > Which task must be done first, which tasks can be interrupted to > > deal with other tasks, what resources are required to process an event, etc. > > This is what makes real time multitasking difficult. > > Those problems mostly disappear with sufficient > independent cores.
Hmmm... the keyword there is "sufficient". That was when I mentioned 144 and the cycle began.
> What remains is the logical dependencies which can result in > livelock or deadlock. No implementation technology can solve > those, since they are inherent in either the problem or the > architectural solution.
Indeed. However, these problems go away once you decompose your problem for independent processors such as in an FPGA. I can't remember ever having to even consider a deadlock, other than the most simple of issues, when designing in an FPGA.

Multitasking on a single processor (or even on several) can open up a design to very complex interactions which have subtle failure modes.

--
Rick C.

--+- Get 1,000 miles of free Supercharging
--+- Tesla referral code - https://ts.la/richard11209
On Sunday, April 19, 2020 at 2:52:55 PM UTC-4, Przemek Klosowski wrote:
> On Thu, 16 Apr 2020 17:13:41 -0700, Paul Rubin wrote: > > > Grant Edwards <invalid@invalid.invalid> writes: > >> Definitely. The M-class parts are so cheap, there's not much point in > >> thinking about doing it in an FPGA. > > > > Well I think the idea is already you have other stuff in the FPGA, so > > you save a package and some communications by dropping in a softcore > > rather than using an external MCU. I'm surprised that only high end > > FPGA's currently have hard MCU's already there. Just like they have DSP > > blocks, ram blocks, SERDES, etc., they might as well put in some CPU > > blocks. > > Maybe Risc-V will catch on. The design is FOSS, as is the toolchain (GDB > and LLVM have Risc-V backends already for a while), and the simple > versions take very few gates. > https://github.com/SpinalHDL/VexRiscv > https://hackaday.com/2019/11/19/emulating-risc-v-on-an-fpga/
Microchip is already working on a RISC-V FPGA intended to be out later this year. It's a pretty hefty beast though, more like the Zynq and less like their smaller ARM devices.

It will be interesting to see just how quickly RISC-V is adopted in the greater community. I expect it to show up in the cell phone market and ramp up exponentially. (I would say virally, but not these days.)

--
Rick C.

--++ Get 1,000 miles of free Supercharging
--++ Tesla referral code - https://ts.la/richard11209
On 19/04/20 21:37, Rick C wrote:
> On Sunday, April 19, 2020 at 5:11:22 AM UTC-4, Tom Gardner wrote: >> On 18/04/20 22:53, Rick C wrote: >>> On Saturday, April 18, 2020 at 1:07:00 PM UTC-4, Tom Gardner wrote: >>>> On 18/04/20 16:51, Theo wrote: >>>>> David Brown <david.brown@hesbynett.no> wrote: >>>>>> The real power comes from when you want to do something that is >>>>>> /not/ standard, or at least not common. >>>>>> >>>>>> Implementing a standard UART in a couple of XMOS cores is a >>>>>> pointless waste of silicon. Implementing a UART that uses >>>>>> Manchester encoding for the UART signals so that you can use it on >>>>>> a balanced line without keeping track of which line is which - >>>>>> /then/ you've got something that can be done just as easily on an >>>>>> XMOS and is a big pain to do on a standard microcontroller. >>>>>> >>>>>> Implementing an Ethernet MAC on an XMOS is pointless. Implementing >>>>>> an EtherCAT slave is not going to be much harder for the XMOS than >>>>>> a normal Ethernet MAC, but is impossible on any microcontroller >>>>>> without specialised peripherals. >>>>> >>>>> The Cypress PSoC has an interesting take on this. You can specify >>>>> (with the GUI) that you want a component. If you specify a simple >>>>> component (let's say I2C slave) there's a hard IP for that. But if >>>>> you specify something that's more complicated (say I2C master and >>>>> slave on the same pins) it builds it with the existing IP plus some >>>>> of its FPGA-like logic. Takes more resources but allows you to do >>>>> many more things than they put in as hard cores. >>>>> >>>>> Unfortunately they don't provide a vast quantity of cells for that >>>>> logic, so it's fine if you want to add just a few unusual bits to the >>>>> regular microcontroller, but not a big system. 
(PSoC also has the >>>>> programmable analogue stuff, which presumably constrains the process >>>>> they can use) >>>>> >>>>> It would be quite interesting to combine that with the XMOS approach >>>>> - more fluid boundaries between hardware and software. >>>>> >>>>> I'm a bit surprised XMOS don't provide 'soft realtime' virtual cores >>>>> - lock down the cores running a task that absolutely needs to be >>>>> bounded-latency, and then multitask the remaining tasks across the >>>>> other cores. If that was provided as an integrated service then it >>>>> wouldn't need messing about running a scheduler. >>>> >>>> What scheduler? >>>> >>>> To a very useful approximation the an RTOS's functions are encoded in >>>> hardware. >>>> >>>> The "select" statement is like a "switch" statement, except that the >>>> core sleeps until one of the "case" conditions becomes true. In effect >>>> the case conditions are events. Events can be inputs arriving, outputs >>>> completing, timeouts, etc. >>> >>> Isn't that what interrupts are about? Many processors have a "halt and >>> wait for interrupt" instruction. >> >> Hardware on its own is trivial and has been done many times over the >> decades. >> >> XMOS have integrated the hardware with the software with the conceptual >> tools, and have provided excellent development tools. >> >> That has not been achieved elsewhere. > > Not sure what your point is. XMOS doesn't have programmable hardware, so > it's not clear what you are referring to "integrating". The only company > I've seen integrate the hardware and software is Cypress and that is fairly > crude.
I'm not going to /poorly/ regurgitate XMOS information for you. Start with https://www.xmos.com/download/xCORE-Architecture-Flyer(1.3).pdf and then you will be able to ask more useful questions.
> What exactly is special or unique about XMOS other than the fact they have N > processors with a shared memory on a single chip? The point is if this is > not the Goldilocks combination for you, the XMOS is not so great a chip.
Sigh, IT ISN'T JUST THE PROCESSOR HARDWARE. Got the point? See the architecture flyer, and then add https://www.xmos.com/software/tools/
>>>> N.B. as far as a core is concerned, receiving input from a pin is the >>>> same as receiving a message from another task/core. Ditto output to a >>>> pin and sending a message to another task/core. Everything is via the >>>> "port" abstraction, and that extends down into the hardware and i/o >>>> system. >>> >>> That sounds a lot like the Transputer. >> >> Exactly. >> >> And xC is a modern successor to Occam, with CSP a core abstraction. >> >> Some of the original Transputer team are in XMOS. >> >> >> >> >>>> For a short intro to the hardware, software, and concepts see the >>>> architecture flyer. >>>> https://www.xmos.com/file/xcore-architecture-flyer/ >>>> >>>> FFI see the XMOS programming guide, which is beautifully written, >>>> succinct, and clear. https://www.xmos.com/file/xmos-programming-guide/ >>>> >>>> I wish all documentation was as good! >>>> >>>> >>>>> After all, there must be applications with a lot of >>>>> do-this-once-a-second tasks, that would be wasted using a whole core? >>>>> Do they have a scheduler where you can tell a task to sleep until a >>>>> particular time? >>>> >>>> See "combinable tasks" in my post in response to David Brown's post. >>>> >>>> >>>>> Or is FreeRTOS intended for that? In which case you presumably have >>>>> to write the code in a different way compared to the hard-tasks? >>>> >>>> I really don't see the point of FreeRTOS running inside an xCORE chip. >>>> To be over simplistic, RTOSs are designed to schedule multiple tasks >>>> on a single processor and to connect i/o with tasks (typically via >>>> interrupts). >>>> >>>> All that is done in hardware in the xCORE ecosystem. >>> >>> That's true of most processors, no? An I/O event happens and an >>> interrupt invokes a task called an interrupt handler. >>> >>> The tricky part is dealing with the real time requirements of handling >>> the I/O events. 
>>> Which task must be done first, which tasks can be >>> interrupted to deal with other tasks, what resources are required to >>> process an event, etc. This is what makes real time multitasking >>> difficult. >> >> Those problems mostly disappear with sufficient independent cores. > > Hmmm... the keyword there is "sufficient". That was when I mentioned 144 > and the cycle began.
Those cores can be regarded as merely "LUTs on steroids" :)
>> What remains is the logical dependencies which can result in livelock or >> deadlock. No implementation technology can solve those, since they are >> inherent in either the problem or the architectural solution. > > Indeed. However these problems go away once you decompose your problem for > independent processors such as in an FPGA. I can't remember ever having to > even consider a deadlock other than the most simple of issues when designing > in an FPGA.
If you have two independent FSMs sending events to each other, then you can get livelock and deadlock. Whether the FSMs are implemented in hardware, software, or wetware is irrelevant.

Proving such systems of FSMs are deadlock-free is an active research topic (and probably always will be).
> Multitasking on a single processor (or even on several) can open up a design > to very complex interactions which have subtle failure modes.
Yup. As I said, the implementation technology is irrelevant.