Multi-context processor

Started by Rob Finch November 6, 2006
Ah, the memories.

I worked with Transputers - I wrote an OS for high speed data gathering and control for an automotive application.

The high priority processes would execute until they blocked on IO (channel comms, timers and such). I am not sure how they ordered the high priority processes. The low priority processes would execute only when no high priority process was able to run, each for a time slice or until it blocked. The time slices were scheduled round robin.
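
To make the two-level behaviour concrete, here is a minimal Python sketch of it. It is only an illustration and not how the Transputer microcode works: the function name `run`, the "work units before blocking" model, the FIFO ordering of the high priority queue and the default time slice are all assumptions made for the example.

```python
from collections import deque

def run(high, low, timeslice=2):
    """Toy model of a two-priority scheduler.

    high, low: dicts mapping a process name to the number of ticks it
    runs before it blocks (hypothetical workload description).
    Returns a list with the name of the process on the CPU at each tick.
    """
    trace = []
    high_q = deque(high.items())   # assumption: FIFO order among high priority
    low_q = deque(low.items())

    # High priority processes are never time-sliced: each runs until it blocks.
    while high_q:
        name, work = high_q.popleft()
        trace.extend([name] * work)

    # Low priority processes only run once no high priority process is ready,
    # and share the CPU round robin, one time slice at a time.
    while low_q:
        name, work = low_q.popleft()
        ran = min(work, timeslice)
        trace.extend([name] * ran)
        if work > ran:
            low_q.append((name, work - ran))   # back of the queue
    return trace

print(run({"H1": 2, "H2": 1}, {"A": 3, "B": 2}))
```

The sketch leaves out the interesting part of a real system, namely a high priority process becoming ready again while a low priority one is running.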

In our system, the OS was built from high priority processes and the applications ran in one low priority process. We had our own deadline scheduler written in asm within the one low priority process. It took us more effort to get the info from Inmos than it did to write the scheduler.

Anyway, I guess the transputer was multi-context but only ran one context at a time.

Fun times.
----- Original Message -----
From: Paul Davis
To: f...
Sent: Friday, November 10, 2006 12:32 AM
Subject: Re: [fpga-cpu] Re: Multi-context processor
Eric Smith wrote:

> The TX-2 hardware does support multiple threads (called "sequences"),
> but not simultaneity. There is a priority system to arbitrate for use
> of the processor.
>
> The PPUs of the CDC 6600 were similar, in that a single physical
> processor serviced ten threads in round-robin fashion (known as
> a barrel processor).
>
> More recent examples are the Ubicom IP3023 and IP5160
> microprocessors, which support multiple threads with provision
> for fixed time slices for "hard real time" threads, and round
> robin with two priority levels for the remaining threads.
> (Disclaimer: I am employed by Ubicom.)

That reminds me of another example - the Transputer (1982/3?), which had
a more sophisticated (of course, it was nearly 20 years later)
scheduler. This carried on executing a thread until it blocked on an I/O
operation. Actually, perhaps it was less sophisticated - my recollection
is that it was difficult to write code because there was no pre-emptive
scheduling, and you had to manually put in instructions to force a swap.
Hi gang! It's nice to see the old familiar faces, so to speak.

I thought the transputer was not multi-context, but rather only had a few
words (3?) of state in regs and the rest was in on-die SRAM, so context
switches were very fast. Perhaps it's just a point of view thing.

The Sun Niagara CMP cores are multi-context and schedule the runnable
threads -- those not blocked on cache misses etc. -- a great latency
tolerance technique.
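
As a rough sketch of the idea of issuing only from runnable threads, here is a small Python model. It does not claim to be Niagara's actual thread-select policy: the round robin starting point, the function name `issue_order` and the way stalls are described (a set of blocked cycles per thread) are assumptions for the example.

```python
def issue_order(num_threads, stalls, cycles):
    """Pick one thread per cycle, skipping threads that are stalled.

    stalls: dict mapping a thread number to the set of cycles in which
    that thread is blocked (e.g. waiting on a cache miss).
    Returns the thread issued in each cycle, or None if all are blocked.
    """
    order = []
    last = -1                      # thread that issued most recently
    for cycle in range(cycles):
        chosen = None
        # Search round robin, starting just after the last thread issued.
        for offset in range(1, num_threads + 1):
            t = (last + offset) % num_threads
            if cycle not in stalls.get(t, set()):
                chosen = t
                break
        order.append(chosen)
        if chosen is not None:
            last = chosen
    return order

# Thread 1 misses in the cache during cycles 1..3; the others fill the slots.
print(issue_order(4, {1: {1, 2, 3}}, 6))
```

The pipeline only idles when every context is blocked in the same cycle, which is the latency tolerance argument in a nutshell.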

Other notable multi-context processors included the Xerox Alto, the Denelcor
HEP, the MicroUnity MediaProcessor, and the Tera MTA. The latter had 128
thread contexts per core for latency tolerance sans data caches.

I have designed several unpublished multi-context processors in FPGAs to
various degrees of "finished". BRAMs plus V4/V5 BRAM pipelined output
registers mean you can theoretically clock the BRAMs really quickly...
Multithreading is a promising way to get rid of result mux networks...

You should also check out John Jakson's work.

Cheers,
Jan.
Cray has recently announced the rebirth of the MTA, in the form of custom
AMD-socket-compatible 128-way multithreaded processors dropped into the
XT3's 3D torus interconnect:
http://www.cray.com/products/xmt/

With enough FPGA-based pin-compatible modules, maybe we could build our
own:
http://www.drccomputer.com/pages/modules.html
(yeah, right!)

-Jacob

On Tue, 14 Nov 2006, Jan Gray wrote:

> Hi gang! It's nice to see the old familiar faces, so to speak.
>
> I thought the transputer was not multi-context, but rather only had a few
> words (3?) of state in regs and the rest was in on-die SRAM, so context
> switches were very fast. Perhaps its just a point of view thing.
>
> The Sun Niagara CMP cores are multi-context and schedule the runnable
> threads -- those not blocked on cache misses etc. -- a great latency
> tolerance technique.
>
> Other notable multi-context processors included the Xerox Alto, the Denelcor
> HEP, the MicroUnity MediaProcessor, and the Tera MTA. The latter had 128
> thread contexts per core for latency tolerance sans data caches.
>
> I have designed several unpublished multi-context processors in FPGAs to
> various degrees of "finished". BRAMs plus V4/V5 BRAM pipelined output
> registers mean you can theoretically clock the BRAMs really quickly...
> Multithreading is a promising way to get rid of result mux networks...
>
> You should also check out John Jakson's work.
>
> Cheers,
> Jan.
Jan Gray wrote:

> I thought the transputer was not multi-context, but rather only had a few
> words (3?) of state in regs and the rest was in on-die SRAM, so context
> switches were very fast. Perhaps its just a point of view thing.

It's 16 years since I looked at this, but I think it was microcoded and
relied on chains of process descriptors and contexts in RAM. So, yes,
it's a question of where you draw the boundaries. But, obviously not
simultaneous (but what is 'simultaneous' anyway? Again, depends on where
your boundaries are drawn).

> I have designed several unpublished multi-context processors in FPGAs to
> various degrees of "finished". BRAMs plus V4/V5 BRAM pipelined output
> registers mean you can theoretically clock the BRAMs really quickly...

How fast? I did some quick research on ASIC RAM cycle times recently to
try to get a ballpark max frequency figure for a new chip. I wanted
2.0ns or better, but could only get 3.0 or 2.5 in the stuff that I have
data on.

But, in an FPGA, the RAM cycle time is presumably not that important.
Aren't the carry chains going to kill you? Or something else?
> How fast? I did some quick research on ASIC RAM cycle times recently to
> try to get a ballpark max frequency figure for a new chip. I wanted
> 2.0ns or better, but could only get 3.0 or 2.5 in the stuff that I have
> data on.

> But, in an FPGA, the RAM cycle time is presumably not that important.
> Aren't the carry chains going to kill you? Or something else?

You can do over 300 MHz out of the new BRAMs with the output register
enabled. Other things get in the way up at those speeds.

As for slow adder carry chains, it is possible to ping pong two sets of
adders, each with A and B input registers clocked on alternate cycles, doing
two cycle adds at high speed. It's not area or power efficient, though.
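
A cycle-by-cycle Python model of the ping-pong arrangement might look like the sketch below. It is behavioural only, says nothing about timing closure, and the names (`ping_pong_add`, `regs`, the 32-bit default width) are made up for the illustration.

```python
def ping_pong_add(operand_pairs, width=32):
    """Model two adders whose input registers load on alternate cycles.

    Each adder gets two clock cycles to settle, yet one new add can
    start every cycle. Returns (cycle_captured, adder, sum) tuples.
    """
    mask = (1 << width) - 1
    regs = [None, None]            # A/B input registers of adder 0 and adder 1
    results = []
    for cycle in range(len(operand_pairs) + 2):
        bank = cycle % 2           # the adder whose registers clock this cycle
        # Operands loaded two cycles ago have had time to propagate.
        if regs[bank] is not None:
            a, b = regs[bank]
            results.append((cycle, bank, (a + b) & mask))
            regs[bank] = None
        # Load the next pair into the same bank.
        if cycle < len(operand_pairs):
            regs[bank] = operand_pairs[cycle]
    return results

print(ping_pong_add([(1, 2), (3, 4), (5, 6)]))
```

Latency is two cycles per add but throughput stays at one add per cycle, which suits a multithreaded pipeline where consecutive instructions come from different contexts.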

Also interesting is to use the 2 port BRAM as a double clocked
1-cycle-latency 4 port BRAM. Put the multiple thread contexts (register
files) in there. You get 16 sets of 32 registers...
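
For illustration, the addressing for 16 contexts of 32 registers (512 words in total) and the split of four accesses over two fast-clock phases could be sketched in Python as follows. The helper names and the ordering of accesses within a cycle are assumptions, not a description of any particular design.

```python
NUM_CONTEXTS = 16
REGS_PER_CONTEXT = 32

def bram_address(context, reg):
    """Map (thread context, register number) to a word address in one BRAM."""
    if not (0 <= context < NUM_CONTEXTS and 0 <= reg < REGS_PER_CONTEXT):
        raise ValueError("context or register number out of range")
    return (context << 5) | reg    # 4 context bits above 5 register bits

def schedule_accesses(accesses):
    """Spread up to four accesses per CPU cycle over two fast-clock phases.

    The BRAM has two physical ports; clocking it at twice the CPU rate
    makes it look like a four-port memory to the pipeline.
    """
    if len(accesses) > 4:
        raise ValueError("at most four accesses per CPU cycle")
    return [accesses[0:2], accesses[2:4]]   # phase 0, phase 1

print(bram_address(3, 4))
print(schedule_accesses(["read rs1", "read rs2", "read rs3", "write rd"]))
```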

Jan.
On Wed, 15 Nov 2006 00:41:16 -0800 (PST), "Jacob Nelson"
said:
>
> With enough FPGA-based pin-compatible modules, maybe we could build our
> own:
> http://www.drccomputer.com/pages/modules.html
> (yeah, right!)
>
> -Jacob
>

Coincidentally, I happen to have one of those modules right in front of
me. Not got to grips with it yet, but the mechanical construction is a
work of art.

Neil
Isn't that what HyperThreading from Intel does?

Nelson

> --- In f..., Paul Davis wrote:
>>
>> Rob Finch wrote:
>> > What's new with the new multi-context processor patent ?
>> >
>> > http://www.freepatentsonline.com/5872985.html
>>
>> It's a US patent; it doesn't have to have anything new in it. This is
>> the patent office that has issued 20-odd patents on perpetual motion
>> machines.
>>
>> PD
>
> So who else has a CPU that runs multiple contexts (programs) through
> the pipeline at the same time?
> At least that's what it sounded like it was doing... whenever one
> context stalls the other can run to keep the CPU doing something all
> the time.
>
