This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).

hardware cpu scheduler, pointers to papers? - Sean - Mar 21 21:28:00 2002


I have seen some references to hardware schedulers but can't recall any of the
links at this time. I am familiar with the transputer hardware thread arch. I
was wondering if anyone has experience with hardware schedulers and built-in
CPU virtualization.

I just saw a reference to a company in Washington state that sells an ip core
that has a hardware scheduler. Didn't bookmark them, and now I'm googling for them...
dang.

Some of my questions are:

How many context switches per second could they handle?

What about code running in an interrupt or uninterruptible code, polling the
serial port, etc.?

How about interfacing the cache with the scheduler, i.e. switching to a different
context while data is fetched from main memory to the local cache, kinda like the
Cray MTA design?

Is the BRAM and DRAM on the xilinx parts useable for a partitioned process
cache? Could each process context get its own space to store registers and a
small amount of data? Then problems of thrashing the cache from high levels of
context switching could be mitigated.

Thanks, Sean.





(You need to be a member of fpga-cpu -- send a blank email to fpga-cpu-subscribe@yahoogroups.com )


RE: hardware cpu scheduler, pointers to papers? - Jan Gray - Mar 22 2:12:00 2002

XYRON

> I just saw a reference to a company in Washington state that
> sells an ip core that has a hardware scheduler.

That would be Xyron Semi of Vancouver, WA. I saw their booth at the
Embedded Systems Conf.

See http://www.xyronsemi.com/DataSheets/Zots.PDF. "Prior to Xyron's
innovation, no processor has ever included the entire interrupt and task
switching operations in hardware."

Xyron offers a DLX ISA (if I recall correctly) soft core with their
"ZOTS" (TM) zero-overhead task switching, running at 70 MHz in a
Virtex-II (no speed grade specified, no LUT count specified).
(MicroBlaze runs at 125 MHz.)

Their literature also states this ZOTS equipped FPGA CPU provides the
equivalent of 16 DMA channels -- this reminded me of the xr16's 16
PCs/DMA address registers.

But is it profitable to add hardware, possibly inserting multiplexers
into critical paths, etc. to reduce the latency and overhead of a time
consuming operation such as task switching? You must do the math: for
your app, what are the overheads of interrupts, context switches,
scheduling, etc., compared to what is the throughput (1/cycle time) or
area impact of the additional hardware in non-task-switching code? And
what are your interrupt and switching *latency* requirements and can
they be met in software?
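
A back-of-the-envelope version of that math might look like the sketch below. Every number in it (clock rates, switch rate, cycles per switch, the 5% clock penalty) and both function names are made-up assumptions for illustration, not measurements of any real core.

```python
def useful_cycles_per_second(fmax_hz, switches_per_s, cycles_per_switch):
    """Cycles left for application code after paying for task switches."""
    return fmax_hz - switches_per_s * cycles_per_switch


def break_even_switch_rate(fmax_sw_hz, fmax_hw_hz, cycles_per_switch):
    """Switch rate above which the slower, zero-overhead core comes out ahead."""
    return (fmax_sw_hz - fmax_hw_hz) / cycles_per_switch


# Hypothetical software-switched core: full clock rate, 200-cycle switches.
software = useful_cycles_per_second(100e6, 10_000, 200)

# Hypothetical hardware-switched core: free switches, but the extra muxes
# in the critical path are assumed to cost 5% of the clock rate.
hardware = useful_cycles_per_second(95e6, 10_000, 0)

rate = break_even_switch_rate(100e6, 95e6, 200)

print(f"software switching: {software:.0f} useful cycles/s")
print(f"hardware switching: {hardware:.0f} useful cycles/s")
print(f"break-even at {rate:.0f} switches/s")
```

With these assumed figures the software-switched core still wins on throughput until the workload exceeds 25,000 switches per second. The model deliberately ignores latency requirements, which have to be judged separately.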

(Observe that if you can reduce and/or bound the interrupt response
latency you may be able to simplify your peripheral designs, provide
smaller FIFOs, etc.)

Of course, a 100% hardware tasking approach may well be less flexible
than a software or hybrid approach.

Finally (and I don't know if "ZOTS" can be used this way) these days
when an L2 cache miss may cost 100 issue slots or more, it may be
profitable to switch threads on a cache miss or similar event, if the
overhead is low enough. (Common knowledge.) But this doesn't require
hardware scheduling, just multiple runnable threads.

BRAM

> Is the BRAM and DRAM on the xilinx parts useable for a partitioned process
> cache? Could each process context get its own space to store registers and a
> small amount of data? Then problems of thrashing the cache from high levels of
> context switching could be mitigated.

BRAM? Certainly. See www.fpgacpu.org/usenet/bb.html. For example,

"* multiple register file contexts, including user/kernel/interrupt handler
shadow contexts, or multiple threads' contexts; ..."

"* hybrid schemes with small fast multiported register files/stacks using
fine grained embedded RAMs, backed by larger multiple frame/context storage
using large embedded RAM blocks, providing fast call/return and/or fast
context switching; ..."

"* per-task state tables, including priorities, task state, next-task info,
attributes, and masks; fast dedicated thread local storage; ..."

Now, separate (per process) partitions in caches may well provide more
predictable latencies to each task, but on a workload where task
switching is not *so* frequent, throughput would presumably be higher
with one much larger unified cache (thrashed equally by all tasks :-).
(Also observe that a partitioned data cache is a headache for mutable
data -- what if several partitions each have the same line of data
cached, and then one thread modifies its copy of the line?)
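
The mutable-data headache can be shown with a toy model. This is only a sketch: the `PartitionedCache` class is hypothetical, it caches single words rather than lines, and it has no coherence mechanism at all, which is exactly the point.

```python
class PartitionedCache:
    """Toy write-back cache with one private partition per task (no coherence)."""

    def __init__(self, memory, num_tasks):
        self.memory = memory
        self.partitions = [{} for _ in range(num_tasks)]

    def read(self, task, addr):
        part = self.partitions[task]
        if addr not in part:              # miss: fill from main memory
            part[addr] = self.memory[addr]
        return part[addr]

    def write(self, task, addr, value):
        # Only the writing task's private copy is updated.
        self.partitions[task][addr] = value


memory = {0x100: 1}
cache = PartitionedCache(memory, num_tasks=2)

cache.read(0, 0x100)           # task 0 caches the word
cache.read(1, 0x100)           # task 1 caches the same word
cache.write(0, 0x100, 42)      # task 0 modifies its copy

stale = cache.read(1, 0x100)   # task 1 still hits on its old copy
fresh = cache.read(0, 0x100)
print(stale, fresh)
```

Task 1 keeps reading the old value until something invalidates or updates its partition, so a real design would need some invalidate/update scheme or a rule that shared data is never cached in private partitions.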

I suppose a lot depends upon how you value low interrupt or task switch
latency relative to throughput.

By the way, there is no DRAM per se on Xilinx parts.

MORE HARDWARE THREADING

See also the recent context switching thread,
http://groups.yahoo.com/group/fpga-cpu/message/746.

Perhaps someone up to speed on the Transputer design can let us know the
details of its task switching design, overheads, and how it did the CSP
rendezvous:

channel!datum -- send datum to receiver
channel?datum -- receive datum from sender
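
For readers unfamiliar with the notation, the semantics of the two operations (not the Transputer's hardware implementation, which is the open question here) can be sketched in software. The `Channel` class below is a hypothetical illustration and assumes exactly one sender and one receiver per channel.

```python
import queue
import threading


class Channel:
    """Point-to-point synchronous channel: send() returns only after receive()."""

    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def send(self, datum):            # channel!datum
        self._data.put(datum)
        self._ack.get()               # block until the receiver has taken it

    def receive(self):                # channel?datum
        datum = self._data.get()      # block until a sender arrives
        self._ack.put(None)           # release the waiting sender
        return datum


channel = Channel()
received = []


def consumer():
    for _ in range(3):
        received.append(channel.receive())


worker = threading.Thread(target=consumer)
worker.start()
for datum in (1, 2, 3):
    channel.send(datum)               # each send waits for its matching receive
worker.join()
print(received)
```

Whichever side arrives first blocks until the other shows up, which is what makes the rendezvous a natural scheduling point.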

aJile aJ-100: http://www.ajile.com/downloads/aJ-100_PR3.pdf:

"Hardware based Java Threading Support
* Hard real-time, multi-threading kernel in hardware
* Threading operations are atomic including true Java synchronization
* Built-in deterministic scheduling queues
* Thread to thread yield in less than 1µsec
* Eliminates traditional RTOS layer"

See also http://citeseer.nj.nec.com/fiske95thread.html, Fiske: "Thread
Prioritization: A Thread Scheduling Mechanism for Multiple-Context
Parallel Processors".

See also Ekanadham et al, "An Architecture for Generalized
Synchronization and Fast Switching" in chapter 12 of Iannucci et al,
"Multithreaded Computer Architecture: A Summary of the State of the
Art", Kluwer, 1994 (http://www.wkap.nl/prod/b/0-7923-9477-1?a=1).

Jan Gray, Gray Research LLC





Re: hardware cpu scheduler, pointers to papers? - Goran Bilski - Mar 22 10:35:00 2002

A much better idea is to have a small RTOS kernel with all the kernel services in
hardware.
You will not mess up the pipeline of the processor, but the whole
scheduling/message queues/... is done in hardware.
The only benefit of moving the kernel functions into the processor is that one
can save time on the register bank switch.
But the overall cost is too much.

E.g., take 2000 task switches/s. For each task switch you will save from (32+32)*2
down to maybe (32+32)*1 clock cycles (unless all task registers are stored in shadow
register files, but that will definitely kill the Fmax).
So you will save 2000*64 clock cycles => 128 000 clock cycles per second. If you are
running MicroBlaze at 125 MHz, that will save you 0.1% in performance.
Big deal, and the stuff added to the pipeline is probably costing you around 10%
performance degradation.
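
The arithmetic above is easy to check. The sketch below uses the figures from this post (2000 switches/s, 64 cycles saved per switch, 125 MHz); the 10% pipeline penalty is the rough guess stated above, not a measured value, and the variable names are just for illustration.

```python
switches_per_s = 2000
cycles_saved_per_switch = (32 + 32) * 1    # 32 saves + 32 restores, 1 cycle each
clock_hz = 125_000_000                     # MicroBlaze at 125 MHz

cycles_saved_per_s = switches_per_s * cycles_saved_per_switch
gain = cycles_saved_per_s / clock_hz       # fraction of all cycles recovered

fmax_loss = 0.10                           # guessed cost of the extra pipeline logic

print(f"cycles saved per second: {cycles_saved_per_s}")
print(f"performance gained: {gain:.4%}")
print(f"performance lost:   {fmax_loss:.0%}")
```

So the saving is about 0.1% against an assumed loss of about 10%, roughly two orders of magnitude apart.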

Göran


Re: hardware cpu scheduler, pointers to papers? - Goran Bilski - Mar 22 10:50:00 2002

I didn't read through my previous message.
What I mean is that if you really need kernel services done in HW,
then implement them as a peripheral to the processor instead of forcing them into the
processor pipeline.
The extra speed that is achieved by moving the services into the processor core is
lost to the lower Fmax of the processor.

Sorry for the confusion
Göran


Re: hardware cpu scheduler, pointers to papers? - Ben Franchuk - Mar 22 11:20:00 2002

Jan Gray wrote:

> See http://www.xyronsemi.com/DataSheets/Zots.PDF. "Prior to Xyron's
> innovation, no processor has ever included the entire interrupt and task
> switching operations in hardware."

Didn't some of the early (1960s) machines do that? And didn't microcoded
machines already do that?

> Xyron offers a DLX ISA (if I recall correctly) soft core with their
> "ZOTS" (TM) zero-overhead task switching, running at 70 MHz in a
> Virtex-II (no speed grade specified, no LUT count specified).
> (MicroBlaze runs at 125 MHz.)

If they are going after a real-time market, the raw speed of the memory, not
the CPU, could be the limiting factor.

> Their literature also states this ZOTS equipped FPGA CPU provides the
> equivalent of 16 DMA channels -- this reminded me of the xr16's 16
> PCs/DMA address registers.
>
> But is it profitable to add hardware, possibly inserting multiplexers
> into critical paths, etc. to reduce the latency and overhead of a time
> consuming operation such as task switching? You must do the math: for
> your app, what are the overheads of interrupts, context switches,
> scheduling, etc., compared to what is the throughput (1/cycle time) or
> area impact of the additional hardware in non-task-switching code? And
> what are your interrupt and switching *latency* requirements and can
> they be met in software?

This is really the realm of the entire computer system design. One can't
judge that until they know the CPU's use. Real-time control is a
different design than a server network or a desktop PC.

> (Observe that if you can reduce and/or bound the interrupt response
> latency you may be able to simplify your peripheral designs, provide
> smaller FIFOs, etc.)
> Of course, a 100% hardware tasking approach may well be less flexible
> than a software or hybrid approach.

But let's not forget that most peripheral devices are well defined
today (ignoring win products), thus I/O is not likely to have a major
change of usage.

> Finally (and I don't know if "ZOTS" can be used this way) these days
> when an L2 cache miss may cost 100 issue slots or more, it may be
> profitable to switch threads on a cache miss or similar event, if the
> overhead is low enough. (Common knowledge.) But this doesn't require
> hardware scheduling, just multiple runnable threads.

It is not the task switching that is hard to implement; it is the
re-scheduling of tasks that is the slowdown. And how do you specify different
algorithms for different usage -- a real-time OS, a server OS, a
desktop OS -- in hardware?
--
Ben Franchuk - Dawn * 12/24 bit cpu *
www.jetnet.ab.ca/users/bfranchuk/index.html






RE: hardware cpu scheduler, pointers to papers? - Devu Pandit - Mar 22 12:21:00 2002

Folks,

So far, I have just been a quiet observer to this group. But since you
started writing about Xyron, I thought I'd contribute a little bit. I
haven't looked at all the e-mails in detail that you guys have sent but
there are some .pdf datasheets I would like to send to you.

In a nutshell, we are a start-up company, currently about 27 people, with a
technology called ZOTS (Zero Overhead Task Switch). ZOTS moves the task
management and task switching functionality of a microprocessor from
software to hardware. With ZOTS, your microprocessor literally spends 100%
of its time processing tasks as there is no interrupt or task switching
overhead. Latency is also reduced to at most 5 clock cycles, with an
average of 2 or 3 clock cycles.

Please reply to me personally so I can send you some more information
about our technology. I strongly encourage you to also check out our
website, http://www.xyronsemi.com.

Regards,

Devu


RE: hardware cpu scheduler, pointers to papers? - Scott Newell - Mar 22 12:34:00 2002

>technology called ZOTS (Zero Overhead Task Switch). ZOTS moves the task
>management and task switching functionality of a microprocessor from
>software to hardware. With ZOTS, your microprocessor literally spends 100%

I once had the wild idea of task switching processor cores in and out of an
FPGA. Imagine an RTOS where each task not only gets its own processor
core, but each task can have a _different_ core...one runs a Z-80 clone,
another runs a PIC clone, etc.

Never said it was a good idea...

newell






Re: hardware cpu scheduler, pointers to papers? - Goran Bilski - Mar 22 12:40:00 2002

Hi,

This is nothing new.

There is a company in Sweden, RealFast, that has a kernel in HW for FPGAs; they have
been doing this for decades.
They have ported OSE and VxWorks to run on top of their HW kernel. I also think
that they are working on a Linux port.

Their website is http://www.realfast.se/rfos/products/s16.shtml

Göran


Re: hardware cpu scheduler, pointers to papers? - Ben Franchuk - Mar 22 12:56:00 2002

Scott Newell wrote:

> I once had the wild idea of task switching processor cores in and out of an
> FPGA. Imagine an RTOS where each task not only gets its own processor
> core, but each task can have a _different_ core...one runs a Z-80 clone,
> another running a PIC clone, etc.

At one time I had planned on doing task switching on my CPU by
register bank select, but the fact that I also needed to store the flags
and other current status like the MAR, MDR and IR killed that idea. I
still like the idea of one task, one CPU and memory. No need for IRQs
or task switching, as waiting loops would work fine. Timer flags would be
needed, however.
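
A toy model of why bank select alone falls short is sketched below. The `BankedCpu` class is hypothetical and models only a single zero flag standing in for all the unbanked state (flags, MAR, MDR, IR); it is not a model of the Dawn CPU.

```python
class BankedCpu:
    """Toy CPU: registers are banked per task, the zero flag is not."""

    def __init__(self, num_banks, regs_per_bank):
        self.regs = [[0] * regs_per_bank for _ in range(num_banks)]
        self.bank = 0
        self.zero_flag = False        # single shared copy, not banked

    def select_bank(self, bank):
        self.bank = bank              # the entire "context switch"

    def load(self, reg, value):
        self.regs[self.bank][reg] = value
        self.zero_flag = (value == 0)


cpu = BankedCpu(num_banks=2, regs_per_bank=4)

cpu.load(0, 0)                        # task 0 loads zero: flag is set
flag_before = cpu.zero_flag

cpu.select_bank(1)
cpu.load(0, 7)                        # task 1 clears the shared flag

cpu.select_bank(0)                    # back to task 0
flag_after = cpu.zero_flag
print(cpu.regs[0][0], flag_before, flag_after)
```

Task 0 gets its registers back intact but finds its flag clobbered, so any state outside the banked register file has to be either banked as well or saved and restored around the switch.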
--
Ben Franchuk - Dawn * 12/24 bit cpu *
www.jetnet.ab.ca/users/bfranchuk/index.html
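
The register-bank-select idea described above can be sketched as a small Python model. This is only an illustration under assumed names: `Context`, `BankedCPU` and the register count are hypothetical and are not taken from Ben's design. It shows why a bank select for the general registers alone is not enough: the flags, MAR, MDR and IR each need a per-task copy as well.

```python
from dataclasses import dataclass, field

NUM_REGS = 8  # assumed number of general registers per bank (hypothetical)


@dataclass
class Context:
    """All state that has to exist once per task, not just the registers."""
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)
    pc: int = 0
    flags: int = 0
    mar: int = 0  # memory address register
    mdr: int = 0  # memory data register
    ir: int = 0   # instruction register


class BankedCPU:
    """Toy model: one full context per task, selected by a bank index."""

    def __init__(self, num_tasks):
        self.banks = [Context() for _ in range(num_tasks)]
        self.active = 0

    @property
    def ctx(self):
        # The context the datapath currently sees.
        return self.banks[self.active]

    def switch_to(self, task):
        # A task switch only changes the bank index; nothing is copied.
        self.active = task
```

In this model a switch is a single index change, but the storage cost is the whole `Context` times the number of tasks, which is the cost the post says killed the idea.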






Re: hardware cpu scheduler, pointers to papers? - Author Unknown - Mar 22 17:49:00 2002

"Jan Gray" <> writes:

> By the way, there is no DRAM per se on Xilinx parts.

I'm guessing Sean used DRAM as an abbreviation for "Distributed RAM".

Carl Witty






RE: hardware cpu scheduler, pointers to papers? - Devu Pandit - Mar 25 13:10:00 2002

The principle of HW task management is similar. However, the RealFast
solution still has significant overhead. With the Xyron solution, the
overhead is literally zero cycles.

-----Original Message-----
From: Goran Bilski [mailto:]
Sent: Friday, March 22, 2002 9:41 AM
To:
Subject: Re: [fpga-cpu] hardware cpu scheduler, pointers to papers?

Hi,

This is nothing new.

There is a company in Sweden, RealFast, that has a kernel in HW for FPGAs; they
have been doing this for decades. They have ported OSE and VxWorks to run on
top of their HW kernel. I also think that they are working on a Linux port.

Their website is http://www.realfast.se/rfos/products/s16.shtml

Göran

Devu Pandit wrote:

> Folks,
>
> So far, I have just been a quiet observer to this group. But since
> you started writing about Xyron, I thought I'd contribute a little
> bit. I haven't looked at all the e-mails in detail that you guys have
> sent but there are some .pdf datasheets I would like to send to you.
>
> In a nutshell, we are a start-up company, currently about 27 people
> with technology called ZOTS (Zero Overhead Task Switch). ZOTS moves
> the task management and task switching functionality of a
> microprocessor from software to hardware. With ZOTS, your
> microprocessor literally spends 100% of its time processing tasks as
> there is no interrupt or task switching overhead. Latency is also
> reduced to at most 5 clock cycles, with an average of 2 or 3 clock
> cycles.
>
> Please respond back to me personally so I can send you some more
> information about our technology. I strongly encourage you to also
> check out our website, http://www.xyronsemi.com.
>
> Regards,
>
> Devu
>
> -----Original Message-----
> From: Ben Franchuk [mailto:]
> Sent: Friday, March 22, 2002 8:20 AM
> To:
> Subject: Re: [fpga-cpu] hardware cpu scheduler, pointers to papers?
>
> Jan Gray wrote:
>
> > See http://www.xyronsemi.com/DataSheets/Zots.PDF. "Prior to Xyron's
> > innovation, no processor has ever included the entire interrupt and
> > task switching operations in hardware."
>
> Did not some of the early (1960s) machines do that? And did microcoded
> machines not do that already?
>
> > Xyron offers a DLX ISA (if I recall correctly) soft core with their
> > "ZOTS" (TM) zero-overhead task switching, running at 70 MHz in a
> > Virtex-II (no speed grade specified, no LUT count specified).
> > (MicroBlaze runs at 125 MHz.)
>
> If they are going after a real time market, raw speed of the memory
> not the cpu could be the limiting factor.
>
> > Their literature also states this ZOTS equipped FPGA CPU provides
> > the equivalent of 16 DMA channels -- this reminded me of the xr16's
> > 16 PCs/DMA address registers.
> >
> > But is it profitable to add hardware, possibly inserting
> > multiplexers into critical paths, etc. to reduce the latency and
> > overhead of a time consuming operation such as task switching? You
> > must do the math: for your app, what are the overheads of
> > interrupts, context switches, scheduling, etc., compared to what is
> > the throughput (1/cycle time) or area impact of the additional
> > hardware in non-task-switching code? And what are your interrupt and
> > switching *latency* requirements and can they be met in software?
>
> This is really the realm of the entire computer system design. One
> can't judge that until they know the cpu's use. Real time control is a
> different design than a server network or a desktop PC.
>
> > (Observe that if you can reduce and/or bound the interrupt response
> > latency you may be able to simplify your peripheral designs, provide
> > smaller FIFOs, etc.) Of course, a 100% hardware tasking approach may
> > well be less flexible than a software or hybrid approach.
>
> But let's not forget that most peripheral devices are well defined
> today (ignoring win products), so I/O usage is not likely to change
> much.
>
> > Finally (and I don't know if "ZOTS" can be used this way) these days
> > when an L2 cache miss may cost 100 issue slots or more, it may be
> > profitable to switch threads on a cache miss or similar event, if
> > the overhead is low enough. (Common knowledge.) But this doesn't
> > require hardware scheduling, just multiple runnable threads.
> >
>
> It is not the task switching that is hard to implement; it is the
> re-scheduling of tasks that is the slowdown. And how do you specify
> different algorithms for different uses (a real-time OS, a server
> OS, a desktop OS) in hardware?
> --
> Ben Franchuk - Dawn * 12/24 bit cpu *
> www.jetnet.ab.ca/users/bfranchuk/index.html
>
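
The "do the math" comparison in the quoted text above can be sketched in a few lines of Python. The 125 MHz and 70 MHz clock rates are the figures quoted in the thread; the 200 cycles per software task switch is purely an assumed number for illustration, and both function names are hypothetical. The sketch only compares throughput; it says nothing about whether the latency requirements can be met in software.

```python
def useful_cycles_per_sec(clock_hz, switches_per_sec, cycles_per_switch):
    """Cycles per second left for task code after switching overhead."""
    overhead = switches_per_sec * cycles_per_switch
    return max(clock_hz - overhead, 0)


def break_even_switch_rate(fast_hz, slow_hz, cycles_per_switch):
    """Switch rate at which a faster core that switches in software
    delivers the same useful cycles as a slower zero-overhead core."""
    return (fast_hz - slow_hz) / cycles_per_switch


SW_CLOCK_HZ = 125_000_000   # MicroBlaze figure quoted in the thread
HW_CLOCK_HZ = 70_000_000    # ZOTS core figure quoted in the thread
CYCLES_PER_SWITCH = 200     # assumption, not a measured value

rate = break_even_switch_rate(SW_CLOCK_HZ, HW_CLOCK_HZ, CYCLES_PER_SWITCH)
print(f"break-even at {rate:.0f} task switches per second")
```

Under these assumed numbers, the faster core with software switching stays ahead on raw throughput until the application needs several hundred thousand switches per second; a different per-switch cost moves that point proportionally.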






Re: hardware cpu scheduler, pointers to papers? - Author Unknown - Mar 25 18:04:00 2002

Ben Franchuk <> writes:

> At one time I had planned on doing task switching on my cpu by
> register bank select, but the fact that I also needed to store the flags
> and other current status like the MAR, MDR and IR killed that idea. I
> still like the idea of one task, one CPU and memory. No need for IRQs
> or task switching, as waiting loops would work fine. Timer flags would be
> needed, however.

I'm in the middle of designing a CPU with nonpreemptive task switching
in the CPU. I switch the program counter between (up to) 16 different
tasks. Since I switch only under program control, I don't need to
save flags (the code doesn't trust the flags after a task switch
instruction) or registers. I plan to have most of the registers be
"scratch", but a task could also have reserved registers that no other
task touches.

I haven't written much code for the thing yet (and it doesn't work
yet), but I think it will work well for the target application.

Carl Witty
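
The nonpreemptive scheme described above (one program counter per task, switched only by an explicit instruction) can be modelled roughly as follows. This is a toy Python sketch, not Carl's design: the `CoopCPU` class, the three-instruction "ISA" (`emit`, `switch`, `halt`) and the round-robin choice of the next task are all assumptions made for illustration. The only per-task state is the saved PC; no flags or registers are saved.

```python
NUM_TASKS = 16  # up to 16 tasks, as in the post


class CoopCPU:
    """Toy CPU whose only per-task state is a saved program counter."""

    def __init__(self, program):
        self.program = program         # shared instruction memory
        self.pcs = [None] * NUM_TASKS  # saved PC per task; None = unused slot
        self.current = 0
        self.trace = []                # (task, value) pairs from "emit"

    def spawn(self, task, entry):
        self.pcs[task] = entry

    def _next_task(self):
        # Round-robin over slots with a live PC (the policy is an assumption).
        for step in range(1, NUM_TASKS + 1):
            cand = (self.current + step) % NUM_TASKS
            if self.pcs[cand] is not None:
                return cand
        return None

    def run(self, max_steps=1000):
        for _ in range(max_steps):
            pc = self.pcs[self.current]
            if pc is None:             # current task has halted
                nxt = self._next_task()
                if nxt is None:
                    break              # no runnable tasks left
                self.current = nxt
                continue
            op, arg = self.program[pc]
            self.pcs[self.current] = pc + 1
            if op == "emit":
                self.trace.append((self.current, arg))
            elif op == "switch":       # explicit task switch instruction
                self.current = self._next_task()
            elif op == "halt":
                self.pcs[self.current] = None
        return self.trace


program = [
    ("emit", "a1"), ("switch", None), ("emit", "a2"), ("halt", None),
    ("emit", "b1"), ("switch", None), ("emit", "b2"), ("halt", None),
]
cpu = CoopCPU(program)
cpu.spawn(0, 0)
cpu.spawn(1, 4)
print(cpu.run())
```

Because a switch happens only where the code asks for one, each task knows exactly which values it may rely on afterwards, which is why nothing beyond the PC has to be preserved in this model.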



