This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
|
I have seen some references to hardware schedulers but can't recall any of the links at this time. I am familiar with the Transputer hardware thread architecture. I was wondering if anyone has experience with hardware schedulers and built-in CPU virtualization. I just saw a reference to a company in Washington state that sells an IP core that has a hardware scheduler. Didn't bookmark them, and I googled for them... dang.

Some of my questions:

How many context switches per second could they handle?

What about code running in an interrupt, or uninterruptible code, polling the serial port, etc.?

How about interfacing the cache with the scheduler, i.e. switching to a different context while data is fetched from main memory into the local cache, kind of like the Cray MTA design?

Is the BRAM and DRAM on the Xilinx parts usable for a partitioned process cache? Could each process context get its own space to store registers and a small amount of data? Then the problems of thrashing the cache from high levels of context switching could be mitigated.

Thanks, Sean. |
|
|
|
XYRON

> I just saw a reference to a company in Washington state that
> sells an ip core that has a hardware scheduler.

That would be Xyron Semi of Vancouver, WA. I saw their booth at the Embedded Systems Conf.

See http://www.xyronsemi.com/DataSheets/Zots.PDF. "Prior to Xyron's innovation, no processor has ever included the entire interrupt and task switching operations in hardware."

Xyron offers a DLX ISA (if I recall correctly) soft core with their "ZOTS" (TM) zero-overhead task switching, running at 70 MHz in a Virtex-II (no speed grade specified, no LUT count specified). (MicroBlaze runs at 125 MHz.)

Their literature also states this ZOTS equipped FPGA CPU provides the equivalent of 16 DMA channels -- this reminded me of the xr16's 16 PCs/DMA address registers.

But is it profitable to add hardware, possibly inserting multiplexers into critical paths, etc. to reduce the latency and overhead of a time consuming operation such as task switching? You must do the math: for your app, what are the overheads of interrupts, context switches, scheduling, etc., compared to what is the throughput (1/cycle time) or area impact of the additional hardware in non-task-switching code? And what are your interrupt and switching *latency* requirements and can they be met in software?

(Observe that if you can reduce and/or bound the interrupt response latency you may be able to simplify your peripheral designs, provide smaller FIFOs, etc.)

Of course, a 100% hardware tasking approach may well be less flexible than a software or hybrid approach.

Finally (and I don't know if "ZOTS" can be used this way) these days when an L2 cache miss may cost 100 issue slots or more, it may be profitable to switch threads on a cache miss or similar event, if the overhead is low enough. (Common knowledge.) But this doesn't require hardware scheduling, just multiple runnable threads.

BRAM

> Is the BRAM and DRAM on the xilinx parts useable for a
> partitioned process cache? Could each process context get its
> own space to store registers and a small amount of data? Then
> problems of thrashing the cache from high levels of context
> switching could be mitigated.

BRAM? Certainly. See www.fpgacpu.org/usenet/bb.html. For example,

"* multiple register file contexts, including user/kernel/interrupt handler shadow contexts, or multiple threads' contexts; ..."

"* hybrid schemes with small fast multiported register files/stacks using fine grained embedded RAMs, backed by larger multiple frame/context storage using large embedded RAM blocks, providing fast call/return and/or fast context switching; ..."

"* per-task state tables, including priorities, task state, next-task info, attributes, and masks; fast dedicated thread local storage; ..."

Now, separate (per process) partitions in caches may well provide more predictable latencies to each task, but on a workload where task switching is not *so* frequent, throughput would presumably be higher with one much larger unified cache (thrashed equally by all tasks :-). (Also observe that a partitioned data cache is a headache for mutable data -- what if several partitions each have the same line of data cached, and then one thread modifies its copy of the line?)

I suppose a lot depends upon how you value low interrupt or task switch latency relative to throughput.

By the way, there is no DRAM per se on Xilinx parts.

MORE HARDWARE THREADING

See also the recent context switching thread, http://groups.yahoo.com/group/fpga-cpu/message/746.

Perhaps someone up to speed on the Transputer design can let us know the details of its task switching design, overheads, and how it did the CSP rendezvous:

  channel!datum -- send datum to receiver
  channel?datum -- receive datum from sender

aJile aJ-100: http://www.ajile.com/downloads/aJ-100_PR3.pdf:

"Hardware based Java Threading Support
* Hard real-time, multi-threading kernel in hardware
* Threading operations are atomic including true Java synchronization
* Built-in deterministic scheduling queues
* Thread to thread yield in less than 1µsec
* Eliminates traditional RTOS layer"

See also http://citeseer.nj.nec.com/fiske95thread.html, Fiske: "Thread Prioritization: A Thread Scheduling Mechanism for Multiple-Context Parallel Processors".

See also Ekanadham et al, "An Architecture for Generalized Synchronization and Fast Switching" in chapter 12 of Iannucci et al, "Multithreaded Computer Architecture: A Summary of the State of the Art", Kluwer, 1994 (http://www.wkap.nl/prod/b/0-7923-9477-1?a=1).

Jan Gray, Gray Research LLC |
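As a rough way to "do the math" on throughput, here is a minimal Python sketch of the break-even point between a faster core that pays for every task switch in software and a slower core that switches for free. The 125 MHz and 70 MHz clock rates come from the message; the 100-cycle software switch cost and the function names are assumptions for illustration only, and the model ignores differences in work done per cycle, area, and latency.

```python
def effective_rate(clock_hz, switches_per_s, cycles_per_switch):
    """Cycles per second left for useful work after paying for task switches."""
    return clock_hz - switches_per_s * cycles_per_switch


def break_even_switch_rate(fast_clock_hz, slow_clock_hz, cycles_per_switch):
    """Switch rate above which the slower zero-overhead core does more useful work."""
    return (fast_clock_hz - slow_clock_hz) / cycles_per_switch


# Clock rates quoted in the message; the software switch cost is an assumption.
SOFTWARE_CORE_HZ = 125e6
HARDWARE_CORE_HZ = 70e6
ASSUMED_SWITCH_CYCLES = 100

rate = break_even_switch_rate(SOFTWARE_CORE_HZ, HARDWARE_CORE_HZ, ASSUMED_SWITCH_CYCLES)
print(f"break-even at about {rate:,.0f} task switches per second")

# At the break-even rate both cores deliver the same number of useful cycles.
useful = effective_rate(SOFTWARE_CORE_HZ, rate, ASSUMED_SWITCH_CYCLES)
print(f"useful cycles/s on the software-switched core: {useful:,.0f}")
```

Under these assumed numbers the slower zero-overhead core only wins on throughput at several hundred thousand switches per second; whether a latency requirement can be met in software is a separate question that this sketch does not model.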
|
A much better idea is to have a small RTOS kernel with all the kernel services in hardware. You will not mess up the pipeline of the processor, but the whole scheduling/message queues/... is done in hardware.

The only benefit of moving the kernel functions into the processor is that one can save time on the register bank switch. But the overall cost is too much.

E.g., at 2000 task switches/s, each task switch saves you (32+32)*2 down to maybe (32+32)*1 clock cycles (unless all task registers are stored in shadow register files, but that will definitely kill the Fmax). So you will save 2000*64 = 128,000 clock cycles per second; if you are running MicroBlaze at 125 MHz, that will save you 0.1% in performance.

Big deal, and the added stuff in the pipeline is probably costing you around 10% performance degradation.

Göran |
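The arithmetic in this message can be checked with a short Python sketch. The 2000 switches/s, 64 cycles per switch, 125 MHz, and roughly 10% pipeline penalty are the figures from the message; reading the 64 cycles as 32 register saves plus 32 restores at one cycle each is an assumption, as are the names used here.

```python
def switch_overhead_fraction(switches_per_s, cycles_per_switch, clock_hz):
    """Fraction of all clock cycles spent moving registers on task switches."""
    return switches_per_s * cycles_per_switch / clock_hz


SWITCHES_PER_S = 2000
CYCLES_PER_SWITCH = 64          # assumed: 32 saves + 32 restores, 1 cycle each
CLOCK_HZ = 125_000_000          # MicroBlaze clock rate quoted in the thread
ESTIMATED_FMAX_PENALTY = 0.10   # the "around 10%" guess from the message

saved = SWITCHES_PER_S * CYCLES_PER_SWITCH
fraction = switch_overhead_fraction(SWITCHES_PER_S, CYCLES_PER_SWITCH, CLOCK_HZ)

print(f"cycles saved per second: {saved}")
print(f"share of a 125 MHz clock: {fraction:.4%}")
print(f"net change if Fmax drops 10%: {fraction - ESTIMATED_FMAX_PENALTY:+.2%}")
```

The saving only starts to matter when the switch rate or the per-switch cost is orders of magnitude higher than in this example.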
|
I didn't read through my previous message before sending it. What I mean is that if you really need kernel services done in hardware, then implement them as a peripheral to the processor instead of forcing them into the processor pipeline. The extra speed that is achieved by moving the services into the processor core is lost in a lower Fmax for the processor.

Sorry for the confusion,

Göran |
|
Jan Gray wrote:
> See http://www.xyronsemi.com/DataSheets/Zots.PDF. "Prior to Xyron's
> innovation, no processor has ever included the entire interrupt and task
> switching operations in hardware."

Did not some of the early (1960s) machines do that? And did microcoded machines not do that already?

> Xyron offers a DLX ISA (if I recall correctly) soft core with their
> "ZOTS" (TM) zero-overhead task switching, running at 70 MHz in a
> Virtex-II (no speed grade specified, no LUT count specified).
> (MicroBlaze runs at 125 MHz.)

If they are going after a real-time market, the raw speed of the memory, not the CPU, could be the limiting factor.

> Their literature also states this ZOTS equipped FPGA CPU provides the
> equivalent of 16 DMA channels -- this reminded me of the xr16's 16
> PCs/DMA address registers.
>
> But is it profitable to add hardware, possibly inserting multiplexers
> into critical paths, etc. to reduce the latency and overhead of a time
> consuming operation such as task switching? You must do the math: for
> your app, what are the overheads of interrupts, context switches,
> scheduling, etc., compared to what is the throughput (1/cycle time) or
> area impact of the additional hardware in non-task-switching code? And
> what are your interrupt and switching *latency* requirements and can
> they be met in software?

This is really the realm of the entire computer system design. One can't judge that until one knows the CPU's use. Real-time control is a different design than a server network or a desktop PC.

> (Observe that if you can reduce and/or bound the interrupt response
> latency you may be able to simplify your peripheral designs, provide
> smaller FIFOs, etc.)
>
> Of course, a 100% hardware tasking approach may well be less flexible
> than a software or hybrid approach.

But let's not forget that most peripheral devices are well defined today (ignoring Win products), thus I/O is not likely to see a major change of usage.

> Finally (and I don't know if "ZOTS" can be used this way) these days
> when an L2 cache miss may cost 100 issue slots or more, it may be
> profitable to switch threads on a cache miss or similar event, if the
> overhead is low enough. (Common knowledge.) But this doesn't require
> hardware scheduling, just multiple runnable threads.

It is not the task switching that is hard to implement; it is the re-scheduling of tasks that is the slowdown. And how do you specify different algorithms for different usage -- a real-time OS, a server OS, a desktop OS -- in hardware?

--
Ben Franchuk - Dawn * 12/24 bit cpu *
www.jetnet.ab.ca/users/bfranchuk/index.html |
|
Folks,

So far, I have just been a quiet observer of this group. But since you started writing about Xyron, I thought I'd contribute a little bit. I haven't looked in detail at all the e-mails you have sent, but there are some .pdf datasheets I would like to send you.

In a nutshell, we are a start-up company, currently about 27 people, with a technology called ZOTS (Zero Overhead Task Switch). ZOTS moves the task management and task switching functionality of a microprocessor from software to hardware. With ZOTS, your microprocessor literally spends 100% of its time processing tasks, as there is no interrupt or task switching overhead. Latency is also reduced to at most 5 clock cycles, with an average of 2 or 3 clock cycles.

Please respond to me personally so I can send you some more information about our technology. I strongly encourage you to also check out our website, http://www.xyronsemi.com.

Regards,

Devu

-----Original Message-----
From: Ben Franchuk
Sent: Friday, March 22, 2002 8:20 AM
Subject: Re: [fpga-cpu] hardware cpu scheduler, pointers to papers? |
|
>technology called ZOTS (Zero Overhead Task Switch). ZOTS moves the task
>management and task switching functionality of a microprocessor from
>software to hardware. With ZOTS, your microprocessor literally spends 100%

I once had the wild idea of task switching processor cores in and out of an FPGA. Imagine an RTOS where each task not only gets its own processor core, but each task can have a _different_ core... one runs a Z-80 clone, another runs a PIC clone, etc.

Never said it was a good idea...

newell |
|
Hi,

This is nothing new. There is a company in Sweden, RealFast, that has a kernel in hardware for FPGAs; they have been doing this for decades. They have ported OSE and VxWorks to run on top of their hardware kernel. I also think that they are working on a Linux port.

Their website is http://www.realfast.se/rfos/products/s16.shtml

Göran

Devu Pandit wrote:
> In a nutshell, we are a start-up company, currently about 27 people with
> technology called ZOTS (Zero Overhead Task Switch). ZOTS moves the task
> management and task switching functionality of a microprocessor from
> software to hardware. |
|
Scott Newell wrote:
> I once had the wild idea of task switching processor cores in and out of an
> FPGA. Imagine an RTOS where each task not only gets its own processor
> core, but each task can have a _different_ core...one runs a Z-80 clone,
> another running a PIC clone, etc.

At one time I had planned on doing task switching on my CPU by register bank select, but the fact that I also needed to store the flags and other current status like the MAR, MDR and IR killed that idea. I still like the idea of one task, one CPU and memory. No need for IRQs or task switching, as waiting loops would work fine. Timer flags would be needed, however.

--
Ben Franchuk - Dawn * 12/24 bit cpu *
www.jetnet.ab.ca/users/bfranchuk/index.html |
|
"Jan Gray" <> writes: > By the way, there is no DRAM per se on Xilinx parts. I'm guessing Sean used DRAM as an abbreviation for "Distributed RAM". Carl Witty |
|
The principle of HW task management is similar. However, the RealFast solution still has significant overhead. With the Xyron solution, the overhead is literally zero cycles.

-----Original Message-----
From: Goran Bilski
Sent: Friday, March 22, 2002 9:41 AM
Subject: Re: [fpga-cpu] hardware cpu scheduler, pointers to papers? |
|
|
|
Ben Franchuk <> writes:
> At one time I had planned on doing task switching on my CPU by
> register bank select, but the fact that I also needed to store the flags
> and other current status like the MAR, MDR and IR killed that idea. I
> still like the idea of one task, one CPU and memory. No need for IRQs
> or task switching, as waiting loops would work fine. Timer flags would be
> needed, however.

I'm in the middle of designing a CPU with nonpreemptive task switching in the CPU. I switch the program counter between (up to) 16 different tasks. Since I switch only under program control, I don't need to save flags (the code doesn't trust the flags after a task switch instruction) or registers. I plan to have most of the registers be "scratch", but a task could also have reserved registers that no other task touches.

I haven't written much code for the thing yet (and it doesn't work yet), but I think it will work well for the target application.

Carl Witty
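To make the idea of switching only the program counter under program control concrete, here is a toy Python model. It is only a sketch, not the design described above: the instruction names ("emit", "switch"), the round-robin choice of the next task, and the class itself are all assumptions made for illustration. The one thing taken from the message is that each task keeps nothing but its own program counter, with up to 16 tasks.

```python
MAX_TASKS = 16  # the message mentions up to 16 tasks


class CooperativeCore:
    """Toy model of program-controlled task switching: one saved PC per task."""

    def __init__(self, programs):
        if len(programs) > MAX_TASKS:
            raise ValueError("too many tasks")
        self.programs = programs            # one instruction list per task
        self.pcs = [0] * len(programs)      # the only per-task state kept
        self.current = 0                    # index of the running task
        self.trace = []                     # (task, value) pairs, for inspection

    def _switch(self):
        # Assumed round-robin choice; no flags or registers are saved.
        self.current = (self.current + 1) % len(self.programs)

    def step(self):
        task = self.current
        program = self.programs[task]
        if self.pcs[task] >= len(program):  # a finished task just yields
            self._switch()
            return
        op, arg = program[self.pcs[task]]
        self.pcs[task] += 1                 # saved PC already points past the switch
        if op == "emit":
            self.trace.append((task, arg))
        elif op == "switch":
            self._switch()

    def run(self, max_steps):
        for _ in range(max_steps):
            self.step()
        return self.trace


programs = [
    [("emit", "a0"), ("switch", None), ("emit", "a1")],
    [("emit", "b0"), ("switch", None), ("emit", "b1")],
]
core = CooperativeCore(programs)
print(core.run(max_steps=20))
```

Because a task gives up the processor only at its own "switch" instruction, it can arrange to have nothing live in the flags or scratch registers at that point, which is why the model saves no state beyond the per-task program counter.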