This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
> I'd say for sure the main reason MicroBlaze runs 125 MHz is
>
> ******** Pipelining **********

Very likely.

> The MicroBlaze ISA must allow for lots and lots of pipeline stages.

It would surprise me if it was much different from the 'canonical'
4/5-stage pipelined RISC (IF, ID, EX/MEM, WB).

Hmm.
I just took a look at the architecture graph and it does not really look
like a pipelined design.
Until I hear different I'll assume it is, though, since that makes the
most sense (IMHO) for a RISC.

> Registers around everything. Several cycles from load
> to use, several cycles for taken branches. Forwarding ALU result

If it is indeed pipelined, I'd guess it uses 2-3 cycle branch delay slots
and probably about the same number of cycles for load-to-use.

> into the next cycle's ALU, to allow a pipelined register file.

Naturally.

> The speed of the fastest grade Virtex-II doesn't hurt either.
> I wonder how fast it would go in a Spartan-II -5?

I'm aiming for 100 MHz in one of those (Burch Electronics kit) with my own
(16 bit at the moment) RISC. Unfortunately, the result MUX (selecting
which EX unit result should go to the register file and the EX forwarding
MUX) is taking so much time (yes, I've tried TBUFs, but that doesn't seem
to help much, if anything) that I might need to pipeline it.
That would mean that even with forwarding there would be a one cycle
EX-to-use, which isn't too nice (but for my main application it might be
OK, especially if the condition flags can be available earlier).
Even now, I'm close to the 100 MHz target, but there's quite a bit to do
yet (not even the pipeline itself is near finished).

In case anyone is interested, my RISC has (in its current incarnation) a
two-operand, predicated ISA: condition codes can cause most instructions
to be skipped (well, to have no effect at least) without branching.
Most instructions can substitute 4 bit constants (which I intended to make
indirect at least for logic instructions) for one of the registers (of
which there are 16), and those constants can be extended via a prefix
instruction.
There is intended to be hardware support for multiplication (getting one
result bit per cycle) and I'll probably add zero cycle overhead looping.

This processor (especially if I need to pipeline that MUX) is likely not
going to be an easy target for a compiler, or perhaps even for
hand-written assembly code. I mostly intend it to be a kind of graphics
processor, though, so that's not a major worry.

I'm doing this in my spare time, for no particular reason, so if the
processor ever gets to a usable state it will be released under some kind
of free license.

-- 
  Chalmers University   | Why are these |  e-mail:
     of Technology      |  .signatures  |
                        | so hard to do |  WWW:     rand.thn.htu.se
   Gothenburg, Sweden   |     well?     |           (fVDI, MGIFv5, QLem)

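One way to read "one result bit per cycle" is the classic shift-and-add
multiplier, where each cycle consumes one multiplier bit and retires one
finished low-order product bit. The Python model below is only a sketch
of that general technique, assuming unsigned 16-bit operands; the name
shift_add_multiply and the register names are made up for illustration
and say nothing about how the processor described above is actually
built.

```python
def shift_add_multiply(a: int, b: int, width: int = 16) -> int:
    """Model an unsigned width x width shift-and-add multiplier.

    Hypothetical illustration: one loop iteration stands for one clock
    cycle, and each cycle finalizes exactly one low-order product bit.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    acc = 0   # running upper part of the partial product
    low = 0   # low-order product bits that are already final

    for cycle in range(width):
        # Conditionally add the multiplicand, steered by one multiplier bit.
        if (b >> cycle) & 1:
            acc += a
        # Later additions can no longer change the LSB of acc,
        # so it is retired as product bit number `cycle`.
        low |= (acc & 1) << cycle
        acc >>= 1

    # After `width` cycles acc holds the high half of the product.
    return (acc << width) | low


if __name__ == "__main__":
    print(shift_add_multiply(3, 5))
```

In hardware this costs one adder, a shift register and a small counter,
which is why it is attractive when LUTs are scarce; the price is
`width` cycles per multiply.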
Just reading the news tonight: can FPGAs even have a benchmark
that favors nobody? "... only half as many LUTs as ..." I can bet
$ to donuts that fast ripple carry is used, as well as dual ported
memory. RAM-based FPGAs have fast carry, while fuse-based logic has
lots of simple gates and multiplexers. I think one needs a set of
benchmarks for features all FPGAs have in common. Routing too needs to
be benchmarked. I have had an early CPU design that used 90% of an
Altera 10K FPGA. I could not get it to route because the logic was too
complex, yet looking at the logic block count it fits.
What about TTL schematic conversion? Altera has a nice library
of TTL macros. Quicklogic on the other hand has only a few TTL-like
macros, yet better small gate logic (like and31 - 3 input AND gate with
1 negated input). What about design portability? Can I transfer my
design from brand x to brand y?
Ben.
-- 
"We do not inherit our time on this planet from our parents...
 We borrow it from our children."
"Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk

I'd say for sure the main reason MicroBlaze runs 125 MHz is

******** Pipelining **********

The MicroBlaze ISA must allow for lots and lots of pipeline stages.
Registers around everything. Several cycles from load
to use, several cycles for taken branches. Forwarding ALU result
into the next cycle's ALU, to allow a pipelined register file.
Etc.

The speed of the fastest grade Virtex-II doesn't hurt either.
I wonder how fast it would go in a Spartan-II -5?

--Mike

Ben Franchuk wrote:
>
> Just reading the news tonight: can FPGAs even have a benchmark
> that favors nobody? "... only half as many LUTs as ..." I can bet
> $ to donuts that fast ripple carry is used, as well as dual ported
> memory. RAM-based FPGAs have fast carry, while fuse-based logic has
> lots of simple gates and multiplexers. I think one needs a set of
> benchmarks for features all FPGAs have in common. Routing too needs to
> be benchmarked. I have had an early CPU design that used 90% of an
> Altera 10K FPGA. I could not get it to route because the logic was too
> complex, yet looking at the logic block count it fits.
> What about TTL schematic conversion? Altera has a nice library
> of TTL macros. Quicklogic on the other hand has only a few TTL-like
> macros, yet better small gate logic (like and31 - 3 input AND gate with
> 1 negated input). What about design portability? Can I transfer my
> design from brand x to brand y?
> Ben.
> --
> "We do not inherit our time on this planet from our parents...
>  We borrow it from our children."
> "Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk

Mike Butts wrote:
>
> I'd say for sure the main reason MicroBlaze runs 125 MHz is
>
> ******** Pipelining **********
>
> The MicroBlaze ISA must allow for lots and lots of pipeline stages.
> Registers around everything. Several cycles from load
> to use, several cycles for taken branches. Forwarding ALU result
> into the next cycle's ALU, to allow a pipelined register file.
> Etc.

You are right about that. 125 MHz is an 8 ns cycle, so there can be
only very few logic elements between pipeline stages.
-- 
"We do not inherit our time on this planet from our parents...
 We borrow it from our children."
"Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk

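The 8 ns figure, and what it implies for logic depth, can be checked
with a few lines of Python. This is a rough sketch only: the register
overhead and per-level delay used in the example call are placeholder
numbers picked for illustration, not data-sheet values for any Xilinx
part, and the function names are hypothetical.

```python
def cycle_time_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / f_mhz


def logic_levels(f_mhz: float, reg_overhead_ns: float, level_ns: float) -> int:
    """Whole logic levels (LUT plus routing) that fit between two registers.

    reg_overhead_ns: clock-to-out plus setup time (assumed value).
    level_ns:        delay of one LUT plus its routing (assumed value).
    """
    budget = cycle_time_ns(f_mhz) - reg_overhead_ns
    return max(0, int(budget // level_ns))


if __name__ == "__main__":
    # Placeholder delays: 2.0 ns register overhead, 1.5 ns per logic level.
    for f in (100.0, 125.0):
        print(f, "MHz:", cycle_time_ns(f), "ns,",
              logic_levels(f, 2.0, 1.5), "levels")
```

With real timing numbers from the vendor tools the same arithmetic shows
how quickly the level budget shrinks as the clock target rises.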
http://www.fpgacpu.org/log/sep00.html#000919:

"Smaller is faster: ... An FPGA-optimized RISC processor core should
target a minimum cycle time of approximately the execution stage
recurrence cycle time (e.g. operand-register clock-to-out delay plus
adder delay plus result mux delay plus forwarding mux delay plus
operand-register setup time), which is less than 15 ns in a slow
Virtex device."

Jan Gray, Gray Research LLC

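The recurrence in the quote is just a sum of five delays around the
execute-stage loop. A small Python sketch makes the bookkeeping
explicit. The class name ExStageDelays is made up here, and the
individual delays in the example are invented so that they add up to
less than 15 ns; they are not measurements of any real device.

```python
from dataclasses import dataclass


@dataclass
class ExStageDelays:
    """Delays (ns) around the execute-stage recurrence loop (hypothetical)."""
    clk_to_out_ns: float    # operand-register clock-to-out
    adder_ns: float         # adder (carry chain) delay
    result_mux_ns: float    # result mux delay
    forward_mux_ns: float   # forwarding mux delay
    setup_ns: float         # operand-register setup time

    def recurrence_ns(self) -> float:
        """Minimum cycle time if the whole loop must fit in one cycle."""
        return (self.clk_to_out_ns + self.adder_ns + self.result_mux_ns
                + self.forward_mux_ns + self.setup_ns)

    def fmax_mhz(self) -> float:
        """Clock frequency bound implied by the recurrence."""
        return 1000.0 / self.recurrence_ns()


if __name__ == "__main__":
    # Invented example numbers, chosen only to land below 15 ns.
    d = ExStageDelays(1.5, 6.0, 3.0, 3.0, 1.0)
    print(d.recurrence_ns(), "ns ->", round(d.fmax_mhz(), 1), "MHz")
```

Putting a register inside this loop, for example after the result mux,
shortens the cycle but adds a cycle of result-to-use latency, which is
the trade-off discussed elsewhere in this thread.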
Johan Klockars wrote:
>
> > I'd say for sure the main reason MicroBlaze runs 125 MHz is
> >
> > ******** Pipelining **********
>
> Very likely.
>
> > The MicroBlaze ISA must allow for lots and lots of pipeline stages.
>
> It would surprise me if it was much different from the 'canonical'
> 4/5-stage pipelined RISC (IF, ID, EX/MEM, WB).
>
> Hmm.
> I just took a look at the architecture graph and it does not really look
> like a pipelined design.
> Until I hear different I'll assume it is, though, since that makes the
> most sense (IMHO) for a RISC.

> I'm aiming for 100 MHz in one of those (Burch Electronics kit) with my own
> (16 bit at the moment) RISC. Unfortunately, the result MUX (selecting
> which EX unit result should go to the register file and the EX forwarding
> MUX) is taking so much time (yes, I've tried TBUFs, but that doesn't seem
> to help much, if anything) that I might need to pipeline it.

> Even now, I'm close to the 100 MHz target, but there's quite a bit to do
> yet (not even the pipeline itself is near finished).

Xilinx FPGAs do map well to RISC CPU designs: dual-port memory, fast
carry and tristate lines. Other or older FPGAs are not so lucky. But now
we get into a gray area: how much of the logic is device specific for
Xilinx? And while they make a fast RISC CPU, how are the FPGAs at other
logic styles and random logic?

> There is intended to be hardware support for multiplication (getting one
> result bit per cycle) and I'll probably add zero cycle overhead looping.

What is needed is a 'loop r'/'loop #k' instruction that just repeats
the next instruction in the pipeline here.

My question to all the people with knowledge is: what studies on
dynamic code have been done to find the length of basic blocks in
typical programs? If 75% of repeated code is 4 instructions or less
(including a branch), having a pipeline >6 may be unwise.
OK, F-CPU and FPGA guys, any thoughts?
Ben.
-- 
"We do not inherit our time on this planet from our parents...
 We borrow it from our children."
"Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk
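The worry about short loops on deep pipelines can be put into numbers
with a deliberately crude model. The sketch below assumes a single-issue
machine that otherwise retires one instruction per cycle, a fixed number
of cycles lost on every taken branch, and a loop whose last instruction
is the closing branch. The function name loop_efficiency and the
penalties tried are illustrative assumptions, not measurements.

```python
def loop_efficiency(body_len: int, branch_penalty: int) -> float:
    """Fraction of cycles doing useful work in a steady-state loop.

    body_len:       instructions per iteration, including the closing branch.
    branch_penalty: cycles lost on each taken branch (assumed to grow
                    with pipeline depth).
    The closing branch itself is counted as overhead, not useful work.
    """
    if body_len < 1:
        raise ValueError("loop body needs at least the branch itself")
    useful = body_len - 1
    total = body_len + branch_penalty
    return useful / total


if __name__ == "__main__":
    # A 4-instruction loop (3 useful + 1 branch) at various penalties.
    for penalty in (0, 1, 2, 3, 4):
        print("penalty", penalty, "->", round(loop_efficiency(4, penalty), 2))
```

Under these assumptions a zero-overhead loop instruction such as the
'loop r' idea removes both the branch and its penalty, so the same three
useful instructions would take three cycles per iteration.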