This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
I'd like to practice floorplanning and working with the Xilinx tools, so I grabbed http://doc.union.edu/237/Projects/Mips/Vhdl/, an 8-bit constrained MIPS ISA implementation in VHDL by J. Hamblen. I didn't really look at the source much, and I haven't run any test benches against it. My primary goal is to get comfortable with place and route and with the Xilinx floorplanner. I fixed a couple of lines in the source, compiled it with WebPACK 4.2, and got 57 MHz in a Spartan-II -6 speed grade part. The files I used are here: http://www.xyke.com/hamblen-mips.00.tar.gz

-- Where do I look for bottlenecks, either in the design or in the layout? This is from a place-and-route and floorplanning perspective, so recognizing where pipelining would help is good, but I'd rather restructure the existing logic only minimally and get most of the performance gains from the layout.
-- **HOW** do I look for bottlenecks using the Xilinx tools? I am not looking for hand-holding, just some suggestions from people who already know.
-- What techniques does one employ with the floorplanner and the automatic place and route to speed up a design?
-- Are there easy heuristics to apply to this problem, and what are they?

I read the online help for the floorplanner as well as
http://www.xilinx.com/support/techxclusives/timing6.htm
http://support.xilinx.com/support/techtips/documents/timing/presentation/timingcsts3_1i/sld001.htm
and, of course, the article linked from fpgacpu.org:
http://www.eedesign.com/isd/OEG20020227S0052

This process seems very much black magic and applied: something you learn by doing rather than by being all Platonic. Books, articles, suggestions? It would be great if we fleshed out this floorplanning problem and turned it from a black art into a previously solved problem.

Thanks,
Sean.
|
|
|
Sean wrote:
> I'd like to practice floor planning and working with Xilinx tools so I grabbed

Knowing the hardware and your logic design is the best thing. Floorplanning may speed things up over a random layout, but only if the design FITS well in the FPGA. Also, many logic examples could favor features of a specific FPGA family or supplier. An 8-bit adder may fit exactly in a row of 8 macrocells and be fast, yet an 8-bit adder with carry out could need an extra logic cell for buffering and be only half the speed. It is this type of logic where floorplanning has the most room for speed improvement. Also, I expect that for speed an FPGA can't be more than about 33% full, as beyond that the FAST lines could all be used up.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
|
|
Ben wrote:
> An 8-bit adder may fit exactly in a row of 8 macrocells and be fast,
> yet an 8-bit adder with carry out could need an extra logic cell for
> buffering and be only half the speed.

You're not going to lose half the speed just to get a carry out. If the tools don't do the right thing for you automatically, just force it to be a 9-bit adder and use the ninth output bit as the carry out (with zeros for the ninth bit of each input). Unless something really strange is going on, this shouldn't slow it down by more than 13% relative to the 8-bit adder without carry out.

In my experience, unless you're doing something a lot more exotic than an adder with a carry out, the tools do the mapping just fine. I have only had to tweak the placement.
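A minimal bit-level sketch of the 9-bit trick, in Python rather than VHDL because it only models the arithmetic, not the FPGA mapping. The names `ripple_add` and `add8_with_carry` are hypothetical. It checks exhaustively that bit 8 of a 9-bit sum of zero-extended 8-bit operands is exactly the carry out. The last line reports the extra carry-chain length (9 stages versus 8); that only approximates the speed cost if the carry chain dominates the critical path, but it is consistent with the "no more than 13%" estimate.

```python
def ripple_add(a: int, b: int, width: int) -> int:
    """Bit-level model of a `width`-bit ripple-carry adder with carry-in = 0.

    Only the low `width` bits of the sum are returned; the carry out of the
    top bit is dropped, just as a physical `width`-bit adder would drop it.
    """
    result, carry = 0, 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i          # sum bit i
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry into bit i + 1
    return result


def add8_with_carry(a: int, b: int) -> tuple[int, int]:
    """8-bit add with carry out, built from a 9-bit adder.

    Both operands are 8-bit values, so bit 8 of each input is already 0
    (the 'zeros for the ninth bit'). Bit 8 of the 9-bit sum is then
    exactly the carry out of the 8-bit addition.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be 8-bit unsigned values")
    s9 = ripple_add(a, b, 9)
    return s9 & 0xFF, (s9 >> 8) & 1


# Exhaustive check against ordinary integer arithmetic.
for a in range(256):
    for b in range(256):
        assert add8_with_carry(a, b) == ((a + b) & 0xFF, (a + b) >> 8)

# One extra carry stage on top of eight: the chain is 9/8 as long.
print(f"extra carry stages: {(9 - 8) / 8:.1%}")
```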
|
Eric Smith wrote:
> You're not going to lose half the speed just to get a carry out.
>
> In my experience, unless you're doing something a lot more exotic than
> an adder with a carry out, the tools do the mapping just fine. I have
> only had to tweak the placement.

Like I said, know the hardware. I use the other brand of FPGA. In my case, CLBs are arranged in blocks of 8. With this FPGA, the carry out has to skip over the next block of macrocells before it can be routed to a CLB. I have a 24-bit CPU, and no amount of floorplanning can speed things up, because the data path is too wide to fit nicely. The adder example was just picked at random; every FPGA has its own features and flaws.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
Ben wrote:
> Like I said, know the hardware. I use the other brand of FPGA. In my
> case, CLBs are arranged in blocks of 8. With this FPGA, the carry out
> has to skip over the next block of macrocells before it can be routed
> to a CLB.

Altera doesn't support a hardwired carry chain beyond 8 bits, and has a huge performance penalty for more than 8? I didn't know that, and am now *much* less likely to ever consider Altera parts. Most of my designs have counters and adders wider than 8 bits in the critical paths. It's hard for me to believe that they would really do something this dumb.

Eric
|
Eric Smith wrote:
> Altera doesn't support a hardwired carry chain beyond 8 bits, and has
> a huge performance penalty for more than 8? I didn't know that, and
> am now *much* less likely to ever consider Altera parts. Most of my
> designs have counters and adders wider than 8 bits in the critical
> paths. It's hard for me to believe that they would really do something
> this dumb.

It is not that you can't have adders larger than 8 bits; it is just that routing for adders could use up the 'fast' lines quickly. It looks like Altera routes more slowly, but has more consistent delays and could route higher-density designs better.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
|
|
Ben wrote:
> It is not that you can't have adders larger than 8 bits; it is just
> that routing for adders could use up the 'fast' lines quickly. It looks
> like Altera routes more slowly, but has more consistent delays and could
> route higher-density designs better.

Yeah, but you said that a 9-bit adder would run at half the speed of an 8-bit adder. For me, that makes the parts useless. I don't *have* high-density designs with only 8-bit-wide data paths. I'll stick to Xilinx; they understand that people want wide data paths.
|
|
|
Eric Smith wrote:
> Yeah, but you said that a 9-bit adder would run at half the speed of
> an 8-bit adder. For me, that makes the parts useless. I don't *have*
> high-density designs with only 8-bit-wide data paths. I'll stick
> to Xilinx; they understand that people want wide data paths.

The logic is 1 standard gate delay every 8 bits:

  8 bits:  1 unit delay  + ripple carry
  16 bits: 2 units delay + ripple carry
  32 bits: 4 units delay + ripple carry

I have not proved this in any way, but that looks to be the case. With my 24-bit CPU I can be 98% full and still route, keep a standard pin layout, and get a reasonable speed.

I have not used Xilinx, because when I got my FPGA development board Xilinx did not have:
1) free development software that was not crippled, or
2) a low-cost FPGA board with about 500 CLBs in a chip.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
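Ben's rule of thumb can be written as a small delay model. This is a sketch under stated assumptions: reading "+ ripple carry" as a per-bit ripple term is an interpretation, and `t_unit` and `t_ripple` are hypothetical parameters, not measured values. The function names are likewise illustrative.

```python
import math


def block_units(width_bits: int, block_size: int = 8) -> int:
    """Number of 8-bit blocks a carry chain of `width_bits` spans.

    Ben's rule of thumb charges one 'standard gate delay' unit per block.
    """
    return math.ceil(width_bits / block_size)


def adder_delay(width_bits: int, t_unit: float, t_ripple: float,
                block_size: int = 8) -> float:
    """Rule-of-thumb adder delay: one t_unit per block plus t_ripple per bit.

    t_unit and t_ripple are hypothetical parameters, not measured values.
    """
    return block_units(width_bits, block_size) * t_unit + width_bits * t_ripple


for w in (8, 9, 16, 24, 32):
    print(f"{w:2d} bits: {block_units(w)} unit(s) + ripple over {w} bits")
```

Under this model a 9-bit adder is charged one more unit than an 8-bit one; how much that actually costs depends entirely on `t_unit` relative to `t_ripple`.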
|
On Sun, 24 Mar 2002, Eric Smith wrote:
> Yeah, but you said that a 9-bit adder would run at half the speed of
> an 8-bit adder. For me, that makes the parts useless. I don't *have*
> high-density designs with only 8-bit-wide data paths. I'll stick
> to Xilinx; they understand that people want wide data paths.

I don't see a big performance degradation when the carry chain leaves the LAB.

I built a little variable-width adder with registered inputs and outputs to explore this. After setting various options to get Max+Plus II to use the carry chain, and setting the device to the (old and slow) Flex 10K20-4 that I have on my UP1 proto board, this is what the timing analyzer says it'll do:

  Width     Max clock
  8 bits    101 MHz
  9 bits     97 MHz
  16 bits    74 MHz
  18 bits    67 MHz
  32 bits    50 MHz
  36 bits    45 MHz

The 8-bit version fits in one LAB, so the carry delay path includes 7 inter-LE delays. The 32-bit version fits in 4 LABs, so the carry delay path includes 28 inter-LE delays and 3 inter-LAB delays. In the floorplanner, you can see the carry chain go through one LAB, jump to the LAB two over, and continue. I don't know how the carry chain is carried between LABs, but it seems to happen reasonably efficiently. In more modern Altera parts, a LAB has 10 LEs, I think.

Considering that the board has a 27 MHz oscillator, I'm not complaining. Of course, things will slow down when one adds actual functionality....

jake
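The inter-LE and inter-LAB counts follow from simple arithmetic. A minimal sketch, assuming one adder bit per LE, bits filling LABs in order, and 8 LEs per LAB as on the FLEX 10K used here; `carry_hops` is an illustrative name, and `les_per_lab` lets you try the 10-LE LABs mentioned (tentatively) for newer parts.

```python
import math


def carry_hops(width_bits: int, les_per_lab: int = 8) -> tuple[int, int]:
    """Count carry-chain hops for an adder packed one bit per LE.

    Returns (inter_le, inter_lab): hops between LEs inside a LAB, and hops
    that cross from one LAB to the next. Assumes bits fill LABs in order.
    """
    labs = math.ceil(width_bits / les_per_lab)
    total_hops = width_bits - 1          # a carry passes between each pair of adjacent bits
    inter_lab = labs - 1                 # one hop per LAB boundary crossed
    inter_le = total_hops - inter_lab    # the remaining hops stay inside a LAB
    return inter_le, inter_lab


for w in (8, 9, 16, 18, 32, 36):
    le, lab = carry_hops(w)
    print(f"{w:2d} bits: {le:2d} inter-LE hops, {lab} inter-LAB hops")
```

For 8 bits this gives 7 inter-LE hops and no inter-LAB hop; for 32 bits, 28 and 3, matching the counts above.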
|
|
|
> 8 bits: 101 MHz
> 9 bits: 97 MHz

That seems quite reasonable: less than 4% degradation for the extra bit. Ben, why were you claiming 50%?
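The 4% figure can be checked against the timing-analyzer numbers above: (101 - 97) / 101 is about 3.96%. Because propagation delays add up along a carry chain, the clock period (1000 / f ns for f in MHz) is the more natural quantity to compare, so this sketch also prints the extra period per added bit between the widths that were measured. It uses only the numbers from the thread; the function names are illustrative.

```python
# Max+Plus II timing-analyzer results for the Flex 10K20-4 (width in bits -> MHz).
FMAX_MHZ = {8: 101, 9: 97, 16: 74, 18: 67, 32: 50, 36: 45}


def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def fmax_drop_percent(f_before: float, f_after: float) -> float:
    """Percentage drop in maximum clock frequency."""
    return (f_before - f_after) / f_before * 100.0


print(f"8 -> 9 bits: {fmax_drop_percent(FMAX_MHZ[8], FMAX_MHZ[9]):.2f}% lower Fmax")

# Delays add along the carry chain, so compare periods rather than frequencies.
widths = sorted(FMAX_MHZ)
for narrow, wide in zip(widths, widths[1:]):
    extra_ns = period_ns(FMAX_MHZ[wide]) - period_ns(FMAX_MHZ[narrow])
    print(f"{narrow:2d} -> {wide:2d} bits: +{extra_ns:.2f} ns, "
          f"{extra_ns / (wide - narrow):.2f} ns per added bit")
```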
|
Eric Smith wrote:
> > 8 bits: 101 MHz
> > 9 bits: 97 MHz
>
> That seems quite reasonable: less than 4% degradation for the extra
> bit.
>
> Ben, why were you claiming 50%?

It's a ballpark figure. Routing plays a large part here. If the carry-out signal happens to be placed on a slow line a large distance away, things slow down. I have not done floorplanning, but carry paths could be aided by floorplanning.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
Jacob Nelson wrote:
> I don't see a big performance degradation when the carry chain leaves the
> LAB.
>
> Considering that the board has a 27 MHz oscillator, I'm not complaining.
> Of course, things will slow down when one adds actual functionality....

Try 24 bits in a 10K10! With my CPU I have had a speed range from 20 MHz to 14 MHz due to routing delays, depending on I/O assignments and how full the FPGA is (95 to 98%). Note that I use 6809-style timing, so that is a 3.5 MHz to 5 MHz real memory cycle.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
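The clock-to-memory-cycle conversion is a divide by four: 20 / 4 = 5 and 14 / 4 = 3.5. A small sketch, assuming four clock periods per memory cycle (the 6809 derives its E and Q bus clocks from the oscillator divided by four); `CLOCKS_PER_MEMORY_CYCLE` and `memory_cycle_mhz` are illustrative names, not anything from Ben's design.

```python
# Assumption: a 6809-style bus cycle spans four clock periods, which matches
# the 20 MHz -> 5 MHz and 14 MHz -> 3.5 MHz figures in the message.
CLOCKS_PER_MEMORY_CYCLE = 4


def memory_cycle_mhz(clock_mhz: float) -> float:
    """Memory-cycle rate for a given CPU clock under the assumed 4:1 ratio."""
    return clock_mhz / CLOCKS_PER_MEMORY_CYCLE


for clock in (20.0, 14.0):
    rate = memory_cycle_mhz(clock)
    print(f"{clock:4.1f} MHz clock -> {rate:.1f} MHz memory cycle ({1000.0 / rate:.0f} ns)")
```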