
This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).

A High Performance 32-bit ALU - Tommy Thorn - Jul 26 14:43:33 2007


I decided to post the "best" ALU I've seen so far in the hope some people here can profit from it and help me improve it.

Obviously what is "fast" and "general purpose" is relative, but it should be able to support the operations needed for MIPS and similar, that is, all the basic logical operations (OR, AND, XOR, NOT, ...) as well as add, sub, cmp, ..., and must include a bypass path.

I tried implementing Paul Metzgen's ALU as described in "A High Performance 32-bit ALU for Programmable Logic" in FPGA'04 (though I'm still missing the multi-stage shifters). This is the NIOS I ALU.

It does perform better than a naive implementation of the same functionality, although it doesn't seem to map quite as nicely to Cyclone as it did to APEX; perhaps Cyclone doesn't need as many tweaks. Interestingly, it appears much faster on a Cyclone than on a Spartan, whereas I generally see less of a difference between the two.

I'd love to know how the NIOS II ALU differs from this.

Regards,
Tommy


[Non-text portions of this message have been removed]


(You need to be a member of fpga-cpu -- send a blank email to fpga-cpu-subscribe@yahoogroups.com )

Re: A High Performance 32-bit ALU - Eric Smith - Jul 26 16:16:13 2007

Tommy wrote:
> I decided to post the "best" ALU I've seen so far in the hope some people
> here can profit from it and help me improve it.

It looks like the mailing list stripped your attachment. Perhaps you
could just put it inline in a message. Or, if you like, you could send
it to me, and I could put it on a web page and post the URL here.

In any case, thanks for sharing your work. I consider myself to be
a moderately experienced designer in VHDL and a little less experienced
in Verilog, but what I haven't yet got a handle on is optimization for
FPGAs. My data path designs have all been very straightforward, and I'm
eager to learn what can be done to improve the performance.

Speaking of such things, is there an overview somewhere of what the
Xilinx PlanAhead tool actually does? The Xilinx web site seems to have
plenty of info on what benefits I can expect, but very little in the
way of explanation of how it works.

Eric



Re: A High Performance 32-bit ALU (URL this time) - Tommy Thorn - Jul 26 17:46:48 2007

Eric Smith wrote:
> It looks like the mailing list stripped your attachment.

Sigh. I hate Yahoo! Mail. Anyhow, you can grab the humble ALU from http://numba-tu.com/alu.v

I realize that this is really not polished, tested, or documented as a professional product, but I'd rather get the bits out there now than wait another century for that to happen.

If this goes well, I'd consider opening up the full MIPS-like clone.

Sorry, Eric, I've never used PlanAhead.

Regards,
Tommy




Re: A High Performance 32-bit ALU - Rob Finch - Jul 29 4:27:03 2007

--- In f...@yahoogroups.com, Tommy Thorn wrote:
>
> I decided to post the "best" ALU I've seen so far in the hope some
> people here can profit from it and help me improve it.

One thing that won't help the ALU but might help the system performance
is separate outputs from the adder/subtractor, the logic unit, and the
shifter.

Often the ALU isn't the only thing on the result bus, and has to be
multiplexed with other units. Separate outputs avoid two multiplexer
layers (one in the ALU and one onto the bus).

RF




Re: Re: A High Performance 32-bit ALU - John Kent - Jul 29 4:59:32 2007

Rob,

Wouldn't the synthesis tools optimize the multiplexers?

If multiplexers were built out of CLBs, the optimum number of inputs
would be 4. You would not necessarily gain anything in nesting levels
if you had more than 4 inputs, because the synthesis tool would have to
nest the multiplexers anyway. Wouldn't that be the case?

The advantage of nesting the multiplexers would be that the multiplexer
control signals could be localized to a particular group of CLBs. In
that sense it would be more efficient.

I believe the MicroBlaze was optimized to take into account the structure
of the CLBs in the Xilinx FPGAs, so I assume they based everything around
multiples of 4.

Anyway ... my 2 cents worth.

John.


Re: Re: A High Performance 32-bit ALU - Kolja Sulimma - Jul 29 5:41:31 2007

A few more cents:
- Having two outputs instead of one removes one mux from the ALU, but the
system still needs to select one of these outputs to write to a register,
so don't you just add the same mux externally and save nothing?

- Most FPGAs have the carry logic after the LUT. Therefore it is important
to structure your HDL code in a way that places at least one level of
multiplexers in front of the carries. The synthesis tools will not do this
reordering in general.
This means writing:

- For larger multiplexers you should set all inactive sources to zero and
combine the sources with "OR" instead of multiplexers. This way you get a
4-to-1 reduction in a 4-LUT instead of the 2-to-1 reduction for muxes.
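
A minimal sketch of the last two points, written as a Python behavioural
model rather than HDL. The code that followed "This means writing:" did not
survive in the archive, and nothing here is a reconstruction of it. All
function names are hypothetical. The add/subtract pair uses the standard
identity a - b = a + ~b + 1 as one example of a selection placed in front of
the carry chain. The level count assumes plain 4-input LUTs (one 2:1 mux or
one 4-input OR per LUT) and ignores dedicated mux resources and the cost of
zeroing the inactive sources.

```python
MASK = 0xFFFFFFFF  # 32-bit datapath, matching the ALU width in this thread


def addsub_mux_after(a, b, sub):
    """Two separate carry chains with the result mux behind them."""
    total = (a + b) & MASK
    diff = (a - b) & MASK
    return diff if sub else total


def addsub_mux_before(a, b, sub):
    """One carry chain; the per-bit selection sits in front of it."""
    b_sel = (b ^ MASK) if sub else b   # invert b when subtracting
    carry_in = 1 if sub else 0         # a - b == a + ~b + 1 (mod 2**32)
    return (a + b_sel + carry_in) & MASK


def and_or_mux(sources, enables):
    """Select by forcing inactive sources to zero and ORing the rest.

    Assumes at most one entry of `enables` is true (one-hot select).
    """
    out = 0
    for value, enable in zip(sources, enables):
        out |= value if enable else 0
    return out


def tree_levels(n_inputs, fan_in):
    """Levels in a reduction tree that merges `fan_in` inputs per node."""
    levels = 0
    while n_inputs > 1:
        n_inputs = -(-n_inputs // fan_in)  # ceiling division
        levels += 1
    return levels
```

Under these assumptions, `tree_levels(16, 2)` models a 16-input tree of 2:1
muxes and `tree_levels(16, 4)` models the OR tree over 16 gated sources, so
the two calls can be compared directly for any source count of interest.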

Kolja Sulimma


--
cronologic ohg Frankfurt am Main
HRA 42869 beim Amtsgericht Frankfurt
Telefon 069 38 09 78 254
[Non-text portions of this message have been removed]



Re: Re: A High Performance 32-bit ALU - John Kent - Jul 29 8:55:13 2007

Hi Kolja

Kolja Sulimma wrote:
>
>A few more cents:
>- having two outputs instead of one removes one mux from the ALU but the
>system needs to select one of these outputs to write in a register so
>you add the same mux externally and not save it?
>

That's what I was getting at. You have to multiplex the signals anyway.
Either you do it with one big mux or you use a hierarchy of smaller
multiplexers.

>- most FPGAs have the carry logic after the LUT. Therefore it is
>important to structure your HDL code in a way that places at least one
>level of multiplexers in front of the carries.
>The synthesis tools will not do this reordering in general.
>This means writing:
>
>- For larger multiplexers you should set all inactive sources to zero and
>combine the sources with "OR" instead of multiplexers. This way you get a
>4-to-1 reduction in a 4-LUT instead of the 2-to-1 reduction for muxes.
>

I assume that is what they have done with the IBM CoreConnect Bus
Architecture used in the Xilinx EDK, and why they provide (or used to
provide) wide OR gates in the Xilinx FPGAs.

John.

--
http://www.johnkent.com.au
http://members.optushome.com.au/jekent



Re: A High Performance 32-bit ALU - M - Aug 28 5:01:53 2008

Hello All

Does anybody have a PDF copy of Paul Metzgen's "A High Performance
32-bit ALU for Programmable Logic" that I could get access to?

Thanks
