Reply by Goran Bilski August 2, 2007
Hi,

Xilinx has the MUXFx multiplexers in every CLB, which can be used to create muxes up to 16:1.

But I tend to use the synchronous reset of the DFFs to create my mux structures.

If I have a 4:1 mux whose inputs come from DFFs (very common in datapaths), I use the synchronous reset of those DFFs to build a more efficient mux.

If I keep all the DFFs in reset except for the bus that I select, I can simply OR the outputs of the DFFs to create my mux.

So an 8:1 mux can be done with only two LUTs (using the carry chain for the ORing).
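A bit-level Python model (not HDL) can make the trick concrete. It assumes a one-hot select that is already known when the source registers are clocked, since an unselected register has to be held in reset on the same edge that would otherwise load it. The function names are hypothetical, and the OR loop stands in for the LUT and carry-chain OR tree.

```python
from itertools import product

def classic_mux(inputs, sel):
    """Reference n:1 mux: return inputs[sel]."""
    return inputs[sel]

def register_with_sync_reset(d, rst):
    """Value a DFF holds after a clock edge with synchronous reset (R beats D)."""
    return 0 if rst else d

def or_mux(inputs, sel):
    """Mux built from register resets plus an OR tree.

    Every source register except the selected one is held in synchronous
    reset, so all unselected outputs are 0 and the OR passes the selected bit.
    """
    q = [register_with_sync_reset(d, rst=(i != sel)) for i, d in enumerate(inputs)]
    result = 0
    for bit in q:
        result |= bit  # in the FPGA: two 4-input LUTs joined on the carry chain
    return result

# Exhaustively compare the two for an 8:1 mux of single bits.
for bits in product((0, 1), repeat=8):
    for sel in range(8):
        assert or_mux(list(bits), sel) == classic_mux(list(bits), sel)
print("8:1 OR-mux matches the classic mux for all 2048 cases")
```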

A normal implementation would take four LUTs, so using the DFFs for muxing saves 50% of the LUTs.
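The LUT arithmetic can be written out for other mux sizes. This is a rough count under assumptions: 4-input LUTs as in Virtex-II/Spartan-3, a LUT4 used as a 2:1 mux with the MUXF5/MUXF6/... primitives combining pairs at no LUT cost, and, for the OR trick, one LUT4 per four register outputs with the partial results chained through the carry logic. It ignores the registers themselves and the select decoding that drives the resets (which is shared across the whole bus width). The function names are hypothetical.

```python
import math

def luts_classic_mux(n):
    """LUT4s for an n:1 mux built the usual way: each LUT4 is a 2:1 mux
    (2 data + 1 select input) and MUXF5/MUXF6/... combine the results."""
    return math.ceil(n / 2)

def luts_or_mux(n):
    """LUT4s for the reset-and-OR mux: each LUT4 combines 4 register outputs,
    and the carry chain merges the partial results without extra LUTs."""
    return math.ceil(n / 4)

for n in (4, 8, 16):
    print(f"{n}:1 mux  classic={luts_classic_mux(n)}  or-trick={luts_or_mux(n)}")
```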

A processor is mostly muxes (I'd guess around 40-50% of MicroBlaze), so efficient muxes matter most.

The synchronous-reset trick is used a lot in MicroBlaze.

I sometimes also use the synchronous reset and set at the same time to create constant values from the DFFs (not for muxing).

Since you can apply set and reset at the same time (reset has higher priority), it's easy to get different constant values out of the DFFs.
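A small Python model of the set/reset priority shows how two control signals yield different constants regardless of D. The wiring is hypothetical: every bit's S pin is tied to `set_en`, and only the bits chosen by `reset_mask` have their R pin tied to `reset_en`. The function names are also illustrative.

```python
def fdrs_next(d, s, r):
    """Next state of a DFF with synchronous set and reset, reset winning
    over set: R=1 -> 0, else S=1 -> 1, else D."""
    if r:
        return 0
    if s:
        return 1
    return d

def constant_from_dffs(d, set_en, reset_en, reset_mask, width=8):
    """Hypothetical wiring: all S pins on `set_en`, the R pins of the bits
    selected by `reset_mask` on `reset_en`."""
    q = 0
    for i in range(width):
        r = reset_en and bool((reset_mask >> i) & 1)
        q |= fdrs_next((d >> i) & 1, set_en, r) << i
    return q

MASK = 0b0000_1111  # example: R pins of the low nibble are wired to reset_en
for d in (0x00, 0x5A, 0xFF):  # whatever D happens to be
    assert constant_from_dffs(d, set_en=True, reset_en=False, reset_mask=MASK) == 0xFF
    assert constant_from_dffs(d, set_en=True, reset_en=True, reset_mask=MASK) == 0xF0
print("set alone -> 0xFF, set+reset -> 0xF0, independent of D")
```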

Goran

________________________________

From: f... [mailto:f...] On Behalf Of Tommy Thorn
Sent: 01 August 2007 20:33
To: f...
Subject: RE: [fpga-cpu] Paul Metzgen on multiplexers and the NIOS II pipeline

Reply by Tommy Thorn August 1, 2007
Goran,

thanks for revealing some of MicroBlaze's internals. I'm sure we'd all love to learn more about it.

Altera's low-end part favors 4:1 muxes and 3:1 registered muxes (optionally with clear). Are there similar rules of thumb for Xilinx?

Thanks,
Tommy
Reply by Goran Bilski July 31, 2007
Hi,
I try to avoid using primitives as much as possible, but sometimes they're the easiest and best way to get what I want.
With VHDL it's easy to write a for-generate statement with primitives and get exactly what I want. Otherwise you could spend days trying to write HDL in a way that produces the same result, and that result can change with a newer version of the synthesis tool.

From v5.00.a on, MicroBlaze is not floorplanned. It was just too much work, and the tools were starting to give decent results.
Still, primitives are needed when I can't get the result from pure HDL.

The synthesis tools usually try to grab the parts of the HDL design that map well to built-in modules, like adders, counters, add/sub, ...
They don't do well on logic that is actually packed together, like the ALU example where an add/sub and a multiplexer are merged into the same LUT.
If you write that as pure HDL, you will get a multiplexer followed by a separate add/sub module.

I usually look at the final netlist, find all LUTs that don't have all 4 inputs used, and see if there is something I can pack into them.
That's easy to do by writing the final netlist out as VHDL and grepping for LUT1/LUT2/LUT3.
Sometimes I can move something from a different pipe stage, or I can pack different functions into the same LUT.
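The grep step can be sketched as a small Python script that counts LUT instances by size and lists the under-used ones. The regular expression is an assumption about the netlist text: it expects instances written as `label : LUTn`, optionally with an `X_` prefix (simprim style) or an `_L`/`_D` suffix, and real netlists from a given tool version may look different. The function name `underused_luts` and the sample instance names are hypothetical.

```python
import re
from collections import Counter

# Assumed instance syntax, e.g. "alu_bit_5 : LUT3" or "alu_bit_5 : X_LUT3".
LUT_INSTANCE = re.compile(r"^\s*(\w+)\s*:\s*(?:X_)?LUT([1-4])(?:_[LD])?\b", re.MULTILINE)

def underused_luts(netlist_text, lut_inputs=4):
    """Return (count per LUT size, names of LUTs with fewer than lut_inputs inputs)."""
    counts = Counter()
    candidates = []
    for name, size in LUT_INSTANCE.findall(netlist_text):
        size = int(size)
        counts[size] += 1
        if size < lut_inputs:
            candidates.append(name)  # spare inputs: something else might pack in here
    return counts, candidates

sample = """
  I_ALU_LUT_5 : LUT4
    generic map (INIT => X"A678")
  carry_sel_3 : LUT2
    generic map (INIT => X"8")
  dec_valid : LUT3
    generic map (INIT => X"80")
"""
counts, candidates = underused_luts(sample)
print(dict(counts))   # {4: 1, 2: 1, 3: 1}
print(candidates)     # ['carry_sel_3', 'dec_valid']
```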

Goran
-----Original Message-----
From: Apache [mailto:a...@ruckus.brouhaha.com] On Behalf Of Eric Smith
Sent: 30 July 2007 19:33
To: Goran Bilski
Cc: f...
Subject: RE: [fpga-cpu] Paul Metzgen on multiplexers and the NIOS II pipeline

Reply by Eric Smith July 30, 2007
Goran,

Thanks for posting the MicroBlaze ALU sample. It's helpful to see
how a real-world soft core is optimized for an FPGA.

Can you comment on how much improvement you see in this sort of
datapath with XST or third-party synthesis tools when using the Xilinx
primitives vs. letting the synthesis tool infer the structure?
Or are primitives only necessary to support floorplanning?

I've been using "plain" VHDL for my data path, but perhaps I'll try
writing a version using the Xilinx primitives.

Eric
Reply by Goran Bilski July 30, 2007
Hi,

Here is what the MicroBlaze ALU looks like.

This part of the logic computes B+A, B-A, B, and A.

The last two could be replaced with something like B AND A or B OR A, but that logic is done in another part.

The reason for passing B or A through is that one operand often has to be passed on as the result.

It could use the DFF in the CLB with the reset/set combo instead, but since the result has to be muxed in the same pipe stage, that can't be done here.

Goran Bilski

-- Excerpt from an architecture body. LUT4, MULT_AND, MUXCY_L and XORCY are
-- Xilinx UNISIM primitives (library unisim; use unisim.vcomponents.all;).

-- EX_ALU_Op
--   00 => EX_Op1
--   01 => EX_Op2
--   10 => EX_Op2 + EX_Op1
--   11 => EX_Op2 - EX_Op1
--
-- Karnaugh map
--
--                           bit 1,0 = ALU_Op(MSB), Op2(I)
-- bit 3,2 =
-- ALU_Op(LSB), Op1(I)       00  01  11  10    INIT nibble
--   00                       0   0   1   0    1000
--   01                       1   1   0   1    0111
--   11                       0   1   1   0    1010
--   10                       0   1   0   1    0110
--
-- Init String = 1010 0110 0111 1000 (A678)

---

-----
-- Handle bits 31 to 1. Bit 0 is different to support CMPU
-----
Not_Last_Bit : if not C_LAST_BIT generate
begin -- generate Not_Last_Bit

  -- pass Op1, pass Op2, add or sub bitwise based on EX_ALU_Op
  I_ALU_LUT : LUT4
    generic map(
      INIT => X"A678"
    )
    port map (
      O  => alu_AddSub,                  -- [out]
      I0 => EX_Op2,                      -- [in]
      I1 => EX_ALU_Op(EX_ALU_Op'left),   -- [in]
      I2 => EX_Op1,                      -- [in]
      I3 => EX_ALU_Op(EX_ALU_Op'right)); -- [in]

  -- Confirm that Op2 is '1' for carry calculation
  MULT_AND_I : MULT_AND
    port map (
      I0 => EX_Op2,                      -- [in]
      I1 => EX_ALU_Op(EX_ALU_Op'left),   -- [in]
      LO => op2_is_1);                   -- [out]

  -- If AddSub result is 0 then there must be a carry out if Op2 is '1'
  -- If AddSub result is 1 then there must be a carry in for a carry out
  MUXCY_I : MUXCY_L
    port map (
      DI => op2_is_1,
      CI => EX_CarryIn,
      S  => alu_AddSub,
      LO => EX_CarryOut);

  -- Merge addsub result with carry in to get final result
  XOR_I : XORCY
    port map (
      LI => alu_AddSub,
      CI => EX_CarryIn,
      O  => ex_result_i);

  EX_Result <= ex_result_i;

end generate Not_Last_Bit;
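The INIT value and the carry-chain hookup can be double-checked with a small Python model; this is a sketch for checking, not MicroBlaze source. It rebuilds INIT from the per-bit function in the Karnaugh map, using the LUT4 convention that INIT bit k is selected by I3 I2 I1 I0, and then ripples the LUT4 / MULT_AND / MUXCY_L / XORCY slice over an 8-bit word. Assumptions: the chain's carry-in is 1 for subtract and 0 otherwise, bit 0 of the model is the LSB (MicroBlaze numbers bits the other way), and every bit uses the same slice, whereas the real core treats its last bit differently for CMPU. The names `alu_lut`, `lut4_init` and `alu` are hypothetical.

```python
# Bit-level model of the ALU slice, for checking only (not HDL).
# EX_ALU_Op encoding from the comments: 00 -> Op1, 01 -> Op2,
# 10 -> Op2 + Op1, 11 -> Op2 - Op1.

def alu_lut(op2, op_msb, op1, op_lsb):
    """Per-bit value the LUT4 produces before the carry is merged in."""
    if not op_msb:                 # 00 / 01: pass an operand through
        return op2 if op_lsb else op1
    if not op_lsb:                 # 10: add, half-sum Op2 xor Op1
        return op2 ^ op1
    return op2 ^ (op1 ^ 1)         # 11: subtract as Op2 + not(Op1)

def lut4_init(func):
    """LUT4 INIT value: bit k holds func(I0, I1, I2, I3) where k = I3 I2 I1 I0."""
    init = 0
    for k in range(16):
        i0, i1, i2, i3 = k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1
        init |= func(i0, i1, i2, i3) << k
    return init

# Port order from the port map: I0=Op2, I1=ALU_Op MSB, I2=Op1, I3=ALU_Op LSB.
INIT = lut4_init(alu_lut)
assert INIT == 0xA678

def alu(op, op1, op2, width=8):
    """Ripple LUT4 -> MULT_AND -> MUXCY_L -> XORCY over `width` bits."""
    op_msb, op_lsb = (op >> 1) & 1, op & 1
    carry = 1 if op == 0b11 else 0   # assumed carry-in: 1 for subtract
    result = 0
    for i in range(width):           # model bit 0 = LSB
        a, b = (op1 >> i) & 1, (op2 >> i) & 1
        k = (op_lsb << 3) | (a << 2) | (op_msb << 1) | b
        s = (INIT >> k) & 1          # LUT4 output (alu_AddSub)
        di = b & op_msb              # MULT_AND output (op2_is_1)
        result |= (s ^ carry) << i   # XORCY: result bit
        carry = carry if s else di   # MUXCY_L: S = 1 propagates CI, else DI
    return result, carry

for x in range(256):
    for y in range(0, 256, 7):
        assert alu(0b00, x, y)[0] == x
        assert alu(0b01, x, y)[0] == y
        assert alu(0b10, x, y)[0] == (y + x) & 0xFF
        assert alu(0b11, x, y)[0] == (y - x) & 0xFF
print(f'INIT = X"{INIT:04X}"; pass, add and sub match integer arithmetic')
```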

________________________________

From: f... [mailto:f...] On Behalf Of Tommy Thorn
Sent: 30 July 2007 04:44
To: f...
Subject: [fpga-cpu] Paul Metzgen on multiplexers and the NIOS II pipeline

Trying to understand the LAB wide sload and sclear signals better, I happend
upon this gem by Paul Metzgen: http://www.cs.tut.fi/soc/Metzgen04.pdf (I wish I had attended this talk).

Among other things, he shows how on Stratix/Cyclone, a single LE can implement (assuming sclear and sload is shared between all LE in a LAB)

if (sclear)
q <= 0;
else if (sload)
q <= d2;
else
q <= sel ? d0 : d1;

Combining this with a 2 LE 4:1 mux, we can have a register 6:1 mux with clear in just 3 LE. Not bad.

I'm sure there are similar tricks for Spartan/Virtex. Please do let me know how it compares.

Regards,
Tommy

---------------------------------
Looking for a deal? Find great prices on flights and hotels with Yahoo! FareChase.
Reply by Tommy Thorn July 29, 2007
Trying to understand the LAB-wide sload and sclear signals better, I happened
upon this gem by Paul Metzgen: http://www.cs.tut.fi/soc/Metzgen04.pdf (I wish I had attended this talk).

Among other things, he shows how, on Stratix/Cyclone, a single LE can implement the following (assuming sclear and sload are shared by all LEs in a LAB):

always @(posedge clk)   // clk: the LAB clock (name assumed)
  if (sclear)
    q <= 0;
  else if (sload)
    q <= d2;
  else
    q <= sel ? d0 : d1;

Combining this with a 2-LE 4:1 mux, we can have a registered 6:1 mux with clear in just 3 LEs. Not bad.
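The 3-LE count can be sanity-checked with a functional Python model. It models behavior only, not LE packing: the 2-LE 4:1 mux is treated as a black box, and the decode of a 6-way select onto `sel`, `sload` and `sclear` is one assumed encoding. The helper names are hypothetical. Because sclear and sload are LAB-wide, every bit of a registered bus placed in that LAB would have to share the same decode.

```python
from itertools import product

def le_next(sclear, sload, sel, d0, d1, d2):
    """Next q of the LE in the snippet above: sclear, then sload, then sel."""
    if sclear:
        return 0
    if sload:
        return d2
    return d0 if sel else d1

def mux4_two_les(s, x):
    """Functional stand-in for the 2-LE 4:1 mux (its LE packing is not modelled)."""
    return x[s]

def registered_mux6(sel6, clear, x):
    """Hypothetical mapping of a 6:1 select onto the LE controls:
    x[0..3] via the 4:1 mux into d0, x[4] on d1, x[5] on the sload path d2."""
    d0 = mux4_two_les(sel6 & 3, x[:4])            # LEs 1 and 2
    return le_next(sclear=clear, sload=(sel6 == 5), sel=(sel6 < 4),
                   d0=d0, d1=x[4], d2=x[5])       # LE 3

# Exhaustive check against a plain 6:1 mux with synchronous clear.
for x in product((0, 1), repeat=6):
    for sel6 in range(6):
        for clear in (False, True):
            expected = 0 if clear else x[sel6]
            assert registered_mux6(sel6, clear, list(x)) == expected
print("6:1 registered mux with clear matches for all inputs")
```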

I'm sure there are similar tricks for Spartan/Virtex. Please do let me know how they compare.

Regards,
Tommy
