This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
Paul Metzgen on multiplexers and the NIOS II pipeline - Tommy Thorn - Jul 29 22:51:12 2007
Trying to understand the LAB-wide sload and sclear signals better, I happened
upon this gem by Paul Metzgen: http://www.cs.tut.fi/soc/Metzgen04.pdf (I wish I had
attended this talk).
Among other things, he shows how on Stratix/Cyclone, a single LE can implement (assuming
sclear and sload are shared between all LEs in a LAB)
    if (sclear)
        q <= 0;
    else if (sload)
        q <= d2;
    else
        q <= sel ? d0 : d1;
Combining this with a 2-LE 4:1 mux, we can have a registered 6:1 mux with clear in just 3
LEs. Not bad.
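The LE behaviour described above can be sketched as a small behavioural model. This is only an illustration in Python, not HDL and not taken from Metzgen's slides; the function names are made up, and the way the 4:1 mux is wired into the LE (feeding the d1 input) is an assumption.

```python
# Behavioural model of the LE described in the post (illustrative only).

def le_next_q(sclear, sload, sel, d0, d1, d2):
    """Value the LE register takes at the next clock edge."""
    if sclear:                  # LAB-wide synchronous clear has top priority
        return 0
    if sload:                   # LAB-wide synchronous load selects d2
        return d2
    return d0 if sel else d1    # otherwise the LUT acts as a 2:1 mux


def mux4(s, inputs):
    """Unregistered 4:1 mux (the part that costs 2 LEs); s is 0..3."""
    return inputs[s]


def reg_mux6_next_q(sclear, sload, sel, s, d0, d2, quad):
    """Assumed wiring: the 4:1 mux output drives the LE's d1 input,
    giving six data inputs in total (d0, d2 and the four in quad)."""
    return le_next_q(sclear, sload, sel, d0, mux4(s, quad), d2)
```

With sclear and sload low and sel low, the register simply captures whichever of the four `quad` inputs `s` picks; sload or sel pick the other two inputs, and sclear overrides everything.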
I'm sure there are similar tricks for Spartan/Virtex. Please do let me know how it
compares.
Regards,
Tommy

(You need to be a member of fpga-cpu -- send a blank email to fpga-cpu-subscribe@yahoogroups.com )
RE: Paul Metzgen on multiplexers and the NIOS II pipeline - Goran Bilski - Jul 30 11:23:10 2007
Hi,
Here is what the MicroBlaze ALU looks like.
This part of the logic does B+A, B-A, B, A.
The last two could be replaced with something like B and A, B or A, but that logic is done
in another part.
The reason for passing through B or A is that many times one operand has to be passed as
the result.
It could use the DFF in the CLB for the reset/set combo, but since the result has to be
muxed in the same pipestage, this can't be done.
Göran Bilski
-- EX_ALU_Op
--   00 => EX_Op1
--   01 => EX_Op2
--   10 => EX_Op2 + EX_Op1
--   11 => EX_Op2 - EX_Op1
--
-- Karnaugh map
--
--   columns (LUT bits 1,0): ALU_Op(MSB), Op2(I)
--   rows    (LUT bits 3,2): ALU_Op(LSB), Op1(I)
--
--           00  01  11  10     INIT nibble
--      00    0   0   1   0     1000
--      01    1   1   0   1     0111
--      11    0   1   1   0     1010
--      10    0   1   0   1     0110
--
-- Init String = 1010 0110 0111 1000 (A678)
---------------------------------------------------------------------------
-----------------------------------------------------------------------------
-- Handle bits 31 to 1. Bit 0 is different to support CMPU
-----------------------------------------------------------------------------
Not_Last_Bit : if not C_LAST_BIT generate
begin  -- generate Not_Last_Bit

  -- pass Op1, pass Op2, add or sub bitwise based on EX_ALU_Op
  I_ALU_LUT : LUT4
    generic map(
      INIT => X"A678"
    )
    port map (
      O  => alu_AddSub,                     -- [out]
      I0 => EX_Op2,                         -- [in]
      I1 => EX_ALU_Op(EX_ALU_Op'left),      -- [in]
      I2 => EX_Op1,                         -- [in]
      I3 => EX_ALU_Op(EX_ALU_Op'right));    -- [in]

  -- Confirm that Op2 is '1' for carry calculation
  MULT_AND_I : MULT_AND
    port map (
      I0 => EX_Op2,                         -- [in]
      I1 => EX_ALU_Op(EX_ALU_Op'left),      -- [in]
      LO => op2_is_1);                      -- [out]

  -- If AddSub result is 0 then there must be a carry out if Op2 is '1'
  -- If AddSub result is 1 then there must be a carryin for a carry out
  MUXCY_I : MUXCY_L
    port map (
      DI => op2_is_1,
      CI => EX_CarryIn,
      S  => alu_AddSub,
      LO => EX_CarryOut);

  -- Merge addsub result with carry in to get final result
  XOR_I : XORCY
    port map (
      LI => alu_AddSub,
      CI => EX_CarryIn,
      O  => ex_result_i);

  EX_Result <= ex_result_i;

end generate Not_Last_Bit;
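To see why one LUT4 plus the carry logic is enough for all four operations, the cell above can be modelled bit by bit. The following Python sketch is not MicroBlaze source: it assumes the standard behaviour of the Xilinx primitives (LUT4 output = INIT bit at index I3 I2 I1 I0, MUXCY passes CI when S is 1 and DI otherwise, XORCY is LI xor CI), it assumes the carry into the first bit is 1 for subtract and 0 otherwise, and it uses the same cell for every bit, ignoring the special CMPU bit. The function names and the least-significant-bit-first loop are choices made for the model.

```python
INIT = 0xA678  # LUT4 init string from the post


def lut4(init, i3, i2, i1, i0):
    """LUT4: the output is the INIT bit addressed by {I3, I2, I1, I0}."""
    return (init >> (i3 << 3 | i2 << 2 | i1 << 1 | i0)) & 1


def alu_bit(op_msb, op_lsb, op1, op2, carry_in):
    """One bit slice: LUT4 + MULT_AND + MUXCY + XORCY, wired as in the port maps."""
    s = lut4(INIT, op_lsb, op1, op_msb, op2)   # alu_AddSub
    di = op2 & op_msb                          # MULT_AND: op2_is_1
    carry_out = carry_in if s else di          # MUXCY
    result = s ^ carry_in                      # XORCY
    return result, carry_out


def alu(alu_op, op1, op2, width=32):
    """Ripple the slice across the word; alu_op encoding as in the EX_ALU_Op comment."""
    op_msb, op_lsb = (alu_op >> 1) & 1, alu_op & 1
    carry = 1 if alu_op == 0b11 else 0         # assumed initial carry-in
    result = 0
    for i in range(width):                     # least significant bit first
        bit, carry = alu_bit(op_msb, op_lsb, (op1 >> i) & 1, (op2 >> i) & 1, carry)
        result |= bit << i
    return result
```

For the two pass-through operations the MULT_AND output is always 0, so the carry chain stays at 0 and the XORCY simply forwards the LUT output, which is why the mux costs no extra LUT.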

RE: Paul Metzgen on multiplexers and the NIOS II pipeline - Eric Smith - Jul 30 13:45:57 2007
Goran,
Thanks for posting the MicroBlaze ALU sample. It's helpful to see
how a real-world soft core is optimized for an FPGA.
Can you comment on how much improvement is seen in this sort of
datapath with XST or 3rd party synthesis tools when using the Xilinx
primitives vs. letting the synthesis tool infer the structure?
Or is use of primitives only necessary to support floorplanning?
I've been using "plain" VHDL for my data path, but perhaps I'll try
writing a version using the Xilinx primitives.
Eric

RE: Paul Metzgen on multiplexers and the NIOS II pipeline - Goran Bilski - Jul 31 6:43:55 2007
Hi,
I try to avoid using primitives as much as possible, but sometimes it's the easiest and
best way to get what I want.
With VHDL it's easy to do a for-generate statement with primitives and I will get exactly
what I want. Otherwise you could spend days trying to write HDL in a way that produces the
same result, and that result can change with a newer version of the synthesis tool.
MicroBlaze has not been floorplanned since v5.00.a. It was just too much work and the
tools were starting to produce decent results.
Still, primitives are needed when I can't get the result from pure HDL.
The synthesis tools usually try to grab parts of the HDL design that map well to
built-in modules, like adders, counters, add/sub, ...
They will not do well on modules that are actually packed together, like the ALU example
where an add/sub and a multiplexer are merged into the same LUT.
If you write that as pure HDL you will get one multiplexer followed by one add/sub
module.
I usually look at the final netlist, find all LUTs which don't have all 4 inputs
used, and see if there is something I can pack into those LUTs.
This is easy to do by creating a VHDL netlist for the final netlist and doing a grep on
LUT1/LUT2/LUT3.
Sometimes I can move something from a different pipestage or I can pack different
functions into the same LUT.
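The netlist check described above amounts to a grep; a scripted equivalent might look like the sketch below. The function name and the sample netlist lines are hypothetical, and the pattern assumes the primitives appear under the plain names LUT1, LUT2 and LUT3 (variants with a suffix would need a wider pattern).

```python
import re
from collections import Counter


def count_small_luts(netlist_text):
    """Count occurrences of LUT1/LUT2/LUT3, i.e. LUTs with unused inputs."""
    return Counter(re.findall(r"\bLUT[123]\b", netlist_text))


# Hypothetical fragment of a post-synthesis VHDL netlist.
sample = """
  alu_lut : LUT4 generic map (INIT => X"A678") port map (...);
  sel_lut : LUT2 generic map (INIT => X"8") port map (...);
  en_lut  : LUT3 generic map (INIT => X"80") port map (...);
  rdy_lut : LUT2 generic map (INIT => X"E") port map (...);
"""

counts = count_small_luts(sample)
for name in sorted(counts):
    print(name, counts[name])
```

Each hit is a candidate for packing in extra logic, for example something moved from a neighbouring pipestage.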
Göran

RE: Paul Metzgen on multiplexers and the NIOS II pipeline - Tommy Thorn - Aug 1 14:34:48 2007
Goran,
thanks for unveiling some of the insides of MicroBlaze. I'm sure we'd all love to learn
more about it.
Altera's low-end parts favor 4:1 muxes and 3:1 registered muxes (optionally with clear).
Are there similar rules of thumb for Xilinx?
Thanks,
Tommy

RE: Paul Metzgen on multiplexers and the NIOS II pipeline - Goran Bilski - Aug 2 7:13:19 2007
Hi,
Xilinx has the MUXFx part in every CLB, which can be used to create muxes up to 16-1.
But I tend to use the synchronous reset of DFFs to create my mux structures.
If I have a 4-1 mux where the inputs are coming from DFFs (very common in datapaths), I
use the synchronous reset of the DFFs to create more efficient muxes.
If I keep all DFFs in reset except for the bus that I select, I can simply OR the outputs
from the DFFs to create my mux.
So an 8-1 mux can be done with only two LUTs (using the carry chain for the ORing).
A normal implementation would take four LUTs, so muxing with the DFFs needs only half as
many LUTs.
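The reset-based mux can be illustrated with a small behavioural model. This Python sketch is not MicroBlaze code; the function names are made up, and it only shows the principle that the select is applied at the register stage (as synchronous resets) so that the mux itself reduces to an OR.

```python
def clock_regs(inputs, select):
    """One clock edge: every register except the selected one is held in
    synchronous reset, so only the selected bus keeps its value."""
    return [d if i == select else 0 for i, d in enumerate(inputs)]


def or_mux(regs):
    """The 'mux' is just an OR of all register outputs."""
    out = 0
    for q in regs:
        out |= q
    return out


buses = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
selected = or_mux(clock_regs(buses, 2))
```

Per result bit, an 8-input OR fits in two 4-input LUTs whose outputs are combined on the carry chain, which is where the two-LUT figure comes from. Note that the select has to be known one clock earlier than with a conventional mux, since it acts on the registers.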
A processor is mostly muxes (I guess around 40-50% of MicroBlaze), so the most important
thing is efficient muxes.
The trick with the synchronous reset of DFFs is used a lot in MicroBlaze.
I sometimes also use the synchronous reset and set at the same time to create constant
values from the DFFs (not for muxing).
Since you can apply both set and reset at the same time (reset has higher priority), it's
easy to get different constant values out of the DFFs.
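One way to picture the set/reset trick is a register whose bits are individually connected to the set and/or reset controls. The post does not say how the controls are wired per bit, so the masks below are purely hypothetical; the only property taken from the post is that reset wins when both are applied.

```python
def reg_next(d, set_en, reset_en, set_mask, reset_mask, width=8):
    """Next value of a register with synchronous set and reset.
    set_mask / reset_mask mark which bits are wired to each control
    (hypothetical wiring); reset has priority over set."""
    full = (1 << width) - 1
    q = d & full
    if set_en:
        q |= set_mask & full         # bits wired to set go to 1
    if reset_en:
        q &= ~reset_mask & full      # reset wins on bits wired to both
    return q
```

With `set_mask=0xFF` and `reset_mask=0x0F`, the same register yields its data, 0xFF, or 0xF0 depending on which controls are asserted, without spending any LUTs on the constants.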
Göran
