This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
Paul Metzgen on multiplexers and the NIOS II pipeline - Tommy Thorn - Jul 29 22:51:12 2007
Trying to understand the LAB-wide sload and sclear signals better, I happened
upon this gem by Paul Metzgen:
http://www.cs.tut.fi/soc/Metzgen04.pdf
(I wish I had attended this talk).
Among other things, he shows how on Stratix/Cyclone, a single LE can implement
(assuming sclear and sload are shared between all LEs in a LAB)
    if (sclear)
        q <= 0;
    else if (sload)
        q <= d2;
    else
        q <= sel ? d0 : d1;
Combining this with a 2-LE 4:1 mux, we get a registered 6:1 mux with clear in
just 3 LEs. Not bad.
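For readers who want to check the LE count, here is a minimal behavioural sketch in Python of the structure described above. It only models the priority logic of the register and one possible way of composing it with a 4:1 mux; the function names and the way the six inputs are assigned to d0, d1 and d2 are assumptions made for illustration, not something taken from the slides.

```python
def le_next_q(sclear, sload, sel, d0, d1, d2):
    """Next value of one LE register, using the priority given in the post."""
    if sclear:
        return 0
    if sload:
        return d2
    return d0 if sel else d1


def mux4(s1, s0, a, b, c, d):
    """Plain 4:1 mux (the post budgets 2 LEs for it)."""
    return (a, b, c, d)[(s1 << 1) | s0]


def reg_mux6(sclear, sload, sel, s1, s0, inputs):
    """Registered 6:1 mux with clear (hypothetical input assignment).

    inputs = (a, b, c, d, e, f): a..d pass through the 4:1 mux into d0,
    e drives d1 directly, and f arrives on the sload path as d2.
    """
    a, b, c, d, e, f = inputs
    return le_next_q(sclear, sload, sel, mux4(s1, s0, a, b, c, d), e, f)
```

Note that the six inputs are not selected by a flat 3-bit index here: sclear, sload and sel form a priority chain, so the select encoding has to be generated accordingly.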
I'm sure there are similar tricks for Spartan/Virtex. Please do let me know how
it compares.
Regards,
Tommy
(You need to be a member of fpga-cpu -- send a blank email to fpga-cpu-subscribe@yahoogroups.com)

Re: Paul Metzgen on multiplexers and the NIOS II pipeline - Goran Bilski - Jul 30 11:23:10 2007
Hi,
Here is what the MicroBlaze ALU looks like.
This part of the logic does B+A, B-A, B, A.
The last two could be replaced with something like "B and A" or "B or A", but
that logic is done in another part.
The reason for passing B or A through is that one operand often has to be
passed on as the result.
The ALU could use the DFF in the CLB for a reset/set combo, but since the result
has to be muxed in the same pipestage this can't be done.
Göran Bilski
-- EX_ALU_Op
-- 00 => EX_Op1
-- 01 => EX_Op2
-- 10 => EX_Op2 + EX_Op1
-- 11 => EX_Op2 - EX_Op1
--
-- Karnaugh map
--
-- Columns: bits 1,0 = ALU_Op(MSB), Op2(I)
-- Rows:    bits 3,2 = ALU_Op(LSB), Op1(I)
--
--          00  01  11  10    INIT nibble
--   00      0   0   1   0    1000
--   01      1   1   0   1    0111
--   11      0   1   1   0    1010
--   10      0   1   0   1    0110
--
-- Init String = 1010 0110 0111 1000 (A678)
---
-----
-- Handle bits 31 to 1. Bit 0 is different to support CMPU
-----
Not_Last_Bit : if not C_LAST_BIT generate
begin  -- generate Not_Last_Bit

  -- pass Op1, pass Op2, add or sub bitwise based on EX_ALU_Op
  I_ALU_LUT : LUT4
    generic map(
      INIT => X"A678"
      )
    port map (
      O  => alu_AddSub,                     -- [out]
      I0 => EX_Op2,                         -- [in]
      I1 => EX_ALU_Op(EX_ALU_Op'left),      -- [in]
      I2 => EX_Op1,                         -- [in]
      I3 => EX_ALU_Op(EX_ALU_Op'right));    -- [in]

  -- Confirm that Op2 is '1' for carry calculation
  MULT_AND_I : MULT_AND
    port map (
      I0 => EX_Op2,                         -- [in]
      I1 => EX_ALU_Op(EX_ALU_Op'left),      -- [in]
      LO => op2_is_1);                      -- [out]

  -- If AddSub result is 0 then there must be a carry out if Op2 is '1'
  -- If AddSub result is 1 then there must be a carryin for a carry out
  MUXCY_I : MUXCY_L
    port map (
      DI => op2_is_1,
      CI => EX_CarryIn,
      S  => alu_AddSub,
      LO => EX_CarryOut);

  -- Merge addsub result with carry in to get final result
  XOR_I : XORCY
    port map (
      LI => alu_AddSub,
      CI => EX_CarryIn,
      O  => ex_result_i);

  EX_Result <= ex_result_i;

end generate Not_Last_Bit;
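To see that the INIT string A678 really merges the operand mux and the add/sub into one LUT, the bit slice above can be modelled in a few lines of Python. This is only a sketch: it assumes the usual LUT4 convention that the output is the INIT bit indexed by (I3 I2 I1 I0), that the carry chain starts at the least significant bit, and that the initial carry-in is 1 for subtract and 0 otherwise (the post does not show where EX_CarryIn comes from for the first bit). The special bit that supports CMPU is not modelled, and the names `lut4`, `alu_bit` and `alu` are made up for this example.

```python
def lut4(init, i3, i2, i1, i0):
    """LUT4 model: the output is bit (i3 i2 i1 i0) of the 16-bit INIT value."""
    return (init >> ((i3 << 3) | (i2 << 2) | (i1 << 1) | i0)) & 1


def alu_bit(alu_op, op1, op2, carry_in):
    """One bit slice: LUT4 (INIT A678) + MULT_AND + MUXCY + XORCY."""
    msb, lsb = (alu_op >> 1) & 1, alu_op & 1
    # Same input order as the port map: I3=ALU_Op LSB, I2=Op1, I1=ALU_Op MSB, I0=Op2
    add_sub = lut4(0xA678, lsb, op1, msb, op2)
    op2_is_1 = op2 & msb                            # MULT_AND
    carry_out = carry_in if add_sub else op2_is_1   # MUXCY: S=1 picks CI, S=0 picks DI
    result = add_sub ^ carry_in                     # XORCY
    return result, carry_out


def alu(alu_op, op1, op2, width=32):
    """Ripple the slice over `width` bits, least significant bit first.

    Assumption: the initial carry-in is 1 for subtract (op 11), else 0.
    """
    carry = 1 if alu_op == 0b11 else 0
    out = 0
    for i in range(width):
        bit, carry = alu_bit(alu_op, (op1 >> i) & 1, (op2 >> i) & 1, carry)
        out |= bit << i
    return out
```

With these assumptions, ops 00 and 01 pass Op1 and Op2 through unchanged (the MULT_AND output is 0 and the carry chain stays idle), while ops 10 and 11 give Op2 + Op1 and Op2 - Op1 modulo 2^width.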
Re: Paul Metzgen on multiplexers and the NIOS II pipeline - Eric Smith - Jul 30 13:45:57 2007
Goran,
Thanks for posting the MicroBlaze ALU sample. It's helpful to see
how a real-world soft core is optimized for an FPGA.
Can you comment on how much improvement is seen in this sort of
datapath with XST or 3rd party synthesis tools when using the Xilinx
primitives vs. letting the synthesis tool infer the structure?
Or is use of primitives only necessary to support floorplanning?
I've been using "plain" VHDL for my data path, but perhaps I'll try
writing a version using the Xilinx primitives.
Eric
Re: Paul Metzgen on multiplexers and the NIOS II pipeline - Goran Bilski - Jul 31 6:43:55 2007
Hi,
I try to avoid using primitives as much as possible, but sometimes it's the
easiest and best way to get what I want.
With VHDL it's easy to write a for-generate statement with primitives and get
exactly what I want. Otherwise you could spend days trying to write HDL in a
way that produces the same result, and that result can change with a newer
version of the synthesis tool.
MicroBlaze has not been floorplanned since v5.00.a. It was just too much work,
and the tools were starting to produce decent results.
Still, primitives are needed when I can't get the result from pure HDL.
The synthesis tools usually try to grab parts of the HDL design that map well
to built-in modules, like adders, counters, add/sub, ...
They do not do well on modules that are actually packed together, like the ALU
example where an add/sub and a multiplexer are merged into the same LUT.
If you write that as pure HDL you will get one multiplexer followed by one
add/sub module.
I usually look at the final netlist, find all LUTs which don't have all 4
inputs used, and see if there is something I can pack into those LUTs.
This is easy to do by creating a VHDL netlist from the final netlist and
grepping for LUT1/LUT2/LUT3.
Sometimes I can move something from a different pipestage or I can pack
different functions into the same LUT.
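The grep step described above could be sketched as a small Python script. This is an illustration only: it assumes the netlist is available as plain VHDL text, and it also counts the `_D`/`_L` variants of the small LUT primitives, which is a guess at what one would want rather than something stated in the post. The function name `underused_luts` is hypothetical.

```python
import re
from collections import Counter

# LUT1/LUT2/LUT3 as whole identifiers, optionally with a _D or _L suffix.
# LUT4 is deliberately not matched: those LUTs already use all four inputs.
UNDERUSED_LUT = re.compile(r"\bLUT([1-3])(?:_[DL])?\b")


def underused_luts(netlist_text):
    """Return (count per LUT size, list of (line number, line)) for a VHDL netlist."""
    counts = Counter()
    hits = []
    for lineno, line in enumerate(netlist_text.splitlines(), start=1):
        code = line.split("--", 1)[0]  # ignore VHDL comments
        match = UNDERUSED_LUT.search(code)
        if match:
            counts["LUT" + match.group(1)] += 1
            hits.append((lineno, line.strip()))
    return counts, hits
```

Component declarations in the netlist would be counted too, so the numbers are an upper bound on the instances worth inspecting.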
Göran
Re: Paul Metzgen on multiplexers and the NIOS II pipeline - Tommy Thorn - Aug 1 14:34:48 2007
Goran,
Thanks for unveiling some of the insides of MicroBlaze. I'm sure we'd all love
to learn more about it.
Altera's low-end parts favor 4:1 muxes and 3:1 registered muxes (optionally with
clear). Are there similar rules of thumb for Xilinx?
Thanks,
Tommy
Re: Paul Metzgen on multiplexers and the NIOS II pipeline - Goran Bilski - Aug 2 7:13:19 2007
Hi,
Xilinx has the MUXFx primitives in every CLB, which can be used to create muxes
up to 16-1.
But I tend to use the synchronous reset of DFFs to create my mux structures.
If I have a 4-1 mux where the inputs are coming from DFFs (very common in
datapaths), I use the synchronous reset of the DFFs to create more efficient
muxes.
If I keep all DFFs in reset except for the bus that I select, I can simply OR
the outputs from the DFFs to create my mux.
So an 8-1 mux can be done with only two LUTs (using the carry chain for the
ORing).
A normal implementation would take four LUTs, so using the DFFs for muxing
halves the LUT count.
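A minimal Python sketch of the reset-based mux idea, under the assumption that each mux input is the output of a register with its own synchronous reset. The function names are invented for the example, and the model says nothing about how the OR is mapped onto LUTs or the carry chain.

```python
from functools import reduce


def clock_edge(data_inputs, select):
    """One clock edge: every register except the selected one is held in
    synchronous reset, so it captures 0 instead of its data input."""
    return [d if i == select else 0 for i, d in enumerate(data_inputs)]


def or_mux(q_outputs):
    """With at most one register non-zero, a wide OR acts as the mux."""
    return reduce(lambda a, b: a | b, q_outputs, 0)
```

For example, `or_mux(clock_edge(buses, 3))` returns `buses[3]` for any list of bus values. One consequence visible in the model: the select has to be known at the clock edge that loads the registers, one cycle before the muxed value is used.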
A processor is mostly muxes (I'd guess around 40-50% of MicroBlaze), so
efficient muxes matter most.
The synchronous-reset trick is used a lot in MicroBlaze.
I sometimes also use the synchronous reset and set at the same time to create
constant values from the DFFs (not for muxing).
Since you can apply both set and reset at the same time (reset has higher
priority), it's easy to get different constant values out of the DFFs.
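The set/reset priority can be written down as a tiny Python model of one flip-flop. This is only a sketch of the priority rule stated above; how the set and reset signals are wired per bit to produce a particular constant is not described in the post and is not modelled here. The name `dff_next` is hypothetical.

```python
def dff_next(d, set_, reset):
    """Next state of a DFF with synchronous set and reset.

    Reset has priority: if both are asserted, the register loads 0.
    """
    if reset:
        return 0
    if set_:
        return 1
    return d
```

With both controls low the register simply captures `d`; asserting set alone forces 1, and asserting reset (with or without set) forces 0.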
Göran