EmbeddedRelated.com

16 X 16 multiplier

Started by Manfield Chow July 21, 2002
Dear all,

Does anyone know how to design a 16 X 16 multiplier that works in one clock cycle?
I know that some FPGAs have embedded 16 X 16 multipliers.
However, do these operate in one clock cycle?
Where can I get the layout/schematic/Verilog design of such a multiplier?

Thanks
Reala



Hi,

Implementing a 16*16 multiplier in one clock is easy;
the main question is: at what clock frequency?

Please clarify.

Bye,




Yes. Unless the frequency is specified, we cannot say whether it is possible
to implement or not. Please tell us the operating frequency.

--

Regards,
Sridhar Nandula
Design Engineer,
125, Phase1,
Udyog vihar, Gurgaon
Ph: (+91) 0124-6439224 ext 132




Hadi, Sridhar,

Thank you for your reply.
The clock frequency is not very high: it should be 20 MHz to 40 MHz.
My boss tells me that if we implement a 16 X 16 multiplier directly,
it will be very large. So I want to know some techniques for designing a
single-cycle 16 X 16 multiplier with a smaller size.
Moreover, I plan to implement the design in an FPGA first and then move it to an ASIC.
Since an FPGA has specific blocks (e.g. logic blocks, lookup tables), how
can I put these into my ASIC? Are there tools to do this, or should I buy a
library from the FPGA vendor?
Or is there some file format (e.g. a netlist) generated by the FPGA tools that
can be read by ASIC layout design tools?

Thank you for your help.

With best regards,
Reala
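Reala's 20 MHz to 40 MHz target sets the budget Hadi and Sridhar were asking about: in a single-cycle design the whole 16 X 16 product has to settle within one clock period. A minimal Python sketch of that arithmetic follows; the function names, the register-overhead parameter and the 30 ns / 2 ns delay figures are hypothetical, not numbers for any particular FPGA or ASIC library.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


def fits_in_one_cycle(comb_delay_ns: float, freq_mhz: float,
                      reg_overhead_ns: float = 0.0) -> bool:
    """True if the multiplier's combinational delay, plus an assumed register
    overhead (setup time + clock-to-output), fits inside one clock period."""
    return comb_delay_ns + reg_overhead_ns <= clock_period_ns(freq_mhz)


if __name__ == "__main__":
    for f_mhz in (20.0, 40.0):
        print(f"{f_mhz:.0f} MHz -> {clock_period_ns(f_mhz):.1f} ns per cycle")
    # Hypothetical 30 ns flow-through multiplier with 2 ns of register overhead:
    print(fits_in_one_cycle(30.0, 20.0, reg_overhead_ns=2.0))  # 32 ns fits in 50 ns
    print(fits_in_one_cycle(30.0, 40.0, reg_overhead_ns=2.0))  # 32 ns does not fit in 25 ns
```

So the same multiplier can be "easy" at 20 MHz and impossible at 40 MHz, which is why the frequency had to be pinned down first.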





Hi Reala,
Implementing a multiplier in a single clock cycle means that you have to use
combinational logic whose delay fits within one clock period (1/frequency).
It will take a fair amount of logic. Any good digital design book gives an idea
about multipliers (I have read about them in Smith's ASIC book), and there is
also a lot of information on the net about combinational multipliers. If you
go for a sequential multiplier, it will take less logic and still reach a good
frequency compared to combinational ones, although it needs several clock
cycles per product. You can even try pipelining a combinational multiplier yourself.
P.S.: You also have dedicated multipliers in the Virtex-II, which
LeonardoSpectrum will use if you just write * in your design. Check it out
too; it's faster and works fine.
Best of luck :)
Rgds,
Bala.C
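Bala's trade-off between a combinational and a sequential multiplier can be modeled behaviorally in Python. This is a sketch for unsigned 16-bit operands, not a gate-level design, and the names `array_multiply` and `shift_add_multiply` are hypothetical: the first forms all 16 partial products and adds them at once (in hardware, a large adder array that must settle in one clock period), while the second reuses a single adder and consumes one multiplier bit per clock, so it needs 16 cycles.

```python
from typing import Tuple

WIDTH = 16  # operand width in bits; unsigned operands assumed


def array_multiply(a: int, b: int, width: int = WIDTH) -> int:
    """Combinational (array) multiplier model: all partial products are formed
    and added in the same clock cycle."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    # In hardware this sum is an array of roughly width-1 adder rows that must
    # all settle within one clock period.
    return sum(partials)


def shift_add_multiply(a: int, b: int, width: int = WIDTH) -> Tuple[int, int]:
    """Sequential shift-and-add multiplier model: one shared adder, one
    multiplier bit per clock. Returns (product, clock_cycles_used)."""
    acc = 0       # product register, 2*width bits wide
    addend = a    # multiplicand, shifted left one place per cycle
    bits = b      # multiplier shift register, consumed LSB first
    cycles = 0
    for _ in range(width):   # each iteration models one clock cycle
        if bits & 1:
            acc += addend    # the same adder is reused every cycle
        addend <<= 1
        bits >>= 1
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    print(array_multiply(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF)
    print(shift_add_multiply(1234, 5678))
```

The sequential version replaces the adder array with one adder and a few registers, which is where its area saving comes from; the price is 16 clocks per product, so on its own it does not give one product per cycle.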




Hi Bala,

Thank you for your help.

To give more details, we would like to design a multiplier for a DSP chip.
One feature of a DSP chip is single-cycle multiplication. If I design the
multiplier directly, I am afraid that it will be too large.
So I want to know how to design a multiplier for a DSP chip.

Yes, I know about the dedicated multiplier in the Virtex-II.
But if I want to translate the design from FPGA to ASIC,
how do I translate the dedicated multiplier into our ASIC?
Can I get the design of this multiplier?
Or can I pay Xilinx to buy the design of this multiplier?

Thanks
Reala






> To give more details, we would like to design a multiplier for a DSP chip.

Do you really need single-cycle latency, or is it acceptable to be able
to start a new multiply every cycle, but get the results of each one
several cycles after it was started? For many DSP tasks such as digital
filters, a multiple-cycle latency is acceptable.

Anyhow, pipelining the multiplier may not reduce its size, but it will
certainly let you get a faster cycle time.
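Eric's point that pipelining shortens the cycle time without shrinking the multiplier can be seen in a behavioral Python sketch. The split of the 16 partial-product rows into three stages (6, 5 and 5 rows) is an arbitrary assumption, and the names `PipelinedMultiplier` and `rows_sum` are hypothetical. Each stage adds only its share of the rows, so its logic depth is roughly a third of the flow-through adder array, while the total amount of adder logic is unchanged and pipeline registers come on top. A new operand pair is accepted every cycle and its product leaves the last stage three cycles later.

```python
from typing import List, Optional, Tuple

# Assumed split of the 16 partial-product rows across three pipeline stages.
STAGE_ROWS = [range(0, 6), range(6, 11), range(11, 16)]


def rows_sum(a: int, b: int, rows: range) -> int:
    """Sum of the partial products (a << i) for the multiplier bits i in `rows`."""
    return sum(a << i for i in rows if (b >> i) & 1)


class PipelinedMultiplier:
    """Three-stage pipelined unsigned 16 x 16 multiplier (behavioral model)."""

    def __init__(self) -> None:
        # regs[k] is the pipeline register after stage k: (a, b, partial_sum),
        # or None for an empty slot (bubble).
        self.regs: List[Optional[Tuple[int, int, int]]] = [None] * len(STAGE_ROWS)

    def clock(self, operands: Optional[Tuple[int, int]]) -> Optional[int]:
        """Advance one clock. `operands` is (a, b) or None.
        Returns the product leaving the last stage this cycle, or None."""
        last = self.regs[-1]
        out = last[2] if last is not None else None
        # Update later stages first so each reads its predecessor's old value.
        for k in range(len(STAGE_ROWS) - 1, 0, -1):
            prev = self.regs[k - 1]
            if prev is None:
                self.regs[k] = None
            else:
                a, b, partial = prev
                self.regs[k] = (a, b, partial + rows_sum(a, b, STAGE_ROWS[k]))
        if operands is None:
            self.regs[0] = None
        else:
            a, b = operands
            self.regs[0] = (a, b, rows_sum(a, b, STAGE_ROWS[0]))
        return out


if __name__ == "__main__":
    mult = PipelinedMultiplier()
    pairs = [(1234, 5678), (0xFFFF, 0xFFFF), (40000, 3)]
    outputs = [mult.clock(p) for p in pairs + [None] * 3]
    print(outputs)  # [None, None, None, 7006652, 4294836225, 120000]
```

One multiply is issued per cycle and one result emerges per cycle once the pipeline is full, which is the sense in which a pipelined multiplier is still "single cycle".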



Dear Eric,

Actually, my boss requires single-cycle operation.
Moreover, the MAC (multiply-accumulate) module is another difficult task for me.
If the multiply is not single cycle, the MAC will be very slow. Am I correct?
As the specification of our DSP chip is not finalized, maybe the most
important thing for me is to study how to develop a DSP chip.
If you can recommend any site for learning DSP chip development, please tell
me ^_^.
Thanks a lot.

With best regards,
Reala





> Actually, my boss requires single-cycle operation.

OK, but a pipelined multiplier is still considered single cycle, in
that every cycle you can start a new multiply.

> Moreover, the MAC (multiply-accumulate) module is another difficult task for me.
> If the multiply is not single cycle, the MAC will be very slow. Am I correct?

Pipelined multiply works just fine for MAC.

Let's say you're going to do a series of 50 MACs, and you have a pipelined
multiplier with a three-cycle latency. Your operands are A0..A49 and
B0..B49. Further, let's assume that your accumulator takes one cycle.

            multiplier inputs   multiplier output   accumulator output
            (multiplicands)     (product)           (sum of products)
            -----------------   -----------------   ----------------------------
Cycle 0:    input A0, B0        don't care          don't care
Cycle 1:    input A1, B1        don't care          don't care
Cycle 2:    input A2, B2        don't care          don't care
Cycle 3:    input A3, B3        A0*B0               force zero
Cycle 4:    input A4, B4        A1*B1               A0*B0
Cycle 5:    input A5, B5        A2*B2               A0*B0+A1*B1
Cycle 6:    input A6, B6        A3*B3               Sum for i = 0 to 2 of Ai*Bi
Cycle 7:    input A7, B7        A4*B4               Sum for i = 0 to 3 of Ai*Bi
....
Cycle 49:   input A49, B49      A46*B46             Sum for i = 0 to 45 of Ai*Bi
Cycle 50:   don't care          A47*B47             Sum for i = 0 to 46 of Ai*Bi
Cycle 51:   don't care          A48*B48             Sum for i = 0 to 47 of Ai*Bi
Cycle 52:   don't care          A49*B49             Sum for i = 0 to 48 of Ai*Bi
Cycle 53:   don't care          don't care          Sum for i = 0 to 49 of Ai*Bi

As you can see, you've completed 50 MACs in 54 cycles, even though the
total time to compute one MAC is 4 cycles.
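The schedule in the table can be reproduced with a small cycle-by-cycle Python model. As a simplifying assumption, the multiplier is treated as a three-register delay line (its internal structure does not matter for the timing), the accumulator register is forced to zero at the end of cycle 2 so that it reads zero in cycle 3, and `None` stands for the table's "don't care" entries. The function name `pipelined_mac` and the test operands are illustrative.

```python
from collections import deque
from typing import List, Optional, Tuple


def pipelined_mac(a_vals: List[int], b_vals: List[int],
                  latency: int = 3) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Cycle-by-cycle model of a MAC fed by a pipelined multiplier.

    Returns (cycle, multiplier_output, accumulator_output) for every cycle;
    None marks the "don't care" entries of the schedule.
    """
    n = len(a_vals)
    pipeline = deque([None] * latency)  # products still in flight
    acc = None                          # accumulator register ("don't care" at reset)
    trace = []
    for cycle in range(n + latency + 1):
        product = pipeline.popleft()    # multiplier output during this cycle
        trace.append((cycle, product, acc))
        # Issue a new operand pair every cycle while any are left.
        pipeline.append(a_vals[cycle] * b_vals[cycle] if cycle < n else None)
        # Clock edge: update the accumulator register.
        if cycle == latency - 1:
            acc = 0                     # "force zero" just before the first product
        elif product is not None:
            acc += product
    return trace


if __name__ == "__main__":
    a = list(range(1, 51))    # A0..A49 (arbitrary test operands)
    b = list(range(51, 101))  # B0..B49
    trace = pipelined_mac(a, b)
    for row in trace[:6] + trace[-2:]:
        print(row)
    print(len(trace), "cycles; final sum correct:",
          trace[-1][2] == sum(x * y for x, y in zip(a, b)))
```

The trace has 54 entries (cycles 0 to 53), and the complete sum of products appears on the accumulator output in cycle 53, matching the table.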

As I said before, a pipelined parallel multiplier will generally take
as much space as a flow-through parallel multiplier. But a pipelined
parallel multiplier with a latency of three can typically be cycled
almost three times faster than the flow-through multiplier, so you get
nearly three times the total data throughput.
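The "almost three times" figure comes from the shorter clock period, not from less work per multiply. A back-of-the-envelope Python sketch, assuming (hypothetically) a 30 ns flow-through multiplier whose logic splits into three equal stages, with 1 ns of register overhead (setup plus clock-to-output) paid by each registered stage; the function names and numbers are illustrative, and real delays depend entirely on the technology and library.

```python
def pipelined_period_ns(comb_delay_ns: float, stages: int,
                        reg_overhead_ns: float) -> float:
    """Clock period when the logic splits into `stages` equal slices,
    each followed by a register costing `reg_overhead_ns`."""
    return comb_delay_ns / stages + reg_overhead_ns


def throughput_gain(comb_delay_ns: float, stages: int,
                    reg_overhead_ns: float) -> float:
    """Steady-state results per second, pipelined vs flow-through.
    The flow-through design is modeled as a single stage (stages=1)."""
    t_flow = pipelined_period_ns(comb_delay_ns, 1, reg_overhead_ns)
    t_pipe = pipelined_period_ns(comb_delay_ns, stages, reg_overhead_ns)
    return t_flow / t_pipe


if __name__ == "__main__":
    comb, stages, overhead = 30.0, 3, 1.0   # hypothetical numbers
    t_flow = pipelined_period_ns(comb, 1, overhead)
    t_pipe = pipelined_period_ns(comb, stages, overhead)
    print(f"flow-through: {t_flow:.1f} ns per multiply ({1000 / t_flow:.1f} MHz)")
    print(f"pipelined:    {t_pipe:.1f} ns per multiply ({1000 / t_pipe:.1f} MHz), "
          f"latency {stages * t_pipe:.1f} ns")
    print(f"throughput gain: {throughput_gain(comb, stages, overhead):.2f}x")
```

With these assumed numbers the gain is a little under 3x because the register overhead is paid once per stage, and each individual result still takes three cycles (33 ns here) to emerge, slightly longer than the flow-through delay.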



Dear Eric,

Thanks a lot. I understand it much better now.

Reala




