
Discussion Groups | FPGA-CPU | Re: Multiplying, MicroBlaze

This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).

Multiplying, MicroBlaze - Author Unknown - Apr 9 23:56:00 2001

Wow, that MicroBlaze looks impressive. But I wonder how realistic
the architecture is and what got "scrapped". 800 LUTs is amazing. I
wonder how fast the xr16 would execute in the same type of part.

I'm wondering about the virtue of adding hardware multiply / divide
support to a processor. It seems to me that multiply / divide is used
so infrequently in general use that having hardware support is not
worthwhile. I can easily add a multiply or divide step instruction to
increase the performance of software multiply / divides
significantly, but why bother ? Increasing the performance of an
instruction that is executed only a fraction of a percentage of the
total instructions executed seems pointless. So why do other general
purpose processors provide hardware multiply / divide support ? Is
this just marketing ? I'm thinking the less hardware there is, the
better for performance...






(You need to be a member of fpga-cpu -- send a blank email to fpga-cpu-subscribe@yahoogroups.com )


RE: Multiplying, MicroBlaze - Tom Kerrigan - Apr 10 0:00:00 2001

When you're making a chip with 30+ M transistors, you might as well put in
multiply and divide. :)

-Tom

> -----Original Message-----
> From: [mailto:]
> Sent: Monday, April 09, 2001 10:57 PM
> To:
> Subject: [fpga-cpu] Multiplying, MicroBlaze
>
> Wow, that MicroBlaze looks impressive. But I wonder how realistic
> the architecture is and what got "scrapped". 800 LUTs is amazing. I
> wonder how fast the xr16 would execute in the same type of part.
>
> I'm wondering about the virtue of adding hardware multiply / divide
> support to a processor. It seems to me that multiply / divide is used
> so infrequently in general use that having hardware support is not
> worthwhile. I can easily add a multiply or divide step instruction to
> increase the performance of software multiply / divides
> significantly, but why bother ? Increasing the performance of an
> instruction that is executed only a fraction of a percentage of the
> total instructions executed seems pointless. So why do other general
> purpose processors provide hardware multiply / divide support ? Is
> this just marketing ? I'm thinking the less hardware there is, the
> better for performance...
>






Re: Multiplying, MicroBlaze - Ben Franchuk - Apr 10 3:04:00 2001

Veronica Merryfield wrote:
>
> This reasoning is almost exactly on the same lines that CPUs were originally
> introduced. Jan's work and web site mention these and it is worth looking
> at the Intel history site. Briefly, a logic collection was made that
> operational codes were fed to, to perform complex logic functions that would
> have required too much dedicated logic to be economic: a specialised CPU. The
> driver over the years has been towards generalised CPUs (RISC/CISC/DSP) but
> Jan's work is showing that this does not have to be the case anymore with
> advances in silicon technology.

I expect advances will be in Custom I/O connected to a generic
CPU. Networking and other custom logic on the outside of the cpu
is where the action in development is. Still one has to remember the
FPGA market is a small market compared to the rest of the computer industry.
Ben.
--
"We do not inherit our time on this planet from our parents...
We borrow it from our children."
"Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk






Re: Multiplying, MicroBlaze - Veronica Merryfield - Apr 10 4:45:00 2001

A collection of bits from came together thus

>Of course, there are applications that use lots of multiplies. For a
>desktop processor, 3D graphics uses huge numbers of multiplies. In
>the microcontroller world, a friend of mine is building an audio mixer
>that digitally mixes several audio channels (with adjustable volumes);
>another very multiply-intensive application.

I would argue that these are not general purpose CPU uses but specific or
specialised uses. These examples do use many multiplies and are probably best
served by the creation of an instruction set that has single cycle multiply
and accumulate and also some interesting looping modes to suit those
applications, perhaps also number representation schemes that work well,
which is exactly the course that DSP manufacturers have taken and also that
of some FPGA vendors with DSP cores.

The argument extends to other specialised fields in that if you are
targeting a specific function set then it is wise to craft the instruction
set and supporting tools to best match that function set.

As I read Jan's work, he set out to demonstrate that it is possible to
implement a CPU in an FPGA and produce useful work from it. If you are in
the market of making or wanting general purpose CPUs within FPGAs for
whatever reasons (Jan's work covers these), then it is more cost effective to
buy in the soft core and tools.

However, the real win here is that for a given specialised application, for
those same reasons (quantity, upgradability etc), one has the knowledge that
it is feasible to create the instruction set to suit the application along with
tools and refine these to achieve the results in a very economic and flexible
manner.

This reasoning is almost exactly on the same lines that CPUs were originally
introduced. Jan's work and web site mention these and it is worth looking
at the Intel history site. Briefly, a logic collection was made that
operational codes were fed to, to perform complex logic functions that would
have required too much dedicated logic to be economic: a specialised CPU. The
driver over the years has been towards generalised CPUs (RISC/CISC/DSP) but
Jan's work is showing that this does not have to be the case anymore with
advances in silicon technology.

In summary, for general CPUs and their spread of uses, I doubt a multiply is
needed. For specialised applications where it would be of benefit, certainly.

Veronica






Re: Multiplying, MicroBlaze - Kolja Sulimma - Apr 10 6:54:00 2001

If you have a good compiler you are right. According to HP, 95% of all
multiplies are constant multiplies that can be reduced to a few adds and
shifts. (The Java API, for example, multiplies by 37 like crazy in its
hashtables.)
That is called strength reduction.
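
As a minimal sketch of strength reduction (the function name `mul37` is hypothetical and not taken from the Java API or any compiler), a multiply by the constant 37 = 32 + 4 + 1 can be written with two shifts and two adds:

```c
#include <stdint.h>

/* Multiply by the constant 37 without a multiply instruction.
 * 37 = 32 + 4 + 1, so x*37 = (x << 5) + (x << 2) + x.
 * Unsigned arithmetic wraps modulo 2^32, exactly like x * 37u would. */
uint32_t mul37(uint32_t x)
{
    return (x << 5) + (x << 2) + x;
}
```

A compiler that performs this rewrite emits the shift/add form whenever the multiplier is known at compile time.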

If you have a stupid compiler, you end up with 1% or so multiplications in
your code.
If a multiplication step is something like 5 cycles (shift, add, shift,
compare, jump),
and everything else needs a single cycle, your runtime in a 32-bit system
goes up by 159%.
(99 + 32*5 cycles)
With a 1 cycle multiplication step it is only an extra 31%.
(99 + 32 cycles)
A serial multiplier that stops early if the remaining multiplicand is 0
reduces this to about 5%.
(99 + 6 cycles with an average of 6 bit operands)
A single cycle multiplier will improve this to 0%.
(99 + 1 cycle)
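
The percentages above follow from a simple cost model: out of every 100 instructions, 99 take one cycle each and one is a multiply. The following sketch just redoes that arithmetic; the function name is hypothetical.

```c
/* Run-time overhead in percent for a program in which 1 instruction in 100
 * is a multiply and every other instruction takes a single cycle.
 * The baseline is 100 cycles (99 other instructions + a 1-cycle multiply). */
int overhead_percent(int cycles_per_multiply)
{
    int baseline = 100;
    int total = 99 + cycles_per_multiply;
    return (total - baseline) * 100 / baseline;
}
```

Calling it with 32*5, 32, 6 and 1 cycles per multiply gives the four cases listed in the post.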

So: The best thing to have is a good compiler. It will achieve about 105% in
most benchmarks without extra hardware.
Otherwise a single cycle multiplication step is very worthwhile. A dedicated
multiplier only makes sense if you do a lot of
arithmetic in your code.
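
To illustrate the "stops early" idea, here is a software model of a serial shift-and-add multiplier that ends as soon as no multiplier bits remain. It is only a sketch under stated assumptions: the function name and the step counter are made up for illustration, and real hardware would do one loop iteration per cycle.

```c
#include <stdint.h>

/* Shift-and-add multiply that terminates when the remaining bits of b
 * are all zero. *steps (if non-NULL) receives the number of iterations,
 * which equals the bit length of b rather than a fixed 32. */
uint32_t mul_early_out(uint32_t a, uint32_t b, int *steps)
{
    uint32_t product = 0;
    int n = 0;

    while (b != 0) {
        if (b & 1)
            product += a;   /* add the current partial product */
        a <<= 1;            /* next bit weighs twice as much */
        b >>= 1;            /* consume one multiplier bit */
        n++;
    }
    if (steps)
        *steps = n;
    return product;
}
```

With a 6-bit operand such as 37 the loop runs six times, which is where the "99 + 6 cycles" estimate comes from.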

CU,
Kolja

wrote:

> Wow, that MicroBlaze looks impressive. But I wonder how realistic
> the architecture is and what got "scrapped". 800 LUTs is amazing. I
> wonder how fast the xr16 would execute in the same type of part.
>
> I'm wondering about the virtue of adding hardware multiply / divide
> support to a processor. It seems to me that multiply / divide is used
> so infrequently in general use that having hardware support is not
> worthwhile. I can easily add a multiply or divide step instruction to
> increase the performance of software multiply / divides
> significantly, but why bother ? Increasing the performance of an
> instruction that is executed only a fraction of a percentage of the
> total instructions executed seems pointless. So why do other general
> purpose processors provide hardware multiply / divide support ? Is
> this just marketing ? I'm thinking the less hardware there is, the
> better for performance...





Re: Multiplying, MicroBlaze - Author Unknown - Apr 10 13:23:00 2001

writes:

> I'm wondering about the virtue of adding hardware multiply / divide
> support to a processor. It seems to me that multiply / divide is used
> so infrequently in general use that having hardware support is not
> worthwhile. I can easily add a multiply or divide step instruction to
> increase the performance of software multiply / divides
> significantly, but why bother ? Increasing the performance of an
> instruction that is executed only a fraction of a percentage of the
> total instructions executed seems pointless. So why do other general
> purpose processors provide hardware multiply / divide support ? Is
> this just marketing ? I'm thinking the less hardware there is, the
> better for performance...

Of course, there are applications that use lots of multiplies. For a
desktop processor, 3D graphics uses huge numbers of multiplies. In
the microcontroller world, a friend of mine is building an audio mixer
that digitally mixes several audio channels (with adjustable volumes);
another very multiply-intensive application.

It seems like a good idea to spend some effort to make multiplies
fast, if you're trying to make a general-purpose processor.

Carl Witty






Re: Multiplying, MicroBlaze - Josh Fryman - Apr 10 18:40:00 2001

thank you oh so much for this lovely waste of space, bytes, and bandwidth.

please refrain from this when posting to mass groups. some of us are reading
via very slow connections, and this is extremely annoying.

> ##
> ## # ## # #
> # # # # # # # #
> # # # # # # # #
> # # # # ## # #
> # # # # ## # # ## #####
> # # # # # # # # # # # # #
> # # ## # # # # # # # # # #
> # # # # # # # # ## # # ### #
> # # # # # # # # # # # ## #
> # # # # # # # # # # ##
> # ## # # # # # # # # # ## #
> # # # ## # # # # # # # # #### #
> # # # # # # # # # # # # # #
> # # # # # # # # # # # # # ##
> ## ### ## ## ## ## ####






Re: Multiplying, MicroBlaze - Author Unknown - Apr 10 19:16:00 2001

> In summary, for general CPUs and their spread of uses, I doubt a multiply is
> needed. For specialised applications where it would be of benefit, certainly.
>
> Veronica

Well, that depends on what you call a general CPU. Perhaps a general
purpose CPU should be able to perform well in almost every field. Not
excellently, but at least well. So if I want to use it for 3D geometry or to
process some audio signal and later for some word processing, perhaps I will
be wasting some of the CPU power in the latter, but I will be thankful for it
in the first two.

By the way, I think a little CPU designed to fit in a low cost FPGA to
control some embedded system may well lose its multiplier, but then it IS a
specific purpose CPU. (Of course, I'm not criticizing Jan's CPU in this
paragraph)

Salutations, Mike.

##
## # ## # #
# # # # # # # #
# # # # # # # #
# # # # ## # #
# # # # ## # # ## #####
# # # # # # # # # # # # #
# # ## # # # # # # # # # #
# # # # # # # # ## # # ### #
# # # # # # # # # # # ## #
# # # # # # # # # # ##
# ## # # # # # # # # # ## #
# # # ## # # # # # # # # #### #
# # # # # # # # # # # # # #
# # # # # # # # # # # # # ##
## ### ## ## ## ## ####






Re: Multiplying, MicroBlaze - Author Unknown - Apr 10 20:16:00 2001

Oh, sorry, really. I forgot that stupid signature. I'll be more careful
in the future.

Organization: CoC, GaTech
To:
From: Josh Fryman <>
Date sent: Tue, 10 Apr 2001 19:40:50 -0400
Send reply to:
Subject: Re: [fpga-cpu] Multiplying, MicroBlaze

> thank you oh so much for this lovely waste of space, bytes, and bandwidth.
>
> please refrain from this when posting to mass groups. some of us are reading
> via very slow connections, and this is extremely annoying.





Re: Multiplying, MicroBlaze - Ben Franchuk - Apr 10 20:29:00 2001

Jan Gray wrote:
> Xilinx has said that the forthcoming "10 M system gate" version of Virtex-II
> will require 500 M transistors.
> Jan Gray, Gray Research LLC

Any guess at the cost of the first one? $10K..$20K comes to mind.
A small Forth CPU is about 10,000 gates. That's 1K of them in that
mammoth piece of silicon. A lot of power there.
Ben.
--
"We do not inherit our time on this planet from our parents...
We borrow it from our children."
"Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk






Re: Multiplying, MicroBlaze - Ben Franchuk - Apr 10 21:48:00 2001

Kolja Sulimma wrote:
>
> The amount of multiplication is overestimated. The point is that some kernels
> really depend on multiplication, but most applications do not really spend much
> time in them
The one place multiplication is hidden is in indexing variables, like foo[i].
In most cases this is a simple shift like 1x, 2x, 4x, but if foo is an array
of structures like struct foobar foo[k]; you have to have a multiplication.
Ben.
--
"We do not inherit our time on this planet from our parents...
We borrow it from our children."
"Luna family of Octal Computers" http://www.jetnet.ab.ca/users/bfranchuk






RE: Multiplying, MicroBlaze - Jan Gray - Apr 10 22:43:00 2001

> Wow, that MicroBlaze looks impressive. But I wonder how realistic
> the architecture is and what got "scrapped". 800 LUTs is amazing. I
> wonder how fast the xr16 would execute in the same type of part.

The execute stage would go at approximately the same frequency. Control
might need to be retimed for the faster interconnect relative to logic. The
xr16's 16-bit ISA however would probably not get as much work done per cycle
as a 32-bit instruction word architecture. Remember every imm instruction
is in some sense a wasted issue slot.

> I'm wondering about the virtue of adding hardware multiply / divide
> support to a processor.

Multiply is the bottleneck in some codes, especially in signal processing.
Think sums of weighted inputs; each weighting is a multiplication.
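
A sum of weighted inputs can be sketched as the loop below, with one multiply per input. The function name and the 16-bit sample and weight types are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* y = weight[0]*x[0] + weight[1]*x[1] + ... : one multiply per input.
 * The 32-bit accumulator can overflow for long sums of large values;
 * a real filter would size the accumulator for its tap count. */
int32_t weighted_sum(const int16_t *x, const int16_t *weight, size_t n)
{
    int32_t acc = 0;

    for (size_t k = 0; k < n; k++)
        acc += (int32_t)x[k] * (int32_t)weight[k];
    return acc;
}
```

On a CPU without multiply hardware, every iteration of this loop pays the full cost of a software multiply, which is why such kernels are the ones that suffer.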

One reason that FPGAs are good at DSP, is these weight coefficients are
constants, and each multiply by constant can be strength-reduced into a
series of adds of certain taps of the input. Even so, Xilinx apparently
thought variable multiplies are so important that Virtex-II provides a fast
18x18=36-bit hard multiplier at each 18 Kb block RAM site: 4 in a 2V40; 40
in a 2V1000; 144 in a 2V6000.

Also note, it is possible to use a multiplier as a limited barrel shifter.
(Barrel shifters are relatively expensive to implement in FPGAs.) Also it
may therefore be possible to use multipliers to perform operand
denormalization and result normalization for floating point addition.
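
The multiplier-as-shifter idea can be modelled in a few lines. This is only a sketch assuming a 16-bit datapath and a multiplier that accepts a 17-bit power-of-two factor and returns the full product (an 18x18 block would qualify); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift by s: multiply by 2^s and keep the low 16 bits of the product. */
uint16_t shl_by_multiply(uint16_t x, unsigned s)
{
    assert(s < 16);
    uint32_t product = (uint32_t)x * (1u << s);
    return (uint16_t)product;
}

/* Logical right shift by s: multiply by 2^(16-s) and keep the high 16 bits. */
uint16_t shr_by_multiply(uint16_t x, unsigned s)
{
    assert(s < 16);
    uint32_t product = (uint32_t)x * (1u << (16 - s));
    return (uint16_t)(product >> 16);
}
```

The only extra logic needed in front of the multiplier is a decoder that turns the shift count into a one-hot power of two.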

Jan Gray, Gray Research LLC





RE: Multiplying, MicroBlaze - Jan Gray - Apr 10 22:44:00 2001

Tom Kerrigan wrote
> When you're making a chip with 30+ M transistors, you might as well put in
> multiply and divide. :)

Xilinx has said that the forthcoming "10 M system gate" version of Virtex-II
will require 500 M transistors.

http://www.eetimes.com/story/OEG20000522S0025

Jan Gray, Gray Research LLC





Re: Multiplying, MicroBlaze - Kolja Sulimma - Apr 11 1:28:00 2001



wrote:

> > In summary, for general CPUs and their spread of uses, I doubt a multiply is
> > needed. For specialised applications where it would be of benefit, certainly.
> >
> > Veronica
>
> Well, that depends on what you call a general CPU. Perhaps a general
> purpose CPU should be able to perform well on almost every field. Not
> excellent, but at least well. So if I want to use it on 3D Geometry or to
> process some audio signal and later on some word processing, perhaps I will
> be wasting some of the CPU power in the latter, but I will thank it on the
> first two.

For the audio processing, strength reduction would do fine, no dedicated
multiplier needed there.
(See also Jans later posting)

> By the way, I think a little CPU designed to fit in a low cost FPGA to
> control some embedded system may well lose its multiplier, but then it IS a
> specific purpose CPU. (Of course, I'm not criticizing Jan's CPU in this
> paragraph)

As I said before, there are some very nice publications by HP on why you usually do
not need a multiplier in integer CPUs. If I recall correctly HP-PA had no integer
multiplier up to the PA8500, and I would not call a PA8200 a specific purpose CPU.
Also remember that the i386, which is much larger than an xr16, needed something
like 17 cycles for a multiplication.
The 68020 needed more than 50 cycles, and so on.

The amount of multiplication is overestimated. The point is that some kernels
really depend on multiplication, but most applications do not really spend much
time in them.

I just did a trace on Jmpg123, an mp3 decoder, and it only has 6% multiplies.
Most of these are constants (windowing function, etc.) that can be removed by
strength reduction.
I can redo my previous calculation for this case, and report the relative
performance for various implementations:
1000 cycles without multiplication support (32*5 cycle multiplication)
400 cycles with strength reduction
290 cycles with repeat instruction and multiply step
190 cycles with above and strength reduction
280 cycles with 32 cycle multiplier
200 cycles with i386 multiplier
106 cycles with single cycle multiplier

All this assumes 0 wait state memory and no pipeline stalls. Both of which would
reduce the multiplier merit.

This means, in mp3 decoding you get a factor of 2.5 by using strength reduction.
You get another factor of 2 by adding a multiplication step instruction or a 32
cycle multiplier.
The single cycle multiplier gains another factor of 1.8.

The first step is free, the second step is cheap. The third step is very expensive
and will also hurt your cycle time.
(In Virtex a single cycle multiply is more than 20ns)
Two processors with only step 1 and 2 implemented are likely to be faster and
smaller.
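
The three factors quoted above are plain ratios of the cycle counts in the list. A small sketch of the arithmetic follows; the function name is hypothetical and the cycle counts are the ones given in the post.

```c
/* Speed-up obtained by going from before_cycles to after_cycles. */
double speedup(int before_cycles, int after_cycles)
{
    return (double)before_cycles / (double)after_cycles;
}

/* Step 1: no support -> strength reduction         speedup(1000, 400)
 * Step 2: ... -> multiply step + strength reduction speedup(400, 190)
 * Step 3: ... -> single cycle multiplier            speedup(190, 106) */
```

The first ratio is exactly 2.5; the other two come out at roughly 2.1 and 1.8, which the post rounds to "2" and "1.8".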

CU,
Kolja






Re: Multiplying, MicroBlaze - Kolja Sulimma - Apr 11 12:23:00 2001

> The one place multiplication is hidden is in indexing variables, like foo[i].
> In most cases this is a simple shift like 1x, 2x, 4x, but if foo is an array
> of structures like struct foobar foo[k]; you have to have a multiplication.
> Ben.

A structure would still imply only a constant coefficient multiply, which is only a
couple of cycles anyway if you use lea and shift instructions.
99% of the 16-bit constant multiplies can be done with 4 additions.
Some compilers optionally align structures to powers of 2.
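
As an illustration, the byte offset of an element in an array of 12-byte structures can be computed with one add and two shifts. The layout of `struct foobar` below is an assumption made for the example; the thread does not define it.

```c
#include <stddef.h>
#include <stdint.h>

struct foobar {          /* hypothetical 12-byte element */
    int32_t a, b, c;
};

/* Byte offset of foo[i] without a multiply: i*12 = (i + 2*i) * 4. */
size_t foobar_offset(size_t i)
{
    size_t i3 = i + (i << 1);   /* i*3, the kind of sum an x86 lea can form */
    return i3 << 2;             /* times 4 */
}
```

If the structure were padded to 16 bytes, the whole computation would collapse to a single shift, which is the point of aligning structures to powers of 2.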

Arrays of arrays are the interesting stuff.

Kolja





Re: Multiplying, MicroBlaze - Kolja Sulimma - Apr 11 12:58:00 2001

> This turns 'mulc rd,ra,6' into
> mov r1,ra
> mov rd,r0
> slli r1,1
> add rd,r1,rd
> slli r1,1
> add rd,r1,rd
>
> Much better than calling _mulu2 or whatever. A single shift+add instruction
> would have been nice but probably not worth the extra area (10% as usual).

This would do the same:

add rd, ra, ra
add rd, rd, ra
slli rd, 1

Finding optimum addition chains is NP-complete, but with dynamic programming you can find all
chains for 16-bit constant multiplies in a couple of minutes.

If a constant has length n with x bits set to one, your code creates n+x+2 instructions.

Here is my small contribution to XSOC.
Untested code that creates only n + x - 1 instructions for the same task.

However, one can write very simple code that uses subtractions so that it always needs fewer than
1.5*n instructions. (If there are more 1s than 0s, use subtractions instead of additions.)

    case MULC:
        /* mulc rd,ra,k => mov rd,ra || { slli rd,1 || [add rd,rd,ra] }*
         * Assumes rd != ra and that bitvalue is an unsigned scratch variable. */
        if (!parse(&p, REG, &rd, ',', REG, &ra, ',', 0) ||
            !constant(&p, &con) || !parse(&p, EOL, 0))
            continue;

        // multiplying by zero: just clear rd (r0 always reads as 0)
        if (con == 0) {
            mov(rd.u.reg, 0);
            break;
        }

        // find the most significant set bit of the 16-bit constant
        bitvalue = 0x8000;
        while (!(con & bitvalue))
            bitvalue >>= 1;

        // the leading 1 bit is covered by rd = ra
        mov(rd.u.reg, ra.u.reg);

        // remaining bits, high to low: double rd, then add ra if the bit is set
        while ((bitvalue >>= 1) > 0) {
            insn(SLLI, INSN_RD(rd.u.reg) | INSN_I4(1));
            if (con & bitvalue)
                insn(ADD, INSN_RD(rd.u.reg) | INSN_RA(rd.u.reg) | INSN_RB(ra.u.reg));
        }
        break;
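
The n + x - 1 instruction count can be checked on a host machine with a small model of the expansion: one mov for the leading 1 bit, then one shift per remaining bit and one add per remaining 1 bit. This is a sketch only; the struct and function names are hypothetical and it does not use the XSOC assembler's own helpers.

```c
#include <stdint.h>

/* Outcome of expanding "mulc rd,ra,k" into mov/slli/add instructions. */
struct mulc_result {
    uint32_t value;   /* what rd would hold (kept at 32 bits, no 16-bit wrap) */
    int insns;        /* number of instructions emitted */
};

struct mulc_result mulc_expand(uint16_t ra, uint16_t k)
{
    struct mulc_result r = {0, 1};   /* k == 0: a single "mov rd,r0" */
    uint16_t bit = 0x8000;

    if (k == 0)
        return r;
    while (!(k & bit))               /* find the most significant set bit */
        bit >>= 1;

    r.value = ra;                    /* mov rd,ra covers the leading 1 */
    while ((bit >>= 1) > 0) {
        r.value <<= 1;               /* slli rd,1 */
        r.insns++;
        if (k & bit) {
            r.value += ra;           /* add rd,rd,ra */
            r.insns++;
        }
    }
    return r;
}
```

For k = 6 (n = 3 bits, x = 2 ones) the model reports 4 instructions, and for k = 37 (n = 6, x = 3) it reports 8, matching n + x - 1 in both cases.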





Re: Multiplying, MicroBlaze - Author Unknown - Apr 12 18:49:00 2001

Kolja Sulimma <> writes:

> For the audio processing, strength reduction would do fine, no dedicated
> multiplier needed there.
> (See also Jans later posting)

I don't understand that. Surely for something like a mixer, you need
multiplication? (Unless you do something exotic like dynamic
recompilation, which seems a little heavyweight for an embedded
system.)

Carl Witty






Re: Multiplying, MicroBlaze - Kolja Sulimma - Apr 13 4:28:00 2001



wrote:

> Kolja Sulimma <> writes:
>
> > For the audio processing, strength reduction would do fine, no dedicated
> > multiplier needed there.
> > (See also Jans later posting)
>
> I don't understand that. Surely for something like a mixer, you need
> multiplication? (Unless you do something exotic like dynamic
> recompilation, which seems a little heavyweight for an embedded
> system.)

A mixer is only one multiplication per channel per sample.
That's less than 100k multiplies per second for two stereo channels.
But the equalizer, reverb, etc. can do without. They usually only need one
variable multiplier each for the gain.
Of course if this stuff really starts to dominate your system, you should add
some DSP features to your CPU.
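
A mixer of the kind discussed here can be sketched as below: one variable multiply per channel per output sample. The Q15 gain format (32768 = 1.0), the saturation step and the function name are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix one output sample from several input channels.
 * gain_q15[c] is a volume in Q15 fixed point, expected in 0..32768 (0.0..1.0). */
int16_t mix_sample(const int16_t *in, const uint16_t *gain_q15, size_t channels)
{
    int32_t acc = 0;

    for (size_t c = 0; c < channels; c++) {
        /* the single variable multiply per channel per sample */
        acc += ((int32_t)in[c] * (int32_t)gain_q15[c]) / 32768;
    }

    /* saturate to the 16-bit output range instead of wrapping */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

Because the gains change at run time, these multiplies cannot be strength-reduced by the compiler, but there are only as many of them per sample as there are channels.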

CU,
Kolja



