
estimating CPU load / MFLOPS for software emulation of floating point

Started by Christopher Holmes December 18, 2003
(Nick Maclaren) writes:
> Christopher Holmes <a_team_of_scientists@yahoo.com> wrote:
> >In an upcoming hardware design I'm thinking about using a CPU without
> >a floating point unit. The application uses floating point numbers,
> >so I'll have to do software emulation. However, I can't seem to find
> >any information on how long these operations might take in software.
> >I'm trying to figure out how much processing power I need & choose an
> >appropriate CPU.
> >
> >I have plenty of info on MIPS ratings for the CPUs, and I figured
> >out how many MFLOPS my application needs, but how do I figure out how
> >many MIPS it takes to do so many MFLOPS?
> >
> >Does anyone know of any info resources or methods?
>
> Lots of the latter, but the former are mostly in people's heads or
> on paper. Old paper.
>
> If you want to emulate a hardware floating-point format, you are
> talking hundreds of instructions or more, depending on how clever
> you are and the interface you use. If you merely want to implement
> floating-point in software, then you can get it down to tens of
> instructions. For example, holding floating-point numbers as a
> structure designed for software, like:
>
>     struct { unsigned long mantissa; int exponent; unsigned char sign; }
>
> is VASTLY easier than emulating IEEE. It's still thoroughly messy.
And speaking of emulating IEEE 754 float operations, speed and code size go south in a big hurry if infinities, denormalized numbers, NaNs, and rounding are handled properly. Add some more adverse impact if double-precision float is implemented instead of or in addition to the usual single-precision float. Regardless, MFLOPS will be measured in fractions and quite small fractions at that. Any relation between MIPS and MFLOPS will be purely coincidental.
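For concreteness, here is a minimal sketch of the kind of software-oriented
format Nick describes, with a multiply routine. This is not from the thread:
the type name, the 32-bit mantissa width, and the 64-bit intermediate
product are illustrative assumptions (an 8- or 16-bit target would compute
the product in pieces).

    #include <stdint.h>

    /* value = (-1)^sign * mantissa * 2^exponent, with the mantissa
     * kept normalized so bit 31 is set (mantissa == 0 means zero). */
    typedef struct {
        uint32_t mantissa;
        int      exponent;
        uint8_t  sign;
    } sfloat;

    /* Multiply: a handful of integer operations, no IEEE special
     * cases, no rounding modes -- which is why this stays in the
     * "tens of instructions" range rather than hundreds. */
    sfloat sf_mul(sfloat a, sfloat b)
    {
        sfloat r;
        uint64_t p = (uint64_t)a.mantissa * b.mantissa;  /* 32x32 -> 64 */

        r.sign = a.sign ^ b.sign;
        r.exponent = a.exponent + b.exponent + 32;

        /* The product of two normalized mantissas has its top bit in
         * position 63 or 62, so renormalizing takes at most one shift. */
        if (!(p >> 63)) {
            p <<= 1;
            r.exponent -= 1;
        }
        r.mantissa = (uint32_t)(p >> 32);   /* truncation, not rounding */
        return r;
    }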
Everett M. Greene wrote:
> And speaking of emulating IEEE 754 float operations, speed and
> code size go south in a big hurry if infinities, denormalized
> numbers, NaNs, and rounding are handled properly.
Those are rare cases -- affect code size, yes, but only a small effect on speed.
> Regardless, MFLOPS will be measured in fractions and quite
> small fractions at that. Any relation between MIPS and MFLOPS
> will be purely coincidental.
I would expect them to be linearly related.

Mike Cowlishaw
Paul Keinanen wrote:
> However, even if you would have to normalize a 64 bit mantissa with an
> 8 bit processor, you could first test in which byte the first "1" bit
> is located and by byte copying (or preferably pointer arithmetic) move
> that byte to the beginning of the result. After that you have to
> perform 1-7 full sized (64 bit) left shift operations (or 1-4 bit
> left/right shifts) to get into correct positions. Rounding requires up
> to 8 adds with carry.
>
> Even so, I very much doubt that you would require more than 100
> instructions in addition to the actual integer multiply/add/sub
> operations with the same operand sizes.
I've done something quite similar when I implemented a full 128-bit fp library, based on 32-bit Pentium asm. I used a slightly non-standard approach, in that I used a 1:31:96 format for my numbers, instead of 1:15:112 which is sort-of-standard. A hw version should at least use a mantissa with more than twice as many bits as a double, so 107 bits would be the minimum.
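In C, the byte-first normalization Paul describes might look like the
sketch below. It is illustrative only: the function name is made up, and on
a real 8-bit part this would be a few dozen hand-written assembly
instructions rather than portable C.

    #include <stdint.h>
    #include <string.h>

    /* Normalize a 64-bit mantissa stored as 8 bytes, most significant
     * first; returns the left-shift count applied, so the caller can
     * adjust the exponent. */
    int normalize64(uint8_t m[8])
    {
        int i, shift = 0;

        /* Step 1: find the first nonzero byte and move it to the
         * front with byte copies (or, better, pointer arithmetic). */
        for (i = 0; i < 8 && m[i] == 0; i++)
            ;
        if (i == 8)
            return 0;                  /* mantissa is zero */
        if (i > 0) {
            memmove(m, m + i, 8 - i);
            memset(m + (8 - i), 0, i);
            shift = 8 * i;
        }

        /* Step 2: at most 7 single-bit shifts across all 8 bytes. */
        while (!(m[0] & 0x80)) {
            uint8_t carry = 0;
            for (i = 7; i >= 0; i--) {
                uint8_t next = (uint8_t)(m[i] >> 7);
                m[i] = (uint8_t)((m[i] << 1) | carry);
                carry = next;
            }
            shift++;
        }
        return shift;
    }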
> An 8 by 8 bit multiply instruction would reduce the computational load
> considerably.
If you don't have even that, but do have a little room in RAM, then I
suggest a table of squares.

Terje
--
<Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
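The table-of-squares trick is usually done with quarter-squares: a*b =
floor((a+b)^2/4) - floor((a-b)^2/4), which holds exactly because a+b and
a-b always have the same parity. A sketch assuming 8-bit operands (the
table and function names are made up):

    #include <stdint.h>

    /* qsq[x] = floor(x*x/4) for x in 0..510: 511 16-bit entries,
     * about 1 KB of RAM (qsq[510] = 65025 still fits in 16 bits). */
    static uint16_t qsq[511];

    void qsq_init(void)
    {
        uint32_t x;
        for (x = 0; x < 511; x++)
            qsq[x] = (uint16_t)(x * x / 4);
    }

    /* 8 x 8 -> 16-bit multiply from two table lookups, one add,
     * one subtract -- no multiply instruction needed. */
    uint16_t mul8x8(uint8_t a, uint8_t b)
    {
        uint16_t d = (a > b) ? (uint16_t)(a - b) : (uint16_t)(b - a);
        return (uint16_t)(qsq[a + b] - qsq[d]);
    }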
What is "block floating point"?  

nmm1@cus.cam.ac.uk (Nick Maclaren) wrote in message news:<brtdsg$ga$1@pegasus.csx.cam.ac.uk>...
> In article <pan.2003.12.18.21.34.59.500214@gurney.reilly.home>,
> Andrew Reilly <andrew@gurney.reilly.home> wrote:
> >
> >Why would you muck about with a separate sign, rather than just using a
> >signed mantissa, for a non-standard software implementation? Does it buy
> >you something in terms of speed? Precision, I guess, given that long is
> >only 32 bits on many systems, and few have 64x64->128 integer multipliers
> >anyway. The OP didn't say what the application was, so it's hard to say
> >whether more than 32 bits of mantissa would be needed.
>
> It buys some convenience, and probably a couple of instructions fewer
> for some operations. Not a big deal.
>
> >Frankly, he's almost certainly going to be able to translate to
> >fixed-point or block-floating-point anyway, and not bother with the
> >per-value exponent field. That's what all of the "multi-media"
> >applications that run on integer-only ARM, MIPS, SH-RISC etc do. Modern
> >versions of these chips all have strong (low latency, pipelined) integer
> >multipliers, so performance can be quite good.
>
> See "scaling" in any good 1930s book on numerical analysis :-)
>
> Regards,
> Nick Maclaren.
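As for the question above: block floating point keeps one shared exponent
for a whole block of fixed-point mantissas instead of a per-value exponent,
so exponent handling is paid once per block rather than once per operation.
A minimal sketch (the names and the 64-sample block size are illustrative
choices, not from the thread):

    #include <stdint.h>

    #define BFP_N 64                     /* block size */

    /* One shared exponent for the whole block:
     * value[i] = data[i] * 2^exponent */
    typedef struct {
        int32_t data[BFP_N];
        int     exponent;
    } bfp_block;

    /* Rescale so the largest magnitude uses the available headroom,
     * accounting for the shift in the shared exponent. After this,
     * plain integer arithmetic operates on the block. */
    void bfp_normalize(bfp_block *b)
    {
        uint32_t maxmag = 0;
        int i, shift = 0;

        for (i = 0; i < BFP_N; i++) {
            uint32_t mag = (uint32_t)b->data[i];
            if (b->data[i] < 0)
                mag = 0u - mag;          /* safe even for INT32_MIN */
            if (mag > maxmag)
                maxmag = mag;
        }
        if (maxmag == 0)
            return;                      /* all-zero block */

        while (maxmag < 0x40000000u) {   /* keep clear of the sign bit */
            maxmag <<= 1;
            shift++;
        }
        for (i = 0; i < BFP_N; i++)
            b->data[i] = (int32_t)((uint32_t)b->data[i] << shift);
        b->exponent -= shift;
    }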
In article <brvd9l$nti$1@news.btv.ibm.com>,
Mike Cowlishaw <mfcowli@attglobal.net> wrote:
>Everett M. Greene wrote:
>> And speaking of emulating IEEE 754 float operations, speed and
>> code size go south in a big hurry if infinities, denormalized
>> numbers, NaNs, and rounding are handled properly.
>
>Those are rare cases -- affect code size, yes, but only a small effect on
>speed.
Regrettably not :-( That has been stated for years, but isn't true. Yes, it is true, if measured over the space of all applications on all data. No, it is not true for all analyses, even excluding perverse and specially selected ones. It isn't all that rare to get into a situation where 5-10% of all floating-point calculations are in a problem area (i.e. underflowing or denormalised), despite the data and results being well scaled.
>> Regardless, MFLOPS will be measured in fractions and quite
>> small fractions at that. Any relation between MIPS and MFLOPS
>> will be purely coincidental.
>
>I would expect them to be linearly related.
Yes and no. They are linearly related only if the characteristics of the
machine remain constant. As branch misprediction becomes more serious,
MFLOPS degrades relative to MIPS.

Regards,
Nick Maclaren.
"Mike Cowlishaw" <mfcowli@attglobal.net> writes:
> Everett M. Greene wrote:
> > And speaking of emulating IEEE 754 float operations, speed and
> > code size go south in a big hurry if infinities, denormalized
> > numbers, NaNs, and rounding are handled properly.
>
> Those are rare cases -- affect code size, yes, but only a small
> effect on speed.
But every operation pays the price of checking for the rare values whether they are present/occur or not.
> > Regardless, MFLOPS will be measured in fractions and quite
> > small fractions at that. Any relation between MIPS and MFLOPS
> > will be purely coincidental.
>
> I would expect them to be linearly related.
Not across processor families...
Everett M. Greene wrote:
> "Mike Cowlishaw" <mfcowli@attglobal.net> writes: > >>Everett M. Greene wrote: >> >>>And speaking of emulating IEEE 754 float operations, speed and >>>code size go south in a big hurry if infinities, denormalized >>>numbers, NaNs, and rounding are handled properly. >> >>Those are rare cases -- affect code size, yes, but only a small >>effect on speed. > > But every operation pays the price of checking for the rare > values whether they are present/occur or not.
Not really: all the special values (NaN, Inf, zero and denormal) can be
handled (at least approximately) with a single test of the exponent field,
before falling through to the normal case. Since the denormals would all be
caught by the same 'Special_exponent()' test, the overhead is only in the
fixup part.

Terje
--
<Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
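A sketch of that single test for IEEE single precision; the helper name is
invented and plays the role of the 'Special_exponent()' check Terje
mentions:

    #include <stdint.h>

    /* For IEEE single precision the exponent field is special exactly
     * when it is all-zeros (zero/denormal) or all-ones (Inf/NaN). */
    static int special_exponent(uint32_t bits)  /* raw bit pattern */
    {
        uint32_t e = (bits >> 23) & 0xFF;
        /* e + 1 wraps 255 to 0, so one unsigned compare catches
         * both the all-zeros and all-ones exponents at once. */
        return (uint8_t)(e + 1) <= 1;
    }

    /* In an emulated operation this test is the only cost the
     * common case pays:
     *
     *     if (special_exponent(a) || special_exponent(b))
     *         goto fixup;      // rare: NaN/Inf/zero/denorm
     *     // fast path: no further special-case checks
     */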
