
FPU vs soft library vs. fixed point

Started by Don Y May 25, 2014
Don Y <this@is.not.me.com> wrote:
> Fixed point solutions mean a lot more up-front work verifying
> no loss of precision throughout the calculations. Do-able but
> a nightmare for anyone having to maintain the codebase.
One of my colleagues has a C++ library that does precision checking through the calculations and tells you at which point precision was lost. When you're happy and ready to go to production, you just turn off the checking and it's optimised away to just doing the calculations. That makes it easier to maintain than having a separate fixed-point codebase.
> So, for a specific question: anyone have any *real* metrics
> regarding how efficient (power, cost) hardware FPU (or not!)
> is in FP-intensive applications? (by "FP-intensive", assume
> 20% of the operations performed by the processor fall into
> that category).
It's something we've been looking at (how to do scientific compute on very constrained processors), but we've been focusing more on accelerating the fixed point than FP side so don't have any numbers to hand.

Theo
Hi,

On 5/25/2014 11:31 PM, upsidedown@downunder.com wrote:
> On Sun, 25 May 2014 13:25:40 -0700, Don Y<this@is.not.me.com> wrote:
>
>> I'm exploring tradeoffs in implementation of some computationally
>> expensive routines.
>>
>> The easy (coding) solution is to just use doubles everywhere and
>> *assume* the noise floor is sufficiently far down that the ulp's
>> don't impact the results in any meaningful way.
>>
>> But, that requires either hardware support (FPU) or a library
>> implementation or some "trickery" on my part.
>>
>> Hardware FPU adds cost and increases average power consumption
>> (for a generic workload). It also limits the choices I have
>> (severe cost and space constraints).
>
> Also verify that the FP also supports 64 bits in hardware, not just 32
> bits.
Yes, most cheaper "general purpose" processors don't (though there seems to be some movement in this direction, of late).
>> OTOH, a straight-forward library implementation burns more CPU
>> cycles to achieve the same result. Eventually, I will have to
>> instrument a design to see where the tipping point lies -- how
>> many transistors are switching in each case, etc.
>
> If you do not need strict IEEE float/double conformance and can live
> without denormals, infinity and NaN cases, those libraries will
> somewhat be simplified.
You can also *only* support the floating point operators that you really *need*. E.g., type conversion, various rounding modes, etc. But, most significantly, you can adjust the precision to what you'll need *where* you need it. And, "unwrap" the various routines so that you only do the work that you need to do, *now* (instead of returning a "genuine float/double" at the end of each operation).
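To make the idea concrete, here is a minimal sketch of a stripped-down multiply in that spirit: an IEEE-like 32-bit layout with denormals, infinities, and NaNs dropped and truncation-only rounding. (The format and the code are illustrative assumptions, not any particular library's implementation.)

#include <cstdint>

// Hypothetical stripped-down binary32-style format: no denormals,
// no infinities, no NaNs, truncation (round-toward-zero) only.
struct SoftFloat {
    uint32_t bits;   // 1 sign | 8 exponent (bias 127) | 23 mantissa
};

SoftFloat sf_mul(SoftFloat a, SoftFloat b)
{
    uint32_t sign = (a.bits ^ b.bits) & 0x80000000u;
    int32_t  ea = (a.bits >> 23) & 0xFF;
    int32_t  eb = (b.bits >> 23) & 0xFF;
    if (ea == 0 || eb == 0)                  // zero operand (no denormals)
        return SoftFloat{ sign };
    uint64_t ma = (a.bits & 0x7FFFFFu) | 0x800000u;   // implicit leading 1
    uint64_t mb = (b.bits & 0x7FFFFFu) | 0x800000u;
    uint64_t m  = (ma * mb) >> 23;           // 48-bit product -> 24/25 bits
    int32_t  e  = ea + eb - 127;
    if (m & (1u << 24)) { m >>= 1; ++e; }    // product in [2,4): renormalize
    if (e <= 0)   return SoftFloat{ sign };                // flush to zero
    if (e >= 255) return SoftFloat{ sign | 0x7F7FFFFFu };  // saturate
    return SoftFloat{ sign | (uint32_t(e) << 23) | (uint32_t(m) & 0x7FFFFFu) };
}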
>> Fixed point solutions mean a lot more up-front work verifying
>> no loss of precision throughout the calculations. Do-able but
>> a nightmare for anyone having to maintain the codebase.
>
> Perhaps some other FP format would be suitable for emulation like the
> 48 bit (6 byte) Borland Turbo Pascal Real data type, which uses the
> integer arithmetic more efficiently.
Yes, I've also been exploring use of rationals in some parts of the computation. As I said, tweak the operations to the specific needs of *this* data (instead of trying to handle *any* possible set of data). I suspect something more akin to arbitrary (though driven) precision will work -- hopefully without having to code too many variants.
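For the rationals idea, the minimal representation is just a numerator/denominator pair kept in lowest terms; everything stays exact until you finally convert out. A hypothetical sketch (C++17 for std::gcd; sign normalization and overflow of the growing terms deliberately left out):

#include <cstdint>
#include <numeric>   // std::gcd

// Exact rational arithmetic on integer pairs: no rounding at all,
// at the cost of numerators/denominators that grow as you compute.
struct Rational {
    int64_t num, den;   // assumes den != 0

    static Rational make(int64_t n, int64_t d) {
        int64_t g = std::gcd(n, d);        // reduce to lowest terms
        return { n / g, d / g };
    }
};

Rational operator*(Rational a, Rational b)
{
    return Rational::make(a.num * b.num, a.den * b.den);
}

Rational operator+(Rational a, Rational b)
{
    return Rational::make(a.num * b.den + b.num * a.den, a.den * b.den);
}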
> One needs to look careful at the integer instruction set of the
> processor. FMUL is easy, it just needs a fast NxN integer
> multiplication and some auxiliary instructions. FADD/FSUB are more
> complicated, requiring to have a fast shift right by a variable number
> of bits for denormalization and a fast find-first-bit-set instruction
> for normalization. Without these instructions, you may have to do up
> to 64 iteration cycles in a loop with a shift right/left instruction
> and some conditional instructions, which can take a lot of time.
I've written several floating point packages over the past 35+ years so I'm well aware of the mechanics -- as well as a reasonable set of tricks to work around particular processor shortcomings (e.g., in the 70's, even an 8x8 MUL was a fantasy in most processors).

You can gain a lot by rethinking the operations you *need* to perform and alternative (equivalent) forms for them that are more tolerant of reduced precision, less prone to cancellation, etc. E.g., the resonators in the speech synthesizer are particularly vulnerable at lower frequencies or higher bandwidths in a "classic" IIR implementation. So, certain formants have been reimplemented in alternate/equivalent (though computationally more complicated -- but not *expensive*) forms to economize there. [I.e., do *more* to "cost less"]

But, to date, I have been able to move any FP operations out of time-critical portions of code. So, my FP implementations could concentrate on being *small* (code/data) without having to worry about execution speed. Now, I'd (ideally) like NOT to worry about speed and still avail myself of their "ease of use" (wrt support efforts).

Thx,
--don
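The normalization cost mentioned in the quoted post is easy to see in code. A sketch, assuming a 24-bit binary32-style mantissa and a GCC/Clang target where __builtin_clz maps onto a hardware count-leading-zeros instruction (both assumptions, not a statement about any particular part):

#include <cstdint>

// After an FSUB with heavy cancellation, the leading 1 of the result
// mantissa must be shifted back up to bit 23.

// With a count-leading-zeros instruction: constant time.
int normalize_fast(uint32_t &mantissa, int exponent)
{
    if (mantissa == 0) return 0;              // exact zero: nothing to do
    int shift = __builtin_clz(mantissa) - 8;  // move leading 1 to bit 23
    mantissa <<= shift;
    return exponent - shift;
}

// Without one: the bit-at-a-time loop the quoted post warns about,
// up to 24 iterations for a binary32 mantissa.
int normalize_slow(uint32_t &mantissa, int exponent)
{
    while (mantissa && !(mantissa & 0x800000u)) {
        mantissa <<= 1;
        --exponent;
    }
    return exponent;
}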
Hi Theo,

On 5/26/2014 9:56 AM, Theo Markettos wrote:
> Don Y<this@is.not.me.com> wrote:
>> Fixed point solutions mean a lot more up-front work verifying
>> no loss of precision throughout the calculations. Do-able but
>> a nightmare for anyone having to maintain the codebase.
>
> One of my colleagues has a C++ library that does precision checking through
> the calculations and tells you at which point precision was lost. When
> you're happy and ready to go to production, you just turn off the checking
> and it's optimised away to just doing the calculations.
Not sure I follow. :< It's a FP library? Overloaded operators? How does *it* know what's optimal? Or, is it an arbitrary precision library that *maintains* the required precision, dynamically (so you never "lose" precision).

Is this something homegrown or is it commercially available? (I imagine it isn't particularly fast?)

[One of the drool factors of C++'s syntax IMO is operator overloading. It's *so* much nicer to be able to read something in infix notation even though the underlying data representation may be completely different from what the user thinks!]
> That makes it easier to maintain than having a separate fixed-point
> codebase.
>
>> So, for a specific question: anyone have any *real* metrics
>> regarding how efficient (power, cost) hardware FPU (or not!)
>> is in FP-intensive applications? (by "FP-intensive", assume
>> 20% of the operations performed by the processor fall into
>> that category).
>
> It's something we've been looking at (how to do scientific compute on very
> constrained processors), but we've been focusing more on accelerating the
> fixed point than FP side so don't have any numbers to hand.
Exactly. Find a way to move things into the fixed point realm (not necessarily integers) just to take advantage of the native integer operations in all processors.

But, often this requires a fair bit of magic and doesn't stand up to "casual" maintenance (because folks don't bother to understand the code they are poking with a stick to "maintain").

How do *you* avoid these sorts of screwups? Or, do you decompose the algorithms in such a way that there are no sirens calling to future maintainers to "just tweak this constant" (in a way that they don't fully understand)?

Thx,
--don
Don Y <this@is.not.me.com> wrote:
> Hi Theo,
>
> On 5/26/2014 9:56 AM, Theo Markettos wrote:
> > One of my colleagues has a C++ library that does precision checking
> > through the calculations and tells you at which point precision was
> > lost. When you're happy and ready to go to production, you just turn
> > off the checking and it's optimised away to just doing the calculations.
>
> Not sure I follow. :< It's a FP library? Overloaded operators?
> How does *it* know what's optimal? Or, is it an arbitrary precision
> library that *maintains* the required precision, dynamically (so
> you never "lose" precision).
It's a fixed point library that follows the precision through the calculations (so if you multiply a 4.12 number by a 10.6 number you get a 14.18 number, which then propagates through), and takes note of where any over/underflows occur. Each variable is constrained (eg voltage = 0 to 100mV, decided by the limits of the physical system being modelled) so we know what 'sensible' values are. So you can run your algorithm in testing mode (slowly) and see what happens to the arithmetic, and then flip the switch that turns off all the checks when you want to run for real.
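In C++, that mechanism falls out naturally from templates: encode the Q format in the type, let multiplication add both the integer and fraction widths, and make the range checks a compile-time switch so production builds pay nothing. A sketch of the idea only (the names and layout are assumptions, not the colleague's actual library):

#include <cstdint>
#include <cassert>

// Q<I,F>: fixed point, I integer and F fraction bits, raw = value * 2^F.
// CHECKED=true enables the "testing mode" range checks; CHECKED=false
// (production) lets the compiler optimise them away entirely.
template<int I, int F, bool CHECKED = true>
struct Q {
    int64_t raw;

    static Q from_double(double v) {
        if (CHECKED)
            assert(v >= -double(1LL << (I - 1)) &&
                   v <  double(1LL << (I - 1)) && "value outside Q range");
        return Q{ int64_t(v * double(1LL << F)) };
    }
    double to_double() const { return double(raw) / double(1LL << F); }
};

// Multiplying Q<I1,F1> by Q<I2,F2> yields Q<I1+I2,F1+F2>: the precision
// bookkeeping propagates through the type system automatically.
template<int I1, int F1, int I2, int F2, bool C>
Q<I1 + I2, F1 + F2, C> operator*(Q<I1, F1, C> a, Q<I2, F2, C> b)
{
    return { a.raw * b.raw };   // 64-bit raw keeps these widths in range
}

// Usage: a 4.12 number times a 10.6 number is, by construction, a 14.18.
// Q<4,12> v = Q<4,12>::from_double(0.1);   // e.g. 100mV
// Q<10,6> g = Q<10,6>::from_double(42.0);
// auto p = v * g;                          // type is Q<14,18>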
> Is this something homegrown or is it commercially available?
> (I imagine it isn't particularly fast?)
I think it's intended to be open source, but he's currently away so I'm unclear of current status.
> Exactly. Find a way to move things into the fixed point realm (not
> necessarily integers) just to take advantage of the native integer
> operations in all processors.
>
> But, often this requires a fair bit of magic and doesn't stand up
> to "casual" maintenance (because folks don't bother to understand
> the code they are poking with a stick to "maintain").
>
> How do *you* avoid these sorts of screwups? Or, do you decompose
> the algorithms in such a way that there are no sirens calling to
> future maintainers to "just tweak this constant" (in a way that
> they don't fully understand)?
Our answer is to save the scientists (in this case) from writing low level code. Give them a representation they understand (eg differential equations with units), give them enough knobs to tweak (eg pick the numerical methods), and then generate code for whatever target platform it's intended to run on. CS people do the engineering of that, scientists do their science (which we aren't domain experts in). That means both ends of the toolchain are more maintainable.

Theo
Hi Theo,

On 5/26/2014 2:28 PM, Theo Markettos wrote:
> Don Y<this@is.not.me.com> wrote:
>> On 5/26/2014 9:56 AM, Theo Markettos wrote:
>>> One of my colleagues has a C++ library that does precision checking
>>> through the calculations and tells you at which point precision was
>>> lost. When you're happy and ready to go to production, you just turn
>>> off the checking and it's optimised away to just doing the calculations.
>>
>> Not sure I follow. :< It's a FP library? Overloaded operators?
>> How does *it* know what's optimal? Or, is it an arbitrary precision
>> library that *maintains* the required precision, dynamically (so
>> you never "lose" precision).
>
> It's a fixed point library that follows the precision through the
> calculations (so if you multiply a 4.12 number by a 10.6 number you get a
> 14.18 number, which then propagates through), and takes note of where any
> over/underflows occur.
So, each variable is tagged with its Q notation? This is evaluated by run-time (conditionally enabled?) code? Or, compiler conditionals?
> Each variable is constrained (eg voltage = 0 to
> 100mV, decided by the limits of the physical system being modelled) so we
> know what 'sensible' values are.
So, the "package" is essentially examining how the limits on each value/variable -- in the stated Q form -- are affected by the operations? E.g., a Q7.9 value of 100 can't be SAFELY doubled (without moving to Q8.8 or whatever). The developer, presumably, frowns when told of these things and makes adjustments, accordingly (to Q form, limits, operators, etc.) to achieve what he needs?
> So you can run your algorithm in testing mode (slowly) and see what happens
> to the arithmetic, and then flip the switch that turns off all the checks
> when you want to run for real.
"Slowly" because you just want to see the "report" of each operation (without having to develop yet another tool that collects that data from a run and reports it ex post factum).
>> Is this something homegrown or is it commercially available?
>> (I imagine it isn't particularly fast?)
>
> I think it's intended to be open source, but he's currently away so I'm
> unclear of current status.
OK. I'd be curious to see even a "specification" of its functionality, if he consents (if need be, via PM).
>> Exactly. Find a way to move things into the fixed point realm (not
>> necessarily integers) just to take advantage of the native integer
>> operations in all processors.
>>
>> But, often this requires a fair bit of magic and doesn't stand up
>> to "casual" maintenance (because folks don't bother to understand
>> the code they are poking with a stick to "maintain").
>>
>> How do *you* avoid these sorts of screwups? Or, do you decompose
>> the algorithms in such a way that there are no sirens calling to
>> future maintainers to "just tweak this constant" (in a way that
>> they don't fully understand)?
>
> Our answer is to save the scientists (in this case) from writing low level
> code. Give them a representation they understand (eg differential equations
> with units), give them enough knobs to tweak (eg pick the numerical
> methods), and then generate code for whatever target platform it's intended
> to run on. CS people do the engineering of that, scientists do their
> science (which we aren't domain experts in). That means both ends of the
> toolchain are more maintainable.
So, you expose the wacky encoding formats to the "scientists" (users)? Or, do you have a set of mapping functions on the front and back ends that make the data "look nice" for the users (so they don't have to understand the formats)?

But, you still have the maintenance issue -- how do you keep "CS folks" from making careless changes to the code without fully understanding what's going on? E.g., *forcing* them to run their code in this "test mode" (to ensure no possibility of overflow even after they changed the 1.2 scalar to 1.3 in the example above)?

[Or, do you just insist on competence?]
On Sun, 25 May 2014 23:20:15 -0700, Don Y <this@is.not.me.com> wrote:

>Hey George!
>
>Finally warm up, (and dry out) there?? :> Broke 100F last week... :<
>July's gonna be a bitch!
Temps are just getting back to normal (60-70F) - it's been a cold Spring. Unlike most of the East, we're running about average for wet weather here: predictably rains every trash day and most weekends 8-)
>On 5/25/2014 10:06 PM, George Neuner wrote:
>> On Sun, 25 May 2014 13:25:40 -0700, Don Y<this@is.not.me.com> wrote:
>>
>> They are discussing a closely related question involving the tradeoffs
>> between providing an all-up FPU (e.g., IEEE-754) vs providing
>> sub-units and allowing software to drive them. I haven't followed the
>> whole thread [it's wandering a lot (even for c.a.)] but there's been
>> some mentions of break even points of HW vs SW for general code.
>
>I think you can get even finer in choosing how little you implement
>based on application domain (of course, I haven't yet read their claims
>but still assume they are operating within some "rational" sense of
>partitioning... e.g., not willing to allow the actual number format
>to be rendered arbitrary, etc.)
The discussion typically starts with "IEEE-754 is broken because ..." and revolves around the design of the FPU and the memory systems to feed it: debating implementing an all-up FPU with fixed operation vs exposing the required sub-units and allowing software to use them as desired - and in the process getting faster operation, better result consistency, permitting different rounding and error modes, etc.

This is a frequently recurring topic in c.a - if you poke around a bit you'll find quite a number of posts about it.

George
Hi George,

> Temps are just getting back to normal (60-70F) - it's been a cold
> Spring.
Um, wasn't it a cold *winter*? :>
> Unlike most of the East, we're running about average for wet
> weather here: predictably rains every trash day and most weekends 8-)
I suspect we *might* see our first rain of the *year* sometime in July. I don't think we had *any* this Winter. Then, again, I don't think we ever saw a "freezing" temperature! (which was great for last year's citrus crop but seems to be creating problems for this *next* crop)
>>> They are discussing a closely related question involving the tradeoffs
>>> between providing an all-up FPU (e.g., IEEE-754) vs providing
>>> sub-units and allowing software to drive them. I haven't followed the
>>> whole thread [it's wandering a lot (even for c.a.)] but there's been
>>> some mentions of break even points of HW vs SW for general code.
>>
>> I think you can get even finer in choosing how little you implement
>> based on application domain (of course, I haven't yet read their claims
>> but still assume they are operating within some "rational" sense of
>> partitioning... e.g., not willing to allow the actual number format
>> to be rendered arbitrary, etc.)
>
> The discussion typically starts with "IEEE-754 is broken because ..."
> and revolves around the design of the FPU and the memory systems to
> feed it: debating implementing an all-up FPU with fixed operation vs
> exposing the required sub-units and allowing software to use them as
> desired - and in the process getting faster operation, better result
> consistency, permitting different rounding and error modes, etc.
<frown> Sounds like Monday morning quarterbacking. OTOH (the *fourth* one!), hopefully *someone* is thinking about these issues -- preferably people in a position to actually *do* something about it!

I read through most of the thread (sheesh! talk about *long*!). The big thing I came away with is how these people are missing the boat! Like designing bigger bumpers on cars instead of figuring out how to get BETTER DRIVERS!
> This is a frequently recurring topic in c.a - if you poke around a bit
> you'll find quite a number of posts about it.
Thx!
--don