
FPU vs soft library vs. fixed point

Started by Don Y May 25, 2014
Don Y <this@is.not.me.com> wrote:
> Fixed point solutions mean a lot more up-front work verifying
> no loss of precision throughout the calculations. Do-able but
> a nightmare for anyone having to maintain the codebase.
One of my colleagues has a C++ library that does precision checking through the calculations and tells you at which point precision was lost. When you're happy and ready to go to production, you just turn off the checking and it's optimised away to just doing the calculations. That makes it easier to maintain than having a separate fixed-point codebase.
> So, for a specific question: anyone have any *real* metrics
> regarding how efficient (power, cost) hardware FPU (or not!)
> is in FP-intensive applications? (by "FP-intensive", assume
> 20% of the operations performed by the processor fall into
> that category).
It's something we've been looking at (how to do scientific compute on very constrained processors), but we've been focusing more on accelerating the fixed point than FP side so don't have any numbers to hand.

Theo
Hi,

On 5/25/2014 11:31 PM, upsidedown@downunder.com wrote:
> On Sun, 25 May 2014 13:25:40 -0700, Don Y<this@is.not.me.com> wrote:
>
>> I'm exploring tradeoffs in implementation of some computationally
>> expensive routines.
>>
>> The easy (coding) solution is to just use doubles everywhere and
>> *assume* the noise floor is sufficiently far down that the ulp's
>> don't impact the results in any meaningful way.
>>
>> But, that requires either hardware support (FPU) or a library
>> implementation or some "trickery" on my part.
>>
>> Hardware FPU adds cost and increases average power consumption
>> (for a generic workload). It also limits the choices I have
>> (severe cost and space constraints).
>
> Also verify that the FP also supports 64 bits in hardware, not just 32
> bits.
Yes, most cheaper "general purpose" processors don't (though there seems to be some movement in this direction, of late).
>> OTOH, a straight-forward library implementation burns more CPU
>> cycles to achieve the same result. Eventually, I will have to
>> instrument a design to see where the tipping point lies -- how
>> many transistors are switching in each case, etc.
>
> If you do not need strict IEEE float/double conformance and can live
> without denormals, infinity and NaN cases, those libraries will
> somewhat be simplified.
You can also *only* support the floating point operators that you really *need*. E.g., type conversion, various rounding modes, etc. But, most significantly, you can adjust the precision to what you'll need *where* you need it. And, "unwrap" the various routines so that you only do the work that you need to do, *now* (instead of returning a "genuine float/double" at the end of each operation).
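To make the idea concrete, here is a minimal sketch of a stripped-down multiply in that spirit: an IEEE-like 32-bit layout with denormals, infinities, and NaNs dropped and truncation-only rounding. (The format and the code are illustrative assumptions, not any particular library's implementation.)

#include <cstdint>

// Hypothetical stripped-down binary32-style format: no denormals,
// no infinities, no NaNs, truncation (round-toward-zero) only.
struct SoftFloat {
    uint32_t bits;   // 1 sign | 8 exponent (bias 127) | 23 mantissa
};

SoftFloat sf_mul(SoftFloat a, SoftFloat b)
{
    uint32_t sign = (a.bits ^ b.bits) & 0x80000000u;
    int32_t  ea = (a.bits >> 23) & 0xFF;
    int32_t  eb = (b.bits >> 23) & 0xFF;
    if (ea == 0 || eb == 0)                  // zero operand (no denormals)
        return SoftFloat{ sign };
    uint64_t ma = (a.bits & 0x7FFFFFu) | 0x800000u;   // implicit leading 1
    uint64_t mb = (b.bits & 0x7FFFFFu) | 0x800000u;
    uint64_t m  = (ma * mb) >> 23;           // 48-bit product -> 24/25 bits
    int32_t  e  = ea + eb - 127;
    if (m & (1u << 24)) { m >>= 1; ++e; }    // product in [2,4): renormalize
    if (e <= 0)   return SoftFloat{ sign };                // flush to zero
    if (e >= 255) return SoftFloat{ sign | 0x7F7FFFFFu };  // saturate
    return SoftFloat{ sign | (uint32_t(e) << 23) | (uint32_t(m) & 0x7FFFFFu) };
}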
>> Fixed point solutions mean a lot more up-front work verifying
>> no loss of precision throughout the calculations. Do-able but
>> a nightmare for anyone having to maintain the codebase.
>
> Perhaps some other FP format would be suitable for emulation like the
> 48 bit (6 byte) Borland Turbo Pascal Real data type, which uses the
> integer arithmetic more efficiently.
Yes, I've also been exploring use of rationals in some parts of the computation. As I said, tweak the operations to the specific needs of *this* data (instead of trying to handle *any* possible set of data). I suspect something more akin to arbitrary (though driven) precision will work -- hopefully without having to code too many variants.
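For the rationals idea, the minimal representation is just a numerator/denominator pair kept in lowest terms; everything stays exact until you finally convert out. A hypothetical sketch (C++17 for std::gcd; sign normalization and overflow of the growing terms deliberately left out):

#include <cstdint>
#include <numeric>   // std::gcd

// Exact rational arithmetic on integer pairs: no rounding at all,
// at the cost of numerators/denominators that grow as you compute.
struct Rational {
    int64_t num, den;   // assumes den != 0

    static Rational make(int64_t n, int64_t d) {
        int64_t g = std::gcd(n, d);        // reduce to lowest terms
        return { n / g, d / g };
    }
};

Rational operator*(Rational a, Rational b)
{
    return Rational::make(a.num * b.num, a.den * b.den);
}

Rational operator+(Rational a, Rational b)
{
    return Rational::make(a.num * b.den + b.num * a.den, a.den * b.den);
}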
> One needs to look careful at the integer instruction set of the
> processor. FMUL is easy, it just needs a fast NxN integer
> multiplication and some auxiliary instructions. FADD/FSUB are more
> complicated, requiring to have a fast shift right by a variable number
> of bits for denormalization and a fast find-first-bit-set instruction
> for normalization. Without these instructions, you may have to do up
> to 64 iteration cycles in a loop with a shift right/left instruction
> and some conditional instructions, which can take a lot of time.
I've written several floating point packages over the past 35+ years so I'm well aware of the mechanics -- as well as a reasonable set of tricks to work around particular processor shortcomings (e.g., in the 70's, even an 8x8 MUL was a fantasy in most processors).

You can gain a lot by rethinking the operations you *need* to perform and alternative (equivalent) forms for them that are more tolerant of reduced precision, less prone to cancellation, etc. E.g., the resonators in the speech synthesizer are particularly vulnerable at lower frequencies or higher bandwidths in a "classic" IIR implementation. So, certain formants have been reimplemented in alternate/equivalent (though computationally more complicated -- but not *expensive*) forms to economize there. [I.e., do *more* to "cost less"]

But, to date, I have been able to move any FP operations out of time-critical portions of code. So, my FP implementations could concentrate on being *small* (code/data) without having to worry about execution speed. Now, I'd (ideally) like NOT to worry about speed and still avail myself of their "ease of use" (wrt support efforts).

Thx,
--don
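The normalization cost mentioned in the quoted post is easy to see in code. A sketch, assuming a 24-bit binary32-style mantissa and a GCC/Clang target where __builtin_clz maps onto a hardware count-leading-zeros instruction (both assumptions, not a statement about any particular part):

#include <cstdint>

// After an FSUB with heavy cancellation, the leading 1 of the result
// mantissa must be shifted back up to bit 23.

// With a count-leading-zeros instruction: constant time.
int normalize_fast(uint32_t &mantissa, int exponent)
{
    if (mantissa == 0) return 0;              // exact zero: nothing to do
    int shift = __builtin_clz(mantissa) - 8;  // move leading 1 to bit 23
    mantissa <<= shift;
    return exponent - shift;
}

// Without one: the bit-at-a-time loop the quoted post warns about,
// up to 24 iterations for a binary32 mantissa.
int normalize_slow(uint32_t &mantissa, int exponent)
{
    while (mantissa && !(mantissa & 0x800000u)) {
        mantissa <<= 1;
        --exponent;
    }
    return exponent;
}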
Hi Theo,

On 5/26/2014 9:56 AM, Theo Markettos wrote:
> Don Y<this@is.not.me.com> wrote:
>> Fixed point solutions mean a lot more up-front work verifying
>> no loss of precision throughout the calculations. Do-able but
>> a nightmare for anyone having to maintain the codebase.
>
> One of my colleagues has a C++ library that does precision checking through
> the calculations and tells you at which point precision was lost. When
> you're happy and ready to go to production, you just turn off the checking
> and it's optimised away to just doing the calculations.
Not sure I follow. :< It's a FP library? Overloaded operators? How does *it* know what's optimal? Or, is it an arbitrary precision library that *maintains* the required precision, dynamically (so you never "lose" precision).

Is this something homegrown or is it commercially available? (I imagine it isn't particularly fast?)

[One of the drool factors of C++'s syntax IMO is operator overloading. It's *so* much nicer to be able to read something in infix notation even though the underlying data representation may be completely different from what the user thinks!]
> That makes it easier to maintain than having a separate fixed-point
> codebase.
>
>> So, for a specific question: anyone have any *real* metrics
>> regarding how efficient (power, cost) hardware FPU (or not!)
>> is in FP-intensive applications? (by "FP-intensive", assume
>> 20% of the operations performed by the processor fall into
>> that category).
>
> It's something we've been looking at (how to do scientific compute on very
> constrained processors), but we've been focusing more on accelerating the
> fixed point than FP side so don't have any numbers to hand.
Exactly. Find a way to move things into the fixed point realm (not necessarily integers) just to take advantage of the native integer operations in all processors.

But, often this requires a fair bit of magic and doesn't stand up to "casual" maintenance (because folks don't bother to understand the code they are poking with a stick to "maintain").

How do *you* avoid these sorts of screwups? Or, do you decompose the algorithms in such a way that there are no sirens calling to future maintainers to "just tweak this constant" (in a way that they don't fully understand)?

Thx,
--don
Don Y <this@is.not.me.com> wrote:
> Hi Theo,
>
> On 5/26/2014 9:56 AM, Theo Markettos wrote:
> > One of my colleagues has a C++ library that does precision checking
> > through the calculations and tells you at which point precision was
> > lost. When you're happy and ready to go to production, you just turn
> > off the checking and it's optimised away to just doing the calculations.
>
> Not sure I follow. :< It's a FP library? Overloaded operators?
> How does *it* know what's optimal? Or, is it an arbitrary precision
> library that *maintains* the required precision, dynamically (so
> you never "lose" precision).
It's a fixed point library that follows the precision through the calculations (so if you multiply a 4.12 number by a 10.6 number you get a 14.18 number, which then propagates through), and takes note of where any over/underflows occur. Each variable is constrained (eg voltage = 0 to 100mV, decided by the limits of the physical system being modelled) so we know what 'sensible' values are. So you can run your algorithm in testing mode (slowly) and see what happens to the arithmetic, and then flip the switch that turns off all the checks when you want to run for real.
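In C++, that mechanism falls out naturally from templates: encode the Q format in the type, let multiplication add both the integer and fraction widths, and make the range checks a compile-time switch so production builds pay nothing. A sketch of the idea only (the names and layout are assumptions, not the colleague's actual library):

#include <cstdint>
#include <cassert>

// Q<I,F>: fixed point, I integer and F fraction bits, raw = value * 2^F.
// CHECKED=true enables the "testing mode" range checks; CHECKED=false
// (production) lets the compiler optimise them away entirely.
template<int I, int F, bool CHECKED = true>
struct Q {
    int64_t raw;

    static Q from_double(double v) {
        if (CHECKED)
            assert(v >= -double(1LL << (I - 1)) &&
                   v <  double(1LL << (I - 1)) && "value outside Q range");
        return Q{ int64_t(v * double(1LL << F)) };
    }
    double to_double() const { return double(raw) / double(1LL << F); }
};

// Multiplying Q<I1,F1> by Q<I2,F2> yields Q<I1+I2,F1+F2>: the precision
// bookkeeping propagates through the type system automatically.
template<int I1, int F1, int I2, int F2, bool C>
Q<I1 + I2, F1 + F2, C> operator*(Q<I1, F1, C> a, Q<I2, F2, C> b)
{
    return { a.raw * b.raw };   // 64-bit raw keeps these widths in range
}

// Usage: a 4.12 number times a 10.6 number is, by construction, a 14.18.
// Q<4,12> v = Q<4,12>::from_double(0.1);   // e.g. 100mV
// Q<10,6> g = Q<10,6>::from_double(42.0);
// auto p = v * g;                          // type is Q<14,18>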
> Is this something homegrown or is it commercially available?
> (I imagine it isn't particularly fast?)
I think it's intended to be open source, but he's currently away so I'm unclear of current status.
> Exactly. Find a way to move things into the fixed point realm (not
> necessarily integers) just to take advantage of the native integer
> operations in all processors.
>
> But, often this requires a fair bit of magic and doesn't stand up
> to "casual" maintenance (because folks don't bother to understand
> the code they are poking with a stick to "maintain").
>
> How do *you* avoid these sorts of screwups? Or, do you decompose
> the algorithms in such a way that there are no sirens calling to
> future maintainers to "just tweak this constant" (in a way that
> they don't fully understand)?
Our answer is to save the scientists (in this case) from writing low level code. Give them a representation they understand (eg differential equations with units), give them enough knobs to tweak (eg pick the numerical methods), and then generate code for whatever target platform it's intended to run on. CS people do the engineering of that, scientists do their science (which we aren't domain experts in). That means both ends of the toolchain are more maintainable.

Theo
Hi Theo,

On 5/26/2014 2:28 PM, Theo Markettos wrote:
> Don Y<this@is.not.me.com> wrote:
>> On 5/26/2014 9:56 AM, Theo Markettos wrote:
>>> One of my colleagues has a C++ library that does precision checking
>>> through the calculations and tells you at which point precision was
>>> lost. When you're happy and ready to go to production, you just turn
>>> off the checking and it's optimised away to just doing the calculations.
>>
>> Not sure I follow. :< It's a FP library? Overloaded operators?
>> How does *it* know what's optimal? Or, is it an arbitrary precision
>> library that *maintains* the required precision, dynamically (so
>> you never "lose" precision).
>
> It's a fixed point library that follows the precision through the
> calculations (so if you multiply a 4.12 number by a 10.6 number you get a
> 14.18 number, which then propagates through), and takes note of where any
> over/underflows occur.
So, each variable is tagged with its Q notation? This is evaluated by run-time (conditionally enabled?) code? Or, compiler conditionals?
> Each variable is constrained (eg voltage = 0 to
> 100mV, decided by the limits of the physical system being modelled) so we
> know what 'sensible' values are.
So, the "package" is essentially examining how the limits on each value/variable -- in the stated Q form -- are affected by the operations? E.g., a Q7.9 value of 100 can't be SAFELY doubled (without moving to Q8.8 or whatever). The developer, presumably, frowns when told of these things and makes adjustments, accordingly (to Q form, limits, operators, etc.) to achieve what he needs?
> So you can run your algorithm in testing mode (slowly) and see what happens
> to the arithmetic, and then flip the switch that turns off all the checks
> when you want to run for real.
"Slowly" because you just want to see the "report" of each operation (without having to develop yet another tool that collects that data from a run and reports it ex post factum).
>> Is this something homegrown or is it commercially available?
>> (I imagine it isn't particularly fast?)
>
> I think it's intended to be open source, but he's currently away so I'm
> unclear of current status.
OK. I'd be curious to see even a "specification" of its functionality, if he consents (if need be, via PM).
>> Exactly. Find a way to move things into the fixed point realm (not
>> necessarily integers) just to take advantage of the native integer
>> operations in all processors.
>>
>> But, often this requires a fair bit of magic and doesn't stand up
>> to "casual" maintenance (because folks don't bother to understand
>> the code they are poking with a stick to "maintain").
>>
>> How do *you* avoid these sorts of screwups? Or, do you decompose
>> the algorithms in such a way that there are no sirens calling to
>> future maintainers to "just tweak this constant" (in a way that
>> they don't fully understand)?
>
> Our answer is to save the scientists (in this case) from writing low level
> code. Give them a representation they understand (eg differential equations
> with units), give them enough knobs to tweak (eg pick the numerical
> methods), and then generate code for whatever target platform it's intended
> to run on. CS people do the engineering of that, scientists do their
> science (which we aren't domain experts in). That means both ends of the
> toolchain are more maintainable.
So, you expose the wacky encoding formats to the "scientists" (users)? Or, do you have a set of mapping functions on the front and back ends that make the data "look nice" for the users (so they don't have to understand the formats)?

But, you still have the maintenance issue -- how do you keep "CS folks" from making careless changes to the code without fully understanding what's going on? E.g., *forcing* them to run their code in this "test mode" (to ensure no possibility of overflow even after they changed the 1.2 scalar to 1.3 in the example above)?

[Or, do you just insist on competence?]
On Sun, 25 May 2014 23:20:15 -0700, Don Y <this@is.not.me.com> wrote:

>Hey George!
>
>Finally warm up, (and dry out) there?? :> Broke 100F last week... :<
>July's gonna be a bitch!
Temps are just getting back to normal (60-70F) - it's been a cold Spring. Unlike most of the East, we're running about average for wet weather here: predictably rains every trash day and most weekends 8-)
>On 5/25/2014 10:06 PM, George Neuner wrote:
>> On Sun, 25 May 2014 13:25:40 -0700, Don Y<this@is.not.me.com> wrote:
>>
>> They are discussing a closely related question involving the tradeoffs
>> between providing an all-up FPU (e.g., IEEE-754) vs providing
>> sub-units and allowing software to drive them. I haven't followed the
>> whole thread [it's wandering a lot (even for c.a.)] but there's been
>> some mentions of break even points of HW vs SW for general code.
>
>I think you can get even finer in choosing how little you implement
>based on application domain (of course, I haven't yet read their claims
>but still assume they are operating within some "rational" sense of
>partitioning... e.g., not willing to allow the actual number format
>to be rendered arbitrary, etc.)
The discussion typically starts with "IEEE-754 is broken because ..." and revolves around the design of the FPU and the memory systems to feed it: debating implementing an all-up FPU with fixed operation vs exposing the required sub-units and allowing software to use them as desired - and in the process getting faster operation, better result consistency, permitting different rounding and error modes, etc.

This is a frequently recurring topic in c.a - if you poke around a bit you'll find quite a number of posts about it.

George
Hi George,

> Temps are just getting back to normal (60-70F) - it's been a cold
> Spring.
Um, wasn't it a cold *winter*? :>
> Unlike most of the East, we're running about average for wet
> weather here: predictably rains every trash day and most weekends 8-)
I suspect we *might* see our first rain of the *year* sometime in July. I don't think we had *any* this Winter. Then, again, I don't think we ever saw a "freezing" temperature! (which was great for last year's citrus crop but seems to be creating problems for this *next* crop)
>>> They are discussing a closely related question involving the tradeoffs
>>> between providing an all-up FPU (e.g., IEEE-754) vs providing
>>> sub-units and allowing software to drive them. I haven't followed the
>>> whole thread [it's wandering a lot (even for c.a.)] but there's been
>>> some mentions of break even points of HW vs SW for general code.
>>
>> I think you can get even finer in choosing how little you implement
>> based on application domain (of course, I haven't yet read their claims
>> but still assume they are operating within some "rational" sense of
>> partitioning... e.g., not willing to allow the actual number format
>> to be rendered arbitrary, etc.)
>
> The discussion typically starts with "IEEE-754 is broken because ..."
> and revolves around the design of the FPU and the memory systems to
> feed it: debating implementing an all-up FPU with fixed operation vs
> exposing the required sub-units and allowing software to use them as
> desired - and in the process getting faster operation, better result
> consistency, permitting different rounding and error modes, etc.
<frown> Sounds like Monday morning quarterbacking. OTOH (the *fourth* one!), hopefully *someone* is thinking about these issues -- preferably people in a position to actually *do* something about it!

I read through most of the thread (sheesh! talk about *long*!). The big thing I came away with is how these people are missing the boat! Like designing bigger bumpers on cars instead of figuring out how to get BETTER DRIVERS!
> This is a frequently recurring topic in c.a - if you poke around a bit
> you'll find quite a number of posts about it.
Thx!
--don