
Floating point vs fixed arithmetic (signed 64-bit)

Started by kishor March 26, 2012
On Tue, 27 Mar 2012 15:25:21 +0200, David Brown
<david@westcontrol.removethisbit.com> wrote:
>> 2. Is it possible to generate shift based logic to case 5 mentioned above?
>> (Signed 64-bit divide by constant of 2^n)
>
> Yes.
>
> The easiest way to make sure you get signed division right is to
> separate out the sign, then use unsigned arithmetic. That way you can't
> go wrong, and the C code is portable.
Additional note to the OP: comp.lang.c will point you in the right direction as far as what is portable and what is not. From memory (probably wrong ... I should look it up, but too lazy):

For unsigneds, there are no issues shifting in either direction; it works as intuitively expected.

For signeds, a left shift fills the LSB with 0, although strictly speaking the standard leaves left-shifting a negative value undefined. Signed right shifts of negative values are implementation-defined: it isn't guaranteed how the sign bit will be propagated. Again, this is from memory and possibly wrong.

The suggestion of separating out the sign is certainly prudent.

DTA
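As an illustration of the approach suggested above (separate out the sign, then shift unsigned), here is a minimal C sketch; the function name div_pow2 and the round-toward-zero convention are assumptions of this example, not something from the thread:

#include <stdint.h>

/* Divide a signed 64-bit value by 2^n (0 < n < 64), rounding toward zero
 * like C's '/' operator.  The magnitude is handled as unsigned, so the
 * code never right-shifts a negative value (which is only
 * implementation-defined in C). */
static int64_t div_pow2(int64_t x, unsigned n)
{
    uint64_t mag = (x < 0) ? -(uint64_t)x : (uint64_t)x;  /* |x|, works even for INT64_MIN */
    uint64_t q   = mag >> n;                              /* unsigned shift: fully portable */
    return (x < 0) ? -(int64_t)q : (int64_t)q;
}

Note that an arithmetic right shift of a negative value, where it behaves that way at all, rounds toward minus infinity, so it can differ from C's division operator by one; the sketch above follows the division operator.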
On 29 Mar 2012 11:53:36 GMT, Andrew Reilly <areilly---@bigpond.net.au>
wrote:

>> The main problem trying to write _low_level_ math routines in C is that
>> you do not have access to the carry bit or use any rotate instruction.
>> The C-compiler would have to be very clever to convert a sequence of
>> C-statements into a single rotate instruction or shifting multiple bits
>> into two registers.
>
> It's a funny old world. I've seen several compilers recognise the pair
> of shifts and an or combination as a rotate, and emit that instruction.
> I've also replaced carefully asm-"optimised" maths routines (on x86) that
> used the carry flag with "vanilla" C equivalents, and the overall effect
> was a fairly dramatic performance improvement. Not sure whether it was a
> side effect of the assembly code pinning registers that could otherwise
> have been reassigned, or some subtle consequence of reduced dependency,
> but the result was clear. Guessing performance on massively superscalar,
> out-of-order processors like modern x86-64 is very difficult, IMO.
The x86 family is a bit of a strange case. The number of cycles required by trivial integer operations (adds, shifts) compared to more complex instructions like integer mul/div is nearly 1:1, and the floating point variants are not much worse. Even some complex cases such as floating point sin/cos are handled quite quickly. One might even argue that the relative performance of primitive operations like shifts and adds is quite poor on x86 processors, compared to computationally intensive operations like sin/cos (which require a 3rd- to 8th-order polynomial).
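For reference, the shift/or pair mentioned above is the usual C idiom that compilers recognise and turn into a single rotate instruction. A sketch (rotl32 is just an illustrative name; masking the count keeps the shifts defined when n is 0 or a multiple of 32):

#include <stdint.h>

/* Rotate a 32-bit value left by n bits using only shifts and an OR.
 * Mainstream compilers typically emit one rotate instruction for this. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}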
On Thu, 29 Mar 2012 07:56:50 -0700, Mark Borgerson wrote:

> In article <oKCdnSXl7OPQPe7SnZ2dnUVZ_gednZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>>
>> On Wed, 28 Mar 2012 22:44:32 +0000, Andrew Reilly wrote:
>>
>> > <<SNIP>>
>> > Fast isn't always the only consideration, though. Floating point is
>> > *always* going to be more power-hungry than fixed point, simply
>> > because it is doing a bunch of extra work at run-time that
>> > fixed-point forces you to hoist to compile-time.
>>
>> It'll be power hungry twice if you select a chip that has floating
>> point hardware. I never seem to have the budget -- either dollars or
>> watts -- to use such processors.
>
> Cortex M4 chips, like the STM32F405, have lowered the bars quite a bit for
> FPU availability. The STM32F405 is about $11.50 qty 1 at DigiKey. The
> STM32F205 Cortex M3 is about the same price.
>
> I've got one of the chips, and it's compatible with the F205 board I
> designed, so I'll be trying it out soon. More RAM, more Flash, faster
> clock----everything we look forward to in a new generation of chips.
> (Since I'm not using an OS or big USB or ethernet stacks, I'll have LOTS
> of flash left over for things like lookup tables, etc.)
>
> Right now, I'm just happy to read an SD card and send bit-banged data to
> an FT232H at about 6MB/second. I can even use the same drivers and
> host I use with the FT245 chips, which do the same thing at about
> 200KB/s. The 4-bit SD interface on the STM chips can do multi-block
> reads at upwards of 10MB/s. Hard to match that with SPI mode!
>
>> > The advice to benchmark is excellent, of course. Particularly
>> > because the results won't necessarily be what you expect.
>>
>> Yes. Even when I expect anti-intuitive results, I can still be
>> astonished by benchmarks.
>
> I think the FPU availability will greatly simplify coding of things like
> Extended Kalman Filters and digital signal processing apps. You can
> write and test code on a PC while specifying 32-bit floats and port
> pretty easily to the MPU system.
Be careful of 32-bit floating point. It is insufficient for a number of real-world tasks for which 32-bit fixed point is well suited. IEEE single-precision floating point gives you (effectively) a 25- or 26-bit mantissa (I can't remember how many bits it is, plus sign, plus implied 1). When integrator gains get low, that's not enough, where the extra factor of 128 or 64 available from well-scaled fixed point will save the day.

Be _very_ careful of 32-bit floating point in an Extended Kalman filter. Particularly if you're not using a square-root algorithm for the evolution of the variance matrix. You can run out of precision astonishingly quickly.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
In article <rZydncI478wfROnSnZ2dnUVZ_ridnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
>
> On Thu, 29 Mar 2012 07:56:50 -0700, Mark Borgerson wrote:
>
> <<SNIP>>
>
> > I think the FPU availability will greatly simplify coding of things like
> > Extended Kalman Filters and digital signal processing apps. You can
> > write and test code on a PC while specifying 32-bit floats and port
> > pretty easily to the MPU system.
>
> Be careful of 32-bit floating point. It is insufficient for a number of
> real-world tasks for which 32-bit fixed point is well suited. IEEE
> single-precision floating point gives you (effectively) a 25- or 26-bit
> mantissa (I can't remember how many bits it is, plus sign, plus implied
> 1). When integrator gains get low, that's not enough, where the extra
> factor of 128 or 64 available from well-scaled fixed point will save the
> day.
IIRC, IEEE-754 single precision is 8 bits of exponent (biased by 127), one sign bit and a 23-bit mantissa with an implied 1 bit as the 24th bit.

That's probably OK for FIR filters working on the results of 16-bit ADCs as long as the number of terms is reasonable (<30 or so). OTOH, I handled those calculations nicely on an MSP430 with the onboard 16x16-bit hardware multiply and accumulate. When I set up the coefficients properly, I didn't even have to do a divide of the sum. I just picked the high 16-bit word----an effective divide by 65536.

Matlab allows me to generate filters with 16- and 32-bit integers and 32- and 64-bit FP. If I translate from MSP430 to Cortex, I would probably just translate the filters to 32-bit integer and save the FPU for things that might exceed the dynamic range of the 32-bit integers.
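A small C sketch in the spirit of what is described above (the names, tap count, and boxcar coefficients are illustrative only; on the MSP430 the multiply-accumulate would be done by the hardware MAC rather than by a C loop):

#include <stdint.h>

#define NUM_TAPS 8

/* Q15 coefficients; a trivial 1/8 boxcar here, so the coefficients sum to
 * 1.0 in Q15 and the 32-bit accumulator cannot overflow with 16-bit input. */
static const int16_t coeff[NUM_TAPS] = {
    4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096
};

/* One output of a direct-form FIR: 16x16 -> 32-bit multiply-accumulate,
 * then take the high 16-bit word -- an effective divide by 65536. */
static int16_t fir_q15(const int16_t x[NUM_TAPS])
{
    int32_t acc = 0;
    for (int i = 0; i < NUM_TAPS; i++)
        acc += (int32_t)coeff[i] * x[i];
    /* Arithmetic shift assumed for negative acc (implementation-defined in
     * strict C, as noted earlier in the thread, but universal in practice). */
    return (int16_t)(acc >> 16);
}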
>
> Be _very_ careful of 32-bit floating point in an Extended Kalman filter.
> Particularly if you're not using a square-root algorithm for the
> evolution of the variance matrix. You can run out of precision
> astonishingly quickly.
Thanks for the notes. I looked up the last time I ported someone else's code to a StrongARM processor. They did use doubles (64-bit FP). The chip didn't have an FPU and was running Linux. The standard FP library implementation did all the floating point calculations with software interrupts, and performance truly sucked. We ended up revising all the code to use a special library that didn't use SWIs. It was still not as fast as we wanted.

I'm not sure how much a 32-bit FPU will help with 64-bit FP calculations. One of these days I'll take a closer look at the IAR and STM signal processing libraries.

Mark Borgerson
On Thu, 29 Mar 2012 21:19:03 -0700, Mark Borgerson wrote:

> In article <rZydncI478wfROnSnZ2dnUVZ_ridnZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>>
>> On Thu, 29 Mar 2012 07:56:50 -0700, Mark Borgerson wrote:
>>
>> <<SNIP>>
>>
>> Be careful of 32-bit floating point. It is insufficient for a number
>> of real-world tasks for which 32-bit fixed point is well suited. IEEE
>> single-precision floating point gives you (effectively) a 25- or 26-bit
>> mantissa (I can't remember how many bits it is, plus sign, plus implied
>> 1). When integrator gains get low, that's not enough, where the extra
>> factor of 128 or 64 available from well-scaled fixed point will save
>> the day.
>
> IIRC, IEEE-754 single precision is 8 bits of exponent (biased by 127),
> one sign bit and a 23-bit mantissa with an implied 1 bit as the 24th
> bit.
>
> That's probably OK for FIR filters working on the results of 16-bit ADCs
> as long as the number of terms is reasonable (<30 or so). OTOH, I
> handled those calculations nicely on an MSP430 with the onboard 16x16-bit
> hardware multiply and accumulate. When I set up the coefficients
> properly, I didn't even have to do a divide of the sum. I just picked
> the high 16-bit word----an effective divide by 65536.
>
> Matlab allows me to generate filters with 16- and 32-bit integers and 32-
> and 64-bit FP. If I translate from MSP430 to Cortex, I would probably
> just translate the filters to 32-bit integer and save the FPU for things
> that might exceed the dynamic range of the 32-bit integers.
>
It gets to be an issue when you're implementing IIR filters or PID controllers where the bandwidth of the filter or loop is much smaller than the sampling rate: in those circumstances, the ratio between the maximum size of an accumulator and the size of an increment that needs to affect it can get to be a healthy portion of -- or more than -- 2^24, and then you're screwed.
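A tiny self-contained illustration of that effect (the numbers here are made up for the example, not from the thread): a near-full-scale accumulator in single precision simply ignores an increment from a low-gain loop, while Q16.16 fixed point keeps it.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Single precision: once the accumulator is near full scale, an
     * increment smaller than about half an ulp is rounded away entirely. */
    float yf  = 16384.0f;                    /* accumulator near full scale */
    float inc = 0.0005f;                     /* increment from a low-gain loop */
    printf("float:  %.6f\n", yf + inc);      /* prints 16384.000000 -- increment lost */

    /* 32-bit fixed point, Q16.16: the same increment is representable. */
    int32_t y  = (int32_t)16384 << 16;       /* 16384.0 in Q16.16 */
    int32_t di = (int32_t)(0.0005 * 65536);  /* about 32 LSBs, not lost */
    y += di;
    printf("fixed:  %.6f\n", y / 65536.0);   /* prints 16384.000488 */

    return 0;
}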
>
>> Be _very_ careful of 32-bit floating point in an Extended Kalman
>> filter. Particularly if you're not using a square-root algorithm for
>> the evolution of the variance matrix. You can run out of precision
>> astonishingly quickly.
>
> Thanks for the notes. I looked up the last time I ported someone else's
> code to a StrongARM processor. They did use doubles (64-bit FP). The
> chip didn't have an FPU and was running Linux. The standard FP library
> implementation did all the floating point calculations with software
> interrupts, and performance truly sucked. We ended up revising all the
> code to use a special library that didn't use SWIs. It was still not
> as fast as we wanted. I'm not sure how much a 32-bit FPU will help with
> 64-bit FP calculations. One of these days I'll take a closer look at
> the IAR and STM signal processing libraries.
If I needed to implement a Kalman filter on a processor that would take a significant speed hit going to 64-bit floating point, I'd take a close look at the square root algorithms. The basic idea is that you have to do more computation to carry the square root of the variance, but because it's a square root you pretty much cut your needed precision in half.

On a PC I rather suspect that using a square root algorithm would be a stupid waste of time -- but if brand B can do 32-bit floating point 50 times faster than 64-bit, the square root algorithm would probably win hands down.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
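A scalar toy example of the precision effect described above (my own made-up numbers; the scalar update shown is only the simplest analogue of a square-root filter, not a full algorithm): with a very accurate measurement, the conventional variance update cancels almost completely in single precision, while carrying the standard deviation keeps a usable value.

#include <math.h>
#include <stdio.h>

int main(void)
{
    const float P = 1.0f;   /* predicted variance */
    const float R = 1e-9f;  /* measurement noise variance (very accurate sensor) */

    /* Conventional scalar measurement update (H = 1):
     * K = P/(P+R), P+ = (1 - K)*P.  In float, P+R rounds to 1.0f, so K
     * becomes exactly 1.0f and the updated variance collapses to zero:
     * the filter turns overconfident. */
    float K      = P / (P + R);
    float P_conv = (1.0f - K) * P;

    /* "Square-root" form: carry s = sqrt(P) and update it directly,
     * s+ = s * sqrt(R/(P+R)).  No catastrophic cancellation; squaring
     * back gives the correct small variance. */
    float s      = sqrtf(P);
    float s_post = s * sqrtf(R / (P + R));
    float P_sqrt = s_post * s_post;

    printf("conventional: P+ = %g\n", P_conv);  /* 0        */
    printf("square-root:  P+ = %g\n", P_sqrt);  /* about 1e-9 */
    return 0;
}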
On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> I did a fixed point support package for our 8 bit embedded systems
> compilers and one interesting metric came out of the project.
>
> Given a number of bits in a number and similar error checking fixed
> or float took very similar amounts of execution time and code size
> in applications.
>
> For example 32 bit float and 32 bit fixed point. They are not exact
> but they are close. In the end much to my surprise the choice is
> dynamic range or resolution.
That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider, and that is granularity.

We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32, because the float uses only 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.

With 24b ADCs that may start to matter, and certainly with 32 bit ADCs you would need to watch it very carefully.

Compiler suppliers for 32 bit cores really should provide optimised libraries for Gain/Scale type calibrations that use a 64-bit result in the intermediate steps.
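A sketch of the kind of routine being suggested (the function name, Q-format and parameters are assumptions for illustration, not any vendor's API): apply a calibration gain and offset to an ADC reading, keeping the intermediate product in 64 bits so no precision is lost before the final scaling.

#include <stdint.h>

/* Apply 'reading * gain + offset' with a 64-bit intermediate.
 * gain_q30 is in Q2.30 (1.0 == 1 << 30); offset is in output counts. */
static int32_t calibrate(int32_t reading, int32_t gain_q30, int32_t offset)
{
    int64_t acc = (int64_t)reading * gain_q30;   /* 32x32 -> 64-bit product */
    /* Arithmetic shift assumed for negative acc, as on essentially all
     * 32-bit embedded compilers (implementation-defined in strict C). */
    return (int32_t)(acc >> 30) + offset;        /* scale back to counts */
}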
On 03/29/12 03:20, Tim Wescott wrote:
> But on the x86 -- which is the _only_ processor that I've tried it on that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.
I think I recall that transition point occurring around 1994. I was writing a scalable vector graphics subsystem, and carefully using integer (sometimes fixed-point) math wherever possible, only to find that, when I changed the basic type of the coordinate to float (or double, I can't recall), the system actually rendered *faster*. The integer unit was busy computing addresses and array offsets, and being interrupted with *coordinate* math, while the FPU lay idle. This was still in the Pentium days, before even the 686 and PII.

On a modern note, has anyone tried to use the TI OMAP ARM CPUs? I haven't looked at the DSP instruction set, but the hardware FP is sweet.

Clifford Heath.
In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
forums@yneo2>, j.m.granville@gmail.com says...
>
> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> > I did a fixed point support package for our 8 bit embedded systems
> > compilers and one interesting metric came out of the project.
> >
> > Given a number of bits in a number and similar error checking fixed
> > or float took very similar amounts of execution time and code size
> > in applications.
> >
> > For example 32 bit float and 32 bit fixed point. They are not exact
> > but they are close. In the end much to my surprise the choice is
> > dynamic range or resolution.
>
> That makes sense for 8 bit cores, but there is another issue besides
> speed the OP may need to consider, and that is granularity.
>
> We had one application where floating point was more convenient, but
> gave lower precision than a 32*32:64/32, because the float uses only
> 23+1 bits to store the number. The other bits are exponent, and give
> dynamic range, but NOT precision.
>
> With 24b ADCs that may start to matter, and certainly with 32 bit ADCs
> you would need to watch it very carefully.
>
Have you actually found and used a 32-bit ADC? For an ADC with a 5V range, that would mean about 1.2 nanovolts per LSB!!!
> Compiler suppliers for 32 bit cores really should provide optimised
> libraries for Gain/Scale type calibrations that use a 64-bit result in
> the intermediate steps.
My experience is that I'm lucky to get 20 noise-free bits on any system actually connected to an MPU (for a single conversion). Still, that would push the limits of FP with only 24 bits in the mantissa if I were to do any significant oversampling.

I remember professors in chemistry and physics warning me that the uncertainty in my final result should have error limits corresponding to the precision of my inputs. Still, roundoff errors could eventually degrade the result past the limits of the input for some calculations. The reality of the oceanographic sensors I work with is that 16 bits gets you right into the noise level of the real world for most experiments. However, if you are doing long-term integrations of variable inputs, roundoff error could come back to haunt you.

Mark Borgerson
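A small illustration of the long-term integration point (an editorial example, not from the thread): naively accumulating many small single-precision samples drifts badly once the running total dwarfs each sample, while a wider accumulator or compensated (Kahan) summation holds the error down.

#include <stdio.h>

/* Sum ten million samples of 0.1f three ways and compare. */
int main(void)
{
    const long  N = 10000000;
    const float sample = 0.1f;

    float  naive = 0.0f;
    double wide  = 0.0;            /* wider accumulator */
    float  sum = 0.0f, c = 0.0f;   /* Kahan compensated sum */

    for (long i = 0; i < N; i++) {
        naive += sample;

        wide += sample;

        float y = sample - c;      /* compensated summation */
        float t = sum + y;
        c = (t - sum) - y;         /* recover the lost low-order bits */
        sum = t;
    }

    printf("naive float : %f\n", naive);  /* visibly wrong, off by a large margin */
    printf("double      : %f\n", wide);   /* close to 1000000 */
    printf("Kahan float : %f\n", sum);    /* close to 1000000 */
    return 0;
}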
Mark Borgerson <mborgerson@comcast.net> writes:

> In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
> forums@yneo2>, j.m.granville@gmail.com says...
>>
>> <<SNIP>>
>>
>> With 24b ADCs that may start to matter, and certainly with 32 bit ADCs
>> you would need to watch it very carefully.
>>
> Have you actually found and used a 32-bit ADC? For an ADC with a 5V
> range, that would mean about 1.2 nanovolts per LSB!!!
The only actual chip I have heard of is a sigma-delta from TI. Of course, 8-10 of those bits are marketing. I would look it up for you, but the flash selection tool is still "initializing" for me on their site...

The best ADC I have seen is an HP 3458A meter, the equivalent of a 28-bit chip ADC. It might just be possible to make a 32-bit ADC using a Josephson junction array, if you have a liquid helium supply handy :)

[...]

--
John Devereux
John Devereux <john@devereux.me.uk> wrote:

> The only actual chip I have heard of is a sigma-delta from TI. Of course,
> 8-10 of those bits are marketing. I would look it up for you, but the
> flash selection tool is still "initializing" for me on their site...
Off-topic, but as far as I can tell TI are not using Flash in any of their selection tools, only HTML5. Unfortunately their backend sometimes glitches out, usually when you need to look up one of their components.

Anyway, their ADS1281/1282 advertise a 31-bit resolution. The ADS1282-HT high-temperature variant is even available in DIP packaging for the low, low price of $218.75 ea.

-a