
Floating point vs fixed arithmetic (signed 64-bit)

Started by kishor March 26, 2012
On Tue, 27 Mar 2012 15:25:21 +0200, David Brown
<david@westcontrol.removethisbit.com> wrote:
>> 2. Is it possible to generate shift based logic to case 5 mentioned above?
>> (Signed 64-bit divide by constant of 2^n)
>
> Yes.
>
> The easiest way to make sure you get signed division right is to
> separate out the sign, then use unsigned arithmetic. That way you can't
> go wrong, and the C code is portable.
Additional note to the OP: comp.lang.c will point you in the right direction as far as what is portable and what is not. From memory (probably wrong ... I should look it up, but too lazy):

For unsigneds, there are no issues shifting in either direction; it works as intuitively expected.

For signeds, a left shift fills the LSB with 0, although strictly speaking the standard leaves left-shifting a negative value undefined. Signed right shifts of negative values are implementation-defined: it isn't guaranteed how the sign bit will be propagated. Again, this is from memory and possibly wrong.

The suggestion of separating out the sign is certainly prudent.

DTA
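As an illustration of the approach suggested above (separate out the sign, then shift unsigned), here is a minimal C sketch; the function name div_pow2 and the round-toward-zero convention are assumptions of this example, not something from the thread:

#include <stdint.h>

/* Divide a signed 64-bit value by 2^n (0 < n < 64), rounding toward zero
 * like C's '/' operator.  The magnitude is handled as unsigned, so the
 * code never right-shifts a negative value (which is only
 * implementation-defined in C). */
static int64_t div_pow2(int64_t x, unsigned n)
{
    uint64_t mag = (x < 0) ? -(uint64_t)x : (uint64_t)x;  /* |x|, works even for INT64_MIN */
    uint64_t q   = mag >> n;                              /* unsigned shift: fully portable */
    return (x < 0) ? -(int64_t)q : (int64_t)q;
}

Note that an arithmetic right shift of a negative value, where it behaves that way at all, rounds toward minus infinity, so it can differ from C's division operator by one; the sketch above follows the division operator.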
On 29 Mar 2012 11:53:36 GMT, Andrew Reilly <areilly---@bigpond.net.au>
wrote:

>> The main problem trying to write _low_level_ math routines in C is that
>> you do not have access to the carry bit or use any rotate instruction.
>> The C-compiler would have to be very clever to convert a sequence of
>> C-statements into a single rotate instruction or shifting multiple bits
>> into two registers.
>
> It's a funny old world. I've seen several compilers recognise the pair
> of shifts and an or combination as a rotate, and emit that instruction.
> I've also replaced carefully asm-"optimised" maths routines (on x86) that
> used the carry flag with "vanilla" C equivalents, and the overall effect
> was a fairly dramatic performance improvement. Not sure whether it was a
> side effect of the assembly code pinning registers that could otherwise
> have been reassigned, or some subtle consequence of reduced dependency,
> but the result was clear. Guessing performance on massively superscalar,
> out-of-order processors like modern x86-64 is very difficult, IMO.
The x86 family is a bit of a strange case. The number of cycles required by trivial integer operations (adds, shifts) compared to more complex instructions like integer mul/div is nearly 1:1, and the floating point variants are not much worse. Even some complex cases such as floating point sin/cos are handled quite quickly. One might even argue that the relative performance of primitive operations like shifts and adds is quite poor on x86 processors, compared to computationally intensive operations like sin/cos (which require a 3rd- to 8th-order polynomial).
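For reference, the shift/or pair mentioned above is the usual C idiom that compilers recognise and turn into a single rotate instruction. A sketch (rotl32 is just an illustrative name; masking the count keeps the shifts defined when n is 0 or a multiple of 32):

#include <stdint.h>

/* Rotate a 32-bit value left by n bits using only shifts and an OR.
 * Mainstream compilers typically emit one rotate instruction for this. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}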
On Thu, 29 Mar 2012 07:56:50 -0700, Mark Borgerson wrote:

> In article <oKCdnSXl7OPQPe7SnZ2dnUVZ_gednZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>>
>> On Wed, 28 Mar 2012 22:44:32 +0000, Andrew Reilly wrote:
>>
>> > <<SNIP>>
>> > Fast isn't always the only consideration, though. Floating point is
>> > *always* going to be more power-hungry than fixed point, simply
>> > because it is doing a bunch of extra work at run-time that
>> > fixed-point forces you to hoist to compile-time.
>>
>> It'll be power hungry twice if you select a chip that has floating
>> point hardware. I never seem to have the budget -- either dollars or
>> watts -- to use such processors.
>
> Cortex M4 chips, like the STM32F405, have lowered the bars quite a bit for
> FPU availability. The STM32F405 is about $11.50 qty 1 at DigiKey. The
> STM32F205 Cortex M3 is about the same price.
>
> I've got one of the chips, and it's compatible with the F205 board I
> designed, so I'll be trying it out soon. More RAM, more Flash, faster
> clock----everything we look forward to in a new generation of chips.
> (Since I'm not using an OS or big USB or ethernet stacks, I'll have LOTS
> of flash left over for things like lookup tables, etc.)
>
> Right now, I'm just happy to read an SD card and send bit-banged data to
> an FT232H at about 6MB/second. I can even use the same drivers and
> host I use with the FT245 chips, which do the same thing at about
> 200KB/s. The 4-bit SD interface on the STM chips can do multi-block
> reads at upwards of 10MB/s. Hard to match that with SPI mode!
>
>> > The advice to benchmark is excellent, of course. Particularly
>> > because the results won't necessarily be what you expect.
>>
>> Yes. Even when I expect anti-intuitive results, I can still be
>> astonished by benchmarks.
>
> I think the FPU availability will greatly simplify coding of things like
> Extended Kalman Filters and digital signal processing apps. You can
> write and test code on a PC while specifying 32-bit floats and port
> pretty easily to the MPU system.
Be careful of 32-bit floating point. It is insufficient for a number of real-world tasks for which 32-bit fixed point is well suited. IEEE single-precision floating point gives you (effectively) a 25- or 26-bit mantissa (I can't remember how many bits it is, plus sign, plus implied 1). When integrator gains get low, that's not enough, where the extra factor of 128 or 64 available from well-scaled fixed point will save the day.

Be _very_ careful of 32-bit floating point in an Extended Kalman filter. Particularly if you're not using a square-root algorithm for the evolution of the variance matrix. You can run out of precision astonishingly quickly.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
In article <rZydncI478wfROnSnZ2dnUVZ_ridnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
>
> On Thu, 29 Mar 2012 07:56:50 -0700, Mark Borgerson wrote:
>
> <<SNIP>>
>
> > I think the FPU availability will greatly simplify coding of things like
> > Extended Kalman Filters and digital signal processing apps. You can
> > write and test code on a PC while specifying 32-bit floats and port
> > pretty easily to the MPU system.
>
> Be careful of 32-bit floating point. It is insufficient for a number of
> real-world tasks for which 32-bit fixed point is well suited. IEEE
> single-precision floating point gives you (effectively) a 25- or 26-bit
> mantissa (I can't remember how many bits it is, plus sign, plus implied
> 1). When integrator gains get low, that's not enough, where the extra
> factor of 128 or 64 available from well-scaled fixed point will save the
> day.
IIRC, IEEE-754 single precision is 8 bits of exponent (biased by 127), one sign bit and a 23-bit mantissa with an implied 1 bit as the 24th bit.

That's probably OK for FIR filters working on the results of 16-bit ADCs as long as the number of terms is reasonable (<30 or so). OTOH, I handled those calculations nicely on an MSP430 with the onboard 16x16-bit hardware multiply and accumulate. When I set up the coefficients properly, I didn't even have to do a divide of the sum. I just picked the high 16-bit word----an effective divide by 65536.

Matlab allows me to generate filters with 16- and 32-bit integers and 32- and 64-bit FP. If I translate from MSP430 to Cortex, I would probably just translate the filters to 32-bit integer and save the FPU for things that might exceed the dynamic range of the 32-bit integers.
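A small C sketch in the spirit of what is described above (the names, tap count, and boxcar coefficients are illustrative only; on the MSP430 the multiply-accumulate would be done by the hardware MAC rather than by a C loop):

#include <stdint.h>

#define NUM_TAPS 8

/* Q15 coefficients; a trivial 1/8 boxcar here, so the coefficients sum to
 * 1.0 in Q15 and the 32-bit accumulator cannot overflow with 16-bit input. */
static const int16_t coeff[NUM_TAPS] = {
    4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096
};

/* One output of a direct-form FIR: 16x16 -> 32-bit multiply-accumulate,
 * then take the high 16-bit word -- an effective divide by 65536. */
static int16_t fir_q15(const int16_t x[NUM_TAPS])
{
    int32_t acc = 0;
    for (int i = 0; i < NUM_TAPS; i++)
        acc += (int32_t)coeff[i] * x[i];
    /* Arithmetic shift assumed for negative acc (implementation-defined in
     * strict C, as noted earlier in the thread, but universal in practice). */
    return (int16_t)(acc >> 16);
}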
>
> Be _very_ careful of 32-bit floating point in an Extended Kalman filter.
> Particularly if you're not using a square-root algorithm for the
> evolution of the variance matrix. You can run out of precision
> astonishingly quickly.
Thanks for the notes. I looked up the last time I ported someone else's code to a StrongARM processor. They did use doubles (64-bit FP). The chip didn't have an FPU and was running Linux. The standard FP library implementation did all the floating point calculations with software interrupts, and performance truly sucked. We ended up revising all the code to use a special library that didn't use SWIs. It was still not as fast as we wanted.

I'm not sure how much a 32-bit FPU will help with 64-bit FP calculations. One of these days I'll take a closer look at the IAR and STM signal processing libraries.

Mark Borgerson
On Thu, 29 Mar 2012 21:19:03 -0700, Mark Borgerson wrote:

> In article <rZydncI478wfROnSnZ2dnUVZ_ridnZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>>
>> On Thu, 29 Mar 2012 07:56:50 -0700, Mark Borgerson wrote:
>>
>> <<SNIP>>
>>
>> Be careful of 32-bit floating point. It is insufficient for a number
>> of real-world tasks for which 32-bit fixed point is well suited. IEEE
>> single-precision floating point gives you (effectively) a 25- or 26-bit
>> mantissa (I can't remember how many bits it is, plus sign, plus implied
>> 1). When integrator gains get low, that's not enough, where the extra
>> factor of 128 or 64 available from well-scaled fixed point will save
>> the day.
>
> IIRC, IEEE-754 single precision is 8 bits of exponent (biased by 127),
> one sign bit and a 23-bit mantissa with an implied 1 bit as the 24th
> bit.
>
> That's probably OK for FIR filters working on the results of 16-bit ADCs
> as long as the number of terms is reasonable (<30 or so). OTOH, I
> handled those calculations nicely on an MSP430 with the onboard 16x16-bit
> hardware multiply and accumulate. When I set up the coefficients
> properly, I didn't even have to do a divide of the sum. I just picked
> the high 16-bit word----an effective divide by 65536.
>
> Matlab allows me to generate filters with 16- and 32-bit integers and 32-
> and 64-bit FP. If I translate from MSP430 to Cortex, I would probably
> just translate the filters to 32-bit integer and save the FPU for things
> that might exceed the dynamic range of the 32-bit integers.
>
It gets to be an issue when you're implementing IIR filters or PID controllers where the bandwidth of the filter or loop is much smaller than the sampling rate: in those circumstances, the ratio between the maximum size of an accumulator and the size of an increment that needs to affect it can get to be a healthy portion of -- or more than -- 2^24, and then you're screwed.
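A tiny self-contained illustration of that effect (the numbers here are made up for the example, not from the thread): a near-full-scale accumulator in single precision simply ignores an increment from a low-gain loop, while Q16.16 fixed point keeps it.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Single precision: once the accumulator is near full scale, an
     * increment smaller than about half an ulp is rounded away entirely. */
    float yf  = 16384.0f;                    /* accumulator near full scale */
    float inc = 0.0005f;                     /* increment from a low-gain loop */
    printf("float:  %.6f\n", yf + inc);      /* prints 16384.000000 -- increment lost */

    /* 32-bit fixed point, Q16.16: the same increment is representable. */
    int32_t y  = (int32_t)16384 << 16;       /* 16384.0 in Q16.16 */
    int32_t di = (int32_t)(0.0005 * 65536);  /* about 32 LSBs, not lost */
    y += di;
    printf("fixed:  %.6f\n", y / 65536.0);   /* prints 16384.000488 */

    return 0;
}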
>
>> Be _very_ careful of 32-bit floating point in an Extended Kalman
>> filter. Particularly if you're not using a square-root algorithm for
>> the evolution of the variance matrix. You can run out of precision
>> astonishingly quickly.
>
> Thanks for the notes. I looked up the last time I ported someone else's
> code to a StrongARM processor. They did use doubles (64-bit FP). The
> chip didn't have an FPU and was running Linux. The standard FP library
> implementation did all the floating point calculations with software
> interrupts, and performance truly sucked. We ended up revising all the
> code to use a special library that didn't use SWIs. It was still not
> as fast as we wanted. I'm not sure how much a 32-bit FPU will help with
> 64-bit FP calculations. One of these days I'll take a closer look at
> the IAR and STM signal processing libraries.
If I needed to implement a Kalman filter on a processor that would take a significant speed hit going to 64-bit floating point, I'd take a close look at the square root algorithms. The basic idea is that you have to do more computation to carry the square root of the variance, but because it's a square root you pretty much cut your needed precision in half.

On a PC I rather suspect that using a square root algorithm would be a stupid waste of time -- but if brand B can do 32-bit floating point 50 times faster than 64-bit, the square root algorithm would probably win hands down.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
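A scalar toy example of the precision effect described above (my own made-up numbers; the scalar update shown is only the simplest analogue of a square-root filter, not a full algorithm): with a very accurate measurement, the conventional variance update cancels almost completely in single precision, while carrying the standard deviation keeps a usable value.

#include <math.h>
#include <stdio.h>

int main(void)
{
    const float P = 1.0f;   /* predicted variance */
    const float R = 1e-9f;  /* measurement noise variance (very accurate sensor) */

    /* Conventional scalar measurement update (H = 1):
     * K = P/(P+R), P+ = (1 - K)*P.  In float, P+R rounds to 1.0f, so K
     * becomes exactly 1.0f and the updated variance collapses to zero:
     * the filter turns overconfident. */
    float K      = P / (P + R);
    float P_conv = (1.0f - K) * P;

    /* "Square-root" form: carry s = sqrt(P) and update it directly,
     * s+ = s * sqrt(R/(P+R)).  No catastrophic cancellation; squaring
     * back gives the correct small variance. */
    float s      = sqrtf(P);
    float s_post = s * sqrtf(R / (P + R));
    float P_sqrt = s_post * s_post;

    printf("conventional: P+ = %g\n", P_conv);  /* 0        */
    printf("square-root:  P+ = %g\n", P_sqrt);  /* about 1e-9 */
    return 0;
}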
On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> I did a fixed point support package for our 8 bit embedded systems
> compilers and one interesting metric came out of the project.
>
> Given a number of bits in a number and similar error checking fixed
> or float took very similar amounts of execution time and code size
> in applications.
>
> For example 32 bit float and 32 bit fixed point. They are not exact
> but they are close. In the end much to my surprise the choice is
> dynamic range or resolution.
That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider, and that is granularity.

We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32, because the float uses only 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.

With 24b ADCs that may start to matter, and certainly with 32 bit ADCs you would need to watch it very carefully.

Compiler suppliers for 32 bit cores really should provide optimised libraries for Gain/Scale type calibrations that use a 64-bit result in the intermediate steps.
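A sketch of the kind of routine being suggested (the function name, Q-format and parameters are assumptions for illustration, not any vendor's API): apply a calibration gain and offset to an ADC reading, keeping the intermediate product in 64 bits so no precision is lost before the final scaling.

#include <stdint.h>

/* Apply 'reading * gain + offset' with a 64-bit intermediate.
 * gain_q30 is in Q2.30 (1.0 == 1 << 30); offset is in output counts. */
static int32_t calibrate(int32_t reading, int32_t gain_q30, int32_t offset)
{
    int64_t acc = (int64_t)reading * gain_q30;   /* 32x32 -> 64-bit product */
    /* Arithmetic shift assumed for negative acc, as on essentially all
     * 32-bit embedded compilers (implementation-defined in strict C). */
    return (int32_t)(acc >> 30) + offset;        /* scale back to counts */
}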
On 03/29/12 03:20, Tim Wescott wrote:
> But on the x86 -- which is the _only_ processor that I've tried it on that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.
I think I recall that transition point occurring around 1994. I was writing a scalable vector graphics subsystem, and carefully using integer (sometimes fixed-point) math wherever possible, only to find that, when I changed the basic type of the coordinate to float (or double, I can't recall), the system actually rendered *faster*. The integer unit was busy computing addresses and array offsets, and being interrupted with *coordinate* math, while the FPU lay idle. This was still in the Pentium days, before even the 686 and PII.

On a modern note, has anyone tried to use the TI OMAP ARM CPUs? I haven't looked at the DSP instruction set, but the hardware FP is sweet.

Clifford Heath.
In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
forums@yneo2>, j.m.granville@gmail.com says...
>
> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> > I did a fixed point support package for our 8 bit embedded systems
> > compilers and one interesting metric came out of the project.
> >
> > Given a number of bits in a number and similar error checking fixed
> > or float took very similar amounts of execution time and code size
> > in applications.
> >
> > For example 32 bit float and 32 bit fixed point. They are not exact
> > but they are close. In the end much to my surprise the choice is
> > dynamic range or resolution.
>
> That makes sense for 8 bit cores, but there is another issue besides
> speed the OP may need to consider, and that is granularity.
>
> We had one application where floating point was more convenient, but
> gave lower precision than a 32*32:64/32, because the float uses only
> 23+1 bits to store the number. The other bits are exponent, and give
> dynamic range, but NOT precision.
>
> With 24b ADCs that may start to matter, and certainly with 32 bit ADCs
> you would need to watch it very carefully.
>
Have you actually found and used a 32-bit ADC? For an ADC with a 5V range, that would mean about 1.2 nanovolts per LSB!!!
> Compiler suppliers for 32 bit cores really should provide optimised
> libraries for Gain/Scale type calibrations that use a 64-bit result in
> the intermediate steps.
My experience is that I'm lucky to get 20 noise-free bits on any system actually connected to an MPU (for a single conversion). Still, that would push the limits of FP with only 24 bits in the mantissa if I were to do any significant oversampling.

I remember professors in chemistry and physics warning me that the uncertainty in my final result should have error limits corresponding to the precision of my inputs. Still, roundoff errors could eventually degrade the result past the limits of the input for some calculations. The reality of the oceanographic sensors I work with is that 16 bits gets you right into the noise level of the real world for most experiments. However, if you are doing long-term integrations of variable inputs, roundoff error could come back to haunt you.

Mark Borgerson
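A small illustration of the long-term integration point (an editorial example, not from the thread): naively accumulating many small single-precision samples drifts badly once the running total dwarfs each sample, while a wider accumulator or compensated (Kahan) summation holds the error down.

#include <stdio.h>

/* Sum ten million samples of 0.1f three ways and compare. */
int main(void)
{
    const long  N = 10000000;
    const float sample = 0.1f;

    float  naive = 0.0f;
    double wide  = 0.0;            /* wider accumulator */
    float  sum = 0.0f, c = 0.0f;   /* Kahan compensated sum */

    for (long i = 0; i < N; i++) {
        naive += sample;

        wide += sample;

        float y = sample - c;      /* compensated summation */
        float t = sum + y;
        c = (t - sum) - y;         /* recover the lost low-order bits */
        sum = t;
    }

    printf("naive float : %f\n", naive);  /* visibly wrong, off by a large margin */
    printf("double      : %f\n", wide);   /* close to 1000000 */
    printf("Kahan float : %f\n", sum);    /* close to 1000000 */
    return 0;
}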
Mark Borgerson <mborgerson@comcast.net> writes:

> In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
> forums@yneo2>, j.m.granville@gmail.com says...
>>
>> <<SNIP>>
>>
>> With 24b ADCs that may start to matter, and certainly with 32 bit ADCs
>> you would need to watch it very carefully.
>>
> Have you actually found and used a 32-bit ADC? For an ADC with a 5V
> range, that would mean about 1.2 nanovolts per LSB!!!
The only actual chip I have heard of is a sigma-delta from TI. Of course, 8-10 of those bits are marketing. I would look it up for you, but the flash selection tool is still "initializing" for me on their site...

The best ADC I have seen is an HP 3458A meter, the equivalent of a 28-bit chip ADC. It might just be possible to make a 32-bit ADC using a Josephson junction array, if you have a liquid helium supply handy :)

[...]

--
John Devereux
John Devereux <john@devereux.me.uk> wrote:

> The only actual chip I have heard of is a sigma-delta from TI. Of course,
> 8-10 of those bits are marketing. I would look it up for you, but the
> flash selection tool is still "initializing" for me on their site...
Off-topic, but as far as I can tell TI are not using Flash in any of their selection tools, only HTML5. Unfortunately their backend sometimes glitches out, usually when you need to look up one of their components.

Anyway, their ADS1281/1282 advertise a 31-bit resolution. The ADS1282-HT high-temperature variant is even available in DIP packaging for the low, low price of $218.75 ea.

-a