EmbeddedRelated.com
Forums

Floating point vs fixed arithmetics (signed 64-bit)

Started by kishor March 26, 2012
Anders.Montonen@kapsi.spam.stop.fi.invalid writes:

> John Devereux <john@devereux.me.uk> wrote:
>
>> Only actual chip I have heard of is a sigma-delta from TI. Of course
>> 8-10 of these bits are marketing. I would look it up for you but the
>> flash selection tool is still "initializing" for me on their site...
>
> Off-topic, but as far as I can tell TI are not using Flash in any of
> their selection tools, only HTML5. Unfortunately their backend sometimes
> glitches out, usually when you need to look up one of their
> components.

Oh really? Good for them. I apologise to TI, I admit I was using quite an old browser.

In fact it seems to work very well in a slightly more modern one. It is one of the few such manufacturer "selection tools" that uses the whole width of the browser window. Most are crippled to uselessness by some stupid marketeer's desire to exactly control appearance.

> Anyway, their ADS1281/1282 advertise a 31 bit resolution. The ADS1282-HT
> high-temperature variant is even available in DIP packaging for the low,
> low price of $218.75 ea.
>
> -a

-- John Devereux
On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:

> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> I did a fixed point support package for our 8 bit embedded systems
>> compilers and one interesting metric came out of the project.
>>
>> Given a number of bits in a number and similar error checking fixed or
>> float took very similar amounts of execution time and code size in
>> applications.
>>
>> For example 32 bit float and 32 bit fixed point. They are not exact but
>> they are close. In the end much to my surprise the choice is dynamic
>> range or resolution.
>
> That makes sense for 8 bit cores, but there is another issue besides
> speed the OP may need to consider and that is granularity.
>
> We had one application where floating point was more convenient, but
> gave lower precision than a 32*32:64/32 because the float uses 23+1
> bits to store the number. The other bits are exponent, and give dynamic
> range, but NOT precision.
>
> With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
> you would need to watch it very carefully.

If you do any filtering at all, the 25 bits of precision of a single-precision float (24-bit significand plus sign) often matter even with a _16_ bit ADC, when they aren't a show-stopper altogether. It wouldn't be sensible to even _think_ about filtering the output of a 24-bit ADC with single-precision floating point data paths unless the ADC had been exceedingly poorly chosen or applied, and had essentially useless content in the last several bits.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
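
The granularity point in the posts above can be checked with a few lines of C. This is only a sketch: the function names `scale_fixed` and `through_float` are made up for illustration, and the code assumes IEEE 754 single precision for `float`. It contrasts the "32*32:64/32" style of scaling with a round trip through a 32-bit float.

```c
#include <stdint.h>

/* Scale x by num/den with a 64-bit intermediate product: the
   "32*32:64/32" form. Assumes den != 0 and that the quotient
   fits in 32 bits; neither condition is checked here. */
static int32_t scale_fixed(int32_t x, int32_t num, int32_t den)
{
    return (int32_t)(((int64_t)x * num) / den);
}

/* Round-trip a 32-bit code through single precision. Any code that
   needs more than 24 significant bits comes back altered.
   Assumes |x| <= 2^30 so the rounded value converts back safely. */
static int32_t through_float(int32_t x)
{
    volatile float f = (float)x;   /* volatile: force a real 32-bit store */
    return (int32_t)f;
}
```

With these, `through_float(16777217)` (that is, 2^24 + 1) returns 16777216, while `scale_fixed(16777217, 3, 3)` returns the input unchanged, because the 64-bit intermediate keeps every bit.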
In article <87sjgkj0bs.fsf@devereux.me.uk>, john@devereux.me.uk says...
> Anders.Montonen@kapsi.spam.stop.fi.invalid writes:
>
> > John Devereux <john@devereux.me.uk> wrote:
> >
> >> Only actual chip I have heard of is a sigma-delta from TI. Of course
> >> 8-10 of these bits are marketing. I would look it up for you but the
> >> flash selection tool is still "initializing" for me on their site...
> >
> > Off-topic, but as far as I can tell TI are not using Flash in any of
> > their selection tools, only HTML5. Unfortunately their backend sometimes
> > glitches out, usually when you need to look up one of their
> > components.
>
> Oh really? Good for them. I apologise to TI, I admit I was using quite
> an old browser.
>
> In fact it seems to work very well in a slightly more modern one. It is
> one of the few such manufacturer "selection tools" that uses the whole
> width of the browser window. Most are crippled to uselessness by some
> stupid marketeer's desire to exactly control appearance.

Because the marketeer or developer believes everyone has the same system and screen size as them. Then it looks right when printed out on a piece of paper and handed to the board to look at.

Don't even get me started on fonts specified in pixels :)

--
Paul Carpenter | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate
In article <p72dndn_Y4-U2ubSnZ2dnUVZ_r6dnZ2d@web-ster.com>, tim@seemywebsite.com says...
> On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:
>
> > On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> >> I did a fixed point support package for our 8 bit embedded systems
> >> compilers and one interesting metric came out of the project.
> >>
> >> Given a number of bits in a number and similar error checking fixed or
> >> float took very similar amounts of execution time and code size in
> >> applications.
> >>
> >> For example 32 bit float and 32 bit fixed point. They are not exact but
> >> they are close. In the end much to my surprise the choice is dynamic
> >> range or resolution.
> >
> > That makes sense for 8 bit cores, but there is another issue besides
> > speed the OP may need to consider and that is granularity.
> >
> > We had one application where floating point was more convenient, but
> > gave lower precision than a 32*32:64/32 because the float uses 23+1
> > bits to store the number. The other bits are exponent, and give dynamic
> > range, but NOT precision.
> >
> > With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
> > you would need to watch it very carefully.
>
> If you do any filtering at all, the 25 bits of precision often matter
> with a _16_ bit ADC, when they aren't a show-stopper altogether. It
> wouldn't be sensible to even _think_ about filtering the output of a 24-
> bit ADC with single-precision floating point data paths unless the ADC
> had been exceedingly poorly chosen or applied, and had essentially
> useless content in the last several bits.

I agree with your point about filtering with 16-bit ADCs. I generally implement FIRs with about 20 taps, which is easily done with a 16 x 16 -> 32-bit MAC. There's no real advantage to floating point there, and with 16-bit data inputs, dynamic range is not a problem.

I've usually found that getting the full 24 bits from a 24-bit ADC is next to impossible. The CS5534 that I've used comes with a table that lists the effective number of bits vs cycle time. IIRC, you need to go to 7-1/2 conversions per second to get over 20 bits. At 30 or 60 conversions per second, you're down in the 18 bits range. However, the built-in 60Hz rejection is quite helpful for some applications.

Floating point does have its uses though, where dynamic range is high and some of the numbers start out very large, as in chemistry calculations where you may start with constants like 6.022x10^23. 32-bit floating point may not be suitable for exactly counting the hydrogen ions in a beaker of analyte, but it can give you reasonable results within the limits of chemical sensors you might use (such as a pH meter with a 4-digit display).

Mark Borgerson
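
A 20-tap FIR built on a 16 x 16 -> 32-bit MAC, as described above, might look roughly like the following. This is a hedged sketch, not the poster's code: the name `fir_q15`, the Q15 coefficient format, the newest-first sample ordering, and the rounding step are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 20   /* tap count mentioned in the post */

/* One output sample of a direct-form FIR.
   coeff[] : Q15 coefficients, assumed scaled so that the sum of their
             absolute values is <= 32767; that bound keeps both the
             32-bit accumulator and the 16-bit result from overflowing.
   hist[]  : the FIR_TAPS most recent 16-bit samples, newest first.
   Note: >> on a negative value is implementation-defined in C; this
   sketch assumes the usual arithmetic shift. */
static int16_t fir_q15(const int16_t coeff[FIR_TAPS],
                       const int16_t hist[FIR_TAPS])
{
    int32_t acc = 0;
    for (size_t i = 0; i < FIR_TAPS; i++) {
        acc += (int32_t)coeff[i] * hist[i];   /* 16 x 16 -> 32-bit MAC */
    }
    /* Round to nearest, then drop the 15 fractional bits. */
    return (int16_t)((acc + (1 << 14)) >> 15);
}
```

The coefficient bound is what makes a 32-bit accumulator sufficient: 20 unconstrained 16 x 16 products could exceed 32 bits in the worst case, so a filter with larger coefficient sums would need a wider accumulator or a pre-shift.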
Mark Borgerson <mborgerson@comcast.net> writes:

> In article <p72dndn_Y4-U2ubSnZ2dnUVZ_r6dnZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>>
>> On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:
>>
>> > On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> >> I did a fixed point support package for our 8 bit embedded systems
>> >> compilers and one interesting metric came out of the project.
>> >>
>> >> Given a number of bits in a number and similar error checking fixed or
>> >> float took very similar amounts of execution time and code size in
>> >> applications.
>> >>
>> >> For example 32 bit float and 32 bit fixed point. They are not exact but
>> >> they are close. In the end much to my surprise the choice is dynamic
>> >> range or resolution.
>> >
>> > That makes sense for 8 bit cores, but there is another issue besides
>> > speed the OP may need to consider and that is granularity.
>> >
>> > We had one application where floating point was more convenient, but
>> > gave lower precision than a 32*32:64/32 because the float uses 23+1
>> > bits to store the number. The other bits are exponent, and give dynamic
>> > range, but NOT precision.
>> >
>> > With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
>> > you would need to watch it very carefully.
>>
>> If you do any filtering at all, the 25 bits of precision often matter
>> with a _16_ bit ADC, when they aren't a show-stopper altogether. It
>> wouldn't be sensible to even _think_ about filtering the output of a 24-
>> bit ADC with single-precision floating point data paths unless the ADC
>> had been exceedingly poorly chosen or applied, and had essentially
>> useless content in the last several bits.
>
> I agree with your point about filtering with 16-bit ADCs. I generally
> implement FIRs with about 20 taps, which is easily done
> with a 16 x 16 -> 32-bit MAC. There's no real advantage to floating
> point there, and with 16-bit data inputs, dynamic range is not
> a problem.
>
> I've usually found that getting the full 24 bits from a 24-bit ADC is
> next to impossible. The CS5534 that I've used comes with a table that
> lists the effective number of bits vs cycle time. IIRC, you need to go to
> 7-1/2 conversions per second to get over 20 bits. At 30 or 60
> conversions per second, you're down in the 18 bits range. However, the
> built-in 60Hz rejection is quite helpful for some applications.
>
> Floating point does have its uses though, where dynamic range is high
> and some of the numbers start out very large, as in chemistry
> calculations where you may start with constants like 6.022x10^23.
> 32-bit floating point may not be suitable for exactly counting the
> hydrogen ions in a beaker of analyte, but it can give you reasonable
> results within the limits of chemical sensors you might use
> (such as a pH meter with a 4-digit display).

I find it can be nice for generating the final "result" when a complicated formula is involved. Or even when the formula is not that complicated but there is some horrible mixture of units involved: convert everything to floating-point SI units and just do the calculation, instead of carefully scaling everything and checking for loss of precision and overflows at every sub-step.

--
John Devereux
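
The "convert everything to SI and just calculate" approach could be sketched as below. The quantities, units, and the function name `energy_joules` are hypothetical, chosen only to show the pattern; they do not come from the thread.

```c
#include <stdint.h>

/* Hypothetical inputs arrive in mixed integer units. Each one is
   converted to its SI base unit on entry, so the formula itself
   needs no scale factors. */
static float energy_joules(int32_t millivolts,
                           int32_t microamps,
                           uint32_t milliseconds)
{
    float volts   = (float)millivolts   * 1e-3f;   /* mV -> V */
    float amps    = (float)microamps    * 1e-6f;   /* uA -> A */
    float seconds = (float)milliseconds * 1e-3f;   /* ms -> s */
    return volts * amps * seconds;                 /* E = V * I * t */
}
```

For 5000 mV, 2000000 uA and 1000 ms this gives about 10 J. The same product in 32-bit integers would already overflow at 5000 * 2000000, which is the kind of sub-step check the float version avoids, at the cost of roughly seven significant digits in the result.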