
Damned floating point

Started by Tim Wescott February 26, 2016
I assume the answer is "yes", but is it normal behavior in an x86 
processor to get slightly different answers when you add up the same 
short vector of doubles at different times?

(Very slightly different -- basically so that sum1 == sum2 isn't true, 
but not noticeable in any other way.)

I assume this has something to do with how the floating-point processor 
state is treated when a calculation is done.  Does anyone know the 
details?

TIA.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
<seemywebsite@myfooter.really> wrote:

>I assume the answer is "yes", but is it normal behavior in an x86
>processor to get slightly different answers when you add up the same
>short vector of doubles at different times?
>
>(Very slightly different -- basically so that sum1 == sum2 isn't true,
>but not noticeable in any other way.)
>
>I assume this has something to do with how the floating-point processor
>state is treated when a calculation is done.  Does anyone know the
>details?
Floating point addition is not commutative.  Never has been, never
will be.  IOW, (a+b)+c is *not* the same as a+(b+c).

In short it's a problem with the rounding of intermediate results.
Consider a three digit decimal FP system, and you add 100, -100 and
.01.  If you compute (100 + -100) + .01, you'll get .01.  If you
compute (100 + .01) + -100, you get 0, since the (100 + .01) rounds
back to 100.  It's actually fairly easy to get into a situation (like
the one I just illustrated) where the result is *entirely* noise.
People doing long sums (matrix multiplication, for example) often take
considerable pains to minimize the accumulated error, including
carefully controlling the order in which things are added.

In general, there are few cases where two FP numbers can be
meaningfully compared for equality.  Usually the correct approach is
to do something like "fabs(a-b) < epsilon", although picking epsilon
can be a bit tricky.

"What Every Computer Scientist Should Know About Floating-Point
Arithmetic":

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
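(A minimal C sketch of the same effect on binary doubles -- the values
1e16, -1e16, and 1.0 stand in for the 100, -100, and .01 of the decimal
example above.  It assumes each operation is rounded to 64-bit double,
as with SSE math on x86-64:

    #include <stdio.h>

    int main(void)
    {
        double a = 1e16, b = -1e16, c = 1.0;

        double left  = (a + b) + c;  /* 0.0 + 1.0 == 1.0 */
        double right = a + (b + c);  /* -1e16 + 1.0 rounds back to
                                        -1e16, so this is 0.0 */

        printf("left = %g, right = %g, equal: %s\n",
               left, right, left == right ? "yes" : "no");
        return 0;
    }

On an x87 build the 80-bit registers may hide the difference, which is
exactly the kind of state-dependence discussed later in this thread.)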
On Fri, 26 Feb 2016 13:24:02 -0600, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
><seemywebsite@myfooter.really> wrote:
>
>>I assume the answer is "yes", but is it normal behavior in an x86
>>processor to get slightly different answers when you add up the same
>>short vector of doubles at different times?
>
>Floating point addition is not commutative.  Never has been, never
>will be.  IOW, (a+b)+c is *not* the same as a+(b+c).
>
>[snip]
Can't believe I wrote commutative... Argh.  FP addition does not
follow the usual rules of associativity.  OTOH, there have been
machines where elementary FP operations were not commutative either
(Cray-1 FP multiplication, for example, where a*b often didn't equal
b*a).
On Fri, 26 Feb 2016 13:24:02 -0600, Robert Wessel wrote:

> On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
> <seemywebsite@myfooter.really> wrote:
>
>>I assume the answer is "yes", but is it normal behavior in an x86
>>processor to get slightly different answers when you add up the same
>>short vector of doubles at different times?
>
> Floating point addition is not commutative.  Never has been, never will
> be.  IOW, (a+b)+c is *not* the same as a+(b+c).
>
> [snip]
In this case, though, I'm computing score = a + b + c.  Strictly in
that order.  Twice, with other floating point calculations in between.
And getting different answers.

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
In article <UJednYstNY8hN03LnZ2dnUU7-YudnZ2d@giganews.com>,
 Tim Wescott <seemywebsite@myfooter.really> wrote:

> On Fri, 26 Feb 2016 13:24:02 -0600, Robert Wessel wrote:
>
> > [snip]
> >
> > In general, there are few cases where two FP numbers can be meaningfully
> > compared for equality.  Usually the correct approach is to do something
> > like "fabs(a-b) < epsilon", although picking epsilon can be a bit
> > tricky.
>
> In this case, though, I'm computing score = a + b + c.  Strictly in that
> order.  Twice, with other floating point calculations in between.  And
> getting different answers.
That's not at all unusual.  There are extra guard bits in the FPU, and
if an intermediate value at any time gets put in a CPU register or
memory it can affect the rounding.  I'll second the suggestion to read
the docs.oracle.com link.  (And never* test floating point values for
equality.)

(*for limited definitions of never)
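(One concrete shape for the "fabs(a-b) < epsilon" test that Robert and
Mark both point at.  The helper name and the 8-ulp slack factor here
are arbitrary illustrative choices, not anything from the thread:

    #include <math.h>
    #include <float.h>
    #include <stdbool.h>

    static bool nearly_equal(double a, double b)
    {
        double diff  = fabs(a - b);
        double scale = fmax(fabs(a), fabs(b));

        /* Relative test for large values, absolute test near zero;
           the factor of 8 ulps is an assumed, tunable slack. */
        return diff <= 8.0 * DBL_EPSILON * fmax(scale, 1.0);
    }

Picking the tolerance is the tricky part, as noted above: it should
reflect how much rounding error the preceding computation can actually
accumulate.)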
On Fri, 26 Feb 2016 13:42:52 -0600, Tim Wescott wrote:

> On Fri, 26 Feb 2016 13:24:02 -0600, Robert Wessel wrote:
>
>> [snip]
>
> In this case, though, I'm computing score = a + b + c.  Strictly in that
> order.  Twice, with other floating point calculations in between.  And
> getting different answers.
Make sure you are not mixing floats and doubles.  If you are casting
ints to floats, be careful with your cast ordering.  Are you changing
a, b, or c between the two calculations?

There was a test we used to run on Prime Computers that went something
like:

    int i;
    float f1 = 1.0;
    for (i = 1; i < 1000; i++) {
        f1 = (f1 * 100.0) / 100.0;
        if (f1 != 1.0) {
            printf("fail at %d\n", i);
            break;
        }
    }

It would fail after a few hundred loops; f1 would be something like
.99999998 or such.  Modern CPUs are much better, but a floating point
compare can still fail due to rounding errors.  Search on "floating
point comparison" for some examples.

-- 
Chisolm
Republic of Texas
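(Joe's "don't mix floats and doubles" warning can be shown in a few
lines -- a sketch of the trap only, with arbitrary variable names:

    #include <stdio.h>

    int main(void)
    {
        float  f = 0.1f;   /* 0.1 rounded to a 24-bit significand */
        double d = 0.1;    /* 0.1 rounded to a 53-bit significand */

        /* f promotes to double for the comparison, but it carries
           float's larger rounding error, so the two differ. */
        printf("%s\n", (double)f == d ? "equal" : "not equal");
        return 0;
    }

The two roundings of 0.1 diverge after the first eight or so
significant digits, which is plenty to break an == test.)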
On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
<seemywebsite@myfooter.really> wrote:

>I assume the answer is "yes", but is it normal behavior in an x86
>processor to get slightly different answers when you add up the same
>short vector of doubles at different times?
In addition to Robert Wessel's excellent response:

If the operations are performed in the same order every time, then the
results *could* be the same.  However ...

The FPU registers carry extra bits internally which are lost (zeroed)
when the values are stored to memory and then reloaded.  This happens,
e.g., when the compiler runs out of registers and needs to spill
temporaries.  It also can happen during a context switch if you
multitask, although this is less predictable because registers can be
spilled lazily and (from your POV) non-deterministically depending on
the states of the other processes using the FPU.

In general, if you want to guarantee consistent results, you need to
control when and where intermediate results are stored to memory.  In
the most extreme cases that could mean storing the result of every
individual operation.

George
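(A sketch of George's "store every intermediate" idea, using volatile
to force each partial sum out of the FPU's registers and back to a
rounded 64-bit value -- the function name is made up for illustration:

    double sum3_stored(double a, double b, double c)
    {
        volatile double t = a + b;  /* forced to memory, rounded
                                       to double precision */
        t = t + c;                  /* same again */
        return t;
    }

This mainly matters on x87 builds with 80-bit registers; it trades
speed for reproducibility, which is the right trade only when you
genuinely need bit-identical results.)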
Tim Wescott <seemywebsite@myfooter.really> writes:
> is it normal behavior in an x86 processor to get slightly different
> answers when you add up the same short vector of doubles at different
> times?
No, not if you add up the same numbers in the same order with the same
code.  Can you post some sample code?

If you're using gcc and the summations are in different parts of the
code, did you try -ffloat-store?  That can get rid of inconsistencies
that can result from doing some computations in the 80-bit FP registers
and others in 64-bit machine words.  But it can slow down the code.

Can you refactor the summation to a non-inlined function (call the same
function from both places) and see what happens?  Something like

    noinline double sum(int n, double vec[]) {
        ...
    }

The noinline should make sure it's the same code called from both
places.
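(A fleshed-out version of Paul's sketch, under the assumption that
"noinline" is shorthand for GCC's function attribute:

    __attribute__((noinline))
    double sum(int n, const double vec[])
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += vec[i];
        return s;
    }

Since both call sites then run identical machine code over the same
data, any remaining difference would have to come from FPU state
rather than from instruction selection.)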
On Fri, 26 Feb 2016 13:57:50 -0600, Mark Storkamp wrote:

> In article <UJednYstNY8hN03LnZ2dnUU7-YudnZ2d@giganews.com>,
> Tim Wescott <seemywebsite@myfooter.really> wrote:
>
>> [snip]
>>
>> In this case, though, I'm computing score = a + b + c.  Strictly in
>> that order.  Twice, with other floating point calculations in between.
>> And getting different answers.
>
> That's not at all unusual.  There are extra guard bits in the FPU, and
> if an intermediate value at any time gets put in a CPU register or
> memory it can affect the rounding.  I'll second the suggestion to read
> the docs.oracle.com link.  (And never* test floating point values for
> equality.)  (*for limited definitions of never)
Until now I had only ever done it to test if a value had been copied
(i.e., a = b).  It seemed safe...

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On Fri, 26 Feb 2016 13:56:19 -0800, Paul Rubin wrote:

> Tim Wescott <seemywebsite@myfooter.really> writes:
>> is it normal behavior in an x86 processor to get slightly different
>> answers when you add up the same short vector of doubles at different
>> times?
>
> No, not if you add up the same numbers in the same order with the same
> code.  Can you post some sample code?
>
> [snip]
>
> Can you refactor the summation to a non-inlined function (call the same
> function from both places) and see what happens?
I appreciate what you're trying to say, but it's too much work -- I
found a way to do the comparison the "right" way (fabs(a - b) <
tolerance) and still meet my other goals, and that's what I'm running
with.

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com