EmbeddedRelated.com

Damned floating point

Started by Tim Wescott February 26, 2016
Are a & b & c identical both times? Maybe rounding 1/2 up half the time
and 1/2 down the other half...

Hul

Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Fri, 26 Feb 2016 13:24:02 -0600, Robert Wessel wrote:
> > On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
> > <seemywebsite@myfooter.really> wrote:
> >
> >>I assume the answer is "yes", but is it normal behavior in an x86
> >>processor to get slightly different answers when you add up the same
> >>short vector of doubles at different times?
> >>
> >>(Very slightly different -- basically so that sum1 == sum2 isn't true,
> >>but not noticeable in any other way.)
> >>
> >>I assume this has something to do with how the floating-point processor
> >>state is treated when a calculation is done. Does anyone know the
> >>details?
> >
> > Floating point addition is not associative. Never has been, never will
> > be. IOW, (a+b)+c is *not* the same as a+(b+c).
> >
> > In short, it's a problem with the rounding of intermediate results.
> > Consider a three-digit decimal FP system, and you add 100, -100 and .01.
> > If you compute (100 + -100) + .01, you'll get .01. If you compute
> > (100 + .01) + -100, you get 0, since (100 + .01) rounds back to 100.
> > It's actually fairly easy to get into a situation (like the one I just
> > illustrated) where the result is *entirely* noise. People doing long
> > sums (matrix multiplication, for example) often take considerable pains
> > to minimize the accumulated error, including carefully controlling the
> > order in which things are added.
> >
> > In general, there are few cases where two FP numbers can be meaningfully
> > compared for equality. Usually the correct approach is to do something
> > like "fabs(a-b) < epsilon", although picking epsilon can be a bit
> > tricky.
> >
> > "What Every Computer Scientist Should Know About Floating-Point
> > Arithmetic":
> >
> > https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
> In this case, though, I'm computing score = a + b + c. Strictly in that
> order. Twice, with other floating point calculations in between. And
> getting different answers.
> --
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com
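The order-dependence of rounding described in the quoted post is easy to reproduce. A minimal sketch, using Python only because its floats are IEEE-754 binary64 ("double"), the same format the thread is about:

```python
# At 2^53, adjacent doubles differ by 2, so adding 1.0 rounds away.
big = 2.0 ** 53

# The grouping of the same three operands decides the answer:
left = (big + 1.0) - big    # the 1.0 is absorbed by rounding, then cancelled
right = (big - big) + 1.0   # cancellation happens first, so the 1.0 survives
assert left == 0.0
assert right == 1.0

# Even mundane values give different results under different groupings:
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```

This is the binary analogue of the three-digit decimal example: 100 + .01 rounding back to 100 is the same effect as 2^53 + 1.0 rounding back to 2^53.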
On 2016-02-26, George Neuner <gneuner2@comcast.net> wrote:
> On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
> <seemywebsite@myfooter.really> wrote:
>
>>I assume the answer is "yes", but is it normal behavior in an x86
>>processor to get slightly different answers when you add up the same
>>short vector of doubles at different times?
>
> In addition to Robert Wessel's excellent response:
>
> If the operations are performed in the same order every time, then the
> results *could* be the same. However ...
>
> The FPU registers carry extra bits internally which are lost (zeroed)
> when the values are stored to memory and then reloaded. This happens,
> e.g., when the compiler runs out of registers and needs to spill
> temporaries.
>
> It also can happen during a context switch if you multitask, although
> this is less predictable because registers can be spilled lazily and
> (from your POV) non-deterministically depending on the states of the
> other processes using the FPU.
Another excellent answer.
> In general, if you want to guarantee consistent results,
In general, if you want to do this, it means you're doing something wrong.
> you need to control when and where intermediate results are stored > to memory.
And it is not easy to do that...
> the most extreme cases that could mean storing the result of every > individual operation.
Which is going to involve never doing more than one operation per
statement, declaring all sorts of stuff volatile, and/or putting in memory
barriers between all the statements. It'll get ugly fast, and you'll never
really know it's going to continue to work if the compiler or surrounding
code changes.

To address the context-switch problem you might have to disable interrupts
or lock the scheduler -- unless you know for a fact that your OS preserves
guard bits during FPU context switches.

Or, you can do your floating point in assembly (again with interrupts
disabled... yadda yadda yadda).

--
Grant Edwards               grant.b.edwards        Yow! Am I SHOPLIFTING?
                                  at
                              gmail.com
On Fri, 26 Feb 2016 13:42:52 -0600, Tim Wescott
<seemywebsite@myfooter.really> wrote:

>On Fri, 26 Feb 2016 13:24:02 -0600, Robert Wessel wrote:
>
>> On Fri, 26 Feb 2016 12:58:07 -0600, Tim Wescott
>> <seemywebsite@myfooter.really> wrote:
>>
>>>I assume the answer is "yes", but is it normal behavior in an x86
>>>processor to get slightly different answers when you add up the same
>>>short vector of doubles at different times?
>>>
>>>(Very slightly different -- basically so that sum1 == sum2 isn't true,
>>>but not noticeable in any other way.)
>>>
>>>I assume this has something to do with how the floating-point processor
>>>state is treated when a calculation is done. Does anyone know the
>>>details?
>>
>> Floating point addition is not associative. Never has been, never will
>> be. IOW, (a+b)+c is *not* the same as a+(b+c).
>>
>> In short, it's a problem with the rounding of intermediate results.
>> Consider a three-digit decimal FP system, and you add 100, -100 and .01.
>> If you compute (100 + -100) + .01, you'll get .01. If you compute
>> (100 + .01) + -100, you get 0, since (100 + .01) rounds back to 100.
>> It's actually fairly easy to get into a situation (like the one I just
>> illustrated) where the result is *entirely* noise. People doing long
>> sums (matrix multiplication, for example) often take considerable pains
>> to minimize the accumulated error, including carefully controlling the
>> order in which things are added.
>>
>> In general, there are few cases where two FP numbers can be meaningfully
>> compared for equality. Usually the correct approach is to do something
>> like "fabs(a-b) < epsilon", although picking epsilon can be a bit
>> tricky.
>>
>> "What Every Computer Scientist Should Know About Floating-Point
>> Arithmetic":
>>
>> https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
>
>In this case, though, I'm computing score = a + b + c. Strictly in that
>order. Twice, with other floating point calculations in between. And
>getting different answers.
If you're doing this on the x87 FPU, Mark is right, but his explanation
is a bit wrong. Normally the x87 does everything with 80-bit temp reals,
and that can lead to odd results if in some cases the compiler decides to
store a value back to memory (in which case it'll use the defined format,
64 bits for a double, and then you'll lose some precision that might have
been there the first time it was used). So if you have:

   a = ...;
   b = ...;
   c = ...;
   score = a + b + c;
   /* ...other code... */
   score2 = a + b + c;

the first sum might be using values of a, b, and/or c still in registers
from the calculations that produced them, in the 80-bit format, but by the
time you get to the second sum those values may be gone, and the code will
reload the saved values of a/b/c from memory, which will have been stored
rounded to 64-bit format.

There are ways to set the precision of the FPU (so instead of 64 bits of
precision you get 53 or 24), but compilers often don't do that (MSVC gives
you _control87() so you can do it yourself). Your compiler might also have
a "strict IEEE math" option, which will force stores and (re-)loads to
avoid the issue (IOW, everything gets forced to work in 64-bit format);
you might be able to set that ("/fp:strict" in MSVC). If your compiler has
an option to do FP with SSE2, that will also eliminate the variable-
precision problem (and that should be true of all compilers generating
x86-64 code).

Also note that the two sums do *not* have to be performed in the same
order, and that's actually at least somewhat common in practice if one of
the values is still in a register. For example, if c happened to be used
in "...other code...", that register might still be live, and the second
sum might just add a and b (from memory) to the c still in the register,
whereas the first sum might have added the three values in the "normal"
order.
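The spill-and-reload effect can't be shown directly from a high level language, but it can be mimicked one level down: rounding doubles to 32-bit floats plays the same role as the x87 rounding 80-bit register values to 64-bit memory slots. A hedged sketch (the `spill` helper is purely illustrative, not a real compiler mechanism):

```python
import struct

def spill(x: float) -> float:
    """Round a double through a 32-bit float slot and back -- a stand-in
    for an 80-bit x87 register value being spilled to a 64-bit double."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Values that are not exactly representable, like most real-world data.
a, b, c = 1.0 / 3.0, 2.0 / 7.0, 1.0 / 11.0

# First sum: all operands still "in registers" at full precision.
score1 = a + b + c

# Second sum: operands were "spilled" to memory and reloaded, losing bits.
score2 = spill(a) + spill(b) + spill(c)

assert score1 != score2  # the same expression yields a different answer
```

The magnitudes differ (29 bits lost here versus 11 bits in the 80-to-64 case), but the mechanism is the same: whether a value went through memory between the two evaluations decides the result.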
On 26.2.2016 20:58, Tim Wescott wrote:
> I assume the answer is "yes", but is it normal behavior in an x86
> processor to get slightly different answers when you add up the same
> short vector of doubles at different times?
>
> (Very slightly different -- basically so that sum1 == sum2 isn't true,
> but not noticeable in any other way.)
>
> I assume this has something to do with how the floating-point processor
> state is treated when a calculation is done. Does anyone know the
> details?
>
> TIA.
Hi Tim,

you already got all these "excellent answers", but I am not sure we know
how you are testing this. If you do it from a high level language, then we
don't know whether this is an FPU rounding issue or a compiler
shortcoming. Basically, you want to go down to the lowest level to see
what's up (obviously I know you don't need to be told that). I don't know
the x86, but it must also have an FPSCR (floating point status and control
register) which gives you control over rounding modes etc.

Generally, in an isolated environment (i.e. no task switches which might
fail to save/restore all of the FPU context, or might be unable to if the
FPU design won't allow it, say the guard bits - I have not seen that but I
would not be too surprised to see it) you have to get the same result, all
other things being the same. Key being "all other"...

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
On Fri, 26 Feb 2016 17:52:27 -0600, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>If you're doing this on the x87 FPU, Mark is right, but his
>explanation is a bit wrong. Normally the x87 does everything with 80
>bit temp reals, and it can lead to odd results if in some cases the
>compiler decides to store a value back to memory (in which case it'll
>use the defined format, 64 bits for a double, and then you'll lose
>some precision that might have been there the first time it was used).
The x87 floating point stack is small, so you may run out of stack
registers with a complex arithmetic expression. If part of the expression
is calculated in a floating point function, you can easily run out of
stack registers. If the function is declared locally, the compiler will
know whether the whole calculation can be done in the FP stack. However,
if the function is defined externally, the compiler doesn't know its stack
usage, so it may need to save FP stack registers across the call.

Floating point processors also have various rounding/truncation modes set
by some flags. If these bits are altered between calculations, the result
might be different. The MMX multimedia instructions use the same FP
registers, so the float values need to be saved during the MMX
calculations, and the compiler may fail to restore the original
rounding/truncation mode during the register restore.
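The point about rounding-mode flags changing results can be illustrated without touching hardware control registers. Binary FP rounding modes aren't settable from portable Python, so this sketch uses the stdlib `decimal` module's context (an assumption-level analogue, not the FPU mechanism itself), recreating the three-digit decimal system discussed earlier in the thread:

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN, ROUND_CEILING

def add(x: str, y: str, rounding) -> Decimal:
    # One addition at 3 significant digits under a chosen rounding mode,
    # mirroring a 3-digit decimal FP unit with a mode flag.
    with localcontext() as ctx:
        ctx.prec = 3
        ctx.rounding = rounding
        return Decimal(x) + Decimal(y)

# Identical operands; only the rounding-mode "flag" differs:
assert add("100", "0.01", ROUND_HALF_EVEN) == Decimal("100")
assert add("100", "0.01", ROUND_CEILING) == Decimal("101")
```

If some code (a library, a restore path after MMX use, another thread) leaves the mode flag in a different state between two evaluations, the same source expression legitimately produces two different answers.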
On Sat, 27 Feb 2016 04:41:36 +0200, Dimiter_Popoff wrote:

> On 26.2.2016 20:58, Tim Wescott wrote:
>> I assume the answer is "yes", but is it normal behavior in an x86
>> processor to get slightly different answers when you add up the same
>> short vector of doubles at different times?
>>
>> (Very slightly different -- basically so that sum1 == sum2 isn't true,
>> but not noticeable in any other way.)
>>
>> I assume this has something to do with how the floating-point processor
>> state is treated when a calculation is done. Does anyone know the
>> details?
>>
>> TIA.
>
> Hi Tim,
> you already got all these "excellent answers" but I am not sure we know
> how you are testing this.
> If you do it using high level then we don't know if this is an FPU
> rounding issue or a compiler shortcoming.
> Basically, you want to go down to the lowest level to see what's up,
> (obviously I know you don't need to be told that). I don't know the x86,
> but it must also have that FPSCR (floating point status and control
> register) which gives you control over rounding modes etc.
>
> Generally in an isolated environment (i.e. no task switches which might
> fail to save/restore all of the FPU context, or might be unable to if
> the FPU design won't allow it, say the guard bits - I have not seen that
> but I would not be too surprised to see it) you have to get the same
> result all other things being same. Key being "all other"...
It's on a PC, so I'm pretty much going to treat it as an appliance, say
"oh well, I can't compare for equality in this instance", and move on.

I'm just curious about what might be going on under the hood.

When I revamped the code I actually ended up making it tighter, which was
a happy happenstance.

--
www.wescottdesign.com
On 27.2.2016 08:06, Tim Wescott wrote:
> On Sat, 27 Feb 2016 04:41:36 +0200, Dimiter_Popoff wrote:
>
>> On 26.2.2016 20:58, Tim Wescott wrote:
>>> I assume the answer is "yes", but is it normal behavior in an x86
>>> processor to get slightly different answers when you add up the same
>>> short vector of doubles at different times?
>>>
>>> (Very slightly different -- basically so that sum1 == sum2 isn't true,
>>> but not noticeable in any other way.)
>>>
>>> I assume this has something to do with how the floating-point processor
>>> state is treated when a calculation is done. Does anyone know the
>>> details?
>>>
>>> TIA.
>>
>> Hi Tim,
>> you already got all these "excellent answers" but I am not sure we know
>> how you are testing this.
>> If you do it using high level then we don't know if this is an FPU
>> rounding issue or a compiler shortcoming.
>> Basically, you want to go down to the lowest level to see what's up,
>> (obviously I know you don't need to be told that). I don't know the x86,
>> but it must also have that FPSCR (floating point status and control
>> register) which gives you control over rounding modes etc.
>>
>> Generally in an isolated environment (i.e. no task switches which might
>> fail to save/restore all of the FPU context, or might be unable to if
>> the FPU design won't allow it, say the guard bits - I have not seen that
>> but I would not be too surprised to see it) you have to get the same
>> result all other things being same. Key being "all other"...
>
> It's on a PC, so I'm pretty much going to treat it as an appliance, say
> "oh well, I can't compare for equality in this instance", and move on.
Oh, it is the right thing to do anyway: instead of comparing FP values
for equality, the practical thing is to compare the difference between
the two values against some threshold of your choice.
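A minimal sketch of that threshold comparison (again in Python, whose floats are IEEE-754 doubles); `math.isclose` is the stdlib helper that packages the same idea with both relative and absolute tolerances:

```python
import math

sum1 = (0.1 + 0.2) + 0.3
sum2 = 0.1 + (0.2 + 0.3)

# Exact equality fails even though both results are "0.6":
assert sum1 != sum2

# The absolute-difference test suggested in the thread:
assert abs(sum1 - sum2) < 1e-9

# math.isclose adds a relative tolerance, so one call works across
# magnitudes instead of hand-picking an epsilon per use site:
assert math.isclose(sum1, sum2, rel_tol=1e-9)
```

The relative form matters because a fixed epsilon like 1e-9 is far too strict for values near 1e12 and far too loose for values near 1e-12.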
> > I'm just curious about what might be going on under the hood. >
I am not sure there is a practical way to find out, as things stand; it
just won't be worth the effort.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
On Fri, 26 Feb 2016 23:38:17 +0000 (UTC), Grant Edwards
<invalid@invalid.invalid> wrote:

>To address the context-switch problem you might have to disable
>interrupts or lock the scheduler -- unless you know for a fact that
>your OS preserves guard bits during FPU context switches.
OK, I'm baffled. At least three people have mentioned failure to store
guard bits as a possible source for this error. Unless there's a *very*
unconventional usage equating the extra bits of precision in the 80-bit
format used by default in the x87 to "guard bits", it's just not so.

Guard bits are used within a single FP operation to prevent loss of
precision when the two operands need to be scaled. These are never
visible in the ISA, and never persist between operations (except in an
indirect way, in that an operation may have a more precise result than it
would otherwise).

Context switches are also non-issues. You can save the state of the
entire x87 FPU, and restore it later. That stores the contents of all 80
bits of each FP register. Now, it's theoretically possible that an OS
might manually step through the individual registers and store only (say)
64-bit values for each, but I've never heard of such a thing - not only
would it be a lot more complex, it would likely be quite a bit slower
too, especially when the context would need to be restored.
On Fri, 26 Feb 2016 23:38:17 +0000 (UTC), Grant Edwards
<invalid@invalid.invalid> wrote:

>On 2016-02-26, George Neuner <gneuner2@comcast.net> wrote:
>
>> In general, if you want to guarantee consistent [FP] results,
>
>In general, if you want to do this, it means you're doing something
>wrong.
In general, yes. However, there are applications that really do need
floating point, high speed, and reproducible results. As you noted,
getting consistent results from the hardware is not easy. But fixed point
using the ALU is not always a viable option, and almost any use of
software FP will be slower than even the slowest FPU code.

Case in point: some years ago I worked on a system for rendering MRI
data. At that time, MRI "pixels" were 32-bit floats - now they are 64-bit
doubles. There are differing levels of USFDA certification for tools ...
not all require reproducible results ... but receiving certification for
diagnostic/surgical use does require that identical inputs produce
identical outputs. And because MRI sometimes is used _during_ surgery,
there is a medical need to produce pictures as quickly as possible.

George
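For long sums where both accuracy and order-insensitivity matter, one standard technique (not claimed to be what the MRI system used) is compensated summation. A sketch of Neumaier's variant, checked against the stdlib's exactly rounded `math.fsum`:

```python
import math

def neumaier_sum(values):
    """Compensated (Neumaier) summation: capture the rounding error of
    each addition in a separate term and fold it back in at the end."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x   # low-order bits of x were lost
        else:
            comp += (x - t) + total   # low-order bits of total were lost
        total = t
    return total + comp

vals = [2.0 ** 53, 1.0, -2.0 ** 53]
assert sum(vals) != 1.0          # naive left-to-right sum loses the 1.0
assert neumaier_sum(vals) == 1.0
assert math.fsum(vals) == 1.0    # stdlib exactly rounded reference
```

The compensation term recovers exactly the bits the naive sum throws away, so the result is far less sensitive to the order in which the terms arrive.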
On Sat, 27 Feb 2016 02:12:52 -0600, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>OK, I'm baffled. At least three people have mentioned failure to
>store guard bits as a possible source for this error. Unless there's
>a *very* unconventional usage equating the extra bits of precision in
>the 80-bit format used by default in the x87 to "guard bits", it's
>just not so.
When the compiler needs to spill what it sees to be a "double" value, it
will save only 64 bits even when the x87 is in (default) extended mode.
Unless you deliberately set the x87 into double mode, you don't just lose
a few guard bits, you lose 11 bits of actual value when the 64-bit
extended mantissa is truncated to 53 bits for storage.

Not all x86 compilers recognize "long double" as an 80-bit value; some
treat it as a synonym for "double". And AFAIK, every x86-64 compiler
treats "long double" = "double" ... they don't use the x87 at all but
rather use XMM instead.
>Guard bits are used within a single FP operation to prevent loss of
>precision when the two operands need to be scaled. These are never
>visible in the ISA, and never persist between operations (except in an
>indirect way in that an operation may have a more precise result than
>it would otherwise).
The extra ~3 bits of guard precision are available as long as the value remains in an FPU register.
>Context switches are also non-issues. You can save the state of the
>entire x87 FPU, and restore it later. That stores the contents of all
>80 bits of each FP register. Now it's theoretically possible that an
>OS might manually step through the individual registers, and store
>only (say) 64-bit values for each, but I've never heard of such a
>thing - not only would it be a lot more complex, it would likely be
>quite a bit slower too, especially when the context would need to be
>restored.
Both Windows and Linux save XMM registers lazily - but they can do this
because there are [many] hidden rename registers. As long as the CPU
doesn't run out of rename registers, it can preserve XMM values across a
context switch without spilling them to memory. This goes for ALU values
also. On x86-64, at least theoretically, it is possible to context switch
without spilling any values at all. In practice it is CPU model and code
dependent [and likely very hard to achieve in any case].

I don't know whether x87 state is similarly protected. I don't know
whether the early Pentiums ever did FPU renaming, and since the
Pentium 4 (SSE2), x87 use has been highly discouraged, so I tend to doubt
that there has been much (or any) enhancement since then.

George