EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

floating point calculations.

Started by knightslancer February 10, 2009
On Wed, 11 Feb 2009 10:51:42 +0100, David Brown
<david@westcontrol.removethisbit.com> wrote:

>Paul Keinanen wrote:
>> On Wed, 11 Feb 2009 08:33:54 +0100, David Brown
>> <david@westcontrol.removethisbit.com> wrote:
>>
>> For really small systems a 3 byte format with 8 bit exponent and 16
>> bit mantissa is often enough and easy to implement. Such format give
>> about 4-5 significant digits, which is often enough, when the
>> application is interfacing with the external world with 12-16 bit A/D
>> and D/A converters.
>>
>
>Yes, such extra formats can be very efficient. However, you'll probably
>lose all benefits of having clear and understandable source code (which
>is often the reason to choose floating point in the first place), unless
>your compiler supports such formats directly.

Use a language with operator overloading, such as C++.

Paul
Paul Keinanen wrote:
> On Wed, 11 Feb 2009 10:51:42 +0100, David Brown
> <david@westcontrol.removethisbit.com> wrote:
>
>> Paul Keinanen wrote:
>>> On Wed, 11 Feb 2009 08:33:54 +0100, David Brown
>>> <david@westcontrol.removethisbit.com> wrote:
>>>
>
>>> For really small systems a 3 byte format with 8 bit exponent and 16
>>> bit mantissa is often enough and easy to implement. Such format give
>>> about 4-5 significant digits, which is often enough, when the
>>> application is interfacing with the external world with 12-16 bit A/D
>>> and D/A converters.
>>>
>> Yes, such extra formats can be very efficient. However, you'll probably
>> lose all benefits of having clear and understandable source code (which
>> is often the reason to choose floating point in the first place), unless
>> your compiler supports such formats directly.
>
> Use a language with operator overloading, such as C++.
>
That can help significantly with the syntax, but it can still be hard to get an optimal implementation. Depending on your class (or template) structure and your compiler, you might end up with significant overhead to using the class. Even if you can arrange for an optimal balance between inlining and function calls, you will still not get as efficient an implementation as the compiler's native types, because you don't (for most compilers) have access to the cpu's flags, and the compiler will be unable to do extra optimisation such as pre-calculating values, strength reduction (such as changing a division by a constant into a multiplication), and efficient register allocation.
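
For readers who want to see what a 3 byte format might look like, here is a minimal C sketch. The thread does not specify the bit layout, so this assumes one possibility: the top 24 bits of an IEEE 754 single (1 sign bit, 8 exponent bits, 15 stored fraction bits plus the hidden one, giving a 16 bit significand). It also assumes `float` is a 32 bit IEEE 754 single on the target. The names `float24`, `float24_from_float` and `float24_to_float` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 3-byte float: the upper 24 bits of an IEEE 754 single.
 * Assumption, not from the thread: sign(1) + exponent(8) + fraction(15). */
typedef struct {
    uint8_t bytes[3];
} float24;

/* Pack by dropping the low 8 fraction bits (truncation, no rounding). */
float24 float24_from_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes sizeof(float) == 4 */
    float24 r = { { (uint8_t)(bits >> 24),
                    (uint8_t)(bits >> 16),
                    (uint8_t)(bits >> 8) } };
    return r;
}

/* Unpack by restoring the dropped fraction bits as zeros. */
float float24_to_float(float24 v)
{
    uint32_t bits = ((uint32_t)v.bytes[0] << 24) |
                    ((uint32_t)v.bytes[1] << 16) |
                    ((uint32_t)v.bytes[2] << 8);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A 16 bit significand gives log10(65536), or about 4.8, decimal digits, which matches the "4-5 significant digits" quoted above. Arithmetic on the packed form would still need hand-written routines, which is where the overhead discussed in this post comes in.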
On Tue, 10 Feb 2009 23:05:20 -0800, Jack wrote:

> On 10 Feb, 18:11, Tim Wescott <t...@seemywebsite.com> wrote:
>
>> >> 200 * 5.25.
>>
>> > = 200 * 21 / 4
>>
>> > It's fastest if you can reduce your problem to integer operations.
>> > Please check if you _really_ need floating point operations.
>>
>> 200 * 21
>>
>> Right shift the answer by 2.
>>
>> (remember, this is assembly...)
>
> optimizing compiler should already convert /2^i to >>i. Useless to make
> the code less readable if the compiler optimize in the right manner. ;)
>
> Bye Jack

Hard for it to do so when it isn't invoked. As I said, this is assembly
we're talking about.

(and yes, any halfway decent compiler will turn x / 4 into (x >> 2),
assuming that it is used).

-- 
http://www.wescottdesign.com
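
To make the "200 * 21, right shift by 2" exchange concrete, here is a small C sketch; the function names are made up for illustration. For unsigned operands the division and the shift give the same result. For negative signed operands they generally do not: C division truncates toward zero, while a typical arithmetic shift rounds toward negative infinity, so a compiler has to emit a small correction in the signed case.

```c
#include <stdint.h>

/* x * 5.25 computed as (x * 21) / 4, written with a division.
 * The product wraps modulo 2^32 if x * 21 does not fit in 32 bits. */
uint32_t times_5_25_div(uint32_t x)
{
    return (x * 21u) / 4u;
}

/* Same computation with the division by 4 written as a right shift. */
uint32_t times_5_25_shift(uint32_t x)
{
    return (x * 21u) >> 2;
}
```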
Frank-Christian Krügel wrote:
> knightslancer wrote:
>
... snip ...
>
>> I am trying to do an arithmetic calculations that involve
>> multiplying a 32 bit integer with a floating point numbers.
>> For example:
>>
>> 200 * 5.25.
>
> = 200 * 21 / 4
>
> It's fastest if you can reduce your problem to integer operations.
> Please check if you _really_ need floating point operations.
However, ensure that the multiplication doesn't overflow, and don't
allow rearrangement. I.e., in C, write:

    (200 * 21) / 4

That is after assuring the 200 * 21 won't overflow. You may need to
use longs for the calculation. The parentheses above ensure that the
computation is not rearranged into 200 / 4 * 21. Check the assembly
code. That should work, but the compiler may be faulty. Then you
will have to use:

    thing = 200;   /* done by earlier code */
    ...
    /* code to ensure no overflow */
    temp = thing * 21;
    ans  = temp / 4;

-- 
 [mail]: Chuck F (cbfalconer at maineline dot net)
 [page]: <http://cbfalconer.home.att.net>
            Try the download section.
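
One way to write the "ensure no overflow" step is to widen the intermediate product, sketched below in C99. The helper name `mul_ratio` and its signature are hypothetical and not from the post. It multiplies first in 64 bits, then divides, and reports failure if the final result does not fit back into 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes (value * num) / den using a 64-bit
 * intermediate, so the product of two 32-bit values cannot overflow.
 * Returns false if den is zero or the result does not fit in int32_t. */
bool mul_ratio(int32_t value, int32_t num, int32_t den, int32_t *out)
{
    if (den == 0) {
        return false;
    }
    int64_t temp = (int64_t)value * num;  /* multiply first, in 64 bits */
    int64_t ans = temp / den;             /* then divide (truncates toward zero) */
    if (ans < INT32_MIN || ans > INT32_MAX) {
        return false;
    }
    *out = (int32_t)ans;
    return true;
}
```

With this, `mul_ratio(200, 21, 4, &r)` stores 1050 in `r`. On a 32 bit core the 64 bit multiply costs more than a plain 32 bit one, so it is only worth it when the operands can actually get large.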
Paul Keinanen wrote:
> "knightslancer" <knightslancer@gmail.com> wrote:
>
... snip ...
>
>> How can I write an assembly code for this on ARM7 LPC2292 boards.
>> Please help me out with this. I am a started in assembly language
>> programming.
>
> If the processor does not have a floating point instruction set,
> just convert the integer to the same floating point notation as
> your floating point numbers (whatever notation you have chosen).
> Doing the actual floating point multiplication is just multiplying
> the significands and adding the exponents and correcting the bias.

In general, when using software floating point, you will find that
addition (or subtraction) is the slowest basic operation, due to
the need to find a common 'size' to inflict on both operands.
Division is the next slowest, and multiplication the fastest.

-- 
 [mail]: Chuck F (cbfalconer at maineline dot net)
 [page]: <http://cbfalconer.home.att.net>
            Try the download section.
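
The "multiply the significands, add the exponents, correct the bias" recipe quoted above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a complete routine: it assumes the IEEE 754 single precision layout, assumes both inputs and the result are normal numbers (no zero, denormal, infinity or NaN handling), and truncates instead of rounding. The names `soft_fmul`, `float_to_bits` and `bits_to_float` are made up here.

```c
#include <stdint.h>
#include <string.h>

/* Helpers to move between float and its bit pattern
 * (assumes float is a 32-bit IEEE 754 single). */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Software multiply of two single-precision bit patterns.
 * Sketch only: normal inputs and result assumed, result truncated. */
uint32_t soft_fmul(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;
    int32_t exp_a = (int32_t)((a >> 23) & 0xFFu);
    int32_t exp_b = (int32_t)((b >> 23) & 0xFFu);
    uint32_t sig_a = (a & 0x007FFFFFu) | 0x00800000u; /* restore hidden 1 */
    uint32_t sig_b = (b & 0x007FFFFFu) | 0x00800000u;

    /* Both stored exponents include the bias of 127; remove one copy. */
    int32_t exp = exp_a + exp_b - 127;

    /* 24 x 24 -> 48-bit product, in the range [2^46, 2^48). */
    uint64_t prod = (uint64_t)sig_a * sig_b;

    /* Normalize back to a 24-bit significand. If bit 47 is set the
     * product is >= 2.0, so shift one extra place and bump the exponent. */
    if (prod & (1ULL << 47)) {
        prod >>= 24;
        exp += 1;
    } else {
        prod >>= 23;
    }

    return sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x007FFFFFu);
}
```

For the original example, `bits_to_float(soft_fmul(float_to_bits(200.0f), float_to_bits(5.25f)))` gives 1050.0f. Note that multiplication needs at most a one bit normalization shift, whereas addition has to align the operands first.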
On Wed, 11 Feb 2009 21:47:30 -0500, CBFalconer <cbfalconer@yahoo.com>
wrote:

>Paul Keinanen wrote:
>> "knightslancer" <knightslancer@gmail.com> wrote:
>>
>... snip ...
>>
>>> How can I write an assembly code for this on ARM7 LPC2292 boards.
>>> Please help me out with this. I am a started in assembly language
>>> programming.
>>
>> If the processor does not have a floating point instruction set,
>> just convert the integer to the same floating point notation as
>> your floating point numbers (whatever notation you have chosen).
>> Doing the actual floating point multiplication is just multiplying
>> the significands and adding the exponents and correcting the bias.
>
>In general, when using software floating point, you will find that
>addition (or subtraction) is the slowest basic operation, due to
>the need to find a common 'size' to inflict on both operands.
>Division is the next slowest, and multiplication the fastest.

Floating point addition/subtraction can be quite slow due to the need
to denormalize the smaller operand by shifting it right by up to 24
bits (for a 32 bit single precision float) and to normalize the
sum/difference by shifting it left by up to 24 bits. On an 8 bitter,
an initial bulk shift of 8 or 16 bits can be done with byte moves,
with the remaining 1-7 bit shift done the traditional way.

Floating multiply can be faster, if the processor has a decent
8x8=>16, 16x16=>32 or 32x32=>64 bit single cycle unsigned integer
multiply instruction. With only an 8x8=>16 (and 16x16=>16) bit HW
multiplication instruction, nine such multiplications are needed for
the single precision case.

Even with 8x8 multiply instructions, it might still be more efficient
to do the 24x24=>24 bit mantissa multiplication the traditional way,
by shifts and adds.

Paul
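
The "nine multiplications" figure comes from splitting each 24 bit significand into three bytes and forming every byte-by-byte partial product. A C sketch of that decomposition, with a hypothetical function name, is below. It accumulates into a `uint64_t` for clarity; a real 8 bit implementation would add the partial products into a multi-byte accumulator with explicit carries.

```c
#include <stdint.h>

/* Multiply two 24-bit significands using only 8x8=>16 partial products.
 * Hypothetical helper; returns the full 48-bit product. Only the low
 * 24 bits of each argument are used. */
uint64_t mul24_from_8x8(uint32_t a, uint32_t b)
{
    uint8_t a_bytes[3] = { (uint8_t)a, (uint8_t)(a >> 8), (uint8_t)(a >> 16) };
    uint8_t b_bytes[3] = { (uint8_t)b, (uint8_t)(b >> 8), (uint8_t)(b >> 16) };
    uint64_t product = 0;

    /* 3 x 3 = 9 partial products, each weighted by its byte position. */
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            uint16_t partial = (uint16_t)(a_bytes[i] * b_bytes[j]); /* 8x8=>16 */
            product += (uint64_t)partial << (8 * (i + j));
        }
    }
    return product;
}
```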
On 2009-02-12, CBFalconer <cbfalconer@yahoo.com> wrote:

> In general, when using software floating point, you will find that
> addition (or subtraction) is the slowest basic operation, due to
> the need to find a common 'size' to inflict on both operands.
> Division is the next slowest, and multiplication the fastest.

I've not found that to be true on any of the platforms I've
benchmarked. For example, I timed the four operations on a
6800, and add/sub was about 1ms, and mult/div was about 4ms.

-- 
Grant Edwards                   grante             Yow! Please come home with
                                  at               me ... I have Tylenol!!
                               visi.com

Grant Edwards wrote:

> On 2009-02-12, CBFalconer <cbfalconer@yahoo.com> wrote:
>
>
>>In general, when using software floating point, you will find that
>>addition (or subtraction) is the slowest basic operation, due to
>>the need to find a common 'size' to inflict on both operands.
>>Division is the next slowest, and multiplication the fastest.
>
>
> I've not found that to be true on any of the platforms I've
> benchmarked. For example, I timed the four operations on a
> 6800, and add/sub was about 1ms, and mult/div was about 4ms.

I compared the fixed point math to the emulated floating point on AVR,
HC12, TMS28xx and BlackFin. For the same control algorithms implemented
in C/C++, the floating point variant can be expected to run roughly 15
times slower than the integer one. The float add/sub/mul speeds are in
the same ballpark; division, however, is much slower, roughly 4x to 10x
the cost of the other operations.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
knightslancer wrote:
> Hi Friends,
>
> I have tried a lot about this. I donno if my mind is not good at this.
>
> I am trying to do an arithmetic calculations that involve multiplying
> a 32 bit integer with a floating point numbers. For example:
>
> 200 * 5.25.
>
> How can I write an assembly code for this on ARM7 LPC2292 boards.
> Please help me out with this. I am a started in assembly language
> programming.
>
> Thanks
> knight

You seem to have spawned a bunch of conversations, most of which don't
seem to help.

We have used conversion to scaled integers to accomplish this on
low-end micros. This has limitations in that ideally you need to know
the number of decimal places that you have, or at least the number that
are significant.

Using your example numbers, and a limit of 2 decimals, you multiply the
float by 100 and convert to an integer. 5.25 now becomes 525. Do the
multiplication as shown and you have the answer scaled up by 100
(200 * 525 = 105,000). Then divide it down by the applicable factors,
subtracting intermediate values, to get the real answer.

105,000 / 100 = 1050 (units)
105,000 - 105,000 = 0, so you are done.

123 * 1.23 = ?
123 * (1.23 * 100) = ?
123 * 123 = 15129
15129 / 100 = 151 (units)
15129 - (151*100) = ?
15129 - 15100 = 29 (non zero, so keep going)
29 / 10 = 2 (tenths)
29 - (2*10) = ?
29 - 20 = 9 (non zero, so keep going)
Next division is by 1, so you are done, the 9 is now hundredths.

Now reassemble the pieces.
151 units + 2 tenths + 9 hundredths = 151.29

Scott
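
The scaled-integer steps above can be sketched in C as follows. The type and function names are hypothetical, the scale factor of 100 matches the two decimal places used in the worked example, and the sketch assumes non-negative inputs and a result whose integer part fits in 32 bits. Instead of peeling off tenths and hundredths one at a time, it takes the whole remainder in one step, which gives the same digits.

```c
#include <stdint.h>

/* Hypothetical result type: whole units plus a fraction in hundredths. */
typedef struct {
    int32_t units;
    int32_t hundredths;
} scaled_result;

/* Multiply an integer by a factor that has been pre-scaled by 100
 * (for example, 5.25 is passed as 525). Assumes non-negative inputs. */
scaled_result mul_scaled100(int32_t value, int32_t factor_x100)
{
    int64_t product = (int64_t)value * factor_x100;  /* answer scaled up by 100 */
    scaled_result r;
    r.units = (int32_t)(product / 100);       /* e.g. 15129 / 100 = 151 */
    r.hundredths = (int32_t)(product % 100);  /* e.g. 15129 - 15100 = 29 */
    return r;
}
```

`mul_scaled100(123, 123)` yields 151 units and 29 hundredths, i.e. 151.29, and `mul_scaled100(200, 525)` yields 1050 units and 0 hundredths.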
> I compared the fixed point math to the emulated floating point on AVR,
> HC12, TMS28xx and BlackFin. For the same control algorithms implemented
> in C/C++, the floating point variant can be expected somewhat 15 times
> slower then the integer. The float add/sub/mul speed is in the same
> ballpark, however the division is much slower, being somewhat x4..x10 of
> the other operations.

We implemented a fixed point library a couple of years ago and were
surprised to find that, for transcendental functions at the same data
size (4 byte float and 8:24 fixed, for example), the execution times
were remarkably similar.

From an application point of view, fixed point gave increased precision
and floating point gave larger dynamic range.

As several people have pointed out, the biggest time issue is
normalization on processors that don't have a barrel shifter.

Regards,

-- 
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
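
As an illustration of the 8:24 format mentioned above, here is a minimal C sketch of a fixed point multiply. The post gives no details of the library, so this assumes a signed 32 bit value with 8 integer bits (including sign) and 24 fraction bits. The names `q8_24`, `Q8_24_ONE` and `q8_24_mul` are made up, and overflow of the result is not checked.

```c
#include <stdint.h>

/* Hypothetical 8:24 fixed-point type: value = raw / 2^24.
 * Assumed signed, so the range is roughly -128.0 to +127.99999994. */
typedef int32_t q8_24;

#define Q8_24_ONE ((q8_24)1 << 24)

/* Multiply two 8:24 values. The 64-bit intermediate holds a 16:48
 * result; dividing by 2^24 brings it back to 8:24 (truncating toward
 * zero). No overflow check: the caller must keep results in range. */
q8_24 q8_24_mul(q8_24 a, q8_24 b)
{
    int64_t wide = (int64_t)a * b;
    return (q8_24)(wide / ((int64_t)1 << 24));
}
```

The trade-off described in this post shows up directly: 24 fraction bits resolve steps of about 6e-8 everywhere in the range, but the original example's 200 does not even fit in the 8 integer bits, whereas a 4 byte float handles it easily at the cost of fewer significant bits.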
