How would my SoC calculate an integer square root?

Hello everyone,

I am working with a Nordic nRF52 SoC.  The CPU subsystem of the nRF52 is an ARM Cortex-M4.  My development system is a Linux box running GCC.  My embedded systems toolchain is gcc-arm-none-eabi-4_9-2015q1, as recommended by Nordic.

I have written a fair amount of code for this device, but now I have to explore a bit deeper.  I have an algorithm that may require me to calculate a lot of square roots, and quickly.  However, the numbers in question are all small integers.

Even though the ARM Cortex-M4 has an FPU, it shouldn't need to get involved in this square root calculation.  Optimized algorithms for various math functions exist.  A nice discussion of the optimization of the integer square root algorithm, written by Dr. Jack Crenshaw, is found here:

http://www.embedded.com/electronics-blogs/programmer-s-toolbox/4219659/Integer-Square-Roots

I am interested in understanding exactly what code will be compiled if I #include <math.h> in my program, and then call sqrt() with an int32 argument.  It stands to reason that optimized code, like that shown in the blog above, is what should be used.  How can I determine this?  

I am poking around in the gcc-arm-none-eabi-4_9-2015q1 folder for clues.  The only math-related file that I can find there is called tgmath.h, and it just #defines aliases which point back to math.h.  I am looking for source code.

Thanks for any advice you can share.

Reply by Clifford Heath ●January 4, 20172017-01-04

On 05/01/17 10:27, jladasky@itu.edu wrote:
> I am interested in understanding exactly what code will be compiled if I #include <math.h> in my program,
> and then call sqrt() with an int32 argument.

sqrt() takes a double-precision floating point number, so
your int32 will be promoted to double before sqrt is called,
and you'll get a double back.

Reply by ●January 4, 20172017-01-04

On Wednesday, January 4, 2017 at 3:46:43 PM UTC-8, Clifford Heath wrote:
> On 05/01/17 10:27, j...@itu.edu wrote:
> > I am interested in understanding exactly what code will be compiled if I #include <math.h> in my program,
> > and then call sqrt() with an int32 argument.
> 
> sqrt() takes a double-precision floating point number, so
> your int32 will be promoted to double before sqrt is called,
> and you'll get a double back.

Thanks for your reply, Clifford.  That's actually not an encouraging answer though!  If that is the case, I think I will write the square root function that was shown in Listing 4 of Jack Crenshaw's blog.  The floating-point function will spend time calculating digits of precision that I do not need for my application.

Reply by David Brown ●January 4, 20172017-01-04

On 05/01/17 00:27, jladasky@itu.edu wrote:
> Hello everyone,
>
> I am working with a Nordic nRF52 SoC.  The CPU subsystem of the nRF52
> is an ARM Cortex-M4.  My development system is a Linux box running
> GCC.  My embedded systems toolchain is gcc-arm-none-eabi-4_9-2015q1,
> as recommended by Nordic.
>
> I have written a fair amount of code for this device, but now I have
> to explore a bit deeper.  I have an algorithm that may require me to
> calculate a lot of square roots, and quickly.  However, the numbers
> in question are all small integers.
>
> Even though the ARM Cortex-M4 has an FPU, it shouldn't need to get
> involved in this square root calculation.  Optimized algorithms for
> various math functions exist.  A nice discussion of the optimization
> of the integer square root algorithm, written by Dr. Jack Crenshaw,
> is found here:
>
> http://www.embedded.com/electronics-blogs/programmer-s-toolbox/4219659/Integer-Square-Roots
>
>  I am interested in understanding exactly what code will be compiled
> if I #include <math.h> in my program, and then call sqrt() with an
> int32 argument.  It stands to reason that optimized code, like that
> shown in the blog above, is what should be used.  How can I determine
> this?

No, it does not "stand to reason" at all.  The C standard library header 
<math.h> goes along with an implementation of the C standard library 
function for sqrt() - this will do the calculations in double-precision 
floating point, as required by the C standards, and giving the answer to 
full IEEE accuracy.

There are a lot of ways you can calculate square roots, with various 
algorithms, lookup tables, etc., providing different balances between 
accuracy, run-time, and code space.  There is no one "perfect" answer.

And don't dismiss the floating point unit in the M4 too quickly - for 
some sorts of calculations, it may be faster than using integer 
arithmetic, even though you are using integer operands at the start.

>
> I am poking around in the gcc-arm-none-eabi-4_9-2015q1 folder for
> clues.  The only math-related file that I can find there is called
> tgmath.h, and it just #defines aliases which point back to math.h.  I
> am looking for source code.
>
> Thanks for any advice you can share.
>

Reply by ●January 4, 20172017-01-04

Den torsdag den 5. januar 2017 kl. 01.17.19 UTC+1 skrev jlad...@itu.edu:
> On Wednesday, January 4, 2017 at 3:46:43 PM UTC-8, Clifford Heath wrote:
> > On 05/01/17 10:27, j...@itu.edu wrote:
> > > I am interested in understanding exactly what code will be compiled if I #include <math.h> in my program,
> > > and then call sqrt() with an int32 argument.
> > 
> > sqrt() takes a double-precision floating point number, so
> > your int32 will be promoted to double before sqrt is called,
> > and you'll get a double back.
> 
> Thanks for your reply, Clifford.  That's actually not an encouraging answer though!  If that is the case, I think I will write the square root function that was shown in Listing 4 of Jack Crenshaw's blog.  The floating-point function will spend time calculating digits of precision that I do not need for my application.


the fpu instruction VSQRT.F32 takes 14 cycles if floats are enough that is probably hard to beat

Reply by ●January 4, 20172017-01-04

jladasky@itu.edu wrote:
> Hello everyone,
> 
> I am working with a Nordic nRF52 SoC.  The CPU subsystem of the nRF52 is an ARM Cortex-M4.  My development system is a Linux box running GCC.  My embedded systems toolchain is gcc-arm-none-eabi-4_9-2015q1, as recommended by Nordic.
> 
> I have written a fair amount of code for this device, but now I have to explore a bit deeper.  I have an algorithm that may require me to calculate a lot of square roots, and quickly.  However, the numbers in question are all small integers.
> 
> Even though the ARM Cortex-M4 has an FPU, it shouldn't need to get involved in this square root calculation.  Optimized algorithms for various math functions exist.  A nice discussion of the optimization of the integer square root algorithm, written by Dr. Jack Crenshaw, is found here:
> 
> http://www.embedded.com/electronics-blogs/programmer-s-toolbox/4219659/Integer-Square-Roots

Actually, this method is good if you do not have hardware multiplication,
or possibly if code size is more important than speed.  However,
otherwise method based on table lookup for initial approximation
and Newton iteration is likely to be faster.

> I am interested in understanding exactly what code will be compiled if I #include <math.h> in my program, and then call sqrt() with an int32 argument.  It stands to reason that optimized code, like that shown in the blog above, is what should be used.  How can I determine this?  

Use something like:

/mnt/m1/pom/usr/bin/arm-none-eabi-gcc -Wall -mthumb -mcpu=cortex-m4 -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O3 -S your_file.c

(replace /mnt/m1/pom/usr/bin/arm-none-eabi-gcc by path to your gcc and
use your comiler options).  In file 'your_file.s' you will find
resulting assembly code.   Using gcc-5.3 I get:

       bl      __aeabi_i2d
       vmov    d0, r0, r1
       bl      sqrt
       vmov    r0, r1, d0
       bl      __aeabi_d2iz

AFAICS this is: convertion from integer to double, moving argument to
flating point register, calling sqrt from C library, moving argument
back to integer registers, convertion from double to integer.

What sqrt from library will do depends on C library, I do not
use any...

> I am poking around in the gcc-arm-none-eabi-4_9-2015q1 folder for clues.  The only math-related file that I can find there is called tgmath.h, and it just #defines aliases which point back to math.h.  I am looking for source code.

'sqrt' is in C library.  Some frequently used library functions
are buit into gcc, but apparenty 'sqrt' is not one of them.
So do not look inside gcc.  Instad look for C library.  It
seems that newlib uses implementation which converts back to
integers and is doing most calculations on integers.  Seem
highly suboptimal.

-- 
                              Waldek Hebisch

Reply by Reinhardt Behm ●January 4, 20172017-01-04

AT Thursday 05 January 2017 08:17, jladasky@itu.edu wrote:

> On Wednesday, January 4, 2017 at 3:46:43 PM UTC-8, Clifford Heath wrote:
>> On 05/01/17 10:27, j...@itu.edu wrote:
>> > I am interested in understanding exactly what code will be compiled if
>> > I #include <math.h> in my program, and then call sqrt() with an int32
>> > argument.
>> 
>> sqrt() takes a double-precision floating point number, so
>> your int32 will be promoted to double before sqrt is called,
>> and you'll get a double back.
> 
> Thanks for your reply, Clifford.  That's actually not an encouraging
> answer though!  If that is the case, I think I will write the square root
> function that was shown in Listing 4 of Jack Crenshaw's blog.  The
> floating-point function will spend time calculating digits of precision
> that I do not need for my application.

One of the rules of optimization is: Only do it after you have numbers to 
compare. You are jumping to conclusions without any data. First benchmark 
and measure, then optimize, measure again and compare.

-- 
Reinhardt

Reply by rickman ●January 4, 20172017-01-04

On 1/4/2017 8:22 PM, Reinhardt Behm wrote:
> AT Thursday 05 January 2017 08:17, jladasky@itu.edu wrote:
>
>> On Wednesday, January 4, 2017 at 3:46:43 PM UTC-8, Clifford Heath wrote:
>>> On 05/01/17 10:27, j...@itu.edu wrote:
>>>> I am interested in understanding exactly what code will be compiled if
>>>> I #include <math.h> in my program, and then call sqrt() with an int32
>>>> argument.
>>>
>>> sqrt() takes a double-precision floating point number, so
>>> your int32 will be promoted to double before sqrt is called,
>>> and you'll get a double back.
>>
>> Thanks for your reply, Clifford.  That's actually not an encouraging
>> answer though!  If that is the case, I think I will write the square root
>> function that was shown in Listing 4 of Jack Crenshaw's blog.  The
>> floating-point function will spend time calculating digits of precision
>> that I do not need for my application.
>
> One of the rules of optimization is: Only do it after you have numbers to
> compare. You are jumping to conclusions without any data. First benchmark
> and measure, then optimize, measure again and compare.

The first rule of optimizing is, "don't" until you find you clearly need 
it.

-- 

Rick C

Reply by Tim Wescott ●January 4, 20172017-01-04

On Wed, 04 Jan 2017 15:27:10 -0800, jladasky wrote:

> Hello everyone,
> 
> I am working with a Nordic nRF52 SoC.  The CPU subsystem of the nRF52 is
> an ARM Cortex-M4.  My development system is a Linux box running GCC.  My
> embedded systems toolchain is gcc-arm-none-eabi-4_9-2015q1, as
> recommended by Nordic.
> 
> I have written a fair amount of code for this device, but now I have to
> explore a bit deeper.  I have an algorithm that may require me to
> calculate a lot of square roots, and quickly.  However, the numbers in
> question are all small integers.
> 
> Even though the ARM Cortex-M4 has an FPU, it shouldn't need to get
> involved in this square root calculation.  Optimized algorithms for
> various math functions exist.  A nice discussion of the optimization of
> the integer square root algorithm, written by Dr. Jack Crenshaw, is
> found here:
> 
> http://www.embedded.com/electronics-blogs/programmer-s-toolbox/4219659/
Integer-Square-Roots
> 
> I am interested in understanding exactly what code will be compiled if I
> #include <math.h> in my program, and then call sqrt() with an int32
> argument.  It stands to reason that optimized code, like that shown in
> the blog above, is what should be used.  How can I determine this?
> 
> I am poking around in the gcc-arm-none-eabi-4_9-2015q1 folder for clues.
>  The only math-related file that I can find there is called tgmath.h,
> and it just #defines aliases which point back to math.h.  I am looking
> for source code.
> 
> Thanks for any advice you can share.

The FPU hardware is almost certainly faster than anything you could do in 
software.  Unless the largest-value integer you need to take the square 
root of is small enough that you can just implement a look-up table.

Benchmark, and see.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

Reply by David Brown ●January 5, 20172017-01-05

On 05/01/17 01:28, lasselangwadtchristensen@gmail.com wrote:
> Den torsdag den 5. januar 2017 kl. 01.17.19 UTC+1 skrev
> jlad...@itu.edu:
>> On Wednesday, January 4, 2017 at 3:46:43 PM UTC-8, Clifford Heath
>> wrote:
>>> On 05/01/17 10:27, j...@itu.edu wrote:
>>>> I am interested in understanding exactly what code will be
>>>> compiled if I #include <math.h> in my program, and then call
>>>> sqrt() with an int32 argument.
>>> 
>>> sqrt() takes a double-precision floating point number, so your
>>> int32 will be promoted to double before sqrt is called, and
>>> you'll get a double back.
>> 
>> Thanks for your reply, Clifford.  That's actually not an
>> encouraging answer though!  If that is the case, I think I will
>> write the square root function that was shown in Listing 4 of Jack
>> Crenshaw's blog.  The floating-point function will spend time
>> calculating digits of precision that I do not need for my
>> application.
> 
> 
> the fpu instruction VSQRT.F32 takes 14 cycles if floats are enough
> that is probably hard to beat
> 

As long as "-ffast-math" and at least -O1 are enabled, as well as
appropriate cpu flags, the "sqrtf" function should generate a VSQRT.F32
automatically.  Note that "sqrtf" is not the same as "sqrt".

Previous12 3 4 Next

How would my SoC calculate an integer square root?

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group