
Does GCC use single-precision FP hardware on Cortex-M4 parts?

Started by Dave Nadler September 24, 2016
Hi - Can anybody tell me for what, if anything, does GCC use the
single-precision hardware floating point units on some Cortex-M4 parts?
At all? Single-precision float only? Presumably not double?

The PIC32MZ EF series parts have hardware double precision;
presumably GCC uses this for double?

Thanks!
Best Regards, Dave
On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:

> Hi - Can anybody tell me for what, if anything, does GCC use the
> single-precision hardware floating point units on some Cortex-M4 parts?
> At all? Single-precision float only? Presumably not double?
>
> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
>
> Thanks!
> Best Regards, Dave
Been there, done that, used correctly it works a charm:

GCC uses the single-precision floating point processor if you tell it to. And yes, it uses it for single-precision float only, so the usual math library stuff (sin, cos, etc.) is not, to my knowledge, supported.

(In C++ one _could_ have a smart math library that would see the type of the argument and call the correct library function -- I don't believe that's done, and I could see it causing all sorts of trouble when things get inadvertently up-cast to double as you maintain the code.)

At least on the Cortex M4 part that I used, you needed to turn on the math coprocessor -- it was not on by default. I'm not sure if this is par for the course, or if it's just an ST thing. If you don't turn on the math coprocessor then the first time you hit a coprocessor function you'll get a fault and the machine will hang.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
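[On ARMv7-M parts, "turning on the math coprocessor" means granting access to coprocessors CP10/CP11 in the CPACR register before any FP instruction executes. A minimal sketch -- the register address and bit layout are from the ARMv7-M architecture manual, but the helper names here are made up for illustration:]

```cpp
#include <cstdint>

// CPACR lives at 0xE000ED88 on ARMv7-M; CP10 occupies bits 21:20 and
// CP11 bits 23:22. Writing 0b11 to both fields grants full FPU access.
constexpr uint32_t CPACR_ADDR = 0xE000ED88u;

constexpr uint32_t fpu_enable_mask()
{
    return (3u << 20) | (3u << 22);   // CP10 + CP11, full access
}

// Call early (reset handler, or top of main) before any float math runs.
void fpu_enable()
{
    auto cpacr = reinterpret_cast<volatile uint32_t *>(CPACR_ADDR);
    *cpacr |= fpu_enable_mask();
#if defined(__arm__)
    __asm volatile("dsb\n\tisb");     // complete the write, flush the pipeline
#endif
}
```

[Without this, the first FP instruction raises a UsageFault (NOCP), which matches the hang-at-first-coprocessor-instruction behaviour described above.]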
On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:

< snip >

> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
I would check. You may have to direct the compiler to do so. Check also to see if you need to do any special initialization on the chip to make the hardware double precision work.

If you're using an RTOS, make sure that either there's no special care and feeding needed of the coprocessor, or that the RTOS does it, or that you only have one magic (and VERY well commented) task that's ever allowed to do floating point operations.

On RTOS and floating point: every processor that I've ever worked with that has hardware floating point, and even some floating point libraries that I've used, have required that the RTOS has knowledge of the floating point "stuff". In these cases, the floating point engine (HW or SW) needs to have its state saved off independently of the rest of the processor and/or software. This means that task switches will be slower and the task control blocks bigger -- so you have to take that into account when you're choosing to use floating point.

Note that most new software floating point libraries _don't_ require special attention from the RTOS -- but you never know. Caveat emptor. Speak softly, but carry a big stick. 'ware dragons. Etc.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
On 9/24/2016 1:03 PM, Tim Wescott wrote:
> On RTOS and floating point: every processor that I've ever worked with
> that has hardware floating point, and even some floating point libraries
> that I've used, have required that the RTOS has knowledge of the floating
> point "stuff". In these cases, the floating point engine (HW or SW)
> needs to have its state saved off independently of the rest of the
> processor and/or software. This means that task switches will be slower
> and the task control blocks bigger -- so you have to take that into
> account when you're choosing to use floating point.
The typical approach is to handle the floating point unit independently of the normal task state. On a task switch, enable the FPU trap. When/*if* the new task attempts to run a floating point operation, the trap saves the FP state in a "FP control block" (i.e., only tasks that use the FPU need to bear the cost of that state plus the costs of storing/restoring it).

This can lead to some pathological behaviors but no worse than storing and restoring for every task switch (or, for tasks identified, a priori at config time, as using the FPU).

This approach is particularly useful for providing a framework to support *any* resource that can "co-execute" alongside the main CPU (i.e., shared resources with asynchronous interfaces). E.g., in cases where the FPU is an emulation library with potentially GREATER state than a genuine FPU.
> Note that most new software floating point libraries _don't_ require
> special attention from the RTOS -- but you never know. Caveat emptor.
> Speak softly, but carry a big stick. 'ware dragons. Etc.
Think: errno.
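[The trap-on-first-use scheme described above can be sketched host-side. This is a behavioural model only, not real RTOS code; all names (FpContext, on_fpu_access_fault, etc.) are invented for illustration:]

```cpp
#include <cstdint>

// Lazy FPU context switching: the FP state of the last task that touched
// the FPU stays live. A task switch merely marks the FPU inaccessible;
// only when the NEW task actually executes an FP instruction does the
// resulting access fault save the old owner's registers and restore the
// new task's. Tasks that never touch the FPU pay nothing.
struct FpContext { uint32_t regs[32]; };   // e.g. S0-S31 on a Cortex-M4F

struct Task {
    FpContext fp;          // only meaningful for tasks that use the FPU
};

static Task *fp_owner = nullptr;   // task whose state is live in the FPU
static Task *current  = nullptr;   // task currently running
static FpContext fpu_hw;           // stand-in for the real FPU registers

void on_task_switch(Task *next)
{
    current = next;
    // Real code would deny FPU access here (e.g. via CPACR) so that the
    // next FP instruction faults; we model only the bookkeeping.
}

// Called from the access-fault handler on the first FP use by `current`.
void on_fpu_access_fault()
{
    if (fp_owner != current) {
        if (fp_owner)
            fp_owner->fp = fpu_hw;   // save the old owner's FP state
        fpu_hw = current->fp;        // restore the new owner's FP state
        fp_owner = current;
    }
    // Real code would re-enable FPU access and retry the instruction.
}
```

[The pathological case mentioned above is two FP-using tasks ping-ponging: every switch between them then costs a full save/restore, same as the eager approach -- but never worse.]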
On Saturday, September 24, 2016 at 4:03:22 PM UTC-4, Tim Wescott wrote:
> On RTOS and floating point: every processor that I've ever worked with
> that has hardware floating point, and even some floating point libraries
> that I've used, have required that the RTOS has knowledge of the floating
> point "stuff". In these cases, the floating point engine (HW or SW)
> needs to have its state saved off independently of the rest of the
> processor and/or software. This means that task switches will be slower
> and the task control blocks bigger -- so you have to take that into
> account when you're choosing to use floating point. Note that most new
> software floating point libraries _don't_ require special attention from
> the RTOS -- but you never know. Caveat emptor. Speak softly, but carry
> a big stick. 'ware dragons. Etc.
Yup. I wrote an RTOS where the cost of saving/restoring the FP emulation status was too high, so limited FP to only one task. When other tasks are active the trap vectors used to implement FP are redirected to a fault, so accidents don't get past development.

Cortex M4F can cause FP to fault except in a specific context to implement the above one-task-only, and can also do 'lazy stacking' where FP-context-switch cost is incurred only when required. I need to check to see if FreeRTOS supports these two options out of the box...

http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAFCBJJB.html

Thanks Tim,
Best Regards, Dave
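[For reference, lazy stacking on the M4F is controlled by two bits in the FPCCR register (0xE000EF34); both are set out of reset, so lazy stacking is normally already enabled. A sketch of the relevant constants, per the ARM documentation linked above (constant names are mine):]

```cpp
#include <cstdint>

// FPCCR (Floating-Point Context Control Register), ARMv7-M, at 0xE000EF34.
constexpr uint32_t FPCCR_ADDR  = 0xE000EF34u;
constexpr uint32_t FPCCR_ASPEN = 1u << 31;  // auto-save FP context on exception entry
constexpr uint32_t FPCCR_LSPEN = 1u << 30;  // ...but lazily: stack space is reserved,
                                            // S0-S15 written only if the handler uses FP

constexpr uint32_t fpccr_lazy_stacking() { return FPCCR_ASPEN | FPCCR_LSPEN; }
```

[With ASPEN set but LSPEN clear you get eager stacking (registers always saved on exception entry); with both set, an interrupt handler that never touches the FPU pays almost no FP-context cost.]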
On 9/24/2016 1:52 PM, Dave Nadler wrote:
> On Saturday, September 24, 2016 at 4:03:22 PM UTC-4, Tim Wescott wrote:
>> On RTOS and floating point: every processor that I've ever worked with
>> that has hardware floating point, and even some floating point libraries
>> that I've used, have required that the RTOS has knowledge of the floating
>> point "stuff". In these cases, the floating point engine (HW or SW)
>> needs to have its state saved off independently of the rest of the
>> processor and/or software. This means that task switches will be slower
>> and the task control blocks bigger -- so you have to take that into
>> account when you're choosing to use floating point. Note that most new
>> software floating point libraries _don't_ require special attention from
>> the RTOS -- but you never know. Caveat emptor. Speak softly, but carry
>> a big stick. 'ware dragons. Etc.
>
> Yup. I wrote an RTOS where the cost of saving/restoring the FP emulation
> status was too high, so limited FP to only one task. When other tasks are
> active the trap vectors used to implement FP are redirected to a fault,
> so accidents don't get past development.
You can similarly "hook" the individual entry points to a floating point emulator/associated "helper routines" (common in supporting enhanced data types in legacy compilers) to conditionally save/restore the floating point context "as needed". I.e., the FPE takes the form of a "monitor".

[I wrote an old Z80 MTOS that does this -- allowing any task to use floats/doubles without preconfiguring the task to have such access; prior to this, I used to set a flag in each task "at development time" indicating whether or not it was allowed to use floats, so a float task scheduled after another float task incurred the additional context switch costs but non-float tasks did not]
> Cortex M4F can cause FP to fault except in a specific context to implement
> the above one-task-only, and can also do 'lazy stacking' where
> FP-context-switch cost is incurred only when required.
> I need to check to see if FreeRTOS supports these two options out of the box...
With foreknowledge of a particular compiler's handling of floats and the nature of a floating point *emulation*, you can be much more fine-grained in supporting only the necessary *portions* of the FP state that need to be preserved (instead of unilaterally storing and restoring the *entire* state). In some cases, you can simply swap a "FP_context" pointer and let the FP state reside in task-specific memory (i.e., the FPE becomes reentrant). Of course, this weds you to a particular implementation.

But, for targets without FP hardware, it allows reasonably high performance without inconveniencing the developer: "Grrr, can't use foo() here cuz it MIGHT invoke a floating point operation..."

You can also implement floating point operations as a fully shareable *service* (at a higher cost); I use that for my BigRational package, currently.
Dave Nadler <drn@nadler.com> writes:

> Hi - Can anybody tell me for what, if anything, does GCC use the
> single-precision hardware floating point units on some Cortex-M4 parts?
> At all? Single-precision float only? Presumably not double?
>
> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
FYI the latest M7 processors from ST have double-precision floating point, e.g. STM32F769
<http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32f7-series.html?querycriteria=productId=SS1858>

gcc does have the appropriate flags to choose double-precision floating point ARM hardware (-mfpu=fpv5-d16). How well it makes use of this I do not know, but ARM do contribute directly to gcc so I would hope it is there. One could try it and see.

<https://launchpad.net/gcc-arm-embedded>
<https://launchpadlibrarian.net/268329726/readme.txt>

--
John Devereux
On 24/09/16 21:56, Tim Wescott wrote:
> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>
>> Hi - Can anybody tell me for what, if anything, does GCC use the
>> single-precision hardware floating point units on some Cortex-M4 parts?
>> At all? Single-precision float only? Presumably not double?
>>
>> The PIC32MZ EF series parts have hardware double precision;
>> presumably GCC uses this for double?
>>
>> Thanks!
>> Best Regards, Dave
>
> Been there, done that, used correctly it works a charm:
>
> GCC uses the single-precision floating point processor if you tell it
> to. And yes, it uses it for single-precision float only, so the usual
> math library stuff (sin, cos, etc.) is not, to my knowledge, supported.
Yes, you need to tell gcc about the processor you are using:

-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

You may also want to add "-fsingle-precision-constant", which makes gcc treat floating point constants as single precision, so that "x * 2.5" is done as single precision rather than double precision (without having to write "x * 2.5f").

And you /definitely/ want to have "-Wdouble-promotion" to warn about any implicit promotions to double that might have occurred accidentally. (You can always cast to double, or assign values to double variables, if you really want slow but accurate doubles in the code.)

As for the maths library, "sinf", "cosf", and friends are standard functions for single-precision versions of the trigonometric functions. But remember that there is no such thing as the "gcc library" - gcc can come with a variety of libraries. Some of these will have good implementations of "sinf" and friends, done using only single-precision floating point in order to be fast on the M4F and similar processors. Others simply convert their argument to double, call "sin", then convert the result back to single. If you need to use standard library maths functions, and you need the speed of single precision, then check the details for the libraries available in the particular toolchain you have.

Of course, when you need fast trig functions, it is usually best to have tables, interpolations, or other approximations that give you the required accuracy far faster than any IEEE-compliant standard library.
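[To illustrate the last point, a sketch of a table-plus-linear-interpolation sine for the first quadrant; the function name and grid size are arbitrary, and the table entries are precomputed sin() samples at pi/16 steps. Worst-case error on this 8-interval grid is roughly 5e-3 -- more entries buy more accuracy:]

```cpp
// sin(k * pi/16) for k = 0..8, precomputed to 5 decimal places.
static const float sin_tab[9] = {
    0.0f,     0.19509f, 0.38268f, 0.55557f, 0.70711f,
    0.83147f, 0.92388f, 0.98079f, 1.0f
};

// Valid for x in [0, pi/2]; other quadrants fold into this one by symmetry.
float fast_sinf(float x)
{
    const float step = 1.5707963f / 8.0f;      // pi/16
    float pos = x / step;
    int i = static_cast<int>(pos);             // table interval containing x
    if (i >= 8)
        return sin_tab[8];
    float frac = pos - static_cast<float>(i);  // position within the interval
    return sin_tab[i] + frac * (sin_tab[i + 1] - sin_tab[i]);
}
```

[Everything here stays in single precision, so it maps onto the M4F's FPU with no library calls and no accidental double promotion.]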
> (In C++ one _could_ have a smart math library that would see the type of
> the argument and call the correct library function -- I don't believe
> that's done, and I could see it causing all sorts of trouble when things
> get inadvertently up-cast to double as you maintain the code.)
If you are going for C++, you can easily create a wrapper around "float" that does not have any implicit conversions to double so that you can be sure to avoid accidents. Or you can use the gcc flags I mentioned above :-)
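[A minimal sketch of such a wrapper (the class name strict_float is made up): arithmetic stays in float, and since there is no conversion operator, accidentally handing one to a double API becomes a compile error rather than a silent promotion:]

```cpp
class strict_float {
    float v;
public:
    constexpr explicit strict_float(float x) : v(x) {}
    constexpr float value() const { return v; }
    // Going to double must be spelled out -- no operator double() here,
    // so "double d = x;" simply will not compile.
    constexpr double widen() const { return static_cast<double>(v); }

    friend constexpr strict_float operator+(strict_float a, strict_float b) {
        return strict_float(a.v + b.v);
    }
    friend constexpr strict_float operator*(strict_float a, strict_float b) {
        return strict_float(a.v * b.v);
    }
};
```

[A real version would add the remaining operators and comparisons; the point is only that the explicit constructor and missing conversion operator make every promotion to double visible in the source.]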
> At least on the Cortex M4 part that I used, you needed to turn on the
> math coprocessor -- it was not on by default. I'm not sure if this is
> par for the course, or if it's just an ST thing. If you don't turn on
> the math coprocessor then the first time you hit a coprocessor function
> you'll get a fault and the machine will hang.
On Sun, 25 Sep 2016 12:29:33 +0200, David Brown wrote:

< snip >
Thanks for your comments David -- -Wdouble-promotion was, specifically, flying under my radar. The next time that project is active I'll add it in, to the benefit of future-me.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
Tim Wescott <tim@seemywebsite.com> wrote:
> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>> The PIC32MZ EF series parts have hardware double precision;
>> presumably GCC uses this for double?
>
> I would check. You may have to direct the compiler to do so.
By default Microchip's XC32 compiler uses 32-bit doubles. There is a compiler switch for 64-bit doubles, but there have been several reports on their forums about the wrong libraries being linked in or other such nonsense.

-a
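[Given that some toolchains quietly make double 32 bits wide, a cheap guard is to assert the expected representation at compile time, so a wrong compiler switch or mislinked library fails the build rather than the results. A sketch using C++11 static_assert (under a C dialect the same check can be done with a negative-array-size trick):]

```cpp
#include <cfloat>

// Fail the build if 'double' is secretly single precision, as with the
// 32-bit-double default described above.
static_assert(sizeof(double) == 8,
              "double is not 64 bits wide on this toolchain");
static_assert(DBL_MANT_DIG == 53,
              "double does not have IEEE-754 binary64 precision");
```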
