
Does GCC use single-precision FP hardware on Cortex-M4 parts?

Started by Dave Nadler September 24, 2016
Hi - Can anybody tell me for what, if anything, does GCC use the
single-precision hardware floating point units on some Cortex-M4 parts?
At all? Single-precision float only? Presumably not double?

The PIC32MZ EF series parts have hardware double precision;
presumably GCC uses this for double?

Thanks!
Best Regards, Dave
On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:

> Hi - Can anybody tell me for what, if anything, does GCC use the
> single-precision hardware floating point units on some Cortex-M4 parts?
> At all? Single-precision float only? Presumably not double?
>
> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
>
> Thanks!
> Best Regards, Dave
Been there, done that, used correctly it works a charm:

GCC uses the single-precision floating point processor if you tell it to. And yes, it uses it for single-precision float only, so the usual math library stuff (sin, cos, etc.) is not, to my knowledge, supported.

(In C++ one _could_ have a smart math library that would see the type of the argument and call the correct library function -- I don't believe that's done, and I could see it causing all sorts of trouble when things get inadvertently up-cast to double as you maintain the code.)

At least on the Cortex M4 part that I used, you needed to turn on the math coprocessor -- it was not on by default. I'm not sure if this is par for the course, or if it's just an ST thing. If you don't turn on the math coprocessor then the first time you hit a coprocessor function you'll get a fault and the machine will hang.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
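[On ARMv7-M parts, "turning on the math coprocessor" means granting access to coprocessors CP10/CP11 in the CPACR register before any FP instruction executes. A minimal sketch -- the register address and bit layout are from the ARMv7-M architecture manual, but the helper names here are made up for illustration:]

```cpp
#include <cstdint>

// CPACR lives at 0xE000ED88 on ARMv7-M; CP10 occupies bits 21:20 and
// CP11 bits 23:22. Writing 0b11 to both fields grants full FPU access.
constexpr uint32_t CPACR_ADDR = 0xE000ED88u;

constexpr uint32_t fpu_enable_mask()
{
    return (3u << 20) | (3u << 22);   // CP10 + CP11, full access
}

// Call early (reset handler, or top of main) before any float math runs.
void fpu_enable()
{
    auto cpacr = reinterpret_cast<volatile uint32_t *>(CPACR_ADDR);
    *cpacr |= fpu_enable_mask();
#if defined(__arm__)
    __asm volatile("dsb\n\tisb");     // complete the write, flush the pipeline
#endif
}
```

[Without this, the first FP instruction raises a UsageFault (NOCP), which matches the hang-at-first-coprocessor-instruction behaviour described above.]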
On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:

< snip >

> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
I would check. You may have to direct the compiler to do so. Check also to see if you need to do any special initialization on the chip to make the hardware double precision work.

If you're using an RTOS, make sure that either there's no special care and feeding needed of the coprocessor, or that the RTOS does it, or that you only have one magic (and VERY well commented) task that's ever allowed to do floating point operations.

On RTOS and floating point: every processor that I've ever worked with that has hardware floating point, and even some floating point libraries that I've used, have required that the RTOS has knowledge of the floating point "stuff". In these cases, the floating point engine (HW or SW) needs to have its state saved off independently of the rest of the processor and/or software. This means that task switches will be slower and the task control blocks bigger -- so you have to take that into account when you're choosing to use floating point.

Note that most new software floating point libraries _don't_ require special attention from the RTOS -- but you never know. Caveat emptor. Speak softly, but carry a big stick. 'ware dragons. Etc.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
On 9/24/2016 1:03 PM, Tim Wescott wrote:
> On RTOS and floating point: every processor that I've ever worked with
> that has hardware floating point, and even some floating point libraries
> that I've used, have required that the RTOS has knowledge of the floating
> point "stuff". In these cases, the floating point engine (HW or SW)
> needs to have its state saved off independently of the rest of the
> processor and/or software. This means that task switches will be slower
> and the task control blocks bigger -- so you have to take that into
> account when you're choosing to use floating point.
The typical approach is to handle the floating point unit independently of the normal task state. On a task switch, enable the FPU trap. When/*if* the new task attempts to run a floating point operation, the trap saves the FP state in a "FP control block" (i.e., only tasks that use the FPU need to bear the cost of that state plus the costs of storing/restoring it).

This can lead to some pathological behaviors but no worse than storing and restoring for every task switch (or, for tasks identified, a priori at config time, as using the FPU).

This approach is particularly useful for providing a framework to support *any* resource that can "co-execute" alongside the main CPU (i.e., shared resources with asynchronous interfaces). E.g., in cases where the FPU is an emulation library with potentially GREATER state than a genuine FPU.
> Note that most new software floating point libraries _don't_ require
> special attention from the RTOS -- but you never know. Caveat emptor.
> Speak softly, but carry a big stick. 'ware dragons. Etc.
Think: errno.
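[The trap-on-first-use scheme described above can be sketched host-side. This is a behavioural model only, not real RTOS code; all names (FpContext, on_fpu_access_fault, etc.) are invented for illustration:]

```cpp
#include <cstdint>

// Lazy FPU context switching: the FP state of the last task that touched
// the FPU stays live. A task switch merely marks the FPU inaccessible;
// only when the NEW task actually executes an FP instruction does the
// resulting access fault save the old owner's registers and restore the
// new task's. Tasks that never touch the FPU pay nothing.
struct FpContext { uint32_t regs[32]; };   // e.g. S0-S31 on a Cortex-M4F

struct Task {
    FpContext fp;          // only meaningful for tasks that use the FPU
};

static Task *fp_owner = nullptr;   // task whose state is live in the FPU
static Task *current  = nullptr;   // task currently running
static FpContext fpu_hw;           // stand-in for the real FPU registers

void on_task_switch(Task *next)
{
    current = next;
    // Real code would deny FPU access here (e.g. via CPACR) so that the
    // next FP instruction faults; we model only the bookkeeping.
}

// Called from the access-fault handler on the first FP use by `current`.
void on_fpu_access_fault()
{
    if (fp_owner != current) {
        if (fp_owner)
            fp_owner->fp = fpu_hw;   // save the old owner's FP state
        fpu_hw = current->fp;        // restore the new owner's FP state
        fp_owner = current;
    }
    // Real code would re-enable FPU access and retry the instruction.
}
```

[The pathological case mentioned above is two FP-using tasks ping-ponging: every switch between them then costs a full save/restore, same as the eager approach -- but never worse.]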
On Saturday, September 24, 2016 at 4:03:22 PM UTC-4, Tim Wescott wrote:
> On RTOS and floating point: every processor that I've ever worked with
> that has hardware floating point, and even some floating point libraries
> that I've used, have required that the RTOS has knowledge of the floating
> point "stuff". In these cases, the floating point engine (HW or SW)
> needs to have its state saved off independently of the rest of the
> processor and/or software. This means that task switches will be slower
> and the task control blocks bigger -- so you have to take that into
> account when you're choosing to use floating point. Note that most new
> software floating point libraries _don't_ require special attention from
> the RTOS -- but you never know. Caveat emptor. Speak softly, but carry
> a big stick. 'ware dragons. Etc.
Yup. I wrote an RTOS where the cost of saving/restoring the FP emulation status was too high, so limited FP to only one task. When other tasks are active the trap vectors used to implement FP are redirected to a fault, so accidents don't get past development.

Cortex M4F can cause FP to fault except in a specific context to implement the above one-task-only, and can also do 'lazy stacking' where FP-context-switch cost is incurred only when required. I need to check to see if FreeRTOS supports these two options out of the box...

http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAFCBJJB.html

Thanks Tim,
Best Regards, Dave
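[For reference, lazy stacking on the M4F is controlled by two bits in the FPCCR register (0xE000EF34); both are set out of reset, so lazy stacking is normally already enabled. A sketch of the relevant constants, per the ARM documentation linked above (constant names are mine):]

```cpp
#include <cstdint>

// FPCCR (Floating-Point Context Control Register), ARMv7-M, at 0xE000EF34.
constexpr uint32_t FPCCR_ADDR  = 0xE000EF34u;
constexpr uint32_t FPCCR_ASPEN = 1u << 31;  // auto-save FP context on exception entry
constexpr uint32_t FPCCR_LSPEN = 1u << 30;  // ...but lazily: stack space is reserved,
                                            // S0-S15 written only if the handler uses FP

constexpr uint32_t fpccr_lazy_stacking() { return FPCCR_ASPEN | FPCCR_LSPEN; }
```

[With ASPEN set but LSPEN clear you get eager stacking (registers always saved on exception entry); with both set, an interrupt handler that never touches the FPU pays almost no FP-context cost.]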
On 9/24/2016 1:52 PM, Dave Nadler wrote:
> On Saturday, September 24, 2016 at 4:03:22 PM UTC-4, Tim Wescott wrote:
>> On RTOS and floating point: every processor that I've ever worked with
>> that has hardware floating point, and even some floating point libraries
>> that I've used, have required that the RTOS has knowledge of the floating
>> point "stuff". In these cases, the floating point engine (HW or SW)
>> needs to have its state saved off independently of the rest of the
>> processor and/or software. This means that task switches will be slower
>> and the task control blocks bigger -- so you have to take that into
>> account when you're choosing to use floating point. Note that most new
>> software floating point libraries _don't_ require special attention from
>> the RTOS -- but you never know. Caveat emptor. Speak softly, but carry
>> a big stick. 'ware dragons. Etc.
>
> Yup. I wrote an RTOS where the cost of saving/restoring the FP emulation
> status was too high, so limited FP to only one task. When other tasks are
> active the trap vectors used to implement FP are redirected to a fault,
> so accidents don't get past development.
You can similarly "hook" the individual entry points to a floating point emulator/associated "helper routines" (common in supporting enhanced data types in legacy compilers) to conditionally save/restore the floating point context "as needed". I.e., the FPE takes the form of a "monitor".

[I wrote an old Z80 MTOS that does this -- allowing any task to use floats/doubles without preconfiguring the task to have such access; prior to this, I used to set a flag in each task "at development time" indicating whether or not it was allowed to use floats, so a float task scheduled after another float task incurred the additional context switch costs but non-float tasks did not]
> Cortex M4F can cause FP to fault except in a specific context to implement
> the above one-task-only, and can also do 'lazy stacking' where
> FP-context-switch cost is incurred only when required.
> I need to check to see if FreeRTOS supports these two options out of the box...
With foreknowledge of a particular compiler's handling of floats and the nature of a floating point *emulation*, you can be much more fine-grained in supporting only the necessary *portions* of the FP state that need to be preserved (instead of unilaterally storing and restoring the *entire* state). In some cases, you can simply swap a "FP_context" pointer and let the FP state reside in task-specific memory (i.e., the FPE becomes reentrant). Of course, this weds you to a particular implementation.

But, for targets without FP hardware, it allows reasonably high performance without inconveniencing the developer: "Grrr, can't use foo() here cuz it MIGHT invoke a floating point operation..."

You can also implement floating point operations as a fully shareable *service* (at a higher cost); I use that for my BigRational package, currently.
Dave Nadler <drn@nadler.com> writes:

> Hi - Can anybody tell me for what, if anything, does GCC use the
> single-precision hardware floating point units on some Cortex-M4 parts?
> At all? Single-precision float only? Presumably not double?
>
> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
FYI the latest M7 processors from ST have double-precision floating point, e.g. STM32F769
<http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32f7-series.html?querycriteria=productId=SS1858>

gcc does have the appropriate flags to choose double-precision floating point ARM hardware (-mfpu=fpv5-d16). How well it makes use of this I do not know, but ARM do contribute directly to gcc so I would hope it is there. One could try it and see.

<https://launchpad.net/gcc-arm-embedded>
<https://launchpadlibrarian.net/268329726/readme.txt>

--
John Devereux
On 24/09/16 21:56, Tim Wescott wrote:
> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>
>> Hi - Can anybody tell me for what, if anything, does GCC use the
>> single-precision hardware floating point units on some Cortex-M4 parts?
>> At all? Single-precision float only? Presumably not double?
>>
>> The PIC32MZ EF series parts have hardware double precision;
>> presumably GCC uses this for double?
>>
>> Thanks!
>> Best Regards, Dave
>
> Been there, done that, used correctly it works a charm:
>
> GCC uses the single-precision floating point processor if you tell it
> to. And yes, it uses it for single-precision float only, so the usual
> math library stuff (sin, cos, etc.) is not, to my knowledge, supported.
Yes, you need to tell gcc about the processor you are using:

-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

You may also want to add "-fsingle-precision-constant", which makes gcc treat floating point constants as single precision, so that "x * 2.5" is done as single precision rather than double precision (without having to write "x * 2.5f").

And you /definitely/ want to have "-Wdouble-promotion" to warn about any implicit promotions to double that might have occurred accidentally. (You can always cast to double, or assign values to double variables, if you really want slow but accurate doubles in the code.)

As for the maths library, "sinf", "cosf", and friends are standard functions for single-precision versions of the trigonometric functions. But remember that there is no such thing as the "gcc library" - gcc can come with a variety of libraries. Some of these will have good implementations of "sinf" and friends, done using only single-precision floating point in order to be fast on the M4F and similar processors. Others simply convert their argument to double, call "sin", then convert the result back to single. If you need to use standard library maths functions, and you need the speed of single precision, then check the details for the libraries available in the particular toolchain you have.

Of course, when you need fast trig functions, it is usually best to have tables, interpolations, or other approximations that give you the required accuracy far faster than any IEEE-compliant standard library.
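[To illustrate the last point, a sketch of a table-plus-linear-interpolation sine for the first quadrant; the function name and grid size are arbitrary, and the table entries are precomputed sin() samples at pi/16 steps. Worst-case error on this 8-interval grid is roughly 5e-3 -- more entries buy more accuracy:]

```cpp
// sin(k * pi/16) for k = 0..8, precomputed to 5 decimal places.
static const float sin_tab[9] = {
    0.0f,     0.19509f, 0.38268f, 0.55557f, 0.70711f,
    0.83147f, 0.92388f, 0.98079f, 1.0f
};

// Valid for x in [0, pi/2]; other quadrants fold into this one by symmetry.
float fast_sinf(float x)
{
    const float step = 1.5707963f / 8.0f;      // pi/16
    float pos = x / step;
    int i = static_cast<int>(pos);             // table interval containing x
    if (i >= 8)
        return sin_tab[8];
    float frac = pos - static_cast<float>(i);  // position within the interval
    return sin_tab[i] + frac * (sin_tab[i + 1] - sin_tab[i]);
}
```

[Everything here stays in single precision, so it maps onto the M4F's FPU with no library calls and no accidental double promotion.]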
> (In C++ one _could_ have a smart math library that would see the type of
> the argument and call the correct library function -- I don't believe
> that's done, and I could see it causing all sorts of trouble when things
> get inadvertently up-cast to double as you maintain the code.)
If you are going for C++, you can easily create a wrapper around "float" that does not have any implicit conversions to double so that you can be sure to avoid accidents. Or you can use the gcc flags I mentioned above :-)
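[A minimal sketch of such a wrapper (the class name strict_float is made up): arithmetic stays in float, and since there is no conversion operator, accidentally handing one to a double API becomes a compile error rather than a silent promotion:]

```cpp
class strict_float {
    float v;
public:
    constexpr explicit strict_float(float x) : v(x) {}
    constexpr float value() const { return v; }
    // Going to double must be spelled out -- no operator double() here,
    // so "double d = x;" simply will not compile.
    constexpr double widen() const { return static_cast<double>(v); }

    friend constexpr strict_float operator+(strict_float a, strict_float b) {
        return strict_float(a.v + b.v);
    }
    friend constexpr strict_float operator*(strict_float a, strict_float b) {
        return strict_float(a.v * b.v);
    }
};
```

[A real version would add the remaining operators and comparisons; the point is only that the explicit constructor and missing conversion operator make every promotion to double visible in the source.]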
> At least on the Cortex M4 part that I used, you needed to turn on the
> math coprocessor -- it was not on by default. I'm not sure if this is
> par for the course, or if it's just an ST thing. If you don't turn on
> the math coprocessor then the first time you hit a coprocessor function
> you'll get a fault and the machine will hang.
On Sun, 25 Sep 2016 12:29:33 +0200, David Brown wrote:

< snip >
Thanks for your comments David -- -Wdouble-promotion was, specifically, flying under my radar. The next time that project is active I'll add it in, to the benefit of future-me.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
Tim Wescott <tim@seemywebsite.com> wrote:
> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>> The PIC32MZ EF series parts have hardware double precision;
>> presumably GCC uses this for double?
>
> I would check. You may have to direct the compiler to do so.
By default Microchip's XC32 compiler uses 32-bit doubles. There is a compiler switch for 64-bit doubles, but there have been several reports on their forums about the wrong libraries being linked in or other such nonsense.

-a
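[Given that some toolchains quietly make double 32 bits wide, a cheap guard is to assert the expected representation at compile time, so a wrong compiler switch or mislinked library fails the build rather than the results. A sketch using C++11 static_assert (under a C dialect the same check can be done with a negative-array-size trick):]

```cpp
#include <cfloat>

// Fail the build if 'double' is secretly single precision, as with the
// 32-bit-double default described above.
static_assert(sizeof(double) == 8,
              "double is not 64 bits wide on this toolchain");
static_assert(DBL_MANT_DIG == 53,
              "double does not have IEEE-754 binary64 precision");
```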
