On Sunday, September 25, 2016 at 6:29:40 AM UTC-4, David Brown wrote:
> Yes, you need to tell gcc about the processor you are using:
>
> -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16
>
> You may also want to add "-fsingle-precision-constant" which makes gcc
> treat floating point constants as single precision, so that "x * 2.5" is
> done as single precision rather than double precision (without having to
> write "x * 2.5f").
>
> And you /definitely/ want to have "-Wdouble-promotion" to warn about any
> implicit promotions to double that might have occurred accidentally.
> (You can always cast to double, or assign values to double variables, if
> you really want slow but accurate doubles in the code.)
>
>
> As for the maths library, "sinf", "cosf", and friends are standard
> functions for single-precision versions of the trigonometric functions.
> But remember that there is no such thing as the "gcc library" - gcc
> can come with a variety of libraries. Some of these will have good
> implementations of "sinf" and friends, done using only single-precision
> floating point in order to be fast on the M4F and similar processors.
> Others simply convert their argument to double, call "sin", then
> convert the result back to single. If you need to use standard library
> maths functions, and you need the speed of single precision, then check
> the details for the libraries available in the particular toolchain you
> have.
>
> Of course, when you need fast trig functions, it is usually best to have
> tables, interpolations, or other approximations that give you the
> required accuracy far faster than any IEEE-compliant standard library.
>
>
> > (In C++ one _could_ have a smart math library that would see the type of
> > the argument and call the correct library function -- I don't believe
> > that's done, and I could see it causing all sorts of trouble when things
> > get inadvertently up-cast to double as you maintain the code.)
>
> If you are going for C++, you can easily create a wrapper around "float"
> that does not have any implicit conversions to double so that you can be
> sure to avoid accidents.
>
> Or you can use the gcc flags I mentioned above :-)
Thanks David for your (as usual) clear explanation,
Best Regards, Dave
Reply by Dave Nadler ● September 30, 2016
On Wednesday, September 28, 2016 at 8:50:31 AM UTC-4, Anders....@kapsi.spam.stop.fi.invalid wrote:
> Tim Wescott <tim@seemywebsite.com> wrote:
> > On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>
> >> The PIC32MZ EF series parts have hardware double precision;
> >> presumably GCC uses this for double?
> > I would check. You may have to direct the compiler to do so.
>
> By default Microchip's XC32 compiler uses 32-bit doubles. There is a
> compiler switch for 64-bit doubles, but there have been several reports
> on their forums about the wrong libraries being linked in or other such
> nonsense.
Yes, this is lovely stuff:
http://www.microchip.com/forums/m940391.aspx
Thanks for the heads-up on -fno-short-double.
I expected this for PIC24, but it's a bit surprising for PIC32.
For some of our applications, 23 bits of precision doesn't do it...
Thanks Anders,
Best Regards, Dave
Reply by ● September 28, 2016
Tim Wescott <tim@seemywebsite.com> wrote:
> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>> The PIC32MZ EF series parts have hardware double precision;
>> presumably GCC uses this for double?
> I would check. You may have to direct the compiler to do so.
By default Microchip's XC32 compiler uses 32-bit doubles. There is a
compiler switch for 64-bit doubles, but there have been several reports
on their forums about the wrong libraries being linked in or other such
nonsense.
-a
Reply by Tim Wescott ● September 26, 2016
On Sun, 25 Sep 2016 12:29:33 +0200, David Brown wrote:
> On 24/09/16 21:56, Tim Wescott wrote:
>> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>>
>>> Hi - Can anybody tell me for what, if anything, does GCC use the
>>> single-precision hardware floating point units on some Cortex-M4
>>> parts?
>>> At all? Single-precision float only? Presumably not double?
>>>
>>> The PIC32MZ EF series parts have hardware double precision;
>>> presumably GCC uses this for double?
>>>
>>> Thanks!
>>> Best Regards, Dave
>>
>> Been there, done that, used correctly it works a charm:
>>
>> GCC uses the single-precision floating point processor if you tell it
>> to. And yes, it uses it for single-precision float only, so the usual
>> math library stuff (sin, cos, etc.) is not, to my knowledge, supported.
>
> Yes, you need to tell gcc about the processor you are using:
>
> -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16
>
> You may also want to add "-fsingle-precision-constant" which makes gcc
> treat floating point constants as single precision, so that "x * 2.5" is
> done as single precision rather than double precision (without having to
> write "x * 2.5f").
>
> And you /definitely/ want to have "-Wdouble-promotion" to warn about any
> implicit promotions to double that might have occurred accidentally.
> (You can always cast to double, or assign values to double variables, if
> you really want slow but accurate doubles in the code.)
>
>
> As for the maths library, "sinf", "cosf", and friends are standard
> functions for single-precision versions of the trigonometric functions.
> But remember that there is no such thing as the "gcc library" - gcc
> can come with a variety of libraries. Some of these will have good
> implementations of "sinf" and friends, done using only single-precision
> floating point in order to be fast on the M4F and similar processors.
> Others simply convert their argument to double, call "sin", then
> convert the result back to single. If you need to use standard library
> maths functions, and you need the speed of single precision, then check
> the details for the libraries available in the particular toolchain you
> have.
>
> Of course, when you need fast trig functions, it is usually best to have
> tables, interpolations, or other approximations that give you the
> required accuracy far faster than any IEEE-compliant standard library.
>
>
>
>> (In C++ one _could_ have a smart math library that would see the type
>> of the argument and call the correct library function -- I don't
>> believe that's done, and I could see it causing all sorts of trouble
>> when things get inadvertently up-cast to double as you maintain the
>> code.)
>
> If you are going for C++, you can easily create a wrapper around "float"
> that does not have any implicit conversions to double so that you can be
> sure to avoid accidents.
>
> Or you can use the gcc flags I mentioned above :-)
>
>
>> At least on the Cortex M4 part that I used, you needed to turn on the
>> math coprocessor -- it was not on by default. I'm not sure if this is
>> par for the course, or if it's just an ST thing. If you don't turn on
>> the math coprocessor then the first time you hit a coprocessor function
>> you'll get a fault and the machine will hang.
>>
Thanks for your comments David -- -Wdouble-promotion, specifically, was
flying under my radar. The next time that project is active I'll add it
in, to the benefit of future-me.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
Reply by David Brown ● September 25, 2016
On 24/09/16 21:56, Tim Wescott wrote:
> On Sat, 24 Sep 2016 09:19:58 -0700, Dave Nadler wrote:
>
>> Hi - Can anybody tell me for what, if anything, does GCC use the
>> single-precision hardware floating point units on some Cortex-M4 parts?
>> At all? Single-precision float only? Presumably not double?
>>
>> The PIC32MZ EF series parts have hardware double precision;
>> presumably GCC uses this for double?
>>
>> Thanks!
>> Best Regards, Dave
>
> Been there, done that, used correctly it works a charm:
>
> GCC uses the single-precision floating point processor if you tell it
> to. And yes, it uses it for single-precision float only, so the usual
> math library stuff (sin, cos, etc.) is not, to my knowledge, supported.
Yes, you need to tell gcc about the processor you are using:
-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16
You may also want to add "-fsingle-precision-constant" which makes gcc
treat floating point constants as single precision, so that "x * 2.5" is
done as single precision rather than double precision (without having to
write "x * 2.5f").
And you /definitely/ want to have "-Wdouble-promotion" to warn about any
implicit promotions to double that might have occurred accidentally.
(You can always cast to double, or assign values to double variables, if
you really want slow but accurate doubles in the code.)
As for the maths library, "sinf", "cosf", and friends are standard
functions for single-precision versions of the trigonometric functions.
But remember that there is no such thing as the "gcc library" - gcc
can come with a variety of libraries. Some of these will have good
implementations of "sinf" and friends, done using only single-precision
floating point in order to be fast on the M4F and similar processors.
Others simply convert their argument to double, call "sin", then
convert the result back to single. If you need to use standard library
maths functions, and you need the speed of single precision, then check
the details for the libraries available in the particular toolchain you
have.
Of course, when you need fast trig functions, it is usually best to have
tables, interpolations, or other approximations that give you the
required accuracy far faster than any IEEE-compliant standard library.
>
> (In C++ one _could_ have a smart math library that would see the type of
> the argument and call the correct library function -- I don't believe
> that's done, and I could see it causing all sorts of trouble when things
> get inadvertently up-cast to double as you maintain the code.)
If you are going for C++, you can easily create a wrapper around "float"
that does not have any implicit conversions to double so that you can be
sure to avoid accidents.
Or you can use the gcc flags I mentioned above :-)
>
> At least on the Cortex M4 part that I used, you needed to turn on the
> math coprocessor -- it was not on by default. I'm not sure if this is
> par for the course, or if it's just an ST thing. If you don't turn on
> the math coprocessor then the first time you hit a coprocessor function
> you'll get a fault and the machine will hang.
>
Reply by John Devereux ● September 25, 2016
Dave Nadler <drn@nadler.com> writes:
> Hi - Can anybody tell me for what, if anything, does GCC use the
> single-precision hardware floating point units on some Cortex-M4 parts?
> At all? Single-precision float only? Presumably not double?
>
> The PIC32MZ EF series parts have hardware double precision;
> presumably GCC uses this for double?
> On Saturday, September 24, 2016 at 4:03:22 PM UTC-4, Tim Wescott wrote:
>> On RTOS and floating point: every processor that I've ever worked with
>> that has hardware floating point, and even some floating point libraries
>> that I've used, have required that the RTOS has knowledge of the floating
>> point "stuff". In these cases, the floating point engine (HW or SW)
> needs to have its state saved off independently of the rest of the
>> processor and/or software. This means that task switches will be slower
>> and the task control blocks bigger -- so you have to take that into
>> account when you're choosing to use floating point. Note that most new
>> software floating point libraries _don't_ require special attention from
>> the RTOS -- but you never know. Caveat emptor. Speak softly, but carry
>> a big stick. 'ware dragons. Etc.
>
> Yup. I wrote an RTOS where the cost of saving/restoring the FP emulation
> status was too high, so limited FP to only one task. When other tasks are
> active the trap vectors used to implement FP are redirected to a fault,
> so accidents don't get past development.
You can similarly "hook" the individual entry points to a floating
point emulator/associated "helper routines" (common in supporting enhanced
data types in legacy compilers) to conditionally save/restore the floating
point context "as needed". I.e., the FPE takes the form of a "monitor".
[I wrote an old Z80 MTOS that does this -- allowing any task to use
floats/doubles without preconfiguring the task to have such access;
prior to this, I used to set a flag in each task "at development
time" indicating whether or not it was allowed to use floats so
a float task scheduled after another float task incurred the additional
context switch costs but non-float tasks did not]
> Cortex M4F can cause FP to fault except in a specific context to implement
> the above one-task-only, and can also do 'lazy stacking' where
> FP-context-switch cost is incurred only when required.
> I need to check to see if FreeRTOS supports these two options out of the box...
With foreknowledge of a particular compiler's handling of floats
and the nature of a floating point *emulation*, you can be much more
fine-grained in supporting only the necessary *portions* of the
FP state that need to be preserved (instead of unilaterally storing
and restoring the *entire* state).
In some cases, you can simply swap a "FP_context" pointer and let the
FP state reside in task-specific memory. (i.e., the FPE becomes reentrant)
Of course, this weds you to a particular implementation. But, for targets
without FP hardware, it allows reasonably high performance without
inconveniencing the developer: "Grrr, can't use foo() here cuz it
MIGHT invoke a floating point operation..."
You can also implement floating point operations as a fully shareable
*service* (at a higher cost); I use that for my BigRational package,
currently.
Reply by Dave Nadler ● September 24, 2016
On Saturday, September 24, 2016 at 4:03:22 PM UTC-4, Tim Wescott wrote:
> On RTOS and floating point: every processor that I've ever worked with
> that has hardware floating point, and even some floating point libraries
> that I've used, have required that the RTOS has knowledge of the floating
> point "stuff". In these cases, the floating point engine (HW or SW)
> needs to have its state saved off independently of the rest of the
> processor and/or software. This means that task switches will be slower
> and the task control blocks bigger -- so you have to take that into
> account when you're choosing to use floating point.
The typical approach is to handle the floating point unit independently
of the normal task state. On a task switch, enable the FPU trap. When/*if*
the new task attempts to run a floating point operation, the trap saves the
FP state in a "FP control block" (i.e., only tasks that use the FPU need
to bear the cost of that state plus the costs of storing/restoring it).
This can lead to some pathological behaviors, but no worse than storing and
restoring on every task switch (or than doing so only for tasks identified
a priori, at config time, as using the FPU).
This approach is particularly useful for providing a framework to support
*any* resource that can "co-execute" alongside the main CPU (i.e., shared
resources with asynchronous interfaces). E.g., in cases where the
FPU is an emulation library with potentially GREATER state than a genuine
FPU.
> Note that most new
> software floating point libraries _don't_ require special attention from
> the RTOS -- but you never know. Caveat emptor. Speak softly, but carry
> a big stick. 'ware dragons. Etc.