Cortex M4 Floating Point Size

Started by Tim Wescott July 30, 2013
I am, apparently, incompetent at reading data sheets.

At least when they get up to several hundred pages.

Do Cortex M4 parts deal with 64-bit floating point in hardware, or just 
32-bit?

Thanks...

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Tim Wescott wrote:

> Do Cortex M4 parts deal with 64-bit floating point in hardware, or just
> 32-bit?
32, I believe.

From the Cortex-M4 reference manual (DDI0439D_cortex_m4_processor_r0p1_trm.pdf):

"2.1 About the functions

Optional Floating Point Unit (FPU) providing:
* 32-bit instructions for single-precision (C float) data-processing operations.
* Combined Multiply and Accumulate instructions for increased precision (Fused MAC).
* Hardware support for conversion, addition, subtraction, multiplication with optional accumulate, division, and square-root.
* Hardware support for denormals and all IEEE rounding modes.
* 32 dedicated 32-bit single precision registers, also addressable as 16 double-word registers.
* Decoupled three stage pipeline."

"7.1 - About the FPU

The Cortex-M4 FPU is an implementation of the single precision variant of the ARMv7-M Floating-Point Extension (FPv4-SP). It provides floating-point computation functionality that is compliant with the ANSI/IEEE Std 754-2008, IEEE Standard for Binary Floating-Point Arithmetic, referred to as the IEEE 754 standard. The FPU supports all single-precision data-processing instructions and data types described in the ARM®v7-M Architecture Reference Manual"

And from infocenter.arm.com:

"ARMv7-M Architecture Reference Manual ...
This document is only available ... to registered ARM customers."

-- 
Roberto Waltman

[ Please reply to the group, return address is invalid ]
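Editor's sketch: besides reading the manual, a build can check what the compiler believes about the target. The ACLE (ARM C Language Extensions) macro `__ARM_FP` is a bitmask of the precisions supported in hardware: bit 1 for half, bit 2 for single, bit 3 for double. The helper names below are hypothetical, and the sketch assumes an ACLE-aware compiler; on any other host the macro is simply undefined.

```c
#include <stdbool.h>

/* Bit assignments of the ACLE __ARM_FP macro. */
#define ARM_FP_HALF   0x2u  /* 16-bit half precision   */
#define ARM_FP_SINGLE 0x4u  /* 32-bit single precision */
#define ARM_FP_DOUBLE 0x8u  /* 64-bit double precision */

/* Hypothetical helpers: decode an __ARM_FP-style bitmask. */
bool fp_mask_has_single(unsigned mask)
{
    return (mask & ARM_FP_SINGLE) != 0u;
}

bool fp_mask_has_double(unsigned mask)
{
    return (mask & ARM_FP_DOUBLE) != 0u;
}

/* Returns 1 or 0 when the compiler defines __ARM_FP,
 * and -1 when it does not (soft-float build or non-ARM host). */
int target_has_hw_double(void)
{
#if defined(__ARM_FP)
    return fp_mask_has_double(__ARM_FP) ? 1 : 0;
#else
    return -1;
#endif
}
```

For a single-precision-only FPU such as the one quoted above, the double bit should be clear, so `double` arithmetic would be carried out by software library routines rather than FPU instructions.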
On Tue, 30 Jul 2013 17:37:06 -0400, Roberto Waltman wrote:

> 32, I believe.

Crud.

Thanks.

I guess I test my algorithm with 32-bit arithmetic and see how it flies, then.

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Tim Wescott <tim@seemywebsite.really> wrote:
> I am, apparently, incompetent at reading data sheets.
>
> At least when they get up to several hundred pages.
>
> Do Cortex M4 parts deal with 64-bit floating point in hardware, or just
> 32-bit?

In addition to the information Roberto posted, it may be worth keeping in mind that the parts with the FPU are "Cortex-M4F", and the parts without are plain "Cortex-M4". At least some of Freescale's Kinetis parts are of the latter kind.

-a
Tim Wescott wrote:
> I guess I test my algorithm with 32-bit arithmetic and see how it flies,
> then.

Just out of idle curiosity, what kind of an application might require 64 bit floating point?
On Wed, 31 Jul 2013 10:59:39 -0700, Jim Stewart wrote:

> Just out of idle curiosity, what kind of an application might require 64
> bit floating point?

Most control loops that need any precision won't work quite right with 32 bit floating point. You need more than the 25 bits worth of mantissa that comes with single-precision floating point (32 bit fixed-point often works quite well, however). If you're just spinning a motor then you can get by, but if you've got a PID loop with 16-bit or better inputs and a high sampling rate to bandwidth ratio, then you need integrators with more than 25 bits worth of precision.

In this case it's a Kalman filter application. It may work with 32 bits, but I haven't tested it against the data that I have, and it'll be tight. So either I'll need to rearrange the algorithm (Kalman filters can use a "square root" algorithm that basically halves the required precision in the most sensitive areas, in return for a whole bunch of extra, and extra-weird, math) or re-think my processor choice.

Sigh...

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
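Editor's sketch of the integrator problem described above. An IEEE 754 single holds a 24-bit significand plus a sign bit, so once an integrator state is near 1.0, any per-sample contribution smaller than 2^-24 is rounded away. The function names and the chosen numbers (a state of 1.0 and a step of 2^-25) are illustrative assumptions, not values from the post.

```c
#include <stddef.h>

/* Add `step` to an integrator `n` times using single precision.
 * `volatile` forces each partial sum to be rounded to a 32-bit float,
 * even on hosts that evaluate expressions in wider registers. */
float integrate_f32(float start, float step, size_t n)
{
    volatile float acc = start;
    for (size_t i = 0; i < n; ++i) {
        acc = acc + step;
    }
    return acc;
}

/* The same accumulation in double precision. */
double integrate_f64(double start, double step, size_t n)
{
    volatile double acc = start;
    for (size_t i = 0; i < n; ++i) {
        acc = acc + step;
    }
    return acc;
}
```

With `start = 1.0` and `step = 1.0 / 33554432.0` (that is, 2^-25), the single-precision integrator never moves off 1.0 no matter how many samples are added, while the double-precision one accumulates every step. A 32-bit fixed-point accumulator avoids the stall as well, which matches the remark that fixed point often works.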
The single-precision FPU of Cortex-M4F needs to be enabled before it is used (the FPU is disabled out of reset). Typically the FPU is enabled in the startup code, but you need to check to be sure.
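Editor's sketch of the enable step. On ARMv7-M the FPU is reached through coprocessors CP10 and CP11, and access is granted by setting bits 20 to 23 of the Coprocessor Access Control Register (CPACR) at address 0xE000ED88. The function names are hypothetical, and the register is passed as a pointer so the logic can also be exercised on a desktop host.

```c
#include <stdint.h>

#define CPACR_ADDR      0xE000ED88u   /* Coprocessor Access Control Register */
#define CPACR_CP10_CP11 (0xFu << 20)  /* full access to CP10 and CP11        */

/* Grant full access to the FPU coprocessors in the register at `cpacr`. */
void fpu_enable_at(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11;
    /* On real hardware, DSB and ISB barrier instructions should follow
     * here so the new setting is in effect before the first FPU
     * instruction executes. They are omitted to keep this host-portable. */
}

/* Nonzero if both CP10 and CP11 are set to full access. */
int fpu_is_enabled_at(const volatile uint32_t *cpacr)
{
    return (*cpacr & CPACR_CP10_CP11) == CPACR_CP10_CP11;
}
```

On a target this would be called as `fpu_enable_at((volatile uint32_t *)CPACR_ADDR);` early in startup. Checking the vendor's startup code for an equivalent write to CPACR is the quickest way to see whether it is already being done.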

Also, the FPU in Cortex-M4F comes with its own register bank, which needs to be saved/restored if the FPU can be used in the ISRs or in tasks of a preemptive RTOS. The need for saving/restoring this context is a huge penalty for using the FPU in such circumstances. To reduce this (unacceptable really) overhead, ARM has introduced the feature called "lazy stacking" described in the ARM App Note: http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf . Lazy stacking of FPU registers is enabled by default.

Miro Samek
state-machine.com/arm
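Editor's sketch for checking the lazy stacking setting. The behavior is controlled by two bits in the Floating-Point Context Control Register (FPCCR) at 0xE000EF34: ASPEN (bit 31, automatic state preservation) and LSPEN (bit 30, lazy state preservation). Both are set out of reset, which is the default mentioned above. The function names are hypothetical.

```c
#include <stdint.h>

#define FPCCR_ADDR  0xE000EF34u   /* Floating-Point Context Control Register */
#define FPCCR_ASPEN (1u << 31)    /* automatic FP state preservation         */
#define FPCCR_LSPEN (1u << 30)    /* lazy FP state preservation              */

/* Nonzero if the hardware reserves stack space for FP state on exception
 * entry whenever the interrupted code has used the FPU. */
int fpccr_auto_preserve_on(uint32_t fpccr)
{
    return (fpccr & FPCCR_ASPEN) != 0u;
}

/* Nonzero if lazy stacking is active: space is reserved on exception
 * entry, but the FP registers are only written to it if the handler
 * itself executes an FP instruction. */
int fpccr_lazy_stacking_on(uint32_t fpccr)
{
    const uint32_t both = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr & both) == both;
}
```

On a target the value would be read with `*(volatile uint32_t *)FPCCR_ADDR`; the reset value 0xC0000000 reports lazy stacking as on.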
On 01/08/2013 02:35, info@quantum-leaps.com wrote:
> Lazy stacking of FPU registers is enabled by default.

We seem to have gone off the topic of the OP, but... [hardware] lazy stacking breaks down when using a true multi-threaded OS, requiring the FPU registers to be saved on a task context switch. The reason being, the lazy stacking algorithm [obviously] cannot be aware of the kernel's radical stack pointer manipulation - it can only be aware of predictable stack pointer increments and decrements.

Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for microcontrollers. More than 103000 downloads in 2012.

+ http://www.FreeRTOS.org/plus
Trace, safety certification, FAT FS, TCP/IP, training, and more...
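Editor's sketch of how a context switch can tell whether the outgoing task has FP state to save. On a Cortex-M4F, bit 4 of the EXC_RETURN value in LR is 0 when the hardware stacked the extended frame (which includes s0-s15 and FPSCR) and 1 for the basic frame. A kernel can test that bit and save s16-s31 in software only when needed. The helper names are hypothetical, and this is one possible approach rather than a description of any particular kernel.

```c
#include <stdint.h>

/* EXC_RETURN bit 4: 1 = basic frame, 0 = extended frame with FP state. */
#define EXC_RETURN_BASIC_FRAME (1u << 4)

/* Nonzero if the interrupted context had active FP state. */
int exc_return_has_fp_frame(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME) == 0u;
}

/* Number of 32-bit words the switch code must save in software:
 * r4-r11 always (8 words), plus s16-s31 (16 words) for FP tasks.
 * The hardware has already stacked the caller-saved registers. */
unsigned sw_saved_words(uint32_t exc_return)
{
    return 8u + (exc_return_has_fp_frame(exc_return) ? 16u : 0u);
}
```

For example, 0xFFFFFFFD (return to thread mode on the process stack, no FP state) needs 8 words, while 0xFFFFFFED (same, with FP state) needs 24. Tasks that never touch the FPU therefore pay nothing extra.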
info@quantum-leaps.com writes:
> Also, the FPU in Cortex-M4F comes with its own register bank, which
> needs to be saved/restored if the FPU can be used in the ISRs or in
> tasks of a preemptive RTOS.

In Tim's application, I wonder whether the FPU can be exclusively used by a single task, so nothing else touches the registers. Is that a reasonable approach?
On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin <no.email@nospam.invalid> wrote:

> info@quantum-leaps.com writes:
>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>> needs to be saved/restored if the FPU can be used in the ISRs or in
>> tasks of a preemptive RTOS.
>
> In Tim's application, I wonder whether the FPU can be exclusively used
> by a single task, so nothing else touches the registers. Is that a
> reasonable approach?

Floating point instructions in ISRs? I have never encountered such ISRs.

Why not use the same principle for some of the highest-priority tasks, and perform the FP register save/restore only below a certain priority level? At the low levels, the full save/restore cost is not that significant, since these tasks typically execute for quite a long time at once.

Of course, this requires some hooks into the task scheduler, but it should not be too hard to implement.
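Editor's sketch of such a scheduler hook, under stated assumptions: tasks at or above a threshold priority are trusted never to use the FPU, so switching to them leaves the FP registers alone, and the registers are swapped only when a different FP-using task is about to run. Every name here (`struct task`, `struct fp_sched`, `fp_switch_hook`, the threshold value) is hypothetical, and the `bank` array merely stands in for the real register file so the logic can run on a host.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FP_WORDS         33u  /* s0-s31 plus FPSCR                        */
#define FP_FREE_PRIORITY 8u   /* assumed: tasks at or above never use FP  */

struct task {
    unsigned priority;        /* assumed: larger number = more urgent     */
    uint32_t fp_ctx[FP_WORDS];
};

struct fp_sched {
    struct task *owner;       /* task whose FP state is live in `bank`    */
    uint32_t bank[FP_WORDS];  /* stand-in for the hardware FP registers   */
};

bool task_needs_fp_switch(const struct task *t)
{
    return t->priority < FP_FREE_PRIORITY;
}

/* Call on every context switch, before `next` starts running. */
void fp_switch_hook(struct fp_sched *s, struct task *next)
{
    if (!task_needs_fp_switch(next) || next == s->owner) {
        return;  /* FP-free task, or state already live: no copying */
    }
    if (s->owner != NULL) {
        memcpy(s->owner->fp_ctx, s->bank, sizeof s->bank);  /* save old */
    }
    memcpy(s->bank, next->fp_ctx, sizeof s->bank);          /* load new */
    s->owner = next;
}
```

The design relies entirely on the high-priority tasks (and any ISRs) keeping their promise not to execute FP instructions; a compiler that silently emits FP code in one of them would corrupt the owner's state.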