Cortex M4 Floating Point Size

I am, apparently, incompetent at reading data sheets.

At least when they get up to several hundred pages.

Do Cortex M4 parts deal with 64-bit floating point in hardware, or just 
32-bit?

Thanks...

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Roberto Waltman ●July 30, 2013

 Tim Wescott  wrote:

>Do Cortex M4 parts deal with 64-bit floating point in hardware, or just 
>32-bit?

32, I believe.

From the Cortex-M4 reference manual
(DDI0439D_cortex_m4_processor_r0p1_trm.pdf):

"2.1 About the functions
Optional Floating Point Unit (FPU) providing:
* 32-bit instructions for single-precision (C float) data-processing
operations.
* Combined Multiply and Accumulate instructions for increased
precision (Fused MAC).
* Hardware support for conversion, addition, subtraction,
multiplication with optional accumulate, division, and square-root.
* Hardware support for denormals and all IEEE rounding modes.
* 32 dedicated 32-bit single precision registers, also addressable as
16 double-word registers.
* Decoupled three stage pipeline."

"7.1 - About the FPU
The Cortex-M4 FPU is an implementation of the single precision variant
of the ARMv7-M Floating-Point Extension (FPv4-SP).
It provides floating-point computation functionality that is compliant
with the ANSI/IEEE Std 754-2008, IEEE Standard for Binary
Floating-Point Arithmetic, referred to as the IEEE 754 standard.
The FPU supports all single-precision data-processing instructions and
data types described in the ARM®v7-M Architecture Reference Manual"

And from infocenter.arm.com: 
"ARMv7-M Architecture Reference Manual
...
This document is only available ... to registered ARM customers."
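
One practical consequence of a single-precision-only FPU is that any arithmetic the C compiler types as double falls back to software routines. A minimal sketch of how that can happen by accident (the function names here are made up for illustration):

```c
/* Single precision throughout: the 'f' suffix keeps the constant a float,
 * so on an M4F the multiply can be done by the FPU. */
float scale_single(float x)
{
    return x * 0.1f;
}

/* The unsuffixed constant 0.1 has type double, so x is promoted and the
 * multiply is done in double precision. On a single-precision-only FPU
 * that means a software library call, then a conversion back to float. */
float scale_promoted(float x)
{
    return x * 0.1;
}
```

With GCC, -Wdouble-promotion flags the second form. The usual target options for an M4F are along the lines of -mfpu=fpv4-sp-d16 with -mfloat-abi=hard or softfp, but check your toolchain's documentation.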
--
Roberto Waltman

[ Please reply to the group,
  return address is invalid ]

Reply by Tim Wescott ●July 30, 2013

On Tue, 30 Jul 2013 17:37:06 -0400, Roberto Waltman wrote:

> Tim Wescott  wrote:
> 
>>Do Cortex M4 parts deal with 64-bit floating point in hardware, or just
>>32-bit?
> 
> 
> 32, I believe.
> 
> From the Cortex-M4 reference manual (
> DDI0439D_cortex_m4_processor_r0p1_trm.pdf
> 
> "2.1 About the functions Optional Floating Point Unit (FPU) providing:
> * 32-bit instructions for single-precision (C float) data-processing
> operations.
> * Combined Multiply and Accumulate instructions for increased precision
> (Fused MAC).
> * Hardware support for conversion, addition, subtraction, multiplication
> with optional accumulate, division, and square-root.
> * Hardware support for denormals and all IEEE rounding modes.
> * 32 dedicated 32-bit single precision registers, also addressable as 16
> double-word registers.
> * Decoupled three stage pipeline."
> 
> 
> "7.1 - About the FPU The Cortex-M4 FPU is an implementation of the
> single precision variant of the ARMv7-M Floating-Point Extension
> (FPv4-SP).
> It provides floating-point computation functionality that is compliant
> with the ANSI/IEEE Std 754-2008, IEEE Standard for Binary Floating-Point
> Arithmetic, referred to as the IEEE 754 standard.
> The FPU supports all single-precision data-processing instructions and
> data types described in the ARM®v7-M Architecture Reference Manual"
> 
> 
> And from infocenter.arm.com:
> "ARMv7-M Architecture Reference Manual ...
> This document is only available ... to registered ARM customers."

Crud.

Thanks.

I guess I test my algorithm with 32-bit arithmetic and see how it flies, 
then.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by ●July 30, 2013

Tim Wescott <tim@seemywebsite.really> wrote:
> I am, apparently, incompetent at reading data sheets.
> 
> At least when they get up to several hundred pages.
> 
> Do Cortex M4 parts deal with 64-bit floating point in hardware, or just 
> 32-bit?

In addition to the information Roberto posted, it may be worth keeping 
in mind that the parts with the FPU are "Cortex-M4F", and the parts 
without are plain "Cortex-M4". At least some of Freescale's Kinetis 
parts are of the latter kind.

-a

Reply by Jim Stewart ●July 31, 2013

Tim Wescott wrote:
> On Tue, 30 Jul 2013 17:37:06 -0400, Roberto Waltman wrote:
>
>> Tim Wescott  wrote:
>>
>>> Do Cortex M4 parts deal with 64-bit floating point in hardware, or just
>>> 32-bit?
>>
>>
>> 32, I believe.
>>
>>  From the Cortex-M4 reference manual (
>> DDI0439D_cortex_m4_processor_r0p1_trm.pdf
>>
>> "2.1 About the functions Optional Floating Point Unit (FPU) providing:
>> * 32-bit instructions for single-precision (C float) data-processing
>> operations.
>> * Combined Multiply and Accumulate instructions for increased precision
>> (Fused MAC).
>> * Hardware support for conversion, addition, subtraction, multiplication
>> with optional accumulate, division, and square-root.
>> * Hardware support for denormals and all IEEE rounding modes.
>> * 32 dedicated 32-bit single precision registers, also addressable as 16
>> double-word registers.
>> * Decoupled three stage pipeline."
>>
>>
>> "7.1 - About the FPU The Cortex-M4 FPU is an implementation of the
>> single precision variant of the ARMv7-M Floating-Point Extension
>> (FPv4-SP).
>> It provides floating-point computation functionality that is compliant
>> with the ANSI/IEEE Std 754-2008, IEEE Standard for Binary Floating-Point
>> Arithmetic, referred to as the IEEE 754 standard.
>> The FPU supports all single-precision data-processing instructions and
>> data types described in the ARM®v7-M Architecture Reference Manual"
>>
>>
>> And from infocenter.arm.com:
>> "ARMv7-M Architecture Reference Manual ...
>> This document is only available ... to registered ARM customers."
>
> Crud.
>
> Thanks.
>
> I guess I test my algorithm with 32-bit arithmetic and see how it flies,
> then.

Just out of idle curiosity, what kind of an
application might require 64 bit floating point?

Reply by Tim Wescott ●July 31, 2013

On Wed, 31 Jul 2013 10:59:39 -0700, Jim Stewart wrote:

> Tim Wescott wrote:
>> On Tue, 30 Jul 2013 17:37:06 -0400, Roberto Waltman wrote:
>>
>>> Tim Wescott  wrote:
>>>
>>>> Do Cortex M4 parts deal with 64-bit floating point in hardware, or
>>>> just 32-bit?
>>>
>>>
>>> 32, I believe.
>>>
>>>  From the Cortex-M4 reference manual (
>>> DDI0439D_cortex_m4_processor_r0p1_trm.pdf
>>>
>>> "2.1 About the functions Optional Floating Point Unit (FPU) providing:
>>> * 32-bit instructions for single-precision (C float) data-processing
>>> operations.
>>> * Combined Multiply and Accumulate instructions for increased
>>> precision (Fused MAC).
>>> * Hardware support for conversion, addition, subtraction,
>>> multiplication with optional accumulate, division, and square-root.
>>> * Hardware support for denormals and all IEEE rounding modes.
>>> * 32 dedicated 32-bit single precision registers, also addressable as
>>> 16 double-word registers.
>>> * Decoupled three stage pipeline."
>>>
>>>
>>> "7.1 - About the FPU The Cortex-M4 FPU is an implementation of the
>>> single precision variant of the ARMv7-M Floating-Point Extension
>>> (FPv4-SP).
>>> It provides floating-point computation functionality that is compliant
>>> with the ANSI/IEEE Std 754-2008, IEEE Standard for Binary
>>> Floating-Point Arithmetic, referred to as the IEEE 754 standard.
>>> The FPU supports all single-precision data-processing instructions and
>>> data types described in the ARM®v7-M Architecture Reference Manual"
>>>
>>>
>>> And from infocenter.arm.com:
>>> "ARMv7-M Architecture Reference Manual ...
>>> This document is only available ... to registered ARM customers."
>>
>> Crud.
>>
>> Thanks.
>>
>> I guess I test my algorithm with 32-bit arithmetic and see how it
>> flies,
>> then.
> 
> Just out of idle curiosity, what kind of an application might require 64
> bit floating point?

Most control loops that need any precision won't work quite right with 32 
bit floating point.  You need more than the 25 bits worth of mantissa 
that comes with single-precision floating point (32 bit fixed-point often 
works quite well, however).  If you're just spinning a motor then you can 
get by, but if you've got a PID loop with 16-bit or better inputs and a 
high sampling rate to bandwidth ratio, then you need integrators with 
more than 25 bits worth of precision.
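
The integrator problem can be sketched on a PC. IEEE 754 single precision carries a 24-bit significand (25 bits if the sign is counted, which seems to be the figure used above), so an increment smaller than half a unit in the last place of the accumulator is simply rounded away. The function names and the numbers below are illustrative assumptions, not taken from any real loop:

```c
/* Add `step` to an accumulator `n` times in single precision. */
float integrate_f32(float start, float step, long n)
{
    float acc = start;
    for (long i = 0; i < n; ++i) {
        acc += step;    /* result rounded to a 24-bit significand */
    }
    return acc;
}

/* The same accumulation in double precision (53-bit significand). */
double integrate_f64(double start, double step, long n)
{
    double acc = start;
    for (long i = 0; i < n; ++i) {
        acc += step;
    }
    return acc;
}
```

Starting from 1.0 with a step of 2^-25 (1.0f / 33554432.0f) and 2^20 iterations, the float accumulator never moves off 1.0, while the double accumulator reaches 1.03125. This assumes float expressions are evaluated in single precision, as they are on an M4F or on x86-64 with SSE.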

In this case it's a Kalman filter application.  It may work with 32 bits, 
but I haven't tested it against the data that I have, and it'll be 
tight.  So either I'll need to rearrange the algorithm (Kalman filters 
can use a "square root" algorithm that basically halves the required 
precision in the most sensitive areas, in return for a whole bunch of 
extra, and extra-weird, math) or re-think my processor choice.

Sigh...

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by ●July 31, 2013

The single-precision FPU of Cortex-M4F needs to be enabled before it is used (the FPU is disabled out of reset). Typically the FPU is enabled in the startup code, but you need to check to be sure.
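
A minimal sketch of the enable step: the FPU is switched on by granting full access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register (CPACR, at 0xE000ED88). The function name is made up, and passing the register address as a pointer is only an assumption to let the logic be exercised off-target. Vendor startup code normally does this through the CMSIS SCB->CPACR definition instead.

```c
#include <stdint.h>

#define CPACR_ADDR       0xE000ED88u    /* Coprocessor Access Control Register */
#define CPACR_CP10_CP11  (0xFu << 20)   /* 0b11 (full access) for CP10 and CP11 */

/* Grant full access to CP10/CP11, which enables the FPU. */
void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11;
#if defined(__arm__)
    /* Make sure the write has taken effect before any FP instruction runs. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}
```

On the target the call would be fpu_enable((volatile uint32_t *)CPACR_ADDR), made before the first floating-point instruction executes.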

Also, the FPU in Cortex-M4F comes with its own register bank, which needs to be saved/restored if the FPU can be used in the ISRs or in tasks of a preemptive RTOS. The need for saving/restoring this context is a huge penalty for using the FPU in such circumstances. To reduce this (unacceptable really) overhead, ARM has introduced the feature called "lazy stacking" described in the ARM App Note: http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf . Lazy stacking of FPU registers is enabled by default.
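
Whether lazy stacking is actually in effect can be checked from the Floating-Point Context Control Register (FPCCR, at 0xE000EF34): ASPEN (bit 31) enables automatic FP state preservation and LSPEN (bit 30) enables the lazy variant. Both are set out of reset. A small sketch, with a made-up helper name, that takes the register value as an argument so it can also be run off-target:

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCCR_ADDR   0xE000EF34u   /* Floating-Point Context Control Register */
#define FPCCR_ASPEN  (1u << 31)    /* automatic FP state preservation */
#define FPCCR_LSPEN  (1u << 30)    /* lazy FP state preservation */

/* True when both automatic and lazy FP state preservation are enabled. */
bool lazy_stacking_enabled(uint32_t fpccr)
{
    const uint32_t mask = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr & mask) == mask;
}
```

On the target the argument would be *(volatile uint32_t *)FPCCR_ADDR.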

Miro Samek
state-machine.com/arm

Reply by FreeRTOS info ●August 1, 2013

On 01/08/2013 02:35, info@quantum-leaps.com wrote:
> The single-precision FPU of Cortex-M4F needs to be enabled before it is used (the FPU is disabled out of reset). Typically the FPU is enabled in the startup code, but you need to check to be sure.
> 
> Also, the FPU in Cortex-M4F comes with its own register bank, which needs to be saved/restored if the FPU can be used in the ISRs or in tasks of a preemptive RTOS. The need for saving/restoring this context is a huge penalty for using the FPU in such circumstances. To reduce this (unacceptable really) overhead, ARM has introduced the feature called "lazy stacking" described in the ARM App Note: http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf . Lazy stacking of FPU registers is enabled by default.
> 
> Miro Samek
> state-machine.com/arm
> 

We seem to have gone off the topic of the OP, but...


[hardware] lazy stacking breaks down when using a true multi-threaded
OS, requiring the FPU registers to be saved on a task context switch.
The reason being, the lazy stacking algorithm [obviously] cannot be
aware of the kernel's radical stack pointer manipulation - it can only
be aware of predictable stack pointer increments and decrements.
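
A common pattern for keeping that cost down is to have the context switch handler look at the EXC_RETURN value in LR. Bit 4 is clear when the interrupted task had an active floating-point context (an extended stack frame), and only then are S16-S31 saved by software. A sketch of that test in C, with a made-up helper name (a real port would do this in a few lines of assembly inside the handler):

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_FTYPE  (1u << 4)   /* 0 = extended (FP) frame, 1 = basic frame */

/* True when the interrupted context stacked an extended frame, meaning the
 * task has used the FPU and S16-S31 must be saved by software as well. */
bool frame_has_fp_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE) == 0u;
}
```

For example, 0xFFFFFFED (thread mode, process stack, extended frame) gives true, while 0xFFFFFFFD (thread mode, process stack, basic frame) gives false.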

Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for microcontrollers. More than 103000 downloads in 2012.

+ http://www.FreeRTOS.org/plus
Trace, safety certification, FAT FS, TCP/IP, training, and more...

Reply by Paul Rubin ●August 2, 2013

info@quantum-leaps.com writes:
> Also, the FPU in Cortex-M4F comes with its own register bank, which
> needs to be saved/restored if the FPU can be used in the ISRs or in
> tasks of a preemptive RTOS. 

In Tim's application, I wonder whether the FPU can be exclusively used
by a single task, so nothing else touches the registers.  Is that a
reasonable approach?

Reply by ●August 2, 2013

On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
<no.email@nospam.invalid> wrote:

>info@quantum-leaps.com writes:
>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>> needs to be saved/restored if the FPU can be used in the ISRs or in
>> tasks of a preemptive RTOS. 
>
>In Tim's application, I wonder whether the FPU can be exclusively used
>by a single task, so nothing else touches the registers.  Is that a
>reasonable approach?

Floating point instructions in ISRs? I have never encountered such
ISRs.

Why not apply the same principle to some of the highest priority tasks,
and perform FP-register save/restore only below a certain priority
level? At the low levels, the full save/restore cost is not that
significant, since these tasks typically execute for quite long times
at once. Of course, this requires some hooks into the task scheduler,
but it should not be too hard to implement.
