Cortex M4 Floating Point Size | page 2

Reply by Paul Rubin ● August 2, 2013

upsidedown@downunder.com writes:
> Floating point instructions in ISRs ? I have never encountered such
> ISRs.

Well I've heard of applications whose main loop consisted of a halt
instruction repeated endlessly.  All the functionality happened at
interrupt level.  No idea if they used floating point. :)

Reply by FreeRTOS info ● August 2, 2013

On 02/08/2013 05:38, upsidedown@downunder.com wrote:
> On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
> <no.email@nospam.invalid> wrote:
> 
>> info@quantum-leaps.com writes:
>>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>>> needs to be saved/restored if the FPU can be used in the ISRs or in
>>> tasks of a preemptive RTOS. 
>>
>> In Tim's application, I wonder whether the FPU can be exclusively used
>> by a single task, so nothing else touches the registers.  Is that a
>> reasonable approach?
> 
> Floating point instructions in ISRs ? I have never encountered such
> ISRs.
> 
> Why not use the same principle for some of the highest priority tasks
> and only below a certain priority level FP-register save/restore is
> performed. At the low levels, the full save/restore cost is not that
> significant, since these tasks typically execute for quite long times
> at once. Of course, this requires some hooks into the task scheduler,
> but should not be too hard to implement.
> 

Where in this thread does it say that the OP is using multitasking or a
task scheduler?

If multithreading is not being used then the Cortex-M4F will handle
everything for you by only saving the floating point registers when it
is absolutely necessary (the save being triggered by a floating point
instruction being executed - if you turn this functionality on).

If multithreading is being used then there are several different ways of
doing it...the best of which can only be determined when you know how
the application is using the FPU (from how many tasks, how often, etc.).

However, as per my previous post, I think this is quite off topic from a
question of "is it 32-bits or 64-bits" so probably not a helpful
discussion to the OP.

Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for microcontrollers. More than 103000 downloads in 2012.

+ http://www.FreeRTOS.org/plus
Trace, safety certification, FAT FS, TCP/IP, training, and more...

Reply by hamilton ● August 2, 2013

On 8/1/2013 10:38 PM, upsidedown@downunder.com wrote:
> On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
> <no.email@nospam.invalid> wrote:
>
>> info@quantum-leaps.com writes:
>>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>>> needs to be saved/restored if the FPU can be used in the ISRs or in
>>> tasks of a preemptive RTOS.
>>
>> In Tim's application, I wonder whether the FPU can be exclusively used
>> by a single task, so nothing else touches the registers.  Is that a
>> reasonable approach?
>
> Floating point instructions in ISRs ? I have never encountered such
> ISRs.

I did that years ago (1985) on the i286 w/floating point co-processor 
(i287).

3-axis vertical mill: at each 8 ms interrupt, a new position for one of 
the axes was run.

A simple mutex handled the FPU.

There was no RTOS involved, just a simple round robin of each axis.
All code was written with Turbo C.

Also did the same with a Z80 and an AM9511a co-processor before that.
This one used Microsoft BASIC.

hamilton

Reply by Tim Wescott ● August 2, 2013

On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin wrote:

> info@quantum-leaps.com writes:
>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>> needs to be saved/restored if the FPU can be used in the ISRs or in
>> tasks of a preemptive RTOS.
> 
> In Tim's application, I wonder whether the FPU can be exclusively used
> by a single task, so nothing else touches the registers.  Is that a
> reasonable approach?

It would.  I've thought of that.  At the moment the whole application is 
small enough that I'm planning on using a home-rolled cooperative 
multitasker that dodges the whole context-switch thing at the expense of 
weighing down the developer with the need to chop low-priority 
computations up into bits that are small enough that they don't bog down 
important tasks.  So the whole "can't RTOS" thing is moot for me at the 
moment.
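
As a rough illustration of the "chop it into bits" style described above, 
here is a minimal sketch in C. It is not Tim's multitasker: `task_t`, 
`sum_slice`, `run_once` and the chunk size are all made up for illustration. 
Each task does a small, bounded slice of work per call and then returns, so 
no context switch is ever needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task type: each call does one small slice of work and
 * returns true while it still has more to do. */
typedef bool (*task_fn)(void *state);

typedef struct {
    task_fn run;
    void   *state;
} task_t;

/* Example low-priority job: sum 0..limit-1, at most CHUNK terms per call. */
typedef struct {
    uint32_t next;
    uint32_t limit;
    uint64_t sum;
} sum_job_t;

enum { CHUNK = 16 };

static bool sum_slice(void *p)
{
    sum_job_t *job = p;
    for (uint32_t i = 0; i < CHUNK && job->next < job->limit; ++i) {
        job->sum += job->next++;
    }
    return job->next < job->limit;   /* true: call me again later */
}

/* One pass of the round-robin loop; returns how many tasks still have work. */
static size_t run_once(task_t *tasks, size_t count)
{
    size_t pending = 0;
    for (size_t i = 0; i < count; ++i) {
        if (tasks[i].run != NULL && tasks[i].run(tasks[i].state)) {
            ++pending;
        } else {
            tasks[i].run = NULL;     /* finished: drop from the rotation */
        }
    }
    return pending;
}
```

The cost is exactly the one mentioned above: the author has to pick a chunk 
size small enough that the longest slice does not delay the important tasks.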

As far as the "only one task gets the math processor", I've actually 
already been there, done that (sorta), with the ADSP 2101 using an RTOS.  
The ADSP 2101 has some hardware context associated with its DSP 
functionality that is simply not accessible via software (except by 
"push" and "pop" into very shallow hardware stacks).  It's not even a 
matter of "slow" -- it's "you can't, sucker".  So if you want to use its 
DSP features in an RTOS you're limited to doing it in one task.  (Well, 
one task and one ISR, thanks to those shallow stacks).

All the "regular processor" stuff can be context-switched just fine, 
however.  So we used the thing exactly that way: we had one task for the 
heavy lifting (running a spinning-wheel gyroscope that had to be in 
closed loop) with a bunch of tasks to make it play nice with the balance 
of the system.  That one magic control task was the _only_ task that got 
its fingers onto the MAC and associated instructions; everything else was 
kept away.

The board, by the way, worked great.

It would be harder to do this with the M4F.  Ironically, it's because the 
tools support floating point -- in the case of that 2101, the tools 
didn't know what to do with a MAC instruction and never generated one.  
So it was easy to tuck all the "DSP" stuff away in assembly language code 
that was only called from one C file.

I suppose it might be possible to compile just one or two magic files 
using the M4F switch, and compile the rest using the M4 switch (or 
whatever the GNU compiler supports -- that's my next task!!!).  If so, 
and if it works without weird namespace or other collisions, then I'd get 
software-synthesized math for most of the thing, and hardware math for 
the important stuff.
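
For what it's worth, here is a sketch of how such a mixed build might look 
with GCC for ARM. The assumption is that the FPU files are built with 
`-mfloat-abi=softfp` (FPU instructions, but floating-point arguments still 
passed in integer registers), which keeps them link-compatible with files 
built with `-mfloat-abi=soft`; the hard-float ABI does not mix with 
soft-float objects. `fp_build_kind` is a hypothetical helper, and `__ARM_FP` 
is only defined by reasonably recent compilers, so check your toolchain.

```c
/* Build-flag sketch (GCC for ARM; verify against your toolchain's docs):
 *   FPU files:   -mcpu=cortex-m4 -mthumb -mfloat-abi=softfp -mfpu=fpv4-sp-d16
 *   other files: -mcpu=cortex-m4 -mthumb -mfloat-abi=soft
 */

/* Hypothetical helper reporting how this translation unit was compiled. */
static const char *fp_build_kind(void)
{
#if defined(__ARM_FP)
    return "hardware FP instructions";      /* softfp or hard with an FPU */
#elif defined(__SOFTFP__)
    return "software floating point";       /* -mfloat-abi=soft */
#else
    return "not an ARM build (or unknown)"; /* e.g. compiled on a PC */
#endif
}
```

Dropping the same preprocessor check into the files that must stay off the 
FPU gives a cheap way to catch a wrong flag at compile time.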

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Tim Wescott ● August 2, 2013

On Fri, 02 Aug 2013 09:03:47 +0100, FreeRTOS info wrote:

> On 02/08/2013 05:38, upsidedown@downunder.com wrote:
>> On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
>> <no.email@nospam.invalid> wrote:
>> 
>>> info@quantum-leaps.com writes:
>>>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>>>> needs to be saved/restored if the FPU can be used in the ISRs or in
>>>> tasks of a preemptive RTOS.
>>>
>>> In Tim's application, I wonder whether the FPU can be exclusively used
>>> by a single task, so nothing else touches the registers.  Is that a
>>> reasonable approach?
>> 
>> Floating point instructions in ISRs ? I have never encountered such
>> ISRs.
>> 
>> Why not use the same principle for some of the highest priority tasks
>> and only below a certain priority level FP-register save/restore is
>> performed. At the low levels, the full save/restore cost is not that
>> significant, since these tasks typically execute for quite long times
>> at once. Of course, this requires some hooks into the task scheduler,
>> but should not be too hard to implement.
>> 
>> 
> 
> Where in this thread does it say that the OP is using multitasking or a
> task scheduler?
> 
> If multithreading is not being used then the Cortex-M4F will handle
> everything for you by only saving the floating point registers when it
> is absolutely necessary (the save being triggered by a floating point
> instruction being executed - if you turn this functionality on).
> 
> If multithreading is being used then there are several different ways of
> doing it...the best of which can only be determined when you know how
> the application is using the FPU (from how many tasks, how often, etc.).
> 
> However, as per my previous post, I think this is quite off topic from a
> question of "is it 32-bits or 64-bits" so probably not a helpful
> discussion to the OP.

Totally off topic, yes.  But still interesting, and useful in that I may 
get 32 bit to work for me, and the selection of a multithreaded OS isn't 
entirely off the table.  This side discussion has certainly put a pretty 
high bar on any multitasking OS that I do select, so it's useful in that 
regard.

As I mentioned elsewhere, I'm currently planning on using a cooperative 
multitasker because (a) I have it lying around, and (b) I'm the only 
author on this software, so I don't have to worry about some dip**** 
trying to compute pi to 100 decimal places in the lowest-priority task 
without yielding.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by ● August 2, 2013

Tim, Richard: To be strictly on topic, the whole discussion can be closed with just one number: 32, so all of the posts that go beyond this number are OT.

But I still believe that mentioning the "lazy stacking" feature of the Cortex-M4F FPU _is_ relevant, even in the absence of a preemptive RTOS or ISRs that use the FPU. I think it's good to know about "lazy stacking", because it is enabled by default (when you enable the FPU), so if you don't know about it, it can hit you with unexpectedly high stack usage. With "lazy stacking", exception entry reserves space on the stack for the FPU registers once the interrupted code has used the FPU, but the actual saving/restoring of the registers does not happen until the handler itself uses the FPU. This also has interesting implications for real-time behavior, because if an ISR uses the FPU, its timing will carry the penalty of stacking the FPU registers.
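
To put a number on the stack usage: as far as I recall from the ARMv7-M architecture documentation, the basic exception frame is 8 words, and the extended frame used when a floating-point context is active adds S0-S15, FPSCR and one reserved word. The sketch below just does that arithmetic; the type and function names are made up for illustration, and alignment padding is ignored.

```c
#include <stdint.h>

/* Basic exception frame: R0-R3, R12, LR, PC, xPSR. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

/* Extended frame: the basic frame plus S0-S15, FPSCR and one reserved word. */
typedef struct {
    basic_frame_t core;
    uint32_t s[16];
    uint32_t fpscr;
    uint32_t reserved;
} extended_frame_t;

enum {
    BASIC_FRAME_BYTES    = sizeof(basic_frame_t),    /* 8 words  */
    EXTENDED_FRAME_BYTES = sizeof(extended_frame_t)  /* 26 words */
};

/* Stack consumed by exception entry alone at a given nesting depth.
 * Simplification (an assumption of this sketch): every preempted level
 * either has an active FP context or none does. Alignment padding and
 * whatever the handlers push themselves are not counted. */
static uint32_t entry_stack_bytes(uint32_t nesting_depth, int fp_context_active)
{
    uint32_t frame = fp_context_active ? EXTENDED_FRAME_BYTES
                                       : BASIC_FRAME_BYTES;
    return nesting_depth * frame;
}
```

So each level of preemption costs 104 bytes instead of 32 once the FPU is in play, whether or not the registers are actually written to that space.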

Miro Samek
state-machine.com

Reply by ● August 2, 2013

Indeed, a traditional RTOS kernel that can block in multiple places in a task body probably cannot take advantage of the "lazy stacking" feature.

But a simpler class of run-to-completion preemptive kernels _can_ take advantage of "lazy stacking" and, in fact, this feature integrates seamlessly with this type of kernel. The use of the Cortex-M4F FPU with a preemptive QK kernel is described in Section 4.2 of the AppNote, available at: http://www.state-machine.com/arm/AN_QP_and_ARM-Cortex-M-IAR.pdf .

Miro Samek
state-machine.com/arm

Reply by ● August 2, 2013

On Thursday, August 1, 2013 11:08:05 PM UTC-4, Paul Rubin wrote:
> In Tim's application, I wonder whether the FPU can be exclusively used
> by a single task, so nothing else touches the registers.  Is that a
> reasonable approach?

Yes, this is the most efficient use of the FPU. In this case, you can disable "lazy stacking" to save stack space. The CMSIS-compliant code for disabling "lazy stacking" is:

    FPU->FPCCR &= ~((1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos));

Miro Samek
state-machine.com/arm

Reply by dp ● August 2, 2013

On Wednesday, July 31, 2013 8:59:39 PM UTC+3, Jim Stewart wrote:
> ....
>
> Just out of idle curiosity, what kind of an
> application might require 64 bit floating point?

Oh more than those which can use 32 bits for sure.
For example, if you will be DSP-ing (that is, doing lots of MACs),
32-bit FP is just useless; the 24-bit mantissa begins
to lose data before you know it.

32 bit FP can be useful of course but not a lot if
the FPU is constrained to 32-bit only. If it has both 32 and 64
one tends to use both, well, at least I tend to do so.
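
A small sketch of the mantissa effect in C, runnable on a PC. The function
names are made up for illustration. It accumulates ones, which is about the
simplest accumulate loop there is: once a single-precision sum reaches
2^24, adding 1.0f no longer changes it, while a double-precision sum keeps
counting.

```c
#include <stdint.h>

/* Accumulate `count` ones in single precision. volatile forces each partial
 * sum to be rounded to a real 32-bit float, even on hosts that keep
 * intermediates in wider registers. */
static float sum_ones_f32(uint32_t count)
{
    volatile float acc = 0.0f;
    for (uint32_t i = 0; i < count; ++i) {
        acc = acc + 1.0f;
    }
    return acc;
}

/* Same loop with a 64-bit accumulator (53-bit mantissa). */
static double sum_ones_f64(uint32_t count)
{
    volatile double acc = 0.0;
    for (uint32_t i = 0; i < count; ++i) {
        acc = acc + 1.0;
    }
    return acc;
}
```

Calling both with a count of 16777224 (2^24 + 8) leaves the float result
stuck at 16777216, while the double result is exact. Whether that matters
depends on how long the accumulation runs and on the dynamic range of the
data.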

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Reply by dp ● August 2, 2013

On Friday, August 2, 2013 8:27:42 PM UTC+3, in...@quantum-leaps.com wrote:
> Indeed, a traditional RTOS kernel that can block in multiple places in a task body probably cannot take advantage of the "lazy stacking" feature.
>
> But a simpler class of run-to-completion preemptive kernels _can_ take advantage of the "lazy stacking" and, in fact, this feature integrates very seamlessly with this type of kernels. The use of the Cortex-M4F FPU with a preemptive QK kernel is described in Section 4.2 of the AppNote, available at: http://www.state-machine.com/arm/AN_QP_and_ARM-Cortex-M-IAR.pdf .
>
> Miro Samek
> state-machine.com/arm

Or, if an OS is well written, it does allow the tasks to
switch FPU saving on/off when needed - like I do under DPS
all the time: a function that needs the FPU calls "fpuon$", which returns
the former state of "fpu" for that task. On return from the function, if the
former state was off, it switches it off again; otherwise it leaves it on.
So FPU registers are saved during task switch only when
necessary.
This is not applicable to IRQ handlers, of course, but I can
think of no IRQ handler I ever wrote for what, nearly 30 years,
which needs/uses FPU.
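
The save-and-restore-the-former-state pattern described above can be
sketched in C roughly like this. It is not DPS code: `task_ctx_t`, `fpu_on`,
`fpu_restore` and `scaled` are hypothetical names, and the scheduler that
would read the flag at task-switch time is not shown.

```c
#include <stdbool.h>

/* Hypothetical per-task control block; only the FPU flag matters here. */
typedef struct {
    bool fpu_save_enabled;   /* true: scheduler saves/restores FPU registers */
} task_ctx_t;

/* Turn FPU context saving on for this task and return the former state. */
static bool fpu_on(task_ctx_t *task)
{
    bool former = task->fpu_save_enabled;
    task->fpu_save_enabled = true;
    return former;
}

/* Put the flag back to whatever fpu_on() reported. */
static void fpu_restore(task_ctx_t *task, bool former)
{
    task->fpu_save_enabled = former;
}

/* Example routine that needs the FPU only for its own duration. */
static float scaled(task_ctx_t *task, float x)
{
    bool former = fpu_on(task);
    float y = x * 1.5f;
    fpu_restore(task, former);   /* stays on if the caller already had it on */
    return y;
}
```

Because each routine restores the state it found, nested calls compose
cleanly: the flag goes back to off only when the outermost user is done.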

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
