
Cortex M4 Floating Point Size

Started by Tim Wescott July 30, 2013
upsidedown@downunder.com writes:
> Floating point instructions in ISRs ? I have never encountered such
> ISRs.
Well I've heard of applications whose main loop consisted of a halt instruction repeated endlessly. All the functionality happened at interrupt level. No idea if they used floating point. :)
On 02/08/2013 05:38, upsidedown@downunder.com wrote:
> On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
> <no.email@nospam.invalid> wrote:
>
>> info@quantum-leaps.com writes:
>>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>>> needs to be saved/restored if the FPU can be used in the ISRs or in
>>> tasks of a preemptive RTOS.
>>
>> In Tim's application, I wonder whether the FPU can be exclusively used
>> by a single task, so nothing else touches the registers. Is that a
>> reasonable approach?
>
> Floating point instructions in ISRs ? I have never encountered such
> ISRs.
>
> Why not use the same principle for some of the highest priority tasks
> and only below a certain priority level FP-register save/restore is
> performed. At the low levels, the full save/restore cost is not that
> significant, since these tasks typically execute for quite long times
> at once. Of course, this requires some hooks into the task scheduler,
> but should not be too hard to implement.
>
Where in this thread does it say that the OP is using multitasking or a task scheduler?

If multithreading is not being used then the Cortex-M4F will handle everything for you by only saving the floating point registers when it is absolutely necessary (the save being triggered by a floating point instruction being executed - if you turn this functionality on).

If multithreading is being used then there are several different ways of doing it...the best of which can only be determined when you know how the application is using the FPU (from how many tasks, how often, etc.).

However, as per my previous post, I think this is quite off topic from a question of "is it 32-bits or 64-bits" so probably not a helpful discussion to the OP.

Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for microcontrollers. More than 103000 downloads in 2012.

+ http://www.FreeRTOS.org/plus
Trace, safety certification, FAT FS, TCP/IP, training, and more...
On 8/1/2013 10:38 PM, upsidedown@downunder.com wrote:
> On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
> <no.email@nospam.invalid> wrote:
>
>> info@quantum-leaps.com writes:
>>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>>> needs to be saved/restored if the FPU can be used in the ISRs or in
>>> tasks of a preemptive RTOS.
>>
>> In Tim's application, I wonder whether the FPU can be exclusively used
>> by a single task, so nothing else touches the registers. Is that a
>> reasonable approach?
>
> Floating point instructions in ISRs ? I have never encountered such
> ISRs.
I did that years ago (1985) on the i286 with a floating-point co-processor (i287).

It was a 3-axis vertical mill; at each 8 ms interrupt a new position for one of the axes was computed. A simple mutex handled the FPU. There was no RTOS involved, just a simple round robin of each axis.

All code was written with Turbo C.

I also did the same with a Z80 and an AM9511A co-processor before that. That one used Microsoft BASIC.

hamilton
On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin wrote:

> info@quantum-leaps.com writes:
>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>> needs to be saved/restored if the FPU can be used in the ISRs or in
>> tasks of a preemptive RTOS.
>
> In Tim's application, I wonder whether the FPU can be exclusively used
> by a single task, so nothing else touches the registers. Is that a
> reasonable approach?
It would. I've thought of that.

At the moment the whole application is small enough that I'm planning on using a home-rolled cooperative multitasker that dodges the whole context-switch thing at the expense of weighing down the developer with the need to chop low-priority computations up into bits that are small enough that they don't bog down important tasks. So the whole "can't RTOS" thing is moot for me at the moment.

As far as the "only one task gets the math processor" idea goes, I've actually already been there, done that (sorta), with the ADSP 2101 using an RTOS. The ADSP 2101 has some hardware context associated with its DSP functionality that is simply not accessible via software (except by "push" and "pop" into very shallow hardware stacks). It's not even a matter of "slow" -- it's "you can't, sucker". So if you want to use its DSP features in an RTOS you're limited to doing it in one task. (Well, one task and one ISR, thanks to those shallow stacks). All the "regular processor" stuff can be context-switched just fine, however.

So we used the thing exactly that way: we had one task for the heavy lifting (running a spinning-wheel gyroscope that had to be in closed loop) with a bunch of tasks to make it play nice with the balance of the system. That one magic control task was the _only_ task that got its fingers onto the MAC and associated instructions; everything else was kept away. The board, by the way, worked great.

It would be harder to do this with the M4F. Ironically, it's because the tools support floating point -- in the case of that 2101, the tools didn't know what to do with a MAC instruction and never generated one. So it was easy to tuck all the "DSP" stuff away in assembly language code that was only called from one C file.

I suppose it might be possible to compile just one or two magic files using the M4F switch, and compile the rest using the M4 switch (or whatever the GNU compiler supports -- that's my next task!!!). If so, and if it works without weird namespace or other collisions, then I'd get software-synthesized math for most of the thing, and hardware math for the important stuff.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On Fri, 02 Aug 2013 09:03:47 +0100, FreeRTOS info wrote:

> On 02/08/2013 05:38, upsidedown@downunder.com wrote:
>> On Thu, 01 Aug 2013 20:08:05 -0700, Paul Rubin
>> <no.email@nospam.invalid> wrote:
>>
>>> info@quantum-leaps.com writes:
>>>> Also, the FPU in Cortex-M4F comes with its own register bank, which
>>>> needs to be saved/restored if the FPU can be used in the ISRs or in
>>>> tasks of a preemptive RTOS.
>>>
>>> In Tim's application, I wonder whether the FPU can be exclusively used
>>> by a single task, so nothing else touches the registers. Is that a
>>> reasonable approach?
>>
>> Floating point instructions in ISRs ? I have never encountered such
>> ISRs.
>>
>> Why not use the same principle for some of the highest priority tasks
>> and only below a certain priority level FP-register save/restore is
>> performed. At the low levels, the full save/restore cost is not that
>> significant, since these tasks typically execute for quite long times
>> at once. Of course, this requires some hooks into the task scheduler,
>> but should not be too hard to implement.
>>
>>
>
> Where in this thread does it say that the OP is using multitasking or a
> task scheduler?
>
> If multithreading is not being used then the Cortex-M4F will handle
> everything for you by only saving the floating point registers when it
> is absolutely necessary (the save being triggered by a floating point
> instruction being executed - if you turn this functionality on).
>
> If multithreading is being used then there are several different ways of
> doing it...the best of which can only be determined when you know how
> the application is using the FPU (from how many tasks, how often, etc.).
>
> However, as per my previous post, I think this is quite off topic from a
> question of "is it 32-bits or 64-bits" so probably not a helpful
> discussion to the OP.
Totally off topic, yes. But still interesting, and useful in that I may get 32 bit to work for me, and the selection of a multithreaded OS isn't entirely off the table. This side discussion has certainly put a pretty high bar on any multitasking OS that I do select, so it's useful in that regard.

As I mentioned elsewhere, I'm currently planning on using a cooperative multitasker because (a) I have it lying around, and (b) I'm the only author on this software, so I don't have to worry about some dip**** trying to compute pi to 100 decimal places in the lowest-priority task without yielding.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Tim, Richard: To be strictly on topic, the whole discussion can be closed with just one number: 32, so all of the posts that go beyond this number are OT.

But I still believe that the mention of the "lazy stacking" feature of the Cortex-M4F FPU _is_ relevant, even in the absence of a preemptive RTOS or ISRs that use the FPU. I think it's good to know about "lazy stacking", because it is enabled by default (when you enable the FPU), so if you don't know about it, it can hit you with unexpectedly high stack usage. "Lazy stacking" always allocates the space for the FPU registers on the stack, but the actual saving/restoring of the registers does not happen until the FPU is used. This also has interesting implications for real-time behavior, because if an ISR uses the FPU, its timing will carry the penalty of stacking the FPU registers.

Miro Samek
state-machine.com
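To put rough numbers on the stack cost of "lazy stacking" discussed above, here is a small host-side sketch in C. It assumes the ARMv7-M exception frame layouts as I understand them: a basic frame of 8 words, and an extended frame that adds 18 more words (S0-S15, FPSCR, and one reserved word). The function names are made up for illustration, the optional 4-byte alignment padding is ignored, and the nested case pessimistically assumes that every preempted context has an active FP context.

```c
#include <assert.h>
#include <stdint.h>

/* Exception frame sizes in bytes (assumed ARMv7-M layout, padding ignored). */
enum {
    BASIC_FRAME_BYTES    = 8 * 4,   /* R0-R3, R12, LR, PC, xPSR */
    FP_CONTEXT_BYTES     = 18 * 4,  /* S0-S15, FPSCR, one reserved word */
    EXTENDED_FRAME_BYTES = BASIC_FRAME_BYTES + FP_CONTEXT_BYTES
};

/* Bytes reserved on the stack at exception entry.
 * fp_context_active models "the preempted code has used the FPU"
 * with automatic state preservation enabled. With lazy stacking the
 * space is reserved even if the registers are never actually written. */
uint32_t exception_frame_bytes(int fp_context_active)
{
    return fp_context_active ? EXTENDED_FRAME_BYTES : BASIC_FRAME_BYTES;
}

/* Pessimistic estimate for nested exceptions: every level is assumed
 * to preempt code with an active FP context (hypothetical helper). */
uint32_t worst_case_stacking_bytes(uint32_t nesting_depth, int fp_context_active)
{
    assert(nesting_depth <= 256u); /* arbitrary sanity bound for this sketch */
    return nesting_depth * exception_frame_bytes(fp_context_active);
}
```

Under these assumptions, three levels of nesting with FP contexts active reserve 312 bytes for exception frames, versus 96 bytes without the FPU.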
Indeed, a traditional RTOS kernel that can block in multiple places in a task body probably cannot take advantage of the "lazy stacking" feature.

But a simpler class of run-to-completion preemptive kernels _can_ take advantage of "lazy stacking" and, in fact, this feature integrates seamlessly with this type of kernel. The use of the Cortex-M4F FPU with a preemptive QK kernel is described in Section 4.2 of the AppNote, available at: http://www.state-machine.com/arm/AN_QP_and_ARM-Cortex-M-IAR.pdf .

Miro Samek
state-machine.com/arm
On Thursday, August 1, 2013 11:08:05 PM UTC-4, Paul Rubin wrote:
> In Tim's application, I wonder whether the FPU can be exclusively used
> by a single task, so nothing else touches the registers. Is that a
> reasonable approach?
Yes, this is the most efficient use of the FPU. In this case, you can disable "lazy stacking" to save stack space. The CMSIS-compliant code for disabling "lazy stacking" is:

    FPU->FPCCR &= ~((1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos));

Miro Samek
state-machine.com/arm
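For anyone who wants to try the register logic off-target first, here is a hedged sketch of the same idea in plain C. The function name `fpu_single_owner_setup` and the trick of passing the register addresses as pointers are assumptions for illustration only. The bit positions (ASPEN = bit 31 and LSPEN = bit 30 in FPCCR, CP10/CP11 access at bits 20-23 of CPACR) are written from memory of the ARMv7-M documentation, so check them against your device header before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FPCCR_LSPEN      (UINT32_C(1) << 30)   /* lazy state preservation enable */
#define FPCCR_ASPEN      (UINT32_C(1) << 31)   /* automatic state preservation enable */
#define CPACR_CP10_CP11  (UINT32_C(0xF) << 20) /* full access to the FPU coprocessors */

/* Enable the FPU and turn off automatic and lazy FP state stacking.
 * Only sensible when a single context ever touches the FPU.
 * Hypothetical helper: takes pointers so it can run against fake
 * registers on a host machine. */
void fpu_single_owner_setup(volatile uint32_t *cpacr, volatile uint32_t *fpccr)
{
    assert(cpacr != NULL && fpccr != NULL);
    *cpacr |= CPACR_CP10_CP11;
    *fpccr &= ~(FPCCR_ASPEN | FPCCR_LSPEN);
}
```

On a real part the call would presumably be `fpu_single_owner_setup(&SCB->CPACR, &FPU->FPCCR);` followed by the CMSIS `__DSB()` and `__ISB()` barriers, done early in startup before any floating-point instruction executes.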
On Wednesday, July 31, 2013 8:59:39 PM UTC+3, Jim Stewart wrote:
> ....
>
> Just out of idle curiosity, what kind of an application might require 64 bit floating point?
Oh, more than those which can use 32 bits, for sure. For example, if you will be DSP-ing (that is, doing lots of MACs), 32-bit FP is just useless; the 24-bit mantissa begins to lose data before you know it. 32-bit FP can be useful, of course, but not a lot if the FPU is constrained to 32-bit only. If it has both 32 and 64, one tends to use both; well, at least I tend to do so.

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
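The point about the 24-bit mantissa can be illustrated with a tiny multiply-accumulate sketch in C. This is only a constructed example, not anyone's production DSP code: the function names are invented, and the `volatile` accumulator is there to force rounding to 32 bits on every step, so that a compiler keeping intermediates in wider registers does not hide the effect.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply-accumulate with a single-precision accumulator. */
float mac_f32(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    volatile float acc = 0.0f;  /* volatile: round to float after each add */
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Same inputs, but the accumulator is double precision. */
double mac_f64(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += (double)a[i] * (double)b[i];
    }
    return acc;
}
```

With inputs `{4096, 1, 1, 1, 1}` for both arrays, the first product is 2^24 = 16777216. The single-precision accumulator then cannot represent 16777217, so each later `+1` is rounded away and it stays at 16777216, while the double accumulator reaches 16777220.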
On Friday, August 2, 2013 8:27:42 PM UTC+3, in...@quantum-leaps.com wrote:
> Indeed, a traditional RTOS kernel that can block in multiple places in a task body probably cannot take advantage of the "lazy stacking" feature.
>
> But a simpler class of run-to-completion preemptive kernels _can_ take advantage of the "lazy stacking" and, in fact, this feature integrates very seamlessly with this type of kernels. The use of the Cortex-M4F FPU with a preemptive QK kernel is described in Section 4.2 of the AppNote, available at: http://www.state-machine.com/arm/AN_QP_and_ARM-Cortex-M-IAR.pdf .
>
> Miro Samek
> state-machine.com/arm
Or, if an OS is well written, it allows tasks to switch FPU saving on and off as needed, like I do under DPS all the time: when I need the FPU, I call "fpuon$", which returns the former state of "fpu" for that task. On return from the function, if the former state was off, switch it off again; otherwise leave it on. So FPU registers are saved during a task switch only when necessary.

This is not applicable to IRQ handlers, of course, but I can think of no IRQ handler I have written in nearly 30 years that needs or uses the FPU.

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
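The save-only-when-flagged scheme described above might look roughly like this in C. This is not DPS code and not its API: `task_t`, `fpu_on`, `fpu_restore`, and `on_task_switch_out` are hypothetical names, and the register save itself is replaced by a counter so the logic can be run on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool     fpu_in_use;  /* scheduler saves FP registers only when true */
    unsigned fp_saves;    /* stand-in for the real register save */
} task_t;

static task_t *current_task;

void set_current_task(task_t *t) { current_task = t; }

/* Mark the current task as an FPU user; return the previous state. */
bool fpu_on(void)
{
    assert(current_task != NULL);
    bool previous = current_task->fpu_in_use;
    current_task->fpu_in_use = true;
    return previous;
}

/* Put the flag back to what fpu_on() returned, so nested users compose. */
void fpu_restore(bool previous)
{
    assert(current_task != NULL);
    current_task->fpu_in_use = previous;
}

/* Called by the (hypothetical) scheduler when switching away from t. */
void on_task_switch_out(task_t *t)
{
    assert(t != NULL);
    if (t->fpu_in_use) {
        t->fp_saves++;  /* a real kernel would store the FP registers here */
    }
}
```

A function that needs the FPU would bracket its work with `bool prev = fpu_on();` and `fpu_restore(prev);`, so a task switch outside that window skips the FP register save entirely.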