
ARM Cortex Mx vs the rest of the gang

Started by Klaus Kragelund May 30, 2017
On 12.6.2017 г. 17:21, StateMachineCOM wrote:
> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers.
They have to be saved during every context switch if the OS is broken. What stops it from saving the FPU context only for tasks which use the FPU? It is a single bit of information in the task descriptor.

Then what stops the programmer from declaring whether the FPU is in use within a task? Normally what is done for this purpose is:
- enable the FPU for the task, saving the previous state of the FPU_in_use bit,
- do the FPU work,
- restore the previous state of the FPU_in_use bit.

Obviously for tasks which use the FPU intensively this need not be done; one just leaves the FPU_in_use on.
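[A minimal sketch of that idea in C, assuming a hypothetical RTOS whose task descriptor carries an fpu_in_use flag; the context structure and the save/restore helpers are illustrative, not any particular kernel's API.]

#include <stdint.h>
#include <stdbool.h>

#define NUM_FPU_REGS 32

typedef struct task {
    uint32_t cpu_regs[16];           /* integer context, always saved      */
    uint32_t fpu_regs[NUM_FPU_REGS]; /* S0-S31, saved only when needed     */
    uint32_t fpscr;
    bool     fpu_in_use;             /* the single bit in the descriptor   */
} task_t;

/* Hypothetical low-level helpers, normally a few lines of assembly. */
extern void save_cpu_context(task_t *t);
extern void restore_cpu_context(const task_t *t);
extern void save_fpu_context(uint32_t *regs, uint32_t *fpscr);
extern void restore_fpu_context(const uint32_t *regs, uint32_t fpscr);

void context_switch(task_t *from, task_t *to)
{
    save_cpu_context(from);
    if (from->fpu_in_use)                    /* pay the FPU cost only for  */
        save_fpu_context(from->fpu_regs, &from->fpscr);  /* FPU tasks      */

    if (to->fpu_in_use)
        restore_fpu_context(to->fpu_regs, to->fpscr);
    restore_cpu_context(to);
}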
> .. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ).
I once had a look at that only to confirm my suspicion that it is just piling problems over existing problems. Things like this are done by software, or rather software is written such that there is no need for that sort of thing.
> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
Not having a separate FPU register set is a disadvantage, not an advantage (now if the FPU is a useless 32 bit one having it at all is a disadvantage but this is another matter :-). Nothing prevents software from using what it wants and from saving only what has to be saved; often having an entire FPU and saving its entire context in addition to that of the integer unit makes things more efficient, sometimes a lot more efficient (e.g. implementing a FIR on a plain FPU where data dependencies do not allow you to do it at a speed more than speed/(pipeline length), you just need many registers).

The e500 cores from freescale were never popular with me exactly because they had the integer and FP register sets in one, just 32 registers in total. Not a good idea on a load/store machine.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On Monday, June 12, 2017 at 16:21:15 UTC+2, StateMachineCOM wrote:
> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
>
> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ). But as you will see in the AppNote, the scheme is quite involved and still requires much more stack RAM than a context switch without the VFP. The overhead of the ARM VFP in a multitasking system is so big, in fact, that often it outweighs the benefits of having hardware FPU in the first place. Often, a better solution would be to use the FPU in one task only, and forbid to use it anywhere else. In this case, preserving the FPU context would be unnecessary. (But it is difficult to reliably forbid using FPU in other parts of the same code, so it opens the door for race conditions around the FPU if the rule is violated.)
>
> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
>
> Miro Samek
> state-machine.com
Thanks for the explanation. Bye Jack
On 6/12/2017 7:21 AM, StateMachineCOM wrote:
> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
>
> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ). But as you will see in the AppNote, the scheme is quite involved and still requires much more stack RAM than a context switch without the VFP.
Actually, this is fairly old news. I've been designing MTOS's & RTOS's with this in mind for at least 30 years -- though the actual mechanisms involved vary.

E.g., the FPU for the NS32k is a true coprocessor; it executes in parallel with the CPU. So, you don't *want* to save its state at a context switch cuz it might be busy "doing something". Instead, you enable the trap on the FPU opcodes so that if the "new" task attempts to use the FPU, you first swap out the FPU's state -- having remembered which task it belongs to (which may not be the task that executed immediately prior to the current task!). Having done so, you restore the saved FPU state for *this* task, disable the trap and let the instruction complete *in* the FPU. All the while, knowing that it may not complete before the current task loses control of the processor.

With this framework, you can configure individual tasks to use the FPU -- or not. *AND*, detect a task that "accidentally" uses the FPU when it has been configured not to. The last bit is important because you can build different flavor TCB's -- one that holds just the basic registers and another that holds the basic *plus* the FPU state.

If your tools give you finer-grained control over which *parts* of the FPU are used, then you can similarly refine the parts that you save/restore.

[E.g., the Zx80 has an "alternate register set" that a compiler will rarely make use of. But, is handy for ASM coders. Saving and restoring it unconditionally is wasteful as it almost doubles the process state. *But*, conditionally doing this (synchronous with a regular task switch in the case of the Zx80's) can offer significant reward.]
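[A rough sketch of that lazy-switching scheme in C, assuming a hypothetical kernel where the "FPU unavailable" trap can be hooked and a pointer tracks which task's state currently sits in the hardware; all names here are illustrative.]

#include <stddef.h>
#include <stdint.h>

#define NUM_FPU_REGS 32

typedef struct task {
    uint32_t fpu_regs[NUM_FPU_REGS];   /* per-task copy of the FPU state   */
    uint32_t fpscr;
} task_t;

static task_t *fpu_owner  = NULL;      /* whose state is in the FPU now    */
extern task_t *current_task;           /* set by the scheduler             */

/* Hypothetical hardware hooks, normally a few instructions of assembly. */
extern void fpu_disable(void);         /* make FPU opcodes trap            */
extern void fpu_enable(void);          /* let FPU opcodes execute          */
extern void fpu_store(uint32_t *regs, uint32_t *fpscr);
extern void fpu_load(const uint32_t *regs, uint32_t fpscr);

/* Called by the scheduler: do NOT touch the FPU here, just re-arm the trap. */
void on_context_switch(void)
{
    fpu_disable();
}

/* Called from the "FPU unavailable" trap, i.e. on the first FP opcode the
 * newly running task executes -- if it executes any at all.              */
void fpu_unavailable_trap(void)
{
    if (fpu_owner != current_task) {
        if (fpu_owner != NULL)                       /* evict previous owner */
            fpu_store(fpu_owner->fpu_regs, &fpu_owner->fpscr);
        fpu_load(current_task->fpu_regs, current_task->fpscr);
        fpu_owner = current_task;
    }
    fpu_enable();           /* retry/continue the faulting FP instruction */
}

[The price is at most one trap per task per scheduling quantum, and zero FPU traffic for tasks that never execute an FP instruction.]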
> The overhead of the ARM VFP in a multitasking system is so big, in fact, that often it outweighs the benefits of having hardware FPU in the first place. Often, a better solution would be to use the FPU in one task only, and forbid to use it anywhere else. In this case, preserving the FPU context would be unnecessary. (But it is difficult to reliably forbid using FPU in other parts of the same code, so it opens the door for race conditions around the FPU if the rule is violated.)
Why "difficult"? Turn on FP emulation in the code compiled for the "nonFPU tasks". Then, *if* an occasional floating point instruction is invoked, it just executes "slowly".
> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
That's like complaining that the 6 course meal is much inferior to just "grabbing a burger" at a fast-food joint...
On 6/12/2017 7:57 AM, Dimiter_Popoff wrote:
> On 12.6.2017 г. 17:21, StateMachineCOM wrote:
>> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
>>
>> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers.
>
> They have to be saved during every context switch if the OS is broken. What stops it from saving the FPU context only for tasks which use the FPU? It is a single bit of information in the task descriptor.
Exactly.
> Then what stops the programmer from declaring whether the FPU is in use within a task? Normally what is done for this purpose is:
> - enable the FPU for the task, saving the previous state of the FPU_in_use bit,
> - do the FPU work,
> - restore the previous state of the FPU_in_use bit.
> Obviously for tasks which use the FPU intensively this need not be done; one just leaves the FPU_in_use on.
>
>> .. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ).
>
> I once had a look at that only to confirm my suspicion that it is just piling problems over existing problems. Things like this are done by software, or rather software is written such that there is no need for that sort of thing.
>
>> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
>
> Not having a separate FPU register set is a disadvantage, not an advantage (now if the FPU is a useless 32 bit one having it at all is a disadvantage but this is another matter :-).
Exactly. As memory is now the bottleneck in most designs, any program state (e.g., floating point values) that has to be saved OUTSIDE the CPU (even in normal operation, ignoring context switches) takes a hit in terms of performance. Esp. when you're dealing with "big" data types (64, 80, 128b).

[Guttag clearly missed the boat on that call wrt the 99K. Cute/clever idea but he failed to see the growing disparity in the CPU/memory "impedance mismatch". But, context switches were a piece of cake! :>]
> Nothing prevents software from using what it wants and from saving only what has to be saved; often having an entire FPU and saving its entire context in addition to that of the integer unit makes things more efficient, sometimes a lot more efficient (e.g. implementing a FIR on a plain FPU where data dependencies do not allow you to do it at a speed more than speed/(pipeline length), you just need many registers).
Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.

[An even more flexible/conditional scheme would allow traps for each register but the overhead of servicing the trap FOR EACH REGISTER would far outweigh the cost of unconditionally restoring ALL state]
> The e500 cores from freescale were never popular with me exactly because they had the integer and FP register sets in one, just 32 registers in total. Not a good idea on a load/store machine.
On 12.6.2017 г. 20:10, Don Y wrote:
>> ....
>
> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
Hi Don,

I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only. It could be some help but not enough to justify the extra silicon & complexity I believe.

Dimiter
Hi Dimiter,

On 6/12/2017 12:06 PM, Dimiter_Popoff wrote:
> On 12.6.2017 г. 20:10, Don Y wrote:
>>> ....
>>
>> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
>
> I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only.
You're not thinking with HLL's in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)

And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
> It could be some help but not enough to justify the extra silicon & complexity I believe.
The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.

Then, a new "PUSH <vector>" opcode simply uses that "collection of flags" as the <vector>.

If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so wouldn't offset the cost of just unconditionally performing the save/restore.

In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor (in the NS32k example) tell me when a floating point instruction is invoked (the FPU is an optional component in the early NS32k systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions) by invoking a TRAP handler.

Of course, I can use that notification (with or without a hardware FPU) to alert the OS to the fact that the additional state is being referenced and save/restore it, as appropriate.

[This only needs to happen at most once for each context switch]
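[To make the mechanism concrete, here is a software simulation of the idea; in the proposal the dirty-bit mask would be maintained by the silicon, so this C fragment only illustrates what the hypothetical "PUSH <vector>" would do with it.]

#include <stdint.h>

#define NUM_REGS 32

typedef struct context {
    uint32_t regs[NUM_REGS];   /* preserved copies of the register file    */
} context_t;

/* In the proposed hardware this mask is set automatically by every write
 * to a register and cleared by the OS at the end of each context switch.
 * Here it is just a variable, to show the save path.                      */
static uint32_t dirty_mask;

extern uint32_t read_register(int n);  /* hypothetical access to the file  */

/* The hypothetical "PUSH <vector>": store only the registers whose dirty
 * bit is set, each at its fixed slot in the task's saved context.         */
void save_dirty_registers(context_t *ctx)
{
    for (int n = 0; n < NUM_REGS; n++) {
        if (dirty_mask & (1u << n))
            ctx->regs[n] = read_register(n);
    }
    dirty_mask = 0;            /* re-arm tracking for the next task        */
}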
On 12.6.2017 г. 22:19, Don Y wrote:
> Hi Dimiter,
>
> On 6/12/2017 12:06 PM, Dimiter_Popoff wrote:
>> On 12.6.2017 г. 20:10, Don Y wrote:
>>>> ....
>>>
>>> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
>>
>> I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only.
>
> You're not thinking with HLL's in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)
Of course not, although HLL-s should be pretty good at knowing what to push/pull. But I certainly consider a language broken if it creates a need for hardware not needed when using other languages.
> And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
Well, and if they are not changed, what then? You have to mark stack frames somehow to know what exactly you did save so you can restore it etc.; then even if you switch tasks once per ms the time to save/restore all registers is negligible (32 longwords get written to cache on a 400 MHz 32-bit processor within 80 ns...).

OTOH if it is just an interrupt handler it will know what registers it uses and would save just them - and will modify them so there is no need to know whether they were changed or not.
>> It could be some help but not enough to justify the extra silicon & complexity I believe.
>
> The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.
>
> Then, a new "PUSH <vector>" opcode simply uses that "collection of flags" as the <vector>.
This sounds simple enough indeed, push the register list along with the list descriptor, then use it to restore the list. But you will still have to calculate the length you allocated to save the list and this variable length might complicate the scheduler enough to cancel the benefits if not worse... hard to say by just hypothesizing.
> If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so wouldn't offset the cost of just unconditionally performing the save/restore.
>
> In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor (in the NS32k example) tell me when a floating point instruction is invoked (the FPU is an optional component in the early NS32k systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions) by invoking a TRAP handler.
I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error).

Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
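[A small sketch of that save/enable/work/restore pattern, assuming hypothetical accessors for the current task's FPU-enable flag rather than DPS's actual API.]

#include <stdbool.h>

/* Hypothetical accessors for the current task's "FPU in use" flag. */
extern bool fpu_enabled(void);
extern void fpu_set_enabled(bool on);

extern double filter_sample(double x);   /* some FP work, for illustration */

double do_fp_work(double x)
{
    bool was_enabled = fpu_enabled();    /* remember the caller's state    */
    fpu_set_enabled(true);               /* make sure our FP context is
                                            preserved across task switches */

    double y = filter_sample(x);         /* the actual floating point work */

    fpu_set_enabled(was_enabled);        /* restore the caller's state     */
    return y;
}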
On 6/12/2017 1:04 PM, Dimiter_Popoff wrote:
> On 12.6.2017 г. 22:19, Don Y wrote:
>> On 6/12/2017 12:06 PM, Dimiter_Popoff wrote:
>>> On 12.6.2017 г. 20:10, Don Y wrote:
>>>>> ....
>>>>
>>>> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
>>>
>>> I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only.
>>
>> You're not thinking with HLL's in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)
>
> Of course not, although HLL-s should be pretty good at knowing what to push/pull. But I certainly consider a language broken if it creates a need for hardware not needed when using other languages.
Yes, but the language doesn't know when a context switch is going to be triggered that swaps out *all* of the processor state!

  context switch
     restore state of task M
  ----
     do something          |
     alter register Q      |
     do something          |  task M
     alter register P      |
  ----
     preserve state of task M
  context switch
     restore state of task N
  ----
     use register Y        |
     alter register Q      |  task N
     ...                   |

When preserving the state of task M, *only* the saved state of registers Q and P needs to be updated, as they are the only registers that have been altered in that portion of task M's execution. The other 873 registers haven't been altered so the "preserved copies" of their state needn't be updated.

The problem with traditional architectures is that there is no easy way for the "context switch" routine to know what has been altered since the state for *that* task was most recently preserved. The hack I proposed would be "reset" as the last step in the context switch so that any alterations of the register file's contents would be individually flagged.

I.e., at the time of the task switch at the end of task M, the CPU will have "noticed" that ONLY registers Q & P have been updated (update can be as simple as noting ANY write to the register *or* as clever as noting any write that alters the specific contents of *that* register, so overwriting a value of '27' with another '27' would NOT flag the register as "altered"). So, when the context switch went to "preserve state of task M", it would execute this magical "PUSH <vector>" command and move only those altered register contents back into the TCB.

[It's not really a "PUSH" as it needs to put each register in a specific place relative to some "frame" -- i.e., the TCB -- which could be carefully arranged to be ToS relative]
>> And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
>
> Well, and if they are not changed, what then? You have to mark stack frames somehow to know what exactly you did save so you can restore it etc.; then even if you switch tasks once per ms the time to save/restore all registers is negligible (32 longwords get written to cache on a 400 MHz 32-bit processor within 80 ns...).
Yes, but it's obvious that the number of cores and the number (and size) of registers in those cores will only keep increasing as the memory interface becomes more of a "performance issue". Cache tries to handwave around this problem -- but more buffering doesn't always come without costs (e.g., having to flush the cache). N cores means N times the hammering on the memory interface due to task switches.
> OTOH if it is just an interrupt handler it will know what registers it uses and would save just them - and will modify them so there is no need to know whether they were changed or not.
Yes -- in that case, the developer knows what he's "touched" in the ISR (if written in ASM) and would only bother preserving and restoring those things that he was about to "taint". E.g., there are ARMs that have a FIQ ("Fast Interrupt") capability that essentially preserves minimal state for extremely low latency IRQs.
>>> It could be some help but not enough to justify the extra silicon & complexity I believe.
>>
>> The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.
>>
>> Then, a new "PUSH <vector>" opcode simply uses that "collection of flags" as the <vector>.
>
> This sounds simple enough indeed, push the register list along with the list descriptor, then use it to restore the list. But you will still have to calculate the length you allocated to save the list and this variable length might complicate the scheduler enough to cancel the benefits if not worse... hard to say by just hypothesizing.
The CPU vendor knows how many registers he has in the core. So, if he knows that registers 1, 2 and 12 need to be saved (by this magical "PUSH <vector>" instruction), he could save r1 to WORKSPACE+1, r2 to WORKSPACE+2 and r12 to WORKSPACE+12, where WORKSPACE is a particular register/address.

I.e., the 99k implemented all registers in main memory. So, when you accessed r1, you were really accessing the contents of memory at WORKSPACE_POINTER+(1*register_size). Shifting r1 left resulted in a read of that memory location into the CPU, a left shift in the ALU and a write *back* to that location of the updated datum.

Imagine, instead, caching all of those operations in an internal register file (gee, what a novel idea! :> ) and only flushing the contents of ALTERED registers back into main memory (at locations relative to the WORKSPACE_POINTER) when a task switch was needed.

[I.e., on the 99k, to do a context switch, you just changed one register -- the workspace pointer -- as the context was already *in* memory!]
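[A toy illustration of the workspace-pointer idea in C, using a deliberately simplified machine model where the "register file" lives in ordinary memory; none of the names correspond to real TMS99xx code.]

#include <stdint.h>

#define NUM_REGS 16

/* Each task's registers live in RAM, in a block the author calls a
 * workspace; the CPU only holds a pointer to the current one.            */
typedef uint32_t workspace_t[NUM_REGS];

static workspace_t task_a_ws, task_b_ws;     /* one workspace per task    */
static uint32_t   *workspace_pointer = task_a_ws;

/* "Register" access is just memory access relative to the pointer.       */
static inline uint32_t reg_read(int n)              { return workspace_pointer[n]; }
static inline void     reg_write(int n, uint32_t v) { workspace_pointer[n] = v; }

/* A context switch copies no registers at all: the whole context is
 * already in memory, so switching tasks is one pointer update.           */
void switch_to(workspace_t ws)
{
    workspace_pointer = ws;
}

void demo(void)
{
    reg_write(3, 0x1234);      /* "r3" of task A gets written in memory   */
    switch_to(task_b_ws);      /* context switch: just the pointer change */
    (void)reg_read(3);         /* now reads "r3" of task B                */
}

[Calling switch_to() is the entire context switch; the trade-off is that every register access goes to memory, which is exactly the CPU/memory "impedance mismatch" raised earlier in the thread.]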
>> If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so wouldn't offset the cost of just unconditionally performing the save/restore.
>>
>> In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor (in the NS32k example) tell me when a floating point instruction is invoked (the FPU is an optional component in the early NS32k systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions) by invoking a TRAP handler.
>
> I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error). Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.
I did this in my first implementation. But, that meant having a handler for those cases where someone screwed up the configuration of the task and forgot to indicate that it needed the FPU. Runtime might not see a task run that portion of its code so you might not see a "crash and burn" (until thorough testing).

So, if you have to handle the case where the task hasn't been configured to use the FPU and it *does*, then why not let that handler just "handle the FPU's usage" without forcing the developer to make that configuration choice?

You can create "FCB's" (FPU Control Blocks) to store FPU state and reference them *from* each task's TCB (so they are a part of that task's "state"). This also lets the FPU-handler keep a pointer to the FCB into which the current FPU hardware state should be (eventually) saved -- *if* that need arises. If the current task executes an FP opcode before the FPU state has been preserved, then the old task's FPU state can be saved; the TCB for the current task lets you chase down the FCB for the current task so its previous FPU state can be restored before the floating point operation is allowed to continue (restart).

It's damn near impossible to come up with a winning strategy without an understanding of the application and the deployment environment. Where are the *effective* resource shortages?
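[A sketch of that TCB/FCB arrangement, building on the lazy-trap handler sketched earlier in the thread; the FCB is allocated on demand the first time a task faults on an FP opcode, so tasks that never touch the FPU pay nothing. The structures and helpers are hypothetical.]

#include <stdint.h>
#include <stdlib.h>

#define NUM_FPU_REGS 32

typedef struct fcb {                 /* FPU Control Block                  */
    uint32_t regs[NUM_FPU_REGS];
    uint32_t fpscr;
} fcb_t;

typedef struct tcb {                 /* Task Control Block                 */
    /* ... integer context, scheduling state ...                           */
    fcb_t *fcb;                      /* NULL until the task uses the FPU   */
} tcb_t;

extern tcb_t *current_task;
static tcb_t *fpu_owner;             /* task whose state is in the FPU     */

extern void fpu_store(uint32_t *regs, uint32_t *fpscr);
extern void fpu_load(const uint32_t *regs, uint32_t fpscr);
extern void fpu_enable(void);

/* "FPU unavailable" trap: allocate an FCB on first use, then do the usual
 * lazy owner swap (error handling omitted for brevity).                   */
void fpu_trap(void)
{
    tcb_t *t = current_task;

    if (t->fcb == NULL)                          /* first FP use ever      */
        t->fcb = calloc(1, sizeof(fcb_t));       /* becomes part of its state */

    if (fpu_owner != t) {
        if (fpu_owner != NULL)
            fpu_store(fpu_owner->fcb->regs, &fpu_owner->fcb->fpscr);
        fpu_load(t->fcb->regs, t->fcb->fpscr);
        fpu_owner = t;
    }
    fpu_enable();
}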
On 13.6.2017 г. 00:34, Don Y wrote:
> On 6/12/2017 1:04 PM, Dimiter_Popoff wrote:
> ......
>>
>> I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error). Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.
>
> I did this in my first implementation. But, that meant having a handler for those cases where someone screwed up the configuration of the task and forgot to indicate that it needed the FPU. Runtime might not see a task run that portion of its code so you might not see a "crash and burn" (until thorough testing).
>
> So, if you have to handle the case where the task hasn't been configured to use the FPU and it *does*, then why not let that handler just "handle the FPU's usage" without forcing the developer to make that configuration choice?
Yes, I see no problem with that but not much gain either. Generally I don't care much about mistakes other people will make, I make enough of my own to care for. So if something is a programming error the best I can do is to make it as easily detectable as possible. The way you have done it is OK of course, it is no longer a programming error, but I am not sure it would save me much time. Nor would it waste me much though, so why not.

Dimiter
On 6/12/2017 4:44 PM, Dimiter_Popoff wrote:
> On 13.6.2017 г. 00:34, Don Y wrote:
>> On 6/12/2017 1:04 PM, Dimiter_Popoff wrote:
>> ......
>>>
>>> I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error). Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.
>>
>> I did this in my first implementation. But, that meant having a handler for those cases where someone screwed up the configuration of the task and forgot to indicate that it needed the FPU. Runtime might not see a task run that portion of its code so you might not see a "crash and burn" (until thorough testing).
>>
>> So, if you have to handle the case where the task hasn't been configured to use the FPU and it *does*, then why not let that handler just "handle the FPU's usage" without forcing the developer to make that configuration choice?
>
> Yes, I see no problem with that but not much gain either. Generally I don't care much about mistakes other people will make, I make enough of my own to care for. So if something is a programming error the best I can do is to make it as easily detectable as possible. The way you have done it is OK of course, it is no longer a programming error, but I am not sure it would save me much time. Nor would it waste me much though, so why not.
The gain comes by decoupling the need to configure the "uses_FPU" flag for each task as well as eliminating the problem of a developer failing to *correctly* define that flag (i.e., if he changes the switches used with the compiler to use FP opcodes in the generated code he won't shoot himself in the foot).

This is especially true when the task can execute code that the developer didn't write/compile. Does he know under what circumstances FP opcodes are called into play?

I'm finding that working in "resource richer" environments results in very different approaches to software/system design! :<
