
ARM Cortex Mx vs the rest of the gang

Started by Klaus Kragelund May 30, 2017
On 12.6.2017 г. 17:21, StateMachineCOM wrote:
> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers.
They have to be saved during every context switch if the OS is broken. What stops it from saving the FPU context only for tasks which use the FPU? It is a single bit of information in the task descriptor.

Then what stops the programmer from declaring whether the FPU is in use within a task? Normally what is done for this purpose is:
- enable the FPU for the task, saving the previous state of the FPU_in_use bit,
- do the FPU work,
- restore the previous state of the FPU_in_use bit.

Obviously for tasks which use the FPU intensively this need not be done; one just leaves the FPU_in_use on.
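[A minimal sketch of that idea in C, assuming a hypothetical RTOS whose task descriptor carries an fpu_in_use flag; the context structure and the save/restore helpers are illustrative, not any particular kernel's API.]

#include <stdint.h>
#include <stdbool.h>

#define NUM_FPU_REGS 32

typedef struct task {
    uint32_t cpu_regs[16];           /* integer context, always saved      */
    uint32_t fpu_regs[NUM_FPU_REGS]; /* S0-S31, saved only when needed     */
    uint32_t fpscr;
    bool     fpu_in_use;             /* the single bit in the descriptor   */
} task_t;

/* Hypothetical low-level helpers, normally a few lines of assembly. */
extern void save_cpu_context(task_t *t);
extern void restore_cpu_context(const task_t *t);
extern void save_fpu_context(uint32_t *regs, uint32_t *fpscr);
extern void restore_fpu_context(const uint32_t *regs, uint32_t fpscr);

void context_switch(task_t *from, task_t *to)
{
    save_cpu_context(from);
    if (from->fpu_in_use)                    /* pay the FPU cost only for  */
        save_fpu_context(from->fpu_regs, &from->fpscr);  /* FPU tasks      */

    if (to->fpu_in_use)
        restore_fpu_context(to->fpu_regs, to->fpscr);
    restore_cpu_context(to);
}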
> .. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ).
I once had a look at that only to confirm my suspicion that it is just piling problems over existing problems. Things like this are done by software, or rather software is written such that there is no need for that sort of thing.
> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
Not having a separate FPU register set is a disadvantage, not an advantage (now if the FPU is a useless 32 bit one having it at all is a disadvantage but this is another matter :-). Nothing prevents software from using what it wants and from saving only what has to be saved; often having an entire FPU and saving its entire context in addition to that of the integer unit makes things more efficient, sometimes a lot more efficient (e.g. implementing a FIR on a plain FPU where data dependencies do not allow you to do it at a speed more than speed/(pipeline length), you just need many registers).

The e500 cores from freescale were never popular with me exactly because they had the integer and FP register sets in one, just 32 registers in total. Not a good idea on a load/store machine.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On Monday, June 12, 2017 at 16:21:15 UTC+2, StateMachineCOM wrote:
> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
>
> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ). But as you will see in the AppNote, the scheme is quite involved and still requires much more stack RAM than a context switch without the VFP. The overhead of the ARM VFP in a multitasking system is so big, in fact, that often it outweighs the benefits of having hardware FPU in the first place. Often, a better solution would be to use the FPU in one task only, and forbid to use it anywhere else. In this case, preserving the FPU context would be unnecessary. (But it is difficult to reliably forbid using FPU in other parts of the same code, so it opens the door for race conditions around the FPU if the rule is violated.)
>
> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
>
> Miro Samek
> state-machine.com
Thanks for the explanation. Bye Jack
On 6/12/2017 7:21 AM, StateMachineCOM wrote:
> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
>
> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ). But as you will see in the AppNote, the scheme is quite involved and still requires much more stack RAM than a context switch without the VFP.
Actually, this is fairly old news. I've been designing MTOS's & RTOS's with this in mind for at least 30 years -- though the actual mechanisms involved vary.

E.g., the FPU for the NS32k is a true coprocessor; it executes in parallel with the CPU. So, you don't *want* to save its state at a context switch cuz it might be busy "doing something". Instead, you enable the trap on the FPU opcodes so that if the "new" task attempts to use the FPU, you first swap out the FPU's state -- having remembered which task it belongs to (which may not be the task that executed immediately prior to the current task!). Having done so, you restore the saved FPU state for *this* task, disable the trap and let the instruction complete *in* the FPU. All the while, knowing that it may not complete before the current task loses control of the processor.

With this framework, you can configure individual tasks to use the FPU -- or not. *AND*, detect a task that "accidentally" uses the FPU when it has been configured not to. The last bit is important because you can build different flavor TCB's -- one that holds just the basic registers and another that holds the basic *plus* the FPU state.

If your tools give you finer-grained control over which *parts* of the FPU are used, then you can similarly refine the parts that you save/restore.

[E.g., the Zx80 has an "alternate register set" that a compiler will rarely make use of. But, is handy for ASM coders. Saving and restoring it unconditionally is wasteful as it almost doubles the process state. *But*, conditionally doing this (synchronous with a regular task switch in the case of the Zx80's) can offer significant reward.]
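[A rough sketch of that lazy-switching scheme in C, assuming a hypothetical kernel where the "FPU unavailable" trap can be hooked and a pointer tracks which task's state currently sits in the hardware; all names here are illustrative.]

#include <stddef.h>
#include <stdint.h>

#define NUM_FPU_REGS 32

typedef struct task {
    uint32_t fpu_regs[NUM_FPU_REGS];   /* per-task copy of the FPU state   */
    uint32_t fpscr;
} task_t;

static task_t *fpu_owner  = NULL;      /* whose state is in the FPU now    */
extern task_t *current_task;           /* set by the scheduler             */

/* Hypothetical hardware hooks, normally a few instructions of assembly. */
extern void fpu_disable(void);         /* make FPU opcodes trap            */
extern void fpu_enable(void);          /* let FPU opcodes execute          */
extern void fpu_store(uint32_t *regs, uint32_t *fpscr);
extern void fpu_load(const uint32_t *regs, uint32_t fpscr);

/* Called by the scheduler: do NOT touch the FPU here, just re-arm the trap. */
void on_context_switch(void)
{
    fpu_disable();
}

/* Called from the "FPU unavailable" trap, i.e. on the first FP opcode the
 * newly running task executes -- if it executes any at all.              */
void fpu_unavailable_trap(void)
{
    if (fpu_owner != current_task) {
        if (fpu_owner != NULL)                       /* evict previous owner */
            fpu_store(fpu_owner->fpu_regs, &fpu_owner->fpscr);
        fpu_load(current_task->fpu_regs, current_task->fpscr);
        fpu_owner = current_task;
    }
    fpu_enable();           /* retry/continue the faulting FP instruction */
}

[The price is at most one trap per task per scheduling quantum, and zero FPU traffic for tasks that never execute an FP instruction.]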
> The overhead of the ARM VFP in a multitasking system is so big, in fact, that often it outweighs the benefits of having hardware FPU in the first place. Often, a better solution would be to use the FPU in one task only, and forbid to use it anywhere else. In this case, preserving the FPU context would be unnecessary. (But it is difficult to reliably forbid using FPU in other parts of the same code, so it opens the door for race conditions around the FPU if the rule is violated.)
Why "difficult"? Turn on FP emulation in the code compiled for the "nonFPU tasks". Then, *if* an occasional floating point instruction is invoked, it just executes "slowly".
> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
That's like complaining that the 6 course meal is much inferior to just "grabbing a burger" at a fast-food joint...
On 6/12/2017 7:57 AM, Dimiter_Popoff wrote:
> On 12.6.2017 г. 17:21, StateMachineCOM wrote:
>> I said that "... the FPU integration in the Cortex-M4F/M7 is horrible", because it adds tons of overhead and a lot of headache for the system-level software.
>>
>> The problem is that the ARM Vector Floating-Point (VFP) coprocessor comes with a big context of 32 32-bit registers (S0-S31). These registers need to be saved and restored as part of every context switch, just like the CPU registers.
>
> They have to be saved during every context switch if the OS is broken. What stops it from saving the FPU context only for tasks which use the FPU? It is a single bit of information in the task descriptor.
Exactly.
> Then what stops the programmer from declaring whether the FPU is in use within a task? Normally what is done for this purpose is:
> - enable the FPU for the task, saving the previous state of the FPU_in_use bit,
> - do the FPU work,
> - restore the previous state of the FPU_in_use bit.
> Obviously for tasks which use the FPU intensively this need not be done; one just leaves the FPU_in_use on.
>
>> .. ARM has come up with some hardware optimizations called "lazy stacking and context switching" (see ARM AppNote 298 at http://infocenter.arm.com/help/topic/com.arm.doc.dai0298a/DAI0298A_cortex_m4f_lazy_stacking_and_context_switching.pdf ).
>
> I once had a look at that only to confirm my suspicion that it is just piling problems over existing problems. Things like this are done by software, or rather software is written such that there is no need for that sort of thing.
>
>> Anyway, does it have to be that hard? Apparently not. For example the Renesas RX CPU comes also with single precision FPU, which is much better integrated with the CPU and does not have its own register context. Compared to the ARM VFP it is a pleasure to work with.
>
> Not having a separate FPU register set is a disadvantage, not an advantage (now if the FPU is a useless 32 bit one having it at all is a disadvantage but this is another matter :-).
Exactly. As memory is now the bottleneck in most designs, any program state (e.g., floating point values) that has to be saved OUTSIDE the CPU (even in normal operation, ignoring context switches) takes a hit in terms of performance. Esp. when you're dealing with "big" data types (64, 80, 128b).

[Guttag clearly missed the boat on that call wrt the 99K. Cute/clever idea but he failed to see the growing disparity in the CPU/memory "impedance mismatch". But, context switches were a piece of cake! :>]
> Nothing prevents software from using what it wants and from saving only what has to be saved; often having an entire FPU and saving its entire context in addition to that of the integer unit makes things more efficient, sometimes a lot more efficient (e.g. implementing a FIR on a plain FPU where data dependencies do not allow you to do it at a speed more than speed/(pipeline length), you just need many registers).
Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.

[An even more flexible/conditional scheme would allow traps for each register but the overhead of servicing the trap FOR EACH REGISTER would far outweigh the cost of unconditionally restoring ALL state]
> The e500 cores from freescale were never popular with me exactly because they had the integer and FP register sets in one, just 32 registers in total. Not a good idea on a load/store machine.
On 12.6.2017 г. 20:10, Don Y wrote:
>> ....
>
> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
Hi Don,

I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only. It could be some help but not enough to justify the extra silicon & complexity I believe.

Dimiter
Hi Dimiter,

On 6/12/2017 12:06 PM, Dimiter_Popoff wrote:
> On 12.6.2017 г. 20:10, Don Y wrote:
>>> ....
>>
>> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
>
> I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only.
You're not thinking with HLL's in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)

And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
> It could be some help but not enough to justify the extra silicon & complexity I believe.
The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.

Then, a new "PUSH <vector>" opcode simply uses that "collection of flags" as the <vector>.

If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so wouldn't offset the cost of just unconditionally performing the save/restore.

In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor (in the NS32k example) tell me when a floating point instruction is invoked (the FPU is an optional component in the early NS32k systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions) by invoking a TRAP handler.

Of course, I can use that notification (with or without a hardware FPU) to alert the OS to the fact that the additional state is being referenced and save/restore it, as appropriate.

[This only needs to happen at most once for each context switch]
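[To make the mechanism concrete, here is a software simulation of the idea; in the proposal the dirty-bit mask would be maintained by the silicon, so this C fragment only illustrates what the hypothetical "PUSH <vector>" would do with it.]

#include <stdint.h>

#define NUM_REGS 32

typedef struct context {
    uint32_t regs[NUM_REGS];   /* preserved copies of the register file    */
} context_t;

/* In the proposed hardware this mask is set automatically by every write
 * to a register and cleared by the OS at the end of each context switch.
 * Here it is just a variable, to show the save path.                      */
static uint32_t dirty_mask;

extern uint32_t read_register(int n);  /* hypothetical access to the file  */

/* The hypothetical "PUSH <vector>": store only the registers whose dirty
 * bit is set, each at its fixed slot in the task's saved context.         */
void save_dirty_registers(context_t *ctx)
{
    for (int n = 0; n < NUM_REGS; n++) {
        if (dirty_mask & (1u << n))
            ctx->regs[n] = read_register(n);
    }
    dirty_mask = 0;            /* re-arm tracking for the next task        */
}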
On 12.6.2017 г. 22:19, Don Y wrote:
> Hi Dimiter,
>
> On 6/12/2017 12:06 PM, Dimiter_Popoff wrote:
>> On 12.6.2017 г. 20:10, Don Y wrote:
>>>> ....
>>>
>>> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
>>
>> I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only.
>
> You're not thinking with HLL's in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)
Of course not, although HLL-s should be pretty good at knowing what to push/pull. But I certainly consider a language broken if it creates a need for hardware not needed when using other languages.
> And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
Well, and if they are not changed, what then? You have to mark stack frames somehow to know what exactly you did save so you can restore it etc.; then even if you switch tasks once per ms the time to save/restore all registers is negligible (32 longwords get written to cache on a 400 MHz 32-bit processor within 80 ns...).

OTOH if it is just an interrupt handler it will know what registers it uses and would save just them - and will modify them so there is no need to know whether they were changed or not.
>> It could be some help but not enough to justify the extra silicon & complexity I believe.
>
> The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.
>
> Then, a new "PUSH <vector>" opcode simply uses that "collection of flags" as the <vector>.
This sounds simple enough indeed, push the register list along with the list descriptor, then use it to restore the list. But you will still have to calculate the length you allocated to save the list and this variable length might complicate the scheduler enough to cancel the benefits if not worse... hard to say by just hypothesizing.
> If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so wouldn't offset the cost of just unconditionally performing the save/restore.
>
> In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor (in the NS32k example) tell me when a floating point instruction is invoked (the FPU is an optional component in the early NS32k systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions) by invoking a TRAP handler.
I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error).

Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
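[A small sketch of that save/enable/work/restore pattern, assuming hypothetical accessors for the current task's FPU-enable flag rather than DPS's actual API.]

#include <stdbool.h>

/* Hypothetical accessors for the current task's "FPU in use" flag. */
extern bool fpu_enabled(void);
extern void fpu_set_enabled(bool on);

extern double filter_sample(double x);   /* some FP work, for illustration */

double do_fp_work(double x)
{
    bool was_enabled = fpu_enabled();    /* remember the caller's state    */
    fpu_set_enabled(true);               /* make sure our FP context is
                                            preserved across task switches */

    double y = filter_sample(x);         /* the actual floating point work */

    fpu_set_enabled(was_enabled);        /* restore the caller's state     */
    return y;
}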
On 6/12/2017 1:04 PM, Dimiter_Popoff wrote:
> On 12.6.2017 г. 22:19, Don Y wrote:
>> On 6/12/2017 12:06 PM, Dimiter_Popoff wrote:
>>> On 12.6.2017 г. 20:10, Don Y wrote:
>>>>> ....
>>>>
>>>> Ideally, register-rich processors would implement a set of internal flags that would be set each time a register was loaded. And, a "save state" opcode that acted similar to the 09's PUSH "vector" saving only those registers that have been "altered". Then, letting the programmer clear this "flag register" as it reloaded the preserved state for the new task.
>>>
>>> I am not so sure how useful this would be, basically software knows which registers have been used and which not, it is up to it to save just what needs to be saved and restore it when needed only.
>>
>> You're not thinking with HLL's in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)
>
> Of course not, although HLL-s should be pretty good at knowing what to push/pull. But I certainly consider a language broken if it creates a need for hardware not needed when using other languages.
Yes, but the language doesn't know when a context switch is going to be triggered that swaps out *all* of the processor state!

  context switch
     restore state of task M
  ----
     do something          |
     alter register Q      |
     do something          |  task M
     alter register P      |
  ----
     preserve state of task M
  context switch
     restore state of task N
  ----
     use register Y        |
     alter register Q      |  task N
     ...                   |

When preserving the state of task M, *only* the saved state of registers Q and P needs to be updated, as they are the only registers that have been altered in that portion of task M's execution. The other 873 registers haven't been altered so the "preserved copies" of their state needn't be updated.

The problem with traditional architectures is that there is no easy way for the "context switch" routine to know what has been altered since the state for *that* task was most recently preserved. The hack I proposed would be "reset" as the last step in the context switch so that any alterations of the register file's contents would be individually flagged.

I.e., at the time of the task switch at the end of task M, the CPU will have "noticed" that ONLY registers Q & P have been updated (update can be as simple as noting ANY write to the register *or* as clever as noting any write that alters the specific contents of *that* register, so overwriting a value of '27' with another '27' would NOT flag the register as "altered"). So, when the context switch went to "preserve state of task M", it would execute this magical "PUSH <vector>" command and move only those altered register contents back into the TCB.

[It's not really a "PUSH" as it needs to put each register in a specific place relative to some "frame" -- i.e., the TCB -- which could be carefully arranged to be ToS relative]
>> And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
>
> Well, and if they are not changed, what then? You have to mark stack frames somehow to know what exactly you did save so you can restore it etc.; then even if you switch tasks once per ms the time to save/restore all registers is negligible (32 longwords get written to cache on a 400 MHz 32-bit processor within 80 ns...).
Yes, but it's obvious that the number of cores and the number (and size) of registers in those cores will only keep increasing as the memory interface becomes more of a "performance issue". Cache tries to handwave around this problem -- but more buffering doesn't always come without costs (e.g., having to flush the cache). N cores means N times the hammering on the memory interface due to task switches.
> OTOH if it is just an interrupt handler it will know what registers it uses and would save just them - and will modify them so there is no need to know whether they were changed or not.
Yes -- in that case, the developer knows what he's "touched" in the ISR (if written in ASM) and would only bother preserving and restoring those things that he was about to "taint". E.g., there are ARMs that have a FIQ ("Fast Interrupt") capability that essentially preserves minimal state for extremely low latency IRQs.
>>> It could be some help but not enough to justify the extra silicon & complexity I believe.
>>
>> The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.
>>
>> Then, a new "PUSH <vector>" opcode simply uses that "collection of flags" as the <vector>.
>
> This sounds simple enough indeed, push the register list along with the list descriptor, then use it to restore the list. But you will still have to calculate the length you allocated to save the list and this variable length might complicate the scheduler enough to cancel the benefits if not worse... hard to say by just hypothesizing.
The CPU vendor knows how many registers he has in the core. So, if he knows that registers 1, 2 and 12 need to be saved (by this magical "PUSH <vector>" instruction), he could save r1 to WORKSPACE+1, r2 to WORKSPACE+2 and r12 to WORKSPACE+12, where WORKSPACE is a particular register/address.

I.e., the 99k implemented all registers in main memory. So, when you accessed r1, you were really accessing the contents of memory at WORKSPACE_POINTER+(1*register_size). Shifting r1 left resulted in a read of that memory location into the CPU, a left shift in the ALU and a write *back* to that location of the updated datum.

Imagine, instead, caching all of those operations in an internal register file (gee, what a novel idea! :> ) and only flushing the contents of ALTERED registers back into main memory (at locations relative to the WORKSPACE_POINTER) when a task switch was needed.

[I.e., on the 99k, to do a context switch, you just changed one register -- the workspace pointer -- as the context was already *in* memory!]
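[A toy illustration of the workspace-pointer idea in C, using a deliberately simplified machine model where the "register file" lives in ordinary memory; none of the names correspond to real TMS99xx code.]

#include <stdint.h>

#define NUM_REGS 16

/* Each task's registers live in RAM, in a block the author calls a
 * workspace; the CPU only holds a pointer to the current one.            */
typedef uint32_t workspace_t[NUM_REGS];

static workspace_t task_a_ws, task_b_ws;     /* one workspace per task    */
static uint32_t   *workspace_pointer = task_a_ws;

/* "Register" access is just memory access relative to the pointer.       */
static inline uint32_t reg_read(int n)              { return workspace_pointer[n]; }
static inline void     reg_write(int n, uint32_t v) { workspace_pointer[n] = v; }

/* A context switch copies no registers at all: the whole context is
 * already in memory, so switching tasks is one pointer update.           */
void switch_to(workspace_t ws)
{
    workspace_pointer = ws;
}

void demo(void)
{
    reg_write(3, 0x1234);      /* "r3" of task A gets written in memory   */
    switch_to(task_b_ws);      /* context switch: just the pointer change */
    (void)reg_read(3);         /* now reads "r3" of task B                */
}

[Calling switch_to() is the entire context switch; the trade-off is that every register access goes to memory, which is exactly the CPU/memory "impedance mismatch" raised earlier in the thread.]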
>> If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so wouldn't offset the cost of just unconditionally performing the save/restore.
>>
>> In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor (in the NS32k example) tell me when a floating point instruction is invoked (the FPU is an optional component in the early NS32k systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions) by invoking a TRAP handler.
>
> I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error). Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.
I did this in my first implementation. But, that meant having a handler for those cases where someone screwed up the configuration of the task and forgot to indicate that it needed the FPU. Runtime might not see a task run that portion of its code so you might not see a "crash and burn" (until thorough testing).

So, if you have to handle the case where the task hasn't been configured to use the FPU and it *does*, then why not let that handler just "handle the FPU's usage" without forcing the developer to make that configuration choice?

You can create "FCB's" (FPU Control Blocks) to store FPU state and reference them *from* each task's TCB (so they are a part of that task's "state"). This also lets the FPU-handler keep a pointer to the FCB into which the current FPU hardware state should be (eventually) saved -- *if* that need arises. If the current task executes an FP opcode before the FPU state has been preserved, then the old task's FPU state can be saved; the TCB for the current task lets you chase down the FCB for the current task so its previous FPU state can be restored before the floating point operation is allowed to continue (restart).

It's damn near impossible to come up with a winning strategy without an understanding of the application and the deployment environment. Where are the *effective* resource shortages?
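[A sketch of that TCB/FCB arrangement, building on the lazy-trap handler sketched earlier in the thread; the FCB is allocated on demand the first time a task faults on an FP opcode, so tasks that never touch the FPU pay nothing. The structures and helpers are hypothetical.]

#include <stdint.h>
#include <stdlib.h>

#define NUM_FPU_REGS 32

typedef struct fcb {                 /* FPU Control Block                  */
    uint32_t regs[NUM_FPU_REGS];
    uint32_t fpscr;
} fcb_t;

typedef struct tcb {                 /* Task Control Block                 */
    /* ... integer context, scheduling state ...                           */
    fcb_t *fcb;                      /* NULL until the task uses the FPU   */
} tcb_t;

extern tcb_t *current_task;
static tcb_t *fpu_owner;             /* task whose state is in the FPU     */

extern void fpu_store(uint32_t *regs, uint32_t *fpscr);
extern void fpu_load(const uint32_t *regs, uint32_t fpscr);
extern void fpu_enable(void);

/* "FPU unavailable" trap: allocate an FCB on first use, then do the usual
 * lazy owner swap (error handling omitted for brevity).                   */
void fpu_trap(void)
{
    tcb_t *t = current_task;

    if (t->fcb == NULL)                          /* first FP use ever      */
        t->fcb = calloc(1, sizeof(fcb_t));       /* becomes part of its state */

    if (fpu_owner != t) {
        if (fpu_owner != NULL)
            fpu_store(fpu_owner->fcb->regs, &fpu_owner->fcb->fpscr);
        fpu_load(t->fcb->regs, t->fcb->fpscr);
        fpu_owner = t;
    }
    fpu_enable();
}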
On 13.6.2017 г. 00:34, Don Y wrote:
> On 6/12/2017 1:04 PM, Dimiter_Popoff wrote:
> ......
>>
>> I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error). Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.
>
> I did this in my first implementation. But, that meant having a handler for those cases where someone screwed up the configuration of the task and forgot to indicate that it needed the FPU. Runtime might not see a task run that portion of its code so you might not see a "crash and burn" (until thorough testing).
>
> So, if you have to handle the case where the task hasn't been configured to use the FPU and it *does*, then why not let that handler just "handle the FPU's usage" without forcing the developer to make that configuration choice?
Yes, I see no problem with that but not much gain either. Generally I don't care much about mistakes other people will make, I make enough of my own to care for. So if something is a programming error the best I can do is to make it as easily detectable as possible. The way you have done it is OK of course, it is no longer a programming error, but I am not sure it would save me much time. Nor would it waste me much though, so why not.

Dimiter
On 6/12/2017 4:44 PM, Dimiter_Popoff wrote:
> On 13.6.2017 г. 00:34, Don Y wrote:
>> On 6/12/2017 1:04 PM, Dimiter_Popoff wrote:
>> ......
>>>
>>> I am even more cheeky than that on power for DPS. A task _must_ have declared it will use the FPU if it will use it, this means it gets its FPU context preserved, entirely. All 32 FP regs + fpscr etc. thing. If not, the task just won't know what the state of the FPU is; I can make it trap (a bit in the MCR) or leave it unknown (not sure which I do, trying to use the FPU when not explicitly enabled is a programming error). Quite often when I need the FPU for just a function and do not know whether the calling task will have the FPU enabled or not, at the beginning the function saves the FPU on/off state, switches it "on", does its job then restores the former state.
>>
>> I did this in my first implementation. But, that meant having a handler for those cases where someone screwed up the configuration of the task and forgot to indicate that it needed the FPU. Runtime might not see a task run that portion of its code so you might not see a "crash and burn" (until thorough testing).
>>
>> So, if you have to handle the case where the task hasn't been configured to use the FPU and it *does*, then why not let that handler just "handle the FPU's usage" without forcing the developer to make that configuration choice?
>
> Yes, I see no problem with that but not much gain either. Generally I don't care much about mistakes other people will make, I make enough of my own to care for. So if something is a programming error the best I can do is to make it as easily detectable as possible. The way you have done it is OK of course, it is no longer a programming error, but I am not sure it would save me much time. Nor would it waste me much though, so why not.
The gain comes by decoupling the need to configure the "uses_FPU" flag for each task as well as eliminating the problem of a developer failing to *correctly* define that flag (i.e., if he changes the switches used with the compiler to use FP opcodes in the generated code he won't shoot himself in the foot).

This is especially true when the task can execute code that the developer didn't write/compile. Does he know under what circumstances FP opcodes are called into play?

I'm finding that working in "resource richer" environments results in very different approaches to software/system design! :<
