
From cooperative to preemptive scheduler: a real example

Started by pozz January 6, 2020
On 2020-01-15 pozz wrote in comp.arch.embedded:
> I don't agree.
> Most of the time I can guess which task (or a few tasks) consumes the
> most stack. So the effort to estimate stack usage is limited.
>
> In a preemptive scheduler, you would need to multiply your effort to
> estimate stack usage for every single task to avoid wasting precious memory.
>
> Anyway, suppose we are smart enough to calculate the stack usage of each task:
> - task1 needs 10kB
> - task2 needs 10kB
>
> With a preemptive scheduler you need to reserve 20kB for the stack; in
> the superloop you can reserve only 10kB.
But in the superloop you need to declare a lot of stuff static to
preserve it for the next loop, so it can't be on the stack as in the
preemptive case. So you will more likely need 1kB of stack and 2 * 10kB
of statics in the superloop case.

--
Stef  (remove caps, dashes and .invalid from e-mail address to reply by mail)

Been Transferred Lately?
On 15/01/2020 11:30, Stef wrote:
> On 2020-01-15 pozz wrote in comp.arch.embedded:
>>
>> I don't agree.
>> Most of the time I can guess which task (or a few tasks) consumes the
>> most stack. So the effort to estimate stack usage is limited.
>>
>> In a preemptive scheduler, you would need to multiply your effort to
>> estimate stack usage for every single task to avoid wasting precious memory.
>>
>> Anyway, suppose we are smart enough to calculate the stack usage of each task:
>> - task1 needs 10kB
>> - task2 needs 10kB
>>
>> With a preemptive scheduler you need to reserve 20kB for the stack; in
>> the superloop you can reserve only 10kB.
>
> But in the superloop you need to declare a lot of stuff static to
> preserve it for the next loop, so it can't be on the stack as in the
> preemptive case. So you will more likely need 1kB of stack and 2 * 10kB
> of statics in the superloop case.
This is right if the stack-consuming function is slow and should be split
into steps to avoid having a long-running task (i.e., converted to a state
machine). However, I could have a fast stack-consuming function that
shouldn't be split:

  while (1) {
      task1();   // calls StackConsumingFunc()
      task2();   // calls StackConsumingFunc()
  }

If StackConsumingFunc() needs 10kB of stack, what is the stack allocation
for the superloop and for the preemptive scheduler? I think I can allocate
only 10kB of stack if the superloop is used, but I need 20kB of stack if a
preemptive scheduler is used.
On 2020-01-10 1:42, pozz wrote:
> On 09/01/2020 00:39, Niklas Holsti wrote:
>> On 2020-01-08 0:51, pozz wrote:
>>> On 07/01/2020 15:51, Don Y wrote:
>>>> On 1/7/2020 2:11 AM, pozz wrote:
>>>>> On 07/01/2020 03:37, Don Y wrote:
>>>>>> On 1/6/2020 6:08 PM, pozz wrote:
>
>> What is your concern with that? You only need one semaphore to provide
>> mutual exclusion between two tasks, not a separate semaphore for each
>> shared variable.
>>
>> Are you worried about the processor time for the semaphore operations?
>> Or the code clutter?
>
> Code clutter. I am worried that I will forget to add semaphore management
> to a newly added function without noticing the problem (because those bugs
> are very rare to see).
You can hide the variables (make them "static" in their own .c file) so that they can only be accessed through the mutexed functions. Then you can't forget. (Ada has a nice feature called the "protected object" that can do that without needing any explicit semaphore or mutex calls.)
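A minimal C sketch of that pattern, assuming a hypothetical RTOS mutex API
(mutex_t, mutex_lock, mutex_unlock in "rtos.h") and a shared counter as the
example variable:

  /* counter.c -- the variable is visible only inside this file, so
   * every access is forced through the mutex-protected functions. */
  #include <stdint.h>
  #include "rtos.h"              /* hypothetical header providing mutex_t */

  static mutex_t  counter_lock;  /* initialized once at startup */
  static uint32_t counter;       /* the shared variable, invisible outside */

  void counter_increment(void)
  {
      mutex_lock(&counter_lock);
      counter++;
      mutex_unlock(&counter_lock);
  }

  uint32_t counter_get(void)
  {
      mutex_lock(&counter_lock);
      uint32_t value = counter;
      mutex_unlock(&counter_lock);
      return value;
  }

A newly added function in another file simply cannot touch "counter"
directly -- the linker will not resolve the name -- so the forgotten-semaphore
bug cannot even compile.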
>> You might try coding an FFT or Quicksort or other complex algorithm as
>> a state machine, with a variable overall length of the input and
>> output arrays, and then compare the "clutter" of those state machines
>> with the clutter of pre-emptive coding.
>
> Those are valid examples, but I think they're very rare in typical
> control systems.
Depends on your definition of "typical". I recently completed a project
which required both FFT and median-of-vector, using a median algorithm very
similar to quick-sort. (To be honest: the target had an FFT HW unit, so
that part did not have to be implemented in SW.)

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On 2020-01-10 1:32, pozz wrote:
> On 09/01/2020 13:03, Niklas Holsti wrote:
>> On 2020-01-09 13:19, pozz wrote:
>>> On 08/01/2020 23:54, Niklas Holsti wrote:
>>>> On 2020-01-08 1:02, pozz wrote:
>>>>> On 07/01/2020 08:38, Niklas Holsti wrote:
>>
>>     [snip]
>>
>>>>>> Let's assume these requirements and properties of the environment:
>>>>>>
>>>>>> A. The function "serial_rx" polls the one-character reception
>>>>>> buffer of the serial line once, and returns the received
>>>>>> character, if any, and EOF otherwise. It must be called at least
>>>>>> as often as characters arrive (that is, depending on baud rate) to
>>>>>> avoid overrun and loss of some characters.
>>>>
>>>> You asked about possible advantages of pre-emption; I made my
>>>> assumptions, above, such that the (incomplete) example you gave
>>>> shows this advantage, under these assumptions (which could be true
>>>> for other, otherwise similar example applications).
>>>>
>>>>> No, the serial driver works in interrupt mode and already uses a FIFO
>>>>> buffer, sufficiently big. serial_rx() pops a single element from the
>>>>> FIFO, if any.
>>>>
>>>> Ah, then your *system* is intrinsically pre-emptive (the interrupts
>>>> pre-empt the tasks), even if the *code you showed* does not show
>>>> this pre-emption.
>>>
>>> Ah yes, interrupts are preemptive and I use them a lot, but they are
>>> confined in their work, they are very lightweight and fast.
>>
>> That's a design decision. Some systems do most of their work in
>> interrupt handlers, and use "background" processing only for some
>> non-critical housekeeping tasks.
>>
>
> Yes, I imagine there is a plethora of possibilities. Anyway, I thought
> there were two typical approaches: a superloop that continuously calls
> non-blocking functions (an example of a cooperative scheduler) and a full
> preemptive scheduler (most of the time a full RTOS).
The approach where all real-time "tasks" are interrupts is called "foreground-background", I believe. It was popular in the microcomputer era, before real-time kernels became popular -- it gave most of the same benefits (pre-emption) but scheduled by HW (interrupt priorities).
> Interrupts are always present, even in superloop.
Again a design choice. Some critical systems don't want *any* pre-emption, even by interrupts -- simplicity reigns. So they poll, and accept the performance hit. For example, some colleagues of mine recently completed the "recovery SW" for ESA's ExoMars rover, which takes over if the "real" SW crashes and is intended to be super-safe and just support problem analysis and recovery. Using any kind of real-time kernel was forbidden -- only sequential coding was allowed. Any new, nice SW "feature" that my colleagues suggested, for example to speed up communication with Earth, was denied by the customer because it would complicate the SW a little bit -- perhaps one more conditional statement. No-no.
> Maybe I am wrong, but I implement ISRs with great care and attention,
Of course you are right to use great care and attention, and especially in ISRs.
> but they are very small and limited.
Many tasks in pre-emptive systems are also small and limited -- for
example, one of my designs has a task whose main loop just takes a
data-packet from an input queue, computes its CRC and appends it to the
packet, and puts the updated packet in an output queue. A handful of
source lines.

I would say that having a pre-emptive system relieves the designer of the
great care and attention needed to slice computations into
(state-machine-driven) "tasks" that execute their computations little by
little.
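For illustration, that CRC task in blocking style looks roughly like this
(a sketch only: packet_t, queue_get and queue_put are assumed kernel
primitives, with queue_get blocking until a packet arrives):

  #include <stdint.h>
  #include "rtos.h"                 /* hypothetical kernel header providing queue_t */

  #define MAX_PKT 256

  typedef struct {
      uint8_t  data[MAX_PKT + 2];   /* room left for the two CRC bytes */
      uint16_t len;
  } packet_t;

  /* Assumed kernel API: queue_get() blocks until an item is available. */
  packet_t *queue_get(queue_t *q);
  void      queue_put(queue_t *q, packet_t *p);
  uint16_t  crc16(const uint8_t *buf, uint16_t len);   /* defined elsewhere */

  extern queue_t input_queue, output_queue;

  void crc_task(void *arg)          /* task entry point, started by the kernel */
  {
      (void)arg;
      for (;;) {
          packet_t *pkt = queue_get(&input_queue);        /* sleep until a packet arrives */
          uint16_t crc = crc16(pkt->data, pkt->len);
          pkt->data[pkt->len++] = (uint8_t)(crc & 0xFFu); /* append CRC, low byte first */
          pkt->data[pkt->len++] = (uint8_t)(crc >> 8);
          queue_put(&output_queue, pkt);                  /* hand the packet onward */
      }
  }

No state variables and no slicing: the blocking queue_get does all the
sequencing.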
> Normally the biggest part of the firmware is related to the
> application/tasks, not interrupts.
Except in the aforementioned foreground-background designs.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On 2020-01-14 14:27, pozz wrote:
> Another trouble I found with a preemptive RTOS is how to size the stack
> of tasks.
>
> In my superloop cooperative scheduler:
>
>   while (1) {
>       task1();  // Non-blocking fast state-machined task
>       task2();  // Non-blocking fast state-machined task
>   }
>
> the stack is assigned all the unused memory available in the system
> (that is, all the memory minus variables and heap size, if you use it).
>
> If two tasks use a stack-intensive function (such as a printf-like
> function), this doesn't increase the overall stack requirement.
> For example, if the stack-intensive function needs 2kB of stack, the
> global stack can be 2kB (plus other stack needed by tasks for other
> operations).
>
> With a preemptive scheduler, tasks can be interrupted at any point, even
> during the printf-like function. So *each* task needs a stack of 2kB,
> reaching a global stack requirement of 4kB.
That is quite true, if you allow pre-emption in the printf-like function.
Also, depending on how your interrupt handlers are designed, you may have
to allocate interrupt-handler space on each of the per-task stacks.

However, any computation that needs a lot of stack also tends to use a lot
of time, so you might have to slice it up into a sequence of states (the
state-machine approach). Then the data must be statically allocated rather
than stack-allocated, as Stef pointed out, and moreover each task that
uses that computation must have its own copy of the data.
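A sketch of what that slicing does to the data, using a made-up filter over
a large sample buffer (all the names here are illustrative); with two tasks
running the computation, the buffer is duplicated:

  /* A long, stack-hungry computation sliced into bounded steps.
   * The working buffer can no longer live on the stack, because the
   * state must survive between superloop iterations -- and if task1
   * and task2 both run this computation, each needs its own copy. */
  #include <stdbool.h>

  #define N_SAMPLES 4096
  #define CHUNK     64            /* samples processed per superloop pass */

  typedef struct {
      float buf[N_SAMPLES];       /* was a local array in the blocking version */
      int   pos;                  /* resume point: which sample we reached */
      bool  busy;
  } filter_job_t;

  static filter_job_t job1, job2; /* one copy per task: 2 x the data */

  /* Do one bounded slice of work; returns true when the job is done. */
  bool filter_step(filter_job_t *job)
  {
      int end = job->pos + CHUNK;
      if (end > N_SAMPLES)
          end = N_SAMPLES;
      while (job->pos < end) {
          job->buf[job->pos] *= 0.5f;   /* stand-in for the real per-sample work */
          job->pos++;
      }
      return job->pos >= N_SAMPLES;
  }

In the blocking, pre-emptive version, buf could simply be a local array on
each task's stack, and pos and busy would disappear -- which is exactly the
stack-versus-statics trade being discussed.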
> Another issue with the preemptive approach is that you should be smart
> enough to size N stacks (where N is the number of tasks).
> With the superloop architecture above, you size only *one* global stack,
> calculated for the single task that needs the most stack.
Having a single stack rather than per-task stacks is easier if one has to
program the high-water-mark method for measuring stack usage separately
and manually for each stack.

Static analysis of stack usage bounds, either by a dedicated tool from the
machine code or as an extra function of the compiler and linker, is really
so simple that I see little excuse for embedded-SW programming
environments not to offer this analysis as a matter of course. And then
per-task stacks are just as easy as one stack.
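For example, GCC already does part of this: the -fstack-usage option emits
a .su file with the worst-case frame size of each function, and
-Wstack-usage warns when a function exceeds a given bound (the file name
and numbers below are illustrative):

  $ gcc -c -fstack-usage task1.c     # writes task1.su next to the object file
  $ cat task1.su
  task1.c:42:6:task1_step        104     static
  task1.c:87:6:format_report     1320    static

The "static" qualifier means the frame size is fixed at compile time;
combining these per-function figures over the call graph is what a
whole-program stack analyzer does.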
> Does this make sense?
Yes, the main-loop approach can use less stack than the pre-emptive
approach.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On 15.01.2020 09:36, pozz wrote:
> On 14/01/2020 20:10, Hans-Bernhard Bröker wrote:
>> On 14.01.2020 13:27, pozz wrote:
>>> Another trouble I found with a preemptive RTOS is how to size the stack
>>> of tasks.
>>
>> That trouble is not at all particular to preemptive scheduling.
>>
>>> In my superloop cooperative scheduler:
>>
>>> the stack is assigned all the unused memory available in the system
>>> (that is, all the memory minus variables and heap size, if you use it).
>>
>> And how do you know that that's sufficient?
>
> I'm not smart enough to estimate the stack usage of a function (I know the
> compiler can produce some useful info about this, but you should add
> interrupt stack usage and so on), so my only approach is testing.
As the saying goes: testing proves diddly-squat. You'll hardly ever know whether your test cases came anywhere near exercising the code path of worst stack usage.
>> The difficulty of (reliably) computing stack usage is the same,
>> regardless of what tasking concept is used.
>
> I don't agree.
You disagree with a different statement than the one I made. I was talking
about this job being difficult. You object based on it being a lot of work.
> Most of the time I can guess which task (or a few tasks) consumes the
> most stack.
Guessing is about the only approach to this that is even less reliable than testing.
> In a preemptive scheduler, you would need to multiply your effort to
> estimate stack usage for every single task to avoid wasting precious
> memory.
The effort doesn't really multiply, because it'll all be handled by loops in the code anyway. Doing the same thing multiple times is what CPUs are supposed to be good at, after all.
On 15.01.2020 20:26, Niklas Holsti wrote:

> Having a single stack rather than per-task stacks is easier if one has
> to program the high-water-mark method for measuring stack usage
> separately and manually for each stack.
No. It's exactly as easy. You just do the same thing for each stack as you would for a single one.
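The high-water-mark method is the same few lines of code either way -- a
minimal sketch, assuming a descending stack and linker-provided symbols
(__stack_start and __stack_end are assumptions; repeat per task stack):

  /* Stack "painting": fill the stack with a known pattern at startup,
   * then scan later to see how far it was overwritten. */
  #include <stdint.h>
  #include <stddef.h>

  #define PAINT 0xA5A5A5A5u

  extern uint32_t __stack_start[], __stack_end[];  /* assumed linker symbols */

  void stack_paint(void)         /* call from startup code, before this stack is in use */
  {
      for (uint32_t *p = __stack_start; p < __stack_end; p++)
          *p = PAINT;
  }

  size_t stack_high_water(void)  /* bytes ever used (stack grows downward) */
  {
      uint32_t *p = __stack_start;
      while (p < __stack_end && *p == PAINT)
          p++;                   /* first overwritten word marks the deepest point */
      return (size_t)(__stack_end - p) * sizeof(uint32_t);
  }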
>> Static analysis of stack usage bounds, either by a dedicated tool from
>> the machine code or as an extra function of the compiler and linker, is
>> really so simple
Objection. It is simple only in a limited class of use cases. As soon as there is any use of function pointers or the merest hint of possible recursion, static stack analysis becomes equivalent to Turing's Halting Problem. Problems don't really get much harder than that one.
On 2020-01-15 23:12, Hans-Bernhard Bröker wrote:
> On 15.01.2020 20:26, Niklas Holsti wrote:
>
>> Having a single stack rather than per-task stacks is easier if one has
>> to program the high-water-mark method for measuring stack usage
>> separately and manually for each stack.
>
> No. It's exactly as easy. You just do the same thing for each stack as
> you would for a single one.
In terms of intrinsic difficulty, yes. In terms of the amount of manual work, no.
>> Static analysis of stack usage bounds, either by a dedicated tool from
>> the machine code or as an extra function of the compiler and linker, is
>> really so simple
>
> Objection. It is simple only in a limited class of use cases. As soon
> as there is any use of function pointers or the merest hint of possible
> recursion, static stack analysis becomes equivalent to Turing's Halting
> Problem. Problems don't really get much harder than that one.
In theory, yes. In practice, no.

The Halting Problem is impossible to solve for *all* programs. That
doesn't mean that solutions are impossible for a *large* class of *common*
programs. Most programs are designed cleanly enough to make stack-usage
analysis possible. An exact value is not needed; an upper bound is enough,
if it isn't very pessimistic.

Assuming that the type system is not broken, a whole-program analysis can
figure out all possible target functions of any function pointer. For
recursion, I agree that some manual support (providing a bound on the
depth of recursion) is usually needed. However, embedded programs rarely
use recursion in any non-trivial way. And most functions have stack frames
of static size; it is rare to allocate dynamic amounts of stack.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On 15/01/2020 17:53, Niklas Holsti wrote:
> On 2020-01-10 1:32, pozz wrote:
>> On 09/01/2020 13:03, Niklas Holsti wrote:
>>> On 2020-01-09 13:19, pozz wrote:
>>>> On 08/01/2020 23:54, Niklas Holsti wrote:
>>>>> On 2020-01-08 1:02, pozz wrote:
>>>>>> On 07/01/2020 08:38, Niklas Holsti wrote:
>>>
>>>     [snip]
>>>
>>>>>>> Let's assume these requirements and properties of the environment:
>>>>>>>
>>>>>>> A. The function "serial_rx" polls the one-character reception
>>>>>>> buffer of the serial line once, and returns the received
>>>>>>> character, if any, and EOF otherwise. It must be called at least
>>>>>>> as often as characters arrive (that is, depending on baud rate)
>>>>>>> to avoid overrun and loss of some characters.
>>>>>
>>>>> You asked about possible advantages of pre-emption; I made my
>>>>> assumptions, above, such that the (incomplete) example you gave
>>>>> shows this advantage, under these assumptions (which could be true
>>>>> for other, otherwise similar example applications).
>>>>>
>>>>>> No, the serial driver works in interrupt mode and already uses a FIFO
>>>>>> buffer, sufficiently big. serial_rx() pops a single element from
>>>>>> the FIFO, if any.
>>>>>
>>>>> Ah, then your *system* is intrinsically pre-emptive (the interrupts
>>>>> pre-empt the tasks), even if the *code you showed* does not show
>>>>> this pre-emption.
>>>>
>>>> Ah yes, interrupts are preemptive and I use them a lot, but they are
>>>> confined in their work, they are very lightweight and fast.
>>>
>>> That's a design decision. Some systems do most of their work in
>>> interrupt handlers, and use "background" processing only for some
>>> non-critical housekeeping tasks.
>>>
>>
>> Yes, I imagine there is a plethora of possibilities. Anyway, I thought
>> there were two typical approaches: a superloop that continuously calls
>> non-blocking functions (an example of a cooperative scheduler) and a
>> full preemptive scheduler (most of the time a full RTOS).
>
> The approach where all real-time "tasks" are interrupts is called
> "foreground-background", I believe. It was popular in the microcomputer
> era, before real-time kernels became popular -- it gave most of the same
> benefits (pre-emption) but scheduled by HW (interrupt priorities).
Yes "foreground-background" is another name, but it says nothing about the blocking/non-blocking of background (non interrupts) tasks. If you have blocking (long running) tasks, you will not have a reactive system (only interrupts will be reactive). If the tasks are coded as non-blocking (maybe with the help of state-machines), you will be able to have a reactive system even for the non-interrupts tasks.
>> Interrupts are always present, even in superloop.
>
> Again a design choice. Some critical systems don't want *any*
> pre-emption, even by interrupts -- simplicity reigns. So they poll, and
> accept the performance hit. For example, some colleagues of mine
> recently completed the "recovery SW" for ESA's ExoMars rover, which
> takes over if the "real" SW crashes and is intended to be super-safe and
> just support problem analysis and recovery. Using any kind of real-time
> kernel was forbidden -- only sequential coding was allowed. Any new,
> nice SW "feature" that my colleagues suggested, for example to speed up
> communication with Earth, was denied by the customer because it would
> complicate the SW a little bit -- perhaps one more conditional
> statement. No-no.
I don't think this example can be considered "typical"; I know there are
"extreme cases" where a singular approach is needed.
>> Maybe I am wrong, but I implement ISRs with great care and attention,
>
> Of course you are right to use great care and attention, and especially
> in ISRs.
>
>> but they are very small and limited.
>
> Many tasks in pre-emptive systems are also small and limited -- for
> example, one of my designs has a task whose main loop just takes a
> data-packet from an input queue, computes its CRC and appends it to the
> packet, and puts the updated packet in an output queue. A handful of
> source lines.
>
> I would say that having a pre-emptive system relieves the designer of
> the great care and attention needed to slice computations into
> (state-machine-driven) "tasks" that execute their computations little
> by little.
I don't think the CRC computation takes so long that it must be split
into smaller and faster steps (state machine). I think you could write
that task without a state machine in the superloop architecture:

  void task_CRC(void) {
      uint8_t *packet;
      size_t packet_len;

      int err = queue_pop(&packet, &packet_len);
      if (!err) {
          packet[packet_len] = crc(packet, packet_len);
          packet_len++;
          queue_push(packet, packet_len);
      }
  }

And here you don't have to think about multithreading issues (race
conditions, shared resources, ...).
>> Normally the biggest part of the firmware is related to the
>> application/tasks, not interrupts.
>
> Except in the aforementioned foreground-background designs.
Even in foreground-background designs, ISRs are limited to critical
real-time events.
On Thu, 16 Jan 2020 14:48:34 +0100, pozz <pozzugno@gmail.com> wrote:

> I don't think the CRC computation takes so long that it must be split
> into smaller and faster steps (state machine). I think you could write
> that task without a state machine in the superloop architecture:
>
> void task_CRC(void) {
>     uint8_t *packet;
>     size_t packet_len;
>
>     int err = queue_pop(&packet, &packet_len);
>     if (!err) {
>         packet[packet_len] = crc(packet, packet_len);
>         packet_len++;
>         queue_push(packet, packet_len);
>     }
> }
What is the problem? In the character-receiving ISR, calculate the partial
CRC/checksum for each received character. At the end of the message, you
do need to process the last received character to check whether the CRC
matches.

In fact, I have done a Modbus RTU line monitor for hardware without
accurate timing information. After receiving a byte, calculate a partial
CRC and compare it with the last two bytes. If the check matches, assume
it was the end of frame. Then perform a sanity check that the frame is
longer than the minimum length and that it makes sense according to the
protocol specification.

If these additional tests fail, assume that this was a premature CRC
match. Continue reading bytes until a new CRC match is obtained, and
perform the same sanity test. If the received frame is longer than the
maximum valid frame, move the start position one byte ahead and perform a
long CRC calculation.

There are a surprisingly large number of false CRC matches, but
resynchronisation is usually obtained quite quickly.
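A minimal sketch of the per-byte update for the Modbus RTU CRC-16 (initial
value 0xFFFF, reflected polynomial 0xA001), which is cheap enough to run
inside the receive ISR as described; the function name is illustrative:

  #include <stdint.h>

  /* Update a running Modbus CRC-16 with one received byte.
   * Start from 0xFFFF and call once per byte as it arrives. */
  static uint16_t crc16_update(uint16_t crc, uint8_t byte)
  {
      crc ^= byte;
      for (int i = 0; i < 8; i++) {
          if (crc & 1u)
              crc = (crc >> 1) ^ 0xA001u;   /* reflected CRC-16 polynomial */
          else
              crc >>= 1;
      }
      return crc;
  }

The monitor then compares the CRC accumulated over all bytes except the
last two against those last two bytes (sent low byte first in Modbus RTU);
a match suggests, but does not prove, end of frame -- hence the sanity
checks above.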