
Common name for a "Task Loop"

Started by Tim Wescott June 24, 2016
>but the writeback is an other issue (write through or write back) how to handle it.
Is writeback necessary? Does reading RAM clear it, with the controller doing a rewrite? Or was that just core?

mac the naïf
I've changed the Subject according to the veering of the discussion.

On 16-07-12 11:31 , upsidedown@downunder.com wrote:
> On Mon, 11 Jul 2016 16:11:27 +0300, Niklas Holsti
> <niklas.holsti@tidorum.invalid> wrote:
>
>> On 16-07-11 06:51 , upsidedown@downunder.com wrote:
[snip]
>>> What exactly is your "memory scrubber" actually doing ?
>>>
>>> * Read/ECC correction/writeback a memory location to avoid cosmic ray
>>> problems ?
>>
>> Yes. I thought this was the (only) common meaning of "memory scrubbing",
>> but perhaps I was wrong.
>
> That had earlier been my impression too, but googling, quite a lot of
> strange hits emerged :-).
>
> IIRC, when the Hubble (HST) had problems flying through the SAA
> radiation zone, it was called memory "flushing".
>
> Any earlier space usage of scrubbing/flushing ?
I've been working with ESA projects since 1995, and "scrubbing" is the only word I've seen used.
> Doing the scrubbing in the null task is easy, as long as the hardware
> supports uninterruptable memory to memory instructions, so no need to
> disable interrupts an hence task switching.
>
> Things get more problematic with RISC style load/store architectures.
In this application, the processor is a SPARC v8 (a LEON2, to be exact). It is load/store. The EDAC corrects loaded data on the fly, but does not write the corrected data back. The scrubber will read every word in a block of memory (non-atomically), then check the EDAC control registers to find out if the EDAC corrected an error, and which was the first corrected address. If an error was corrected, the scrubber atomically loads and stores the first corrected address, then continues with a new block of memory from the next address.
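That read/check/rewrite sequence might look roughly like the C sketch below. The EDAC register names, addresses, and bit values, and the irq_disable/irq_restore primitives, are placeholders invented for illustration; the actual LEON2 memory-controller registers and the atomic rewrite mechanism are not given in this thread.

#include <stdint.h>

/* Hypothetical EDAC register names, addresses and bit layout -- illustrative only. */
#define EDAC_STATUS       (*(volatile uint32_t *)0x80000100u)
#define EDAC_FIRST_ADDR   (*(volatile uint32_t *)0x80000104u)
#define EDAC_CORRECTED    0x1u              /* "a single-bit error was corrected on read" */

#define SCRUB_BLOCK_WORDS 256u

extern uint32_t irq_disable(void);          /* platform-specific, assumed to exist */
extern void     irq_restore(uint32_t saved);

/* Scrub one block starting at 'block'; return the address to continue from. */
static uint32_t *scrub_block(uint32_t *block)
{
    volatile uint32_t sink;

    EDAC_STATUS = 0;                        /* clear the sticky "corrected" flag */

    for (uint32_t i = 0; i < SCRUB_BLOCK_WORDS; i++)
        sink = block[i];                    /* non-atomic read pass; the EDAC corrects the
                                               loaded value on the fly, not the RAM */
    (void)sink;

    if (EDAC_STATUS & EDAC_CORRECTED) {
        volatile uint32_t *bad = (volatile uint32_t *)EDAC_FIRST_ADDR;
        uint32_t saved = irq_disable();     /* make the load/store pair interrupt-atomic */
        *bad = *bad;                        /* write the corrected word back to RAM */
        irq_restore(saved);
        return (uint32_t *)bad + 1;         /* next block starts after the rewritten word */
    }
    return block + SCRUB_BLOCK_WORDS;
}

Note that even this interrupt-atomic rewrite is still not atomic with respect to DMA, which is the complication taken up below.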
> Also caches complicates the situation. A single byte read is
> sufficient to load a full cache line such as 32 bytes, but the
> writeback is an other issue (write through or write back) how to
> handle it.
The LEON2 D-cache is unusual in that a cache miss loads only the missing word, not a whole cache line. This application will probably keep the D-cache disabled. The RAM is pretty fast (2 wait states) and it is doubtful (but to be measured) if the D-cache will really speed things up or mainly increase jitter.

The kernel can manage cache state (enable/disable) as part of the task context, so if necessary we can disable the D-cache just for the scrubber task but enable it for other tasks. There is also the option of making the scrubber load data using a special form of the load instruction that bypasses the D-cache and always reads from RAM.

In addition to the cache, DMA input also complicates scrubbing because even an interrupt-atomic load-store pair is non-atomic with respect to DMA. One common work-around, which we will also use in this application, is to use all DMA buffers in a round-robin fashion and show by analysis that no data stays long enough in RAM, between a store and a load, to risk accumulating more than a correctable one-bit error.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On Mon, 11 Jul 2016 16:11:27 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>On 16-07-11 06:51 , upsidedown@downunder.com wrote: >> On Sun, 10 Jul 2016 03:29:01 -0500, Les Cargill >> <lcargill99@comcast.com> wrote: >> >>>>> It could be that the instrumentation necessary for the memory scrubber >>>>> to figure out it needs to swap, costs too much. And I understand that. >>>>> >>>>> And I'm not talking about randomly inserting "sleep()" calls as if >>>>> you were balancing a wheel :) - I mean constructing a "memory >>>>> scrubber" object that understands its CPU budget. Running the >>>>> memory scrubber on constraints based on estimated CPU utilization. >>>> >>>> My current memory scubber object understands its CPU budget: it is >>>> allowed (and even required) to use all the CPU time available at its >>>> priority level. >>>> >>> >>> Ah, then that mainly follows what I mean then. I've just seen lots of >>> things where there was no concept of CPU budget and then people are >>> surprised to find that one was needed. You'd have a cluster >>> of events occur within some horizon and things would go wonky. >> >> What exactly is your "memory scrubber" actually doing ? >> >> * Read/ECC correction/writeback a memory location to avoid cosmic ray >> problems ? > >Yes. I thought this was the (only) common meaning of "memory scrubbing", >but perhaps I was wrong. > >> These could all be done in the background in the null task with no >> knowledge of the CPU budget. > >Rest easy, my memory scrubber *is* the background task (as I said in my >original mention of the scrubber, a number of posts ago) in the sense >that it has the lowest priority of all tasks and never blocks.
I know of a system which did not have hardware scrubbing, so it was (reasonably) decided to do software scrubbing in the idle loop. The only problem was that the original implementation did not anticipate the effect of the scrubber hitting an uncorrectable error: the resulting machine check panicked the kernel...
On Tue, 12 Jul 2016 15:08:39 -0000 (UTC), mac <acolvin@efunct.com>
wrote:

>>but the writeback is an other issue (write through or write back) how to handle it.
>
>Is writeback necessary? Does reading RAM clear it, with the controller
>doing a rewrite? Or was that just core?
DRAM cell read-out is destructive - the charge (or lack thereof) in the capacitor makes a blip (or not) on the sense line. It's not the controller doing the rewrite: since the entire row is read at once (and all the bits in the row are destroyed), the rewrite is done on the memory chip. Refresh performed by the DRAM controller is just an abbreviated read cycle (the chip reads and rewrites the selected row, but does not return any data so does not need the column address). But ECC detection/correction is almost always done in the DRAM controller (or something nearby), and usually does force a rewrite of the corrected word.

Writeback in this context refers to caches not immediately writing modified data back to RAM. To do scrubbing in software, you need to bypass the caches (both for proper operation, and to avoid flushing useful data from your caches), and generally you'd like to do it with something approximating real (as opposed to virtual) addresses. You also need to make sure scrubbing happens at some minimum rate - if there is no idle time, there won't be any scrubbing if that's the only time you do it.

There's a lot to be said for leaving it to the DRAM controller...
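One way (not from the thread, just a sketch with invented names) to guarantee that minimum rate when there is no idle time is a small watchdog on the scrubber's progress, run from a periodic tick. scrub_some_words and MIN_WORDS_PER_TICK are hypothetical; the real numbers come from how fast the whole memory must be covered.

#include <stdint.h>

#define MIN_WORDS_PER_TICK 64u                  /* illustrative floor on scrub progress */

extern uint32_t scrub_some_words(uint32_t n);   /* hypothetical helper: scrubs up to n words,
                                                   returns how many it actually did */

static volatile uint32_t words_scrubbed;        /* incremented by the idle-loop scrubber */

/* Called from a periodic timer interrupt: if the idle-loop scrubber has been
 * starved since the last tick, scrub a small burst here so the memory is
 * still covered within its required period.
 */
void scrub_watchdog_tick(void)
{
    static uint32_t last;
    uint32_t done = words_scrubbed - last;

    if (done < MIN_WORDS_PER_TICK)
        words_scrubbed += scrub_some_words(MIN_WORDS_PER_TICK - done);

    last = words_scrubbed;
}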
In article <_q-dnWEQhONI4PDKnZ2dnUU7-YOdnZ2d@giganews.com>, 
seemywebsite@myfooter.really says...
> So, this is the third time in a month or so that I've needed to tell
> someone "use a task loop" -- but I'm not sure if I can say "just Google
> it".
>
> So: When I say "task loop" I mean that I'm _not_ using an RTOS, but
> rather that I'm doing some small thing in a small processor, and
> somewhere in my code there's a loop that goes:
>
> for (;;)
> {
>     if (task_1_ready)
>     {
>         task_1_update();
>     }
>     else if (task_2_ready)
>     {
>         task_2_update();
>     }
>     else if (task_3_ready)
>     // et cetera
> }
>
> The "task_n_ready" variables are set offstage (in an ISR, or by one of
> the task_n_update functions) and reset within the tasks.
>
> So -- is there a common Google-able term for this?
>
>
I always called it a "Round Robin" type, but I could be wrong.
WangoTango wrote:

> In article <_q-dnWEQhONI4PDKnZ2dnUU7-YOdnZ2d@giganews.com>,
> seemywebsite@myfooter.really says...
>> So, this is the third time in a month or so that I've needed to tell
>> someone "use a task loop" -- but I'm not sure if I can say "just Google
>> it".
>>
>> So: When I say "task loop" I mean that I'm _not_ using an RTOS, but
>> rather that I'm doing some small thing in a small processor, and
>> somewhere in my code there's a loop that goes:
>>
>> for (;;)
>> {
>>     if (task_1_ready)
>>     {
>>         task_1_update();
>>     }
>>     else if (task_2_ready)
>>     {
>>         task_2_update();
>>     }
>>     else if (task_3_ready)
>>     // et cetera
>> }
>>
>> The "task_n_ready" variables are set offstage (in an ISR, or by one of
>> the task_n_update functions) and reset within the tasks.
>>
>> So -- is there a common Google-able term for this?
>>
>>
> I always called it a "Round Robin" type, but I could be wrong.
That's one I'll actually disagree with. Round robin to me implies a lack of prioritization, whereas this is explicitly priority based. It's the difference between:

while (1)
{
    for (task = tasklisthead; task->fptr != NULL; task++)
    {
        if (task->readyflag)
        {
            task->fptr();
            /* The break makes it priority; without it, it is round robin. */
            break;
        }
    }
}

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
On 7/10/2016 7:55 AM, Niklas Holsti wrote:
> On 16-07-10 12:41 , Don Y wrote: >> On 7/9/2016 5:46 AM, Niklas Holsti wrote: >>> On 16-07-09 00:56 , Les Cargill wrote: >>>> Niklas Holsti wrote: >>>>> [snips] >>>>> One example of interactions I find difficult is a shared I/O channel or >>>>> bus that must be used by various tasks, for various purposes, with most >>>>> transmissions being sporadic and such that the sending task must wait >>>>> for and check a response transmission. >>>> >>>> So the sender sends asynchronously then blocks on a receive >>>> ( presumably with a timeout ) , with other tasks/ISRs handling >>>> the details. You may even have separate send and receive loops, >>>> with state indicating the timeout for each receive. >>> >>> I have no problem *implementing* such things, my problem is >>> *analysing* the timing to compute worst-case task response times >>> under various load scenarios. This computation must also consider >>> the possible latencies of response-generation at the remote end of >>> the channel. >> >> It's no different than a uniprocessor implementation -- *if* you >> "migrate" the "scheduling criteria" (in your case, "priorities") >> *with* the communication... THROUGHOUT the system! > > Sure it is different from a uniprocessor, as there are now multiple processors. > I haven't yet used any multi-core/multi-processor schedulability-analysis > methods, but I know they exist.
I think you will be very disappointed when you start "looking under the hood" of most of those methods. Esp if you are looking for a simple "relation" (i.e., if X then PASS else FAIL) to summarize your task *set*! For example, "if the sum of the utilizations is less than..." Virtually all of the algorithms place preconditions on the task sets *and* require concrete information on their release rates, utilization factors, etc. On top of that, you will find most are designed for SMP environments; adding transport delays complicates the analysis. (for starters, look for BAK, BCL, GFB, Dhall, etc.)

I approach systems differently: treat everything as possible to fail. And, ensure you can respond rationally in that case! This lets me reduce the number of things that I'd really like NOT to fail to something more manageable... something that I can analyze manually to determine the conditions under which it can be expected to perform *best* in the system.

In my current system, tasks can "migrate" (I need a better term as that has been overloaded, historically) between physical processors. A task's notion of "importance" (avoiding the term "priority") shouldn't change just because it has "physically" moved to a different CPU. Its *relative* importance when competing for resources on that new node may (almost certainly!) change. But, the criteria by which that node's scheduling decisions are made should still be consistent with the criteria that was used "in its original home". [N.B. The scheduling ALGORITHM is free to change!]

Even in those cases where the task is bound to a particular physical processor, the actions performed by other tasks (local and remote) ON ITS BEHALF must be scheduled with its schedulability criteria. Otherwise, asking any other task (service) to perform an action effectively alters the performance of the current task. [I.e., if a task calls a subroutine/function, that code executes with its scheduling criteria, right? Why should a *process* that implements that subroutine/function -- i.e., IPC/RPC -- not behave the same way? Does the task, instead, need to be aware of which function invocations are backed by IPC/RPC vs. "simple" function calls?]

If, instead, a task's "importance" ("priority", in your case) magically changes because it has crossed a subsystem (i.e., work is now being performed in the network stack on its behalf) or node (work being performed on another physical processor), then the complexity of the scheduling analysis grows exponentially. I.e., even tiny task sets will prove to be difficult to analyze with any certainty.

In a very real sense, I let the actual tasks' execution do that analysis FOR me: let them run under a scheduling algorithm AFTER DEPLOYMENT and trap each deadline that is missed -- with a suitably constructed handler to "fix up" the system (and the task's role in it) in light of this MISSED deadline (i.e., THIS INSTANCE of THIS TASKSET was NOT schedulable!).

For example, there are ~dozen tasks in one of my speech synthesizers. I *know* the hardware is seriously underprovisioned -- despite being "PC class" dedicated JUST to that one task! [As "proof", a speech channel is a few hundred BITS per second. Yet, I can deliver thousands of BYTES of data, per second, to the synthesizer to "speak". Endlessly. Build ANY finite buffer and it WILL overflow.] The algorithm knows it has to rely on the benevolence of anything feeding it. Yet, also knows that it can't *gag* just because the upstream data source "misbehaved".
So, it uses some inherent knowledge of grammar to ensure what it *can* "speak" makes some sort of sense. It doesn't just process whatever "word" happens to be available at its input -- without concern for any intervening words that may have been elided! Instead, it deals with phrases and sentences (which, of course, have no *definite* lengths!) and tries to present them, intact. With an indication of any elided portions (i.e., "BEEP -- something is missing at this point in the dialog"). Similarly, the constituent synthesis tasks know they can be preempted or overridden by more "important" tasks. So, they do their work in manageable pieces -- words and phrases at a time. So, the actual waveforms presented to the user don't represent partial words. Or, partial phrases. Much better for the algorithm to speak in little phrases than to speak in partial words. (i.e., each time the algorithm had to place such an "arbitrary" constraint on its output is a sign of a missed deadline; the "next word" wasn't ready in time to be spoken when it should have been) [Humans don't have very good "aural memories". You can watch choppy video and still track the "action". But, listening to choppy speech is virtually unintelligible]
>> I.e., the (local) network stack doesn't just treat enqueued >> outgoing packets in FIFO order but, rather, "assumes" the >> priority of each "client" associated with each packet. So, >> the packet associated with the "highest priority" (using >> your scheduling lexicon) gets placed on the wire, "first". > > In my current application, there are separate transmission queues according to > a (rough) priority order of the kind of data, command, or response being > transmitted. This will certainly help to limit the queueing delays for > high-priority traffic.
But, there's no way of predicting (on paper) the delay incurred by any message knowing the "priority" of its issuing task! You've introduced an elastic delay by allowing the workload of one particular task (with one set of known scheduling criteria) to be intermixed with the workload of another -- in a manner that no longer reflects the "ranking" that the scheduling criteria had been applying to the tasks, up to that point.
>>>> It might be worth adding references to a high-res free-running timer >>>> and accumulating data. If you then wish to construct a latency model, >>>> you can then have data to see if your model makes any sense. >>> >>> I don't want to rely on such measurements, which always depend on the >>> scenarios and test cases used. I want to have a method for computing >>> response times from the design, and proving by analysis that deadlines >>> are met in all possible scenarios. >> >> IME, you can only do that in small/trivial systems with predictable >> execution environments. > > I don't disagree. But this happens to be my situation. My applications may be > "trivial" in this sense, but they provide my bread & butter (or margarine, as > the case may be), so they are non-trivial to me.
"Trivial" isn't meant as a denigration/dismissal of the problems you face. It is, instead, chosen as an antonym for "complex". In complex systems, the interactions become too numerous, subtle and *layered* for most analysis techniques to apply. There are too many balls in the air.
>> Aperiodic tasks, "random" events, etc. >> leave you with a boundless range of possible scenarios that you're >> trying to cram into a neat little, well dimensioned, box. > > Aperiodic tasks can be limited by minimum inter-arrival times, agreed with the > SW customer and (if required) enforced in the SW, and then they can be analysed > in the same way as periodic tasks. Alternatively, scheduling methods called > "servers" exist that can handle aperiodic or sporadic stuff, in a > priority-based or EDF-based environment, while using at most a defined fraction > of the resources.
You can *agree* to anything you want. But, when the product fails to *work* -- because the USER had their own idea of what those limits should be -- suddenly your past agreements are just a leg to stand on in a court of law... but, don't leave you *or* the customer happy! When your reservations are exceeded, the system breaks. How do you get a *user* to accept the fact that *you* (or your customer) decided that *he* (the user) should only be expected to perform a certain set of tasks in a certain amount of time, etc.? IME, a better approach is to let the entire system "flex" to accommodate the user -- in ways that you can't always predict with a static reservation. Allow it to operate *into* overload and for the user to find that sweet spot between "too much" and "not enough" -- without arbitrarily enforcing a hard limit because you *thought* (and contractually agreed) that a certain reservation was "adequate". If you carefully examine the systems around you, you should be able to identify those that have been designed to "bend" vs. those that simply "break". Each is "annoying" in that it doesn't implicitly meet our needs. But, the ones that force us to STOP and GO at their defined pace tend to be more annoying than those who will let us "get ahead of them" -- but, eventually, catch up!
On Sun, 10 Jul 2016 17:55:10 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>On 16-07-10 12:41 , Don Y wrote: >> On 7/9/2016 5:46 AM, Niklas Holsti wrote: >>> On 16-07-09 00:56 , Les Cargill wrote: >>>> Niklas Holsti wrote: >>>>> [snips] >>>>> One example of interactions I find difficult is a shared I/O channel or >>>>> bus that must be used by various tasks, for various purposes, with most >>>>> transmissions being sporadic and such that the sending task must wait >>>>> for and check a response transmission. >>>> >>>> So the sender sends asynchronously then blocks on a receive >>>> ( presumably with a timeout ) , with other tasks/ISRs handling >>>> the details. You may even have separate send and receive loops, >>>> with state indicating the timeout for each receive. >>> >>> I have no problem *implementing* such things, my problem is >>> *analysing* the timing to compute worst-case task response times >>> under various load scenarios. This computation must also consider >>> the possible latencies of response-generation at the remote end of >>> the channel. >> >> It's no different than a uniprocessor implementation -- *if* you >> "migrate" the "scheduling criteria" (in your case, "priorities") >> *with* the communication... THROUGHOUT the system!
If you have the luxury of multiple cores/processors, just set the task affinities so that the critical tasks are locked to one core/processor and do the timing analysis with that. The other noncritical tasks will run on the remaining core(s)/processor(s).

This is especially important in multiprocessor applications with large private caches. If the scheduler is allowed to throw the tasks around among all the processors, there is a lot of cache invalidation.
On 7/10/2016 7:34 AM, Niklas Holsti wrote:
> On 16-07-10 12:01 , Don Y wrote: >> On 7/7/2016 1:09 PM, Niklas Holsti wrote: >>> Don Y wrote: >>>> Then, EVERY product that employs a priority-based scheduler should >>>> have a >>>> FORMAL document in part of its deliverables that clearly states these >>>> priorities and the method by which they were derived (as you later call >>>> them "the basis for one important set of the mathematical methods for >>>> verifying real-time performance: schedulability analysis"). So, the >>>> next bloke to look at the code knows exactly why they were chosen and >>>> which assumptions he/she must *continue* to operate under for those >>>> priority assignments to remain valid. >>> >>> Ideally yes, just as every mechanical engineering project should >>> document its >>> design assumptions, stress and strength calculations, etc., and every SW >>> project should deliver full design documentation, complete user manuals, >>> maintenance manuals, etc. >>> >>> But seriously, documentation requirements are a different dimension >>> from the >>> choice of design and implementation methods. Using a systematic design >>> method >>> does not oblige you to document the design, although it makes it much >>> easier to >>> document it. >> >> How can you say that? Do you expect the next soul to read your mind >> regarding how and why you made specific choices? Do you expect them >> to retrace your design steps to ensure they understand the assumptions >> you've ENCODED in your choice of priorities? > > Design decisions and assumptions resulting in priority assignments are no > different from design decisions and assumptions resulting in buffer sizes, > control-loop coefficients, and other kinds of design features. You either > document them or you don't, depending on your customers' requirements and on > your own quality standards.
As damn near everything I've worked on has had to see others support/maintain it after me, NOT documenting design issues has significant (often serious) consequences. You can't count on the next soul to have the same grasp of the complexity of a particular *portion* of an implementation that you had when you *designed* it, in the context of the rest of the application/system. (And this assuming there are formal specifications available!) Esp when that soul is probably just pushed at a particular "problem", *cold*: "Fix this", "Add this feature", etc. "And, can you have it done before lunch??"

I've taken to preparing "tutorials" for various algorithms to give the next guy a quickie education on the issues involved -- as "comments" in code don't go far enough to address the sort of background information that is often required. And, are generally limited to *text* (e.g., no illustrations, animations, etc. How do I describe the "creakiness" parameterization of a voice "in words"? OTOH, "click on this to hear an example of creakiness... Now, tweak this control and click, again!")

This allows my code to be skimpier on comments -- because it can reference issues that are explained more fully, in those documents. It also frees me from having to keep tweaking those comments as I make subtle changes in the implementation. And, lets me put truly important reminders in the code. E.g., you will frequently encounter "Here, there be dragons" -- as a warning that the implementation is intentionally the way it is; resist the urge to "optimize" or "simplify" it... because there are issues at stake that such changes will inevitably break. Often without being immediately noticeable! I.e., make SURE you've read the accompanying information and understand what's at stake before you go twiddling bits!

I also like to litter my code with numerous invariants -- to make it crystal clear what can be expected at each point in the code (even if the invariants are just "pseudo-code"). Usually, these have to be elided (e.g., implemented with #ifdef DEBUG) from the production code (as they represent the worst kind of "dead code").
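A minimal sketch of that last point -- an invariant that documents an assumption in the code but is compiled out of production builds via the #ifdef DEBUG convention mentioned above. The reporting path (fprintf/abort) is just standard C; a real project would hook whatever fatal-error mechanism it already has.

#include <stdio.h>
#include <stdlib.h>

#ifdef DEBUG
/* In debug builds the invariant is checked and a violation is fatal. */
#define INVARIANT(cond, msg)                                    \
    do {                                                        \
        if (!(cond)) {                                          \
            fprintf(stderr, "invariant violated: %s (%s:%d)\n", \
                    (msg), __FILE__, __LINE__);                 \
            abort();                                            \
        }                                                       \
    } while (0)
#else
/* In production builds the check (the "dead code") disappears entirely. */
#define INVARIANT(cond, msg) ((void)0)
#endif

/* Example use at a point where an assumption must hold:
 *     INVARIANT(queue_head != NULL, "consumer never runs on an empty queue");
 */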
>> Or, are they free to change your priorities WITH NO EFFECT ON THE >> RESULTING IMPLEMENTATION? > > Just as free as they are to change buffer sizes and control-loop coefficients. > They either know what they are doing (helped by documentation, if it exists) or > don't know what they are doing. As I've said, all my day-job projects deliver > extensive design documentation, including SW Budget Reports with task priorites > and their motivations.
And your "mathematical proof/analysis"? I.e., can someone *verify*/validate your design based on your documentation? If I set aside N bytes for a buffer, there is a justification for that choice in the comments or accompanying descriptions. So, if that rationale doesn't stand the test of time (i.e., assumptions change), the guy after me knows how he should go about determining the new value.
>> But, I am careful to document how and why each of these mechanisms >> are used and "prove" that they *can't* run amok in my description >> of the algorithms, etc. And, the assumptions that dictate how >> future uses/adaptations must operate. > > Good for you and your customers. I hope they pay you for it, too. I suspect > that for many mass-market products the customer (i.e. the entity who > subcontracts SW development) only cares that the SW works, but is not > interested in its design details.
For products that have no "maintenance cycle", that may be the case. But, for medical instruments, pharmaceutical systems, process control systems, etc. there is often MORE cost in the validation than the "development". Having lots of documentation often helps with "due diligence" defenses, etc. It goes to addressing concerns that the design may have been sloppy, intentionally negligent, etc. "Did you consider this?" "Yes -- and this is the analysis that was performed..." And, anything that hopes to be extensible *must* be documented; you don't want to discourage those extensions by effectively obfuscating your design! Otherwise, why KEEP the source code if you don't expect others to look at it! You, instead, want people to be able to come up to speed and understand your design methodology without HOPING it jumps off the page (code) at them.
>>>> Ask yourself how many of those documents you've seen? Authored? >>> >>> Several, for both questions. All the day-job projects I work on have >>> such, and >>> I am usually the author. >> >> I suspect you are in the minority as (IME) most folks using priority >> based schedulers just "pick numbers" and, later, "tweek them" to make >> their design work properly. Usually, with little formal knowledge of >> the limitations of the algorithms or the problems that can manifest >> (nor the potential solutions *or* the problems with the solutions!) >> >> Would you care to share any of those rationales? > > As I have said, I usually rely on deadline-monotonic priority assignment and > its response-time analysis. There is a huge literature on scheduling methods > and schedulability analysis methods, for hard real-time systems, soft real-time > systems, and mixtures. I know only a little bit about the hard-real-time part > and almost nothing about the soft-real-time part. In my day-job projects, the > soft-real-time part is non-critical and is validated by stress tests on > practical-worst-case test scenarios agreed with the customer.
You're taking the attitude of "if it's HRT, then I can claim 'I need...'". I argue that much HRT isn't truly necessary to be implemented in that fashion. (I'm speaking with no knowledge of your particular application domain -- just years of experience watching people treat things as "hard" requirements that needn't be.) The suggestion that "soft" is therefore "not as important" is a foolish distinction, IME. "Hard real-time is hard; SOFT real-time is HARDER!"

Yet, there's no inherent guarantee that those tests actually stress the things they *should* stress. You've just got a "legal defense" (contractual) but haven't ensured the product does what it *should*. E.g., the Spirit rover, I'm *sure*, had a boatload of testing and simulation before it was ever loaded into the nose cone of that rocket...
>> I'm hosting a course >> that presents various OS design techniques, scheduling algorithms, >> hazzards, etc. and examples of how to (and NOT to) exploit each later >> this Fall. "Fabricating" artificial examples always *feels* like an >> artifice; presenting REAL examples allows folks to see ACTUAL tradeoffs. > > If this is about server operating systems, with an unpredictable mix of batch, > interactive, and network/transaction applications, it's a different ball-game. > And mobile handsets are more and more coming to resemble such servers, with the > profusion of applications of every possible nature. I have no contributions to > make in those domains.
No, I'm simply offering a smorgasbord of exposures to different technologies for folks who've typically never had any formal training on the material; they're victims of the "heard it at the water cooler" approach that is so common in the industry; or, "read a 2 page description in a trade magazine". People not keeping up with technology and theoretical work in the (mistaken) assumption that "they, *themselves*, actually DO this and are, thus, The Experts". Would you bring your wife to a doctor who thought the treatment for breast cancer was radical mastectomy? (as it ONCE *was*) Or, would you find someone who is "keeping current" with the best practices in their trade/industry?
> Where will this course be given? Will it be on the net, perhaps as a MOOC?
No, it's a local/private gathering for friends/colleagues. It was suggested to video tape them and make YT videos but I've argued against that as it alters the way in which I can present the material. I've opted for a "lab" approach: write some code to solve this "trivial" problem; now, let's show you how easy it is to BREAK that code as it interacts with other tasks, unexpected events in the environment, etc. -- the sorts of things you MAY or may NOT encounter before formal release... but, that are problematic areas in your design! Because you "hand-waved" important design issues instead of approaching them as ENGINEERING tasks.
>> And, I suspect, your tasks are largely *periodic*? And/or orthogonal to >> each other? > > The hard-real-time tasks tend to be periodic, but sporadic ones also occur. The > softer tasks are usually sporadic. > > Configuration parameters and commands usually flow from the soft-real-time part > to the hard-real-time part, and the hard-real-time part usually generates data > into buffers that are emptied by the soft-real-time part. Rather typical > control & measurement systems.
Yes. Address the "(sub)tasks" with the strictest deadlines "in hardware" (e.g., the foreground). Then, let the remaining tasks "consume" (and produce) as resources are available. How do you know the SRT tasks *will* empty the buffers before the HRT tasks need to refill them? etc. E.g., my barcode reader ISR filling its buffers as bars/spaces pass the sensor... *hoping* the background tasks can clear out the old data before the newest needs that space!
> The most common complex task interaction is a communication channel shared by > several tasks. For example, a hard-real-time task may be communicating > cyclically over the channel, while some soft-real-time tasks must use the same > channel for sporadic communications. > >> I have had exactly one project in 40 years of RT design that fit that >> mold -- a process that ran at 200Hz (5ms cycle time) with "hard" >> deadlines (in the sense that a deadline missed resulted in a failure >> of that *cycle* of the process -- a "defective product" to be discarded). >> Note that missed hard deadlines didn't indicate a failed *implementation*! >> >> By and large, my projects have consisted of aperiodic, cooperating >> tasks driven by "semi-random processes"; processes that can't be easily >> quantified nor constrained! And, running on under-provisioned hardware. >> >> E.g., the example up-thread has a bunch of user I/O's: >> - barcode reader >> - two serial ports >> - keypad >> - sensor array >> - display >> - audio annunciator (sound synthesizer) >> >> There is nothing I can do to prevent a user from scanning a barcode >> label at *any* time in the normal operation of the system. I.e., >> "bars" (not "labels") can arrive for processing at any time. That's >> part of the *appeal* of the system -- the USER drives it instead >> of *it* driving the user! > > [snip] > > If you have decided that static priorities, or some other specific scheduling > principles, don't work for you in this application, fine, use something else. > But IME static priorities do work for many other applications.
I'm not saying they *can't* work. What I am saying is that people employ "RTOS's" (COTS or homegrown) and inevitably end up with a "priority based scheduler" (i.e., small integers) and they *pick* priorities based on "feel". Or, some other notion of what they should "mean". Then, jigger them when those assumptions prove to be false given the task partitioning that they've ALREADY adopted.

[They're unlikely to refactor the design; MORE likely to jigger the priorities, run it "for a while" to reassure themselves that "it works" -- and then extrapolate that experience on the bench to experience in the field! "I put in a 2A fuse and it didn't blow. Therefore, the load will never exceed 2A!" (reread that statement to realize how silly it sounds -- esp as a design technique! I.e., the fact that it didn't blow somehow ENSURES that it won't!) Just because your code *appeared* to work in the lab, doesn't mean it will continue to work! So, *prove* to me -- in your design review -- that your "scheduling criteria" (whatever they might be) ensures that these things WILL get executed; *OR*, show me how you will handle the cases where they DON'T!]

When something "misbehaves", if they can't easily find an explanation -- and, if it doesn't *repeatedly* misbehave -- they shrug and write it off as a "fluke". As if this one time event was a "can't happen" (yet, it DID! I.e., some assumption that you have made in your design has now PROVEN to be wrong! YOU SAW IT HAPPEN!! Do you think this was a stray alpha particle? Power supply ripple? Gremlins??)

I designed an autopilot for a boat ~35 years ago. It used LORAN-C navigation signals to pilot the boat to a particular spot on the globe (latitude,longitude) -- compensating for oceanic currents, etc. On its maiden voyage, we took it around Cape Cod (Massachusetts) <https://upload.wikimedia.org/wikipedia/commons/b/bc/Cape_Cod_Landsat_7.jpg> using the coordinates of buoys "in open water" to define the individual legs of the trip (kinda hard to know FOR SURE where you are in open water -- there aren't a lot of street signs you can consult! :> ). Our track was recorded (on paper) via a LORAN-C position plotter. So, we could document our travels wrt the coastline (indicated on the *map* that had been used as a drawing surface).

On one leg of the trip, a very pronounced 'S' was visible in the track; as if the boat had turned 90 degrees off course, then compensated by making a "U-turn", then overshot the original track and made another "U-turn", then turning 90 degrees to get BACK onto the original track! Sure *sounds* like some sort of "divide by zero" bug! Or, something equally bizarre to cause such a sharp set of discontinuities in our track! I was never able to find an "explanation" for this behavior in the code. Despite a careful analysis of my algorithms, the floating point library, etc. (of course, I don't have records of the raw data on which my algorithms operated...)

My boss explains it as the point at which we stopped to do some deep sea fishing. The boat would twist and turn as the currents kept trying to align the vessel with the incoming waves (so the waves would strike the boat broadside instead of at the bow). So, the skipper had to periodically turn the boat so bow or stern would be into the current. Those being exactly the "right angles" noticed in the plot! While this is PROBABLY true, I still drag out those sources periodically in the hope of stumbling on something that would be a more *provable* explanation (even though it would have to be a BUG!)
>> How do I assign scheduling priorities to each of these competing, >> aperiodic, asynchronous tasks? Priority effectively assigns >> "importance" by directing resources (CPU cycles) to one task at >> the expense of others. Which task(s) are more important than >> which OTHER tasks? > > Again you confuse "importance" with "urgency".
Because it *is* "importance" and not "urgency". Urgency tends to conjure up images of "immediacy". As if a deadline 5 seconds hence is more urgent than a deadline 5 hours hence. That distorts the "importance" of the two situations. If you have a poor selection of food choices in the house, a trip to the grocery store "before dinner" (an hour hence) may seem "most urgent". However, if your vehicle is up on blocks, awaiting repairs, *that* may be the "more important" undertaking; eat some cheese and crackers (or, go hungry for a meal or two -- it won't KILL you!) so you can, instead, devote your time to getting the car repaired (even though the "deadline" is DAYS away -- driving it to work Monday morning! It may well take you ALL of that time to get it running!) Algorithms like EDF assume all tasks have equal "importance" (value) and, favor those with the "soonest" deadlines (by assuming that all deadlines *can* be met!). LST, by contrast, favors tasks that have the "smallest margins (for timing errors)". "Priority" allows some arbitrary "small integer" to codify the scheduling needs. None of these address the relative "importance" of a particular undertaking. E.g., you can afford to live on cheese and crackers for a few meals *if* it allows you to drive to work Monday morning. The alternative is a richer set of meal choices, in the short term, followed by a bus/cab ride to work, Monday! (more costly and inconvenient *and*, risks your not meeting some OTHER goals, Monday afternoon, because you had to wait for a bus to get you back to the house... and you STILL have to finish work on the car for TUESDAY morning's trip. But, you'll be able to eat STEAK, *again*, for Monday supper... :< )
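For readers who haven't met them, a bare-bones sketch of how EDF and LST pick from the same ready set (types and field names invented for illustration). Note that neither rule looks at how *important* a job is, which is exactly the objection above.

#include <stdint.h>

typedef uint32_t ticks_t;

struct job {
    ticks_t deadline;      /* absolute deadline                       */
    ticks_t remaining;     /* estimated execution time still required */
};

/* EDF: run the ready job whose deadline is soonest. */
static int pick_edf(const struct job *j, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (j[i].deadline < j[best].deadline)
            best = i;
    return best;
}

/* LST: run the ready job with the least slack,
 * slack = deadline - now - remaining execution time.
 */
static int pick_lst(const struct job *j, int n, ticks_t now)
{
    int best = 0;
    ticks_t best_slack = j[0].deadline - now - j[0].remaining;

    for (int i = 1; i < n; i++) {
        ticks_t slack = j[i].deadline - now - j[i].remaining;
        if (slack < best_slack) {
            best_slack = slack;
            best = i;
        }
    }
    return best;
}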
> For example, some inputs have HW > buffers (FIFOs), some have not; typically processing the buffered inputs is > less urgent than the unbuffered inputs. Or you can provide a SW buffer, filled > by an interrupt handler, for an input that does not have a HW buffer, and > correspondingly relax the urgency of the task reading the SW buffer. > > If you are saying that all *inputs* are equally urgent, fine, poll them all in > a single task and delegate the input data processing to other tasks in order of > urgency of the *outputs*. If you think that all the outputs are equally urgent, > too, then do everything in a single task -- the "Task Loop" in our Subject.
You can't "poll" for incoming characters from a UART without risking encountering many "receiver overrun" errors. You can't poll for black-to-white (or white-to-black) transitions on a photodetector input (at a nominal 8 kHz) and expect to "see" them all. You can't poll a bunch of high frequency oscillators to note how their frequencies are changing in response to actions from the user. All of this data acquisition has to happen "in hardware" -- but, very *little* hardware. E.g., the barcode "video" just effectively captures the time of a free-running counter into a latch (that latch being a software "read" instruction executing in an ISR). If you *miss* it, it's gone. And, the barcode label in which it occurred can now NOT be decoded: "Sorry, user, could you please try that again? Ooops! Damn! One more time... I was busy polling a UART that time... (sigh) And, yet again... I got distracted by the audio annunciator on that last attempt..."

Because there are no constraints on what the *user* can choose to do, you have to be ready and able to "see" whatever he chooses to do WHEN he chooses to do it. Everything is "equally important". (I.e., he doesn't push a button to say, "I am now going to scan a barcode label. Please activate the barcode reading task and suspend any other tasks that might be competing with it for resources... I *promise* I won't do anything that you SHOULD see unless I *tell* you that I will be doing it!") As I like to say, tongue-in-cheek: "Let's make everything louder than everything else" (IIRC, Deep Purple?)

Instead, you ensure nothing "gets by you", unnoticed. It might not be recognized/processed. But, it MUST be "noticed". You have to KNOW that you've "missed something". Because, you have to know that you have NOT missed ANYTHING when you make claims about the transactions that you are monitoring and the data that you are maintaining: "I *know* the message I received was 'deploy the warhead' and *not* 'do NOT deploy the warhead' -- because I didn't *miss* anything during its reception!" Or, conversely: "I'm sorry, I can't vouch for the integrity of the PORTION of the message that I THINK I received because I *know* parts of it were missed..."
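What that "software latch" amounts to, roughly: an edge-triggered ISR that grabs the free-running counter and queues the timestamp for the decoder. The register name, address, and buffer size here are invented for illustration; the point is that a late or missed read loses the bar width, and with it the whole label.

#include <stdint.h>

#define FREE_RUN_COUNTER (*(volatile uint16_t *)0x40000010u)  /* hypothetical timer register */

#define EDGE_BUF_SIZE 64u                     /* power of two */

static volatile uint16_t edge_time[EDGE_BUF_SIZE];
static volatile uint8_t  edge_head;           /* written only by this ISR      */
static volatile uint8_t  edge_tail;           /* advanced by the decoding task */

/* Fires on every black/white transition from the photodetector.  The counter
 * value read here *is* the measurement: if the ISR is delayed or the buffer
 * is full, that bar width is gone and the label cannot be decoded.
 */
void barcode_edge_isr(void)
{
    uint8_t next = (uint8_t)((edge_head + 1u) & (EDGE_BUF_SIZE - 1u));

    if (next != edge_tail) {
        edge_time[edge_head] = FREE_RUN_COUNTER;
        edge_head = next;
    }
    /* else: overrun -- the decoder will see a gap and must report the miss */
}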
>> No amount of analysis will produce a schedulable task list. The >> system EXPECTS to encounter overload. > > In priority-based systems, the typical solution to overload is to implement > load-refusing mechanisms such as ignoring interrupts that are too close > together. Whether this can be made to work well enough for the application to > react gracefully under overload depends on the system design (including the HW > design).
You're still "picking winners" (and, thus, losers). Which of the above tasks should have which priorities? Assigning priorities effectively assigns "importance". If I let the scheduler favor the barcode decoding task (in the background, forget the foreground role), then that implies that decoding barcodes is more important than servicing data requests on the UARTs. Or, detecting blood sample/reagents placed in the sensor array. Even if scanning a barcode is meaningless at this point in time! (I can't reach out and grab the user's arm or drop a shutter over the photodetector!)
>> It *expects* to miss deadlines. Exactly because it can't control >> the actions of those things that are interacting *with* it. > > The SW can control the rate at which it accepts inputs for processing.
That just says, "the software can decide how much it wants to MISS". But, it still has *missed* those things! I can't reach out and stay the hand of a user who is preparing to swipe a barcode label past the photodetector. I can turn off the ISR that services it (or, kill the task that would decode it), but when do I turn it back on? How happy will the user be that he scanned a label and the system didn't "see" it?

I can turn off the Rx ISR for the UARTs. I can drop the handshaking signals to tell the remote device(s) to stop sending. But, that doesn't mean they will. Or, will as promptly as I *hope* they will (how quickly will an unknown piece of equipment respond to an XON/XOFF? Modem control signals?) Will they *know* that I may not have seen some action that they took on that interface? Will they stall waiting for an acknowledgement that will NEVER come? When they reissue the request, will I coincidentally repeat this same action -- leading to yet another interface stall?

When will the customer consider the system to be performing badly? How will he know what *he* can do to change that? How much MORE hardware should the system include to address the needs of *some* users -- yet STILL not be "foolproof"?
>> The *environment* (user + other equipment) determines the >> performance of the system. > > The environment always determines the offered load. In case of overload, > depending on the robustness requirements of the SW the SW may be allowed to > fail severely or be required to degrade gracefully. Apparently in the system > you describe the latter case obtains, but then the proper design depends on the > precise definition of "gracefully".
You let the user define what is acceptable by altering his interactions with the system to obtain the "best" performance for his particular needs. If a user only has one device connected to the UARTs, then less "real time" will be consumed servicing UARTs. If a user manually types identifiers in via the keyboard, then less "real time" will be consumed scanning barcodes. Etc. It's silly of the device to DICTATE to the user how it should be used. And, few would want to *pay* for the hardware and software that can address ANY set of needs, regardless of THEIR *particular* set of needs!
>> I excel at making robust *soft* real-time systems -- recognizing >> that MOST applications actually *are* "soft", even those that >> folks want to THINK of as "hard". > > If a specific graceful handling of overloads is required then I agree that the > system acquires soft-real-time aspects. But some hard-real-time aspects may > remain, if overload is still considered an anomaly. If the load that you call > "overload" is actually expected, I would not call it "overload" at all, just a > load for which all inputs cannot be processed in the same way as under lesser > loads. > >> In my current project, I use a "value function" as the scheduling >> criteria: a tuple that relates (initial) deadline, a notion of value >> and a *hard* deadline (at a hard deadline, the task has no continued >> value so can be terminated -- and the associated deadline handler >> invoked to compensate for its deletion). > > As I said, I know almost nothing about soft-real-time scheduling, but I > remember seeing published papers with similar methods: value functions that > depend on the elapsed time of a computation or the expected time of delivery of > the result. I don't doubt that they can work, even if the definition of such > value functions seems to me more problematic and ad-hoc than the definition of > priority.
The "value" of the approach is that it lets the scheduler make the tradeoff, dynamically. I *know* how "valuable" (important) a particular scheduling option is, *now*. I also know when it has NO value!

In your systems, you appear to split things into two groups: those that MUST meet their deadlines and those that "don't have to". You call these HARD and SOFT, respectively. If a HARD task misses its deadline, you (appear to?) consider that a system failure. Your product is broken. Presumably, there is also some point at which you consider a postponed SOFT task to also be indicative of a failure (I assume you aren't willing to leave a task eligible "forever" without expecting some progress towards its goal?).

In my world, every task has two deadlines (and a "value"). The first ("soft") deadline is the desired target. Maximum (system) value ("usefulness") is (typically) achieved by meeting these deadlines for all tasks. [I assume a particular "shape" of value function that isn't necessarily applicable to ALL types of real-time systems] The second ("hard") deadline is the point at which any further effort/resources expended by the task is "useless". BUT, this doesn't imply that the product is *broken*! Rather, that the task's role in the product is no longer of interest to justify expending additional resources on attaining its goal(s). Kill the task and free the resources. Why steal those from OTHER tasks that might still be able to meet *their* deadlines, once you are out of the picture?!

What "priority" should an audible keystroke feedback indication be? How "urgent"? What should the system risk NOT accomplishing in order to make a silly little "click/beep" sound on each keystroke? Ideally, that should be within milliseconds of the key being *pressed* (i.e., not when it is RECOGNIZED but, rather, when it is PRESSED! Artificially HIGHER "priority"!). Yet, in the grand scheme of things, the user probably cares little about it! It's not like he's likely to say, "Hey! I didn't get a keyclick for that last keystroke -- even though I can SEE that it was recognized by the appearance of the associated digit in the display..." So, there is a point at which any work towards making that keyclick is just waste. And, can even be distracting. Kill the job and move on.

[The classic example is bottles on an ever-moving conveyor; once a bottle has passed a certain point, any efforts to position a "picking arm" to grasp it and convey it to an alternate location are wasted! The bottle *will* crash to the floor. There will be some cost to this "missed deadline" (i.e., a broken bottle). Wasting resources to try to "catch it as it falls" could put the *next* bottle at jeopardy. Etc. It may be more prudent to just accept that occasional loss (perhaps the arm mechanism bound up during this cycle resulting in the *intermittent* failure) and get back into stride than risk prolonging the "disturbance".]

At the same time, the deadline handler for the (abandoned!) task is scheduled for execution. *It* can then apply the "consequences" of this missed HARD deadline. Maybe just increment a "broken bottle counter". Or, shut down the production line because "too many" bottles have broken in too short an interval ("Maintenance" needs to look at the mechanism to see why it has been misbehaving at a statistically higher rate than in days past). The grey area between soft and hard deadlines is where the scheduler can "add the most value" (tongue-in-cheek).
Should it chase after that bottle that has slipped past the "picker" and possibly save it from crashing to the floor? Or, should it visually examine the label on the *next* bottle to ensure that *it* is positioned properly? ("quality") In some scenarios, you could argue for having the picker chase the bottle as that can "save" on an otherwise material loss. Especially if the "other" bottle can summarily be routed to a "manual label verification" station (where a human operator reexamines the label placement and removes and replaces it, if necessary, before reintroducing the bottle to the ass'y line). In another scenario, the labor involved in inspecting and/or reapplying labels might be so high that it's cheaper to let a bottle fall to the floor (or, into a cushioned bin placed there to address these deadline overruns!) than it is to run yet another bottle through the manual label inspection process! You can't do this with "priorities". Something has to be able to dynamically REASSIGN priorities, on the fly, in order for a scheduler to know how to choose.
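As a concrete (invented) shape for that idea: each task carries a soft deadline, a hard deadline, a value, and a handler for the hard-deadline case, and the scheduler compares current values rather than fixed priorities. This is only a sketch of the tuple described above, not the poster's actual scheduler; the linear decay between the two deadlines is an assumed value-function shape, and other shapes are possible.

#include <stdint.h>
#include <stdbool.h>

typedef uint32_t ticks_t;

struct sched_value {
    ticks_t soft_deadline;            /* full value if finished by here            */
    ticks_t hard_deadline;            /* zero value from here on: abandon the task */
    int32_t value;                    /* worth of meeting the soft deadline        */
    void  (*deadline_handler)(void);  /* consequences of abandoning it             */
};

/* Value of continuing the task at time 'now': full before the soft deadline,
 * zero at/after the hard deadline, decaying linearly in between.
 */
static int32_t current_value(const struct sched_value *s, ticks_t now)
{
    if (now <= s->soft_deadline)
        return s->value;
    if (now >= s->hard_deadline)
        return 0;
    return (int32_t)((int64_t)s->value * (s->hard_deadline - now)
                     / (s->hard_deadline - s->soft_deadline));
}

/* When a task's value reaches zero, run its handler (count the broken bottle,
 * or stop the line if too many broke) and tell the caller to kill the task
 * and free its resources.
 */
static bool maybe_abandon(const struct sched_value *s, ticks_t now)
{
    if (current_value(s, now) == 0) {
        s->deadline_handler();
        return true;
    }
    return false;
}

The keyclick and bottle-picker examples above map directly onto deadline_handler: a no-op for the keyclick, a counter increment or line stop for the bottles.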
>> Attempting ANY kind of schedulability analysis on a system this >> large (thousands of tasks, hundreds of cores) is simply impossible. >> All you can hope for is "best effort". > > Soft-real-time "schedulability" analyses usually focus on maximising the total > "value" delivered or the overall Quality of Service, not on computing or > minimising worst-case response times.
That depends on how you define that "value". If value is correlated to "inverse lateness", then maximizing value inherently *minimizes* lateness.
>> Or, simply concede that it is not possible to design large systems >> and limit your efforts to trivial, "provable" ones. > > As I think I've said in some earlier post, IMO for hard-real-time systems the > analysability limit comes from the increasing complexity of task interactions > in large systems. That can be combated with better modular design and with > systematic task interaction patterns.
It is a separate issue from hard vs. soft. The number and nature of the modules and their interconnectedness is what drives the (lack of) "analyzability". I can create a system with hundreds of hard (or soft) RT tasks that is trivially analyzable: make all of the tasks orthogonal to each other; ensure the sum of their processing times is less than the time to the *first* deadline. I.e., you don't even have to take out a pen and paper to KNOW that this system is schedulable (with ANY algorithm!) OTOH, think of a system with a "few" tasks that are heavily interrelated (due to the nature of the application) and you can spend considerable effort convincing yourself (and others) of its schedulability under any *particular* algorithm, let alone picking the *best* algorithm.
> While small systems may be "trivial" from an academic point of view, there are > certainly applications that are small enough to be provably analysed but still > are non-trivial in terms of importance and value to the customers.
I've not said or implied otherwise! A traffic light controller is both "trivial" (conceptually) and "complex" (in actual implementation). Yet, no one would dispute its utility!
On Sun, 17 Jul 2016 09:56:32 +0300, upsidedown@downunder.com wrote:

>On Sun, 10 Jul 2016 17:55:10 +0300, Niklas Holsti ><niklas.holsti@tidorum.invalid> wrote: > >>On 16-07-10 12:41 , Don Y wrote: >>> On 7/9/2016 5:46 AM, Niklas Holsti wrote: >>>> On 16-07-09 00:56 , Les Cargill wrote: >>>>> Niklas Holsti wrote: >>>>>> [snips] >>>>>> One example of interactions I find difficult is a shared I/O channel or >>>>>> bus that must be used by various tasks, for various purposes, with most >>>>>> transmissions being sporadic and such that the sending task must wait >>>>>> for and check a response transmission. >>>>> >>>>> So the sender sends asynchronously then blocks on a receive >>>>> ( presumably with a timeout ) , with other tasks/ISRs handling >>>>> the details. You may even have separate send and receive loops, >>>>> with state indicating the timeout for each receive. >>>> >>>> I have no problem *implementing* such things, my problem is >>>> *analysing* the timing to compute worst-case task response times >>>> under various load scenarios. This computation must also consider >>>> the possible latencies of response-generation at the remote end of >>>> the channel. >>> >>> It's no different than a uniprocessor implementation -- *if* you >>> "migrate" the "scheduling criteria" (in your case, "priorities") >>> *with* the communication... THROUGHOUT the system! > >If you have the luxury of multiple cores/processors, just set the task >affinities so that the critical tasks are locking on one >core/processors and do the timing analysis with that. The other >noncritical tasks will run on the remaining core(s)/processor(s). > >This is especially important in multiprocessor applications with large >private caches. If the scheduler is allowed to through around the task >among all processors, there is a lot of cache invalidation.
Don is talking about both multicores and physically separate processors connected by network. As he has described it elsewhere - his system is able to migrate a running task to any suitable platform anywhere within the network. There have been some other operating systems capable of doing this. AIUI, the interesting thing about Don's system is that he is attempting to do real-time, real-world control ... not simply to distribute processing over a bunch of networked "compute servers". George