On Fri, 08 Jul 2016 23:35:22 +0300, Niklas Holsti wrote:

> On 16-07-08 19:06 , Tim Wescott wrote:
>> On Thu, 07 Jul 2016 23:09:08 +0300, Niklas Holsti wrote:
>>> Don Y wrote:
>>> [snips]
>>>> Or, is it just "some suggestive feeling" that led to the small integers chosen?
>>>
>>> No. From the requirements, I derive a set of tasks and deadlines, and then assign priorities in deadline-monotonic order. Then I analyse or measure WCETs and crank the response-time algorithm to check schedulability. The main problem is that the task interactions are often more complex than assumed by the simpler schedulability analysis methods. Fortunately, in my projects it is usually possible to separate hard-real-time tasks from soft-real-time tasks, and the interactions of the hard-real-time tasks tend to be rather simple.
>>
>> I have never had a problem doing this separation myself, although I've inherited code from other people that fails at this, and rather badly. The most egregious example of this was code that put the most important two jobs into one superloop inside of one task -- and did so in such a way that bollixed up the whole system if incoming commands exceeded a rather moderate rate.
>>
>> I'm not entirely sure, but I think there's a possibility that if your code _does_ have such an interaction between tasks, then it means that there's something fundamentally wrong with your software design, or possibly your system design as a whole.
>
> I agree that some troublesome task-to-task interactions are design flaws and can be eliminated by design changes, sometimes by splitting some task into one hard-real-time task and another less urgent task. But I have not always succeeded at that.
>
> One example of interactions I find difficult is a shared I/O channel or bus that must be used by various tasks, for various purposes, with most transmissions being sporadic and such that the sending task must wait for and check a response transmission. I believe that there are analysis methods which can find bounds on the response times for these tasks, including queueing delays and communication latencies, but I haven't studied or tried them yet. I may have to do so in my current project, however, because the system has three inter-communicating computers, with each computer connected by one SpaceWire link to a central router.

In a sense this becomes a problem in system design, because any real-time constraints on the messages have to affect the signaling protocol.

One of the Really Nice things about the CAN bus, from my perspective as both a writer of embedded software and a designer of closed-loop control systems, is that the messages are both prioritized and have a well-defined maximum length. This means that if you design your prioritization scheme correctly, low-latency traffic can coexist with far slower traffic -- you just need a higher-layer protocol that can break large messages up into many shorter ones on the transmit end, and reassemble them at the receive end.

(Priority plus messages of indeterminate length won't get you there -- imagine starting a long, low-priority message just before some short, low-latency message needs to get through. Unless the protocol encompasses terminating the long message, your system is screwed.)

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
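To make the higher-layer protocol idea concrete, here is a minimal sketch of the transmit side in C. The 8-byte payload limit, the frame layout, and the send_frame() primitive are assumptions for the example, not any particular CAN stack's API; the receive side would reassemble fragments in the mirror image of this loop (sequence numbering omitted for brevity).

    /* Sketch: break a long message into short, bounded frames so that
     * higher-priority traffic is delayed by at most one frame time.
     * Frame format and send_frame() are illustrative assumptions.        */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define FRAME_PAYLOAD 8u            /* CAN-style short, bounded frame */

    struct frame {
        uint32_t id;                    /* lower id = higher bus priority */
        uint8_t  len;
        uint8_t  data[FRAME_PAYLOAD];
    };

    extern void send_frame(const struct frame *f);   /* hypothetical driver */

    void send_long_message(uint32_t id, const uint8_t *msg, size_t n)
    {
        struct frame f;
        size_t off = 0;
        f.id = id;
        while (off < n) {
            size_t chunk = n - off;
            if (chunk > FRAME_PAYLOAD)
                chunk = FRAME_PAYLOAD;
            f.len = (uint8_t)chunk;
            memcpy(f.data, msg + off, chunk);
            send_frame(&f);             /* arbitration happens per frame  */
            off += chunk;
        }
    }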
Common name for a "Task Loop"
Started by ●June 24, 2016
Reply by ●July 9, 2016
Reply by ●July 10, 2016
Niklas Holsti wrote:> On 16-07-09 22:26 , Les Cargill wrote: >> Niklas Holsti wrote: >>> On 16-07-09 00:48 , Les Cargill wrote: >>>> Niklas Holsti wrote: >>>>> On 16-07-08 20:50 , Les Cargill wrote: >>>>>> Tim Wescott wrote: >>> [snips] >>>>>>> I'm not entirely sure, but I think there's a possibility that if >>>>>>> your >>>>>>> code _does_ have such an interaction between tasks, then it means >>>>>>> that >>>>>>> there's something fundamentally wrong with your software design, or >>>>>>> possibly your system design as a whole. >>>>>> >>>>>> That's kind of what I mean by "dependence on priority is Bad." >>>>> >>>>> That is not how I have understood your dislike of priorities. My >>>>> current >>>>> project has about 20 tasks. One of these tasks is cyclic with a 2 ms >>>>> period; another is the background task which scrubs RAM with a >>>>> deadline >>>>> measured in several minutes. These two tasks do not interact at all, >>>>> apart from sharing the same processor. The application depends on >>>>> the 2 >>>>> ms task having a higher priority than the background task, and I >>>>> find it >>>>> hard to image how that dependency can be called "bad". >>>>> >>>> >>>> That's a case where it doesn't matter. I've seen cases where >>>> mis-assigning the task priorities caused different behavior. >>>> >>>> That's bad. >>> >>> Mis-assigning priorities is IMO the same kind of programming error as >>> mis-defining the termination condition of a loop. But it would be >>> nonsensical to say that it is bad if an application depends on the >>> termination condition of a loop. >>> >> >> I don't agree with that as a metaphor. >> >> I'm very familiar with the moon lander and the original use of >> priority. The story of the 1202 alarm. It's a critical technology. But >> really, that story is a story about two basic priorities. > > I don't understand your point. Quoting from Wikipedia > (https://en.wikipedia.org/wiki/Apollo_Guidance_Computer): "Luckily for > Apollo 11, the AGC software had been designed with priority scheduling. > Just as it had been designed to do, the software automatically > recovered, deleting lower priority tasks including the 1668 display > task, to complete its critical guidance and control tasks." >I could not make it clearer myself - don't they basically partition tasks into "low priority" and "high priority"? So there ya go.>> But... if I may restate: >> >> Any design that does not depend on task priority will be more robust >> than one that does. > > Robust against what? Requirements changes? Programming errors? Temporary > overloads? I don't see it, whatever robustness measure you use. >More robust in terms of predictable and deterministic behavior. Fewer load-related defects. Fewer asymptotic behavior cases with linear increase in load. Less pathology. And in cases, for fully cooperative multitasking, you may be able to stop being concerned about the reentrancy of libraries and the like. Maybe :) This is much less likely.>>> I have something like 20 tasks, at multiple priorities and with some >>> complex interactions. I use priorities to gain determinism, not to >>> lose it. >>> >> >> The critical thing is - I never trust that as a solution. > > And I don't understand why you don't. I don't think you have shown any > reasonable grounds for this distrust. >It's because we're using a probabilistic ... portion of the O/S to enforce ordinally better behavior rather than having a design that forces better behavior. It's implicit rather than explicit. 
And, again, it may just not matter for some cases. But understanding that is important.>>>> But I would also say that having the memory scrubber coded such that >>>> it would behave the same whether the kernel is preemptive or not >>>> would be a better design and implementation. >>> >>> I totally disagree. A non-preemptive system would force me to insert a >>> large number of "yield" calls in all tasks, carefully placed so that the >>> execution time between yields never exceeds the maximum latency I allow >>> for the 2 ms task. Yuck, yuck, triple yuck. >>> >> >> Is the problem the context switch time or an aesthetic ( or other cost) >> objection to the way this code would have to be structured? > > It is several things: > > - extra work to design the scrubber SW, since it now depends on the 2 ms > requirement >Oh no. This is not what I mean at all. It can still free run; it just blocks after it's done some quantum of work - scrubbed <x> pages or bytes of memory. It need not be synchronous at all. I'm sorry I drew you into this misapprehension.> - extra work to verify that the design is correct and the 2 ms > requirement is correctly implemented in the scrubber SW > > - extra maintenance work if either the 2 ms period changes, or the > scrubbing requirements change > > - in general, it adds an unnecessary dependency between the 2 ms task > and the scrubbing task, both at design time and at run-time, and so > violates design modularity. > > If the whole design would be based on this principle, the same extra > stuff would enter the other 18 or so tasks, too. Moreover, some part of > the code would still have to decide that the 2 ms task should be run > more often than other tasks, and that application code would in effect > either reimplement priority-based scheduling, or EDF scheduling, or some > similar system, or would be based on a statically predetermined > "minor/major cycle" type of scheduler, which can be severe burden on SW > maintenance. >We seem to have managed to conflate two things - synchronous, 2msec operation and whether a task is preemptive/preempted.>> It could be that the instrumentation necessary for the memory scrubber >> to figure out it needs to swap, costs too much. And I understand that. >> >> And I'm not talking about randomly inserting "sleep()" calls as if >> you were balancing a wheel :) - I mean constructing a "memory >> scrubber" object that understands its CPU budget. Running the >> memory scrubber on constraints based on estimated CPU utilization. > > My current memory scubber object understands its CPU budget: it is > allowed (and even required) to use all the CPU time available at its > priority level. >Ah, then that mainly follows what I mean then. I've just seen lots of things where there was no concept of CPU budget and then people are surprised to find that one was needed. You'd have a cluster of events occur within some horizon and things would go wonky. I've seen things like this deadlock or cripple systems in hard to reproduce ways. Not explicitly because of priority inversion but because there were now latent probabilistic defects that are devilishly hard to reproduce.> What I don't want to do is to force the scrubber task to understand the > CPU budgets of the *other*, higher-priority tasks. >That's correct; you do not.> And the scrubber task is beatifully simple, just a couple of nested > loops, the outermost being eternal. Completely robust against any > conceivable change in any other part of the application, and against any > overload. 
> But I admit that the scrubber task is a special case.

Indeed.

> At higher priorities, to be robust against overloads the tasks may have to implement load-shedding or load-refusing logic such as limiting the rate at which the task is triggered by sporadic inputs.

And there you go.

> However, IMO even such things are easier in a preemptive system.

And you're absolutely right.

> Can you explain how you would design a memory scrubber object, in a non-preemptive system, to support not only the 2 ms task, but tasks at 1 Hz, 10 Hz, and five (say) sporadic, I/O-triggered tasks at rates of up to around 50 Hz?

I wouldn't even consider it, especially given the level of detail you've exposed now. I'd just have the memory scrubber do some accounting to estimate how much CPU it's used, have a timer tick that informs the memory scrubber it's "over budget" or the like. Stuff that's lightweight but measures the rate discipline of the task.

But especially since the memory scrubber seems to be the "background task" or "soak loop", there's no good reason to even worry about it.

--
Les Cargill
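To put the two positions side by side in code: below is a rough sketch of the kind of lowest-priority scrubber Niklas describes (an eternal pair of nested loops that is simply preempted whenever anything else is ready), with the sort of lightweight accounting Les suggests added as a counter. The address range, the word-by-word read/write-back, and the counter are assumptions for illustration only.

    #include <stdint.h>

    #define RAM_START  ((volatile uint32_t *)0x20000000u)  /* assumed region */
    #define RAM_WORDS  (64u * 1024u)

    uint32_t words_scrubbed;       /* optional accounting a monitor can read */

    /* Background task: lowest priority, never blocks, never yields
     * explicitly; it runs only when no higher-priority task is ready.
     * Reading and writing back each word lets EDAC hardware correct a
     * single-bit error before it can grow into an uncorrectable one.     */
    void scrubber_task(void)
    {
        for (;;) {                                     /* outer loop: eternal  */
            for (uint32_t i = 0; i < RAM_WORDS; i++) { /* inner loop: one pass */
                volatile uint32_t *p = RAM_START + i;
                uint32_t v = *p;                       /* read (ECC corrects)  */
                *p = v;                                /* write back           */
                words_scrubbed++;                      /* Les-style accounting */
            }
        }
    }

On a load/store machine the read/write-back pair would need protection against a preempting writer, a point raised later in the thread.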
Reply by ●July 10, 2016
On 7/7/2016 1:09 PM, Niklas Holsti wrote:> Don Y wrote: >>>> Exactly. "Priority" (in the sense of "set_task_priority()") is just >>>> an expedient to make coding a scheduler easy. It tells you *nothing* >>>> about the task's "importance", "timeliness constraints", etc. > > Niklas Holsti wrote: >>> That is not so, if you follow a systematic and proven >>> priority-assignment rule. >>> For example, if you use deadline-monotonic priorities then the order >>> of task >>> priorities is equivalent to the order of task deadlines (expressed as the >>> maximum duration between task activation and completion). So then the >>> task >>> priorities certainly tell you a lot about the (relative) timeliness >>> constraints. >>> >>> If you just assign priorities based on some subjective "task importance" >>> feeling, your statement is valid. > > Don Y wrote: >> Then, EVERY product that employs a priority-based scheduler should have a >> FORMAL document in part of its deliverables that clearly states these >> priorities and the method by which they were derived (as you later call >> them "the basis for one important set of the mathematical methods for >> verifying real-time performance: schedulability analysis"). So, the >> next bloke to look at the code knows exactly why they were chosen and >> which assumptions he/she must *continue* to operate under for those >> priority assignments to remain valid. > > Ideally yes, just as every mechanical engineering project should document its > design assumptions, stress and strength calculations, etc., and every SW > project should deliver full design documentation, complete user manuals, > maintenance manuals, etc. > > But seriously, documentation requirements are a different dimension from the > choice of design and implementation methods. Using a systematic design method > does not oblige you to document the design, although it makes it much easier to > document it.How can you say that? Do you expect the next soul to read your mind regarding how and why you made specific choices? Do you expect them to retrace your design steps to ensure they understand the assumptions you've ENCODED in your choice of priorities? Or, are they free to change your priorities WITH NO EFFECT ON THE RESULTING IMPLEMENTATION? If you choose a particular value/rating for a component in a circuit, you either explicitly document WHY you made that choice *or* assume it is readily apparent to someone looking at the schematic diagram: "OK, conditions in which this component operates are such that this performance criteria is adequately derated for its reliable use, here" Are the reasons for your SOFTWARE design choices somehow less important? E.g., I am particularly fond of pointers and recursive solutions. To me, they make the algorithms simpler and more robust. But, I am careful to document how and why each of these mechanisms are used and "prove" that they *can't* run amok in my description of the algorithms, etc. And, the assumptions that dictate how future uses/adaptations must operate.>> Ask yourself how many of those documents you've seen? Authored? > > Several, for both questions. All the day-job projects I work on have such, and > I am usually the author.I suspect you are in the minority as (IME) most folks using priority based schedulers just "pick numbers" and, later, "tweek them" to make their design work properly. 
Usually, with little formal knowledge of the limitations of the algorithms or the problems that can manifest (nor the potential solutions *or* the problems with the solutions!) Would you care to share any of those rationales? I'm hosting a course that presents various OS design techniques, scheduling algorithms, hazzards, etc. and examples of how to (and NOT to) exploit each later this Fall. "Fabricating" artificial examples always *feels* like an artifice; presenting REAL examples allows folks to see ACTUAL tradeoffs. "This is *why* this approach was chosen over this OTHER approach...">> Or, is it just "some suggestive feeling" >> that led to the small integers chosen? > > No. From the requirements, I derive a set of tasks and deadlines, and then > assign priorities in deadline-monotonic order. Then I analyse or measure WCETs > and crank the response-time algorithm to check schedulability. The main problem > is that the task interactions are often more complex than assumed by the > simpler schedulability analysis methods. Fortunately, in my projects it is > usually possible to separate hard-real-time tasks from soft-real-time tasks, > and the interactions of the hard-real-time tasks tend to be rather simple.And, I suspect, your tasks are largely *periodic*? And/or orthogonal to each other? I have had exactly one project in 40 years of RT design that fit that mold -- a process that ran at 200Hz (5ms cycle time) with "hard" deadlines (in the sense that a deadline missed resulted in a failure of that *cycle* of the process -- a "defective product" to be discarded). Note that missed hard deadlines didn't indicate a failed *implementation*! By and large, my projects have consisted of aperiodic, cooperating tasks driven by "semi-random processes"; processes that can't be easily quantified nor constrained! And, running on under-provisioned hardware. E.g., the example up-thread has a bunch of user I/O's: - barcode reader - two serial ports - keypad - sensor array - display - audio annunciator (sound synthesizer) There is nothing I can do to prevent a user from scanning a barcode label at *any* time in the normal operation of the system. I.e., "bars" (not "labels") can arrive for processing at any time. That's part of the *appeal* of the system -- the USER drives it instead of *it* driving the user! There is nothing I can do to prevent the external hosts from interacting with the device (independantly of each other as well as the user's activities). I can't even *guarantee* that they will honor handshaking and/or pacing controls. Yet, I can't erroneously behave in the presence of THEIR bad behavior! I can't prevent the user from pressing keys at any time. Indeed, he may need to "tell me" something that he deems important. The sensor array lets me track where blood samples and reagents are being placed by the user -- sort of a keyboard for "droplets". I can't limit when or where the user pipettes those samples (yet I must be able to reliably *see* them else I can't vouch for their integrity and identity). I can't even guarantee that the user won't unplug the sensor array, unplug a comm cable, turn off power, etc. whenever he chooses. The display must be refreshed else the user loses that information channel. Ditto for audio. (amusingly, this aspect of the user interface is the LEAST important!) I can't control when the power might fail. Though I can probably place an upper limit on how OFTEN it can fail! 
Or, when the battery backup will give up the ghost, unceremoniously (the user may have disconnected the battery *or* not given the device time to charge it!) How do I assign scheduling priorities to each of these competing, aperiodic, asynchronous tasks? Priority effectively assigns "importance" by directing resources (CPU cycles) to one task at the expense of others. Which task(s) are more important than which OTHER tasks? "Let's make everything louder than everything else" No amount of analysis will produce a schedulable task list. The system EXPECTS to encounter overload. It *expects* to miss deadlines. Exactly because it can't control the actions of those things that are interacting *with* it. And, because it doesn't have the resources to address the (boundless) overload that is possible! I can take a barcode label and rub it vigorously across the barcode sensor and watch 100% of real time be consumed by that action. Yet, when my arm eventually gets tired, the system resumes AND PROVIDES THE CORRECTLY READ LABEL! Or not. But, it will NEVER provide MISREAD data. Likewise, if any INCOMING characters were missed on either of the serial ports, the system will know *where* they have been elided so there is no possibility of a partial message being misinterpreted as some OTHER *complete* message: "You have (not) succeeded!" ^^^ [Of course, outgoing character streams are completely unaffeected] If the user pipettes blood samples into two or more locations and the system wasn't able to "notice" which came first, it will not confuse their ordering or fail to notice that *two* have, in fact, appeared. The sanctity of the transactions is preserved regardless of the detail. The *environment* (user + other equipment) determines the performance of the system. Trying to design for a worst case, UNCONSTRAINED set of activities and task interactions results in wasteful overprovisioning -- and STILL doesn't guarantee that everything can get done! (e.g., add separate processors for all of these "tasks" so that each is GUARANTEED to be met -- and you'll then discover that you now have orders of magnitude more data that must be processed in the same "real time": what do I *do* with these 400 barcode read events in the past 3 seconds? how do they relate to the numerous blood samples detected? and the simultaneous database updates in that same period?) I excel at making robust *soft* real-time systems -- recognizing that MOST applications actually *are* "soft", even those that folks want to THINK of as "hard". It's just a lot harder to implement solutions when you *accommodate* missed deadlines than it is to simply say: "I *need* the following resources in order to guarantee that I meet these deadlines" (regardless of whether they actually *have* to be met in as timely a fashion as suggested)." This has natural application with aperiodic, asynchronous environments as you can't "know" when the next deadline (or even release time) will come along. You can't *plan* for it. But, instead, have to *deal* with it -- AS BEST YOU CAN, given what else may be happening at that moment. You might try to model interarrival times of events and make some assertions based on those assumptions. But, when Reality doesn't adhere to your model, you still have to WORK! You can't throw up your hands and claim the user is misbehaving! 
In my current project, I use a "value function" as the scheduling criteria: a tuple that relates (initial) deadline, a notion of value and a *hard* deadline (at a hard deadline, the task has no continued value so can be terminated -- and the associated deadline handler invoked to compensate for its deletion). This allows *all* tasks to share a common "criteria system" -- important as tasks can (physically) "migrate" between nodes and yet retain their "absolute" scheduling criteria despite the mix of tasks that they will be competing with on the new node. The scheduler can dynamically decide if the "depreciated" value of a task that has missed its (soft) deadline is still "worth more" than some other task that may not yet have reached its deadline -- but that has a "slower depreciation curve" if it *does* miss its deadline as a result of that scheduling decision. At the same time, it allows services provided by one task FBO another to carry the same "significance" on the scheduler instance that handles the "service" task (because the service task may be executing on another node). So, a task's "importance/urgency" doesn't change just because of the "urgency" of the task that is CURRENTLY providing a service to it! Regardless of where that "service" task happens to be executing at the current time. Attempting ANY kind of schedulability analysis on a system this large (thousands of tasks, hundreds of cores) is simply impossible. All you can hope for is "best effort". Or, simply concede that it is not possible to design large systems and limit your efforts to trivial, "provable" ones.
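As a rough illustration of such a value function (not Don's actual scheduler), a scheduling tuple and a depreciation rule might look like the sketch below; the field names, the linear decay between the soft and hard deadlines, and pick_next() are invented for the example.

    #include <stddef.h>
    #include <stdint.h>

    typedef int64_t usec_t;

    struct sched_criteria {
        usec_t  soft_deadline;  /* full value if completed by this time        */
        usec_t  hard_deadline;  /* no residual value after this; task killable */
        int32_t value;          /* worth of a timely completion                */
    };

    /* Depreciated value of completing the task at time 'now'.  Linear decay
     * is an assumption; any monotone depreciation curve would do.            */
    static int32_t current_value(const struct sched_criteria *c, usec_t now)
    {
        if (now <= c->soft_deadline)
            return c->value;
        if (now >= c->hard_deadline)
            return 0;           /* candidate for termination + deadline handler */
        usec_t span = c->hard_deadline - c->soft_deadline;
        usec_t left = c->hard_deadline - now;
        return (int32_t)((c->value * left) / span);
    }

    /* Pick the ready task whose depreciated value is currently highest. */
    size_t pick_next(const struct sched_criteria *ready, size_t n, usec_t now)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (current_value(&ready[i], now) > current_value(&ready[best], now))
                best = i;
        return best;
    }

Because the tuple travels with the task, the same comparison works on whichever node the task (or a service acting on its behalf) happens to be running.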
Reply by ●July 10, 2016
On 7/9/2016 5:46 AM, Niklas Holsti wrote:> On 16-07-09 00:56 , Les Cargill wrote: >> Niklas Holsti wrote: >>> [snips] >>> One example of interactions I find difficult is a shared I/O channel or >>> bus that must be used by various tasks, for various purposes, with most >>> transmissions being sporadic and such that the sending task must wait >>> for and check a response transmission. >> >> So the sender sends asynchronously then blocks on a receive >> ( presumably with a timeout ) , with other tasks/ISRs handling >> the details. You may even have separate send and receive loops, >> with state indicating the timeout for each receive. > > I have no problem *implementing* such things, my problem is *analysing* the > timing to compute worst-case task response times under various load scenarios. > This computation must also consider the possible latencies of > response-generation at the remote end of the channel.It's no different than a uniprocessor implementation -- *if* you "migrate" the "scheduling criteria" (in your case, "priorities") *with* the communication... THROUGHOUT the system! I.e., the (local) network stack doesn't just treat enqueued outgoing packets in FIFO order but, rather, "assumes" the priority of each "client" associated with each packet. So, the packet associated with the "highest priority" (using your scheduling lexicon) gets placed on the wire, "first". I.e., gets *scheduled* onto the resource (wire) first. Similarly, the incoming packet on the receiving end doesn't just get "enqueued" on a socket but, rather, gets promoted up the stack with the priority associated with its sender. I.e., gets *scheduled* through the stack, first (assuming its "priority" now exceeds those associated with other packets previously received. And, thereafter, the task that is waiting on that socket *assumes* the priority conveyed by that packet in how the task is scheduled, locally. The "scheduling criteria" of the original task (way back on the sending node) follows the packet as it "migrates" through the system -- and, the actions that it initiates! I.e., if the task that receives it (now operating with its "assumed" priority) [I'm trying to avoid using the word "inherited"] then needs to send a message to some other task (IPC or RPC), the same sort of "scheduling criteria migration" takes place. Until, ultimately, the "reply" is generated or the action "consumed". [This is why my earlier comments regarding scaling "scheduling criteria" to multiple nodes is problematic if they are just "small integers" that only make sense in a "local context"; you can't predict how the actions initiated by a task will propagate through the system as node boundaries (and subsystem boundaries!) present opportunities for those criteria (priorities) to lose meaning -- apples may reign supreme on one node while tangerines are revered on another!]>> If you do this right, very few tasks are marked "ready" at any given >> instant and you'll get more deterministic operation. > > Whenever a task is "not ready", it is blocked waiting for something. These > blockings, and in particular blockings caused by dynamic task interactions that > may or may not occur at run time, create problems for the timing *analysis*. > >>> I believe that there are analysis >>> methods which can find bounds on the response times for these tasks, >>> including queueing delays and communication latencies, but I haven't >>> studied or tried them yet. 
>> >> It might be worth adding references to a high-res free-running timer >> and accumulating data. If you then wish to construct a latency model, >> you can then have data to see if your model makes any sense. > > I don't want to rely on such measurements, which always depend on the scenarios > and test cases used. I want to have a method for computing response times from > the design, and proving by analysis that deadlines are met in all possible > scenarios.IME, you can only do that in small/trivial systems with predictable execution environments. Aperiodic tasks, "random" events, etc. leave you with a boundless range of possible scenarios that you're trying to cram into a neat little, well dimensioned, box. So, you make a best effort approach and GUARANTEE that you can detect exceptions/failures/faults and respond accordingly. If the Juno probe hadn't fired its engine at the "right" time, it would have used more of its reserve fuel to make a deferred course adjustment. But, there's still VALUE to it performing that operation even AFTER missing its (ideal) deadline. There's a point at which attempting a burn would be pointless -- for the *original* mission goal. But, you can be sure the scientists wouldn't just turn off all the ground equipment and walk home, head in hand! They'd find some *other*, possibly "less VALUable" use for the probe and see how the (fuel) resources could be applied to that OTHER goal -- with an entirely new deadline!> I know how to do that for systems where tasks interact in simple ways, but I > don't yet know how to do it for cases like the above shared channel. There are > tools that should be able to do it, but I have yet to try them out.
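A toy version of the "criteria migrate with the packet" idea (invented for illustration, not Don's code): each queued packet carries the scheduling priority of the task that originated it, and the transmit queue is kept ordered by that field instead of FIFO; the receive side would use the same field to decide which waiting client gets serviced first.

    #include <stddef.h>
    #include <stdint.h>

    struct packet {
        uint8_t        prio;       /* scheduling criterion carried with the data */
        size_t         len;
        uint8_t        payload[256];
        struct packet *next;
    };

    static struct packet *tx_head; /* queue ordered by originating task priority */

    /* Insert in priority order (higher value = more urgent), FIFO among
     * equals.  Locking against concurrent callers is omitted for brevity.  */
    void tx_enqueue(struct packet *p)
    {
        struct packet **pp = &tx_head;
        while (*pp != NULL && (*pp)->prio >= p->prio)
            pp = &(*pp)->next;
        p->next = *pp;
        *pp = p;
    }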
Reply by ●July 10, 2016
On 16-07-10 12:01 , Don Y wrote:> On 7/7/2016 1:09 PM, Niklas Holsti wrote: >> Don Y wrote: >>> Then, EVERY product that employs a priority-based scheduler should >>> have a >>> FORMAL document in part of its deliverables that clearly states these >>> priorities and the method by which they were derived (as you later call >>> them "the basis for one important set of the mathematical methods for >>> verifying real-time performance: schedulability analysis"). So, the >>> next bloke to look at the code knows exactly why they were chosen and >>> which assumptions he/she must *continue* to operate under for those >>> priority assignments to remain valid. >> >> Ideally yes, just as every mechanical engineering project should >> document its >> design assumptions, stress and strength calculations, etc., and every SW >> project should deliver full design documentation, complete user manuals, >> maintenance manuals, etc. >> >> But seriously, documentation requirements are a different dimension >> from the >> choice of design and implementation methods. Using a systematic design >> method >> does not oblige you to document the design, although it makes it much >> easier to >> document it. > > How can you say that? Do you expect the next soul to read your mind > regarding how and why you made specific choices? Do you expect them > to retrace your design steps to ensure they understand the assumptions > you've ENCODED in your choice of priorities?Design decisions and assumptions resulting in priority assignments are no different from design decisions and assumptions resulting in buffer sizes, control-loop coefficients, and other kinds of design features. You either document them or you don't, depending on your customers' requirements and on your own quality standards.> Or, are they free to change your priorities WITH NO EFFECT ON THE > RESULTING IMPLEMENTATION?Just as free as they are to change buffer sizes and control-loop coefficients. They either know what they are doing (helped by documentation, if it exists) or don't know what they are doing. As I've said, all my day-job projects deliver extensive design documentation, including SW Budget Reports with task priorites and their motivations.> But, I am careful to document how and why each of these mechanisms > are used and "prove" that they *can't* run amok in my description > of the algorithms, etc. And, the assumptions that dictate how > future uses/adaptations must operate.Good for you and your customers. I hope they pay you for it, too. I suspect that for many mass-market products the customer (i.e. the entity who subcontracts SW development) only cares that the SW works, but is not interested in its design details.>>> Ask yourself how many of those documents you've seen? Authored? >> >> Several, for both questions. All the day-job projects I work on have >> such, and >> I am usually the author. > > I suspect you are in the minority as (IME) most folks using priority > based schedulers just "pick numbers" and, later, "tweek them" to make > their design work properly. Usually, with little formal knowledge of > the limitations of the algorithms or the problems that can manifest > (nor the potential solutions *or* the problems with the solutions!) > > Would you care to share any of those rationales?As I have said, I usually rely on deadline-monotonic priority assignment and its response-time analysis. 
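For readers who have not met it, the response-time analysis referred to here is, in its simplest textbook form (independent periodic tasks, no blocking), the fixed-point iteration sketched below; it is a generic illustration, not code from any of the projects discussed.

    #include <math.h>
    #include <stdbool.h>

    /* Tasks sorted so that index 0 has the highest priority (i.e. the
     * shortest deadline under deadline-monotonic assignment).            */
    struct task {
        double C;   /* worst-case execution time            */
        double T;   /* period or minimum inter-arrival time */
        double D;   /* deadline, assumed D <= T              */
    };

    /* Worst-case response time of task i:
     *   R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j,
     * iterated to a fixed point.  Returns false if R exceeds the deadline
     * (task i, and hence the task set, is not schedulable).               */
    static bool response_time(const struct task *ts, int i, double *R_out)
    {
        double R = ts[i].C, prev;
        do {
            prev = R;
            R = ts[i].C;
            for (int j = 0; j < i; j++)
                R += ceil(prev / ts[j].T) * ts[j].C;  /* interference from j */
            if (R > ts[i].D)
                return false;
        } while (R != prev);
        *R_out = R;
        return true;
    }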
There is a huge literature on scheduling methods and schedulability analysis methods, for hard real-time systems, soft real-time systems, and mixtures. I know only a little bit about the hard-real-time part and almost nothing about the soft-real-time part. In my day-job projects, the soft-real-time part is non-critical and is validated by stress tests on practical-worst-case test scenarios agreed with the customer.> I'm hosting a course > that presents various OS design techniques, scheduling algorithms, > hazzards, etc. and examples of how to (and NOT to) exploit each later > this Fall. "Fabricating" artificial examples always *feels* like an > artifice; presenting REAL examples allows folks to see ACTUAL tradeoffs.If this is about server operating systems, with an unpredictable mix of batch, interactive, and network/transaction applications, it's a different ball-game. And mobile handsets are more and more coming to resemble such servers, with the profusion of applications of every possible nature. I have no contributions to make in those domains. Where will this course be given? Will it be on the net, perhaps as a MOOC?> And, I suspect, your tasks are largely *periodic*? And/or orthogonal to > each other?The hard-real-time tasks tend to be periodic, but sporadic ones also occur. The softer tasks are usually sporadic. Configuration parameters and commands usually flow from the soft-real-time part to the hard-real-time part, and the hard-real-time part usually generates data into buffers that are emptied by the soft-real-time part. Rather typical control & measurement systems. The most common complex task interaction is a communication channel shared by several tasks. For example, a hard-real-time task may be communicating cyclically over the channel, while some soft-real-time tasks must use the same channel for sporadic communications.> I have had exactly one project in 40 years of RT design that fit that > mold -- a process that ran at 200Hz (5ms cycle time) with "hard" > deadlines (in the sense that a deadline missed resulted in a failure > of that *cycle* of the process -- a "defective product" to be discarded). > Note that missed hard deadlines didn't indicate a failed *implementation*! > > By and large, my projects have consisted of aperiodic, cooperating > tasks driven by "semi-random processes"; processes that can't be easily > quantified nor constrained! And, running on under-provisioned hardware. > > E.g., the example up-thread has a bunch of user I/O's: > - barcode reader > - two serial ports > - keypad > - sensor array > - display > - audio annunciator (sound synthesizer) > > There is nothing I can do to prevent a user from scanning a barcode > label at *any* time in the normal operation of the system. I.e., > "bars" (not "labels") can arrive for processing at any time. That's > part of the *appeal* of the system -- the USER drives it instead > of *it* driving the user![snip] If you have decided that static priorities, or some other specific scheduling principles, don't work for you in this application, fine, use something else. But IME static priorities do work for many other applications.> How do I assign scheduling priorities to each of these competing, > aperiodic, asynchronous tasks? Priority effectively assigns > "importance" by directing resources (CPU cycles) to one task at > the expense of others. Which task(s) are more important than > which OTHER tasks?Again you confuse "importance" with "urgency". 
For example, some inputs have HW buffers (FIFOs), some have not; typically processing the buffered inputs is less urgent than the unbuffered inputs. Or you can provide a SW buffer, filled by an interrupt handler, for an input that does not have a HW buffer, and correspondingly relax the urgency of the task reading the SW buffer. If you are saying that all *inputs* are equally urgent, fine, poll them all in a single task and delegate the input data processing to other tasks in order of urgency of the *outputs*. If you think that all the outputs are equally urgent, too, then do everything in a single task -- the "Task Loop" in our Subject.> No amount of analysis will produce a schedulable task list. The > system EXPECTS to encounter overload.In priority-based systems, the typical solution to overload is to implement load-refusing mechanisms such as ignoring interrupts that are too close together. Whether this can be made to work well enough for the application to react gracefully under overload depends on the system design (including the HW design).> It *expects* to miss deadlines. Exactly because it can't control > the actions of those things that are interacting *with* it.The SW can control the rate at which it accepts inputs for processing.> The *environment* (user + other equipment) determines the > performance of the system.The environment always determines the offered load. In case of overload, depending on the robustness requirements of the SW the SW may be allowed to fail severely or be required to degrade gracefully. Apparently in the system you describe the latter case obtains, but then the proper design depends on the precise definition of "gracefully".> I excel at making robust *soft* real-time systems -- recognizing > that MOST applications actually *are* "soft", even those that > folks want to THINK of as "hard".If a specific graceful handling of overloads is required then I agree that the system acquires soft-real-time aspects. But some hard-real-time aspects may remain, if overload is still considered an anomaly. If the load that you call "overload" is actually expected, I would not call it "overload" at all, just a load for which all inputs cannot be processed in the same way as under lesser loads.> In my current project, I use a "value function" as the scheduling > criteria: a tuple that relates (initial) deadline, a notion of value > and a *hard* deadline (at a hard deadline, the task has no continued > value so can be terminated -- and the associated deadline handler > invoked to compensate for its deletion).As I said, I know almost nothing about soft-real-time scheduling, but I remember seeing published papers with similar methods: value functions that depend on the elapsed time of a computation or the expected time of delivery of the result. I don't doubt that they can work, even if the definition of such value functions seems to me more problematic and ad-hoc than the definition of priority.> Attempting ANY kind of schedulability analysis on a system this > large (thousands of tasks, hundreds of cores) is simply impossible. 
> All you can hope for is "best effort".Soft-real-time "schedulability" analyses usually focus on maximising the total "value" delivered or the overall Quality of Service, not on computing or minimising worst-case response times.> Or, simply concede that it is not possible to design large systems > and limit your efforts to trivial, "provable" ones.As I think I've said in some earlier post, IMO for hard-real-time systems the analysability limit comes from the increasing complexity of task interactions in large systems. That can be combated with better modular design and with systematic task interaction patterns. While small systems may be "trivial" from an academic point of view, there are certainly applications that are small enough to be provably analysed but still are non-trivial in terms of importance and value to the customers. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
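The "SW buffer filled by an interrupt handler" pattern Niklas mentions above, reduced to its bare bones (the register name and buffer size are made up for the example): the ISR does only the urgent part, and a far less urgent task drains the ring buffer whenever it runs.

    #include <stdint.h>

    #define BUF_SIZE 128u                    /* power of two, chosen arbitrarily */

    static volatile uint8_t  buf[BUF_SIZE];
    static volatile uint32_t head, tail;     /* head written only by the ISR     */

    extern volatile uint8_t UART_DATA;       /* hypothetical device register     */

    /* Urgent part: runs at interrupt level, just captures the byte. */
    void uart_rx_isr(void)
    {
        uint8_t  c    = UART_DATA;
        uint32_t next = (head + 1u) % BUF_SIZE;
        if (next != tail) {                  /* on overflow: drop, or mark a gap */
            buf[head] = c;
            head = next;
        }
    }

    /* Non-urgent part: called from a lower-priority task or the main task
     * loop whenever it gets around to it; returns -1 if nothing is pending. */
    int uart_rx_poll(void)
    {
        if (tail == head)
            return -1;
        uint8_t c = buf[tail];
        tail = (tail + 1u) % BUF_SIZE;
        return c;
    }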
Reply by ●July 10, 2016
On 16-07-10 12:41 , Don Y wrote:> On 7/9/2016 5:46 AM, Niklas Holsti wrote: >> On 16-07-09 00:56 , Les Cargill wrote: >>> Niklas Holsti wrote: >>>> [snips] >>>> One example of interactions I find difficult is a shared I/O channel or >>>> bus that must be used by various tasks, for various purposes, with most >>>> transmissions being sporadic and such that the sending task must wait >>>> for and check a response transmission. >>> >>> So the sender sends asynchronously then blocks on a receive >>> ( presumably with a timeout ) , with other tasks/ISRs handling >>> the details. You may even have separate send and receive loops, >>> with state indicating the timeout for each receive. >> >> I have no problem *implementing* such things, my problem is >> *analysing* the timing to compute worst-case task response times >> under various load scenarios. This computation must also consider >> the possible latencies of response-generation at the remote end of >> the channel. > > It's no different than a uniprocessor implementation -- *if* you > "migrate" the "scheduling criteria" (in your case, "priorities") > *with* the communication... THROUGHOUT the system!Sure it is different from a uniprocessor, as there are now multiple processors. I haven't yet used any multi-core/multi-processor schedulability-analysis methods, but I know they exist.> I.e., the (local) network stack doesn't just treat enqueued > outgoing packets in FIFO order but, rather, "assumes" the > priority of each "client" associated with each packet. So, > the packet associated with the "highest priority" (using > your scheduling lexicon) gets placed on the wire, "first".In my current application, there are separate transmission queues according to a (rough) priority order of the kind of data, command, or response being transmitted. This will certainly help to limit the queueing delays for high-priority traffic.>>> It might be worth adding references to a high-res free-running timer >>> and accumulating data. If you then wish to construct a latency model, >>> you can then have data to see if your model makes any sense. >> >> I don't want to rely on such measurements, which always depend on the >> scenarios and test cases used. I want to have a method for computing >> response times from the design, and proving by analysis that deadlines >> are met in all possible scenarios. > > IME, you can only do that in small/trivial systems with predictable > execution environments.I don't disagree. But this happens to be my situation. My applications may be "trivial" in this sense, but they provide my bread & butter (or margarine, as the case may be), so they are non-trivial to me.> Aperiodic tasks, "random" events, etc. > leave you with a boundless range of possible scenarios that you're > trying to cram into a neat little, well dimensioned, box.Aperiodic tasks can be limited by minimum inter-arrival times, agreed with the SW customer and (if required) enforced in the SW, and then they can be analysed in the same way as periodic tasks. Alternatively, scheduling methods called "servers" exist that can handle aperiodic or sporadic stuff, in a priority-based or EDF-based environment, while using at most a defined fraction of the resources. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
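One simple way to enforce an agreed minimum inter-arrival time in software is sketched below; the tick source, the threshold, and the shedding policy are illustrative assumptions rather than anything from the project being described.

    #include <stdbool.h>
    #include <stdint.h>

    #define MIN_SEPARATION_TICKS 20u     /* agreed minimum inter-arrival time */

    extern uint32_t now_ticks(void);     /* free-running timer, assumed       */

    /* Returns true if this activation may be processed; activations arriving
     * closer together than the agreed minimum are refused (they could also be
     * counted and reported as a load violation rather than silently shed).   */
    bool sporadic_admit(uint32_t *last_accepted)
    {
        uint32_t t = now_ticks();
        if ((uint32_t)(t - *last_accepted) < MIN_SEPARATION_TICKS)
            return false;                /* too soon: shed this activation    */
        *last_accepted = t;
        return true;
    }

With this guarantee in place, the sporadic task can be entered into the same response-time analysis as a periodic task with period MIN_SEPARATION_TICKS.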
Reply by ●July 11, 2016
On Sun, 10 Jul 2016 03:29:01 -0500, Les Cargill <lcargill99@comcast.com> wrote:>>> It could be that the instrumentation necessary for the memory scrubber >>> to figure out it needs to swap, costs too much. And I understand that. >>> >>> And I'm not talking about randomly inserting "sleep()" calls as if >>> you were balancing a wheel :) - I mean constructing a "memory >>> scrubber" object that understands its CPU budget. Running the >>> memory scrubber on constraints based on estimated CPU utilization. >> >> My current memory scubber object understands its CPU budget: it is >> allowed (and even required) to use all the CPU time available at its >> priority level. >> > >Ah, then that mainly follows what I mean then. I've just seen lots of >things where there was no concept of CPU budget and then people are >surprised to find that one was needed. You'd have a cluster >of events occur within some horizon and things would go wonky.What exactly is your "memory scrubber" actually doing ? * Read/ECC correction/writeback a memory location to avoid cosmic ray problems ? * Dynamic memory garbage collection ? * Virtual memory "dirty page" write to the page file ? * Something completely different ? These could all be done in the background in the null task with no knowledge of the CPU budget. Of course, the actual useful work load must be below 100 %, but that is the case for any hard/soft RT to operate properly with or without a scrubber.
Reply by ●July 11, 2016
On 16-07-11 06:51 , upsidedown@downunder.com wrote:> On Sun, 10 Jul 2016 03:29:01 -0500, Les Cargill > <lcargill99@comcast.com> wrote: > >>>> It could be that the instrumentation necessary for the memory scrubber >>>> to figure out it needs to swap, costs too much. And I understand that. >>>> >>>> And I'm not talking about randomly inserting "sleep()" calls as if >>>> you were balancing a wheel :) - I mean constructing a "memory >>>> scrubber" object that understands its CPU budget. Running the >>>> memory scrubber on constraints based on estimated CPU utilization. >>> >>> My current memory scubber object understands its CPU budget: it is >>> allowed (and even required) to use all the CPU time available at its >>> priority level. >>> >> >> Ah, then that mainly follows what I mean then. I've just seen lots of >> things where there was no concept of CPU budget and then people are >> surprised to find that one was needed. You'd have a cluster >> of events occur within some horizon and things would go wonky. > > What exactly is your "memory scrubber" actually doing ? > > * Read/ECC correction/writeback a memory location to avoid cosmic ray > problems ?Yes. I thought this was the (only) common meaning of "memory scrubbing", but perhaps I was wrong.> These could all be done in the background in the null task with no > knowledge of the CPU budget.Rest easy, my memory scrubber *is* the background task (as I said in my original mention of the scrubber, a number of posts ago) in the sense that it has the lowest priority of all tasks and never blocks. But the point of the discussion is more general: for real-time SW using priority-based scheduling, it is IMO nonsense to demand that the SW behaviour should not depend on the priority assignment. It may be that I misunderstood that initial statement by Les Cargill, and that we have been discussing and resolving only this misunderstanding, but at least it seems that Les now accepts that the priority-dependence in this example application is not reprehensible. Furthermore, IMO each task should be designed according to its own timing requirements, with inter-task conflicts and scheduling handled in the global design (decomposition into tasks and their priorities). This principle cannot always be followed strictly (for example when tasks are triggered with known offsets relative to some periodic activity) but it should IMO always be the starting point. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by ●July 11, 2016
Niklas Holsti wrote:> On 16-07-11 06:51 , upsidedown@downunder.com wrote: >> On Sun, 10 Jul 2016 03:29:01 -0500, Les Cargill >> <lcargill99@comcast.com> wrote: >> >>>>> It could be that the instrumentation necessary for the memory scrubber >>>>> to figure out it needs to swap, costs too much. And I understand that. >>>>> >>>>> And I'm not talking about randomly inserting "sleep()" calls as if >>>>> you were balancing a wheel :) - I mean constructing a "memory >>>>> scrubber" object that understands its CPU budget. Running the >>>>> memory scrubber on constraints based on estimated CPU utilization. >>>> >>>> My current memory scubber object understands its CPU budget: it is >>>> allowed (and even required) to use all the CPU time available at its >>>> priority level. >>>> >>> >>> Ah, then that mainly follows what I mean then. I've just seen lots of >>> things where there was no concept of CPU budget and then people are >>> surprised to find that one was needed. You'd have a cluster >>> of events occur within some horizon and things would go wonky. >> >> What exactly is your "memory scrubber" actually doing ? >> >> * Read/ECC correction/writeback a memory location to avoid cosmic ray >> problems ? > > Yes. I thought this was the (only) common meaning of "memory scrubbing", > but perhaps I was wrong. > >> These could all be done in the background in the null task with no >> knowledge of the CPU budget. > > Rest easy, my memory scrubber *is* the background task (as I said in my > original mention of the scrubber, a number of posts ago) in the sense > that it has the lowest priority of all tasks and never blocks. > > But the point of the discussion is more general: for real-time SW using > priority-based scheduling, it is IMO nonsense to demand that the SW > behaviour should not depend on the priority assignment. It may be that I > misunderstood that initial statement by Les Cargill, and that we have > been discussing and resolving only this misunderstanding, but at least > it seems that Les now accepts that the priority-dependence in this > example application is not reprehensible. >I certainly didn't mean to imply anything "reprehensible" about this state of affairs. Indeed, as you described the design in question in detail, it looks to be perfectly acceptable.> Furthermore, IMO each task should be designed according to its own > timing requirements, with inter-task conflicts and scheduling handled in > the global design (decomposition into tasks and their priorities). This > principle cannot always be followed strictly (for example when tasks are > triggered with known offsets relative to some periodic activity) but it > should IMO always be the starting point. >Also agreed. -- Les Cargill
Reply by ●July 12, 2016
On Mon, 11 Jul 2016 16:11:27 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:>On 16-07-11 06:51 , upsidedown@downunder.com wrote: >> On Sun, 10 Jul 2016 03:29:01 -0500, Les Cargill >> <lcargill99@comcast.com> wrote: >> >>>>> It could be that the instrumentation necessary for the memory scrubber >>>>> to figure out it needs to swap, costs too much. And I understand that. >>>>> >>>>> And I'm not talking about randomly inserting "sleep()" calls as if >>>>> you were balancing a wheel :) - I mean constructing a "memory >>>>> scrubber" object that understands its CPU budget. Running the >>>>> memory scrubber on constraints based on estimated CPU utilization. >>>> >>>> My current memory scubber object understands its CPU budget: it is >>>> allowed (and even required) to use all the CPU time available at its >>>> priority level. >>>> >>> >>> Ah, then that mainly follows what I mean then. I've just seen lots of >>> things where there was no concept of CPU budget and then people are >>> surprised to find that one was needed. You'd have a cluster >>> of events occur within some horizon and things would go wonky. >> >> What exactly is your "memory scrubber" actually doing ? >> >> * Read/ECC correction/writeback a memory location to avoid cosmic ray >> problems ? > >Yes. I thought this was the (only) common meaning of "memory scrubbing", >but perhaps I was wrong.That had earlier been my impression too, but googling, quite a lot of strange hits emerged :-). IIRC, when the Hubble (HST) had problems flying through the SAA radiation zone, it was called memory "flushing". Any earlier space usage of scrubbing/flushing ?>> These could all be done in the background in the null task with no >> knowledge of the CPU budget. > >Rest easy, my memory scrubber *is* the background task (as I said in my >original mention of the scrubber, a number of posts ago) in the sense >that it has the lowest priority of all tasks and never blocks.Doing the scrubbing in the null task is easy, as long as the hardware supports uninterruptable memory to memory instructions, so no need to disable interrupts an hence task switching. Things get more problematic with RISC style load/store architectures. Also caches complicates the situation. A single byte read is sufficient to load a full cache line such as 32 bytes, but the writeback is an other issue (write through or write back) how to handle it.
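To illustrate the load/store concern: when the read and the write-back are separate instructions, a write that preempts the pair can be lost, so the simplest (if heavy-handed) software fix is to mask interrupts around each word, as in the sketch below. The interrupt-masking primitives are placeholders for whatever the target or RTOS provides.

    #include <stdint.h>

    extern uint32_t irq_save(void);             /* placeholder primitives */
    extern void     irq_restore(uint32_t state);

    /* Scrub one word on a load/store machine: making the read/write-back
     * pair atomic prevents a higher-priority writer's update from being
     * overwritten by the scrubber's stale copy.                           */
    void scrub_word(volatile uint32_t *p)
    {
        uint32_t s = irq_save();
        uint32_t v = *p;      /* read: EDAC corrects the delivered data */
        *p = v;               /* write back the corrected value         */
        irq_restore(s);
    }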







