On 1/7/2020 1:21 AM, upsidedown@downunder.com wrote:
> On Mon, 6 Jan 2020 16:10:03 -0700, Don Y <blockedofcourse@foo.invalid>
> wrote:
>
>>> In a typical RT system nearly all (and most of the time all) tasks are
>>> in some sort of wait state waiting for a hardware (or software)
>>
>> Misconception.  That depends on the system's utilization factor
>> which, in turn, depends on the pairing of hardware resources to
>> application needs.
>
> In my experience, keep the long-term average CPU load below 40-60 %
> and the RT system behaves quite nicely.

The devices I've designed typically operate at, near, or *BEYOND* 100%.
Time spent doing nothing means you've got resources that you don't need.
Resources = product cost.  (Most of my past work was done pinching
pennies; the idea of *adding* hardware to a design was anathema!)

[I can recall recoding FINISHED code to replace 3-byte instructions with
2-byte instructions; or eliding an opcode by rearranging the order of
register assignments; or noticing that the CY (carry) was already set so
I could eliminate an increment following an add by replacing it with an
add-with-carry... all to see if I could trim a few hundred bytes out of
the running/tested binary to remove a ROM device from the bill of
materials!]

If you design systems to gracefully degrade, recognize that almost all
RT problems are "soft" -- or can be made so -- and know how to design in
that arena, then there's no real downside to designing for no margin.

The system with the barcode reader that I described elsewhere frequently
ran into overload.  Things would get "sluggish", occasionally.  But, it
never broke.  (The barcode reader would typically tie up much of the
machine's "real time" for a tenth of a second.)

The barcode reader, itself, became sort of a game; folks would rub
labels back and forth across the detector as fast as they could in an
attempt to crash the device.  You'd watch the display start to get
sluggish.  UARTs would stop transmitting.  Keypresses would take longer
to be recognized.  Etc.  But, it wouldn't take long before "arms got
tired" and the system rebounded.  (Lots of colorful/obscene comments
when you'd watch someone doing this sort of thing!  :>)
Task priorities in non strictly real-time systems
Started by ●January 3, 2020
Reply by ●January 7, 2020
On 1/6/2020 10:37 PM, upsidedown@downunder.com wrote:
>> A "regular" timer IRQ typically drives your timing system.  Apps
>> then use *that* for their notion of elapsed time (both to measure and
>> to wait/delay).
>>
>>> In some systems an I/O may arm a one shot clock interrupt waiting for
>>> the I/O or until a timeout occurs. If the I/O is completed in time,
>>> the receiver routine disables the timer request and the timer
>>> interrupt never occurs.
>>
>> Wonderful if you have a HARDWARE timer to dedicate to each such use.
>
> If you do not need regular timer interrupts, you have at least one
> free timer. You just need a clock queue. When the first entry is
> entered into the queue, also arm the timer for that expiration time.
>
> When the timer expires, check the clock queue for the next time and
> arm the timer accordingly. If there are two entries in the queue with
> nearly the same expiration time, combine these into one entry. Special
> care is needed when entering new or canceling entries from the queue.

That's how I used to handle time ~40 years ago.  But, it quickly
becomes counterproductive when the number of timers starts to increase.
Consider, you may have timers that:
- control "blinking" indicators/annunciators ("Wax on; wax off")
- monitor for loss of AC mains frequency
- debounce switches
- time out mechanisms (prevent overrun in case of feedback failure)
- time out user interactions
- run "process" timers (e.g., "agitate solution for 3 minutes")
- track time-of-day
- sample frequency-based sensors (VtoFs, CtoFs, etc.)

And, it's likely that all of these can be active simultaneously.

Silly to put a debounce timer on each switch.  Likewise, the indicators
can (and probably SHOULD) share a timer -- so they all blink in unison,
when called upon to do so.  If you work at it, you can arrange for
mechanisms to share a timer, despite the fact that their *individual*
timing requirements won't be coincident.  So, you're migrating the
timing functionality up into the application -- map the timeouts of
each individual mechanism onto a "mechanism timeout timebase" that you
maintain in the application, driven by the "mechanism timeout timer".
(My current design lets you define a "clock" -- timebase -- to do this
sort of thing for you so that the OS can manage that timing, even
though it's driven by a coarser "timebase" than other timers.)

You might be able to combine the mains detection timer with the
time-of-day (cuz you're really only interested in *sampling* the mains
and, thus, only need frequency lock, not phase).  You can "round off"
the user's actions to some nearby "clock tick".  But, you'll still have
to track the time elapsed from that point (in the application??).  Etc.

What you discover is that you spend a sh*tload of effort manipulating a
hardware timer -- and generating MORE interrupts (that need to be
serviced).  And, despite it all, your "tasks" won't ever achieve that
level of precision in their actions unless you wire each clock
interrupt to a specific waiting task.  Then, the performance of the
system as a whole becomes harder to control.  And, the design becomes
more brittle.

> The problem with regular timer interrupt is that the interrupt rate
> must be kept quite low (say 10 ms) to avoid interrupt overhead, so if
> you sometimes need some 100 us timing resolution, how do you do it?

Huh?  Your approach generates MORE interrupts.

Task A runs.  At time Ta it decides it wants to start a 100us timer.
Some time later (epsilon?), task B starts and wants to start an 800us
timer.
At Ta+100u, the timer IRQ is signaled.  At Ta+e+800u, the timer again
interrupts, for task B.  When does task A *respond* to the expiration
of that 100us?  Will it be in 100.0us?  150us?  200us?  And, what about
task B?

> With a clock queue you can have a 100 us timing and perhaps the next
> timing event might happen after 70 ms. Only two timer interrupts in
> 70.1 ms.

What if tasks decide they like 100us events?  And, you end up with 701
of them in 70.1ms?  Or, do you support the possibility of a timer
request being denied (or -- gasp -- deferred??)

If you rethink what you're trying to do with these
delays/sleeps/timeouts, you can see that there usually isn't a lot of
precision nor accuracy required in them.  That's what led me to moving
to a "timing service" instead of manipulating a hardware timer for that
role.

If I say I want to delay for 0.5 seconds, I pass that on to the timing
service "asynchronously" (wrt the system's "clock").  The timing
service -- itself a task! -- is responsible for monitoring the passage
of time and notifying me when that interval has passed.

It most assuredly WON'T be 500ms later!  I may have started the timer
0.99ms after the most recent "tick".  Or, perhaps 0.01ms.  So, there's
an uncertainty of a whole clock period baked in, to start (average
error of half a clock).  So, we add one clock period to ensure the
actual delay is never LESS than what we've requested (e.g., if a 1ms
clock, then we want to wait for 501 clocks to expire).

Then, the timing service may not run exactly coincident with the clock
interrupt (why would we want to do anything in the ISR more than NOTICE
that the clock interrupted?).  The ISR sets a flag (or raises an event)
and the timing service task EVENTUALLY runs to respond to "timer
interrupt occurred".  Note that this need not be a high-priority
activity -- if you are already willing to tolerate "slop" in the timing
system!

So, the timing system (task) updates the timer(s) and notifies the
tasks waiting on each of them (or, lets them poll the results).  The
task waiting on the particular timer EVENTUALLY runs -- some time after
the timing service... which was some time after the timer event...
which was some time after the timer actually expired (interrupt latency
along with interrupt priorities!).  If the task that initiated the
timer is a low priority (or, if the system is operating in overload),
it may not get a chance to run for considerable time!  It's possible
that several timer interrupts have transpired beyond the one that
expired YOUR timer!

<shrug>  All you know is that something more than 500ms has expired by
the time your "sleep(500)" returns.

Is this bad?  Only if you RELY on having precise time!  Look at the
types of applications for delays, timeouts, etc.

If I start a motor and KNOW that, worst case, the mechanism will take 8
seconds to return to its home position and, thus, set a timer for ~9
seconds, how upset will I be if that timeout actually occurs in 9.001?
Or, 9.1?  Or *10* seconds??

If I sense a switch closure and set a timer to delay for 30ms (to allow
for debounce) before checking it again, do I care if that delay turns
out to be 40ms?  100ms?

[Ah, you can complain that the switch's *action* will be variably
delayed, based on this debounce timer!  Not if you initiate the action
on the leading edge of the switch action -- and use the debounce timer
to inhibit a *subsequent* "detection"!  There, the upper limit on the
timer's tolerance is how soon you want to recognize ANOTHER switch
action by that same switch!]
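[For concreteness, here's a minimal sketch of such a timing service in
C.  All of the names here are invented for illustration -- this is the
shape of the thing, not any particular kernel's implementation.]

#include <stdint.h>
#include <stdbool.h>

#define MAX_TIMERS 16

struct soft_timer {
    bool     armed;
    uint32_t remaining;            /* ticks until expiry        */
    void   (*notify)(void *arg);   /* delivered when it expires */
    void    *arg;
};

static struct soft_timer timers[MAX_TIMERS];
static volatile uint32_t tick_count;   /* written ONLY by the ISR */

/* The ISR does nothing more than NOTICE that the clock interrupted. */
void clock_isr(void)
{
    tick_count++;
}

/* Arm a timer.  The +1 ensures the actual delay is never LESS than
   requested: we may be anywhere within the current tick period. */
void timer_arm(int id, uint32_t ticks, void (*fn)(void *), void *arg)
{
    timers[id].remaining = ticks + 1;
    timers[id].notify    = fn;
    timers[id].arg       = arg;
    timers[id].armed     = true;
}

/* Body of the timing-service task.  It runs EVENTUALLY after each
   tick, so every expiry is "at least" the requested interval. */
void timing_service_poll(void)
{
    static uint32_t seen;
    uint32_t now     = tick_count;   /* single read; assumed atomic */
    uint32_t elapsed = now - seen;   /* unsigned wrap is harmless   */
    seen = now;

    if (elapsed == 0)
        return;

    for (int i = 0; i < MAX_TIMERS; i++) {
        if (!timers[i].armed)
            continue;
        if (timers[i].remaining > elapsed) {
            timers[i].remaining -= elapsed;
        } else {
            timers[i].armed = false;
            timers[i].notify(timers[i].arg);  /* or wake a waiting task */
        }
    }
}

[Note that the service happily absorbs several ticks at once if it was
starved for a while -- exactly the "slop" being tolerated above.]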
If I can't poll the AC line *promptly* every ~10ms, will I falsely
conclude that the mains have failed?  Not if I count on the fact that
the line frequency SHOULD be there continuously, so I will likely run
across a valid signal "soon" -- and detecting the loss of power a few
cycles later than it was actually lost isn't going to crash the system.

If I'm supposed to blink an indicator at a 1Hz rate, will the user be
upset if it blinks at 0.9Hz, instead?

IME, time is typically most important when it comes to MEASURING it.
Will your scheme let me MEASURE events to that 100us resolution?  :>
But, when time is used in measurements (often in the design of
precision sensors), you already have some specific/dedicated hardware
in place for the item to be measured.  So, you can allocate hardware to
facilitate that measurement.

E.g., an early barcode reader that I designed was nothing more than a
photodiode wired to a timer input.  When the diode "saw" white, it
would output a HIGH; LOW for black (or, maybe vice versa... too long
ago for details).  The operator would move the barcoded item across the
sensor.  This would output a pulse train that essentially translated
bar/space widths into times between transitions.  I.e., the time
between a to-HIGH transition and the following to-LOW transition would
correlate with the width of the white (space) portion of the label.  If
you assume that the operator's hand speed remains relatively constant
over the course of a 1 inch label, then you can normalize "pulse
widths" to "bar/space widths".

You can program the timer to "watch" alternating edges -- when the
input is HIGH (white), tell it to watch for a LOW-going transition;
when LOW (black), watch for HIGH-going.  In that way, you can read the
widths of each bar/space as it passes.

Ah, but the timer "runs", once started!  You want to be able to capture
*when* the desired transition occurred -- despite the fact that the
particular timer doesn't have a "capture" register!  So, set up the
timer to start running when the HIGH-going edge is detected.  And,
trigger an interrupt when it times out.  Preload the timer with "1" so
that it counts down to zero very soon after it is triggered.  Let the
timer interrupt when it reaches zero.  This interrupt will always occur
one count after the desired edge has been detected on the input pin!
As long as a bar/space is not narrower (in time) than this one count,
you will not lose data.

[A bar/space can take as little as 70us to pass the detector.  So, IRQs
can be 70us apart -- and you can't inhibit them lest you miss data!
And, you *definitely* don't want to force the operator -- already
engaged in a "mindless task" -- to have to do something to TELL YOU
that he wants to read a barcode.  Or, that he wants to discard the
previously read barcode in favor of this NEW barcode.]

But, the IRQ may be sluggish being serviced (you don't want to be
forced into making it the highest priority IRQ).  So, you rely on the
fact that the timer is now RUNNING -- reloaded from a "256" preset --
to tell you how LONG AGO the interrupt was triggered (when the timer
was reloaded).  This tells you the latency in servicing THIS interrupt
instance.  You capture the current system time off the system timer and
pass the latency and system time into a FIFO.  Then, reprogram the
timer to watch for the OPPOSITE edge -- and play the latency timing
trick, again.

A task can leisurely POLL the FIFO to see if there is data available
(no need to make that a high priority task...
as long as it can get around to checking the FIFO before it fills,
completely!)  When data is available, the task can extract it from the
FIFO and, in the process, compute the time differences between
successive "edges", taking into account the latencies recorded for each
edge observation.  Because the task keeps the FIFO drained, the IRQ
never needs to be disabled -- so there's nothing to stop the user from
scanning another barcode, at any time!

Another task can examine these bar/space widths and apply the
dimensional rules of the particular codes to convert them into
*possible* characters, noting that the first character can start on ANY
black bar (which may not be the first one in the buffer!).  That task
can discard leading noise (one bar plus its following space) each time
it determines that its interpretation of the data is inconsistent with
the rules of the particular code.

Eventually, a valid set of "characters" is encountered.  Then, another
task can examine the character string to see if *it* conforms to the
rules for the code (valid start and stop characters, number of digits,
checksum, etc.).  Again, a failure means that the raw widths didn't map
into a valid label, so the first bar+space needs to be discarded and
another attempt made to see if the remaining data can map to a valid
label.

All the while, more data is being produced by the ISR.  And, converted
to widths by that first task.  And those widths converted to
characters.  And...

The design also supplies its own "timeouts" -- e.g., detecting a patch
of whitespace after a label found NOT to include an "addendum label"
(which should be treated AS IF part of the original label).  This can
be accommodated by mapping the bar TIME widths (of a valid detected
label) into physical dimensions -- to get a figure for the rate at
which the label was passed by the detector -- and then looking for gaps
exceeding a fixed physical size, translated back into time.

Of course, none of these tasks is given free rein to "run to
completion", as they could delay other tasks (add latency to other
actions).

The sole cost of the reader interface (ignoring the photodiode) is an
8b timer channel.  Without the timer, you'd need an edge-programmable
interrupt source and low, *fixed* latency NMI (cuz it has to be able to
interrupt ANYTHING to guarantee that latency).
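[Again for concreteness, the edge-capture/latency trick might look like
this in C.  TIMER_COUNT, TIMER_EDGE and the FIFO are invented stand-ins
for an imaginary 8-bit down-counter that reloads on the programmed edge
-- the original was certainly not written this way.]

#include <stdint.h>

#define PRESET   256u   /* timer reload value */
#define FIFO_LEN 64     /* power of two, sized for the slowest consumer */

extern volatile uint8_t TIMER_COUNT;  /* counting down since the edge */
extern volatile uint8_t TIMER_EDGE;   /* which edge fires the IRQ     */
extern uint32_t system_time(void);    /* coarse system timebase       */

struct edge_sample {
    uint32_t stamp;     /* system time when the ISR finally ran */
    uint8_t  latency;   /* counts elapsed since the actual edge */
};

static struct edge_sample fifo[FIFO_LEN];
static volatile uint8_t head;   /* written by ISR  */
static volatile uint8_t tail;   /* written by task */

/* Edge ISR: the timer has been RUNNING since the edge reloaded it,
   so PRESET - TIMER_COUNT is how late we are being serviced. */
void edge_isr(void)
{
    struct edge_sample s;

    s.latency = (uint8_t)(PRESET - TIMER_COUNT);
    s.stamp   = system_time();
    fifo[head & (FIFO_LEN - 1)] = s;
    head++;
    TIMER_EDGE ^= 1;            /* now watch the OPPOSITE edge */
}

/* Polled leisurely by a task: yields the bar/space width between
   successive edges, each endpoint corrected for its recorded ISR
   latency (assuming both counters tick at comparable rates). */
int next_width(uint32_t *width)
{
    static struct edge_sample prev;
    static int have_prev;
    struct edge_sample s;

    if (tail == head)
        return 0;               /* FIFO empty -- nothing yet */
    s = fifo[tail & (FIFO_LEN - 1)];
    tail++;

    if (!have_prev) {
        prev = s;
        have_prev = 1;
        return 0;
    }
    *width = (s.stamp - s.latency) - (prev.stamp - prev.latency);
    prev = s;
    return 1;
}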
Reply by ●January 7, 2020
On 1/7/2020 1:54 AM, upsidedown@downunder.com wrote:
> On Mon, 6 Jan 2020 16:10:03 -0700, Don Y <blockedofcourse@foo.invalid>
> wrote:
>
> <description of simple RT kernel for 6809 with typically 6 to 12 tasks
> skipped>
>
>>>>> The scheduler checked the task state byte of each created task
>>>>> (requiring just 3 instructions/task). If no new higher priority task
>>>>> had become runnable, the scheduler performed a simple return from
>>>>> interrupt restoring the registers from the local stack and the
>>>>> suspended task was resumed.
>>>>
>>>> Deciding and picking can often be far more complicated actions.
>>>> And, the criteria used to make that selection are often not simple
>>>> (static?) priorities.
>
> Assuming a task can be in a WAITING state (not READY), runnable
> (READY) and actually RUNNING: the scheduler simply scans the priority
> list in priority order and when it detects the first task in READY
> state (or already RUNNING) executes that task. The scan omits all
> tasks in WAITING state.

Something PUT those tasks into those states.  Sort them into the
correct queues, by "priority", when *that* happens.  In that way, the
decision time is made constant when you need to do a context switch.

> Only if a very large number (dozens) of tasks are expected to be in
> WAITING state before detecting the first task in READY state might it
> make sense to have a separate queue for READY or RUNNING tasks only.

Do you *force* the developer NOT to have more than a certain number of
tasks?  It's the use of these "simple" algorithms that don't scale well
that I consider as differentiating an RTOS from an MTOS.

>> In practice, they tend to be twiddled after-the-fact
>> when things don't quite work as you'd HOPED (i.e., you screwed up
>> the design and are now using a rejiggering of priorities to try to
>> coax it into behaving the way you'd HOPED)
>
> I may sometimes have to split a task that consumes more than expected
> CPU time for its priority level into two tasks, moving the high time
> requirement to a separate task and running it at a lower priority.
>
>> When you assign arbitrary numeric values to impose an ordering on
>> tasks ("relative importance"), do you prepare a formal justification
>> for those choices?  A *rationale* that defends your choices?
>
> Of course I check the deadline requirement of each task before
> assigning priorities. Those tasks without formal deadline requirements
> can often be executed in the null task with lowest priority. This of
> course requires that the average CPU load is well below 100 % so that
> the null task is sometimes executed.

But deadlines vary, over time.  This is why priorities are a bad fit as
a scheduling criterion.  As a deadline approaches, do you tweak the
priority to ensure the task gets a chance to run?

"Let's make EVERYTHING louder than EVERYTHING else" -- Deep Purple (?)

ACTUAL deadlines are relatively easy to grok.  And, express
numerically.  So, folks/developers can relate to the scheduler opting
to run X instead of Y at a particular time.  But, explaining why X
should run at "3" while Y runs at "7" gets into a lot of hand-waving
("What's running at 2?  And 5?  What about 8?")

I'm out.  I only "dropped in" to verify that I had set up Thunderbird
correctly on this rebuilt machine.  I have a fixed budget of time for
"correspondence" (my plate is *WAY* too full) and USENET burns up too
much for the benefit that it provides (me!  :>)

As of this moment, it appears that I've not left any conversations in
which I was a participant "dangling".

Happy New Year!
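[A sketch of the "sort them when *that* happens" point above, in C with
illustrative names: pay the ordering cost at the state change, so the
context-switch path is constant time.  Locking/interrupt masking is
omitted.]

#include <stddef.h>

struct task {
    int          priority;   /* smaller number = more important */
    struct task *next;
};

static struct task *ready_head;   /* kept sorted by priority */

/* Called by whatever PUT the task into the READY state.  O(n) here,
   but the cost is paid once, at the (already expensive) event... */
void make_ready(struct task *t)
{
    struct task **pp = &ready_head;

    while (*pp && (*pp)->priority <= t->priority)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
}

/* ...so picking the next task to run is a constant-time pop,
   no matter how many tasks are WAITING. */
struct task *pick_next(void)
{
    struct task *t = ready_head;

    if (t)
        ready_head = t->next;
    return t;
}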
Reply by ●January 7, 2020
On Tue, 07 Jan 2020 07:05:58 +0200, upsidedown@downunder.com wrote:
>On Mon, 06 Jan 2020 13:29:46 -0500, George Neuner
><gneuner2@comcast.net> wrote:
>
>>On Mon, 06 Jan 2020 15:39:01 +0200, upsidedown@downunder.com wrote:
>>
>>>Smells like a time sharing system, in which the quantum defines how
>>>much CPU time is given to each time share user before switching to
>>>the next user.
>>
>>A "jiffy" typically is a single clock increment, and a timeslice
>>"quantum" is some (maybe large) number of jiffies.
>>
>>E.g., a modern system can have a 1 microsecond clock increment, but a
>>typical Linux timeslice quantum is 10 milliseconds.
>
>There is no way that the interrupt frequency would be 1 MHz. The old
>Linux default interrupt rate (HZ) was 100 and IIRC it is now 1000 Hz.
>That microsecond is just the time unit used in the time accumulator.
>With HZ=100 (10 ms), 10000 was added to the time accumulator in each
>clock interrupt. By using addends >= 10001 the clock will run faster
>and <= 9999 slower. This is useful if the interrupt rate is not
>exactly as specified or when you want an NTP client to slowly catch up
>to the NTP server time, without causing time jumps.

Note that I said "can" rather than "does".

You do have to modify the kernel to get the constants right ... but the
clock interrupt handler is extremely simple: it's in kernel mode, and
all it does is increment a variable.  You easily _can_ run a 10KHz
clock on most Intel/AMD systems of the last 15 years.

More modern multicore chips (e.g., Haswell onward) can handle much
higher interrupt rates - it's just a question of whether your clock can
provide them ... and while commodity system board clocks don't, there
certainly are add-ons available that will.

>100 ns time units in the time accumulator have been used in VMS and
>WinNT for decades.

Yes.  But that really was an illusion, because the clock interrupt rate
in NT was 1KHz or less (again depending on hardware).

George
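[The addend scheme described above, sketched in C with illustrative
constants -- each interrupt adds "10 ms worth" of 1-us units, and a
slightly larger or smaller addend slews the clock without jumps.]

#include <stdint.h>

#define HZ          100       /* clock interrupts per second   */
#define NOMINAL_ADD 10000     /* 10 ms expressed in 1-us units */

static volatile uint64_t time_us;   /* the "time accumulator" */
static uint32_t addend = NOMINAL_ADD;

void clock_isr(void)          /* runs HZ times per second */
{
    time_us += addend;
}

/* NTP-style slewing: run the clock slightly fast (ppm > 0) or slow
   (ppm < 0) until an offset is absorbed -- never step the time. */
void slew(int ppm)
{
    addend = (uint32_t)(NOMINAL_ADD +
                        (int64_t)NOMINAL_ADD * ppm / 1000000);
}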
Reply by ●January 7, 2020
On Mon, 6 Jan 2020 19:50:23 -0700, Don Y <blockedofcourse@foo.invalid>
wrote:
>Hi George,
>
>On 1/6/2020 1:05 PM, George Neuner wrote:
>> On Sun, 5 Jan 2020 01:58:04 +0100, pozz <pozzugno@gmail.com> wrote:
>>
>>> Could you point to some good, simple material to study (online or a
>>> book)?
>>
>> Tony Hoare's book "Communicating Sequential Processes" is online at:
>> http://www.usingcsp.com/cspbook.pdf
>>
>> It will teach you everything you need to know about how to use message
>> passing to solve synchronization and coordination problems. It won't
>> teach you about your specific operating system's messaging facilities.
>
>It's not a panacea.  It introduces design and run-time overhead, as
>well as adding another opportunity to "get things wrong".  <frown>

Yes, but that's not the point.  The question was study material, and
Hoare's book shows how to do it right.

>And, it can complicate the cause-effect understanding of what's
>happening in your system (esp if you support asynchronous messages...
>note how many folks have trouble handling "signal()"!)

Yeah, but signal is a lossy channel.  Message passing over a network
also might be lossy (though not necessarily), but within a single host
it is reliable.  And the single-host case is (mostly) the subject of
this discussion.

>*But*, the UNDERstated appeal is that it exposes the interactions
>between entities (that can be largely *isolated*, otherwise).  If you
>engineer those entities well, this "goodness" will be apparent in the
>nature of the messages and their frequency.
>
>E.g., in my world, looking at the IDL goes a LONG way to telling you
>what CAN happen -- and WHERE!

George
Reply by ●January 7, 2020
On 1/7/2020 20:09, George Neuner wrote:
> .....
>
> You do have to modify the kernel to get the constants right ... but
> the clock interrupt handler is extremely simple: it's in kernel mode,
> and all it does is increment a variable.  You easily _can_ run a 10KHz
> clock on most Intel/AMD systems of the last 15 years.

Didn't they get around to introducing an architecture-level timebase
register yet?  Like the one Power has had since its origin (the '80s, I
think)?  Or anything like the decrementer register?

I used to check Intel every 5 years or so to see if they had become
usable to me; gave up on that maybe 15 years ago.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
Reply by ●January 7, 2020
On 8/1/20 5:53 am, Dimiter_Popoff wrote:
> On 1/7/2020 20:09, George Neuner wrote:
>> .....
>>
>> You do have to modify the kernel to get the constants right ... but
>> the clock interrupt handler is extremely simple: it's in kernel mode,
>> and all it does is increment a variable.  You easily _can_ run a 10KHz
>> clock on most Intel/AMD systems of the last 15 years.
>
> Didn't they get around to introducing an architecture-level timebase
> register yet?  Like the one Power has had since its origin (the '80s,
> I think)?  Or anything like the decrementer register?
>
> I used to check Intel every 5 years or so to see if they had become
> usable to me; gave up on that maybe 15 years ago.

You didn't look closely enough.  The TSC register has been in the Intel
architecture since the first Pentium.

CH
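[Reading it from C is a one-liner these days; __rdtsc() is a compiler
intrinsic on GCC/Clang (<x86intrin.h>) and MSVC (<intrin.h>).]

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

int main(void)
{
    uint64_t t0 = __rdtsc();
    /* ... work to be measured ... */
    uint64_t t1 = __rdtsc();

    /* rdtsc is not a serializing instruction; for careful timing,
       pair it with a fence or use __rdtscp() instead. */
    printf("elapsed: %llu reference cycles\n",
           (unsigned long long)(t1 - t0));
    return 0;
}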
Reply by ●January 7, 2020
On Tue, 07 Jan 2020 13:09:29 -0500, George Neuner
<gneuner2@comcast.net> wrote:

<deep quoting of the jiffy/HZ exchange above snipped>

>Note that I said "can" rather than "does".
>
>You do have to modify the kernel to get the constants right ... but
>the clock interrupt handler is extremely simple: it's in kernel mode,
>and all it does is increment a variable. You easily _can_ run a 10KHz
>clock on most Intel/AMD systems of the last 15 years.

In WinNT you can use the user-mode function SetSystemTimeAdjustment()
to control the number of 100 ns units added to the time accumulator
during each clock interrupt.

>More modern multicore chips (e.g., Haswell onward) can handle much
>higher interrupt rates - it's just a question of whether your clock
>can provide them ... and while commodity system board clocks don't,
>there certainly are add-ons available that will.

It doesn't seem very sensible to use an extremely high interrupt rate
(from any source) in highly pipelined processors, since each interrupt
will flush the pipeline.

>>100 ns time units in the time accumulator have been used in VMS and
>>WinNT for decades.
>
>Yes. But that really was an illusion, because the clock interrupt rate
>in NT was 1KHz or less (again depending on hardware).

When I experimented with SetSystemTimeAdjustment() in Win2000, the
interrupt frequency was 100 Hz (10 ms) on single-processor machines and
64 Hz (15.625 ms) on dual-processor machines; thus in a dual-processor
machine 156250 is added to the 64-bit time accumulator during each
clock interrupt.  Divide the time accumulator by 10 million and you get
the number of seconds since a start date.

By adding a large constant (instead of 1) to the time accumulator, no
matter what exotic interrupt frequency is used (say 123.456789 Hz),
rounding errors are much smaller.  Conceptually the time accumulator is
similar to the phase accumulator used in direct digital RF synthesis
(DDS).
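[The pair of Win32 calls in question, sketched: read the per-interrupt
increment (in 100 ns units), then apply a slewed adjustment.  This
requires the system-time privilege; error handling is trimmed, and the
+10 units is illustrative (~64 ppm at a 15.625 ms interrupt period).]

#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD adj, inc;
    BOOL  disabled;

    if (!GetSystemTimeAdjustment(&adj, &inc, &disabled))
        return 1;
    printf("increment: %lu x 100 ns per interrupt (adjustment %s)\n",
           inc, disabled ? "disabled" : "enabled");

    /* Run the clock slightly fast: add a bit more than one nominal
       increment of 100 ns units at each clock interrupt. */
    if (!SetSystemTimeAdjustment(inc + 10, FALSE))
        return 1;
    return 0;
}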
Reply by ●January 7, 2020
On Wed, 8 Jan 2020 08:43:56 +1100, Clifford Heath <no.spam@please.net>
wrote:
>On 8/1/20 5:53 am, Dimiter_Popoff wrote:
>> On 1/7/2020 20:09, George Neuner wrote:
>>> .....
>>>
>>> You do have to modify the kernel to get the constants right ... but
>>> the clock interrupt handler is extremely simple: it's in kernel mode,
>>> and all it does is increment a variable.  You easily _can_ run a 10KHz
>>> clock on most Intel/AMD systems of the last 15 years.
>>
>> Didn't they get around to introducing an architecture-level timebase
>> register yet?  Like the one Power has had since its origin (the '80s,
>> I think)?  Or anything like the decrementer register?
>>
>> I used to check Intel every 5 years or so to see if they had become
>> usable to me; gave up on that maybe 15 years ago.
>
>You didn't look closely enough.  The TSC register has been in the Intel
>architecture since the first Pentium.

The CPU clock, which drives the TSC, has horrible temperature
stability.  There are also issues with various power-saving modes, as
well as with multicore and multiprocessor systems (set affinity to one
processor only).
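[One way to follow the "set affinity to one processor" advice when
timing with the TSC -- a Win32 sketch (on Linux, sched_setaffinity()
plays the same role); whether it's still necessary depends on the
invariant-TSC support of the particular part.]

#include <windows.h>
#include <stdint.h>
#include <intrin.h>     /* __rdtsc() */

/* Pin the calling thread to CPU 0 for the duration of a measurement,
   so both TSC reads come from the same core's counter. */
uint64_t measured_cycles(void (*work)(void))
{
    DWORD_PTR old = SetThreadAffinityMask(GetCurrentThread(), 1);
    uint64_t t0 = __rdtsc();

    work();

    uint64_t t1 = __rdtsc();
    if (old != 0)
        SetThreadAffinityMask(GetCurrentThread(), old);  /* restore */
    return t1 - t0;
}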
Reply by ●January 7, 2020