Reply by D Yuniskis January 13, 2011
Hi Stefan,

On 1/7/2011 11:04 AM, Stefan Reuther wrote:
> D Yuniskis wrote:
>> On 1/4/2011 3:29 PM, Stefan Reuther wrote:
>>>>> That's just my point: design the system that this never happens.
>>>>> Sure this is harder than doing a desktop best-effort system.
>>>>
>>>> See above. (In such an environment) you *eventually* come
>>>> to a situation where a user is asking more of you (device)
>>>> than you can do with the fixed resources in your "box".
>>>> If you *must* always be able to do everything, you end up
>>>> with more in the box than you need -- or, lots of dedicated
>>>> "little boxes".
>>>
>>> You still have the option to know this beforehand and reject it.
>>
>> Then you are essentially removing features/capabilities from
>> your product just to avoid the POSSIBILITY of having to deal
>> with this at run time. Even if the circumstances never actually
>> materialize!
>
> Exactly. And if you express it this way, why not. I call it "better
> safe than sorry".
The problem is you can only solve problems that can be 100% specified at design time. I.e., you'll never come up with an iPhone (e.g.) or other "expandable" device.
>> You're avoiding the issue (i.e., not even *knowing* if you have
>> missed a deadline) by claiming that you handle "all cases, 100%
>> of the time". I.e., why *detect* something if you can't handle it?
>
> I know that I have to produce audio samples at 44.1 kHz rate. I have
> designed my system this way. The hardware can still handle the case
> where I don't produce them fast enough, because I configured my
> hardware transmitter to send silence in this case. This catches the
> case where I happened to make a mistake in the design (which I do not
> make alone, and do not implement alone, and cannot formally prove in
> any case).
What do you do if you miss an audio packet for your cell phone? Do you even *know* that you missed it?
>> For example, I have a tiny audio client (NIC, CPU, stereo amp)
>> with fixed (minimal) resources. It has some signal processing
>> abilities that consume resources. If the current network
>> (server) conditions deteriorate to a point where the client
>> can't reliably produce audio with the existing buffer sizes,
>> it has three options:
>
> But then you have a non-realtime component in the data path, namely
> the network, and reacting on that is of course necessary.
Why is the network *not* a real-time component? In my case, I control the entire "system" so the traffic on the network is of my design, the protocol stacks have been designed with deterministic behavior, etc. But, like the other components, it is explicitly designed to deal with "overload" because it knows that the other components using it have mechanisms to cope with this. If, OTOH, the "server" happened to "notice" that packets were not getting out onto the wire "before the deadline" and simply *stopped* working, then I will have designed a brittle system.
> Or do you measure "oops, this DecodeMPEGFrame took too long, this
> seems to be a complicated MPEG file, let's ask if they have this in
> cheap ADPCM, too?".
I look at the actual timeliness of each "result" in the system and adjust the system's resources, dynamically, to maximize the "value" of the functionality that it provides. E.g., if that means shutting down or altering a "desirable" feature in order to continue providing a "necessary" feature, so be it.
> Of course my audio also starts stuttering if the CD drive doesn't
> give me enough audio data in time. But the system is designed to have
> enough CPU power under any circumstances, and have enough memory to
> compensate "typical" CD problems, so I don't have to ask the GUI
> people "hey, drop your frame rate a bit, I need more power to decode
> this file".
Reply by D Yuniskis January 12, 2011
Hi Thad,

<frown>  I'm still trying to wrap my head around how to
commit this to a real implementation -- and relate it to the
user in a scheme he/she can "grok".

On 1/8/2011 9:16 AM, Thad Smith wrote:
> On 1/1/2011 10:52 PM, D Yuniskis wrote:
>>
>> On 1/1/2011 9:54 AM, Thad Smith wrote:
>>> I suggest focusing on the real-world cost of shedding resources. If
>>> a task yields 100 kB of memory, what is the cost to the user -- an
>>> extra 300 ms response time, perhaps? The manager may say "I'm
>>> willing to accept 1000 ms additional delay, how much memory can you
>>> release for me?"
>> The appeal of your approach is that it speaks directly to the
>> reason *behind* "using surplus resources" -- most often, to
>> improve response time or execution speed (similar aspects).
>
> My understanding is that you want an intelligent tradeoff. Relating
> them to a common single parameter is the technique. This is done
> within the context of satisfying fixed constraints. It is similar to
> an economic system where money is the common metric within physical,
> legal, and chosen ethical constraints.
Understood. What I am trying to do is figure out how this "currency" would work. E.g., the only way I can visualize a scheme where "time" can be the currency is if a task makes bids like "T time for M memory" (again, I am only dealing with memory in these examples, so far).

So, an MP3 player might bid "20 ms for 1 unit" while another application that needs 10 units at a time (to actually *do* anything useful) could bid "35 ms for 10 units". In this scenario, the MP3 player wins as the other application is effectively bidding 3.5 ms/unit. However, this other task would never bid on just one unit as it can't *do* anything with just one unit. Similarly, the MP3 player might never bid on 10 units. Or, if "forced" to do so, it would bid something disproportionate since it isn't "worth much" to it to have all that extra memory.

(that's the only way I can see time being a "negotiable" quantity)
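To make those mechanics concrete, here is a minimal sketch of that kind of all-or-nothing bidding, with the arbiter granting the "densest" bids first. Everything in it (the struct layout, the unit sizes, the greedy knapsack-style heuristic) is invented for illustration, not a real kernel interface:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical all-or-nothing memory bids, valued in the "currency"
 * of response time.  The arbiter grants the densest bids first -- a
 * greedy knapsack-style heuristic, not any real kernel API. */
struct bid {
    const char *task;
    unsigned    units;    /* memory units wanted, all-or-nothing  */
    unsigned    value_ms; /* time value offered for the whole lot */
};

static int by_density(const void *a, const void *b)
{
    const struct bid *x = a, *y = b;
    /* descending value_ms/units, compared without floating point */
    return (int)(y->value_ms * x->units) - (int)(x->value_ms * y->units);
}

int main(void)
{
    struct bid bids[] = {
        { "mp3_player",  1, 20 },  /* 20 ms per unit  */
        { "other_app",  10, 35 },  /* 3.5 ms per unit */
    };
    unsigned free_units = 8;       /* what the arbiter has to sell */

    qsort(bids, 2, sizeof bids[0], by_density);
    for (int i = 0; i < 2; i++) {
        if (bids[i].units <= free_units) {      /* whole lot fits */
            free_units -= bids[i].units;
            printf("grant %u unit(s) to %s\n", bids[i].units, bids[i].task);
        } else {                                /* all-or-nothing */
            printf("reject %s (wants %u, only %u left)\n",
                   bids[i].task, bids[i].units, free_units);
        }
    }
    return 0;
}

Run against the two bids above, the MP3 player gets its unit and the 10-unit bidder is rejected outright -- exactly the all-or-nothing behavior described.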
>> I see several problems with committing this to practice, though.
>>
>> First, the degenerate case -- where the kernel is DEMANDING those
>> resources. How does it frame the "proposition" to the task:
>> "I'm willing to accept INFINITE additional delay, how much memory
>> can you release for me?" :-/
>
> That's the easy case! Release all of it ;-) or all of it within the
> fixed system constraints.
The point I was making was how to express the "bidding". I guess the first step is to decide how the "pricing" process works. I.e., does the kernel set a price and have tasks say how much they would be willing to buy *at* that price? Or, do the tasks make bids for what they would like and the kernel arbitrates between them... (the reference framework has a big impact on the semantics)
>> You also have to deal with relating resources to time/latency.
>> For example, the "thinking ahead" chess automaton can probably give
>> you a numeric value: megabytes per millisecond (i.e., looking
>> at how much memory it takes to "think ahead" 1 millisecond).
>> But, this might not be a finely divisible quantum. The automaton
>> might need 10MB chunks to do anything "worthwhile" (note that
>> I have no idea what the resource requirements of such an algorithm
>> would actually be. I am just throwing wild numbers out for
>> illustrative purposes. If I used a "counting application" as an
>> example, it would be hard to talk about megabytes with a straight
>> face! :> )
>
> The primary task is to choose the best metric. I choose elapsed time,
> but that might not be the primary one for any particular system. For
> each task you need to establish an approximate correlation to the
> metric. It doesn't have to be perfect.
Time makes sense from an engineering perspective. But, I am not sure it makes sense from the user's point of view. E.g., it requires the user to understand more about the nature of the various tasks.
>> Furthermore, it might be difficult for that automaton to decide
>> *which* chunk of memory to discard (if, for example, it only is
>> currently using enough to think one move ahead... what *fraction*
>> of that move should it discard?).
>
> That's a separate problem which faces any task needing to shed
> resources given an overall optimization technique.
Yes. I'm just thinking aloud...
>> The other problem is that it might penalize or reward applications
>> unfairly. I.e., one application could end up frequently forfeiting
>> its resources while others never do. For example, telling the
>> MP3 player that it can be 1000ms late on its deadline(s) would
>> essentially cause it to dump ALL of its resources: "Heck, I don't
>> even have to START decoding because I'll have plenty of time
>> AFTER my deadline to do what needs to be done!" (and, does the
>> 1000ms apply to all periodic deadlines thereafter?)
>
> And that could be the best system response. The goal is fairness among
<?> I assume you mean "is NOT fairness" (?)
> tasks but overall system performance, which may be best served in
> demanding phases by shutting down certain functions.
>
>> But, the biggest implementation problem I find is trying to map this
>> into units that you could use to value specific resources. How
>> do tasks decide what they want and whether or not they can "afford"
>> it?
>
> What tasks "want" is to satisfy their constraints and optimize
> certain parameters, such as update rate. Again it comes down to
> mapping separate metrics (display refresh rate, for example) onto an
> overall system quality. Some analysis is required and perhaps some
> configuration for particular applications.
What I would like is a "currency" that implicitly constrains tasks based on the current value of that currency (wrt the resources it is buying). So, if a task can't find a way to optimize itself in a given "commodity market", it can just drop out of the market (i.e. exit()). So, this activity can be handled automatically without having to prompt the user. E.g., if gasoline is $4/gallon, people "self select" whether they will be traveling over a given holiday -- or, alter their destinations to fit within their "resource budget".
>> How does the user configure "priorities"? I guess he could
>> specify a "tardiness tolerance" for each application and let them
>> figure out what that would cost (in terms of resources). But, what
>> prevents an application from saying, "Ah, I have a tardiness
>> tolerance of 10 minutes! I *need* 100MB of memory!!" (how does the
>> kernel decide what a fair mapping of resources to "tardiness" would
>> be?)
>
> Look at an economic analogy: if release is delayed two months it will
> cost us $500 k in sales. If we release on time, we estimate an
> additional $200 k in support and $200 k in extra engineering time.
> This is contrived, but a similar type of problem trading off
> disparate choices with a common metric.
Understood. I'm just trying to see how to express it in a way that makes sense to a user. E.g., what if another release (task) has corresponding figures of $100K sales/40K support/40K engineering and yet another has 1M sales/400K support/400K engineering...? And, change the "two months" to "3 weeks" in one case vs. 1 year in another (i.e., it is hard to look at the numbers *intuitively* and figure out where the dollars are best spent)
Reply by Thad Smith January 8, 2011
On 1/1/2011 10:52 PM, D Yuniskis wrote:
> Hi Thad,
>
> On 1/1/2011 9:54 AM, Thad Smith wrote:
>> I suggest focusing on the real-world cost of shedding resources. If
>> a task yields 100 kB of memory, what is the cost to the user -- an
>> extra 300 ms response time, perhaps? The manager may say "I'm
>> willing to accept 1000 ms additional delay, how much memory can you
>> release for me?"
>
> I had to read this a couple of times to make sure I understood
> your point. So, if I've *missed* it, I guess that means "a couple"
> wasn't enough! :>
>
> [by "manager" I assume you mean the kernel -- or its agent -- in
> regards to "renegotiating" resource (re)distribution.]
Yes -- maybe a little too much anthropomorphism.
> The appeal of your approach is that it speaks directly to the
> reason *behind* "using surplus resources" -- most often, to
> improve response time or execution speed (similar aspects).
My understanding is that you want an intelligent tradeoff. Relating them to a common single parameter is the technique. This is done within the context of satisfying fixed constraints. It is similar to an economic system where money is the common metric within physical, legal, and chosen ethical constraints.
> I see several problems with committing this to practice, though.
>
> First, the degenerate case -- where the kernel is DEMANDING those
> resources. How does it frame the "proposition" to the task:
> "I'm willing to accept INFINITE additional delay, how much memory
> can you release for me?" :-/
That's the easy case! Release all of it ;-) or all of it within the fixed system constraints.
> You also have to deal with relating resources to time/latency.
> For example, the "thinking ahead" chess automaton can probably give
> you a numeric value: megabytes per millisecond (i.e., looking
> at how much memory it takes to "think ahead" 1 millisecond).
> But, this might not be a finely divisible quantum. The automaton
> might need 10MB chunks to do anything "worthwhile" (note that
> I have no idea what the resource requirements of such an algorithm
> would actually be. I am just throwing wild numbers out for
> illustrative purposes. If I used a "counting application" as an
> example, it would be hard to talk about megabytes with a straight
> face! :> )
The primary task is to choose the best metric. I choose elapsed time, but that might not be the primary one for any particular system. For each task you need to establish an approximate correlation to the metric. It doesn't have to be perfect.
> Furthermore, it might be difficult for that automaton to decide
> *which* chunk of memory to discard (if, for example, it only is
> currently using enough to think one move ahead... what *fraction*
> of that move should it discard?).
That's a separate problem which faces any task needing to shed resources given an overall optimization technique.
> The other problem is that it might penalize or reward applications
> unfairly. I.e., one application could end up frequently forfeiting
> its resources while others never do. For example, telling the
> MP3 player that it can be 1000ms late on its deadline(s) would
> essentially cause it to dump ALL of its resources: "Heck, I don't
> even have to START decoding because I'll have plenty of time
> AFTER my deadline to do what needs to be done!" (and, does the
> 1000ms apply to all periodic deadlines thereafter?)
And that could be the best system response. The goal is fairness among tasks but overall system performance, which may be best served in demanding phases by shutting down certain functions.
> But, the biggest implementation problem I find is trying to map this
> into units that you could use to value specific resources. How
> do tasks decide what they want and whether or not they can "afford"
> it?
What tasks "want" is to satisfy their constraints and optimize certain parameters, such as update rate. Again it comes down to mapping separate metrics (display refresh rate, for example) onto an overall system quality. Some analysis is required and perhaps some configuration for particular applications.
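One hedged guess at what that mapping could look like in practice: each task exposes its own metric normalized to a [0,1] utility, and the manager optimizes a weighted sum. The names, weights, and linear utility curves below are all assumptions for illustration:

#include <stdio.h>

/* Hypothetical mapping of per-task metrics onto one "system quality"
 * number: each metric is normalized to a [0,1] utility and weighted.
 * Names, weights, and the linear curves are invented. */
struct metric {
    const char *name;
    double value;   /* current raw reading             */
    double best;    /* raw value worth utility 1.0     */
    double worst;   /* raw value worth utility 0.0     */
    double weight;  /* importance in the overall score */
};

static double utility(const struct metric *m)
{
    double u = (m->value - m->worst) / (m->best - m->worst);
    return u < 0 ? 0 : u > 1 ? 1 : u;   /* clamp to [0,1] */
}

int main(void)
{
    struct metric m[] = {
        { "gui_fps",        22.0, 30.0, 5.0, 1.0 },
        { "audio_slack_ms",  8.0, 20.0, 0.0, 3.0 }, /* audio weighted higher */
    };
    double q = 0.0, wsum = 0.0;
    for (int i = 0; i < 2; i++) {
        q    += m[i].weight * utility(&m[i]);
        wsum += m[i].weight;
    }
    printf("system quality = %.2f\n", q / wsum);  /* 0.47 here */
    return 0;
}

A resource manager could then test candidate reallocations against this single number instead of arguing about frame rates and latencies separately.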
> How does the user configure "priorities"? I guess he could > specify a "tardiness tolerance" for each application and let them > figure out what that would cost (in terms of resources). But, what > prevents an application from saying, "Ah, I have a tardiness tolerance > of 10 minutes! I *need* 100MB of memory!!" (how does the kernel > decide what a fair mapping of resources to "tardiness" would be?
Look at an economic analogy: if release is delayed two months it will cost us $500 k in sales. If we release on time, we estimate an additional $200 k in support and $200 k in extra engineering time. This is contrived, but a similar type of problem trading off disparate choices with a common metric.

-- Thad
Reply by Stefan Reuther January 7, 2011
D Yuniskis wrote:
> On 1/4/2011 3:29 PM, Stefan Reuther wrote:
>>>> That's just my point: design the system that this never happens.
>>>> Sure this is harder than doing a desktop best-effort system.
>>>
>>> See above. (In such an environment) you *eventually* come to a
>>> situation where a user is asking more of you (device) than you can
>>> do with the fixed resources in your "box". If you *must* always be
>>> able to do everything, you end up with more in the box than you
>>> need -- or, lots of dedicated "little boxes".
>>
>> You still have the option to know this beforehand and reject it.
>
> Then you are essentially removing features/capabilities from
> your product just to avoid the POSSIBILITY of having to deal
> with this at run time. Even if the circumstances never actually
> materialize!
Exactly. And if you express it this way, why not. I call it "better safe than sorry".
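A common way to implement this "know beforehand and reject it" policy for periodic work is an admission test. The sketch below uses the classic Liu & Layland utilization bound for rate-monotonic scheduling; the task set and timing numbers are invented, and a real system would use measured worst-case execution times:

#include <math.h>
#include <stdio.h>

/* Admission test sketch: before taking on a new periodic chore, check
 * total CPU utilization against the Liu & Layland rate-monotonic
 * bound n*(2^(1/n)-1).  Task set and numbers are invented. */
struct task { double wcet_ms, period_ms; };

static int admit(const struct task *set, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += set[i].wcet_ms / set[i].period_ms;       /* Ci/Ti     */
    return u <= n * (pow(2.0, 1.0 / n) - 1.0);        /* RMS bound */
}

int main(void)
{
    struct task set[] = {
        {  5.0,  26.1 },   /* MP3 decode, one frame per period */
        {  2.0,  10.0 },   /* control loop                     */
        { 30.0, 100.0 },   /* proposed new feature             */
    };
    printf("new task %s\n", admit(set, 3) ? "admitted" : "rejected");
    return 0;
}

Anything that fails the test is refused up front, which is the "better safe than sorry" trade: no deadline misses, at the cost of refusing work the system might sometimes have handled.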
> You're avoiding the issue (i.e., not even *knowing* if you have
> missed a deadline) by claiming that you handle "all cases, 100%
> of the time". I.e., why *detect* something if you can't handle it?
I know that I have to produce audio samples at 44.1 kHz rate. I have designed my system this way. The hardware can still handle the case where I don't produce them fast enough, because I configured my hardware transmitter to send silence in this case. This catches the case where I happened to make a mistake in the design (which I do not make alone, and do not implement alone, and cannot formally prove in any case).
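The safety net Stefan describes might look roughly like this: the output stage pulls from a ring buffer and substitutes silence on underrun, so a missed software deadline degrades the audio instead of wedging the hardware. The buffer layout and ISR hook below are assumptions, not his actual code:

#include <stdint.h>

#define RING_SIZE 1024               /* power of two */

static int16_t ring[RING_SIZE];
static volatile unsigned head, tail; /* producer / consumer indices */

/* Called at the sample rate by the output interrupt (hypothetical
 * hook).  On underrun it emits silence rather than blocking, so a
 * late decoder degrades the audio instead of wedging the hardware. */
int16_t audio_tx_isr(void)
{
    if (head == tail)                /* underrun: decoder fell behind */
        return 0;                    /* one sample of silence         */
    int16_t s = ring[tail];
    tail = (tail + 1) & (RING_SIZE - 1);
    return s;
}

/* Producer side: returns 0 if the buffer is full (back-pressure). */
int audio_push(int16_t s)
{
    unsigned next = (head + 1) & (RING_SIZE - 1);
    if (next == tail)
        return 0;
    ring[head] = s;
    head = next;
    return 1;
}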
> For example, I have a tiny audio client (NIC, CPU, stereo amp)
> with fixed (minimal) resources. It has some signal processing
> abilities that consume resources. If the current network
> (server) conditions deteriorate to a point where the client
> can't reliably produce audio with the existing buffer sizes,
> it has three options:
But then you have a non-realtime component in the data path, namely the network, and reacting on that is of course necessary.

Or do you measure "oops, this DecodeMPEGFrame took too long, this seems to be a complicated MPEG file, let's ask if they have this in cheap ADPCM, too?".

Of course my audio also starts stuttering if the CD drive doesn't give me enough audio data in time. But the system is designed to have enough CPU power under any circumstances, and have enough memory to compensate "typical" CD problems, so I don't have to ask the GUI people "hey, drop your frame rate a bit, I need more power to decode this file".

Stefan
Reply by D Yuniskis January 6, 2011
Hi Stefan,

On 1/4/2011 3:29 PM, Stefan Reuther wrote:
>>> That's just my point: design the system that this never happens.
>>> Sure this is harder than doing a desktop best-effort system.
>>
>> See above. (In such an environment) you *eventually* come to a
>> situation where a user is asking more of you (device) than you can
>> do with the fixed resources in your "box". If you *must* always be
>> able to do everything, you end up with more in the box than you
>> need -- or, lots of dedicated "little boxes".
>
> You still have the option to know this beforehand and reject it.
Then you are essentially removing features/capabilities from your product just to avoid the POSSIBILITY of having to deal with this at run time. Even if the circumstances never actually materialize!

Visit a medical office and see what the lack of integration results in. Do you think a company that designs EKGs can't *also* design a pulse oximeter, infrared thermometer, digital sphygmomanometer, heparin pump, etc.? So, why have so many dedicated boxes -- each with their own screen and "user interface conventions"? (this is slowly changing as that industry realizes they can't afford the duplication of hardware, maintenance costs, etc.)

E.g., one would think someone shelling out $1,000,000 for a tablet press could *surely* afford an extra $10,000 for an ejection force monitor -- yet, you find that they *don't*! OTOH, if you offer that feature as one of a suite of features (NOT ALL OF WHICH CAN WORK AT ALL TIMES IN ALL CONDITIONS) and charge ~$1,000 for it, suddenly you have a competitive advantage: "Sure, we'll take it!"
> I prefer this a lot over "trying, hoping for the best, and cleaning
> up the mess if it didn't work" aka handling missed deadlines.
You are assuming you can predict everything that can happen and address all of those things. Sure, you can say, "well, if this happens we need X time to recover..." but that's just a CYA way of saying "we won't deal with conditions where we have to react quicker" (it's the customer's problem).

Handling missed deadlines doesn't have to be expensive. E.g., the tablet press example I mentioned (another reply) can handle the worst case missed deadline (e.g., a "bad" tablet being erroneously accepted) by shutting down the tablet press and lighting a big red light. If it misses a less important event (e.g., an ejection force profile), it simply "returns no data" for that event.

You're avoiding the issue (i.e., not even *knowing* if you have missed a deadline) by claiming that you handle "all cases, 100% of the time". I.e., why *detect* something if you can't handle it?
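A sketch of how such graded (and cheap) miss handling might be structured, with the severity of each deadline dictating the response. All of the names here (deadline_miss, the chore labels, the emergency-stop hook) are hypothetical:

#include <stdio.h>

enum severity { COSMETIC, DEGRADED, CRITICAL };

struct deadline_miss {
    const char   *chore;
    enum severity sev;
    long          late_us;
};

/* Graded responses: from quietly discarding one result up to the
 * big red light.  press_emergency_stop() is a hypothetical hook. */
void on_deadline_miss(const struct deadline_miss *m)
{
    switch (m->sev) {
    case COSMETIC:   /* e.g. ejection force profile: return no data */
        printf("%s: result discarded (%ld us late)\n",
               m->chore, m->late_us);
        break;
    case DEGRADED:   /* shed an optional feature, keep running */
        printf("%s: shedding optional processing\n", m->chore);
        break;
    case CRITICAL:   /* e.g. a "bad" tablet may have been accepted */
        printf("%s: STOP PRESS, light the big red light\n", m->chore);
        /* press_emergency_stop(); */
        break;
    }
}

int main(void)
{
    struct deadline_miss m = { "ejection_force_profile", COSMETIC, 1200 };
    on_deadline_miss(&m);
    return 0;
}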
> If I know I cannot do X and Y simultaneously, I decide which of them
> is more important, and then *deterministically* suspend one of them.
Which is exactly what *I* do. But, only after I *know* I can't handle both of them (because the LEAST IMPORTANT ONE ends up missing its deadline). You can watch how "it" is working and tailor your approach/algorithm to what your current operating conditions are.

For example, I have a tiny audio client (NIC, CPU, stereo amp) with fixed (minimal) resources. It has some signal processing abilities that consume resources. If the current network (server) conditions deteriorate to a point where the client can't reliably produce audio with the existing buffer sizes, it has three options:

- get the server to transcode the audio to a lower bit rate (but, I am at its mercy so I can't count on this being a viable option in any particular situation)
- get the server to switch to a different codec (this is expensive as it can require replacing the code in the client "on-the-fly"; and, the server may not want to comply)
- shed capabilities (e.g., some of the signal processing, though this affects the ongoing quality of the audio experience -- different aspects have different costs)
- drop frames (least desirable)

(a sketch of this escalation appears at the end of this message)

Sure, I can avoid all of this "work" -- I can either increase the resources in the client *or* change the specification of the device (i.e., make the problem go away by just claiming it is beyond the scope of the device).

When I do a new design, the first thing I do is research the application. Often, that means talking to users. *Usually*, it means disregarding what they *say* (in favor of determining what they actually *mean*). It is helpful to pose value questions to the user: "What if..." and "What if it *can't*...". My favorite scenario (which *ALWAYS* comes up) is the "I don't care" response. My stock reply is to focus on the notes I am taking and just audibly say something like "... shut down and catch fire". :> It's amazing how quickly they can rephrase that "I don't care"!

Then, the application is factored into a reasonably fine set of "chores" (avoiding the use of the word "tasks"). The temporal requirements of each are identified -- do they have hard deadlines or soft ones? Most "chores" have *some* temporal aspects even if they aren't what you would traditionally think of as RT (but they tend to have very *soft* ones). Then, the *consequences* of missed deadlines are considered -- what is it worth to the application/user to meet this deadline (and what do you lose if you meet it "late")?

It's only at this point that you can begin to address real hardware/software/requirements tradeoffs. The most important chores get addressed first -- regardless of their temporal requirements. Then, less important chores get added into the mix until you have fleshed out all of the wish list. You can map "importances" (avoiding the term "priorities") to each of these chores. This lets you apportion resources and gives you an idea of what the maximal capabilities of your system will be. E.g., "I can keep the ignition timing dead to nuts, ensure the ABS is always available, run the emissions controls and any three of the following..." If your marketing folks tell you "that's not acceptable, it must do...", you can now counter with "*that* will cost you..."

I had one group *insist* that they needed a certain feature in a product design. That feature would complicate the design and add considerably to the product's cost.
I was able to tell them (from *their* sales records) that only *one* of their customers had ever asked for that particular optional configuration (at which point, top management reminisced that the particular option had probably never been *used* by that customer!). [i.e., you have to know what to *ignore* from your users. Salesmen always *want* everything imaginable -- and don't want it to COST anything! Forcing them to put sales projections on particular configurations, so that pricing can be related to development costs, is the easiest way to get them to rethink their "demands".]
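Here is the escalation sketch promised above: pick the cheapest remedy the measured buffer occupancy and the server's cooperation allow. The function names and the 50% threshold are invented; the real server negotiation would be a protocol exchange:

#include <stdbool.h>
#include <stdio.h>

enum remedy { NONE, TRANSCODE, SWITCH_CODEC, SHED_DSP, DROP_FRAMES };

/* Stubs standing in for the real client/server machinery. */
static bool server_can_transcode(void)    { return false; }
static bool server_can_switch_codec(void) { return false; }
static bool dsp_features_active(void)     { return true;  }

/* Pick the cheapest remedy the situation allows, escalating toward
 * the least desirable one. */
enum remedy choose_remedy(unsigned buffer_pct)
{
    if (buffer_pct >= 50)             /* still healthy: do nothing   */
        return NONE;
    if (server_can_transcode())       /* cheapest, needs cooperation */
        return TRANSCODE;
    if (server_can_switch_codec())    /* costly: new client code     */
        return SWITCH_CODEC;
    if (dsp_features_active())        /* degrade quality, keep audio */
        return SHED_DSP;
    return DROP_FRAMES;               /* least desirable             */
}

int main(void)
{
    printf("remedy = %d\n", choose_remedy(20));  /* SHED_DSP here */
    return 0;
}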
> However, this happens rarely enough that I can't come up with a
> real-time example (we've a few instances of this happening in batch
> tasks which would miss their "soft" deadlines otherwise).
Have your satellite radio *also* control the ignition timing of the vehicle. Then, post your results :>
Reply by Stefan Reuther January 4, 2011
Hi,

D Yuniskis wrote:
> On 1/3/2011 6:01 AM, Stefan Reuther wrote:
>> Of course this is the case in the real world, too.
>>
>> User inputs have debouncing, so you can be sure the user will not
>> hit that switch more than three times in a second. Networks have
>> bitrates,
>
> Sure, but you don't know that he isn't going to hit
> "Full Speed Forward" and, a tenth of a second later, hit
> "Full Speed Reverse", etc. I.e., you can't (reliably)
> predict the future -- yet have to cope with it.
But, for that given example, it's easy, because I'm allowed certain reaction times :-) The "keyboard driver" must react upon user input immediately. It must recognize the "Forward" request and the "Reverse" request to make sure nothing gets lost. I can just periodically check the user's last will, at places I'm ready to process it.
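That split (latch everything immediately, act on it when ready) might look like the following sketch. The ISR hook and the actuator call are hypothetical:

enum request { REQ_NONE, REQ_FORWARD, REQ_REVERSE };

static volatile enum request last_request = REQ_NONE;

/* "Keyboard driver": runs on every debounced key edge so no press is
 * lost; intermediate presses collapse into the most recent one. */
void key_isr(int key_forward)
{
    last_request = key_forward ? REQ_FORWARD : REQ_REVERSE;
}

/* Control loop: samples the latched "last will" only at places where
 * it is ready to process it.  ramp_motor_toward() is hypothetical. */
void control_tick(void)
{
    static enum request acted_on = REQ_NONE;
    enum request r = last_request;
    if (r != acted_on) {
        acted_on = r;
        /* ramp_motor_toward(r); */
    }
}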
>> Honestly? No. When I buy a hard-disk recorder which claims to be
>> able to record two channels at once and let me watch a third, I
>> expect it to work. That's what I pay them for. Plugging a TV
>> receiver into my
>
> Correct. But, if that "appliance" can also make phone calls,
> control the spark plug firing sequence in your automobile
> *and* receive/decode satellite radio broadcasts, would you
> be upset if that third video stream had visual artifacts
> resulting from "missed deadlines", etc.?
If it's noticeable, yes! Of course I get annoyed if audio gets distorted when I'm driving at 4000 rpm (when spark plug control has much work to do).
> *That's* the sort of devices I'm involved with. The user
> knows the device can't do *everything* (just like a user
> knows his PC can't run *every* application CONCURRENTLY that
> it has loaded onto it).
People who follow my company's press releases know that we make car stereos / satnav. And the people who drive these cars do not know what computationally-intensive processes happen in there.

Okay, people who are into computer graphics may understand that the digital map frame rate drops in the center of Paris, with its thousands of little streets, compared to some Australian outback with the next village 500 miles on. But even they - let alone Joe Sixpack - will not understand that the frame rate depends on the radio channel they're listening to. Digital radio is much more computationally intensive than analog FM, plus it depends heavily upon the codec and configuration in use by the transmitter, which the user doesn't even see.

Well, it was hard to make this work, but we did it.
>> That's just my point: design the system that this never happens.
>> Sure this is harder than doing a desktop best-effort system.
>
> See above. (In such an environment) you *eventually* come
> to a situation where a user is asking more of you (device)
> than you can do with the fixed resources in your "box".
> If you *must* always be able to do everything, you end up
> with more in the box than you need -- or, lots of dedicated
> "little boxes".
You still have the option to know this beforehand and reject it. I prefer this a lot over "trying, hoping for the best, and cleaning up the mess if it didn't work" aka handling missed deadlines.

If I know I cannot do X and Y simultaneously, I decide which of them is more important, and then *deterministically* suspend one of them. However, this happens rarely enough that I can't come up with a real-time example (we've a few instances of this happening in batch tasks which would miss their "soft" deadlines otherwise).

Stefan
Reply by D Yuniskis January 3, 2011
Hi Stefan,

On 1/3/2011 6:01 AM, Stefan Reuther wrote:
>> If you treat hard deadlines as MUST be met (else the system
>> is considered broken/failed), then anything with asynchronous
>> inputs is a likely candidate for "can't be solved" -- because
>> you can't guarantee that another input won't come along
>> before you have finished dealing with the first... "running out
>> of REAL time". Clearly, that isn't the case in the real
>> world so, either these aren't "hard" deadlines *or* they
>> are being missed and the world isn't coming to an end! :>
>
> Of course this is the case in the real world, too.
>
> User inputs have debouncing, so you can be sure the user will not hit
> that switch more than three times in a second. Networks have bitrates,
Sure, but you don't know that he isn't going to hit "Full Speed Forward" and, a tenth of a second later, hit "Full Speed Reverse", etc. I.e., you can't (reliably) predict the future -- yet have to cope with it.
> so you can be sure that you don't get more than X frames per second.
> Audio has sample rates, so you can be sure to receive exactly 44100 /
> 48000 samples per second (and have to produce the same amount). Mass
> storage has seek and read times. Video has frame rates.
>
> At least in the systems I work on. So I know precisely how many CPU
> cycles I may use to decode an MP3 frame.
>
>>> associated problems, like having to convince the customer that the
>>> file they've found which misses deadlines on every other frame is
>>> the absolute exception, because nobody else puts 20 Mbps video on a
>>> floppy disc with 99% fragmentation or something like that.
>>
>> What if he wants to put a 100MB video on that floppy?
>> Some things just can't be done. Deal with it. :>
>>
>> Even consumers are starting to get enough sophistication
>> that they understand that a machine (tends to) "slows down" when
>> doing multiple things at once.
>
> Honestly? No. When I buy a hard-disk recorder which claims to be able
> to record two channels at once and let me watch a third, I expect it
> to work. That's what I pay them for. Plugging a TV receiver into my
Correct. But, if that "appliance" can also make phone calls, control the spark plug firing sequence in your automobile *and* receive/decode satellite radio broadcasts, would you be upset if that third video stream had visual artifacts resulting from "missed deadlines", etc.?

*That's* the sort of devices I'm involved with. The user knows the device can't do *everything* (just like a user knows his PC can't run *every* application CONCURRENTLY that it has loaded onto it). So, if given a means of expressing "preferences" ("values") for those activities/applications, the device itself could take measures to satisfy those preferences (instead of forcing the user to respond to an "insufficient resources" message and decide which things to *kill*, since he can't tell them to "shed resources" :> ).
> computer's USB port, running three instances of an MPEG codec, and
> hoping for the best - that's what I can do myself.
>
> I would accept if the recorder says, "hey, these channels have such a
> high bitrate that I cannot record two of them at once". But I would
> not accept if it "silently" damages the recording. At least not if it
> does that in a very noticeable way. If it drops a single frame every
> three hours, I'll never notice.
>
>> But, they would not be very tolerant of a video player that
>> simply shut down (because it decided it was *broken* since
>> it missed a deadline).
>
> That's just my point: design the system that this never happens. Sure
> this is harder than doing a desktop best-effort system.
See above. (In such an environment) you *eventually* come to a situation where a user is asking more of you (device) than you can do with the fixed resources in your "box". If you *must* always be able to do everything, you end up with more in the box than you need -- or, lots of dedicated "little boxes".

If, instead, you allow the user to trade performance and preferences, you can do more with less (money, space, power, MIPS, etc.)
>>>> I.e., if your HRT system misses a deadline, does it even
>>>> KNOW that it did??).
>>>
>>> My favourite design principle: never check for an error condition
>>> you don't know to handle :-)
>>
>> Yeah, I think ostriches have a similar "defense mechanism". :>
>> Not sure how effective it is, though, if the problem still
>> exists.
>>
>> I try to arrange things to eliminate the possibility of
>> errors, where possible.
>
> That's probably similar things. For example, every UTF-8 related
> document says you should treat non-minimally encoded UTF-8 runes as
> an error. Now what should I do? Show a pop-up error message to the
> user? "Hey, your playlist file contains bad UTF-8!" 95% of them do
> not even know what UTF-8 is. So I ignore that problem. Which also
> simplifies a lot of other code because it can assume that I'll decode
> every 'char*' into a 'wchar_t*'.
Yes. In my case, often even heavier handed (e.g., my calculator discussion restricting the character set to USASCII). Or, little things like using unsigned data types for "counts" (so the problem of dealing with negative values simply doesn't exist)
> [kernel asks task to free resources]
>>>> Uncooperative tasks make life difficult! That's the whole
>>>> point of this :>
>>>
>>> I'm not sure I understood you correctly (maybe we mean the same
>>> thing?), but the problem that immediately comes to my mind is
>>> applications that claim to be good citizens, but by
>>> intention/bug/sabotage aren't. Something like a heap overwrite
>>> error causing it to run into an infinite loop, not finding the
>>> page to free.
>>
>> "Thou shalt not release buggy code" :>
>>
>> Why assume the "bug" lies in the application? If you are going
>> to *tolerate* bugs, what if the bug lies in the kernel itself??
>> <frown>
>
> That's why kernels are usually written by much smaller (and better)
> teams than user-land code. Thus the kernel can isolate the buggy tasks
Yes, but that is no guarantee that there are no bugs. It just shifts the probabilities around.
> from the proven error-free[tm] supervisor tasks, for example. Okay,
> it's annoying if the MPEG decoder crashes on that particular file,
> but the kernel should isolate that crash from the power management
> task, so the device can at least be turned off without needing a
> powercycle. In particular if powercycle means disassembling your car.
>
> At least, that approach works quite well for "our" devices.
> Unfortunately, we cannot prove (in a mathematical sense) that our
> userland code is completely bug-free. I can construct a (far-fetched,
> unlikely) case that crashes my code, just because I simply have no
> idea how to reliably detect that. At least, my code crashes a
> magnitude less often than that of our favourite competitor :-)
The problem I am usually faced with is very long up-times, limited/constrained user interfaces (a user might not even be "present") and, often, significant "costs" associated with failures (financial or safety). I enjoy spending resources (MHz, memory, complexity, etc.) to improve these aspects of a product's design instead of "cosmetic crap".

Supper! Another bowl of pasta *really* would go down quite nicely! Though I suspect I should probably have something a bit more "substantial"... :<
Reply by D Yuniskis January 3, 2011
Hi Stefan,

On 1/3/2011 5:33 AM, Stefan Reuther wrote:
>>>> The big (huge!) problem seems to be a direct consequence of
>>>> that, though. Namely, the lack of involvement of the task
>>>> in selecting which (if any) resources it can most afford to
>>>> "lose". (I think I can deal with the signaling that
>>>> would also be required using existing mechanisms).
>>>
>>> This is where the L4 guys stay on their convenient island saying
>>> "hey, we've provided a mechanism, the rest is userland" :-)
>>
>> It *sounds* arrogant but, once you embrace that ideology,
>> you can come up with much cleaner and more robust systems.
>> The "policies" can benefit from the services and protections
>> that are explicitly provided *to* make user-land tasks
>> more robust! (instead of complicating the kernel with
>> still more layers which are inherently difficult to debug)
>
> The art is making your mechanisms in a way that they are actually
> practically usable, not just from an Ivory Tower. This was the point
> of L4, to prove that microkernels can actually be used for efficient
> systems.
Yes. My first exposure to microkernels was through Mach. Many of the *ideas* made great sense. But, their implementation was too "kitchen sink"-ish. And, I think their attempt to chase a UN*X implementation as a "justification" for that architectural approach was a huge mistake. Had they, instead, said, "We're different" in much the same way UN*X "disowned" its MULTICS, er, "roots" (bad choice of words), I think they would have been more successful in "proving something".
> Same thing here: your idea sounded really cool to me, I just had
> doubts that the callback method can be implemented for a safe system.
I'm sure it can be if a "select team" implements the system. The problem is trying to open that system up for every TD&H (Tom, Dick & Harry). :<
>>> Yes. If you want this fine-grained, you'd better make it *very
>>> cheap*. For example, one way could be a data structure in user-land
>>> with one word per page / memory range containing the current value.
>>> As far as I
>>
>> I'm sure I can get the "cost" down. The bigger problem was the
>> second: how to involve the task in the decision making process
>> WITHOUT "involving it" (i.e., having it run any code).
>
> A task must be tracking its memory usage anyway. "This page contains
> only free()d memory". "This page contains already-played audio". Now it
Yes. And, when you *expect* to have to forfeit those resources, you refocus *how* you keep track of what you are doing. For example, keeping the control structures associated with particular data *with* that data (since holding onto the control structures after discarding the data doesn't buy you anything).
> would need an experiment to figure out whether that knowledge can be
> "exported" to an operating system memory manager somehow in a
> performant way (i.e. without needing an 'mprotect' syscall for every
> audio sample played).
I think the notification aspects and "value ordering" of held resources can be accomplished -- the kernel could always peek into the task to grab data concerning these resources IF it knows where to find that data.

The bigger problem is giving the task a say in holding onto those resources in a flexible enough way that allows the task to determine its own "resource pricing policy". If the task were to *know* that it has no further chance of reclaiming these resources AT THIS TIME, then the scheme by which it values them could be refined more. If, however, it knows/thinks it may lose some/all of them, then it wants to be able to place conditional bids on keeping various subsets of them -- subsets that *it* defines. (e.g., I'm willing to pay 100 for these three pages; if that bid fails, I'll pay 100 for these *two* pages, forfeiting the third; if *that* fails, I'll pay 200 for this *one* page!)

I am hoping for an epiphany when my cumulative sleep deficit is in a bit better shape... :<
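One guess at how those conditional, task-defined bids could be published so the kernel can evaluate them without running task code: an ordered table of fallbacks, walked until one clears the current asking price. The layout and pricing semantics are entirely illustrative:

#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES_PER_BID 4

/* An ordered table of fallback bids, best-preferred first, published
 * by the task in memory the kernel can read.  E.g.:
 *   { 100, 3, {p1, p2, p3} }   keep all three for 100
 *   { 100, 2, {p1, p2} }       else keep two for 100
 *   { 200, 1, {p1} }           else pay dearly for the last one */
struct fallback_bid {
    uint32_t  price;                      /* offered, in the currency */
    uint32_t  npages;                     /* pages kept if bid wins   */
    uintptr_t pages[MAX_PAGES_PER_BID];   /* which pages those are    */
};

/* Kernel side: first bid that clears the current asking price wins. */
const struct fallback_bid *
kernel_pick_bid(const struct fallback_bid *tab, size_t n,
                uint32_t price_per_page)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].price >= tab[i].npages * price_per_page)
            return &tab[i];
    return NULL;    /* no bid clears: task forfeits all the pages */
}

With the three bids from the example above and an asking price of 40 per page, the first bid (100 for three pages, asking 120) fails to clear, and the task keeps two pages for its 100.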
Reply by Stefan Reuther January 3, 2011
Hi there,

D Yuniskis wrote:
> On 1/2/2011 6:44 AM, Stefan Reuther wrote:
>>> You can complain about my choice of words ("hope") -- but, despite
>>> that, my point is that it doesn't HAVE TO "guarantee" results.
>>> Hard deadlines have no value once they have passed. I.e.,
>>> once the deadline comes STOP WORKING ON IT.
>>
>> Here, my lecture told me that (hard) real-time means designing a
>> system so that this doesn't happen. Because if you miss one deadline,
>> how can you be sure that this was just a small hiccup, and you won't
>> miss the next 500 deadlines as well? By that definition, one would be
>> able to fit a real-time H.264 1080p decoder on an 8051 :-)
[...]
> If you treat hard deadlines as MUST be met (else the system
> is considered broken/failed), then anything with asynchronous
> inputs is a likely candidate for "can't be solved" -- because
> you can't guarantee that another input won't come along
> before you have finished dealing with the first... "running out
> of REAL time". Clearly, that isn't the case in the real
> world so, either these aren't "hard" deadlines *or* they
> are being missed and the world isn't coming to an end! :>
Of course this is the case in the real world, too.

User inputs have debouncing, so you can be sure the user will not hit that switch more than three times in a second. Networks have bitrates, so you can be sure that you don't get more than X frames per second. Audio has sample rates, so you can be sure to receive exactly 44100 / 48000 samples per second (and have to produce the same amount). Mass storage has seek and read times. Video has frame rates.

At least in the systems I work on. So I know precisely how many CPU cycles I may use to decode an MP3 frame.
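That budget is easy to work out: an MPEG-1 Layer III frame carries 1152 samples, so at 44.1 kHz one frame is due every ~26.1 ms. The sketch below does the arithmetic; the 200 MHz clock is an assumed figure, not Stefan's hardware:

#include <stdio.h>

int main(void)
{
    const double sample_rate   = 44100.0;   /* Hz                 */
    const double frame_samples = 1152.0;    /* MPEG-1 Layer III   */
    const double cpu_hz        = 200e6;     /* assumed core clock */

    double frame_s = frame_samples / sample_rate;      /* ~0.02612 s */
    printf("frame period: %.2f ms\n", frame_s * 1e3);  /* 26.12 ms   */
    printf("cycle budget: %.1f Mcycles/frame\n",
           cpu_hz * frame_s / 1e6);                    /* ~5.2M      */
    return 0;
}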
>> associated problems, like having to convince the customer that the
>> file they've found which misses deadlines on every other frame is
>> the absolute exception, because nobody else puts 20 Mbps video on a
>> floppy disc with 99% fragmentation or something like that.
>
> What if he wants to put a 100MB video on that floppy?
> Some things just can't be done. Deal with it. :>
>
> Even consumers are starting to get enough sophistication
> that they understand that a machine (tends to) "slows down" when
> doing multiple things at once.
Honestly? No. When I buy a hard-disk recorder which claims to be able to record two channels at once and let me watch a third, I expect it to work. That's what I pay them for. Plugging a TV receiver into my computer's USB port, running three instances of an MPEG codec, and hoping for the best - that's what I can do myself.

I would accept if the recorder says, "hey, these channels have such a high bitrate that I cannot record two of them at once". But I would not accept if it "silently" damages the recording. At least not if it does that in a very noticeable way. If it drops a single frame every three hours, I'll never notice.
> But, they would not be very tolerant of a video player that
> simply shut down (because it decided it was *broken* since
> it missed a deadline).
That's just my point: design the system that this never happens. Sure this is harder than doing a desktop best-effort system.
>>> I.e., if your HRT system misses a deadline, does it even
>>> KNOW that it did??).
>>
>> My favourite design principle: never check for an error condition
>> you don't know to handle :-)
>
> Yeah, I think ostriches have a similar "defense mechanism". :>
> Not sure how effective it is, though, if the problem still
> exists.
>
> I try to arrange things to eliminate the possibility of
> errors, where possible.
That's probably similar things. For example, every UTF-8 related document says you should treat non-minimally encoded UTF-8 runes as an error. Now what should I do? Show a pop-up error message to the user? "Hey, your playlist file contains bad UTF-8!" 95% of them do not even know what UTF-8 is. So I ignore that problem. Which also simplifies a lot of other code because it can assume that I'll decode every 'char*' into a 'wchar_t*'.

[kernel asks task to free resources]
>>> Uncooperative tasks make life difficult! That's the whole
>>> point of this :>
>>
>> I'm not sure I understood you correctly (maybe we mean the same
>> thing?), but the problem that immediately comes to my mind is
>> applications that claim to be good citizens, but by
>> intention/bug/sabotage aren't. Something like a heap overwrite
>> error causing it to run into an infinite loop, not finding the
>> page to free.
>
> "Thou shalt not release buggy code" :>
>
> Why assume the "bug" lies in the application? If you are going
> to *tolerate* bugs, what if the bug lies in the kernel itself??
> <frown>
That's why kernels are usually written by much smaller (and better) teams than user-land code. Thus the kernel can isolate the buggy tasks from the proven error-free[tm] supervisor tasks, for example. Okay, it's annoying if the MPEG decoder crashes on that particular file, but the kernel should isolate that crash from the power management task, so the device can at least be turned off without needing a powercycle. In particular if powercycle means disassembling your car.

At least, that approach works quite well for "our" devices. Unfortunately, we cannot prove (in a mathematical sense) that our userland code is completely bug-free. I can construct a (far-fetched, unlikely) case that crashes my code, just because I simply have no idea how to reliably detect that. At least, my code crashes a magnitude less often than that of our favourite competitor :-)

Stefan
Reply by Stefan Reuther January 3, 2011
Good morning,

D Yuniskis wrote:
> On 1/2/2011 6:32 AM, Stefan Reuther wrote:
>>> The big (huge!) problem seems to be a direct consequence of
>>> that, though. Namely, the lack of involvement of the task
>>> in selecting which (if any) resources it can most afford to
>>> "lose". (I think I can deal with the signaling that
>>> would also be required using existing mechanisms).
>>
>> This is where the L4 guys stay on their convenient island saying
>> "hey, we've provided a mechanism, the rest is userland" :-)
>
> It *sounds* arrogant but, once you embrace that ideology,
> you can come up with much cleaner and more robust systems.
> The "policies" can benefit from the services and protections
> that are explicitly provided *to* make user-land tasks
> more robust! (instead of complicating the kernel with
> still more layers which are inherently difficult to debug)
The art is making your mechanisms in a way that they are actually practically usable, not just from an Ivory Tower. This was the point of L4, to prove that microkernels can actually be used for efficient systems. Same thing here: your idea sounded really cool to me, I just had doubts that the callback method can be implemented for a safe system.
>> Yes. If you want this fine-grained, you'd better make it *very
>> cheap*. For example, one way could be a data structure in user-land
>> with one word per page / memory range containing the current value.
>> As far as I
>
> I'm sure I can get the "cost" down. The bigger problem was the
> second: how to involve the task in the decision making process
> WITHOUT "involving it" (i.e., having it run any code).
A task must be tracking its memory usage anyway. "This page contains only free()d memory". "This page contains already-played audio". Now it would need an experiment to figure out whether that knowledge can be "exported" to an operating system memory manager somehow in a performant way (i.e. without needing an 'mprotect' syscall for every audio sample played).

Stefan
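A guess at the shape of that "one word per page" structure: a user-land value table the memory manager can read at any time, updated by plain stores rather than syscalls. All names and the value scale are invented:

#include <stdint.h>

#define PAGE_SHIFT 12     /* 4 KiB pages (assumed)      */
#define NPAGES     256    /* size of this task's region */

enum {
    VAL_FREE   = 0,             /* only free()d memory: take it first */
    VAL_PLAYED = 1,             /* already-played audio               */
    VAL_QUEUED = 100,           /* audio not yet played               */
    VAL_PINNED = UINT16_MAX     /* control structures, code           */
};

/* Shared with the memory manager at startup (hypothetical protocol);
 * the manager reads it whenever it needs pages, never running task
 * code. */
volatile uint16_t page_value[NPAGES];

/* The task re-marks a page as it fills or drains it: a single store,
 * no syscall per audio sample. */
void mark_page(void *addr, uint16_t value, uintptr_t region_base)
{
    uintptr_t idx = ((uintptr_t)addr - region_base) >> PAGE_SHIFT;
    page_value[idx] = value;
}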