Reply by D Yuniskis January 13, 2011
Hi Stefan,

On 1/7/2011 11:04 AM, Stefan Reuther wrote:
> D Yuniskis wrote:
>> On 1/4/2011 3:29 PM, Stefan Reuther wrote:
>>>>> That's just my point: design the system that this never happens.
>>>>> Sure this is harder than doing a desktop best-effort system.
>>>>
>>>> See above. (In such an environment) you *eventually* come
>>>> to a situation where a user is asking more of you (device)
>>>> than you can do with the fixed resources in your "box".
>>>> If you *must* always be able to do everything, you end up
>>>> with more in the box than you need -- or, lots of dedicated
>>>> "little boxes".
>>>
>>> You still have the option to know this beforehand and reject it.
>>
>> Then you are essentially removing features/capabilities from
>> your product just to avoid the POSSIBILITY of having to deal
>> with this at run time. Even if the circumstances never actually
>> materialize!
>
> Exactly. And if you express it this way, why not. I call it "better
> safe than sorry".
The problem is you can only solve problems that can be 100% specified at design time. I.e., you'll never come up with an iPhone (e.g.) or other "expandable" device.
>> You're avoiding the issue (i.e., not even *knowing* if you have
>> missed a deadline) by claiming that you handle "all cases, 100%
>> of the time". I.e., why *detect* something if you can't handle it?
>
> I know that I have to produce audio samples at 44.1 kHz rate. I have
> designed my system this way. The hardware can still handle the case
> where I don't produce them fast enough, because I configured my
> hardware transmitter to send silence in this case. This catches the
> case where I happened to make a mistake in the design (which I do not
> make alone, and do not implement alone, and cannot formally prove in
> any case).
What do you do if you miss an audio packet for your cell phone? Do you even *know* that you missed it?
>> For example, I have a tiny audio client (NIC, CPU, stereo amp)
>> with fixed (minimal) resources. It has some signal processing
>> abilities that consume resources. If the current network
>> (server) conditions deteriorate to a point where the client
>> can't reliably produce audio with the existing buffer sizes,
>> it has three options:
>
> But then you have a non-realtime component in the data path, namely
> the network, and reacting on that is of course necessary.
Why is the network *not* a real-time component? In my case, I control the entire "system" so the traffic on the network is of my design, the protocol stacks have been designed with deterministic behavior, etc. But, like the other components, it is explicitly designed to deal with "overload" because it knows that the other components using it have mechanisms to cope with this. If, OTOH, the "server" happened to "notice" that packets were not getting out onto the wire "before the deadline" and simply *stopped* working, then I will have designed a brittle system.
> Or do you measure "oops, this DecodeMPEGFrame took too long, this
> seems to be a complicated MPEG file, let's ask if they have this in
> cheap ADPCM, too?".
I look at the actual timeliness of each "result" in the system and adjust the system's resources, dynamically, to maximize the "value" of the functionality that it provides. E.g., if that means shutting down or altering a "desirable" feature in order to continue providing a "necessary" feature, so be it.
> Of course my audio also starts stuttering if the CD drive doesn't
> give me enough audio data in time. But the system is designed to have
> enough CPU power under any circumstances, and have enough memory to
> compensate "typical" CD problems, so I don't have to ask the GUI
> people "hey, drop your frame rate a bit, I need more power to decode
> this file".
Reply by D Yuniskis January 12, 2011
Hi Thad,

<frown>  I'm still trying to wrap my head around how to
commit this to a real implementation -- and relate it to the
user in a scheme he/she can "grok".

On 1/8/2011 9:16 AM, Thad Smith wrote:
> On 1/1/2011 10:52 PM, D Yuniskis wrote:
>>
>> On 1/1/2011 9:54 AM, Thad Smith wrote:
>>> I suggest focusing on the real-world cost of shedding resources. If
>>> a task yields 100 kB of memory, what is the cost to the user -- an
>>> extra 300 ms response time, perhaps? The manager may say "I'm
>>> willing to accept 1000 ms additional delay, how much memory can you
>>> release for me?"
>> The appeal of your approach is that it speaks directly to the
>> reason *behind* "using surplus resources" -- most often, to
>> improve response time or execution speed (similar aspects).
>
> My understanding is that you want an intelligent tradeoff. Relating
> them to a common single parameter is the technique. This is done
> within the context of satisfying fixed constraints. It is similar to
> an economic system where money is the common metric within physical,
> legal, and chosen ethical constraints.
Understood. What I am trying to do is figure out how this "currency" would work. E.g., the only way I can visualize a scheme where "time" can be the currency is if a task makes bids like "T time for M memory" (again, I am only dealing with memory in these examples, so far).

So, an MP3 player might bid "20 ms for 1 unit" while another application that needs 10 units at a time (to actually *do* anything useful) could bid "35 ms for 10 units". In this scenario, the MP3 player wins as the other application is effectively bidding 3.5 ms/unit. However, this other task would never bid on just one unit as it can't *do* anything with just one unit. Similarly, the MP3 player might never bid on 10 units. Or, if "forced" to do so, it would bid something disproportionate since it isn't "worth much" to it to have all that extra memory.

(that's the only way I can see time being a "negotiable" quantity)
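To make those mechanics concrete, here is a minimal sketch of that kind of all-or-nothing bidding, with the arbiter granting the "densest" bids first. Everything in it (the struct layout, the unit sizes, the greedy knapsack-style heuristic) is invented for illustration, not a real kernel interface:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical all-or-nothing memory bids, valued in the "currency"
 * of response time.  The arbiter grants the densest bids first -- a
 * greedy knapsack-style heuristic, not any real kernel API. */
struct bid {
    const char *task;
    unsigned    units;    /* memory units wanted, all-or-nothing  */
    unsigned    value_ms; /* time value offered for the whole lot */
};

static int by_density(const void *a, const void *b)
{
    const struct bid *x = a, *y = b;
    /* descending value_ms/units, compared without floating point */
    return (int)(y->value_ms * x->units) - (int)(x->value_ms * y->units);
}

int main(void)
{
    struct bid bids[] = {
        { "mp3_player",  1, 20 },  /* 20 ms per unit  */
        { "other_app",  10, 35 },  /* 3.5 ms per unit */
    };
    unsigned free_units = 8;       /* what the arbiter has to sell */

    qsort(bids, 2, sizeof bids[0], by_density);
    for (int i = 0; i < 2; i++) {
        if (bids[i].units <= free_units) {      /* whole lot fits */
            free_units -= bids[i].units;
            printf("grant %u unit(s) to %s\n", bids[i].units, bids[i].task);
        } else {                                /* all-or-nothing */
            printf("reject %s (wants %u, only %u left)\n",
                   bids[i].task, bids[i].units, free_units);
        }
    }
    return 0;
}

Run against the two bids above, the MP3 player gets its unit and the 10-unit bidder is rejected outright -- exactly the all-or-nothing behavior described.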
>> I see several problems with committing this to practice, though.
>>
>> First, the degenerate case -- where the kernel is DEMANDING those
>> resources. How does it frame the "proposition" to the task:
>> "I'm willing to accept INFINITE additional delay, how much memory
>> can you release for me?" :-/
>
> That's the easy case! Release all of it ;-) or all of it within the
> fixed system constraints.
The point I was making was how to express the "bidding". I guess the first step is to decide how the "pricing" process works. I.e., does the kernel set a price and have tasks say how much they would be willing to buy *at* that price? Or, do the tasks make bids for what they would like and the kernel arbitrates between them... (the reference framework has a big impact on the semantics)
>> You also have to deal with relating resources to time/latency.
>> For example, the "thinking ahead" chess automaton can probably give
>> you a numeric value: megabytes per millisecond (i.e., looking
>> at how much memory it takes to "think ahead" 1 millisecond).
>> But, this might not be a finely divisible quantum. The automaton
>> might need 10MB chunks to do anything "worthwhile" (note that
>> I have no idea what the resource requirements of such an algorithm
>> would actually be. I am just throwing wild numbers out for
>> illustrative purposes. If I used a "counting application" as an
>> example, it would be hard to talk about megabytes with a straight
>> face! :> )
>
> The primary task is to choose the best metric. I choose elapsed time,
> but that might not be the primary one for any particular system. For
> each task you need to establish an approximate correlation to the
> metric. It doesn't have to be perfect.
Time makes sense from an engineering perspective. But, I am not sure it makes sense from the user's point of view. E.g., it requires the user to understand more about the nature of the various tasks.
>> Furthermore, it might be difficult for that automaton to decide
>> *which* chunk of memory to discard (if, for example, it only is
>> currently using enough to think one move ahead... what *fraction*
>> of that move should it discard?).
>
> That's a separate problem which faces any task needing to shed
> resources given an overall optimization technique.
Yes. I'm just thinking aloud...
>> The other problem is that it might penalize or reward applications
>> unfairly. I.e., one application could end up frequently forfeiting
>> its resources while others never do. For example, telling the
>> MP3 player that it can be 1000ms late on its deadline(s) would
>> essentially cause it to dump ALL of its resources: "Heck, I don't
>> even have to START decoding because I'll have plenty of time
>> AFTER my deadline to do what needs to be done!" (and, does the
>> 1000ms apply to all periodic deadlines thereafter?)
>
> And that could be the best system response. The goal is fairness among
<?> I assume you mean "is NOT fairness" (?)
> tasks but overall system performance, which may be best served in
> demanding phases by shutting down certain functions.
>
>> But, the biggest implementation problem I find is trying to map this
>> into units that you could use to value specific resources. How
>> do tasks decide what they want and whether or not they can "afford"
>> it?
>
> What tasks "want" is to satisfy their constraints and optimize
> certain parameters, such as update rate. Again it comes down to
> mapping separate metrics (display refresh rate, for example) onto an
> overall system quality. Some analysis is required and perhaps some
> configuration for particular applications.
What I would like is a "currency" that implicitly constrains tasks based on the current value of that currency (wrt the resources it is buying). So, if a task can't find a way to optimize itself in a given "commodity market", it can just drop out of the market (i.e. exit()). So, this activity can be handled automatically without having to prompt the user. E.g., if gasoline is $4/gallon, people "self select" whether they will be traveling over a given holiday -- or, alter their destinations to fit within their "resource budget".
>> How does the user configure "priorities"? I guess he could
>> specify a "tardiness tolerance" for each application and let them
>> figure out what that would cost (in terms of resources). But, what
>> prevents an application from saying, "Ah, I have a tardiness
>> tolerance of 10 minutes! I *need* 100MB of memory!!" (how does the
>> kernel decide what a fair mapping of resources to "tardiness" would
>> be?)
>
> Look at an economic analogy: if release is delayed two months it will
> cost us $500 k in sales. If we release on time, we estimate an
> additional $200 k in support and $200 k in extra engineering time.
> This is contrived, but a similar type of problem trading off
> disparate choices with a common metric.
Understood. I'm just trying to see how to express it in a way that makes sense to a user. E.g., what if another release (task) has corresponding figures of $100K sales/40K support/40K engineering and yet another has 1M sales/400K support/400K engineering...? And, change the "two months" to "3 weeks" in one case vs. 1 year in another (i.e., it is hard to look at the numbers *intuitively* and figure out where the dollars are best spent)
Reply by Thad Smith January 8, 2011
On 1/1/2011 10:52 PM, D Yuniskis wrote:
> Hi Thad,
>
> On 1/1/2011 9:54 AM, Thad Smith wrote:
>> I suggest focusing on the real-world cost of shedding resources. If
>> a task yields 100 kB of memory, what is the cost to the user -- an
>> extra 300 ms response time, perhaps? The manager may say "I'm
>> willing to accept 1000 ms additional delay, how much memory can you
>> release for me?"
>
> I had to read this a couple of times to make sure I understood
> your point. So, if I've *missed* it, I guess that means "a couple"
> wasn't enough! :>
>
> [by "manager" I assume you mean the kernel -- or its agent -- in
> regards to "renegotiating" resource (re)distribution.]
Yes -- maybe a little too much anthropomorphism.
> The appeal of your approach is that it speaks directly to the
> reason *behind* "using surplus resources" -- most often, to
> improve response time or execution speed (similar aspects).
My understanding is that you want an intelligent tradeoff. Relating them to a common single parameter is the technique. This is done within the context of satisfying fixed constraints. It is similar to an economic system where money is the common metric within physical, legal, and chosen ethical constraints.
> I see several problems with committing this to practice, though.
>
> First, the degenerate case -- where the kernel is DEMANDING those
> resources. How does it frame the "proposition" to the task:
> "I'm willing to accept INFINITE additional delay, how much memory
> can you release for me?" :-/
That's the easy case! Release all of it ;-) or all of it within the fixed system constraints.
> You also have to deal with relating resources to time/latency.
> For example, the "thinking ahead" chess automaton can probably give
> you a numeric value: megabytes per millisecond (i.e., looking
> at how much memory it takes to "think ahead" 1 millisecond).
> But, this might not be a finely divisible quantum. The automaton
> might need 10MB chunks to do anything "worthwhile" (note that
> I have no idea what the resource requirements of such an algorithm
> would actually be. I am just throwing wild numbers out for
> illustrative purposes. If I used a "counting application" as an
> example, it would be hard to talk about megabytes with a straight
> face! :> )
The primary task is to choose the best metric. I choose elapsed time, but that might not be the primary one for any particular system. For each task you need to establish an approximate correlation to the metric. It doesn't have to be perfect.
> Furthermore, it might be difficult for that automaton to decide
> *which* chunk of memory to discard (if, for example, it only is
> currently using enough to think one move ahead... what *fraction*
> of that move should it discard?).
That's a separate problem which faces any task needing to shed resources given an overall optimization technique.
> The other problem is that it might penalize or reward applications
> unfairly. I.e., one application could end up frequently forfeiting
> its resources while others never do. For example, telling the
> MP3 player that it can be 1000ms late on its deadline(s) would
> essentially cause it to dump ALL of its resources: "Heck, I don't
> even have to START decoding because I'll have plenty of time
> AFTER my deadline to do what needs to be done!" (and, does the
> 1000ms apply to all periodic deadlines thereafter?)
And that could be the best system response. The goal is fairness among tasks but overall system performance, which may be best served in demanding phases by shutting down certain functions.
> But, the biggest implementation problem I find is trying to map this
> into units that you could use to value specific resources. How
> do tasks decide what they want and whether or not they can "afford"
> it?
What tasks "want" is to satisfy their constraints and optimize certain parameters, such as update rate. Again it comes down to mapping separate metrics (display refresh rate, for example) onto an overall system quality. Some analysis is required and perhaps some configuration for particular applications.
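One hedged guess at what that mapping could look like in practice: each task exposes its own metric normalized to a [0,1] utility, and the manager optimizes a weighted sum. The names, weights, and linear utility curves below are all assumptions for illustration:

#include <stdio.h>

/* Hypothetical mapping of per-task metrics onto one "system quality"
 * number: each metric is normalized to a [0,1] utility and weighted.
 * Names, weights, and the linear curves are invented. */
struct metric {
    const char *name;
    double value;   /* current raw reading             */
    double best;    /* raw value worth utility 1.0     */
    double worst;   /* raw value worth utility 0.0     */
    double weight;  /* importance in the overall score */
};

static double utility(const struct metric *m)
{
    double u = (m->value - m->worst) / (m->best - m->worst);
    return u < 0 ? 0 : u > 1 ? 1 : u;   /* clamp to [0,1] */
}

int main(void)
{
    struct metric m[] = {
        { "gui_fps",        22.0, 30.0, 5.0, 1.0 },
        { "audio_slack_ms",  8.0, 20.0, 0.0, 3.0 }, /* audio weighted higher */
    };
    double q = 0.0, wsum = 0.0;
    for (int i = 0; i < 2; i++) {
        q    += m[i].weight * utility(&m[i]);
        wsum += m[i].weight;
    }
    printf("system quality = %.2f\n", q / wsum);  /* 0.47 here */
    return 0;
}

A resource manager could then test candidate reallocations against this single number instead of arguing about frame rates and latencies separately.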
> How does the user configure "priorities"? I guess he could > specify a "tardiness tolerance" for each application and let them > figure out what that would cost (in terms of resources). But, what > prevents an application from saying, "Ah, I have a tardiness tolerance > of 10 minutes! I *need* 100MB of memory!!" (how does the kernel > decide what a fair mapping of resources to "tardiness" would be?
Look at an economic analogy: if release is delayed two months it will cost us $500 k in sales. If we release on time, we estimate an additional $200 k in support and $200 k in extra engineering time. This is contrived, but a similar type of problem trading off disparate choices with a common metric.

-- Thad
Reply by Stefan Reuther January 7, 2011
D Yuniskis wrote:
> On 1/4/2011 3:29 PM, Stefan Reuther wrote:
>>>> That's just my point: design the system that this never happens.
>>>> Sure this is harder than doing a desktop best-effort system.
>>>
>>> See above. (In such an environment) you *eventually* come to a
>>> situation where a user is asking more of you (device) than you can
>>> do with the fixed resources in your "box". If you *must* always be
>>> able to do everything, you end up with more in the box than you
>>> need -- or, lots of dedicated "little boxes".
>>
>> You still have the option to know this beforehand and reject it.
>
> Then you are essentially removing features/capabilities from
> your product just to avoid the POSSIBILITY of having to deal
> with this at run time. Even if the circumstances never actually
> materialize!
Exactly. And if you express it this way, why not. I call it "better safe than sorry".
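A common way to implement this "know beforehand and reject it" policy for periodic work is an admission test. The sketch below uses the classic Liu & Layland utilization bound for rate-monotonic scheduling; the task set and timing numbers are invented, and a real system would use measured worst-case execution times:

#include <math.h>
#include <stdio.h>

/* Admission test sketch: before taking on a new periodic chore, check
 * total CPU utilization against the Liu & Layland rate-monotonic
 * bound n*(2^(1/n)-1).  Task set and numbers are invented. */
struct task { double wcet_ms, period_ms; };

static int admit(const struct task *set, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += set[i].wcet_ms / set[i].period_ms;       /* Ci/Ti     */
    return u <= n * (pow(2.0, 1.0 / n) - 1.0);        /* RMS bound */
}

int main(void)
{
    struct task set[] = {
        {  5.0,  26.1 },   /* MP3 decode, one frame per period */
        {  2.0,  10.0 },   /* control loop                     */
        { 30.0, 100.0 },   /* proposed new feature             */
    };
    printf("new task %s\n", admit(set, 3) ? "admitted" : "rejected");
    return 0;
}

Anything that fails the test is refused up front, which is the "better safe than sorry" trade: no deadline misses, at the cost of refusing work the system might sometimes have handled.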
> You're avoiding the issue (i.e., not even *knowing* if you have
> missed a deadline) by claiming that you handle "all cases, 100%
> of the time". I.e., why *detect* something if you can't handle it?
I know that I have to produce audio samples at 44.1 kHz rate. I have designed my system this way. The hardware can still handle the case where I don't produce them fast enough, because I configured my hardware transmitter to send silence in this case. This catches the case where I happened to make a mistake in the design (which I do not make alone, and do not implement alone, and cannot formally prove in any case).
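The safety net Stefan describes might look roughly like this: the output stage pulls from a ring buffer and substitutes silence on underrun, so a missed software deadline degrades the audio instead of wedging the hardware. The buffer layout and ISR hook below are assumptions, not his actual code:

#include <stdint.h>

#define RING_SIZE 1024               /* power of two */

static int16_t ring[RING_SIZE];
static volatile unsigned head, tail; /* producer / consumer indices */

/* Called at the sample rate by the output interrupt (hypothetical
 * hook).  On underrun it emits silence rather than blocking, so a
 * late decoder degrades the audio instead of wedging the hardware. */
int16_t audio_tx_isr(void)
{
    if (head == tail)                /* underrun: decoder fell behind */
        return 0;                    /* one sample of silence         */
    int16_t s = ring[tail];
    tail = (tail + 1) & (RING_SIZE - 1);
    return s;
}

/* Producer side: returns 0 if the buffer is full (back-pressure). */
int audio_push(int16_t s)
{
    unsigned next = (head + 1) & (RING_SIZE - 1);
    if (next == tail)
        return 0;
    ring[head] = s;
    head = next;
    return 1;
}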
> For example, I have a tiny audio client (NIC, CPU, stereo amp)
> with fixed (minimal) resources. It has some signal processing
> abilities that consume resources. If the current network
> (server) conditions deteriorate to a point where the client
> can't reliably produce audio with the existing buffer sizes,
> it has three options:
But then you have a non-realtime component in the data path, namely the network, and reacting on that is of course necessary.

Or do you measure "oops, this DecodeMPEGFrame took too long, this seems to be a complicated MPEG file, let's ask if they have this in cheap ADPCM, too?".

Of course my audio also starts stuttering if the CD drive doesn't give me enough audio data in time. But the system is designed to have enough CPU power under any circumstances, and have enough memory to compensate "typical" CD problems, so I don't have to ask the GUI people "hey, drop your frame rate a bit, I need more power to decode this file".

Stefan
Reply by D Yuniskis January 6, 2011
Hi Stefan,

On 1/4/2011 3:29 PM, Stefan Reuther wrote:
>>> That's just my point: design the system that this never happens.
>>> Sure this is harder than doing a desktop best-effort system.
>>
>> See above. (In such an environment) you *eventually* come to a
>> situation where a user is asking more of you (device) than you can
>> do with the fixed resources in your "box". If you *must* always be
>> able to do everything, you end up with more in the box than you
>> need -- or, lots of dedicated "little boxes".
>
> You still have the option to know this beforehand and reject it.
Then you are essentially removing features/capabilities from your product just to avoid the POSSIBILITY of having to deal with this at run time. Even if the circumstances never actually materialize!

Visit a medical office and see what the lack of integration results in. Do you think a company that designs EKGs can't *also* design a pulse oximeter, infrared thermometer, digital sphygmomanometer, heparin pump, etc.? So, why have so many dedicated boxes -- each with their own screen and "user interface conventions"? (this is slowly changing as that industry realizes they can't afford the duplication of hardware, maintenance costs, etc.)

E.g., one would think someone shelling out $1,000,000 for a tablet press could *surely* afford an extra $10,000 for an ejection force monitor -- yet, you find that they *don't*! OTOH, if you offer that feature as one of a suite of features (NOT ALL OF WHICH CAN WORK AT ALL TIMES IN ALL CONDITIONS) and charge ~$1,000 for it, suddenly you have a competitive advantage: "Sure, we'll take it!"
> I prefer this a lot over "trying, hoping for the best, and cleaning
> up the mess if it didn't work" aka handling missed deadlines.
You are assuming you can predict everything that can happen and address all of those things. Sure, you can say, "well, if this happens we need X time to recover..." but that's just a CYA way of saying "we won't deal with conditions where we have to react quicker" (it's the customer's problem).

Handling missed deadlines doesn't have to be expensive. E.g., the tablet press example I mentioned (another reply) can handle the worst case missed deadline (e.g., a "bad" tablet being erroneously accepted) by shutting down the tablet press and lighting a big red light. If it misses a less important event (e.g., an ejection force profile), it simply "returns no data" for that event.

You're avoiding the issue (i.e., not even *knowing* if you have missed a deadline) by claiming that you handle "all cases, 100% of the time". I.e., why *detect* something if you can't handle it?
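A sketch of how such graded (and cheap) miss handling might be structured, with the severity of each deadline dictating the response. All of the names here (deadline_miss, the chore labels, the emergency-stop hook) are hypothetical:

#include <stdio.h>

enum severity { COSMETIC, DEGRADED, CRITICAL };

struct deadline_miss {
    const char   *chore;
    enum severity sev;
    long          late_us;
};

/* Graded responses: from quietly discarding one result up to the
 * big red light.  press_emergency_stop() is a hypothetical hook. */
void on_deadline_miss(const struct deadline_miss *m)
{
    switch (m->sev) {
    case COSMETIC:   /* e.g. ejection force profile: return no data */
        printf("%s: result discarded (%ld us late)\n",
               m->chore, m->late_us);
        break;
    case DEGRADED:   /* shed an optional feature, keep running */
        printf("%s: shedding optional processing\n", m->chore);
        break;
    case CRITICAL:   /* e.g. a "bad" tablet may have been accepted */
        printf("%s: STOP PRESS, light the big red light\n", m->chore);
        /* press_emergency_stop(); */
        break;
    }
}

int main(void)
{
    struct deadline_miss m = { "ejection_force_profile", COSMETIC, 1200 };
    on_deadline_miss(&m);
    return 0;
}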
> If I know I cannot do X and Y simultaneously, I decide which of them
> is more important, and then *deterministically* suspend one of them.
Which is exactly what *I* do. But, only after I *know* I can't handle both of them (because the LEAST IMPORTANT ONE ends up missing its deadline). You can watch how "it" is working and tailor your approach/algorithm to what your current operating conditions are.

For example, I have a tiny audio client (NIC, CPU, stereo amp) with fixed (minimal) resources. It has some signal processing abilities that consume resources. If the current network (server) conditions deteriorate to a point where the client can't reliably produce audio with the existing buffer sizes, it has three options:

- get the server to transcode the audio to a lower bit rate (but, I am at its mercy so I can't count on this being a viable option in any particular situation)
- get the server to switch to a different codec (this is expensive as it can require replacing the code in the client "on-the-fly"; and, the server may not want to comply)
- shed capabilities (e.g., some of the signal processing, though this affects the ongoing quality of the audio experience -- different aspects have different costs)
- drop frames (least desirable)

(a sketch of this escalation appears at the end of this message)

Sure, I can avoid all of this "work" -- I can either increase the resources in the client *or* change the specification of the device (i.e., make the problem go away by just claiming it is beyond the scope of the device).

When I do a new design, the first thing I do is research the application. Often, that means talking to users. *Usually*, it means disregarding what they *say* (in favor of determining what they actually *mean*). It is helpful to pose value questions to the user: "What if..." and "What if it *can't*...". My favorite scenario (which *ALWAYS* comes up) is the "I don't care" response. My stock reply is to focus on the notes I am taking and just audibly say something like "... shut down and catch fire". :> It's amazing how quickly they can rephrase that "I don't care"!

Then, the application is factored into a reasonably fine set of "chores" (avoiding the use of the word "tasks"). The temporal requirements of each are identified -- do they have hard deadlines or soft ones? Most "chores" have *some* temporal aspects even if they aren't what you would traditionally think of as RT (but they tend to have very *soft* ones). Then, the *consequences* of missed deadlines are considered -- what is it worth to the application/user to meet this deadline (and what do you lose if you meet it "late")?

It's only at this point that you can begin to address real hardware/software/requirements tradeoffs. The most important chores get addressed first -- regardless of their temporal requirements. Then, less important chores get added into the mix until you have fleshed out all of the wish list. You can map "importances" (avoiding the term "priorities") to each of these chores. This lets you apportion resources and gives you an idea of what the maximal capabilities of your system will be. E.g., "I can keep the ignition timing dead to nuts, ensure the ABS is always available, run the emissions controls and any three of the following..." If your marketing folks tell you "that's not acceptable, it must do...", you can now counter with "*that* will cost you..."

I had one group *insist* that they needed a certain feature in a product design. That feature would complicate the design and add considerably to the product's cost.
I was able to tell them (from *their* sales records) that only *one* of their customers had ever asked for that particular optional configuration (at which point, top management reminisced that the particular option had probably never been *used* by that customer!). [i.e., you have to know what to *ignore* from your users. Salesmen always *want* everything imaginable -- and don't want it to COST anything! Forcing them to put sales projections on particular configurations, so that pricing can be related to development costs, is the easiest way to get them to rethink their "demands".]
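Here is the escalation sketch promised above: pick the cheapest remedy the measured buffer occupancy and the server's cooperation allow. The function names and the 50% threshold are invented; the real server negotiation would be a protocol exchange:

#include <stdbool.h>
#include <stdio.h>

enum remedy { NONE, TRANSCODE, SWITCH_CODEC, SHED_DSP, DROP_FRAMES };

/* Stubs standing in for the real client/server machinery. */
static bool server_can_transcode(void)    { return false; }
static bool server_can_switch_codec(void) { return false; }
static bool dsp_features_active(void)     { return true;  }

/* Pick the cheapest remedy the situation allows, escalating toward
 * the least desirable one. */
enum remedy choose_remedy(unsigned buffer_pct)
{
    if (buffer_pct >= 50)             /* still healthy: do nothing   */
        return NONE;
    if (server_can_transcode())       /* cheapest, needs cooperation */
        return TRANSCODE;
    if (server_can_switch_codec())    /* costly: new client code     */
        return SWITCH_CODEC;
    if (dsp_features_active())        /* degrade quality, keep audio */
        return SHED_DSP;
    return DROP_FRAMES;               /* least desirable             */
}

int main(void)
{
    printf("remedy = %d\n", choose_remedy(20));  /* SHED_DSP here */
    return 0;
}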
> However, this happens rarely enough that I can't come up with a
> real-time example (we've a few instances of this happening in batch
> tasks which would miss their "soft" deadlines otherwise).
Have your satellite radio *also* control the ignition timing of the vehicle. Then, post your results :>
Reply by Stefan Reuther January 4, 2011
Hi,

D Yuniskis wrote:
> On 1/3/2011 6:01 AM, Stefan Reuther wrote:
>> Of course this is the case in the real world, too.
>>
>> User inputs have debouncing, so you can be sure the user will not
>> hit that switch more than three times in a second. Networks have
>> bitrates,
>
> Sure, but you don't know that he isn't going to hit
> "Full Speed Forward" and, a tenth of a second later, hit
> "Full Speed Reverse", etc. I.e., you can't (reliably)
> predict the future -- yet have to cope with it.
But, for that given example, it's easy, because I'm allowed certain reaction times :-) The "keyboard driver" must react upon user input immediately. It must recognize the "Forward" request and the "Reverse" request to make sure nothing gets lost. I can just periodically check the user's last will, at places I'm ready to process it.
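That split (latch everything immediately, act on it when ready) might look like the following sketch. The ISR hook and the actuator call are hypothetical:

enum request { REQ_NONE, REQ_FORWARD, REQ_REVERSE };

static volatile enum request last_request = REQ_NONE;

/* "Keyboard driver": runs on every debounced key edge so no press is
 * lost; intermediate presses collapse into the most recent one. */
void key_isr(int key_forward)
{
    last_request = key_forward ? REQ_FORWARD : REQ_REVERSE;
}

/* Control loop: samples the latched "last will" only at places where
 * it is ready to process it.  ramp_motor_toward() is hypothetical. */
void control_tick(void)
{
    static enum request acted_on = REQ_NONE;
    enum request r = last_request;
    if (r != acted_on) {
        acted_on = r;
        /* ramp_motor_toward(r); */
    }
}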
>> Honestly? No. When I buy a hard-disk recorder which claims to be
>> able to record two channels at once and let me watch a third, I
>> expect it to work. That's what I pay them for. Plugging a TV
>> receiver into my
>
> Correct. But, if that "appliance" can also make phone calls,
> control the spark plug firing sequence in your automobile
> *and* receive/decode satellite radio broadcasts, would you
> be upset if that third video stream had visual artifacts
> resulting from "missed deadlines", etc.?
If it's noticeable, yes! Of course I get annoyed if audio gets distorted when I'm driving at 4000 rpm (when spark plug control has much work to do).
> *That's* the sort of devices I'm involved with. The user
> knows the device can't do *everything* (just like a user
> knows his PC can't run *every* application CONCURRENTLY that
> it has loaded onto it).
People who follow my company's press releases know that we make car stereos / satnav. And the people who drive these cars do not know what computationally-intensive processes happen in there.

Okay, people who are into computer graphics may understand that the digital map frame rate drops in the center of Paris, with its thousands of little streets, compared to some Australian outback with the next village 500 miles on. But even they - let alone Joe Sixpack - will not understand that the frame rate depends on the radio channel they're listening to. Digital radio is much more computationally intensive than analog FM, plus it depends heavily upon the codec and configuration in use by the transmitter, which the user doesn't even see.

Well, it was hard to make this work, but we did it.
>> That's just my point: design the system that this never happens.
>> Sure this is harder than doing a desktop best-effort system.
>
> See above. (In such an environment) you *eventually* come
> to a situation where a user is asking more of you (device)
> than you can do with the fixed resources in your "box".
> If you *must* always be able to do everything, you end up
> with more in the box than you need -- or, lots of dedicated
> "little boxes".
You still have the option to know this beforehand and reject it. I prefer this a lot over "trying, hoping for the best, and cleaning up the mess if it didn't work" aka handling missed deadlines.

If I know I cannot do X and Y simultaneously, I decide which of them is more important, and then *deterministically* suspend one of them. However, this happens rarely enough that I can't come up with a real-time example (we've a few instances of this happening in batch tasks which would miss their "soft" deadlines otherwise).

Stefan
Reply by D Yuniskis January 3, 2011
Hi Stefan,

On 1/3/2011 6:01 AM, Stefan Reuther wrote:
>> If you treat hard deadlines as MUST be met (else the system
>> is considered broken/failed), then anything with asynchronous
>> inputs is a likely candidate for "can't be solved" -- because
>> you can't guarantee that another input won't come along
>> before you have finished dealing with the first... "running out
>> of REAL time". Clearly, that isn't the case in the real
>> world so, either these aren't "hard" deadlines *or* they
>> are being missed and the world isn't coming to an end! :>
>
> Of course this is the case in the real world, too.
>
> User inputs have debouncing, so you can be sure the user will not hit
> that switch more than three times in a second. Networks have bitrates,
Sure, but you don't know that he isn't going to hit "Full Speed Forward" and, a tenth of a second later, hit "Full Speed Reverse", etc. I.e., you can't (reliably) predict the future -- yet have to cope with it.
> so you can be sure that you don't get more than X frames per second.
> Audio has sample rates, so you can be sure to receive exactly 44100 /
> 48000 samples per second (and have to produce the same amount). Mass
> storage has seek and read times. Video has frame rates.
>
> At least in the systems I work on. So I know precisely how many CPU
> cycles I may use to decode an MP3 frame.
>
>>> associated problems, like having to convince the customer that the
>>> file they've found which misses deadlines on every other frame is
>>> the absolute exception, because nobody else puts 20 Mbps video on a
>>> floppy disc with 99% fragmentation or something like that.
>>
>> What if he wants to put a 100MB video on that floppy?
>> Some things just can't be done. Deal with it. :>
>>
>> Even consumers are starting to get enough sophistication
>> that they understand that a machine (tends to) "slows down" when
>> doing multiple things at once.
>
> Honestly? No. When I buy a hard-disk recorder which claims to be able
> to record two channels at once and let me watch a third, I expect it
> to work. That's what I pay them for. Plugging a TV receiver into my
Correct. But, if that "appliance" can also make phone calls, control the spark plug firing sequence in your automobile *and* receive/decode satellite radio broadcasts, would you be upset if that third video stream had visual artifacts resulting from "missed deadlines", etc.?

*That's* the sort of devices I'm involved with. The user knows the device can't do *everything* (just like a user knows his PC can't run *every* application CONCURRENTLY that it has loaded onto it). So, if given a means of expressing "preferences" ("values") for those activities/applications, the device itself could take measures to satisfy those preferences (instead of forcing the user to respond to an "insufficient resources" message and decide which things to *kill*, since he can't tell them to "shed resources" :> ).
> computer's USB port, running three instances of an MPEG codec, and
> hoping for the best - that's what I can do myself.
>
> I would accept if the recorder says, "hey, these channels have such a
> high bitrate that I cannot record two of them at once". But I would
> not accept if it "silently" damages the recording. At least not if it
> does that in a very noticeable way. If it drops a single frame every
> three hours, I'll never notice.
>
>> But, they would not be very tolerant of a video player that
>> simply shut down (because it decided it was *broken* since
>> it missed a deadline).
>
> That's just my point: design the system that this never happens. Sure
> this is harder than doing a desktop best-effort system.
See above. (In such an environment) you *eventually* come to a situation where a user is asking more of you (device) than you can do with the fixed resources in your "box". If you *must* always be able to do everything, you end up with more in the box than you need -- or, lots of dedicated "little boxes".

If, instead, you allow the user to trade performance and preferences, you can do more with less (money, space, power, MIPS, etc.)
>>>> I.e., if your HRT system misses a deadline, does it even
>>>> KNOW that it did??).
>>>
>>> My favourite design principle: never check for an error condition
>>> you don't know to handle :-)
>>
>> Yeah, I think ostriches have a similar "defense mechanism". :>
>> Not sure how effective it is, though, if the problem still
>> exists.
>>
>> I try to arrange things to eliminate the possibility of
>> errors, where possible.
>
> That's probably similar things. For example, every UTF-8 related
> document says you should treat non-minimally encoded UTF-8 runes as
> an error. Now what should I do? Show a pop-up error message to the
> user? "Hey, your playlist file contains bad UTF-8!" 95% of them do
> not even know what UTF-8 is. So I ignore that problem. Which also
> simplifies a lot of other code because it can assume that I'll decode
> every 'char*' into a 'wchar_t*'.
Yes. In my case, often even heavier handed (e.g., my calculator discussion restricting the character set to USASCII). Or, little things like using unsigned data types for "counts" (so the problem of dealing with negative values simply doesn't exist)
> [kernel asks task to free resources]
>>>> Uncooperative tasks make life difficult! That's the whole
>>>> point of this :>
>>>
>>> I'm not sure I understood you correctly (maybe we mean the same
>>> thing?), but the problem that immediately comes to my mind is
>>> applications that claim to be good citizens, but by
>>> intention/bug/sabotage aren't. Something like a heap overwrite
>>> error causing it to run into an infinite loop, not finding the
>>> page to free.
>>
>> "Thou shalt not release buggy code" :>
>>
>> Why assume the "bug" lies in the application? If you are going
>> to *tolerate* bugs, what if the bug lies in the kernel itself??
>> <frown>
>
> That's why kernels are usually written by much smaller (and better)
> teams than user-land code. Thus the kernel can isolate the buggy tasks
Yes, but that is no guarantee that there are no bugs. It just shifts the probabilities around.
> from the proven error-free[tm] supervisor tasks, for example. Okay,
> it's annoying if the MPEG decoder crashes on that particular file,
> but the kernel should isolate that crash from the power management
> task, so the device can at least be turned off without needing a
> powercycle. In particular if powercycle means disassembling your car.
>
> At least, that approach works quite well for "our" devices.
> Unfortunately, we cannot prove (in a mathematical sense) that our
> userland code is completely bug-free. I can construct a (far-fetched,
> unlikely) case that crashes my code, just because I simply have no
> idea how to reliably detect that. At least, my code crashes a
> magnitude less often than that of our favourite competitor :-)
The problem I am usually faced with is very long up-times, limited/constrained user interfaces (a user might not even be "present") and, often, significant "costs" associated with failures (financial or safety). I enjoy spending resources (MHz, memory, complexity, etc.) to improve these aspects of a product's design instead of "cosmetic crap".

Supper! Another bowl of pasta *really* would go down quite nicely! Though I suspect I should probably have something a bit more "substantial"... :<
Reply by D Yuniskis January 3, 2011
Hi Stefan,

On 1/3/2011 5:33 AM, Stefan Reuther wrote:
>>>> The big (huge!) problem seems to be a direct consequence of
>>>> that, though. Namely, the lack of involvement of the task
>>>> in selecting which (if any) resources it can most afford to
>>>> "lose". (I think I can deal with the signaling that
>>>> would also be required using existing mechanisms).
>>>
>>> This is where the L4 guys stay on their convenient island saying
>>> "hey, we've provided a mechanism, the rest is userland" :-)
>>
>> It *sounds* arrogant but, once you embrace that ideology,
>> you can come up with much cleaner and more robust systems.
>> The "policies" can benefit from the services and protections
>> that are explicitly provided *to* make user-land tasks
>> more robust! (instead of complicating the kernel with
>> still more layers which are inherently difficult to debug)
>
> The art is making your mechanisms in a way that they are actually
> practically usable, not just from an Ivory Tower. This was the point
> of L4, to prove that microkernels can actually be used for efficient
> systems.
Yes. My first exposure to microkernels was through Mach. Many of the *ideas* made great sense. But, their implementation was too "kitchen sink"-ish. And, I think their attempt to chase a UN*X implementation as a "justification" for that architectural approach was a huge mistake. Had they, instead, said, "We're different" in much the same way UN*X "disowned" its MULTICS, er, "roots" (bad choice of words), I think they would have been more successful in "proving something".
> Same thing here: your idea sounded really cool to me, I just had
> doubts that the callback method can be implemented for a safe system.
I'm sure it can be if a "select team" implements the system. The problem is trying to open that system up for every TD&H (Tom, Dick & Harry). :<
>>> Yes. If you want this fine-grained, you'd better make it *very
>>> cheap*. For example, one way could be a data structure in user-land
>>> with one word per page / memory range containing the current value.
>>> As far as I
>>
>> I'm sure I can get the "cost" down. The bigger problem was the
>> second: how to involve the task in the decision making process
>> WITHOUT "involving it" (i.e., having it run any code).
>
> A task must be tracking its memory usage anyway. "This page contains
> only free()d memory". "This page contains already-played audio". Now it
Yes. And, when you *expect* to have to forfeit those resources, you refocus *how* you keep track of what you are doing. For example, keeping the control structures associated with particular data *with* that data (since holding onto the control structures after discarding the data doesn't buy you anything).
> would need an experiment to figure out whether that knowledge can be
> "exported" to an operating system memory manager somehow in a
> performant way (i.e. without needing an 'mprotect' syscall for every
> audio sample played).
I think the notification aspects and "value ordering" of held resources can be accomplished -- the kernel could always peek into the task to grab data concerning these resources IF it knows where to find that data.

The bigger problem is giving the task a say in holding onto those resources in a flexible enough way that allows the task to determine its own "resource pricing policy". If the task were to *know* that it has no further chance of reclaiming these resources AT THIS TIME, then the scheme by which it values them could be refined more. If, however, it knows/thinks it may lose some/all of them, then it wants to be able to place conditional bids on keeping various subsets of them -- subsets that *it* defines. (e.g., I'm willing to pay 100 for these three pages; if that bid fails, I'll pay 100 for these *two* pages, forfeiting the third; if *that* fails, I'll pay 200 for this *one* page!)

I am hoping for an epiphany when my cumulative sleep deficit is in a bit better shape... :<
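One guess at how those conditional, task-defined bids could be published so the kernel can evaluate them without running task code: an ordered table of fallbacks, walked until one clears the current asking price. The layout and pricing semantics are entirely illustrative:

#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES_PER_BID 4

/* An ordered table of fallback bids, best-preferred first, published
 * by the task in memory the kernel can read.  E.g.:
 *   { 100, 3, {p1, p2, p3} }   keep all three for 100
 *   { 100, 2, {p1, p2} }       else keep two for 100
 *   { 200, 1, {p1} }           else pay dearly for the last one */
struct fallback_bid {
    uint32_t  price;                      /* offered, in the currency */
    uint32_t  npages;                     /* pages kept if bid wins   */
    uintptr_t pages[MAX_PAGES_PER_BID];   /* which pages those are    */
};

/* Kernel side: first bid that clears the current asking price wins. */
const struct fallback_bid *
kernel_pick_bid(const struct fallback_bid *tab, size_t n,
                uint32_t price_per_page)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].price >= tab[i].npages * price_per_page)
            return &tab[i];
    return NULL;    /* no bid clears: task forfeits all the pages */
}

With the three bids from the example above and an asking price of 40 per page, the first bid (100 for three pages, asking 120) fails to clear, and the task keeps two pages for its 100.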
Reply by Stefan Reuther January 3, 2011
Hi there,

D Yuniskis wrote:
> On 1/2/2011 6:44 AM, Stefan Reuther wrote:
>>> You can complain about my choice of words ("hope") -- but, despite
>>> that, my point is that it doesn't HAVE TO "guarantee" results.
>>> Hard deadlines have no value once they have passed. I.e.,
>>> once the deadline comes STOP WORKING ON IT.
>>
>> Here, my lecture told me that (hard) real-time means designing a
>> system so that this doesn't happen. Because if you miss one deadline,
>> how can you be sure that this was just a small hiccup, and you won't
>> miss the next 500 deadlines as well? By that definition, one would be
>> able to fit a real-time H.264 1080p decoder on an 8051 :-)
[...]
> If you treat hard deadlines as MUST be met (else the system
> is considered broken/failed), then anything with asynchronous
> inputs is a likely candidate for "can't be solved" -- because
> you can't guarantee that another input won't come along
> before you have finished dealing with the first... "running out
> of REAL time". Clearly, that isn't the case in the real
> world so, either these aren't "hard" deadlines *or* they
> are being missed and the world isn't coming to an end! :>
Of course this is the case in the real world, too.

User inputs have debouncing, so you can be sure the user will not hit that switch more than three times in a second. Networks have bitrates, so you can be sure that you don't get more than X frames per second. Audio has sample rates, so you can be sure to receive exactly 44100 / 48000 samples per second (and have to produce the same amount). Mass storage has seek and read times. Video has frame rates.

At least in the systems I work on. So I know precisely how many CPU cycles I may use to decode an MP3 frame.
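That budget is easy to work out: an MPEG-1 Layer III frame carries 1152 samples, so at 44.1 kHz one frame is due every ~26.1 ms. The sketch below does the arithmetic; the 200 MHz clock is an assumed figure, not Stefan's hardware:

#include <stdio.h>

int main(void)
{
    const double sample_rate   = 44100.0;   /* Hz                 */
    const double frame_samples = 1152.0;    /* MPEG-1 Layer III   */
    const double cpu_hz        = 200e6;     /* assumed core clock */

    double frame_s = frame_samples / sample_rate;      /* ~0.02612 s */
    printf("frame period: %.2f ms\n", frame_s * 1e3);  /* 26.12 ms   */
    printf("cycle budget: %.1f Mcycles/frame\n",
           cpu_hz * frame_s / 1e6);                    /* ~5.2M      */
    return 0;
}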
>> associated problems, like having to convince the customer that the
>> file they've found which misses deadlines on every other frame is
>> the absolute exception, because nobody else puts 20 Mbps video on a
>> floppy disc with 99% fragmentation or something like that.
>
> What if he wants to put a 100MB video on that floppy?
> Some things just can't be done. Deal with it. :>
>
> Even consumers are starting to get enough sophistication
> that they understand that a machine (tends to) "slows down" when
> doing multiple things at once.
Honestly? No. When I buy a hard-disk recorder which claims to be able to record two channels at once and let me watch a third, I expect it to work. That's what I pay them for. Plugging a TV receiver into my computer's USB port, running three instances of an MPEG codec, and hoping for the best - that's what I can do myself.

I would accept if the recorder says, "hey, these channels have such a high bitrate that I cannot record two of them at once". But I would not accept if it "silently" damages the recording. At least not if it does that in a very noticeable way. If it drops a single frame every three hours, I'll never notice.
> But, they would not be very tolerant of a video player that
> simply shut down (because it decided it was *broken* since
> it missed a deadline).
That's just my point: design the system that this never happens. Sure this is harder than doing a desktop best-effort system.
>>> I.e., if your HRT system misses a deadline, does it even
>>> KNOW that it did??).
>>
>> My favourite design principle: never check for an error condition
>> you don't know to handle :-)
>
> Yeah, I think ostriches have a similar "defense mechanism". :>
> Not sure how effective it is, though, if the problem still
> exists.
>
> I try to arrange things to eliminate the possibility of
> errors, where possible.
That's probably similar things. For example, every UTF-8 related document says you should treat non-minimally encoded UTF-8 runes as an error. Now what should I do? Show a pop-up error message to the user? "Hey, your playlist file contains bad UTF-8!" 95% of them do not even know what UTF-8 is. So I ignore that problem. Which also simplifies a lot of other code because it can assume that I'll decode every 'char*' into a 'wchar_t*'.

[kernel asks task to free resources]
>>> Uncooperative tasks make life difficult! That's the whole
>>> point of this :>
>>
>> I'm not sure I understood you correctly (maybe we mean the same
>> thing?), but the problem that immediately comes to my mind is
>> applications that claim to be good citizens, but by
>> intention/bug/sabotage aren't. Something like a heap overwrite
>> error causing it to run into an infinite loop, not finding the
>> page to free.
>
> "Thou shalt not release buggy code" :>
>
> Why assume the "bug" lies in the application? If you are going
> to *tolerate* bugs, what if the bug lies in the kernel itself??
> <frown>
That's why kernels are usually written by much smaller (and better) teams than user-land code. Thus the kernel can isolate the buggy tasks from the proven error-free[tm] supervisor tasks, for example. Okay, it's annoying if the MPEG decoder crashes on that particular file, but the kernel should isolate that crash from the power management task, so the device can at least be turned off without needing a powercycle. In particular if powercycle means disassembling your car.

At least, that approach works quite well for "our" devices. Unfortunately, we cannot prove (in a mathematical sense) that our userland code is completely bug-free. I can construct a (far-fetched, unlikely) case that crashes my code, just because I simply have no idea how to reliably detect that. At least, my code crashes a magnitude less often than that of our favourite competitor :-)

Stefan
Reply by Stefan Reuther January 3, 2011
Good morning,

D Yuniskis wrote:
> On 1/2/2011 6:32 AM, Stefan Reuther wrote:
>>> The big (huge!) problem seems to be a direct consequence of
>>> that, though. Namely, the lack of involvement of the task
>>> in selecting which (if any) resources it can most afford to
>>> "lose". (I think I can deal with the signaling that
>>> would also be required using existing mechanisms).
>>
>> This is where the L4 guys stay on their convenient island saying
>> "hey, we've provided a mechanism, the rest is userland" :-)
>
> It *sounds* arrogant but, once you embrace that ideology,
> you can come up with much cleaner and more robust systems.
> The "policies" can benefit from the services and protections
> that are explicitly provided *to* make user-land tasks
> more robust! (instead of complicating the kernel with
> still more layers which are inherently difficult to debug)
The art is making your mechanisms in a way that they are actually practically usable, not just from an Ivory Tower. This was the point of L4, to prove that microkernels can actually be used for efficient systems. Same thing here: your idea sounded really cool to me, I just had doubts that the callback method can be implemented for a safe system.
>> Yes. If you want this fine-grained, you'd better make it *very
>> cheap*. For example, one way could be a data structure in user-land
>> with one word per page / memory range containing the current value.
>> As far as I
>
> I'm sure I can get the "cost" down. The bigger problem was the
> second: how to involve the task in the decision making process
> WITHOUT "involving it" (i.e., having it run any code).
A task must be tracking its memory usage anyway. "This page contains only free()d memory". "This page contains already-played audio". Now it would need an experiment to figure out whether that knowledge can be "exported" to an operating system memory manager somehow in a performant way (i.e. without needing an 'mprotect' syscall for every audio sample played).

Stefan
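A guess at the shape of that "one word per page" structure: a user-land value table the memory manager can read at any time, updated by plain stores rather than syscalls. All names and the value scale are invented:

#include <stdint.h>

#define PAGE_SHIFT 12     /* 4 KiB pages (assumed)      */
#define NPAGES     256    /* size of this task's region */

enum {
    VAL_FREE   = 0,             /* only free()d memory: take it first */
    VAL_PLAYED = 1,             /* already-played audio               */
    VAL_QUEUED = 100,           /* audio not yet played               */
    VAL_PINNED = UINT16_MAX     /* control structures, code           */
};

/* Shared with the memory manager at startup (hypothetical protocol);
 * the manager reads it whenever it needs pages, never running task
 * code. */
volatile uint16_t page_value[NPAGES];

/* The task re-marks a page as it fills or drains it: a single store,
 * no syscall per audio sample. */
void mark_page(void *addr, uint16_t value, uintptr_t region_base)
{
    uintptr_t idx = ((uintptr_t)addr - region_base) >> PAGE_SHIFT;
    page_value[idx] = value;
}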