
Applications "buying" resources

Started by D Yuniskis December 30, 2010
On 01/01/11 16:05, Jon Kirwan wrote:
> On Sat, 01 Jan 2011 15:13:34 +0100, David Brown
> <david.brown@removethis.hesbynett.no> wrote:
>
>> On 01/01/11 14:57, Jon Kirwan wrote:
>>> On Sat, 01 Jan 2011 14:03:44 +0100, David Brown
>>> <david.brown@removethis.hesbynett.no> wrote:
>>>
>>>> On 31/12/10 00:37, Jon Kirwan wrote:
>>>>> Just thought I'd let you know that your worldview is much
>>>>> closer to mine than David's is. I almost could have written
>>>>> your words for you. Interesting; and enjoying the discussion
>>>>> too.
>>>>>
>>>>> Jon
>>>>
>>>> It all boils down to how you specify a correctly working system. A
>>>> system that does not work as specified is broken, and a system that you
>>>> /know/ will not always work as specified, is badly designed. If you
>>>> want to design a system that will do a particular job most of the time,
>>>> then that's how you specify it - "in /these/ circumstances, the system
>>>> will do this job - in other circumstances, it will do something else".
>>>>
>>>> You /cannot/ just make software that will work some of the time, and
>>>> call it a working system! If you want to save resources and make a
>>>> system that can do A, B and C, but no more than two at a time, then
>>>> that's your specification. You have very definitely /not/ made a system
>>>> that can do A, B and C.
>>>>
>>>> I don't mean to say that a system should not have parts (or even its
>>>> whole) working in a "best effort" way. It's unusual to have a system
>>>> that doesn't have such parts. But they are not real time, and certainly
>>>> not hard real time, and the rest of the system must be specified and
>>>> designed with the understanding that those parts may fail.
>>>
>>> I just find my reasoning for using an operating system in my
>>> application spaces more closely aligned with DY's arguments.
>>> I'm not arguing that there aren't other application spaces
>>> that are quite different, or that you are wrong-minded in
>>> your own perspective. I'm just enjoying the discussion.
>>
>> It is possible to argue that this is all shades of grey, and it's a
>> matter of emphasis. After all, it is true that a real-time system must
>> do its best in the face of problems, such as hardware failures - and
>> you /could/ say that "lack of resources" is just another such failure.
>> But this thread has been about "real-time" systems from the start -
>
> I thought it was about DY's RTOS.
And "RTOS" stands for ... Real-Time Operating System. If he hadn't explicitly said real-time in the first sentence, then I would just have thought that the system was overly complex, and unlikely to provide gains commensurate with the effort. But since we are discussing real-time systems, I think it is simply wrong. On the other hand, it seems DY is viewing all "real time" as "soft real time", where being late is allowed. Then the solution becomes merely overly complicated rather than wrong.
>> you
>> simply don't design a real-time system with such lack-of-resource
>> failures. Such a system is all about guarantees - /if/ there are no
>> hardware failures, then the system /will/ work. You can use operating
>> systems for other purposes too, such as to mediate between different
>> best-effort tasks - but that is not real-time.
>
> I think you are just arguing, now, when I wasn't wanting to.
I wasn't really wanting to argue - many of the posts in this thread are just too long, and it would take too much time to follow everything. I know that many of the people here, including yourself and D Yuniskis, are thoughtful and experienced developers, so it is unreasonable for me to write off your ideas so curtly. I just think that the system DY is describing is complex, and seems to be based on the idea that some tasks (or rather, their developers) won't follow the rules for cooperating within the system, and yet it depends on tasks that /do/ follow the rules to make the system "economy" work.
> So, okay:
>
> I use an operating system for software partitioning,
> simplicity of design, and flexibility for future change. I
> generally do applications where EVERYTHING COUNTS, though.
>
> DY's comments about those making a customer pay more than
> they otherwise have to (in terms of excess hardware costs,
> size, battery/power usage, heat, lower precision or
> repeatability, etc) is about where I'm at. All corners in my
> application space count. I don't want to make a customer pay
> for 256k flash parts when 8k would have done had I avoided
> linking in STL, nor do I want to "accept" for the customer a
> 5mA average draw in a battery application when 200uA is
> sufficient for the same job.
>
> Of course, the application has to work. That's given. Since
> everything I do is instrumentation, real time is also a
> given. As is precision of measurement sampling. Where
> appropriate, I will go to great lengths to ensure that
> measurement variation is zero clock cycles. Meaning that I
> don't even allow any variation of interrupt latency and will
> select a micro that can guarantee that or else arrange the
> design to avoid it, if needed. The operating system and
> threads in it will be arranged to support that with absolute
> precision.
>
> Operating systems that are targeted at wider audiences
> generally cannot address my needs very well. I may have
> selected a micro with 128 bytes of RAM... total. And need
> 100 of it for the application, leaving 28 for the O/S and the
> three threads needing support for sleep and semaphore queues.
> Just as an example that isn't too far from what might take
> place.
>
> Jon
On Sun, 02 Jan 2011 15:28:28 +0100, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>On 01/01/11 16:05, Jon Kirwan wrote:
>
> [...]
>
>I wasn't really wanting to argue - many of the posts in this thread are
>just too long, and it would take too much time to follow everything. I
>know that many of the people here, including yourself and D Yuniskis,
>are thoughtful and experienced developers, so it is unreasonable for me
>to write off your ideas so curtly. I just think that the system DY is
>describing is complex, and seems to be based on the idea that some tasks
>(or rather, their developers) won't follow the rules for cooperating
>within the system, and yet it depends on tasks that /do/ follow the
>rules to make the system "economy" work.
I like the introspection that DY is going through. It isn't
so much the conclusions, but the path with which I'm finding
kinship.

Jon
Hi Stefan,

On 1/2/2011 6:32 AM, Stefan Reuther wrote:
>> OK, I've stewed on this "overnight" (overnap?).
>>
>> The advantages of being able to push all of the mechanism
>> into the kernel (or agent) without involving the task(s)
>> is *huge*. The whole process becomes more deterministic.
>>
>> The big (huge!) problem seems to be a direct consequence of
>> that, though. Namely, the lack of involvement of the task
>> in selecting which (if any) resources it can most afford to
>> "lose". (I think I can deal with the signaling that
>> would also be required using existing mechanisms).
>
> This is where the L4 guys stay on their convenient island saying "hey,
> we've provided a mechanism, the rest is userland" :-)
It *sounds* arrogant but, once you embrace that ideology, you can come up with much cleaner and more robust systems. The "policies" can benefit from the services and protections that are explicitly provided *to* make user-land tasks more robust! (instead of complicating the kernel with still more layers which are inherently difficult to debug)
>> First, there needs to be a means by which the task can
>> assign relative values to its "holdings". It must be
>> inexpensive to update these values dynamically -- since
>> they typically will change during execution. If the
>> cost of updating them "gets large", then it starts to
>> interfere with the gains this is intended to provide
>> (note there are undoubtedly bookkeeping costs other
>> than time).
>
> Yes. If you want this fine-grained, you'd better make it *very cheap*.
> For example, one way could be a data structure in user-land with one
> word per page / memory range containing the current value. As far as I
I'm sure I can get the "cost" down. The bigger problem was the second: how to involve the task in the decision making process WITHOUT "involving it" (i.e., having it run any code).
> remember, in L4, requesting a page is not necessarily a system call, but
> you just access it; the page fault is sent to the pager task. So if the
Correct. Same in my case. Likewise, if the page has disappeared, you can still *try* to access it -- but it will generate a fault which I can choose to handle in some predictable way (i.e., give you bogus data and signal an exception for you to handle -- so you *know* not to use that data that you MAY have fetched).
> pager can take away pages and return them with different content later,
> you'd also need a way to tell the task that the content is gone. So
> you'd probably need a user-land "check-and-lock" routine (but you'd need
> that as well when page unmap is a user-land routine, so that the
> SIGFREEMEM handler can synchronize with the interrupted user code).
If you have to tolerate the ability to lose *all* pages (including those "wired down" for the task, itself), then you have to be prepared to just kill the task.

Note that some tasks could be good candidates for this! E.g., any mechanism needs to be able to apply "grouped value" to sets of pages -- remove one and you might just as well remove them *all* (for example, some of the task's TEXT). In my cooperative scheme, a task that has been asked to relinquish resources can opt to say, "Sure, I'll just terminate myself! All that I am doing is blinking a light..."

I need to come up with an encoding scheme that lets me group arbitrary sets of "pages" into "biddable entities" (still thinking along Robert's Dutch auction line). Then, the tougher task is figuring out a way of representing conditional actions in a rigid structure: "if I lose *this* bid on this set of biddable entities, then my *next* bid would be..."
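To give a flavor of what such an encoding *might* look like (purely a sketch -- the names, sizes and fallback scheme are all invented for illustration, not an actual design):

/* Hypothetical: grouping pages into "biddable entities" with a
 * fallback bid.  The kernel/agent walks the table Dutch-auction
 * style: it lowers its asking price until a group's bid falls below
 * it, reclaims that group's pages, and consults next_bid for the
 * task's standing re-bid. */
#include <stdint.h>

#define PAGES_PER_GROUP 8

typedef struct bid_group {
    uint16_t page_ids[PAGES_PER_GROUP]; /* pages that live or die together */
    uint8_t  n_pages;                   /* how many entries are valid      */
    uint16_t bid;                       /* current claimed value of group  */
    uint16_t next_bid;                  /* re-bid if this one loses;       */
                                        /* 0 = "give up the group"         */
} bid_group_t;

static bid_group_t my_bids[] = {
    { { 4, 5, 6 },  3, 100, 40 },  /* played-out audio: if 100 loses,  */
                                   /* retry at 40, then surrender      */
    { { 10, 11 },   2, 900,  0 },  /* my own TEXT: if even this loses, */
                                   /* just kill me -- I'm only         */
                                   /* blinking a light...              */
};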
Hi Stefan,

On 1/2/2011 6:44 AM, Stefan Reuther wrote:
>>>>> You're just explaining what I shortened to "it will work": whereas a
>>>>> batch system will take as long as it takes, a real-time system must
>>>>> produce a result within certain time constraints. Those differ between
>>>>> a VoIP system and a deep space probe, but the principle is the same.
>>>>
>>>> No. A real-time system *hopes* to produce a result -- of
>>>> some time-valued function -- in some timeframe relative to
>>>> a deadline.
>>>
>>> According to my definition of real-time, it doesn't "hope", but it
>>
>> You can complain about my choice of words ("hope") -- but, despite
>> that, my point is that it doesn't HAVE TO "guarantee" results.
>> Hard deadlines have no value once they have passed. I.e.,
>> once the deadline comes STOP WORKING ON IT.
>
> Here, my lecture told me that (hard) real-time means designing a system
> so that this doesn't happen. Because if you miss one deadline, how can
> you be sure that this was just a small hiccup, and you won't miss the
> next 500 deadlines as well? By that definition, one would be able to fit
> a real-time H.264 1080p decoder on an 8051 :-)
It's called "probabilistic systems analysis" :> "What are the chances of THIS happening? And, what are the chances of my solution giving THAT result?" The problem with the "real time" world is too many terms are "overloaded". E.g., "real time" vs. "real-time" vs. "Real Time". I is much easier to think (and speak) in terms of "value functions" and "deadlines" (or their analogs). This makes it much clearer to all parties what the exact nature of the problem is and the approaches available to "solve" it. If, for example, you have "hard deadlines" in the design, that *actually* says, "get this done before the deadline OR DON'T BOTHER TO DO/FINISH IT". The presence of *soft* deadlines tells you, "Hmmm... if we can't get this done in time, how are we going to are we going to address what remains of the task/chore thereafter AND what consequences will that have on the remaining tasks in the system?" (note that you didn't have to worry about this aspect for the hard deadline tasks -- when the deadline passed, you could AND SHOULD simply forget about them.) If you treat hard deadlines as MUST be met (else the system is considered broken/failed), then anything with asynchronous inputs is a likely candidate for "can't be solved" -- because you can't guarantee that another input won't come along before you have finished dealing with the first... "running out of REAL time". Clearly, that isn't the case in the real world so, either these aren't "hard" deadlines *or* they are being missed and the world isn't coming to an end! :> Separate the *consequences* (in your mind) of the missed deadline from the processing of the task, itself. I've found this makes it a lot easier to describe what you *realistically* want the system to do "in the REAL world". E.g., just because grabbing that bottle off the conveyor belt *is* a HARD deadline, that doesn't mean that you should devote an exhorbitant amount of resources to RELIABLY catching every single one of them -- at the expense of, perhaps, OVERFILLING dozens of other bottles upstream of that! This is where desktop/"best effort" environments differ. They treat everything as "the same".
> Handling missed deadlines would be soft real-time to me. With all its
No. A missed deadline is just a notification of a "fact".

"Uh, General, Sir? The antiballistic missile failed to make its course correction at the correct time. We're about to be nuked..."

Clearly there is some value in knowing that a deadline was missed (or *abandoned*) regardless of whether it was a hard or soft deadline. Handling the missed deadline is just a means of formally recognizing the fact (you could decide to increment a counter, light a big red light, chuckle softly, etc.).

My point is, how do you *know* that you haven't missed any? You're relying on your a priori design ("on paper"). If you forgot to take something into consideration *then*, chances are you've wrongly convinced yourself that you have "solved" the problem (and will dutifully ignore all evidence to the contrary! :> )
> associated problems, like having to convince the customer that the file
> they've found which misses deadlines on every other frame is the
> absolute exception, because nobody else puts 20 Mbps video on a floppy
> disc with 99% fragmentation or something like that.
What if he wants to put a 100MB video on that floppy? Some things just can't be done. Deal with it. :>

Even consumers are starting to get enough sophistication that they understand that a machine (tends to) "slow down" when doing multiple things at once. And, that in those stressed operating conditions, things might not perform as they would "otherwise". E.g., choppy audio or video.

But, they would not be very tolerant of a video player that simply shut down (because it decided it was *broken* since it missed a deadline).
>> Soft deadlines still have (typically diminishing) value to their
>> completion after the deadline. As such, designing in that
>> scenario is harder as you have to decide how/if you should
>> continue to try to "meet the GOAL" (now that the deadline has
>> passed). HRT is easy to do -- add up the numbers, make sure
>> they fit on the timeline *before* the deadline and you are done.
>> Often, a completely static analysis can yield a "working"
>> system -- if you assume all deadlines MUST be met (most
>> RTOS's don't include support for missed deadline handlers!
>> I.e., if your HRT system misses a deadline, does it even
>> KNOW that it did??).
>
> My favourite design principle: never check for an error condition you
> don't know to handle :-)
Yeah, I think ostriches have a similar "defense mechanism". :> Not sure how effective it is, though, if the problem still exists. I try to arrange things to eliminate the possibility of errors, where possible.
> There is a proof that given a set of periods and deadlines, one can
> derive an entirely priority-based scheduling scheme, so that's what the
> RTOSes known to me offer.
This can be done with some SCHEDULING ALGORITHMS (and suitable criteria placed on the tasks). But, again, that requires you to be able to predict *reliably* the characteristics of your tasks, etc. Wonderful if *you* are driving those tasks (e.g., everything is related to some periodic interrupt that *you* have configured). But, if you are responding to the outside world, it's a lot harder to get *it* (the outside world) to adhere to your wishes!
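For reference, the proof Stefan mentions is (I believe) the classic rate-monotonic result of Liu & Layland. A toy illustration of the schedulability test it gives you -- assuming periodic tasks whose deadlines equal their periods, which is exactly the "predict reliably" precondition:

/* Rate-monotonic schedulability check (Liu & Layland, 1973):
 * n periodic tasks, deadline == period, priorities assigned by
 * period (shortest period = highest priority).  If total
 * utilization U <= n*(2^(1/n) - 1), all deadlines are guaranteed.
 * This is sufficient, not necessary -- and it assumes you can bound
 * each task's worst-case execution time, which is the hard part
 * when the outside world drives your inputs. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double exec[]   = { 1.0, 2.0, 3.0 };    /* worst-case run times, ms */
    double period[] = { 10.0, 20.0, 40.0 }; /* task periods, ms         */
    int n = 3;
    double U = 0.0;

    for (int i = 0; i < n; i++)
        U += exec[i] / period[i];

    double bound = n * (pow(2.0, 1.0 / n) - 1.0);   /* ~0.780 for n=3 */
    printf("U = %.3f, bound = %.3f -> %s\n", U, bound,
           U <= bound ? "schedulable" : "no guarantee from this test");
    return 0;
}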
> [kernel asks task to free resources]
>>> But how does the kernel know that the task is done? How can the task be
>>> sure that the kernel knows it cooperates? Let's say I'm making a task
>>> that wants to play nice. It has a SIGFREEMEM handler. It goes through my
>>> data structures and frees a lot of memory (which is BTW not trivial if
>>> it can, like a signal, happen at any time, even within malloc). How can
>>> I be sure that I'm being given the time needed to do the cleanup, and
>>> not being struck by a kernel timeout just because other tasks ate up the
>>> CPU?
>>
>> Currently, the kernel "knows" because I (good citizen) cooperate
>> and inform it when I have relinquished what I *can* relinquish.
>> If I change this interface to an asynchronous one, then the
>> kernel would have to impose a timeout on the task and/or look
>> into other ways of restricting what the task could do while
>> (allegedly) "cleaning up".
>>
>> Uncooperative tasks make life difficult! That's the whole
>> point of this :>
>
> I'm not sure I understood you correctly (maybe we mean the same thing?),
> but the problem that immediately comes to my mind is applications that
> claim to be good citizens, but by intention/bug/sabotage aren't.
> Something like a heap overwrite error causing it to run into an infinite
> loop, not finding the page to free.
"Thou shalt not release buggy code" :> Why assume the "bug" lies in the application? If you are going to *tolerate* bugs, what if the bug lies in the kernel itself?? <frown> Granted, expecting the task to be well behaved is an assumption. Just like expecting a task in a cooperative multitasking environment to relinquish the processor frequently is an assumption. These constraints/expectations don't make system designs impossible -- they just increase the amount of "due diligence" that the developer must exercise before pronouncing the system "fit". If you (i.e., *I*) can come up with a MECHANISM (again, avoiding POLICY) that is tolerant of programming errors, uncooperative or malicious tasks, etc. then you have a more robust environment to work in. "Violators will be shot" :> There are costs associated with this, of course. Both in terms of resources and performance. I *like* pushing functionality into the kernel when it can buy me some piece of mind (e.g., not having to worry about another task stomping on *my* memory; or hogging the processor; or ...)
Hi Arlet,

On 1/2/2011 6:52 AM, Arlet Ottens wrote:
> On Sun, 02 Jan 2011 06:45:48 -0700, D Yuniskis wrote:
>
>>>> You're making my point! Voluntary multitasking *works* when a *fixed*
>>>> developer (or group of developers) undertakes the entire system
>>>> implementation. They are aware of the risks and *own* the design --
>>>> if they opt to bend the rules for a particular portion of the design,
>>>> they are aware of the costs/benefits.
>>>
>>> While cooperative scheduling is unfashionable, you cannot win this
>>> argument. The reality that there are many 24/7 apps running cooperative
>>> schedulers and apps written by one team is not important. There are
>>> times when I cynically believe that software is more part of the
>>> fashion industry than any technical industry.
>>
>> <grin> I'll concede that working in such an environment requires LOTS
>> more discipline than, for example, a preemptive one -- especially for
>> real-time applications. I know when I work in such environments, I
>> shudder at the inordinate number of "yield()'s" that I do -- until I
>> look at the cost of each and see their non-impact on the overall
>> product.
>
> A preemptive environment needs a lot of discipline to solve all the
> critical sections.
Can I rephrase that for you? "A preemptive environment needs a lot of discipline to IDENTIFY all the critical sections" (?) E.g., usually, you have a mechanism to help you "solve" the problem. The trick is *finding* them all since you have to methodically think: "what happens if an interrupt happens INSIDE this statement and my task is swapped out. Can some other task come along and, coincidentally, alter something that I was in the process of using?"
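A tiny (hypothetical) example of the kind of spot you have to *find*:

/* A classic hidden critical section: "count++" compiles to
 * load/modify/store on most MCUs, so an interrupt (and task switch)
 * can land between the load and the store, losing an increment. */
#include <stdint.h>

volatile uint16_t count;       /* shared between two tasks */

void task_a(void) { count++; } /* load count; add 1; store count */
void task_b(void) { count++; } /* same -- increments can be lost */

/* Once *identified*, the fix is mechanical -- e.g., bracket the
 * read-modify-write with the platform's interrupt mask (the names
 * here are placeholders for whatever your toolchain provides): */
extern void disable_interrupts(void);
extern void enable_interrupts(void);

void task_a_fixed(void)
{
    disable_interrupts();
    count++;
    enable_interrupts();
}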
> I generally try to solve my problems with cooperative scheduling, but
> without using yield()-like mechanisms. I agree yield()'s are ugly, and
> usually a sign that the problem has been solved the wrong way.
There is some value to an explicit "yield()" -- it tells the reader "OK, this is a convenient place for some other task to run". E.g., if I see a large block of code *without* a yield() someplace in it, I am alerted to the fact that there is probably some relationship (that might not immediately be obvious) between the instructions represented in that code block that doesn't take kindly to being interrupted.

Of course, a lot depends on how expensive your task switch is. If it is expensive, then you want to minimize superfluous yield()'s. But, this comes at the cost of increased latency (for other tasks).

The point is, you can readily use a cooperative environment in a lot of applications. I surely wouldn't use a preemptive scheduler when designing a microwave oven controller (unless the microwave oven also acted as a television...). Why bear that cost when a little discipline can do the trick?
On Sun, 02 Jan 2011 14:31:05 -0700, D Yuniskis wrote:

>> A preemptive environment needs a lot of discipline to solve all the
>> critical sections.
>
> Can I rephrase that for you?
>
> "A preemptive environment needs a lot of discipline to IDENTIFY all the
> critical sections" (?)
>
> E.g., usually, you have a mechanism to help you "solve" the problem.
> The trick is *finding* them all since you have to methodically think:
> "what happens if an interrupt happens INSIDE this statement and my task
> is swapped out. Can some other task come along and, coincidentally,
> alter something that I was in the process of using?"
Sure, the discipline is mostly about identification. Solving the critical sections has a cost too, and may introduce other problems like priority inversion.
>> I generally try to solve my problems with cooperative scheduling, but
>> without using yield()-like mechanisms. I agree yield()'s are ugly, and
>> usually a sign that the problem has been solved the wrong way.
>
> There is some value to an explicit "yield()" -- it tells the reader "OK,
> this is a convenient place for some other task to run". E.g., if I see
> a large block of code *without* a yield() someplace in it, I am alerted
> to the fact that there is probably some relationship (that might not
> immediately be obvious) between the instructions represented in that
> code block that doesn't take kindly to being interrupted.
True, every rule has its exception. It's probably because I never write stuff that has big blocks of code. Most of the stuff I write only runs for short bits and then needs to wait for something, so you get automatic calls to the scheduler. I'd still frown on large blocks of code that are peppered with yields(). It's too easy to add some extra code, and forget to update them. It's different if you have a small piece of code that takes a long time, like writing flash sectors in a loop. Having a yield() just before you write a sector wouldn't be so bad.
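Something like this, say (illustrative only; flash_write_sector() and yield() are stand-ins for whatever the system really provides):

/* A long-running flash programming loop in a cooperative system.
 * Each write_sector blocks for a long time, so yield once per
 * sector to bound the latency seen by the other tasks. */
extern void yield(void);
extern void flash_write_sector(unsigned sector, const void *data);

void program_image(const unsigned char *image,
                   unsigned first_sector, unsigned n_sectors,
                   unsigned sector_size)
{
    for (unsigned i = 0; i < n_sectors; i++) {
        yield();   /* let everyone else run before the long operation */
        flash_write_sector(first_sector + i, image + i * sector_size);
    }
}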
> The point is, you can readily use a cooperative environment in a lot of
> applications. I surely wouldn't use a preemptive scheduler when
> designing a microwave oven controller (unless the microwave oven also
> acted as a television...). Why bear that cost when a little discipline
> can do the trick?
You can also run stuff in soft interrupts. Those are easier to track than tasks, and sometimes just as powerful. If your tasks end up like this:

while( 1 )
{
    wait_for_event();
    do_things();
}

then you can replace them with a soft interrupt mechanism, get rid of a task stack, and possibly simplify the critical sections.
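A minimal sketch of what such a soft interrupt mechanism might look like (all names invented; handlers run to completion on one shared stack):

#include <stdint.h>

#define N_SOFT_IRQS 8

static void (*handler[N_SOFT_IRQS])(void);
static volatile uint8_t pending;        /* one bit per soft interrupt */

void soft_irq_attach(unsigned n, void (*fn)(void))
{
    handler[n] = fn;
}

void pend_soft_irq(unsigned n)          /* typically called from an ISR */
{
    pending |= (uint8_t)(1u << n);
}

void dispatch_soft_irqs(void)           /* called from the main loop */
{
    while (pending) {
        for (unsigned n = 0; n < N_SOFT_IRQS; n++) {
            if (pending & (1u << n)) {
                pending &= (uint8_t)~(1u << n); /* needs an IRQ mask on */
                                                /* most targets         */
                handler[n]();           /* runs to completion, then     */
            }                           /* returns -- no task stack     */
        }
    }
}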
Hi Arlet,

On 1/2/2011 2:47 PM, Arlet Ottens wrote:
>>> I generally try to solve my problems with cooperative scheduling, but
>>> without using yield()-like mechanisms. I agree yield()'s are ugly, and
>>> usually a sign that the problem has been solved the wrong way.
>>
>> There is some value to an explicit "yield()" -- it tells the reader "OK,
>> this is a convenient place for some other task to run". E.g., if I see
>> a large block of code *without* a yield() someplace in it, I am alerted
>> to the fact that there is probably some relationship (that might not
>> immediately be obvious) between the instructions represented in that
>> code block that doesn't take kindly to being interrupted.
>
> True, every rule has its exception. It's probably because I never write
> stuff that has big blocks of code. Most of the stuff I write only runs
> for short bits and then needs to wait for something, so you get automatic
Yes. If the "wait" is a "system call" (however you want to define that), then it can embed a reschedule() (contrast that with spinning in a tight loop)
> calls to the scheduler. I'd still frown on large blocks of code that are
> peppered with yields(). It's too easy to add some extra code, and forget
> to update them. It's different if you have a small piece of code that
> takes a long time, like writing flash sectors in a loop. Having a yield()
> just before you write a sector wouldn't be so bad.
Or, *start* the write and then yield().
>> The point is, you can readily use a cooperative environment in a lot of
>> applications. I surely wouldn't use a preemptive scheduler when
>> designing a microwave oven controller (unless the microwave oven also
>> acted as a television...). Why bear that cost when a little discipline
>> can do the trick?
>
> You can also run stuff in soft interrupts. Those are easier to track than
> tasks, and sometimes just as powerful. If your tasks end up like this:
>
> while( 1 )
> {
>     wait_for_event();
>     do_things();
> }
>
> then you can replace them with a soft interrupt mechanism, get rid of a
> task stack, and possibly simplify the critical sections.
I have an executive that uses a structure like:

{
    ...
    do_stuff();
    mark();
    do_more_stuff();
    if (something)
        yield();
    keep_going();
    mark();
    ...
}

I.e., "yield()" relinquishes the processor but the task is resumed from the most recent "mark()". So, the task explicitly declares the points in its code that it "starts from" (mark doesn't implicitly yield).

At first blush, it looks very clumsy. But, it can be very effective (for small projects).
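For the curious, here is one *hypothetical* way to get those mark()/yield() semantics on a stackless executive, in the style of Duff's-device coroutines -- a sketch, not the actual implementation:

/* mark() records a resume point; yield() gives up the CPU; the
 * executive re-enters the function, which jumps straight to the most
 * recent mark() -- re-executing anything between mark and yield,
 * exactly as described above.  All names invented. */
typedef struct { int resume; } task_t;   /* resume initialized to 0 */

#define TASK_BEGIN(t)  switch ((t)->resume) { case 0:
#define MARK(t)        (t)->resume = __LINE__; case __LINE__:;
#define YIELD(t)       return 1   /* 1 = "run me again" */
#define TASK_END(t)    } return 0 /* 0 = "task finished" */

extern int  something;
extern void do_stuff(void), do_more_stuff(void), keep_going(void);

int blinky(task_t *t)
{
    TASK_BEGIN(t);
    do_stuff();
    MARK(t);            /* resume here after a later yield           */
    do_more_stuff();
    if (something)
        YIELD(t);       /* give up CPU; re-entry jumps to MARK above */
    keep_going();
    MARK(t);
    TASK_END(t);
}

The executive just calls blinky(&task) repeatedly for as long as it returns 1. (The usual caveat for this trick: locals don't survive a YIELD, so persistent state lives in the task structure.)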
On 02/01/2011 16:44, Jon Kirwan wrote:

> I like the introspection that DY is going through. It isn't
> so much the conclusions, but the path with which I'm finding
> kinship.
>
Fair enough. He is certainly good at showing the thought processes, and is not afraid to change his mind as he works through the ideas - that is something at least some of us can learn from.
Good morning,

D Yuniskis wrote:
> On 1/2/2011 6:32 AM, Stefan Reuther wrote:
>>> The big (huge!) problem seems to be a direct consequence of
>>> that, though. Namely, the lack of involvement of the task
>>> in selecting which (if any) resources it can most afford to
>>> "lose". (I think I can deal with the signaling that
>>> would also be required using existing mechanisms).
>>
>> This is where the L4 guys stay on their convenient island saying "hey,
>> we've provided a mechanism, the rest is userland" :-)
>
> It *sounds* arrogant but, once you embrace that ideology,
> you can come up with much cleaner and more robust systems.
> The "policies" can benefit from the services and protections
> that are explicitly provided *to* make user-land tasks
> more robust! (instead of complicating the kernel with
> still more layers which are inherently difficult to debug)
The art is in designing your mechanisms so that they are actually usable in practice, not just from an Ivory Tower. That was the point of L4: to prove that microkernels can actually be used for efficient systems. Same thing here: your idea sounded really cool to me; I just had doubts that the callback method can be implemented in a safe system.
>> Yes. If you want this fine-grained, you'd better make it *very cheap*.
>> For example, one way could be a data structure in user-land with one
>> word per page / memory range containing the current value. As far as I
>
> I'm sure I can get the "cost" down. The bigger problem was the
> second: how to involve the task in the decision making process
> WITHOUT "involving it" (i.e., having it run any code).
A task must be tracking its memory usage anyway. "This page contains only free()d memory". "This page contains already-played audio". Now it would need an experiment to figure out whether that knowledge can be "exported" to an operating system memory manager somehow in a performant way (i.e. without needing an 'mprotect' syscall for every audio sample played).
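One conceivable zero-syscall arrangement, purely as a sketch (nothing here is a real API): the task and the pager share a plain array of per-page values; the task updates it with ordinary stores, and the pager scans it read-only when it needs a victim.

#include <stdint.h>

#define N_PAGES 256

/* Mapped into both the task and the pager at setup time (how that
 * mapping is established is system-specific and not shown). */
volatile uint16_t page_value[N_PAGES];

enum { VAL_FREE = 0, VAL_REPLACEABLE = 1, VAL_PRECIOUS = 0xFFFF };

/* Task side: mark a page of already-played audio as nearly free. */
static void audio_block_consumed(unsigned page)
{
    page_value[page] = VAL_REPLACEABLE;  /* one store, no kernel call */
}

/* Pager side: pick the cheapest page to steal. */
static unsigned pick_victim(void)
{
    unsigned best = 0;
    for (unsigned i = 1; i < N_PAGES; i++)
        if (page_value[i] < page_value[best])
            best = i;
    return best;
}

Stefan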
Hi there,

D Yuniskis wrote:
> On 1/2/2011 6:44 AM, Stefan Reuther wrote:
>>> You can complain about my choice of words ("hope") -- but, despite
>>> that, my point is that it doesn't HAVE TO "guarantee" results.
>>> Hard deadlines have no value once they have passed. I.e.,
>>> once the deadline comes STOP WORKING ON IT.
>>
>> Here, my lecture told me that (hard) real-time means designing a system
>> so that this doesn't happen. Because if you miss one deadline, how can
>> you be sure that this was just a small hiccup, and you won't miss the
>> next 500 deadlines as well? By that definition, one would be able to fit
>> a real-time H.264 1080p decoder on an 8051 :-)
[...]
> If you treat hard deadlines as MUST be met (else the system
> is considered broken/failed), then anything with asynchronous
> inputs is a likely candidate for "can't be solved" -- because
> you can't guarantee that another input won't come along
> before you have finished dealing with the first... "running out
> of REAL time". Clearly, that isn't the case in the real
> world so, either these aren't "hard" deadlines *or* they
> are being missed and the world isn't coming to an end! :>
Of course this is the case in the real world, too. User inputs have debouncing, so you can be sure the user will not hit that switch more than three times in a second. Networks have bitrates, so you can be sure that you don't get more than X frames per second. Audio has sample rates, so you can be sure to receive exactly 44100 / 48000 samples per second (and have to produce the same amount). Mass storage has seek and read times. Video has frame rates. At least in the systems I work on. So I know precisely how many CPU cycles I may use to decode an MP3 frame.
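For example (my numbers, just to illustrate the arithmetic): an MPEG-1 Layer III frame carries 1152 samples, so at 44100 Hz a frame must be decoded every ~26.1 ms; on a hypothetical 100 MHz core that is a budget of roughly 2.6 million cycles, full stop.

/* Illustrative cycle-budget arithmetic for an MP3 decoder. */
#include <stdio.h>

int main(void)
{
    const double samples_per_frame = 1152.0;   /* MPEG-1 Layer III  */
    const double sample_rate_hz    = 44100.0;
    const double cpu_hz            = 100e6;    /* hypothetical core */

    double frame_time_s = samples_per_frame / sample_rate_hz;
    double budget       = frame_time_s * cpu_hz;

    printf("%.1f ms per frame -> %.0f cycles budget\n",
           frame_time_s * 1e3, budget);        /* ~26.1 ms, ~2.6M   */
    return 0;
}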
>> associated problems, like having to convince the customer that the file
>> they've found which misses deadlines on every other frame is the
>> absolute exception, because nobody else puts 20 Mbps video on a floppy
>> disc with 99% fragmentation or something like that.
>
> What if he wants to put a 100MB video on that floppy?
> Some things just can't be done. Deal with it. :>
>
> Even consumers are starting to get enough sophistication
> that they understand that a machine (tends to) "slow down" when
> doing multiple things at once.
Honestly? No.

When I buy a hard-disk recorder which claims to be able to record two channels at once and let me watch a third, I expect it to work. That's what I pay them for. Plugging a TV receiver into my computer's USB port, running three instances of an MPEG codec, and hoping for the best - that's what I can do myself.

I would accept it if the recorder says, "hey, these channels have such a high bitrate that I cannot record two of them at once". But I would not accept it "silently" damaging the recording. At least not if it does that in a very noticeable way. If it drops a single frame every three hours, I'll never notice.
> But, they would not be very tolerant of a video player that
> simply shut down (because it decided it was *broken* since
> it missed a deadline).
That's just my point: design the system so that this never happens. Sure, this is harder than doing a desktop best-effort system.
>>> I.e., if your HRT system misses a deadline, does it even
>>> KNOW that it did??).
>>
>> My favourite design principle: never check for an error condition you
>> don't know to handle :-)
>
> Yeah, I think ostriches have a similar "defense mechanism". :>
> Not sure how effective it is, though, if the problem still
> exists.
>
> I try to arrange things to eliminate the possibility of
> errors, where possible.
That's probably much the same thing. For example, every UTF-8 related document says you should treat non-minimally encoded UTF-8 runes as an error. Now what should I do? Show a pop-up error message to the user? "Hey, your playlist file contains bad UTF-8!" 95% of them do not even know what UTF-8 is.

So I ignore that problem. Which also simplifies a lot of other code, because it can assume that I'll decode every 'char*' into a 'wchar_t*'.
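For concreteness: an overlong ("non-minimal") encoding is something like 0xC0 0xAF for '/', a two-byte sequence whose code point would have fit in one byte. A hypothetical check looks like this:

/* Detecting a non-minimally encoded ("overlong") UTF-8 sequence:
 * a sequence is overlong when its decoded code point could have
 * been encoded in fewer bytes.  Minimum code point per length:
 * 2 bytes -> 0x80, 3 bytes -> 0x800, 4 bytes -> 0x10000. */
#include <stdint.h>

static int utf8_is_overlong(uint32_t codepoint, int seq_len)
{
    switch (seq_len) {
    case 2:  return codepoint < 0x80;
    case 3:  return codepoint < 0x800;
    case 4:  return codepoint < 0x10000;
    default: return 0;           /* 1-byte sequences can't be overlong */
    }
}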
[kernel asks task to free resources]
>>> Uncooperative tasks make life difficult! That's the whole
>>> point of this :>
>>
>> I'm not sure I understood you correctly (maybe we mean the same thing?),
>> but the problem that immediately comes to my mind is applications that
>> claim to be good citizens, but by intention/bug/sabotage aren't.
>> Something like a heap overwrite error causing it to run into an infinite
>> loop, not finding the page to free.
>
> "Thou shalt not release buggy code" :>
>
> Why assume the "bug" lies in the application? If you are going
> to *tolerate* bugs, what if the bug lies in the kernel itself??
> <frown>
That's why kernels are usually written by much smaller (and better) teams than user-land code. Thus the kernel can isolate the buggy tasks from the proven error-free[tm] supervisor tasks, for example. Okay, it's annoying if the MPEG decoder crashes on that particular file, but the kernel should isolate that crash from the power management task, so the device can at least be turned off without needing a powercycle. In particular if a powercycle means disassembling your car.

At least, that approach works quite well for "our" devices. Unfortunately, we cannot prove (in a mathematical sense) that our userland code is completely bug-free. I can construct a (far-fetched, unlikely) case that crashes my code, just because I simply have no idea how to reliably detect that. At least, my code crashes an order of magnitude less often than that of our favourite competitor :-)

Stefan
