Hi Marc,
Marc Jet wrote:
> I notice there is some overlap between what we came up with :-)
<grin> If you do this sort of thing long enough, you end up
touching on damn near every possible implementation strategy
(unless, of course, you take a pedestrian approach to it and
"settle" for "the first thing that works" :> )
>> Do you put the Scheduler into the same set of signal consumers?
>> (that would be cool as it would let scheduling be treated as just
>> another "thread")
>
> I don't understand the question. Maybe the comments below resolve it,
> otherwise please elaborate. What can be said is that both the ISR and
> Attach/Detach work at the same abstraction level (they reprogram the
> timer hardware). On the task switching level, the ISR executes
> asynchronously and interacts with the scheduler, while Attach/Detach
> always execute in the context of the calling task. Due to a potential
> race, Detach sometimes also interacts with the scheduler.
I *think* that's close to what I am suggesting (asking) above
(barring terminology).
I've taken several approaches to the *basic* structure of
timer/scheduler over the years (here, I am ignoring the
implementation of the actual timers themselves and looking,
rather, at how the "timing service" relates to the "scheduling").
By far, the most common approach has been to create a periodic
interrupt ("tick") that the hardware uses to signal a *timing*
service. This directly or indirectly updates the "timers"
(how is not important). A "special" timer is the "scheduling
timer" that defines the time left on the executing thread's
timeslice. When that timer expires, the scheduler is invoked to
"preempt" the current thread [there are lots of caveats, here,
that can be ignored... important thing is "scheduler sits atop
timing service"]. Again, the mechanism by which the scheduler
is invoked is not important (and can be varied).
[Note that the "tick" can thus be semantically different from
the "jiffy". Further, that threads can have different size
timeslices -- e.g., this allows finer partitioning of the
CPU *within* a priority level]
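A minimal sketch of that structure (the names, and keeping the
"scheduling timer" as a per-thread countdown, are mine, just for
illustration):

/* Hypothetical tick ISR: the timing service owns the interrupt;
 * the scheduler just "rides" on one special per-thread countdown.
 */
typedef struct thread {
    unsigned timeslice;            /* ticks granted at activation  */
    unsigned remaining;            /* ticks left in this timeslice */
    /* ... rest of the TCB ...                                     */
} thread_t;

extern thread_t *current;          /* thread currently executing   */
extern volatile unsigned jiffies;  /* free-running time base       */

extern void update_timers(void);   /* age the general purpose timers */
extern void preempt(void);         /* invoke the scheduler         */

void tick_isr(void)
{
    jiffies++;                     /* here, "tick" == "jiffy"      */
    update_timers();               /* however the service does it  */

    /* the "scheduling timer": when it runs out, the scheduler is
     * invoked -- i.e., the scheduler sits atop the timing service
     */
    if (current->remaining != 0 && --current->remaining == 0)
        preempt();
}

Each thread carrying its own "timeslice" value is what buys the finer
partitioning of the CPU within a priority level.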
Also common with this scheme, time doesn't exist outside of
the scope of the "tick". A practical consequence of this is
that a thread that yields/blocks just prior to its timeslice
ending (to be pedantic: "just prior to the next tick") can
*cheat* the next thread to execute out of a portion of its
timeslice. This is because the scheduler runs (because of the
yield) AS IF it had run at the preceding tick. So, the next
thread is activated... and a tick comes along (coincidentally)
immediately thereafter! :>
You can work around this by crediting the (cheated) thread with a partial
tick (in effect, increasing its timeslice by [0,1) tick) -- but
that just adds a bias in the other direction (i.e., instead of getting
cheated out of a tick, it *gets* some portion of a free tick).
To "fix" this problem, you can have the scheduler restart the
"tick interval" as it activates the next thread. In effect,
resynchronizing the tick with the jiffy at each thread activation.
In that case, a thread that does NOT block or yield is guaranteed
its entire timeslice.
But, now the tick is running at a varying frequency. (and, the
scheduler is having to muck with hardware!) This complicates
use of the tick (which drives the jiffy) for reliable timing.
You can "fix" this problem by running the timing services independant
of the scheduler (e.g., another "clock"). Or, you can "do the math"
(keep track of residuals) to track the "jitter" in this "clock"
so that scheduling/timing are unaffected (i.e., compensated).
Or, just live with the problem. (i.e., pretend it doesn't happen often
and/or that it happens "equally" to all threads)
What I was suggesting was letting the timing service run the
IRQ exclusively. It peeks at an ordered queue of "timers"
and programs the hardware timer (clock) with the period of
the next timer (at head of queue). I.e., this is the soonest
"thing" (temporal event) that is going to happen so any "clock"
IRQ's prior to that would be "uneventful".
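Roughly (the queue representation and names here are just for
illustration):

/* Hypothetical: timers kept in a list ordered by absolute expiry.
 * The hardware comparator is only ever armed for the *earliest* one,
 * so the "clock" interrupts only when something is actually due.
 */
typedef struct timer {
    unsigned long  expires;        /* absolute time, timer clock cycles */
    void         (*fire)(void *);  /* what to do when it goes off       */
    void          *arg;
    struct timer  *next;
} timer_t;

static timer_t *queue;             /* head = soonest event              */

extern unsigned long now(void);
extern void arm_comparator(unsigned long when);

void timer_isr(void)
{
    /* pop and fire everything that has come due */
    while (queue != NULL && queue->expires <= now()) {
        timer_t *t = queue;

        queue = t->next;
        t->fire(t->arg);           /* e.g., poke the scheduler          */
    }

    if (queue != NULL)
        arm_comparator(queue->expires);  /* next "eventful" instant     */
}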
Now, have the scheduler feed events (of which there should be
at most one?) into that same queue. So, coincident with
activating the next
thread to execute, the scheduler injects a timer event into that
queue with an expiration time of "now + timeslice" -- and *remembers*
a handle for that event!
If the scheduler gets invoked prior to that time, it has the
responsibility of deleting that (no longer pertinent) event
and reinserting a *new* one for the thread that it is activating
*now* (which may have a different timeslice!).
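In code, something like this -- timer_insert()/timer_remove() are
assumed to operate on that same ordered queue and hand back/accept
a handle:

/* Hypothetical scheduler side: on every activation, cancel the old
 * timeslice event (if it hasn't fired) and post a fresh one for the
 * thread being dispatched, remembering the handle.
 */
typedef struct timer timer_t;              /* as in the sketch above    */
typedef struct thread {
    unsigned long timeslice;               /* in timer clock cycles     */
    /* ... */
} thread_t;

extern timer_t *timer_insert(unsigned long expires,
                             void (*fire)(void *), void *arg);
extern void     timer_remove(timer_t *t);
extern unsigned long now(void);
extern void preempt(void *unused);

static timer_t *slice_event;               /* the remembered handle     */

void dispatch(thread_t *next)
{
    if (slice_event != NULL)
        timer_remove(slice_event);         /* no longer pertinent       */

    slice_event = timer_insert(now() + next->timeslice, preempt, NULL);

    /* ... actual context switch to 'next' ... */
}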
The problem with this approach is there is nothing that limits
the granularity of events in that queue. E.g., if you allow time
to be specified in "timer clock cycles", then you could
conceivably have the two events in the queue differing by only
a few clock cycles -- too close temporally to be handled as
distinct events. (so, you want to restrict the values
that can be used for "times" -- which brings you back to the
first implementation -- effectively)
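e.g., the crudest form of that restriction (the grain value is just
an assumption):

/* Hypothetical: round every requested expiry up to a minimum grain so
 * two queued events can never be closer together than the service can
 * actually distinguish.
 */
#define GRAIN  64UL                /* timer clock cycles; assumed value */

unsigned long quantize(unsigned long when)
{
    return ((when + GRAIN - 1) / GRAIN) * GRAIN;
}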
>> It still seems to avoid the (to me) obvious chances for
>> optimization/exploitation -- e.g., these timeouts aren't
>> "general purpose timers" that can be used to initiate
>> events. They are *always* used in the same way -- to
>> unblock a thread waiting on some service/resource.
>> If all services are synchronous, then a thread can ONLY
>> be blocking on one at a time. I.e., there is only need for
>> one "service timer" per thread. And, that the use of that
>> timer is preordained.
>
> Not necessarily. In the async API described in my previous post, the
> "timer signal" describes a moment in time and triggers when that
> moment is reached. The trigger mechanism is implemented using my
> abstract concept of signal producers (which in this case represents
> the timer ISR side), and the trigger event manifests as signal on the
> sensor (which always represents a task).
>
> A task can block on one or multiple sensors. The obvious use case for
> blocking on a single sensor is the sleep() function. A use case for
> multiple sensors, is to wait for IO to finish plus a timeout.
>
> At first sight it doesn't make sense to simultaneously block on
> multiple timer signal sensors, because there is always one to time out
> first. But even that is a possible use case! Specifically when a
> function implements own timeouts and ALSO honors timeouts from the
> upper layers (e.g. given as function argument). If it wasn't allowed
> to block on multiple timeouts simultaneously, it would have to deep-
> inspect them all and find the first one to expire.
Correct. I allow any number of "general purpose" timers.
Note that I also allow timers to be inspected. I.e., a
consumer can "time" how long something takes (and that
thing can, in turn, time how long parts of *it* take, etc.).
This is essential if you want your code to adapt to the
temporal constraints of its environment (instead of just
breaking -- missed deadline -- as load increases). It
makes more sense to implement "timers" (as in "measure
elapsed time") this way and have the consumer track the
"initial value" than to have the timing service do it.
> Note that my signal sensors and producers are generic entities, while
> the timer signal is a specific implementation using them. Although
> it's the most prominent one that I made and use, the producer / sensor
> concept is generic IPC and removes the task scheduling aspects from
> drivers and ISRs.
Understood.
> BTW your example code snippet, ported to my polled API, would look
> almost identical. Except there is no handle + struct/object, but
> rather the handle IS the struct/object.
Yes -- but then you can't drag the timer out of the user's
address space. I.e., the user can always directly access
the contents of the timer struct in your implementation.
My API allows the OS to "hide"/protect the timer *from*
the "user" (a bug in your code overwriting some key part
of the struct could crash the timer service, for example).
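i.e., the difference is roughly this (names invented):

/* Hypothetical handle-based API: the timer's representation lives in
 * the OS (possibly not even in the caller's address space); the user
 * only ever holds an opaque handle.
 */
typedef int timer_handle_t;                     /* opaque to the user  */

extern timer_handle_t timer_create(unsigned long initial);
extern unsigned long  timer_value(timer_handle_t h);
extern void           timer_destroy(timer_handle_t h);

/* versus the struct-is-the-handle style, where a stray write through
 * a field like 'expires' or 'next' can corrupt the timer service:
 *
 *   struct timer t;
 *   timer_init(&t, initial);
 */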
> This is a very important thing to understand. There can be as many
> independent time references as the application desires. They can be
> duplicated, passed to other modules, and dropped without caring about
> storage ownership.
Yes, though see above.
> From the multitasking point of view, with the polled API there's no
> effect beyond the current task.
Yes, the current task (thread) essentially acts as its own
timing service. OTOH, there are no guarantees that it will
promptly "see" an expired timer (e.g., the entire thread may
be blocked when the timer expires; or, the thread may not
be executing that code when the timer actually expires).
The first MTOS that I designed used a similar timing service
except the timers were updated by a "timer thread" (i.e.,
timers were no different from any other "task"/service).
The timer thread, in turn, polled a global variable updated
by the tick ISR (i.e., nothing happened in the tick ISR
other than incrementing a counter). Timers were arranged
in an array (that the timer thread managed). A consumer
could directly reference the value *in* a timer (e.g.,
by accessing "timer[i]").
This made for an incredibly powerful programming model.
Since checking the timer was very efficient, you wrote
code like:
if (DATA_AVAILABLE != check_uart()) {
    /* nothing here, yet */
    if (timer[UART_TIMEOUT] > 0) {
        /* timer hasn't yet expired */
        blink_light();
        /* keep waiting */
        yield();
    } else {
        /* timeout waiting for data */
    }
} else {
    /* received data; process it */
}
I.e., it was *faster* to just peek at the timer's
current value and "yield" than it was to implement
an API to "wait_for_timer()". It also let you *do*
things while waiting which, otherwise, required a
separate thread *or* a heavier implementation.
(I still use this approach on *tiny* designs)
> With the async API on the other hand,
> there's a common (sorted) list for all timer signals that are
> "attached" to the global producer (which basically is the timer ISR).
> "Attaching" DOES affect beyond the scope of the current application
> (cpu-wise, not memory wise). More timer signals on the list
> translates to slower attaching for everybody, and also more ISR
> activity.
>
>> I think I'll see what the cost of including it as part of
>> the thread state and an implicit part of the scheduler gives
>> me in terms of performance and cost.
>
> Merging my polled API into the thread would not be beneficial, because
> it would prohibit using several timeouts at once. They are useful for
> nested timeouts, e.g. when a subfunction decides to implement a
> timeout without exposing it on the API. Also, in more complex
> state machines I often find myself using several time references
> simultaneously (e.g. last character received, last some_event, last lcd
> update).
>
> If you referred to merging the async timer signals into the thread, it
> may have less negative impact - but still has. It would reduce them
> to really just be "an absolute timeout" for "the current activity".
Exactly. The "current activity" is a service invocation.
Since service invocations are synchronous (my choice), there
can be only one (for each thread). Note that this does not
preclude having general purpose timers -- as many as you
want -- in addition to this "service timer". I'm just
exploiting the fact that the "service timer" has a fixed
and inflexible role. It need not have the same capabilities as
a general purpose timer (e.g., it *belongs* to a particular
thread; something that isn't necessarily a requirement to
be imposed on general purpose timers).
I think this will let me make it lightweight enough that
it ends up a net savings over using a general purpose timer
for the same purpose (both in terms of space *and* time).
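e.g., a minimal sketch of what I have in mind (field and function
names invented):

/* Hypothetical: the "service timer" is just a field in the TCB, armed
 * on entry to a (synchronous) service call and cleared on return.
 * Since a thread can only be in one service call at a time, one field
 * suffices -- no allocation, no queue entry, no general machinery.
 */
typedef struct thread {
    unsigned long service_deadline;  /* absolute time; 0 = not armed */
    /* ... the rest of the TCB, general purpose timers, etc. ...     */
} thread_t;

extern unsigned long now(void);

void arm_service_timer(thread_t *t, unsigned long timeout)
{
    t->service_deadline = now() + timeout;
}

void clear_service_timer(thread_t *t)
{
    t->service_deadline = 0;
}

int service_timed_out(const thread_t *t)
{
    return t->service_deadline != 0 && now() >= t->service_deadline;
}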
> It would mean that there (suddenly) must be a strict boundary between
> OS functionality (which unconditionally aborts all action upon
No. OS doesn't have to "abort" anything. I.e., if I were
to implement deadline handlers to do this same sort of thing,
then they *would* (typically) abort pending/blocked operations
9service calls). [this makes coding the application simpler
but adds a lot of machinery in the OS -- as well as forcing the
application to handle the asynchronous nature of aborted
services].
If, OTOH, I adopt the *practice* that every service supports
the concept of a timeout and is *coded* with this in mind,
then they can assume (at some cost) the responsibility of
watching -- and compensating -- for their own missed deadlines.
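i.e., each service ends up shaped something like this (a sketch; the
names are mine):

/* Hypothetical shape of a service that honors a caller-supplied
 * timeout itself, rather than being aborted from outside by the OS.
 */
extern unsigned long now(void);
extern int  try_acquire(void);    /* non-blocking attempt; stands in  */
                                  /* for whatever the service does    */
extern void yield(void);

int acquire_with_timeout(unsigned long timeout)
{
    unsigned long deadline = now() + timeout;

    while (!try_acquire()) {
        if (now() >= deadline)
            return -1;            /* the service reports its own miss */
        yield();                  /* and can retry/compensate instead */
    }

    return 0;
}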
> timeout) and application functionality (which initiates retry
> mechanisms and sane error handling, instead of just passing the
> timeout error upstream). Also, to a lesser degree the previous
> paragraph also applies here.