
"Efficient" timer implementations

Started by D Yuniskis December 6, 2010
Hi,

In my latest RTOS, I have timers on every service (i.e.,
every service invocation can return ERROR_TIMEOUT in
addition to SUCCESS or FAILURE).  But, the common
approaches to providing timers (not just timeOUTs)
don't "feel right" when applied consistently like this.
I'm wondering if there is something I can exploit to
"cut a corner" in my implementation...

E.g., allocating a "pool" of timers requires careful
sizing of that pool -- lest a service end up having
to support a NO_TIMER_AVAILABLE error code (frown).
OTOH, dynamic allocation (potentially) ends up with
*lots* of timers having to be managed (overhead).

Consider, most timers will (presumably) NOT expire
in normal use so the cost of maintaining them should
be low -- foolish to allocate lots of resources to
something that you hope not to "use".

The idea of a "Timer per Service" doesn't work because a
service can be multithreaded (in "reality" or in "effect").
I.e., that would serialize all accesses to the service.

A "Timer per Thread" is a potential solution -- *if*
you also allow "general purpose" timers to coexist
with that "service timer".  (i.e., a thread could
want to start a general purpose timer, *then* invoke
a service with another "timeout" WITHOUT having to
explicitly share that "general purpose" timer with
the service provider).

[N.B. this is an option but adds a lot of complexity
to timer management -- and would limit the number of
such general purpose timers that the thread could
create]

[Also, note that this could complicate the implementation
of *layered* services]

I could potentially trade away the ability to infer
*elapsed* time from the examination of a (running)
timer -- but, it is imperative that a timer be an
independent entity in its own right (e.g., so you
can pass the "time remaining" from one *completed*
service call to the *next* service called).

Or, is this just one of those areas where there is no
"clever" solution?

Thx,
--don
On Dec 6, 2:18 pm, D Yuniskis <not.going.to...@seen.com> wrote:
> [full text of original post quoted - snipped]
I don't quite understand your constraints.  Is it the overhead of
managing the very large set of timers?  The usual priority queue is
O(log n) in the number of timers for inserting, deleting and firing
timers.

An alternative, if you have a fixed tick (say 1ms), is to use a series
of tiers of exponentially coarsening buckets for timers.  For example,
have 128 buckets for timers at 1ms resolution, then 128 at 64ms
resolution, 128 at 4096ms...  Structure each tier as a circular list
of buckets.  Insert a new timer in the linked list for the best bucket
(IOW, if the new timer expires in the next 128ms, insert into the
appropriate 1ms bucket; if it expires in more than that, but less than
8192ms, use the 64ms buckets).  Every tick you just fire all the
timers in the next 1ms bucket.  Every 64ms redistribute the timers in
the next 64ms bucket into the 1ms buckets; repeat every 4096ms for the
next bucket in the third (4096ms) tier, etc.

Basically that does one linked list insertion and deletion for timers
firing in the next 128ms, 2 for those in the next 8192ms, 3 for 524s,
etc.  A handful of tiers gets you to timers as long as you practically
need (for the above parameters, seven tiers gets you to 278 years).

The simple design has one flaw, in that you get big bursts of
redistribution work at the 2**6, 2**12, 2**18... ticks, but you can
spread that out without too much added complexity by redistributing on
each tick, but only ceil(1/64th) (etc.) of the contents of the bucket
being redistributed.

Obviously the length of the clock tick and the number of buckets in
each tier can be varied (although reducing the size of the tiers
increases the number of redistributions done over the life of a
timer).
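A minimal sketch of the tiered-bucket structure described above, in C.  The names, the three-tier limit, and the 1ms tick unit are illustrative assumptions, not anyone's actual implementation; firing and redistribution are left to comments:

```c
#include <stddef.h>

#define TIERS    3          /* resolutions: 1ms, 64ms, 4096ms         */
#define BUCKETS  128        /* buckets per tier                       */

struct wtimer {
    struct wtimer *prev, *next;   /* doubly linked for O(1) removal   */
    unsigned long expires;        /* absolute expiry, in 1ms ticks    */
};

/* Circular arrays of bucket list heads, one array per tier. */
struct wtimer bucket[TIERS][BUCKETS];

void wheel_init(void)
{
    for (int t = 0; t < TIERS; t++)
        for (int b = 0; b < BUCKETS; b++)
            bucket[t][b].prev = bucket[t][b].next = &bucket[t][b];
}

/* Tier 0 covers the next 128 ticks, tier 1 the next 8192, etc. */
int pick_tier(unsigned long delta)
{
    unsigned long span = BUCKETS;
    int t = 0;
    while (t < TIERS - 1 && delta >= span) {
        span *= 64;                   /* each tier is 64x coarser     */
        t++;
    }
    return t;
}

/* Insert a timer expiring `delta` ticks after `now`: exactly one
   linked list insertion, regardless of how many timers are active.
   Each tick would fire everything on bucket[0][now % BUCKETS];
   every 64 ticks the next tier-1 bucket is redistributed downward. */
void wheel_insert(struct wtimer *tm, unsigned long now, unsigned long delta)
{
    int t = pick_tier(delta);
    unsigned long res = 1;            /* tier resolution in ticks     */
    for (int i = 0; i < t; i++)
        res *= 64;
    struct wtimer *head = &bucket[t][((now + delta) / res) % BUCKETS];
    tm->expires = now + delta;
    tm->next = head->next;  tm->prev = head;
    head->next->prev = tm;  head->next = tm;
}
```

Abandoning a timeout before it fires is then just an unlink via its `prev`/`next` pointers, with no search.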
Hi Robert,

robertwessel2@yahoo.com wrote:
> [original post snipped]
>
> An alternative, if you have a fixed tick (say 1ms), is to use a series
> of groups of exponentially increasing buckets for timers.  For
> example, have 128 buckets for timers at 1ms resolution, then 128 at
> 64ms resolution, 128 at 4096ms...  [...]  Every 64ms redistribute the
> timers in the next 64ms bucket into the 1ms buckets, repeat every
> 4096ms for the next bucket in the third (4096ms) tier, etc.
This is similar to my current implementation -- however, I don't
rigidly define the "intervals" in each "bucket".

E.g., currently, a thread defines a Timer, then passes that timer to a
service invocation and blocks on the result.  If the service completes
successfully, you can choose to examine the Timer (i.e., to determine
how long you have blocked), can discard it or can pass the "time
remaining" on to some subsequent service invocation (typically some
OTHER service).  If the service failed due to a TIMEOUT, you
implicitly know that the Timer has "0" left on it.  If the service
failed for other reasons, then you address those problems.

[Note that the timer presently exists in user space!]

The RTOS keeps all "running" Timers on a queue sorted by expiration
time.  It separately manages a (hidden) list of pointers *into* that
queue to expedite insertion, etc.  The size of this "list of pointers"
varies as a function of the number of active Timers.  This lets the
maximum "work" performed for an insertion be bounded by limiting the
length of the longest (sub)queue that has to be parsed.

E.g., if there are 45 active timers, I could opt to mark the list of
45 timers in 4 groups of ~12 timers each -- note that the time
interval represented in each such "subqueue" varies based on the times
present *on* the timers in that "subqueue".  If the number of timers
suddenly increases to 136, these 4 groups would mean the length of
each "subqueue" has now increased to ~35 from that ~12 -- and the
average time for parsing a queue would increase threefold.  In this
case, the OS can lengthen the "hidden list"... maybe opting for 10
groups of timers.

Maintaining these lists incrementally is relatively inexpensive -- if
a timer is deleted/inserted, at most one timer shifts into/out of each
group (i.e., the whole group need not be reexamined/reshuffled).
> Basically that does one linked list insertion and deletion for timers
> firing in the next 128ms, 2 for those in the next 8192ms, 3 for 524s,
> etc.  A handful of tiers gets you to timers as long as you practically
> need (for the above parameters, seven tiers gets you to 278 years).
I opted for the "cut into N timers" instead of "N intervals" as most
timers will tend to be similar time intervals.  I.e., there won't be
any in the "278 year" tier.  :>  And, "steady state", threads will
keep recreating the same timers over and over again as they keep
accessing the same *services* over and over again, etc.  [this is
actually a falsehood but the details are unimportant, here]

I store only deltas in each "timer" so scheduling the next "timer
expiration" is easy -- just look at the delta in the timer at the head
of the queue.  OTOH, finding the expiration time (i.e., "time
remaining") of the Nth timer in the queue requires examining all of
the timers expiring *before* it.  [This shows the bias towards short
intervals and the fact that most timers will be used as timeOUTs.]
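The delta-list idea (each timer stores only its offset from its predecessor, so the tick handler touches nothing but the head) can be sketched as below.  The names are mine for illustration, and the "hidden list" of group pointers that bounds the insertion walk is omitted:

```c
#include <stddef.h>

struct dtimer {
    struct dtimer *next;
    unsigned delta;     /* ticks after the PREVIOUS timer in the queue */
};

struct dtimer *dq_head;

/* Insert a timer expiring `ticks` from now.  Cost is the timer's
   position in the queue -- which the group-pointer list would bound. */
void dq_insert(struct dtimer *t, unsigned ticks)
{
    struct dtimer **pp = &dq_head;
    while (*pp && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;            /* consume earlier deltas    */
        pp = &(*pp)->next;
    }
    t->delta = ticks;
    t->next = *pp;
    if (*pp)
        (*pp)->delta -= ticks;            /* successor stays relative  */
    *pp = t;
}

/* One clock tick: only the head's delta is examined.  Returns an
   expired timer, or NULL.  (A real tick handler would loop, since
   several timers can carry a zero delta and expire together.) */
struct dtimer *dq_tick(void)
{
    struct dtimer *t = dq_head;
    if (!t)
        return NULL;
    if (t->delta > 1) {
        t->delta--;
        return NULL;
    }
    dq_head = t->next;
    return t;
}
```

The trade-off mentioned in the text is visible here: the head's time-to-fire is immediate, but the "time remaining" on the Nth timer requires summing the N deltas ahead of it.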
> The simple design has one flaw, in that you get big bursts of
> redistribution work at the 2**6, 2**12, 2**18... ticks, but you can
> distribute that without too much added complexity by redistributing on
> each tick, but only ceil(1/64th) (etc.) of the contents of the bucket
> being redistributed.
>
> Obviously the length of the clock tick and the number of buckets in
> each tier can be varied (although reducing the size of the tiers
> increases the number of redistributions done over the life of a
> timer).
In my scheme, you have to do a "redistribution" for each G timer
"inserts + deletes" (where G is the number of groups in the hidden
list).

Having said all that... :>

The bigger question: is there a way of bounding the number of timers
so that I can determine implementation costs at design time instead of
run-time?  e.g., my suggestion of a "Service Timer" (per thread) seems
like this does that (ASSUMING ALL SERVICES ARE SYNCHRONOUS!).
However, I think it would require each service to execute on the
invoking thread's stack -- or, force constraints on the services
regarding what they can use, etc. (or, otherwise constrain the design
of the service, e.g., thread per client).

[as I said before, this doesn't address the "general purpose timers"]
On Tue, 07 Dec 2010 02:53:43 -0700, D Yuniskis wrote:

> The bigger question: is there a way of bounding the number of timers so
> that I can determine implementation costs at design time instead of
> run-time?  e.g., my suggestion of a "Service Timer" (per thread) seems
> like this does that (ASSUMING ALL SERVICES ARE SYNCHRONOUS!).  However,
> I think it would require each service to execute on the invoking
> thread's stack -- or, force constraints on the services regarding what
> they can use, etc. (or, otherwise constrain the design of the service
> e.g., thread per client)
You could make all the threads responsible for managing their own
memory resources.  Assuming that you're working in C, a timer object
could be a struct that includes the linked-list pointers.  For
example:

    struct timer {
        struct timer *prev;
        struct timer *next;
        int period;
        // whatever else you need for the timer object
    };

In the thread, you then have the option of allocating timers
statically, dynamically, or even on the stack:

    f()
    {
        struct timer x;

        init_timer( &x, ... );
        wait_timer( &x );
        del_timer( &x );
    }

The RTOS kernel would then just use the prev, next pointers of the
timer struct, so you can avoid any additional memory allocation in the
kernel.
On Dec 7, 3:53 am, D Yuniskis <not.going.to...@seen.com> wrote:
> [earlier discussion snipped]
>
> I opted for the "cut into N timers" instead of "N intervals"
> as most timers will tend to be similar time intervals.
> [...]
>
> In my scheme, you have to do a "redistribution" for each G timer
> "inserts + deletes" (where G is the number of groups in the
> hidden list).
The scheme with the fixed buckets has a couple of advantages.

First, there's no real computation done at the tick - you just grab
the linked list of timers in the next 1ms bucket (and the list heads
are a circular structure, so there's never any allocation of new list
heads), fire them all, and you're done.  A timer that's abandoned
before being fired (as you'd expect for most timeouts) is just a
deletion from its current linked list.

Second, the cost of managing the timers is basically fixed based on
the interval set, and is basically always a small integer number of
linked list insertions and deletions (no more than seven for the
aforementioned 278 years).  The only time you compute the time is at
the initial insertion of the timer or at one of its redistributions
(FWIW, the redistributions are marginally simpler, since you don't
have to decide which tier you're going to insert into - always the
next finer one).  The number of timers in the system does not impact
the cost per timer at all.

If many of your timers are timeouts which do not need fine resolution
and wide range, you could have a second set of buckets at (say) 1s
resolution, and enough of them to cover almost all of your timeouts,
and then you don't even do any redistributions, although they'll fire
in bigger clumps.  OTOH, if most timers do not actually fire, some
judicious picking of tier sizes can ensure the same thing - in the
above, for example, if most timers are between 8s and 262s, and are
abandoned at least 8192ms before they expire, they're never
redistributed (and the ones that do live long enough to fire would be
redistributed twice).
> Having said all that... :>
>
> The bigger question: is there a way of bounding the number
> of timers so that I can determine implementation costs at
> design time instead of run-time?  e.g., my suggestion of a
> "Service Timer" (per thread) seems like this does that
> (ASSUMING ALL SERVICES ARE SYNCHRONOUS!).  However, I think
> it would require each service to execute on the invoking
> thread's stack -- or, force constraints on the services
> regarding what they can use, etc. (or, otherwise constrain
> the design of the service e.g., thread per client)
You could embed a timer object in each object that might need one.
They're not necessarily very big: a pair of linked list pointers, a
pointer to the fire routine, a word of user data and a ~64 bit time.
So ~24 bytes on a 32 bit system.

Obviously you might have timers unrelated to a specific object, and
embedding them somewhat assumes a single address space OS.  Of course
that just moves the problem elsewhere, but presumably you're having to
deal with the allocation of those other objects anyway.
In my own embedded OS, which evolved over many years and several
platforms, I found the following to be the simplest yet most flexible
approach:

Depending on the hardware, I identify the smallest time granularity.
This could be CPU cycles for platforms with a cycle counter, or the
timer register readable state (after prescaler).  This I define as my
software timer granularity, and choose a type big enough to last
"forever".  64 bits have been enough for my projects so far, and
fortunately a 64 bit "unsigned long long" is supported by most
compilers for register passing and function result.  This type I
define as "tim64".

The HAL provides a function, timGet(), which assembles the current time
as tim64 (sometimes just from hardware registers, sometimes through
race resolution between hardware timers and ISR software state).  Note
that this is not wallclock time, but just an ever increasing value
that starts at 0LL during power up.  There are also functions to add
and subtract NS, US, MS and seconds, which implicitly contain the
conversion between hardware cycles and human readable time scales.

Based on this, I create a "polled" API.  It is basically timGet() +
timElapsed() for knowing how much time has passed since some moment.
And timSet(ms) + timIsOver() for polled loops (timSet consists of
timGet + timAdd).
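A sketch of what such a polled API might look like.  The conversion factor (1000 counts per millisecond) and the `fake_now` variable standing in for the HAL's hardware read are purely illustrative assumptions:

```c
#include <stdint.h>

typedef uint64_t tim64;

/* Hypothetical stand-in for the HAL's hardware read; a real port
   would sample a free-running counter (resolving any races between
   hardware timers and ISR-held software state). */
tim64 fake_now;
tim64 timGet(void) { return fake_now; }

/* Illustrative conversion: assume 1000 counts per millisecond. */
#define COUNTS_PER_MS 1000u

tim64 timAddMs(tim64 t, uint32_t ms) { return t + (tim64)ms * COUNTS_PER_MS; }
tim64 timElapsed(tim64 since)        { return timGet() - since; }
tim64 timSet(uint32_t ms)            { return timAddMs(timGet(), ms); }

/* Compare the signed difference, so the test stays correct across
   counter wraparound (moot at 64 bits, but cheap insurance). */
int timIsOver(tim64 deadline) { return (int64_t)(timGet() - deadline) >= 0; }
```

A polled loop is then just `tim64 d = timSet(30); while (!timIsOver(d)) { /* poll device */ }` -- no allocation, no list, only a register or autovar.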

For asynchronous waiting, my world consists of signal producers and
signal sensors.  A timer can produce a signal (when time is over), and
the application can sleep until its sensor receives such a signal.  My
timer signal is a small struct which combines the standard signal
sensor and the tim64 target time.  Note that this is an extra API,
layered on top of the polled one.

The API provides timSigCreate() to create a signal, in memory provided
by the application (usually an autovar).  With timSigAttach() it is
queued into a sorted list of active timers, and timSigDetach()
reclaims ownership of the memory by cancelling/removing from the list.

The hardware ISR reprograms a hardware timer to trigger when the next
queued timer signal wants to be awakened, and signals the
corresponding tasks as their events happen.  timSigAttach() and
timSigDetach() resolve the potential races, and reprogram the hardware
timer too.  Often, the hardware timer has a different granularity than
tim64, so conversion is necessary here, too.
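The attach/detach bookkeeping might look roughly like the following; the struct layout and function bodies are my guess at the shape being described, with the interrupt masking and hardware-timer reprogramming reduced to comments:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t tim64;

struct tim_sig {
    struct tim_sig *next;
    tim64 when;          /* absolute tim64 deadline */
    /* ...plus the standard signal-sensor fields in the real struct */
};

static struct tim_sig *active;    /* sorted list, earliest first */

/* The caller owns the memory (usually an autovar); attaching merely
   links it in.  Real code would mask interrupts around the list ops
   and reprogram the hardware timer whenever the head changes. */
void timSigAttach(struct tim_sig *s, tim64 when)
{
    struct tim_sig **pp = &active;
    s->when = when;
    while (*pp && (*pp)->when <= when)
        pp = &(*pp)->next;
    s->next = *pp;
    *pp = s;
}

/* Detaching cancels the timer and returns ownership of the memory. */
void timSigDetach(struct tim_sig *s)
{
    for (struct tim_sig **pp = &active; *pp; pp = &(*pp)->next)
        if (*pp == s) { *pp = s->next; return; }
}

/* The deadline the ISR should program next, or 0 if none pending. */
tim64 timSigNext(void) { return active ? active->when : 0; }
```

Since the application supplies the storage, the number of pending timer signals is limited only by the callers' own memory, which is the scaling property claimed above.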

The signal sensor stuff is common over many OS services, and a task
can wait on multiple signal sensors. This enables the timer signals to
be used as timeouts for services that are themselves asynchronous.

The philosophy of this is to push the storage burden to the
application that requests a service.  This helps scaling a lot, and
there is almost no "unjustified" overhead.

For the polled API, just CPU registers or an autovar are enough.  Time
handling can often consist of just a few hardware register reads with
no or little conversion.  By not using the other API, a statically
linked image can be tiny.

On the other hand, there is absolutely no limit for asynchronous
timers.  Yet they can still be efficiently mapped into the .bss
section.  They can live as autovar in a "function with timeout", or as
member of a handle that your service provides.  There is no dynamic
memory allocation involved (unless you add it yourself), and no
"out of memory" handling.

I found that this approach is much lighter on resources, and that this
outweighs the disadvantages (which are the memory ownership transfer
in timSigAttach/Detach and the programming discipline it requires).

Let me know (here) if you found this useful.

Best regards
Hi Marc,

Marc Jet wrote:

> Depending on the hardware, I identify the smallest time granularity.
> This could be CPU cycles for platforms with a cycle counter, or the
> timer register readable state (after prescaler).  This I define as my
> software timer granularity, and choose a type big enough to last
> "forever".  64 bits have been enough for my projects so far, and
> fortunately a 64 bit "unsigned long long" is supported by most
> compilers for register passing and function result.  This type I
> define as "tim64".
I *used* to use a similar approach -- though much less fine grained
(i.e., I *set* the granularity of my "time" to the finest that I would
need -- almost always in bogounits).  "Time" was always an integer (of
varying size), etc.  Similarly, "forever" was simply the longest time
in which I was interested -- for "system" time (ignore wall clock for
this discussion).

In *this* application (media clients), I have no need for "system
time".  All time is timeouts and (elapsed) timers.  I.e., nothing ever
happens *at* a particular time but, rather, some time "from now" (so,
the whole "set time", "wait for time", etc. API goes away -- along
with the problems it brings).

I have several "clocks" (cf. Mach) that can be used to drive timers.
Most are virtual clocks derived from real-time "events".  The
precision of these varies as appropriate to their source.  E.g., if a
DHCP client wants to wait 30 seconds for a timeout, it *probably*
doesn't care if that is 30.0000000000 seconds, 29.9099325 seconds or
"31" seconds (i.e., that is how I apply the various clocks).  OTOH,
the audio streamer very much cares how precisely *this* next 44KHz
sample is pushed out to the D/AC.  At the same time, it probably has
no interest in anything as far distant as "30 seconds hence".

The application is carefully structured so that individual aspects of
it need not be concerned with more than one sense of time.  This
eliminates/minimizes synchronization and conversion errors.

Time is *not* an integral data type, however (despite being
implemented as an "integer").  I use fixed point representations as
they allow me to adjust the various FLLs and PLLs more sanely
(simplifies the math).  This allows me to keep the clock(s) in
question syntonized with those on other clients while simultaneously
giving me a lot of flexibility on phase control (it's just an offset
in the math).

As a result of all this, I can shrink "time"s to 32b datatypes (I am
struggling to get them down to 16 -- probably an unnecessary
optimization).  This is effectively "forever" in terms of any of the
application components' usage of them (e.g., 2G seconds?  2G
milliseconds?  2G video frames?  etc.)
> The HAL provides a function, timGet(), which assembles the current time
> as tim64 (sometimes just from hardware registers, sometimes through
> race resolution between hardware timers and ISR software state).  Note
> that this is not wallclock time, but just an ever increasing value
> that starts at 0LL during power up.  There are also functions to add
> and subtract NS, US, MS and seconds, which implicitly contain the
> conversion between hardware cycles and human readable time scales.
Again, I have no need for "human readable" time units. Just like an engine controller cares little about "seconds" or "time of day" but, rather, how much BTDC it has to fire the injectors and detonate the plug. Bogounits prevail.
> [polled and signal API descriptions snipped]
>
> The hardware ISR reprograms a hardware timer to trigger when the next
> queued timer signal wants to be awakened, and signals the
> corresponding tasks as their events happen.  timSigAttach() and
> timSigDetach() resolve the potential races, and reprogram the hardware
> timer too.  Often, the hardware timer has a different granularity than
> tim64, so conversion is necessary here, too.
Do you put the Scheduler into the same set of signal consumers? (that would be cool as it would let scheduling be treated as just another "thread")
> The signal sensor stuff is common over many OS services, and a task
> can wait on multiple signal sensors. This enables the timer signals to
> be used as timeouts for services that are themselves asynchronous.
>
> The philosophy of this is to push the storage burden to the
> application that requests a service. This helps scaling a lot, and
> there is almost no "unjustified" overhead.
My (current) API provides a "create timer" hook. The user can optionally provide a pointer to a "timer struct" that it would like used for the timer's representation. However, the system can ignore this -- regardless, a handle for the timer is returned (if created). This lets me keep the timer "precious" if I want to (the API is common to a variety of different RTOS's I've designed over the years -- it's just a handy abstraction). It also lets me change the representation of the timer without the application being concerned (or bothered!) by that change.
> For the polled API just CPU registers or an autovar is enough. Time
> handling can often consist of just a few hardware register reads with
> no or little conversion. By not using the other API, a statically
> linked image can be tiny.
>
> On the other hand, there is absolutely no limit for asynchronous
> timers. Yet they can still be efficiently mapped into the .bss
> section. They can live as autovar in a "function with timeout", or as
> member of a handle that your service provides. There is no dynamic
> memory allocation involved (unless you do), and no "out of memory"
> handling.
My timeouts live outside the scope of the function (service) invoked.
E.g., I can do:

    Timer foo;
    Timer handle = Create_Timer(&foo, SOME_AMOUNT_OF_TIME, ...);

    if (TIMEOUT == Get_IP_Address(&foo, ...)) {
        /* didn't get IP address in time available; bad DHCP server? */
    }

    if (TIMEOUT == Get_Program_Image(&foo, IPaddr, ...)) {
        /* didn't get program image in remaining time available; die */
    }

    if (TIMEOUT == Initialize_System(&foo, ...)) {
        /* resources not available in time remaining; gasp! */
    }

    log("startup completed with %d remaining", Get_Remaining(&foo));

If I wrap this in a routine (e.g., "System_Startup"), then foo could be an
autovar (if the RTOS complies) -- assuming the rest of the system cares
nothing about foo outside of this scope.

Furthermore, if System_Startup had some tear-down times associated with it
and was, itself, constrained by an imported timer, then it could deduct
those times from the timer prior to invoking the constituent services so
that a failure would leave it with sufficient time remaining to complete
its tear-down actions and still return to its caller within the prescribed
"time limit".

[this is intended as an alternative to supporting deadline handlers in
the RTOS itself]
> I found that this approach is so much lighter on resources, and
> outweighs the disadvantages (which is the memory ownership transfer
> in timSigAttach/Detach and the programming discipline it therefore
> requires).
>
> Let me know (here) if you found this useful.
It still seems to avoid the (to me) obvious chances for
optimization/exploitation -- e.g., these timeouts aren't "general purpose
timers" that can be used to initiate events. They are *always* used in the
same way -- to unblock a thread waiting on some service/resource.

If all services are synchronous, then a thread can ONLY be blocking on one
at a time. I.e., there is only need for one "service timer" per thread.
And the use of that timer is preordained.

I think I'll see what the cost of including it as part of the thread state
and an implicit part of the scheduler gives me in terms of performance
and cost.
Hi Arlet,

Arlet Ottens wrote:
> On Tue, 07 Dec 2010 02:53:43 -0700, D Yuniskis wrote:
>
>> The bigger question: is there a way of bounding the number of timers so
>> that I can determine implementation costs at design time instead of
>> run-time? e.g., my suggestion of a "Service Timer" (per thread) seems
>> like this does that (ASSUMING ALL SERVICES ARE SYNCHRONOUS!). However,
>> I think it would require each service to execute on the invoking
>> thread's stack -- or, force constraints on the services regarding what
>> they can use, etc. (or, otherwise constrain the design of the service
>> e.g., thread per client)
>
> You could make all the threads responsible for managing their own memory
> resources. Assuming that you're working in C, a timer object could be a
> struct that includes the linked-list pointers. For example:
>
> struct timer
> {
>     struct timer *prev;
>     struct timer *next;
>
>     int period;
>     // whatever else you need for the timer object
> };
>
> In the thread, you then have the option of allocating timers statically,
> dynamically, or even on the stack:
>
> f( )
> {
>     struct timer x;
>
>     init_timer( &x, ... );
>     wait_timer( &x );
>     del_timer( &x );
> }
>
> The RTOS kernel would then just use the prev, next pointers of the timer
> struct, so you can avoid any additional memory allocation in the kernel.
Yes, see the description of my "Create_Timer" API (in a reply to Mark
Jet). I allow threads to *try* to manage the memory associated with the
timer(s). (the RTOS reserves the right of managing them internally, if it
chooses)

But, doing so *implicitly* makes all timers the same (i.e., it precludes
any optimization based on the narrowly defined use expected of these
"service timers"). It's "just another timer" that never uses the full
capabilities of a "general purpose timer".

I'm looking for "tricks" that are inherent in this particular type of
usage that can be exploited for performance/space gains. E.g., I'm
competing with using deadline handlers *in* the RTOS. Obviously, I am
hoping to come up with a "cheaper" solution (for some definition of
"cheap").
Hi Robert,

robertwessel2@yahoo.com wrote:
> On Dec 7, 3:53 am, D Yuniskis <not.going.to...@seen.com> wrote:
[chomp (i.e., "big snip" :> )]
> The scheme with the fixed buckets has a couple of advantages. First,
> there's no real computation done at the tick - you just grab the
> linked list of timers in the next 1ms bucket (and the list heads are a
> circular structure, so there's never any allocation of new list
> heads), fire them all, and you're done. A timer that's abandoned
> before being fired (as you'd expect for most timeouts) is just a
> deletion from its current linked list. Second, the cost of managing
> the timers is basically fixed based on the interval set, and is
> basically always a small integer number of linked list insertions and
> deletions (no more than seven for the aforementioned 278 years). The
(sigh) I don't think I'm following you. :-/

Hopefully restating your environment:
- the tick happens at 1ms intervals
- all timers for a given "duration" (blech) are linked into a
  "duration-specific" list (i.e., any timer set for 3ms is linked into
  the "3ms list" -- this is just an efficiency hack to make it easy to
  grab *all* of the timers expiring at a given "instant"/tick, right?)
  You call each such list a "bucket" (??)
- these lists are grouped into "tiers": the first tier spans the
  interval (0 - 128ms) and contains 128 lists (i.e., a list of 1ms
  timers, a list of 2ms timers, a list of 3ms timers, etc.); the second
  tier spans the interval (128 - 8192ms) and also contains 128 lists
  (i.e., the list of timers expiring in 128-191ms, the list of
  192-255ms, etc.); the third tier spans (8192- ....
  [I hope my arithmetic is correct at this hour of the morning]
- within a list, the individual timers are sorted by duration. For the
  lists in the finest grained tier, no sorting is necessary as each
  timer has the *same* duration (i.e., N ms). For higher tiers, the
  durations may vary due to a coarser grading (e.g., the 128ms list
  contains timers expiring in 128, 129, 130... 191ms)
  [sorting could be done at insertion or postponed to "redistribution"]

Is this roughly correct? (I haven't checked the actual tiers; your values
change between, e.g., 8192 and 4096 later)
> only time you compute the time is at the initial insertion of the
> timer or at one of its redistributions (FWIW, the redistributions are
> marginally simpler, since you don't have to decide which tier you're
> going to insert into - always the next finer one). The number of
> timers in the system does not impact the cost per timer at all.
It seems like all the "tiers" buys you is convenient places to peek into
what would otherwise be a single (sorted) list of timer expirations (?).
If so, it loses its appeal when the range of timeout *values* is
reduced -- yet the number of timers remains high.

E.g., if I'm seeing *lots* of timers at a *few* different "values", then
those might all end up in a single "tier". Or, a "few" tiers. So, the
improvement the tiers give you is marginal compared to just having timers
in "identical valued" lists. (?)

I, for example, just have a single list with "markers" at particular
points *into* the list (to cut search time). For expiration, I just pull
timers off the head of the list having "0" time on them until I find one
that has a nonzero (positive) "delta time". That time defines the next
time that the "timers" will need to be examined (i.e., nothing expires
between now and that "delta time").

If all equal valued timers were placed in explicit lists, then I wouldn't
have to examine subsequent timers in my *single* list -- I could just
pull *the* list of "expiring timers" off the head of the queue.

But, I would then need to walk through *that* list so I still have to
process each individual timer... :-/
> If many of your timers are timeouts which do not need fine resolution
> and wide range, you could have a second set of buckets at (say) 1s
Timers are finer grained than that. I'm processing audio/video so ms and fractional ms are typical event times (though I am processing things in large bunches, etc.). "seconds" are "an eternity" :>
> resolution, and enough of them to cover almost all of your timeouts,
> and then you don't even do any redistributions, although they'll fire
> in bigger clumps. OTOH, if most timers do not actually fire, some
> judicious picking of tier sizes can ensure the same thing - in the
> above, for example, if most timers are between 8s and 262s, and are
> abandoned at least 8192ms before they expire, they're never
> redistributed (and ones that do live long enough to fire would be
> redistributed twice).
>
>> Having said all that... :>
>>
>> The bigger question: is there a way of bounding the number
>> of timers so that I can determine implementation costs at
>> design time instead of run-time? e.g., my suggestion of a
>> "Service Timer" (per thread) seems like this does that
>> (ASSUMING ALL SERVICES ARE SYNCHRONOUS!). However, I think
>> it would require each service to execute on the invoking
>> thread's stack -- or, force constraints on the services
>> regarding what they can use, etc. (or, otherwise constrain
>> the design of the service e.g., thread per client)
>
> You could embed a timer object in each object that might need one.
Yes, that's my idea of creating a "service timer" for each thread. With synchronous services, you would only ever need *one* such service timer. And, it could "belong" to that thread. Since you *know* how it will be used, you could exploit that knowledge in your design of the scheduler, etc. instead of treating it as "yet another timer".
> They're not necessarily very big. A pair of linked list pointers, a
Size, as always, is relative. :> I am trying to prune this application
down to "nothing" to make its application ubiquitous. E.g., we've gone
from "speakers" to "powered speakers"... I want to go to "smart speaker"
(plug ethernet cable in back and you're listening to "whatever"). Ditto
for video.
> pointer to the fire routine, a word of user data and a ~64 bit time.
> So ~24 bytes on a 32 bit system. Obviously you might have timers
> unrelated to a specific object, and embedding them somewhat assumes a
> single address space OS. Of course that just moves the problem
> elsewhere, but presumably you're having to deal with the allocation of
> those other objects anyway.
On Dec 9, 2:41 pm, D Yuniskis <not.going.to...@seen.com> wrote:
[chomp (i.e., "big snip" :> )]
No. Each bucket contains timers expiring at a certain time, nothing to do
with their durations. So the first tier is a set of 128 buckets, the
first of which holds timers expiring 1ms from now (IOW, at the next
tick), the second 2ms from now (second tick), etc. You'd obviously
implement that as a circular structure, and inserting a timer with an
expiration 5ms in the future would add that timer to the linked list in
the fifth bucket from the current start. Then at each tick you just grab
the next bucket, dispatch all those timers, and step to the next position
in the circular structure.

The obvious problem with that is that in general it's impractical to have
enough buckets to cover all the possible timer expirations (you'd need
3.6 million buckets to cover an hour). So instead you have additional
tiers of buckets at exponentially lower resolutions. So if you're adding
a timer firing in 500ms, which doesn't fit in the 128 1ms buckets of the
first tier, you add it to the second tier, in the seventh bucket. The
second tier is composed of 64ms buckets, the first of which contains
timers that will expire at least 64ms from now. Periodically (every
64ms), you take the timers in the first bucket in the second tier (which
will be expiring in 64-127ms), and redistribute them into the first (1ms)
tier, from where they eventually get dispatched. The second tier is also
a circular structure, and gets stepped around after the redistribution.
Repeat with additional coarser grained tiers as needed.

So to set a timer expiring 10,000ms from now, you'd add it to the second
bucket of the third tier. Somewhere between 4096 and 8191 ticks before
that timer is due to fire, the first bucket in the third tier (now
containing that timer) will be redistributed to the second tier. Let's
say the redistribution happens 5000ms before expiration; then the timer
would end up in the 78th bucket in the second tier.
Later, somewhere between 64 and 127ms before firing, the first bucket in
the second tier (which now contains the timer in question) gets
redistributed to the first tier. Assuming my arithmetic is correct, and
the redistributions are synced on the 64th/4096th/etc. ticks, the timer
would be moved to the 72nd bucket in the first tier. 72 ticks later, the
timer would be dispatched.

As each "bucket" is just the head of a doubly linked list, adding or
removing a timer from a bucket is a trivial linked list operation. In the
case of the initial insertion, you need to determine in which tier to
initially place the timer (timer expires in less than 128ms, 8192ms,
524,288ms...), and then it's a subtract and divide (by a power of 64, in
this example) to select the bucket into which the timer is inserted. The
redistribution between tiers is even simpler: you just run through the
linked list of timers that need to be redistributed (from the first
bucket in the tier), and insert each of them in the appropriate bucket in
the next finer tier.

So the total processing of a timer consists of the initial insertion,
zero or more redistributions (based on the log64() of how far in the
future the timer expires), and the final dispatch. All of which are
simple linked list insertions or just a step through a linked list. The
total storage required is several tiers of 128 buckets, the buckets
basically being just the list heads. And as I mentioned, seven tiers gets
you a couple of centuries of timer. Plus, of course, the timers
themselves.