A timer driver for Cortex-M0+... it rarely doesn't work| page 2

Reply by David Brown ●April 27, 20172017-04-27

On 27/04/17 10:02, Dimiter_Popoff wrote:
> On 27.4.2017 &#1075;. 01:55, pozz wrote:

>> I suspect there's a problem in my low-level driver and sometimes, maybe
>> near the overflow, the code doesn't work as I expect.
>> Maybe the TimerSet() function sometimes sets a wrong value to the
>> uint64_t timer, maybe a tick value that will happen only at the next
>> overflow of the hw counter.
>>
>> Do you see where is the problem?
>>
> 
> Hoping that the compiler will know how to fix this is not a very
> efficient approach.
> It is just a few lines of assembly, no matter how inconvenient the ARM
> assembly may be it will still save you weeks (months) of blind trial
> and error.
> You just need to _know_ how this is done, typing blindly this or that
> is unlikely to ever work.
> 

Why would you think it is easier to write this in assembly than C?  The
issue is to be sure that the /algorithm/ is entirely safe regardless of
any ordering of interrupts, interrupt disables, timer overflows, etc.
Exactly the same issues are relevant in assembly.

Another poster suggested fiddling with optimisation levels, which is
equally odd here.  There are enough "volatile" accesses here to ensure
that the order of accesses in the C code will be kept when the code is
compiled to assembly - writing it in assembly or changing optimisation
levels will not affect that.  At most, it will change the timing -
making the problem appear more or less often.  The problem is to ensure
that the ordering of actions specified by the programmer and the
algorithm is correct - not that it is being compiled as expected.

Reply by pozz ●April 27, 20172017-04-27

Il 27/04/2017 10:01, David Brown ha scritto:

>> #define volatileAccess(v) *((volatile typeof((v)) *) &(v))
>
> That looks familiar :-)

;-)


>> static inline uint64_t ticks(void) {
>>   uint32_t h1 = volatileAccess(_ticks_high);
>>   TC0->COUNT32.CTRLBSET.reg =
>>        TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val);
>>   uint32_t l1 = TC0->COUNT32.COUNT.reg;
>>   uint32_t h2 = volatileAccess(_ticks_high);
>>   if (h2 != h1) return ((uint64_t)h2 << 32) + 0;
>>   else          return ((uint64_t)h1 << 32) + l1;
>> }
>
> (I don't know what the CTRLBSET stuff is for - I am not familiar with
> Atmel's chips here.)

Atmel says you *need* to write CTRLB register before reading COUNT 
register from TC0 peripheral.  I don't know why exactly, but if you 
don't, you can't read the correct COUNT value.


> If the low part of the counter rolls over and the interrupt has not run
> (maybe interrupts are disabled, or you are already in a higher priority
> interrupt, or interrupts take a few cycles to work through the system)
> then this will be wrong - l1 will have rolled over, but h2 will not show
> an updated value yet.

Yes, you are right and I thought about this possibility... but IMHO it's 
impossibile.

ticks() are called only by TimerSet() and TimerExpired() and those two 
functions are called only in normal background (not interrupt) code.
This means ticks() always runs in a lower priority than TC0 ISR. 
Moreover, I never disable interrupts in any part of the code.

I don't understand what do you mean with "[...] or interrupts take a few 
cycles to work through the system". Is it possible to read a rolled-over 
hw counter value (0x00000003) and a not incremented _ticks_high?

    uint32_t l1 = TC0->COUNT32.COUNT.reg;
    uint32_t h2 = volatileAccess(_ticks_high);

COUNT register is read before _ticks_high second read. If COUNT has 
rolled over and the rolled value is in l1, the interrupt was fired for 
sure... IMHO.  So h2 should contain the incremented value.


> So if the high parts don't match, you need to re-read the low part and
> re-check the high parts.  It is perhaps easiest to express in a loop:
>
> static inline uint64_t ticks(void) {
> 	uint32_t h1 = volatileAccess(_ticks_high);
> 	while (true) {
> 		uint32_t l1 = TC0->COUNT32.COUNT.reg;
> 		uint32_t h2 = volatileAccess(_ticks_high);
> 		if (h1 == h2) return ((uint64_t) h2 << 32) | l1;
> 		h1 = h2;
> 	}
> }
>
> You can also reasonably note that this loop is not going to be re-run
> more than once, unless you have interrupt functions that last for an
> hour and a half, or something similarly bad - and then the failure of
> the ticks() function is the least of your problems!  Then you can write:
>
> static inline uint64_t ticks(void) {
> 	uint32_t h1 = volatileAccess(_ticks_high);
> 	uint32_t l1 = TC0->COUNT32.COUNT.reg;
> 	uint32_t h2 = volatileAccess(_ticks_high);
> 	if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg;
> 	return ((uint64_t) h2 << 32) | l1;
> }

This doesn't solve the problem you described.

_ticks_high = 1
90    <- h1 = 1
91
...
99
0     <- read l1
1
2     <- h2 = 1 (interrupt has not fired, your assumption)
3

ticks() could be {1,0} that is completely wrong.


>> static inline void TimerSet(Timer *tmr, uint64_t delay) {
>>   /* delay is in ms */
>>   *tmr = ticks() + delay * TIMER_FREQ / 1000;
>> }
>>
>> static inline int TimerExpired(Timer *tmr) {
>>     return ticks() >= *tmr;
>> }
>
> When you want a boolean result, return "bool", not "int".  (This is not
> your problem, of course - it's just good habit.)
>
> An alternative method here is to just use the low 32-bit part (with
> delays limited to 2^31 ticks), but be sure that you have considered
> wraparound.  Keep everything in unsigned at first, to make overflows
> work as modulo arithmetic:
>
> static uint32_t ticks32(void) {
> 	return TC0->COUNT32.COUNT.reg;
> }
>
> static inline void TimerSet32(Timer32 *tmr, uint32_t delay) {
> 	*tmr = ticks32() + delay * TIMER_FREQ / 1000;
> }
>
> static inline bool TimerExpired32(Timer32 *tmr) {
> 	int32_t d = ticks32() - *tmr;
> 	return (d >= 0);
> }
>
> Note that doing the subtraction as unsigned, then converting to signed
> int, gives you the wraparound behaviour you need.  Converting an
> unsigned int to a signed int when the result is out of range is
> implementation dependent (&#4294967295;6.3.1.3), but gcc and many other compilers
> define it as modulo arithmetic:
> <https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html>

Yes, I know this method and I used it many times in the past. However it 
has two drawbacks:
- the maximum delay is 2^31 (in my case, around 40 minutes, but I have
   delays of 1h)
- many times you need an additional "timer running/active" flag

The second point is more important. If you want to switch on a LED after 
10 minutes when a button is pressed:

bool tmr_led_active;
Timer32 tmr_led;
void button_pressed_callback(void) {
    TimerSet32(&tmr_led, 10 * 60 * 1000);
    tmr_led_active = true;
}
void main_loop(void) {
   ...
   if (tmr_led_active && TimerExpired32(&tmr_led)) {
     switch_on_led();
     tmr_led_active = false;
   }
}

If you don't check the timer flag, the led will switch on at random times.


>> --- End of timer.h ---
>>
>> In timer.c I have the ISR:
>> --- Start of timer.c ---
>> ...
>> void TC0_Handler(void) {
>>   if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) {
>>     ++_ticks_high;
>>     TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF;
>>   }
>> }
>> ...
>> --- End of timer.c ---
>>
>> The idea is simple (and stolen from a post appeared on this newsgroup).
>
> "Good artists copy, great artists steal", as someone famous once said.
>
>> At first, the 64-bits software counter must be calculated disabling
>> interrupts, because if a timer interrupt triggers during calculation,
>> the overall software counter could be wrong.
>> By reading _ticks_high before and after reading hw counter, we can avoid
>> to disable the interrupts.
>>
>>
>> Now I have a code that uses timers. It's a simple state-machine that
>> manages communication over a bus.
>> --- Start of bus.c ---
>> ...
>> int bus_task(void) {
>>   switch(bus.state) {
>>     case BUS_IDLE:
>>       if (TimerExpired(&bus.tmr_answer)) {
>>         /* Send new request on the bus */
>>         ...
>>         TimerSet(&bus.tmr_answer, timeout_answer);
>>         bus.state = BUS_WAITING_ANSWER;
>>       }
>>       break;
>>
>>     case BUS_WAITING_ANSWER:
>>       if (TimerExpired(&bus.tmr_answer)) {
>>         /* No reply */
>>         bus.state = BUS_IDLE;
>>         TimerSet(&bus.tmr_answer, 0);
>>       } else {
>>         if (reply_received() == true)    {
>>           /* Analyze the reply */
>>           bus.state = BUS_IDLE;
>>           TimerSet(&bus.tmr_answer, 0);
>>         }
>>       }
>>       break;
>>   }
>>   return 0;
>> }
>> ...
>> --- End of bus.c ---
>>
>> I don't think I need to explain the code in bus.c. The only thing to
>> specify is that bus_task() is called continuously in the main loop.
>>
>> 99% of the time this code works well. Unfortunately I have seen some
>> strange events. Rarely (very rarely, one time in a week) the bus seems
>> frozen for a time. After that it restarts the normal activity magically.
>> There's a thing that relates those strange events to driver of timers:
>> the bus stall time lasts exactly 1h 21', the overflow period of hw counter.
>
> It sounds like you are missing a tick interrupt - maybe you are somehow
> blocking the overflow interrupt on occasion.  Maybe the CTRLBSET line is
> the problem (since I don't know what it does!)  But it could be the
> problem I noted above in ticks().
>
> Diagnosing problems like this are a real pain - and demonstrating that
> you have fixed them is even worse.  The key is to find some way to speed
> up the hardware timer so that you can provoke the problem regularly -
> then you can be sure when you have fixed it.  Are you able to make this
> timer overflow at a lower bit count - say, 12 bits rather than 32 bits?
>  If not, then try this for your hardware interrupt function:
>
> void TC0_Handler(void) {
> 	if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) {
> 		++_ticks_high;
> 		TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF;
> 		TC0->COUNT32.COUNT.reg = 0xfffff000;
> 	}
> }
>
> That will force overflows and _ticks_high counts to run much faster.

Yes, I will try your ideas.

Reply by pozz ●April 27, 20172017-04-27

Il 27/04/2017 10:27, David Brown ha scritto:
 >> [...]
>> static inline uint64_t ticks(void) {
>> 	uint32_t h1 = volatileAccess(_ticks_high);
>> 	uint32_t l1 = TC0->COUNT32.COUNT.reg;
>> 	uint32_t h2 = volatileAccess(_ticks_high);
>> 	if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg;
>> 	return ((uint64_t) h2 << 32) | l1;
>> }
>>
>
> Looking over these again, I can see that they will have the same
> potential problem.  They are fine for when the high part of the counter
> is also in hardware - but /not/ if it is dependent on interrupts which
> could be delayed.  If you have a situation like this:
>
> _ticks_high is 1
> timer reg is 0xffff'fff0
> interrupts are disabled
> timer reg rolls over, and is now 0x0000'0003
> the ticks() function will return 0x0000'0001'0000'0003 instead of
> 0x0000'0002'0000'0003
>
> You will need to check the hardware overflow flag to make this work.

Yes, but this situation it's impossible in my case.  Who could disable 
interrupts here?


>
> static inline uint64_t ticks(void) {
> 	uint32_t h1 = volatileAccess(_ticks_high);
> 	uint32_t l1 = TC0->COUNT32.COUNT.reg;
> 	uint32_t h2 = volatileAccess(_ticks_high);
> 	if (h1 != h2) {
> 		// We just had an interrupt, so we know there is
> 		// no pending overflow
> 		l1 = TC0->COUNT32.COUNT.reg;
> 		// Or l1 = 0 for slightly faster code
> 		return ((uint64_t) h2 << 32) | l1;
> 	}
> 	if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) {
> 		// There has been an overflow, and it was not handled
> 		l1 = TC0->COUNT32.COUNT.reg;
> 		return ((uint64_t) (h2 + 1) << 32) | l1;
> 	}
> 	// If there has been an overflow or an interrupt, it happened
> 	// after the first reads - so these are consistent and safe
> 	return ((uint64_t) h1 << 32) | l1;
> }
>
> If possible, change to the 32 bit version :-)  Failing that, if your
> chip supports chaining of timers in hardware, use that.

Atmel SAM TCx peripherals are 16-bits counters/timers, but they can be 
chained in a couple to have a 32-bits counter/timer. I already coupled 
TC0 with TC1 to have a 32-bits hw counter. I can't chain TC0/TC1 with 
TC2/TC3 to have a hardware 64-bits counter/timer.


> Thanks for this thread, Pozz - it's a good question, makes people think,
> and can hopefully help other developers.

Yes, those problems are fascinating :-)

Reply by David Brown ●April 27, 20172017-04-27

On 27/04/17 11:38, pozz wrote:
> Il 27/04/2017 10:01, David Brown ha scritto:
> 
>>> #define volatileAccess(v) *((volatile typeof((v)) *) &(v))
>>
>> That looks familiar :-)
> 
> ;-)
> 
> 
>>> static inline uint64_t ticks(void) {
>>>   uint32_t h1 = volatileAccess(_ticks_high);
>>>   TC0->COUNT32.CTRLBSET.reg =
>>>        TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val);
>>>   uint32_t l1 = TC0->COUNT32.COUNT.reg;
>>>   uint32_t h2 = volatileAccess(_ticks_high);
>>>   if (h2 != h1) return ((uint64_t)h2 << 32) + 0;
>>>   else          return ((uint64_t)h1 << 32) + l1;
>>> }
>>
>> (I don't know what the CTRLBSET stuff is for - I am not familiar with
>> Atmel's chips here.)
> 
> Atmel says you *need* to write CTRLB register before reading COUNT
> register from TC0 peripheral.  I don't know why exactly, but if you
> don't, you can't read the correct COUNT value.
> 
> 
>> If the low part of the counter rolls over and the interrupt has not run
>> (maybe interrupts are disabled, or you are already in a higher priority
>> interrupt, or interrupts take a few cycles to work through the system)
>> then this will be wrong - l1 will have rolled over, but h2 will not show
>> an updated value yet.
> 
> Yes, you are right and I thought about this possibility... but IMHO it's
> impossibile.

You are sure you are not calling ticks() from within another interrupt
function?

> 
> ticks() are called only by TimerSet() and TimerExpired() and those two
> functions are called only in normal background (not interrupt) code.
> This means ticks() always runs in a lower priority than TC0 ISR.
> Moreover, I never disable interrupts in any part of the code.
> 
> I don't understand what do you mean with "[...] or interrupts take a few
> cycles to work through the system". Is it possible to read a rolled-over
> hw counter value (0x00000003) and a not incremented _ticks_high?

There is always a delay between an overflow in the timer, and the actual
interrupt function being started.  Depending on details of the chip, it
is possible there will be several cpu cycles of the current instruction
stream executed before the interrupt function is run.  If I remember
rightly, the M3/M4 has a 5 stage pipeline.  When the interrupt is
registered by the core, it effectively means that a "jump to interrupt
vector" instruction is squeezed into the instruction stream - but these
current 5 instructions must be completed before the interrupt call takes
effect.  You will also have a cycle or two delay in the NVIC, and
perhaps a cycle or two delay getting the signal out of the timer block
(especially if the timer does not run at full core speed).  Cortex M
interrupts are handled quickly - but not immediately.

> 
>    uint32_t l1 = TC0->COUNT32.COUNT.reg;
>    uint32_t h2 = volatileAccess(_ticks_high);
> 
> COUNT register is read before _ticks_high second read. If COUNT has
> rolled over and the rolled value is in l1, the interrupt was fired for
> sure... IMHO.  So h2 should contain the incremented value.
> 
> 
>> So if the high parts don't match, you need to re-read the low part and
>> re-check the high parts.  It is perhaps easiest to express in a loop:
>>
>> static inline uint64_t ticks(void) {
>>     uint32_t h1 = volatileAccess(_ticks_high);
>>     while (true) {
>>         uint32_t l1 = TC0->COUNT32.COUNT.reg;
>>         uint32_t h2 = volatileAccess(_ticks_high);
>>         if (h1 == h2) return ((uint64_t) h2 << 32) | l1;
>>         h1 = h2;
>>     }
>> }
>>
>> You can also reasonably note that this loop is not going to be re-run
>> more than once, unless you have interrupt functions that last for an
>> hour and a half, or something similarly bad - and then the failure of
>> the ticks() function is the least of your problems!  Then you can write:
>>
>> static inline uint64_t ticks(void) {
>>     uint32_t h1 = volatileAccess(_ticks_high);
>>     uint32_t l1 = TC0->COUNT32.COUNT.reg;
>>     uint32_t h2 = volatileAccess(_ticks_high);
>>     if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg;
>>     return ((uint64_t) h2 << 32) | l1;
>> }
> 
> This doesn't solve the problem you described.
> 

Yes - hence my follow-up post.

> _ticks_high = 1
> 90    <- h1 = 1
> 91
> ...
> 99
> 0     <- read l1
> 1
> 2     <- h2 = 1 (interrupt has not fired, your assumption)
> 3
> 
> ticks() could be {1,0} that is completely wrong.
> 
> 
>>> static inline void TimerSet(Timer *tmr, uint64_t delay) {
>>>   /* delay is in ms */
>>>   *tmr = ticks() + delay * TIMER_FREQ / 1000;
>>> }
>>>
>>> static inline int TimerExpired(Timer *tmr) {
>>>     return ticks() >= *tmr;
>>> }
>>
>> When you want a boolean result, return "bool", not "int".  (This is not
>> your problem, of course - it's just good habit.)
>>
>> An alternative method here is to just use the low 32-bit part (with
>> delays limited to 2^31 ticks), but be sure that you have considered
>> wraparound.  Keep everything in unsigned at first, to make overflows
>> work as modulo arithmetic:
>>
>> static uint32_t ticks32(void) {
>>     return TC0->COUNT32.COUNT.reg;
>> }
>>
>> static inline void TimerSet32(Timer32 *tmr, uint32_t delay) {
>>     *tmr = ticks32() + delay * TIMER_FREQ / 1000;
>> }
>>
>> static inline bool TimerExpired32(Timer32 *tmr) {
>>     int32_t d = ticks32() - *tmr;
>>     return (d >= 0);
>> }
>>
>> Note that doing the subtraction as unsigned, then converting to signed
>> int, gives you the wraparound behaviour you need.  Converting an
>> unsigned int to a signed int when the result is out of range is
>> implementation dependent (&#4294967295;6.3.1.3), but gcc and many other compilers
>> define it as modulo arithmetic:
>> <https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html>
> 
> Yes, I know this method and I used it many times in the past. However it
> has two drawbacks:
> - the maximum delay is 2^31 (in my case, around 40 minutes, but I have
>   delays of 1h)

In such cases, you rarely need the same level of accuracy.  Use your
accurate ticks32() counter to let you track a seconds counter in the
main loop of your code - this can be read freely without worrying about
synchronisation.

Another method is to make the timer overflow interrupt occur more often
- say, every millisecond.  This increments a single millisecond counter
(either 32-bit, or 64-bit, or split lo/hi 32-bit - whatever suits).
Then reading that is easy, because the change to this counter is always
done in the same context as an atomic operation - there is no
complication of independent updates of the low and high halves.  And
most of the time, it is enough to just use the low 32-bit part.


> - many times you need an additional "timer running/active" flag
> 
> The second point is more important. If you want to switch on a LED after
> 10 minutes when a button is pressed:
> 
> bool tmr_led_active;
> Timer32 tmr_led;
> void button_pressed_callback(void) {
>    TimerSet32(&tmr_led, 10 * 60 * 1000);
>    tmr_led_active = true;
> }
> void main_loop(void) {
>   ...
>   if (tmr_led_active && TimerExpired32(&tmr_led)) {
>     switch_on_led();
>     tmr_led_active = false;
>   }
> }
> 
> If you don't check the timer flag, the led will switch on at random times.

So you have to add a flag - big deal.  You are not using any more
memory, your code is smaller, and it is simpler to be sure that
everything is correct.

<snip>
> 
> Yes, I will try your ideas.

Good luck - it is not an easy task.  Don't forget to let us know the
source of the problem, and the solution!

Reply by Dimiter_Popoff ●April 27, 20172017-04-27

On 27.4.2017 &#1075;. 12:06, David Brown wrote:
> On 27/04/17 10:02, Dimiter_Popoff wrote:
>> On 27.4.2017 &#1075;. 01:55, pozz wrote:
>
>>> I suspect there's a problem in my low-level driver and sometimes, maybe
>>> near the overflow, the code doesn't work as I expect.
>>> Maybe the TimerSet() function sometimes sets a wrong value to the
>>> uint64_t timer, maybe a tick value that will happen only at the next
>>> overflow of the hw counter.
>>>
>>> Do you see where is the problem?
>>>
>>
>> Hoping that the compiler will know how to fix this is not a very
>> efficient approach.
>> It is just a few lines of assembly, no matter how inconvenient the ARM
>> assembly may be it will still save you weeks (months) of blind trial
>> and error.
>> You just need to _know_ how this is done, typing blindly this or that
>> is unlikely to ever work.
>>
>
> Why would you think it is easier to write this in assembly than C?  The
> issue is to be sure that the /algorithm/ is entirely safe regardless of
> any ordering of interrupts, interrupt disables, timer overflows, etc.
> Exactly the same issues are relevant in assembly.

Perhaps not "easier", just taking a few minutes rather than a few
months.
Doing an lwarx/stwcx. like thing on a timer in a high level language
is simply a waste of someone's time - but then someone might find
it easier to waste time than to write the < 10 lines, keep on
wondering why todays compiler output does not work while yesterdays
did, what is actually happening etc.
The obvious thing to do here is:

-lock the 64 bit entity (test a flag and set it, if found set
try again),
-read the 64 bit counter lwarx/stwcx.-ish,
-unlock the timer.
- ensure this is done more frequently than the lower 32 bits
overflow.

And you want to write this in C (or whatever)?! Well, it is your time.

>
> Another poster suggested fiddling with optimisation levels, which is
> equally odd here.  There are enough "volatile" accesses here to ensure
> that the order of accesses in the C code will be kept when the code is
> compiled to assembly - writing it in assembly or changing optimisation
> levels will not affect that.

Well good, if your C compiler behaves in such a predictable way go
ahead. And don't come crying a year later when you discover it did
so indeed - almost :-).

The mere fact that I see so _many_ and _lengthy_ posts lately all
related to wrestling the compiler to do what one wants it to do, this
taking often _months_ of peoples time instead of a few minutes
speaks for itself.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/

Reply by pozz ●April 27, 20172017-04-27

Il 27/04/2017 12:54, David Brown ha scritto:

> You are sure you are not calling ticks() from within another interrupt
> function?

Yes, sure.  I am very careful during writing interrupts code, even if it 
wasn't sufficient in this case.


>> ticks() are called only by TimerSet() and TimerExpired() and those two
>> functions are called only in normal background (not interrupt) code.
>> This means ticks() always runs in a lower priority than TC0 ISR.
>> Moreover, I never disable interrupts in any part of the code.
>>
>> I don't understand what do you mean with "[...] or interrupts take a few
>> cycles to work through the system". Is it possible to read a rolled-over
>> hw counter value (0x00000003) and a not incremented _ticks_high?
>
> There is always a delay between an overflow in the timer, and the actual
> interrupt function being started.  Depending on details of the chip, it
> is possible there will be several cpu cycles of the current instruction
> stream executed before the interrupt function is run.  If I remember
> rightly, the M3/M4 has a 5 stage pipeline.  When the interrupt is
> registered by the core, it effectively means that a "jump to interrupt
> vector" instruction is squeezed into the instruction stream - but these
> current 5 instructions must be completed before the interrupt call takes
> effect.  You will also have a cycle or two delay in the NVIC, and
> perhaps a cycle or two delay getting the signal out of the timer block
> (especially if the timer does not run at full core speed).  Cortex M
> interrupts are handled quickly - but not immediately.

Really?  So, the core is able to read the register of a peripheral 
(TC0->COUNT32.COUNT register, in my case) with a *new* (i.e., rolled) 
value, but the ISR hasn't run yet?  In other words, a bus access can be 
done (TC0 is on the bus) while interrupt request is pending?

If this is the case, it is a mess :-(


>> - many times you need an additional "timer running/active" flag
>>
>> The second point is more important. If you want to switch on a LED after
>> 10 minutes when a button is pressed:
>>
>> bool tmr_led_active;
>> Timer32 tmr_led;
>> void button_pressed_callback(void) {
>>    TimerSet32(&tmr_led, 10 * 60 * 1000);
>>    tmr_led_active = true;
>> }
>> void main_loop(void) {
>>   ...
>>   if (tmr_led_active && TimerExpired32(&tmr_led)) {
>>     switch_on_led();
>>     tmr_led_active = false;
>>   }
>> }
>>
>> If you don't check the timer flag, the led will switch on at random times.
>
> So you have to add a flag - big deal.  You are not using any more
> memory, your code is smaller, and it is simpler to be sure that
> everything is correct.

Yes, of course. My point here is that you have to **remember** that the 
timers you are using can roll-over at any time in the future, so they 
can change from "not expired" to "expired".

After using TimerSet32() and after the timer expires, you could expect 
it stays "expired", until you arm it again with TimerSet32(). This is 
not true, because TimerExpired32() could returns "false" at a certain 
time in the future.

This is not a problem with timers that are repetitive (armed again as 
soon as they expire), but with one-shot timers.


>> Yes, I will try your ideas.
>
> Good luck - it is not an easy task.  Don't forget to let us know the
> source of the problem, and the solution!

I configured TC0 in 16-bis mode, so now TC0_Handler() is fired every 
75ms.  I changed accordingly ticks() to create a 64-bits by shifting 
_ticks_high for 16-bits.

     if (h2 != h1) return ((uint64_t)h2 << 16) + 0;
     else          return ((uint64_t)h1 << 16) + l1;

I was lucky because I can reproduce the problem more often... and 
incredibly the problem is the opposite.

Remember my state-machine in bus.c:

   switch(bus.state) {
     case BUS_IDLE:
       if (TimerExpired(&bus.tmr_answer)) {
         /* Send new request on the bus */
         ...
         TimerSet(&bus.tmr_answer, timeout_answer);
         bus.state = BUS_WAITING_ANSWER;
       }
       break;

     case BUS_WAITING_ANSWER:
       if (TimerExpired(&bus.tmr_answer)) {
         /* No reply */
         bus.state = BUS_IDLE;
         TimerSet(&bus.tmr_answer, 0);
       } else {
         if (reply_received() == true)    {
           /* Analyze the reply */
           bus.state = BUS_IDLE;
           TimerSet(&bus.tmr_answer, 0);    [*]
         }
       }
       break;
   }

When the problem occurs (the bus blocks), bus.state is BUS_IDLE. The 
problem is with instruction [*].  That instructions should arm 
bus.tmr_answer timer such that it expires immediately and a new request 
is send on the bus (in the future, it will be simple to introduce a 
delay between the reply and the next request).

Sometimes the timer doesn't expire immediately, but after the hw counter 
roll-over again.  Indeed I see the bus blocked for 75ms.

We were thinking that _ticks_high hasn't incremented yet in task() when 
reading hw counter value.  But this would have produced an old already 
expired time, not a future not-expired time. Here the problem is with a 
wrong time in the *future*.
In other words, when the problem occurs, ticks() reads a new corrected 
and incremented value for _ticks_high, but an old not-rolled hw counter 
value.

Why this?  I think it's the usual problem with register syncronization 
in Atmel SAM devices?  Before reading or writing certain registers, you 
need to check if the peripheral is in syncing.  I don't know what this 
exactly means, but it relates to the presence of different asyncronous 
clocks.  But I use only one reference clock (an external crystal) that 
is routed to the Cortex-M core and all peripherals... so I thought 
syncronization wasn't necessary.

In this case, it seems syncronization solve my problem, so my ticks() 
function is now:

   static inline uint64_t ticks(void) {
     uint32_t h1 = volatileAccess(_ticks_high);
     TC0->COUNT32.CTRLBSET.reg =
          TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val);
     while(TC0->COUNT32.SYNCBUSY) {
     }
     uint32_t l1 = TC0->COUNT32.COUNT.reg;
     uint32_t h2 = volatileAccess(_ticks_high);
     if (h2 != h1) return ((uint64_t)h2 << 32) + 0;
     else          return ((uint64_t)h1 << 32) + l1;
   }

Datasheet says you need to write the CMD bits of TC0->CTRLB register 
with a known value (TC_CTRLBSET_CMD_READSYNC_Val), before reading COUNT 
register.  Why?  I don't know.  TC0->CTRLB is a **Write-Syncronized** 
register, i.e. the value you are writing will be really wrote after sync 
time.  Maybe the sync loop I added waits for time needed to the CTRLB 
command to be executed... otherwise the COUNT value you read immediately 
after could be wrong (IMHO!)


After this... I think my code is always affected by the original problem 
if, as you explained, TC0 interrupt is delayed when I'm reading hw 
counter in ticks().  I don't have the expertise to understand this 
possibility happens really, so it's better to find another method.

I liked the idea to use a 64-bits counter for ticks that will never 
roll-over during the entire lifetime of the device.

Reply by Simon Clubley ●April 27, 20172017-04-27

On 2017-04-27, David Brown <david.brown@hesbynett.no> wrote:
>
> Another poster suggested fiddling with optimisation levels, which is
> equally odd here.

Not really. I suggested changing the optimisation level because if it
works reliably with lower optimisation levels then that strongly implies
an issue within the code itself instead of in the hardware.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by David Brown ●April 27, 20172017-04-27

On 27/04/17 13:51, pozz wrote:
> Il 27/04/2017 12:54, David Brown ha scritto:
> 
>> You are sure you are not calling ticks() from within another interrupt
>> function?
> 
> Yes, sure.  I am very careful during writing interrupts code, even if it
> wasn't sufficient in this case.
> 
> 
>>> ticks() are called only by TimerSet() and TimerExpired() and those two
>>> functions are called only in normal background (not interrupt) code.
>>> This means ticks() always runs in a lower priority than TC0 ISR.
>>> Moreover, I never disable interrupts in any part of the code.
>>>
>>> I don't understand what do you mean with "[...] or interrupts take a few
>>> cycles to work through the system". Is it possible to read a rolled-over
>>> hw counter value (0x00000003) and a not incremented _ticks_high?
>>
>> There is always a delay between an overflow in the timer, and the actual
>> interrupt function being started.  Depending on details of the chip, it
>> is possible there will be several cpu cycles of the current instruction
>> stream executed before the interrupt function is run.  If I remember
>> rightly, the M3/M4 has a 5 stage pipeline.  When the interrupt is
>> registered by the core, it effectively means that a "jump to interrupt
>> vector" instruction is squeezed into the instruction stream - but these
>> current 5 instructions must be completed before the interrupt call takes
>> effect.  You will also have a cycle or two delay in the NVIC, and
>> perhaps a cycle or two delay getting the signal out of the timer block
>> (especially if the timer does not run at full core speed).  Cortex M
>> interrupts are handled quickly - but not immediately.
> 
> Really?  So, the core is able to read the register of a peripheral
> (TC0->COUNT32.COUNT register, in my case) with a *new* (i.e., rolled)
> value, but the ISR hasn't run yet?  In other words, a bus access can be
> done (TC0 is on the bus) while interrupt request is pending?
> 

I cannot say for sure that this can happen.  But unless I can say for
sure that it /cannot/ happen, I prefer to assume the worst is possible.

> If this is the case, it is a mess :-(
> 
> 
>>> - many times you need an additional "timer running/active" flag
>>>
>>> The second point is more important. If you want to switch on a LED after
>>> 10 minutes when a button is pressed:
>>>
>>> bool tmr_led_active;
>>> Timer32 tmr_led;
>>> void button_pressed_callback(void) {
>>>    TimerSet32(&tmr_led, 10 * 60 * 1000);
>>>    tmr_led_active = true;
>>> }
>>> void main_loop(void) {
>>>   ...
>>>   if (tmr_led_active && TimerExpired32(&tmr_led)) {
>>>     switch_on_led();
>>>     tmr_led_active = false;
>>>   }
>>> }
>>>
>>> If you don't check the timer flag, the led will switch on at random
>>> times.
>>
>> So you have to add a flag - big deal.  You are not using any more
>> memory, your code is smaller, and it is simpler to be sure that
>> everything is correct.
> 
> Yes, of course. My point here is that you have to **remember** that the
> timers you are using can roll-over at any time in the future, so they
> can change from "not expired" to "expired".

True.  But perhaps that can be baked into your "Set" and "Expired"
functions, or otherwise made "unforgettable".  The aim is to simplify
the code that /may/ have rare race conditions into something that cannot
possibly have such problems - even if it means other code is bigger or
less efficient.

> 
> After using TimerSet32() and after the timer expires, you could expect
> it stays "expired", until you arm it again with TimerSet32(). This is
> not true, because TimerExpired32() could returns "false" at a certain
> time in the future.
> 
> This is not a problem with timers that are repetitive (armed again as
> soon as they expire), but with one-shot timers.
> 
> 
>>> Yes, I will try your ideas.
>>
>> Good luck - it is not an easy task.  Don't forget to let us know the
>> source of the problem, and the solution!
> 
> I configured TC0 in 16-bis mode, so now TC0_Handler() is fired every
> 75ms.  I changed accordingly ticks() to create a 64-bits by shifting
> _ticks_high for 16-bits.
> 
>     if (h2 != h1) return ((uint64_t)h2 << 16) + 0;
>     else          return ((uint64_t)h1 << 16) + l1;
> 
> I was lucky because I can reproduce the problem more often... and
> incredibly the problem is the opposite.
> 
> Remember my state-machine in bus.c:
> 
>   switch(bus.state) {
>     case BUS_IDLE:
>       if (TimerExpired(&bus.tmr_answer)) {
>         /* Send new request on the bus */
>         ...
>         TimerSet(&bus.tmr_answer, timeout_answer);
>         bus.state = BUS_WAITING_ANSWER;
>       }
>       break;
> 
>     case BUS_WAITING_ANSWER:
>       if (TimerExpired(&bus.tmr_answer)) {
>         /* No reply */
>         bus.state = BUS_IDLE;
>         TimerSet(&bus.tmr_answer, 0);
>       } else {
>         if (reply_received() == true)    {
>           /* Analyze the reply */
>           bus.state = BUS_IDLE;
>           TimerSet(&bus.tmr_answer, 0);    [*]
>         }
>       }
>       break;
>   }
> 
> When the problem occurs (the bus blocks), bus.state is BUS_IDLE. The
> problem is with instruction [*].  That instructions should arm
> bus.tmr_answer timer such that it expires immediately and a new request
> is send on the bus (in the future, it will be simple to introduce a
> delay between the reply and the next request).
> 
> Sometimes the timer doesn't expire immediately, but after the hw counter
> roll-over again.  Indeed I see the bus blocked for 75ms.
> 
> We were thinking that _ticks_high hasn't incremented yet in task() when
> reading hw counter value.  But this would have produced an old already
> expired time, not a future not-expired time. Here the problem is with a
> wrong time in the *future*.
> In other words, when the problem occurs, ticks() reads a new corrected
> and incremented value for _ticks_high, but an old not-rolled hw counter
> value.
> 
> Why this?  I think it's the usual problem with register syncronization
> in Atmel SAM devices?  Before reading or writing certain registers, you
> need to check if the peripheral is in syncing.  I don't know what this
> exactly means, but it relates to the presence of different asyncronous
> clocks.  But I use only one reference clock (an external crystal) that
> is routed to the Cortex-M core and all peripherals... so I thought
> syncronization wasn't necessary.
> 
> In this case, it seems syncronization solve my problem, so my ticks()
> function is now:
> 
>   static inline uint64_t ticks(void) {
>     uint32_t h1 = volatileAccess(_ticks_high);
>     TC0->COUNT32.CTRLBSET.reg =
>          TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val);
>     while(TC0->COUNT32.SYNCBUSY) {
>     }
>     uint32_t l1 = TC0->COUNT32.COUNT.reg;
>     uint32_t h2 = volatileAccess(_ticks_high);
>     if (h2 != h1) return ((uint64_t)h2 << 32) + 0;
>     else          return ((uint64_t)h1 << 32) + l1;
>   }
> 
> Datasheet says you need to write the CMD bits of TC0->CTRLB register
> with a known value (TC_CTRLBSET_CMD_READSYNC_Val), before reading COUNT
> register.  Why?  I don't know.  TC0->CTRLB is a **Write-Syncronized**
> register, i.e. the value you are writing will be really wrote after sync
> time.  Maybe the sync loop I added waits for time needed to the CTRLB
> command to be executed... otherwise the COUNT value you read immediately
> after could be wrong (IMHO!)
> 
> 
> After this... I think my code is always affected by the original problem
> if, as you explained, TC0 interrupt is delayed when I'm reading hw
> counter in ticks().  I don't have the expertise to understand this
> possibility happens really, so it's better to find another method.
> 
> I liked the idea to use a 64-bits counter for ticks that will never
> roll-over during the entire lifetime of the device.

If you can make your hardware timer function run every millisecond (or
whatever accuracy you need), then use this:

extern volatile uint64_t tickCounter_;

static inline uint64_t ticks(void) {
	uint64_t a = tickCounter_;
	while (true) {
		uint64_t b = tickCounter_;
		if (a == b) return a;
		a = b;
	}
}

void TC0_Handler(void) {
	if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) {
		tickCounter++;
		TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF;
	}
}	

If ticks never accesses the timer hardware register, there cannot be a
problem with synchronisation.  There is no need to use the timer
peripherals complicated synchronisation and locking mechanism, nor any
concern about interrupt delays.  Re-reading the 64-bit value until you
have two identical reads is sufficient to ensure that you have a
consistent value even if a timer interrupt occurs in the middle of the
64-bit read.

Reply by David Brown ●April 27, 20172017-04-27

On 27/04/17 13:20, Dimiter_Popoff wrote:
> On 27.4.2017 &#1075;. 12:06, David Brown wrote:
>> On 27/04/17 10:02, Dimiter_Popoff wrote:
>>> On 27.4.2017 &#1075;. 01:55, pozz wrote:
>>
>>>> I suspect there's a problem in my low-level driver and sometimes, maybe
>>>> near the overflow, the code doesn't work as I expect.
>>>> Maybe the TimerSet() function sometimes sets a wrong value to the
>>>> uint64_t timer, maybe a tick value that will happen only at the next
>>>> overflow of the hw counter.
>>>>
>>>> Do you see where is the problem?
>>>>
>>>
>>> Hoping that the compiler will know how to fix this is not a very
>>> efficient approach.
>>> It is just a few lines of assembly, no matter how inconvenient the ARM
>>> assembly may be it will still save you weeks (months) of blind trial
>>> and error.
>>> You just need to _know_ how this is done, typing blindly this or that
>>> is unlikely to ever work.
>>>
>>
>> Why would you think it is easier to write this in assembly than C?  The
>> issue is to be sure that the /algorithm/ is entirely safe regardless of
>> any ordering of interrupts, interrupt disables, timer overflows, etc.
>> Exactly the same issues are relevant in assembly.
> 
> Perhaps not "easier", just taking a few minutes rather than a few
> months.

You don't think you are exaggerating just a /little/ ?

> Doing an lwarx/stwcx. like thing on a timer in a high level language
> is simply a waste of someone's time - but then someone might find
> it easier to waste time than to write the < 10 lines, keep on
> wondering why todays compiler output does not work while yesterdays
> did, what is actually happening etc.
> The obvious thing to do here is:
> 
> -lock the 64 bit entity (test a flag and set it, if found set
> try again),
> -read the 64 bit counter lwarx/stwcx.-ish,
> -unlock the timer.
> - ensure this is done more frequently than the lower 32 bits
> overflow.
> 
> And you want to write this in C (or whatever)?! Well, it is your time.

The problem has nothing to do with making an atomic 64 bit read of data
- that is /easy/ in C.  The problem is the locking and synchronisation
system for reading this timer peripheral in this particular
microcontroller, which is inconvenient no matter what language is used.
 The other considerations are synchronisation between updates of the
data in the interrupt function, the calling function, and the hardware
timer - again, language independent.

A lwarx/stwcx can be useful for doing read/write/modify operations on an
object - it is not necessary for atomically reading data which could be
changed underway.  The simplest way to handle that, in C /or/ in
assembly, is to read it more than once:

volatile uint64_t count;
uint64_t readCount(void) {
    uint64_t a = count;
    while (true) {
        uint64_t b = count;
        if (a == b) return a;
        a = b;
    }
}

Alternatively, using C11, you handle it like this:

_Atomic uint64_t a_count;
uint64_t readCount(void) {
	return a_count;
}

I appreciate that you are very familiar with assembly and may write it
faster than C, but the above function did not take me a month to figure out.

Using lwarx/stwcx (or ldrex/strex on ARM) will not be shorter or faster,
and it will be more complex.  Basically, you need to use these to create
a lock for the access to the non-atomic type.  They are always harder to
use than something like C11's atomic access support, regardless of the
language.  And if you want to program in C, and don't have C11 atomics
(or other help from your compiler), you can easily write inline assembly
wrappers for those two instructions and write everything else in C.

> 
>>
>> Another poster suggested fiddling with optimisation levels, which is
>> equally odd here.  There are enough "volatile" accesses here to ensure
>> that the order of accesses in the C code will be kept when the code is
>> compiled to assembly - writing it in assembly or changing optimisation
>> levels will not affect that.
> 
> Well good, if your C compiler behaves in such a predictable way go
> ahead. And don't come crying a year later when you discover it did
> so indeed - almost :-).

My C compiler(s) behave in a predictable manner - because I know how C
works.  People who don't know all the fine details can get it wrong, but
that applies to any language.

> 
> The mere fact that I see so _many_ and _lengthy_ posts lately all
> related to wrestling the compiler to do what one wants it to do, this
> taking often _months_ of peoples time instead of a few minutes
> speaks for itself.
> 

I follow most threads in this newsgroup, but I can't figure out what you
are referring to here.  There are occasional posts where people make
mistakes due to misunderstandings about how C works and how their
compiler works - but they are not /that/ common.  And by what possible
justification could you claim that writing the code in assembly would
result in fewer bugs or shorter development times?  You might reasonably
say that the programmer would not make those particular mistakes - but
they could easily make many, many more errors in their assembly code
that they would not have made with C.

Reply by David Brown ●April 27, 20172017-04-27

On 27/04/17 14:03, Simon Clubley wrote:
> On 2017-04-27, David Brown <david.brown@hesbynett.no> wrote:
>>
>> Another poster suggested fiddling with optimisation levels, which is
>> equally odd here.
> 
> Not really. I suggested changing the optimisation level because if it
> works reliably with lower optimisation levels then that strongly implies
> an issue within the code itself instead of in the hardware.
> 

As in, "it is worth giving it a try to see if it affects behaviour, and
therefore gives a clue to help debugging" ?  Yes, I can agree with that.
 Once the OP has made changes that speed up the error rate to something
workable, that kind of thing can be worth trying.

Previous 123 4 Next

A timer driver for Cortex-M0+... it rarely doesn't work

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group