EmbeddedRelated.com
Forums
Memfault Beyond the Launch

A timer driver for Cortex-M0+... it rarely doesn't work

Started by pozz April 26, 2017
On 27/04/17 10:02, Dimiter_Popoff wrote:
> On 27.4.2017 г. 01:55, pozz wrote:
>> I suspect there's a problem in my low-level driver and sometimes, maybe >> near the overflow, the code doesn't work as I expect. >> Maybe the TimerSet() function sometimes sets a wrong value to the >> uint64_t timer, maybe a tick value that will happen only at the next >> overflow of the hw counter. >> >> Do you see where is the problem? >> > > Hoping that the compiler will know how to fix this is not a very > efficient approach. > It is just a few lines of assembly, no matter how inconvenient the ARM > assembly may be it will still save you weeks (months) of blind trial > and error. > You just need to _know_ how this is done, typing blindly this or that > is unlikely to ever work. >
Why would you think it is easier to write this in assembly than C? The issue is to be sure that the /algorithm/ is entirely safe regardless of any ordering of interrupts, interrupt disables, timer overflows, etc. Exactly the same issues are relevant in assembly. Another poster suggested fiddling with optimisation levels, which is equally odd here. There are enough "volatile" accesses here to ensure that the order of accesses in the C code will be kept when the code is compiled to assembly - writing it in assembly or changing optimisation levels will not affect that. At most, it will change the timing - making the problem appear more or less often. The problem is to ensure that the ordering of actions specified by the programmer and the algorithm is correct - not that it is being compiled as expected.
Il 27/04/2017 10:01, David Brown ha scritto:

>> #define volatileAccess(v) *((volatile typeof((v)) *) &(v)) > > That looks familiar :-)
;-)
>> static inline uint64_t ticks(void) { >> uint32_t h1 = volatileAccess(_ticks_high); >> TC0->COUNT32.CTRLBSET.reg = >> TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val); >> uint32_t l1 = TC0->COUNT32.COUNT.reg; >> uint32_t h2 = volatileAccess(_ticks_high); >> if (h2 != h1) return ((uint64_t)h2 << 32) + 0; >> else return ((uint64_t)h1 << 32) + l1; >> } > > (I don't know what the CTRLBSET stuff is for - I am not familiar with > Atmel's chips here.)
Atmel says you *need* to write CTRLB register before reading COUNT register from TC0 peripheral. I don't know why exactly, but if you don't, you can't read the correct COUNT value.
> If the low part of the counter rolls over and the interrupt has not run > (maybe interrupts are disabled, or you are already in a higher priority > interrupt, or interrupts take a few cycles to work through the system) > then this will be wrong - l1 will have rolled over, but h2 will not show > an updated value yet.
Yes, you are right and I thought about this possibility... but IMHO it's impossibile. ticks() are called only by TimerSet() and TimerExpired() and those two functions are called only in normal background (not interrupt) code. This means ticks() always runs in a lower priority than TC0 ISR. Moreover, I never disable interrupts in any part of the code. I don't understand what do you mean with "[...] or interrupts take a few cycles to work through the system". Is it possible to read a rolled-over hw counter value (0x00000003) and a not incremented _ticks_high? uint32_t l1 = TC0->COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high); COUNT register is read before _ticks_high second read. If COUNT has rolled over and the rolled value is in l1, the interrupt was fired for sure... IMHO. So h2 should contain the incremented value.
> So if the high parts don't match, you need to re-read the low part and > re-check the high parts. It is perhaps easiest to express in a loop: > > static inline uint64_t ticks(void) { > uint32_t h1 = volatileAccess(_ticks_high); > while (true) { > uint32_t l1 = TC0->COUNT32.COUNT.reg; > uint32_t h2 = volatileAccess(_ticks_high); > if (h1 == h2) return ((uint64_t) h2 << 32) | l1; > h1 = h2; > } > } > > You can also reasonably note that this loop is not going to be re-run > more than once, unless you have interrupt functions that last for an > hour and a half, or something similarly bad - and then the failure of > the ticks() function is the least of your problems! Then you can write: > > static inline uint64_t ticks(void) { > uint32_t h1 = volatileAccess(_ticks_high); > uint32_t l1 = TC0->COUNT32.COUNT.reg; > uint32_t h2 = volatileAccess(_ticks_high); > if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg; > return ((uint64_t) h2 << 32) | l1; > }
This doesn't solve the problem you described. _ticks_high = 1 90 <- h1 = 1 91 ... 99 0 <- read l1 1 2 <- h2 = 1 (interrupt has not fired, your assumption) 3 ticks() could be {1,0} that is completely wrong.
>> static inline void TimerSet(Timer *tmr, uint64_t delay) { >> /* delay is in ms */ >> *tmr = ticks() + delay * TIMER_FREQ / 1000; >> } >> >> static inline int TimerExpired(Timer *tmr) { >> return ticks() >= *tmr; >> } > > When you want a boolean result, return "bool", not "int". (This is not > your problem, of course - it's just good habit.) > > An alternative method here is to just use the low 32-bit part (with > delays limited to 2^31 ticks), but be sure that you have considered > wraparound. Keep everything in unsigned at first, to make overflows > work as modulo arithmetic: > > static uint32_t ticks32(void) { > return TC0->COUNT32.COUNT.reg; > } > > static inline void TimerSet32(Timer32 *tmr, uint32_t delay) { > *tmr = ticks32() + delay * TIMER_FREQ / 1000; > } > > static inline bool TimerExpired32(Timer32 *tmr) { > int32_t d = ticks32() - *tmr; > return (d >= 0); > } > > Note that doing the subtraction as unsigned, then converting to signed > int, gives you the wraparound behaviour you need. Converting an > unsigned int to a signed int when the result is out of range is > implementation dependent (&#4294967295;6.3.1.3), but gcc and many other compilers > define it as modulo arithmetic: > <https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html>
Yes, I know this method and I used it many times in the past. However it has two drawbacks: - the maximum delay is 2^31 (in my case, around 40 minutes, but I have delays of 1h) - many times you need an additional "timer running/active" flag The second point is more important. If you want to switch on a LED after 10 minutes when a button is pressed: bool tmr_led_active; Timer32 tmr_led; void button_pressed_callback(void) { TimerSet32(&tmr_led, 10 * 60 * 1000); tmr_led_active = true; } void main_loop(void) { ... if (tmr_led_active && TimerExpired32(&tmr_led)) { switch_on_led(); tmr_led_active = false; } } If you don't check the timer flag, the led will switch on at random times.
>> --- End of timer.h --- >> >> In timer.c I have the ISR: >> --- Start of timer.c --- >> ... >> void TC0_Handler(void) { >> if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { >> ++_ticks_high; >> TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF; >> } >> } >> ... >> --- End of timer.c --- >> >> The idea is simple (and stolen from a post appeared on this newsgroup). > > "Good artists copy, great artists steal", as someone famous once said. > >> At first, the 64-bits software counter must be calculated disabling >> interrupts, because if a timer interrupt triggers during calculation, >> the overall software counter could be wrong. >> By reading _ticks_high before and after reading hw counter, we can avoid >> to disable the interrupts. >> >> >> Now I have a code that uses timers. It's a simple state-machine that >> manages communication over a bus. >> --- Start of bus.c --- >> ... >> int bus_task(void) { >> switch(bus.state) { >> case BUS_IDLE: >> if (TimerExpired(&bus.tmr_answer)) { >> /* Send new request on the bus */ >> ... >> TimerSet(&bus.tmr_answer, timeout_answer); >> bus.state = BUS_WAITING_ANSWER; >> } >> break; >> >> case BUS_WAITING_ANSWER: >> if (TimerExpired(&bus.tmr_answer)) { >> /* No reply */ >> bus.state = BUS_IDLE; >> TimerSet(&bus.tmr_answer, 0); >> } else { >> if (reply_received() == true) { >> /* Analyze the reply */ >> bus.state = BUS_IDLE; >> TimerSet(&bus.tmr_answer, 0); >> } >> } >> break; >> } >> return 0; >> } >> ... >> --- End of bus.c --- >> >> I don't think I need to explain the code in bus.c. The only thing to >> specify is that bus_task() is called continuously in the main loop. >> >> 99% of the time this code works well. Unfortunately I have seen some >> strange events. Rarely (very rarely, one time in a week) the bus seems >> frozen for a time. After that it restarts the normal activity magically. >> There's a thing that relates those strange events to driver of timers: >> the bus stall time lasts exactly 1h 21', the overflow period of hw counter. > > It sounds like you are missing a tick interrupt - maybe you are somehow > blocking the overflow interrupt on occasion. Maybe the CTRLBSET line is > the problem (since I don't know what it does!) But it could be the > problem I noted above in ticks(). > > Diagnosing problems like this are a real pain - and demonstrating that > you have fixed them is even worse. The key is to find some way to speed > up the hardware timer so that you can provoke the problem regularly - > then you can be sure when you have fixed it. Are you able to make this > timer overflow at a lower bit count - say, 12 bits rather than 32 bits? > If not, then try this for your hardware interrupt function: > > void TC0_Handler(void) { > if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { > ++_ticks_high; > TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF; > TC0->COUNT32.COUNT.reg = 0xfffff000; > } > } > > That will force overflows and _ticks_high counts to run much faster.
Yes, I will try your ideas.
Il 27/04/2017 10:27, David Brown ha scritto:
 >> [...]
>> static inline uint64_t ticks(void) { >> uint32_t h1 = volatileAccess(_ticks_high); >> uint32_t l1 = TC0->COUNT32.COUNT.reg; >> uint32_t h2 = volatileAccess(_ticks_high); >> if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg; >> return ((uint64_t) h2 << 32) | l1; >> } >> > > Looking over these again, I can see that they will have the same > potential problem. They are fine for when the high part of the counter > is also in hardware - but /not/ if it is dependent on interrupts which > could be delayed. If you have a situation like this: > > _ticks_high is 1 > timer reg is 0xffff'fff0 > interrupts are disabled > timer reg rolls over, and is now 0x0000'0003 > the ticks() function will return 0x0000'0001'0000'0003 instead of > 0x0000'0002'0000'0003 > > You will need to check the hardware overflow flag to make this work.
Yes, but this situation it's impossible in my case. Who could disable interrupts here?
> > static inline uint64_t ticks(void) { > uint32_t h1 = volatileAccess(_ticks_high); > uint32_t l1 = TC0->COUNT32.COUNT.reg; > uint32_t h2 = volatileAccess(_ticks_high); > if (h1 != h2) { > // We just had an interrupt, so we know there is > // no pending overflow > l1 = TC0->COUNT32.COUNT.reg; > // Or l1 = 0 for slightly faster code > return ((uint64_t) h2 << 32) | l1; > } > if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { > // There has been an overflow, and it was not handled > l1 = TC0->COUNT32.COUNT.reg; > return ((uint64_t) (h2 + 1) << 32) | l1; > } > // If there has been an overflow or an interrupt, it happened > // after the first reads - so these are consistent and safe > return ((uint64_t) h1 << 32) | l1; > } > > If possible, change to the 32 bit version :-) Failing that, if your > chip supports chaining of timers in hardware, use that.
Atmel SAM TCx peripherals are 16-bits counters/timers, but they can be chained in a couple to have a 32-bits counter/timer. I already coupled TC0 with TC1 to have a 32-bits hw counter. I can't chain TC0/TC1 with TC2/TC3 to have a hardware 64-bits counter/timer.
> Thanks for this thread, Pozz - it's a good question, makes people think, > and can hopefully help other developers.
Yes, those problems are fascinating :-)
On 27/04/17 11:38, pozz wrote:
> Il 27/04/2017 10:01, David Brown ha scritto: > >>> #define volatileAccess(v) *((volatile typeof((v)) *) &(v)) >> >> That looks familiar :-) > > ;-) > > >>> static inline uint64_t ticks(void) { >>> uint32_t h1 = volatileAccess(_ticks_high); >>> TC0->COUNT32.CTRLBSET.reg = >>> TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val); >>> uint32_t l1 = TC0->COUNT32.COUNT.reg; >>> uint32_t h2 = volatileAccess(_ticks_high); >>> if (h2 != h1) return ((uint64_t)h2 << 32) + 0; >>> else return ((uint64_t)h1 << 32) + l1; >>> } >> >> (I don't know what the CTRLBSET stuff is for - I am not familiar with >> Atmel's chips here.) > > Atmel says you *need* to write CTRLB register before reading COUNT > register from TC0 peripheral. I don't know why exactly, but if you > don't, you can't read the correct COUNT value. > > >> If the low part of the counter rolls over and the interrupt has not run >> (maybe interrupts are disabled, or you are already in a higher priority >> interrupt, or interrupts take a few cycles to work through the system) >> then this will be wrong - l1 will have rolled over, but h2 will not show >> an updated value yet. > > Yes, you are right and I thought about this possibility... but IMHO it's > impossibile.
You are sure you are not calling ticks() from within another interrupt function?
> > ticks() are called only by TimerSet() and TimerExpired() and those two > functions are called only in normal background (not interrupt) code. > This means ticks() always runs in a lower priority than TC0 ISR. > Moreover, I never disable interrupts in any part of the code. > > I don't understand what do you mean with "[...] or interrupts take a few > cycles to work through the system". Is it possible to read a rolled-over > hw counter value (0x00000003) and a not incremented _ticks_high?
There is always a delay between an overflow in the timer, and the actual interrupt function being started. Depending on details of the chip, it is possible there will be several cpu cycles of the current instruction stream executed before the interrupt function is run. If I remember rightly, the M3/M4 has a 5 stage pipeline. When the interrupt is registered by the core, it effectively means that a "jump to interrupt vector" instruction is squeezed into the instruction stream - but these current 5 instructions must be completed before the interrupt call takes effect. You will also have a cycle or two delay in the NVIC, and perhaps a cycle or two delay getting the signal out of the timer block (especially if the timer does not run at full core speed). Cortex M interrupts are handled quickly - but not immediately.
> > uint32_t l1 = TC0->COUNT32.COUNT.reg; > uint32_t h2 = volatileAccess(_ticks_high); > > COUNT register is read before _ticks_high second read. If COUNT has > rolled over and the rolled value is in l1, the interrupt was fired for > sure... IMHO. So h2 should contain the incremented value. > > >> So if the high parts don't match, you need to re-read the low part and >> re-check the high parts. It is perhaps easiest to express in a loop: >> >> static inline uint64_t ticks(void) { >> uint32_t h1 = volatileAccess(_ticks_high); >> while (true) { >> uint32_t l1 = TC0->COUNT32.COUNT.reg; >> uint32_t h2 = volatileAccess(_ticks_high); >> if (h1 == h2) return ((uint64_t) h2 << 32) | l1; >> h1 = h2; >> } >> } >> >> You can also reasonably note that this loop is not going to be re-run >> more than once, unless you have interrupt functions that last for an >> hour and a half, or something similarly bad - and then the failure of >> the ticks() function is the least of your problems! Then you can write: >> >> static inline uint64_t ticks(void) { >> uint32_t h1 = volatileAccess(_ticks_high); >> uint32_t l1 = TC0->COUNT32.COUNT.reg; >> uint32_t h2 = volatileAccess(_ticks_high); >> if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg; >> return ((uint64_t) h2 << 32) | l1; >> } > > This doesn't solve the problem you described. >
Yes - hence my follow-up post.
> _ticks_high = 1 > 90 <- h1 = 1 > 91 > ... > 99 > 0 <- read l1 > 1 > 2 <- h2 = 1 (interrupt has not fired, your assumption) > 3 > > ticks() could be {1,0} that is completely wrong. > > >>> static inline void TimerSet(Timer *tmr, uint64_t delay) { >>> /* delay is in ms */ >>> *tmr = ticks() + delay * TIMER_FREQ / 1000; >>> } >>> >>> static inline int TimerExpired(Timer *tmr) { >>> return ticks() >= *tmr; >>> } >> >> When you want a boolean result, return "bool", not "int". (This is not >> your problem, of course - it's just good habit.) >> >> An alternative method here is to just use the low 32-bit part (with >> delays limited to 2^31 ticks), but be sure that you have considered >> wraparound. Keep everything in unsigned at first, to make overflows >> work as modulo arithmetic: >> >> static uint32_t ticks32(void) { >> return TC0->COUNT32.COUNT.reg; >> } >> >> static inline void TimerSet32(Timer32 *tmr, uint32_t delay) { >> *tmr = ticks32() + delay * TIMER_FREQ / 1000; >> } >> >> static inline bool TimerExpired32(Timer32 *tmr) { >> int32_t d = ticks32() - *tmr; >> return (d >= 0); >> } >> >> Note that doing the subtraction as unsigned, then converting to signed >> int, gives you the wraparound behaviour you need. Converting an >> unsigned int to a signed int when the result is out of range is >> implementation dependent (&#4294967295;6.3.1.3), but gcc and many other compilers >> define it as modulo arithmetic: >> <https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html> > > Yes, I know this method and I used it many times in the past. However it > has two drawbacks: > - the maximum delay is 2^31 (in my case, around 40 minutes, but I have > delays of 1h)
In such cases, you rarely need the same level of accuracy. Use your accurate ticks32() counter to let you track a seconds counter in the main loop of your code - this can be read freely without worrying about synchronisation. Another method is to make the timer overflow interrupt occur more often - say, every millisecond. This increments a single millisecond counter (either 32-bit, or 64-bit, or split lo/hi 32-bit - whatever suits). Then reading that is easy, because the change to this counter is always done in the same context as an atomic operation - there is no complication of independent updates of the low and high halves. And most of the time, it is enough to just use the low 32-bit part.
> - many times you need an additional "timer running/active" flag > > The second point is more important. If you want to switch on a LED after > 10 minutes when a button is pressed: > > bool tmr_led_active; > Timer32 tmr_led; > void button_pressed_callback(void) { > TimerSet32(&tmr_led, 10 * 60 * 1000); > tmr_led_active = true; > } > void main_loop(void) { > ... > if (tmr_led_active && TimerExpired32(&tmr_led)) { > switch_on_led(); > tmr_led_active = false; > } > } > > If you don't check the timer flag, the led will switch on at random times.
So you have to add a flag - big deal. You are not using any more memory, your code is smaller, and it is simpler to be sure that everything is correct. <snip>
> > Yes, I will try your ideas.
Good luck - it is not an easy task. Don't forget to let us know the source of the problem, and the solution!
On 27.4.2017 &#1075;. 12:06, David Brown wrote:
> On 27/04/17 10:02, Dimiter_Popoff wrote: >> On 27.4.2017 &#1075;. 01:55, pozz wrote: > >>> I suspect there's a problem in my low-level driver and sometimes, maybe >>> near the overflow, the code doesn't work as I expect. >>> Maybe the TimerSet() function sometimes sets a wrong value to the >>> uint64_t timer, maybe a tick value that will happen only at the next >>> overflow of the hw counter. >>> >>> Do you see where is the problem? >>> >> >> Hoping that the compiler will know how to fix this is not a very >> efficient approach. >> It is just a few lines of assembly, no matter how inconvenient the ARM >> assembly may be it will still save you weeks (months) of blind trial >> and error. >> You just need to _know_ how this is done, typing blindly this or that >> is unlikely to ever work. >> > > Why would you think it is easier to write this in assembly than C? The > issue is to be sure that the /algorithm/ is entirely safe regardless of > any ordering of interrupts, interrupt disables, timer overflows, etc. > Exactly the same issues are relevant in assembly.
Perhaps not "easier", just taking a few minutes rather than a few months. Doing an lwarx/stwcx. like thing on a timer in a high level language is simply a waste of someone's time - but then someone might find it easier to waste time than to write the < 10 lines, keep on wondering why todays compiler output does not work while yesterdays did, what is actually happening etc. The obvious thing to do here is: -lock the 64 bit entity (test a flag and set it, if found set try again), -read the 64 bit counter lwarx/stwcx.-ish, -unlock the timer. - ensure this is done more frequently than the lower 32 bits overflow. And you want to write this in C (or whatever)?! Well, it is your time.
> > Another poster suggested fiddling with optimisation levels, which is > equally odd here. There are enough "volatile" accesses here to ensure > that the order of accesses in the C code will be kept when the code is > compiled to assembly - writing it in assembly or changing optimisation > levels will not affect that.
Well good, if your C compiler behaves in such a predictable way go ahead. And don't come crying a year later when you discover it did so indeed - almost :-). The mere fact that I see so _many_ and _lengthy_ posts lately all related to wrestling the compiler to do what one wants it to do, this taking often _months_ of peoples time instead of a few minutes speaks for itself. Dimiter ------------------------------------------------------ Dimiter Popoff, TGI http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/
Il 27/04/2017 12:54, David Brown ha scritto:

> You are sure you are not calling ticks() from within another interrupt > function?
Yes, sure. I am very careful during writing interrupts code, even if it wasn't sufficient in this case.
>> ticks() are called only by TimerSet() and TimerExpired() and those two >> functions are called only in normal background (not interrupt) code. >> This means ticks() always runs in a lower priority than TC0 ISR. >> Moreover, I never disable interrupts in any part of the code. >> >> I don't understand what do you mean with "[...] or interrupts take a few >> cycles to work through the system". Is it possible to read a rolled-over >> hw counter value (0x00000003) and a not incremented _ticks_high? > > There is always a delay between an overflow in the timer, and the actual > interrupt function being started. Depending on details of the chip, it > is possible there will be several cpu cycles of the current instruction > stream executed before the interrupt function is run. If I remember > rightly, the M3/M4 has a 5 stage pipeline. When the interrupt is > registered by the core, it effectively means that a "jump to interrupt > vector" instruction is squeezed into the instruction stream - but these > current 5 instructions must be completed before the interrupt call takes > effect. You will also have a cycle or two delay in the NVIC, and > perhaps a cycle or two delay getting the signal out of the timer block > (especially if the timer does not run at full core speed). Cortex M > interrupts are handled quickly - but not immediately.
Really? So, the core is able to read the register of a peripheral (TC0->COUNT32.COUNT register, in my case) with a *new* (i.e., rolled) value, but the ISR hasn't run yet? In other words, a bus access can be done (TC0 is on the bus) while interrupt request is pending? If this is the case, it is a mess :-(
>> - many times you need an additional "timer running/active" flag >> >> The second point is more important. If you want to switch on a LED after >> 10 minutes when a button is pressed: >> >> bool tmr_led_active; >> Timer32 tmr_led; >> void button_pressed_callback(void) { >> TimerSet32(&tmr_led, 10 * 60 * 1000); >> tmr_led_active = true; >> } >> void main_loop(void) { >> ... >> if (tmr_led_active && TimerExpired32(&tmr_led)) { >> switch_on_led(); >> tmr_led_active = false; >> } >> } >> >> If you don't check the timer flag, the led will switch on at random times. > > So you have to add a flag - big deal. You are not using any more > memory, your code is smaller, and it is simpler to be sure that > everything is correct.
Yes, of course. My point here is that you have to **remember** that the timers you are using can roll-over at any time in the future, so they can change from "not expired" to "expired". After using TimerSet32() and after the timer expires, you could expect it stays "expired", until you arm it again with TimerSet32(). This is not true, because TimerExpired32() could returns "false" at a certain time in the future. This is not a problem with timers that are repetitive (armed again as soon as they expire), but with one-shot timers.
>> Yes, I will try your ideas. > > Good luck - it is not an easy task. Don't forget to let us know the > source of the problem, and the solution!
I configured TC0 in 16-bis mode, so now TC0_Handler() is fired every 75ms. I changed accordingly ticks() to create a 64-bits by shifting _ticks_high for 16-bits. if (h2 != h1) return ((uint64_t)h2 << 16) + 0; else return ((uint64_t)h1 << 16) + l1; I was lucky because I can reproduce the problem more often... and incredibly the problem is the opposite. Remember my state-machine in bus.c: switch(bus.state) { case BUS_IDLE: if (TimerExpired(&bus.tmr_answer)) { /* Send new request on the bus */ ... TimerSet(&bus.tmr_answer, timeout_answer); bus.state = BUS_WAITING_ANSWER; } break; case BUS_WAITING_ANSWER: if (TimerExpired(&bus.tmr_answer)) { /* No reply */ bus.state = BUS_IDLE; TimerSet(&bus.tmr_answer, 0); } else { if (reply_received() == true) { /* Analyze the reply */ bus.state = BUS_IDLE; TimerSet(&bus.tmr_answer, 0); [*] } } break; } When the problem occurs (the bus blocks), bus.state is BUS_IDLE. The problem is with instruction [*]. That instructions should arm bus.tmr_answer timer such that it expires immediately and a new request is send on the bus (in the future, it will be simple to introduce a delay between the reply and the next request). Sometimes the timer doesn't expire immediately, but after the hw counter roll-over again. Indeed I see the bus blocked for 75ms. We were thinking that _ticks_high hasn't incremented yet in task() when reading hw counter value. But this would have produced an old already expired time, not a future not-expired time. Here the problem is with a wrong time in the *future*. In other words, when the problem occurs, ticks() reads a new corrected and incremented value for _ticks_high, but an old not-rolled hw counter value. Why this? I think it's the usual problem with register syncronization in Atmel SAM devices? Before reading or writing certain registers, you need to check if the peripheral is in syncing. I don't know what this exactly means, but it relates to the presence of different asyncronous clocks. But I use only one reference clock (an external crystal) that is routed to the Cortex-M core and all peripherals... so I thought syncronization wasn't necessary. In this case, it seems syncronization solve my problem, so my ticks() function is now: static inline uint64_t ticks(void) { uint32_t h1 = volatileAccess(_ticks_high); TC0->COUNT32.CTRLBSET.reg = TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val); while(TC0->COUNT32.SYNCBUSY) { } uint32_t l1 = TC0->COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high); if (h2 != h1) return ((uint64_t)h2 << 32) + 0; else return ((uint64_t)h1 << 32) + l1; } Datasheet says you need to write the CMD bits of TC0->CTRLB register with a known value (TC_CTRLBSET_CMD_READSYNC_Val), before reading COUNT register. Why? I don't know. TC0->CTRLB is a **Write-Syncronized** register, i.e. the value you are writing will be really wrote after sync time. Maybe the sync loop I added waits for time needed to the CTRLB command to be executed... otherwise the COUNT value you read immediately after could be wrong (IMHO!) After this... I think my code is always affected by the original problem if, as you explained, TC0 interrupt is delayed when I'm reading hw counter in ticks(). I don't have the expertise to understand this possibility happens really, so it's better to find another method. I liked the idea to use a 64-bits counter for ticks that will never roll-over during the entire lifetime of the device.
On 2017-04-27, David Brown <david.brown@hesbynett.no> wrote:
> > Another poster suggested fiddling with optimisation levels, which is > equally odd here.
Not really. I suggested changing the optimisation level because if it works reliably with lower optimisation levels then that strongly implies an issue within the code itself instead of in the hardware. Simon. -- Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world
On 27/04/17 13:51, pozz wrote:
> Il 27/04/2017 12:54, David Brown ha scritto: > >> You are sure you are not calling ticks() from within another interrupt >> function? > > Yes, sure. I am very careful during writing interrupts code, even if it > wasn't sufficient in this case. > > >>> ticks() are called only by TimerSet() and TimerExpired() and those two >>> functions are called only in normal background (not interrupt) code. >>> This means ticks() always runs in a lower priority than TC0 ISR. >>> Moreover, I never disable interrupts in any part of the code. >>> >>> I don't understand what do you mean with "[...] or interrupts take a few >>> cycles to work through the system". Is it possible to read a rolled-over >>> hw counter value (0x00000003) and a not incremented _ticks_high? >> >> There is always a delay between an overflow in the timer, and the actual >> interrupt function being started. Depending on details of the chip, it >> is possible there will be several cpu cycles of the current instruction >> stream executed before the interrupt function is run. If I remember >> rightly, the M3/M4 has a 5 stage pipeline. When the interrupt is >> registered by the core, it effectively means that a "jump to interrupt >> vector" instruction is squeezed into the instruction stream - but these >> current 5 instructions must be completed before the interrupt call takes >> effect. You will also have a cycle or two delay in the NVIC, and >> perhaps a cycle or two delay getting the signal out of the timer block >> (especially if the timer does not run at full core speed). Cortex M >> interrupts are handled quickly - but not immediately. > > Really? So, the core is able to read the register of a peripheral > (TC0->COUNT32.COUNT register, in my case) with a *new* (i.e., rolled) > value, but the ISR hasn't run yet? In other words, a bus access can be > done (TC0 is on the bus) while interrupt request is pending? >
I cannot say for sure that this can happen. But unless I can say for sure that it /cannot/ happen, I prefer to assume the worst is possible.
> If this is the case, it is a mess :-( > > >>> - many times you need an additional "timer running/active" flag >>> >>> The second point is more important. If you want to switch on a LED after >>> 10 minutes when a button is pressed: >>> >>> bool tmr_led_active; >>> Timer32 tmr_led; >>> void button_pressed_callback(void) { >>> TimerSet32(&tmr_led, 10 * 60 * 1000); >>> tmr_led_active = true; >>> } >>> void main_loop(void) { >>> ... >>> if (tmr_led_active && TimerExpired32(&tmr_led)) { >>> switch_on_led(); >>> tmr_led_active = false; >>> } >>> } >>> >>> If you don't check the timer flag, the led will switch on at random >>> times. >> >> So you have to add a flag - big deal. You are not using any more >> memory, your code is smaller, and it is simpler to be sure that >> everything is correct. > > Yes, of course. My point here is that you have to **remember** that the > timers you are using can roll-over at any time in the future, so they > can change from "not expired" to "expired".
True. But perhaps that can be baked into your "Set" and "Expired" functions, or otherwise made "unforgettable". The aim is to simplify the code that /may/ have rare race conditions into something that cannot possibly have such problems - even if it means other code is bigger or less efficient.
> > After using TimerSet32() and after the timer expires, you could expect > it stays "expired", until you arm it again with TimerSet32(). This is > not true, because TimerExpired32() could returns "false" at a certain > time in the future. > > This is not a problem with timers that are repetitive (armed again as > soon as they expire), but with one-shot timers. > > >>> Yes, I will try your ideas. >> >> Good luck - it is not an easy task. Don't forget to let us know the >> source of the problem, and the solution! > > I configured TC0 in 16-bis mode, so now TC0_Handler() is fired every > 75ms. I changed accordingly ticks() to create a 64-bits by shifting > _ticks_high for 16-bits. > > if (h2 != h1) return ((uint64_t)h2 << 16) + 0; > else return ((uint64_t)h1 << 16) + l1; > > I was lucky because I can reproduce the problem more often... and > incredibly the problem is the opposite. > > Remember my state-machine in bus.c: > > switch(bus.state) { > case BUS_IDLE: > if (TimerExpired(&bus.tmr_answer)) { > /* Send new request on the bus */ > ... > TimerSet(&bus.tmr_answer, timeout_answer); > bus.state = BUS_WAITING_ANSWER; > } > break; > > case BUS_WAITING_ANSWER: > if (TimerExpired(&bus.tmr_answer)) { > /* No reply */ > bus.state = BUS_IDLE; > TimerSet(&bus.tmr_answer, 0); > } else { > if (reply_received() == true) { > /* Analyze the reply */ > bus.state = BUS_IDLE; > TimerSet(&bus.tmr_answer, 0); [*] > } > } > break; > } > > When the problem occurs (the bus blocks), bus.state is BUS_IDLE. The > problem is with instruction [*]. That instructions should arm > bus.tmr_answer timer such that it expires immediately and a new request > is send on the bus (in the future, it will be simple to introduce a > delay between the reply and the next request). > > Sometimes the timer doesn't expire immediately, but after the hw counter > roll-over again. Indeed I see the bus blocked for 75ms. > > We were thinking that _ticks_high hasn't incremented yet in task() when > reading hw counter value. But this would have produced an old already > expired time, not a future not-expired time. Here the problem is with a > wrong time in the *future*. > In other words, when the problem occurs, ticks() reads a new corrected > and incremented value for _ticks_high, but an old not-rolled hw counter > value. > > Why this? I think it's the usual problem with register syncronization > in Atmel SAM devices? Before reading or writing certain registers, you > need to check if the peripheral is in syncing. I don't know what this > exactly means, but it relates to the presence of different asyncronous > clocks. But I use only one reference clock (an external crystal) that > is routed to the Cortex-M core and all peripherals... so I thought > syncronization wasn't necessary. > > In this case, it seems syncronization solve my problem, so my ticks() > function is now: > > static inline uint64_t ticks(void) { > uint32_t h1 = volatileAccess(_ticks_high); > TC0->COUNT32.CTRLBSET.reg = > TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val); > while(TC0->COUNT32.SYNCBUSY) { > } > uint32_t l1 = TC0->COUNT32.COUNT.reg; > uint32_t h2 = volatileAccess(_ticks_high); > if (h2 != h1) return ((uint64_t)h2 << 32) + 0; > else return ((uint64_t)h1 << 32) + l1; > } > > Datasheet says you need to write the CMD bits of TC0->CTRLB register > with a known value (TC_CTRLBSET_CMD_READSYNC_Val), before reading COUNT > register. Why? I don't know. TC0->CTRLB is a **Write-Syncronized** > register, i.e. the value you are writing will be really wrote after sync > time. Maybe the sync loop I added waits for time needed to the CTRLB > command to be executed... otherwise the COUNT value you read immediately > after could be wrong (IMHO!) > > > After this... I think my code is always affected by the original problem > if, as you explained, TC0 interrupt is delayed when I'm reading hw > counter in ticks(). I don't have the expertise to understand this > possibility happens really, so it's better to find another method. > > I liked the idea to use a 64-bits counter for ticks that will never > roll-over during the entire lifetime of the device.
If you can make your hardware timer function run every millisecond (or whatever accuracy you need), then use this: extern volatile uint64_t tickCounter_; static inline uint64_t ticks(void) { uint64_t a = tickCounter_; while (true) { uint64_t b = tickCounter_; if (a == b) return a; a = b; } } void TC0_Handler(void) { if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { tickCounter++; TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF; } } If ticks never accesses the timer hardware register, there cannot be a problem with synchronisation. There is no need to use the timer peripherals complicated synchronisation and locking mechanism, nor any concern about interrupt delays. Re-reading the 64-bit value until you have two identical reads is sufficient to ensure that you have a consistent value even if a timer interrupt occurs in the middle of the 64-bit read.
On 27/04/17 13:20, Dimiter_Popoff wrote:
> On 27.4.2017 &#1075;. 12:06, David Brown wrote: >> On 27/04/17 10:02, Dimiter_Popoff wrote: >>> On 27.4.2017 &#1075;. 01:55, pozz wrote: >> >>>> I suspect there's a problem in my low-level driver and sometimes, maybe >>>> near the overflow, the code doesn't work as I expect. >>>> Maybe the TimerSet() function sometimes sets a wrong value to the >>>> uint64_t timer, maybe a tick value that will happen only at the next >>>> overflow of the hw counter. >>>> >>>> Do you see where is the problem? >>>> >>> >>> Hoping that the compiler will know how to fix this is not a very >>> efficient approach. >>> It is just a few lines of assembly, no matter how inconvenient the ARM >>> assembly may be it will still save you weeks (months) of blind trial >>> and error. >>> You just need to _know_ how this is done, typing blindly this or that >>> is unlikely to ever work. >>> >> >> Why would you think it is easier to write this in assembly than C? The >> issue is to be sure that the /algorithm/ is entirely safe regardless of >> any ordering of interrupts, interrupt disables, timer overflows, etc. >> Exactly the same issues are relevant in assembly. > > Perhaps not "easier", just taking a few minutes rather than a few > months.
You don't think you are exaggerating just a /little/ ?
> Doing an lwarx/stwcx. like thing on a timer in a high level language > is simply a waste of someone's time - but then someone might find > it easier to waste time than to write the < 10 lines, keep on > wondering why todays compiler output does not work while yesterdays > did, what is actually happening etc. > The obvious thing to do here is: > > -lock the 64 bit entity (test a flag and set it, if found set > try again), > -read the 64 bit counter lwarx/stwcx.-ish, > -unlock the timer. > - ensure this is done more frequently than the lower 32 bits > overflow. > > And you want to write this in C (or whatever)?! Well, it is your time.
The problem has nothing to do with making an atomic 64 bit read of data - that is /easy/ in C. The problem is the locking and synchronisation system for reading this timer peripheral in this particular microcontroller, which is inconvenient no matter what language is used. The other considerations are synchronisation between updates of the data in the interrupt function, the calling function, and the hardware timer - again, language independent. A lwarx/stwcx can be useful for doing read/write/modify operations on an object - it is not necessary for atomically reading data which could be changed underway. The simplest way to handle that, in C /or/ in assembly, is to read it more than once: volatile uint64_t count; uint64_t readCount(void) { uint64_t a = count; while (true) { uint64_t b = count; if (a == b) return a; a = b; } } Alternatively, using C11, you handle it like this: _Atomic uint64_t a_count; uint64_t readCount(void) { return a_count; } I appreciate that you are very familiar with assembly and may write it faster than C, but the above function did not take me a month to figure out. Using lwarx/stwcx (or ldrex/strex on ARM) will not be shorter or faster, and it will be more complex. Basically, you need to use these to create a lock for the access to the non-atomic type. They are always harder to use than something like C11's atomic access support, regardless of the language. And if you want to program in C, and don't have C11 atomics (or other help from your compiler), you can easily write inline assembly wrappers for those two instructions and write everything else in C.
> >> >> Another poster suggested fiddling with optimisation levels, which is >> equally odd here. There are enough "volatile" accesses here to ensure >> that the order of accesses in the C code will be kept when the code is >> compiled to assembly - writing it in assembly or changing optimisation >> levels will not affect that. > > Well good, if your C compiler behaves in such a predictable way go > ahead. And don't come crying a year later when you discover it did > so indeed - almost :-).
My C compiler(s) behave in a predictable manner - because I know how C works. People who don't know all the fine details can get it wrong, but that applies to any language.
> > The mere fact that I see so _many_ and _lengthy_ posts lately all > related to wrestling the compiler to do what one wants it to do, this > taking often _months_ of peoples time instead of a few minutes > speaks for itself. >
I follow most threads in this newsgroup, but I can't figure out what you are referring to here. There are occasional posts where people make mistakes due to misunderstandings about how C works and how their compiler works - but they are not /that/ common. And by what possible justification could you claim that writing the code in assembly would result in fewer bugs or shorter development times? You might reasonably say that the programmer would not make those particular mistakes - but they could easily make many, many more errors in their assembly code that they would not have made with C.
On 27/04/17 14:03, Simon Clubley wrote:
> On 2017-04-27, David Brown <david.brown@hesbynett.no> wrote: >> >> Another poster suggested fiddling with optimisation levels, which is >> equally odd here. > > Not really. I suggested changing the optimisation level because if it > works reliably with lower optimisation levels then that strongly implies > an issue within the code itself instead of in the hardware. >
As in, "it is worth giving it a try to see if it affects behaviour, and therefore gives a clue to help debugging" ? Yes, I can agree with that. Once the OP has made changes that speed up the error rate to something workable, that kind of thing can be worth trying.

Memfault Beyond the Launch