EmbeddedRelated.com

Testing watchdog

Started by pozz September 27, 2020
I usually enable the watchdog during boot init code. I usually use the 
internal watchdog that is available in the MCU I'm using. I know an 
external watchdog can be more secure, but an internal watchdog is usually 
better than nothing.

Some MCUs can be programmed so that the watchdog is enabled immediately 
after reset and stays enabled forever. Other times I enable the watchdog 
timer immediately after some initialization code.

During my previous thread "Power On Self Test", Richard Damon said:

 > "Yes, testing watchdogs is tricky."

Tricky? Why?

What I usually do to test what happens when the watchdog timer expires is 
to add a while(1); loop in a certain part of the code, for example when 
a button is pressed or a command is received from a serial line. It is 
very simple to see the watchdog action (the reset) when I force the 
execution flow to pass through the added test code.

Of course, after testing I remove the while(1); loop.

In this way I'm sure the watchdog is enabled and configured correctly.
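
The test described above can also be tried on a host PC before touching hardware. What follows is only a sketch under stated assumptions: `sim_wdt_t` and the `sim_wdt_*` functions are a hypothetical software model of a watchdog, not any real MCU's registers, and `run_loop` stands in for the main loop, with the temporary while(1); modelled as "stop kicking from iteration `hang_at` onwards".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog timer (not a real MCU API). */
typedef struct {
    uint32_t timeout_ticks; /* ticks without a kick before the reset fires */
    uint32_t counter;       /* ticks elapsed since the last kick */
    bool     reset_fired;   /* latched once the timeout is reached */
} sim_wdt_t;

void sim_wdt_init(sim_wdt_t *w, uint32_t timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->counter = 0;
    w->reset_fired = false;
}

/* Restart the timeout, as the normal execution flow would do. */
void sim_wdt_kick(sim_wdt_t *w)
{
    w->counter = 0;
}

/* Advance the model by one tick and latch the reset on timeout. */
void sim_wdt_tick(sim_wdt_t *w)
{
    if (++w->counter >= w->timeout_ticks) {
        w->reset_fired = true;
    }
}

/*
 * Simulate up to `iterations` passes of the main loop, one tick per pass.
 * From pass `hang_at` onwards the loop no longer kicks the watchdog, which
 * models execution being stuck in the temporary while(1); test code.
 * Returns true if the modelled watchdog reset fired.
 */
bool run_loop(uint32_t iterations, uint32_t hang_at, uint32_t timeout_ticks)
{
    sim_wdt_t wdt;
    sim_wdt_init(&wdt, timeout_ticks);

    for (uint32_t i = 0; i < iterations && !wdt.reset_fired; i++) {
        if (i < hang_at) {
            sim_wdt_kick(&wdt); /* normal flow feeds the watchdog */
        }
        sim_wdt_tick(&wdt);
    }
    return wdt.reset_fired;
}
```

Calling `run_loop(100, 100, 5)` models a loop that never hangs, while `run_loop(100, 10, 5)` models a hang injected at pass 10. A model like this only checks the kicking logic; whether the real peripheral is enabled and configured correctly still has to be verified on the target.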

Do you think it is not enough?

PS: I know there is some risk that the watchdog gets disabled by 
unpredictable, buggy code; indeed, many MCUs require a precise 
sequence of operations to disable the watchdog.
In this post I'm not referring to these issues, but only to the 
behaviour of the watchdog when it expires. I think it's not so difficult 
to test it.
On 27/09/2020 17:56, pozz wrote:
> During my previous thread "Power On Self Test", Richard Damon said:
>
>> "Yes, testing watchdogs is tricky."
>
> Tricky? Why?
Certainly it is not hard to test a watchdog trigger by simply not kicking it for a while.

But think about what the watchdog is for - what are you actually trying to do with it? Are you trying to find where you have an infinite loop in your program, that you have failed to find with other debugging? The watchdog /might/ help as a last resort, but you typically have no idea what has gone wrong.

The "good" reason for a watchdog is to handle unexpected hardware issues that have led to a broken system - things like out-of-spec electrical interference, cosmic ray bit-flips, power glitches, etc. How do you test that the watchdog does its job in such situations? If these were faults that were easily provoked, you'd already have hardware solutions in place to deal with them.
On Sun, 27 Sep 2020 18:38:13 +0200, David Brown
<david.brown@hesbynett.no> wrote:

> Certainly it is not hard to test a watchdog trigger by simply not
> kicking it for a while.
You could enable a while loop that runs when a pin is pulled high or low; the processor would then reboot once you pull that pin one way or the other.

I always try to have the code go through at least two areas, each setting a flag, so that more than one piece of code has to run in order to feed the watchdog. For example, one in main and one in a timer interrupt (at least).
On 27/09/2020 18:38, David Brown wrote:
> But think about what the watchdog is for - what are you actually trying
> to do with it? Are you trying to find where you have an infinite loop
> in your program, that you have failed to find with other debugging? The
> watchdog /might/ help as a last resort, but you typically have no idea
> what has gone wrong.
Yes, you're right, but I don't think the watchdog is a solution for every issue that could happen in the field. It is a solution for big problems with the execution flow of the program, because normal flow feeds the watchdog and unexpected flow doesn't.
> The "good" reason for a watchdog is to handle unexpected hardware issues
> that have led to a broken system - things like out-of-spec electrical
> interference, cosmic ray bit-flips, power glitches, etc. How do you
> test that the watchdog does its job in such situations? If these were
> faults that were easily provoked, you'd already have hardware solutions
> in place to deal with them.
I thought the watchdog was meant for software bugs, not hardware issues. For example, when you use a wrongly initialized function pointer and jump to a wrong address in the code... or similar things.
On Monday, September 28, 2020 at 2:38:40 AM UTC-4, pozz wrote:
> I didn't think watchdog was used for hardware issues, but software bugs.
> For example, when you use a wrongly initialized function pointer and
> jump to a wrong address in the code... or similar things.
I recall a software team who thought it was a good idea to use a timer driven interrupt routine to tickle the watchdog.

--
Rick C.
On 28/09/2020 08:38, pozz wrote:
> I didn't think watchdog was used for hardware issues, but software bugs.
> For example, when you use a wrongly initialized function pointer and
> jump to a wrong address in the code... or similar things.
You are not supposed to use a watchdog for that - you are supposed to use good software development techniques and comprehensive testing. (Yes, I know the real world is not perfect.)

In embedded development, treat function pointers like dynamic memory - something you /really/ want to avoid unless there is no other option, because it is often simple to get wrong, it is far more difficult to analyse than static alternatives, and it is also less efficient.
On 28/09/2020 09:17, Rick C wrote:
> I recall a software team who thought it was a good idea to use a
> timer driven interrupt routine to tickle the watchdog.
That is fine - /if/ the kicking (I kick my watchdogs, rather than tickle them!) is conditional on other checks. For example, each other task in the system sets a boolean flag when it runs its loop, and the watchdog check in the timer interrupt checks that all flags are on before kicking the watchdog and resetting all the flags. It's a standard way to get the effect of having multiple watchdogs.
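
The flag scheme described above might look roughly like the following. This is a minimal sketch, not a definitive implementation: `NUM_TASKS`, `task_checkin`, `watchdog_supervisor_tick` and `hw_watchdog_kick` are hypothetical names, and `hw_watchdog_kick` only counts calls here where real code would write to the MCU's watchdog register. It also assumes that setting a single `bool` flag is atomic on the target.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_TASKS 3 /* hypothetical number of supervised tasks */

/* One "I ran my loop" flag per task, written by tasks, read by the ISR. */
volatile bool task_alive[NUM_TASKS];

/* Stand-in for the hardware kick: counts calls instead of touching registers. */
unsigned kick_count;

void hw_watchdog_kick(void)
{
    kick_count++;
}

/* Each task calls this once per pass of its own loop. */
void task_checkin(size_t id)
{
    if (id < NUM_TASKS) {
        task_alive[id] = true;
    }
}

/*
 * Called from the periodic timer interrupt. The watchdog is kicked only
 * if every task has checked in since the last kick; the flags are then
 * cleared so that each task has to prove itself again.
 * Returns true if the watchdog was kicked on this call.
 */
bool watchdog_supervisor_tick(void)
{
    for (size_t i = 0; i < NUM_TASKS; i++) {
        if (!task_alive[i]) {
            return false; /* at least one task has not run: do not kick */
        }
    }
    for (size_t i = 0; i < NUM_TASKS; i++) {
        task_alive[i] = false;
    }
    hw_watchdog_kick();
    return true;
}
```

With this arrangement a stuck task eventually starves the hardware watchdog even though the timer interrupt itself keeps running. The hardware timeout has to be longer than the slowest task's loop period, otherwise a healthy system would be reset.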
On 28/09/2020 11:26, David Brown wrote:
> You are not supposed to use a watchdog for that - you are supposed to
> use good software development techniques and comprehensive testing.
> (Yes, I know the real world is not perfect.)
Yes, I know, but Good Software Development Techniques aren't perfect, so the watchdog is one of the last chances to put the system in a safe state (restarting, in my case).

I admit it can be useful for hardware issues too.

<ot>
> In embedded development, treat function pointers like dynamic memory -
> something you /really/ want to avoid unless there is no other option,
> because it is often simple to get wrong,
Why? Is it poor code (the responsibility of the developer) or an intrinsic characteristic of function pointers?

Even null-terminated strings can be risky if you process a string without the null char at the end. Your "Good Software Development Techniques" should help to avoid such errors.

I think function pointers are a good solution in many cases. For example, I don't like a long if/else:

   if (me->type == LAMP1) lamp1_func();
   else if (me->type == LAMP2) lamp2_func();
   else if (me->type == LAMP3) lamp3_func();

I prefer to put a function pointer in the struct and call it directly:

   me->func();

This helps to mimic OOP in C too.
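
For readers who want to try the function-pointer variant, here is a small sketch. The struct layout, the `level` field, the `me` argument passed to the callbacks and the `lamp_run` wrapper are all assumptions made for illustration; the post only shows `me->func()`. The NULL check catches a pointer that was never set, but it cannot detect a pointer holding a garbage address.

```c
#include <stddef.h>

typedef struct lamp lamp_t;

struct lamp {
    int  level;               /* hypothetical per-lamp state */
    void (*func)(lamp_t *me); /* behaviour selected when the lamp is set up */
};

/* Hypothetical per-type behaviours, all sharing one signature. */
void lamp1_func(lamp_t *me) { me->level = 1; }
void lamp2_func(lamp_t *me) { me->level = 2; }

/*
 * Dispatch through the stored pointer.
 * Returns 0 on success, -1 if the lamp or its function pointer is NULL.
 */
int lamp_run(lamp_t *me)
{
    if (me == NULL || me->func == NULL) {
        return -1;
    }
    me->func(me);
    return 0;
}
```

A lamp would be set up as `lamp_t a = { 0, lamp1_func };` and driven with `lamp_run(&a);`.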
> it is far more difficult to analyse than static alternatives,
This is true.
> and it is also less efficient.
Do you think my example above is less efficient with the function pointer?
On 28/09/2020 12:12, pozz wrote:
>> In embedded development, treat function pointers like dynamic memory -
>> something you /really/ want to avoid unless there is no other option,
>> because it is often simple to get wrong,
>
> Why? Poor code (responsibility of the developer) or an intrinsic
> characteristic of function pointers?
As I said - they are easy to get wrong, and difficult to analyse. They make a mockery of call chain analysis, stack size checking, and other things that depend on a static code flow. They make it difficult or impossible to follow code flows backwards, or simply to answer the question "where can this code be called from?".

Sometimes function pointers /are/ the best way to organise code. But they should be used with care and consideration. (I even recommend minimising the use of data pointers, for similar reasons, though obviously data pointers have more essential uses.)
> Even null-terminated strings can be risky if you process a string
> without the null char at the end. Your "Good Software Development
> Techniques" should help to avoid such errors.
Making mistakes in programming is certainly easy.
> I think function pointers are a good solution in many cases. For
> example, I don't like a long if/else:
>
>   if (me->type == LAMP1) lamp1_func();
>   else if (me->type == LAMP2) lamp2_func();
>   else if (me->type == LAMP3) lamp3_func();
>
> I prefer to put a function pointer in the struct and call it directly:
>
>   me->func();
That, to me, is a very bad design choice.

   typedef enum {
       lampType1, lampType2, lampType3
   } lampTypes;

Whatever the "me" struct is, have its "type" field (after a name change) of type "lampTypes". Then you have:

   switch (me->lamp_type) {
       case lampType1 : lamp1_func(); break;
       case lampType2 : lamp2_func(); break;
       case lampType3 : lamp3_func(); break;
   }

This way you have a clear structure in the call graphs - you know exactly what can be called, from where, by what. You have flexibility - the lampX_func functions don't have to have the same signature. You have better optimisation, as the compiler can combine parts of these (if they are visible or you are using LTO). You can't get bad or uninitialised function pointers. You can't miss out on anything, because your compiler or third-party static analysis tool will tell you if an enumeration is missed in the switch.
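
A compilable version of the enum-and-switch approach could look like the sketch below. The `lamp_t` struct, its `level` field, the function bodies and the dimming value 50 are invented for illustration; only the enum, the `lamp_type` field name and the shape of the switch come from the post. The three functions deliberately have different signatures to show the flexibility mentioned above.

```c
typedef enum {
    lampType1, lampType2, lampType3
} lampTypes;

typedef struct {
    lampTypes lamp_type;
    int       level; /* hypothetical per-lamp state */
} lamp_t;

/* Hypothetical behaviours; unlike function pointers in a shared struct
 * field, these do not need to share one signature. */
void lamp1_func(lamp_t *me)          { me->level = 10; }
void lamp2_func(lamp_t *me, int dim) { me->level = dim; }
void lamp3_func(void)                { /* nothing to do for this type */ }

/* Static dispatch: every possible callee is visible at the call site. */
void lamp_run(lamp_t *me)
{
    switch (me->lamp_type) {
        case lampType1 : lamp1_func(me);     break;
        case lampType2 : lamp2_func(me, 50); break;
        case lampType3 : lamp3_func();       break;
    }
}
```

The switch has no `default` label on purpose. With GCC or Clang, `-Wswitch` (part of `-Wall`) then warns if an enumerator is added to `lampTypes` later and not handled here.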
> This helps to mimic OOP in C too.
That is not a good thing. If you want to use OOP, use C++. (C++ compilers and analysis tools know far more about virtual functions than about unrestricted function pointers.)
>> it is far more difficult to
>> analyse than static alternatives,
>
> This is true.
>
>> and it is also less efficient.
>
> Do you think my example above is less efficient with function pointer?
It is not unlikely, but it depends on the rest of the source code and organisation. If the source of these lampX_func functions is known to the compiler when it generates the switch statement, and they are declared "static" (or if you are using LTO), then yes, the switch solution will probably be more efficient.

But efficiency arguments are secondary to making code easier to write correctly, harder to write incorrectly (these are not the same thing), easier to analyse, and easier to debug.
On Monday, September 28, 2020 at 5:29:29 AM UTC-4, David Brown wrote:
> That is fine - /if/ the kicking (I kick my watchdogs, rather than tickle
> them!) is conditional on other checks. For example, each other task in
> the system sets a boolean flag when it runs its loop, and the watchdog
> check in the timer interrupt checks that all flags are on before kicking
> the watchdog and resetting all the flags. It's a standard way to get
> the effect of having multiple watchdogs.
Yes, but they weren't doing that.

Also, you need to vet the watchdog software very carefully. A "sometimes" watchdog is about the same as none.

--
Rick C.