EmbeddedRelated.com

Watchdog that should be taken out and shot

Started by Tom Lucas September 13, 2006
"larwe" <zwsdotcom@gmail.com> wrote in message 
news:1158166778.041070.58170@m73g2000cwd.googlegroups.com...
> > Tim Wescott wrote: > >> Do you know of any really good discussions of the care and feeding* >> of >> watchdogs? Where I've seen it done well it involved _all_ of the > > I haven't written anything formally on the topic, but it has been > discussed here many times. One of the techniques I've used is to have > an entry counter in each thread. At regular intervals (in an ISR), > these counters are checked. If they fall within guard bands [which may > have to be pretty wide], the WDT is kicked; otherwise, bite. > > Ganssle has some excellent material on the topic, of course.
Which I have found and it is very good.
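
A minimal sketch of the entry-counter scheme larwe describes above, in C. Everything here is an assumption for illustration: the names (`task_monitor_t`, `task_checkin`, `watchdog_tasks_healthy`), the number of tasks, and the guard-band values do not come from the thread or from any particular RTOS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and limits; tune the guard bands per task. */
#define NUM_TASKS 3u

typedef struct {
    volatile uint32_t entries; /* bumped by the task on every loop pass */
    uint32_t min_entries;      /* lower guard band per check interval */
    uint32_t max_entries;      /* upper guard band per check interval */
} task_monitor_t;

static task_monitor_t monitors[NUM_TASKS] = {
    { 0u,  5u,  20u },
    { 0u,  1u,   4u },
    { 0u, 50u, 200u },
};

/* Called by each task at the top of its loop.
 * On a real target the increment must be atomic with respect to the ISR. */
void task_checkin(size_t task_id)
{
    if (task_id < NUM_TASKS) {
        monitors[task_id].entries++;
    }
}

/* Called from a periodic timer ISR. Returns true only if every task's
 * entry count fell inside its guard band during the last interval.
 * All counters are cleared for the next interval either way. */
bool watchdog_tasks_healthy(void)
{
    bool healthy = true;
    for (size_t i = 0; i < NUM_TASKS; i++) {
        uint32_t n = monitors[i].entries;
        monitors[i].entries = 0u;
        if (n < monitors[i].min_entries || n > monitors[i].max_entries) {
            healthy = false;
        }
    }
    return healthy;
}
```

The timer ISR would kick the hardware watchdog only when `watchdog_tasks_healthy()` returns true; otherwise it does nothing and lets the dog bite. Checking an upper bound as well as a lower one catches a task stuck in a tight loop around its check-in call.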
Hello Martin,

>> >>>>>2) Write 0x1984 to the watchdog RST register to seed the counter. >>>>>(Goodness knows why they chose that value) >>>> >>>>It's probably the lead developer's dog's birthday written in base 9 and >>>>shown as a hex number. Almost certainly there is an ex post facto >>>>explanation as to why it is the technically sound choice. >>>> >>>> >>>>>Then I periodically write 0x1984 to the RST register to keep everything >>>>>alive. >>>> >>>>Silly question, but "periodically" = inside an ISR? >>>> >>> >>>Oh honestly Lewin, where else would you kick the dog? >>> >> >>Kicking is animal cruelty... >> >> >> >>>Do you know of any really good discussions of the care and feeding* of >>>watchdogs? >> >> >>"Go Natural" works best for our shepherd mix, can be bought at our local >>feed store ;-) > > > OK, a bit late in the thread, but > http://ganssle.com/watchdogs.htm >
Thanks, very nice link. Jack really knows his stuff and I subscribed to his Embedded Muse.

One statement in there that I don't really get is this: "Reloading the program counter may not properly reinitialize the CPU's internals." I'd have thought that properly written code will do one thing first: Set all the CPU internals before doing anything.

Then there is the external watchdog, where he rightfully criticizes that wiggling its "I'm alive" line can also be done by code that has run out of control. This begs the question: Are there reasonably priced external WDTs that require a little serial password? If there aren't, why not? I mean, it doesn't have to be a 128-bit hacker-proof variety.

-- 
Regards, Joerg
http://www.analogconsultants.com
Joerg wrote:

> of control. This begs the question: Are there reasonably priced external > WDTs that require a little serial password? If there aren't, why not? I > mean, it doesn't have to be a 128bit hacker-proof variety.
Or a rolling code. A 5-pin micro from Microchip might fit the bill :)
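
A rough sketch of what a rolling-code kick could look like, modelled on the host in C. This is an assumption-laden illustration, not a description of any real part: the names (`rolling_wdt_t`, `rolling_wdt_kick`) are hypothetical, and a 16-bit Galois LFSR is swapped in as the simplest possible code generator. Both sides start from the same seed, and the watchdog accepts a kick only if it carries the next value in the sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR (toggle mask 0xB400, maximal length).
 * Assumed here purely as a cheap shared sequence generator. */
uint16_t lfsr_next(uint16_t s)
{
    uint16_t lsb = (uint16_t)(s & 1u);
    s = (uint16_t)(s >> 1);
    if (lsb) {
        s ^= 0xB400u;
    }
    return s;
}

/* State held inside the (hypothetical) external watchdog. */
typedef struct {
    uint16_t state;
} rolling_wdt_t;

void rolling_wdt_init(rolling_wdt_t *w, uint16_t seed)
{
    /* An all-zero LFSR state never changes, so avoid it. */
    w->state = (seed != 0u) ? seed : 1u;
}

/* Returns true if the kick carries the expected next code.
 * A real device would reset the host on a false result. */
bool rolling_wdt_kick(rolling_wdt_t *w, uint16_t code)
{
    uint16_t expected = lfsr_next(w->state);
    if (code != expected) {
        return false;
    }
    w->state = expected;
    return true;
}
```

Runaway code that merely toggles a pin or repeats the last value sent gets rejected, because each code is valid exactly once.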
Joerg wrote:
> Hello Martin, > > > OK, a bit late in the thread, but > > http://ganssle.com/watchdogs.htm > > > > Thanks, very nice link. Jack really knows his stuff and I subscribed to > his Embedded Muse. > > One statement in there that I don't really get is this: "Reloading the > program counter may not properly reinitialize the CPU's internals." I'd > have thought that properly written code will do one thing first: Set all > the CPU internals before doing anything. > > Then there is the external watchdog where he rightfully criticizes that > wiggling its "I'm alive" line can also be done by code that has run out > of control. This begs the question: Are there reasonably priced external > WDTs that require a little serial password?
I don't remember seeing a password-protected one, but there are a few that only allow tickles in a specific interval. If it was last tickled at time t, the next tickle has to land within t + t1 +/- delta; anything outside that window causes a reset. Now your accidental tickle sequence has to match the correct frequency, not just be faster than x.

Robert
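
The window check described above can be sketched as a small host-side model in C. The names (`window_wdt_t`, `window_wdt_kick`) and the tick units are assumptions for illustration, and the model only judges the timing of a kick; a real window watchdog would also fire on its own if no kick ever arrives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a window watchdog. Assumes delta <= period. */
typedef struct {
    uint32_t last_kick; /* tick count of the last accepted kick (t)  */
    uint32_t period;    /* nominal spacing between kicks (t1)        */
    uint32_t delta;     /* allowed deviation either side of period   */
} window_wdt_t;

/* Returns true if the kick at time 'now' lands inside the window
 * [t + t1 - delta, t + t1 + delta]. A real part would assert reset
 * instead of returning false. */
bool window_wdt_kick(window_wdt_t *w, uint32_t now)
{
    /* Unsigned subtraction stays correct across tick-counter wraparound. */
    uint32_t elapsed = now - w->last_kick;

    if (elapsed < w->period - w->delta || elapsed > w->period + w->delta) {
        return false; /* too early or too late */
    }
    w->last_kick = now;
    return true;
}
```

With `period = 100` and `delta = 10`, a kick 50 ticks after the previous one is rejected as too early, which is exactly the case a plain timeout watchdog cannot catch.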
Hello Robert,

>> >>>OK, a bit late in the thread, but >>>http://ganssle.com/watchdogs.htm >>> >> >>Thanks, very nice link. Jack really knows his stuff and I subscribed to >>his Embedded Muse. >> >>One statement in there that I don't really get is this: "Reloading the >>program counter may not properly reinitialize the CPU's internals." I'd >>have thought that properly written code will do one thing first: Set all >>the CPU internals before doing anything. >> >>Then there is the external watchdog where he rightfully criticizes that >>wiggling its "I'm alive" line can also be done by code that has run out >>of control. This begs the question: Are there reasonably priced external >>WDTs that require a little serial password? > > > I don't remember seeing a password protected one but there are a few > that only allow tickles in a specific interval. So if it's tickled at > t then if you tickle outside of t + t1 +/- delta it resets. Now your > accidental reset sequence has to match the correct frequency not just > be faster than x. >
I have seen those as well. However, that may cost you a timer or at least one of the CCRs. -- Regards, Joerg http://www.analogconsultants.com
"Joerg" <notthisjoergsch@removethispacbell.net> wrote in message 
news:kSeOg.1655$7I1.1636@newssvr27.news.prodigy.net...
> Hello Martin, >> OK, a bit late in the thread, but >> http://ganssle.com/watchdogs.htm > > Thanks, very nice link. Jack really knows his stuff and I subscribed to > his Embedded Muse. > > One statement in there that I don't really get is this: "Reloading the > program counter may not properly reinitialize the CPU's internals." I'd > have thought that properly written code will do one thing first: Set all > the CPU internals before doing anything.
Yes, true. But that doesn't necessarily put all the CPU hardware in the same state as a hardware reset. You'd think so, wouldn't you, but it doesn't.

If you've done any hardware design, you'll know about excluded states (which are hardware states that normal sequencing won't exit from). Following an EMI glitch (not normal software behaviour, or even software runaway), the state of the CPU is effectively random. If you're really unlucky, you could wind up with an excluded state somewhere on the CPU. (Or more likely a microcontroller - more on-chip subsystems.)

I've seen this happen in practice, with code which did indeed reset everything as per the datasheet. The CPU partially worked, but not completely. This wasn't my design, and I was trying to convince the designer to use a hardware watchdog, when this came up. Result: he used a hardware watchdog.

Steve
http://www.fivetrees.com
Hello Steve,

> "Joerg" <notthisjoergsch@removethispacbell.net> wrote in message > news:kSeOg.1655$7I1.1636@newssvr27.news.prodigy.net... > >>Hello Martin, >> >>>OK, a bit late in the thread, but >>>http://ganssle.com/watchdogs.htm >> >>Thanks, very nice link. Jack really knows his stuff and I subscribed to >>his Embedded Muse. >> >>One statement in there that I don't really get is this: "Reloading the >>program counter may not properly reinitialize the CPU's internals." I'd >>have thought that properly written code will do one thing first: Set all >>the CPU internals before doing anything. > > > Yes, true. But that doesn't necessarily put all the CPU hardware in the same > state as a hardware reset. You'd think so, wouldn't you, but it doesn't. > > If you've done any hardware design, you'll know about excluded states (which > are hardware states that normal sequencing won't exit from). Following an > EMI glitch (not normal software behaviour, or even software runaway), the > state of the CPU is effectively random. If you're really unlucky, you could > wind up with an excluded state somewhere on the CPU. (Or more likely a > microcontroller - more on-chip subsystems.) >
Yes, it can happen. But it shouldn't. Unstable states are often a sign of hardware or chip design that wasn't careful enough.

I mostly design around discretes and logic. There I pay meticulous attention to undefined states, IOW I make sure there is always some way out. Mostly this is easy: for example, when a register can set five different parameters, I make sure that the unused states 5 through 7 map to something meaningful and don't just point to lalaland.

If in doubt, always have the watchdog do a full HW reset.
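
The same "always some way out" idea carries over to firmware that decodes a register field. A minimal sketch in C, assuming a hypothetical 3-bit selector with five defined settings; the enum names and the choice of `SEL_OFF` as the safe fallback are illustrative only.

```c
#include <stdint.h>

/* Hypothetical 3-bit selector: five defined settings, codes 5..7 unused. */
typedef enum {
    SEL_OFF  = 0,
    SEL_LOW  = 1,
    SEL_MID  = 2,
    SEL_HIGH = 3,
    SEL_TEST = 4
} selector_t;

/* Decode the low three bits of a register value. The unused codes 5..7
 * are mapped to a defined, safe setting instead of being left to chance. */
selector_t decode_selector(uint8_t reg)
{
    switch (reg & 0x07u) {
    case 0:  return SEL_OFF;
    case 1:  return SEL_LOW;
    case 2:  return SEL_MID;
    case 3:  return SEL_HIGH;
    case 4:  return SEL_TEST;
    default: return SEL_OFF; /* codes 5, 6, 7: fall back to the safe state */
    }
}
```

Whatever a glitch leaves in the register, the decoder lands in one of the five known settings.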
> I've seen this happen in practice, with code which did indeed reset > everything as per the datasheet. The CPU partially worked, but not > completely. This wasn't my design, and I was trying to convince the designer > to use a hardware watchdog, when this came up. Result: he used a hardware > watchdog. >
That's the beauty of design reviews. Even if it's just two people in an informal review. All of us can make mistakes. -- Regards, Joerg http://www.analogconsultants.com
"Tom Lucas" <news@REMOVE_auto_THIS_flame_TO_REPLY.clara.co.uk> wrote in 
message news:1158232347.43487.0@despina.uk.clara.net...
> "Tom Lucas" <news@REMOVE_auto_THIS_flame_TO_REPLY.clara.co.uk> wrote in > message news:1158161579.6562.0@proxy00.news.clara.net... >> I've been trying to implement the watchdog timer on my Sharp79524 and it >> appears that the mutt is sleeping on the job! >> >> I believe I have it set up to trigger after about 3s of inactivity but it >> doesn't seem to reset the system at all. > > Problem Solved! > > It turns out that the address of the watchdog register is actually > 0xFFFE3000 instead of the 0xFFFC3000 that is shown in the user guide. This > is a case where RTFM has caused the problem! > > D'oh! >
Oooh, we hate when that happens. Appears to be wrong only on the one page. In the register descriptions it is correct. Scott
On Thu, 14 Sep 2006 13:47:17 -0600, Not Really Me wrote:

> > "Tom Lucas" <news@REMOVE_auto_THIS_flame_TO_REPLY.clara.co.uk> wrote in > message news:1158232347.43487.0@despina.uk.clara.net... >> >> It turns out that the address of the watchdog register is actually >> 0xFFFE3000 instead of the 0xFFFC3000 that is shown in the user guide. This >> is a case where RTFM has caused the problem! >> >> D'oh! >> > > Oooh, we hate when that happens. > > Appears to be wrong only on the one page. In the register descriptions it > is correct.
Ah Ha! Tom, you didn't RTWFM! W->Whole ~Dave~
"Dave" <dave@comteck.com> wrote in message 
news:pan.2006.09.15.01.23.37.909984@comteck.com...
> On Thu, 14 Sep 2006 13:47:17 -0600, Not Really Me wrote: > >> >> "Tom Lucas" <news@REMOVE_auto_THIS_flame_TO_REPLY.clara.co.uk> wrote >> in >> message news:1158232347.43487.0@despina.uk.clara.net... >>> >>> It turns out that the address of the watchdog register is actually >>> 0xFFFE3000 instead of the 0xFFFC3000 that is shown in the user >>> guide. This >>> is a case where RTFM has caused the problem! >>> >>> D'oh! >>> >> >> Oooh, we hate when that happens. >> >> Appears to be wrong only on the one page. In the register >> descriptions it >> is correct. > > Ah Ha! Tom, you didn't RTWFM! W->Whole
The dirty buggers - I reckon they did that on purpose. Next time I will assume that the manual was written by an evil sadist hell-bent on derailing my project and I shall be better prepared!