Reply by brendanmurphy37 December 13, 2005

Robert,

What you suggest sounds like a good approach: the key thing that it
achieves is to get as much coverage as possible and minimise the risk
of false feeds to the watchdog. The challenge, as always, is to
balance this against complexity and time to implement.

A couple of other key ideas I'd add to the mix:

- Design the system as if there's no watchdog: if you don't do this,
you can get into the (lazy) mode of always saying "oh, the watchdog
will sort that out.....". This is one reason I tend to encourage
implementing the watchdog as the last feature, and not one that the
system depends on to do its basic function. That is, get your system
fully functional and reliable, and then add in a watchdog (after
fully testing it of course).

- I'd strongly agree with Jack Ganssle's paper that to be really
effective, a watchdog must reset the processor and all its
peripherals. I haven't come up against it on the LPC2000 series, but
other MCUs can get some of their peripherals into various locked
states that can only be removed by an external MCU reset (i.e. the
CPU can't clear it by resetting the device control registers). In
other words, a simple "jump to zero", and re-initialise peripherals
isn't guaranteed to work. It's this kind of stuff you really only
learn from experience (and forums like this): they don't tend to
mention it in device data sheets for some reason.

Brendan




Reply by Robert Adsett December 13, 2005
At 02:57 PM 12/12/05 +0000, brendanmurphy37 wrote:
>Jack Ganssle's paper gives one way out of this (a supervisory task
>monitoring the health of all other tasks).

I first started using that technique about 10 years ago. My usual way is
to have the watchdog supervisor in a periodic interrupt (sometimes the main
clock tick) watching a number of countdown watchdog timers and flags, one
of each for each task or process being monitored. Each cycle of the
supervisor decrements the countdown timers and, if none of them has overrun,
feeds the watchdog. As soon as one of them is no longer fed, the watchdog no
longer gets fed.

As additional protection, the task countdown timers are Hamming protected so
overwrites are less likely to give valid results. The flags are usually
required to have some sort of sequence or specific value (i.e. just setting
them to any old value is not sufficient). As well, the task timers are not
regarded as active until they have first been fed, but they cannot be turned
off.

They can be made fancier from there, but that is sufficient to put watchdogs
in as many periodic tasks and interrupts as you can afford the overhead
for. In my systems that usually only amounts to a small handful that you
want to watch anyway.
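
For anyone who wants to see the rough shape of such a supervisor, here is a
minimal host-side sketch in C. It is an illustration only, not the code
described above: every name in it (wd_task_init, wd_task_feed,
wd_supervisor_tick, the WD_KEY values) is made up, hw_watchdog_feed() is a
stub standing in for the real hardware feed, and a stored bitwise complement
is used as a simpler stand-in for Hamming protection (it detects many stray
overwrites but corrects nothing).

```c
#include <stdbool.h>
#include <stdint.h>

#define WD_TASKS 3u
#define WD_KEY_A 0xA5u   /* hypothetical key values */
#define WD_KEY_B 0x5Au

typedef struct {
    uint16_t count;      /* ticks left before the task counts as dead */
    uint16_t count_inv;  /* must always equal ~count (complement check) */
    uint16_t reload;     /* ticks granted by each accepted feed */
    uint8_t  expect;     /* key the task must present next */
    bool     active;     /* set by the first feed, never cleared */
} wd_task_t;

static wd_task_t wd_tasks[WD_TASKS];
static bool      wd_failed;  /* latched: once set, no more hardware feeds */
static unsigned  hw_feeds;   /* stub counter in place of the real feed */

static void hw_watchdog_feed(void) { hw_feeds++; }

/* Store a countdown value together with its complement. */
static void wd_store(wd_task_t *t, uint16_t v)
{
    t->count = v;
    t->count_inv = (uint16_t)~v;
}

void wd_task_init(unsigned id, uint16_t reload_ticks)
{
    if (id >= WD_TASKS) return;
    wd_tasks[id].reload = reload_ticks;
    wd_tasks[id].expect = WD_KEY_A;
    wd_tasks[id].active = false;
    wd_store(&wd_tasks[id], reload_ticks);
}

/* Called by the monitored task. The key must alternate A, B, A, ... */
bool wd_task_feed(unsigned id, uint8_t key)
{
    if (id >= WD_TASKS) return false;
    wd_task_t *t = &wd_tasks[id];
    if (key != t->expect) {       /* wrong key: treat as a fault */
        wd_failed = true;
        return false;
    }
    t->expect = (key == WD_KEY_A) ? WD_KEY_B : WD_KEY_A;
    t->active = true;
    wd_store(t, t->reload);
    return true;
}

/* Called from the periodic interrupt. Returns true if the hardware
   watchdog was fed on this tick. */
bool wd_supervisor_tick(void)
{
    for (unsigned i = 0; i < WD_TASKS; i++) {
        wd_task_t *t = &wd_tasks[i];
        if (!t->active) continue;                 /* not monitored yet */
        if (t->count_inv != (uint16_t)~t->count || t->count == 0) {
            wd_failed = true;                     /* corrupted or overrun */
            continue;
        }
        wd_store(t, (uint16_t)(t->count - 1u));
    }
    if (wd_failed) return false;
    hw_watchdog_feed();
    return true;
}
```

In this sketch a failure is latched, so a task that resumes feeding late
cannot revive the hardware feed. That is one possible reading of "the
watchdog no longer gets fed", chosen here for simplicity.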

Robert

" 'Freedom' has no meaning of itself. There are always restrictions, be
they legal, genetic, or physical. If you don't believe me, try to chew a
radio signal. " -- Kelvin Throop, III
http://www.aeolusdevelopment.com/


Reply by Bruce Paterson December 12, 2005
> Jack Ganssle's paper gives one way out of this (a supervisory
> task monitoring the health of all other tasks). There are
> alternatives.

I have done something like this in the past for a multi-tasking system.

The supervisory task was responsible for feeding the hardware watchdog.
It was one of the first tasks started up in the system. As other tasks
are started, they each "register" with the supervisory task with a
requested watchdog time. This means different tasks can register at
different software pat rates (how often they promise to keep telling the
supervisory task they're all ok), allowing for slow low priority tasks
or tasks that know they can take longer times to get around their
"loops". Though it could be considered a slight hole, I also allowed
tasks to adjust their pat time dynamically, or de-register because they
were just about to close. The advantages outweighed the risk in my
instance, allowing coverage of all tasks, whether transitory or not.
(Perhaps when a task first registers, it could specify whether it can
later deregister or not, making that hole a little more secure.)

I have no idea if this sort of thing has been documented before; it was
just my own design that rose from a requirement. It's a little bit
"INIT" like, but prompting a full system restart on failure rather than
a single task restart.
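
A rough sketch of the registration idea in C, for illustration only. None of
this is the original design's code: the names (sup_register, sup_pat,
sup_set_period, sup_deregister, sup_tick) and the fixed-size table are
assumptions, and time is counted in abstract supervisor ticks. The
may_deregister flag is the "specify at registration" idea from the
parenthesis above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUP_MAX_TASKS 8

typedef struct {
    bool     used;
    bool     may_deregister;  /* fixed when the task registers */
    uint32_t period;          /* promised maximum ticks between pats */
    uint32_t remaining;       /* ticks left before the promise is broken */
} sup_entry_t;

static sup_entry_t sup_table[SUP_MAX_TASKS];

static sup_entry_t *sup_lookup(int handle)
{
    if (handle < 0 || handle >= SUP_MAX_TASKS) return NULL;
    return sup_table[handle].used ? &sup_table[handle] : NULL;
}

/* Returns a handle (>= 0), or -1 if the table is full or period is 0. */
int sup_register(uint32_t period, bool may_deregister)
{
    if (period == 0) return -1;
    for (int i = 0; i < SUP_MAX_TASKS; i++) {
        if (!sup_table[i].used) {
            sup_table[i] = (sup_entry_t){ true, may_deregister, period, period };
            return i;
        }
    }
    return -1;
}

/* A task tells the supervisor it is still alive. */
bool sup_pat(int handle)
{
    sup_entry_t *e = sup_lookup(handle);
    if (e == NULL) return false;
    e->remaining = e->period;
    return true;
}

/* A task adjusts its own pat period at run time. */
bool sup_set_period(int handle, uint32_t period)
{
    sup_entry_t *e = sup_lookup(handle);
    if (e == NULL || period == 0) return false;
    e->period = period;
    e->remaining = period;
    return true;
}

/* Only tasks that asked for it at registration may leave. */
bool sup_deregister(int handle)
{
    sup_entry_t *e = sup_lookup(handle);
    if (e == NULL || !e->may_deregister) return false;
    e->used = false;
    return true;
}

/* Run once per supervisor tick. The caller feeds the hardware watchdog
   only when this returns true. */
bool sup_tick(void)
{
    bool all_ok = true;
    for (int i = 0; i < SUP_MAX_TASKS; i++) {
        sup_entry_t *e = &sup_table[i];
        if (!e->used) continue;
        if (e->remaining == 0) all_ok = false;   /* promise broken */
        else e->remaining--;
    }
    return all_ok;
}
```

A slow task might call sup_register(3, false) and a short-lived one
sup_register(1, true); when sup_tick() returns false the supervisor simply
stops feeding the hardware watchdog, which gives the full system restart
mentioned above.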

Cheers,
Bruce



Reply by Ken Wada December 12, 2005
>How I solved it:
>
>I have replaced all dog-feeds with
>
> mycpsr = disableIRQ();
> WDFEED = 0xAA; WDFEED = 0x55;
> restoreIRQ( mycpsr );
>
>where disableIRQ and restoreIRQ is from R O Software

Precisely my point. It appears as if this is an extra undocumented
feature.
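
To make the failure mode concrete, here is a small host-side model in C of
why the quoted fix works. It is a sketch under stated assumptions, not the
real peripheral: the sim_* names are invented, disableIRQ/restoreIRQ here
are simplified stubs (their real prototypes are assumed, and the "CPSR" is
reduced to a single enable flag), and the rule that any access between the
two writes spoils the feed is the working conclusion of this thread rather
than something taken from the data sheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* ---- Host-side model (assumption, not the real hardware) ---- */
static bool     sim_irq_enabled = true;
static bool     sim_armed;   /* 0xAA seen, waiting for 0x55 */
static unsigned sim_feeds;   /* completed feed sequences */

/* Model of a write to WDFEED. */
static void sim_wdfeed_write(uint8_t v)
{
    if (sim_armed && v == 0x55) {
        sim_feeds++;
        sim_armed = false;
    } else {
        sim_armed = (v == 0xAA);
    }
}

/* Any other access between the two writes spoils the sequence. */
static void sim_other_access(void) { sim_armed = false; }

/* A simulated interrupt runs only while interrupts are enabled. */
static void sim_maybe_interrupt(void)
{
    if (sim_irq_enabled) sim_other_access();
}

/* Simplified stand-ins for the helpers named in the quoted code. */
static unsigned disableIRQ(void)
{
    unsigned old = sim_irq_enabled ? 1u : 0u;
    sim_irq_enabled = false;
    return old;
}

static void restoreIRQ(unsigned old) { sim_irq_enabled = (old != 0u); }

/* Feed without protection: an interrupt may land between the writes. */
void wd_feed_unprotected(void)
{
    sim_wdfeed_write(0xAA);
    sim_maybe_interrupt();   /* worst case for this model */
    sim_wdfeed_write(0x55);
}

/* Feed with interrupts masked, as in the quoted fix. */
void wd_feed_atomic(void)
{
    unsigned saved = disableIRQ();
    sim_wdfeed_write(0xAA);
    sim_maybe_interrupt();   /* held off: interrupts are masked */
    sim_wdfeed_write(0x55);
    restoreIRQ(saved);
}
```

In this model wd_feed_unprotected() never completes a feed, while
wd_feed_atomic() does and leaves the interrupt state as it found it.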

Ken Wada

--- In lpc2000@lpc2..., Tom Walsh <tom@o...> wrote:
>
> Ake Hedman, eurosource wrote:
>
> >OK Guys,
> >
> >I can confirm that the WD problem is solved in my case. Thanks a lot
> >all! (Description of solution below.)
> >
> >I think this definitely should be mentioned in the errata sheets for
> >the affected processors (or clearly in the UM) as the resulting
> >symptoms are such that the problem is very hard to debug..
> >
> >Cheers
> >/Ake
> >
> >
> >How I solved it:
> >
> >I have replaced all dog-feeds with
> >
> > mycpsr = disableIRQ();
> > WDFEED = 0xAA; WDFEED = 0x55;
> > restoreIRQ( mycpsr );
> >
> >where disableIRQ and restoreIRQ is from R O Software
> >
>
> I've been following this thread with interest as I expected to use the
> watchdog in my application. The old design used a MAX690 where you
> would toggle the input to keep the watchdog happy. This worked well for
> us as the foreground software would set the pin high and the interrupt
> routines would reset it low. This ensured that both sections of the
> program were operating okay. Foreground did non-time-critical tasks
> while the background serviced realtime critical tasks & communications
> over RS485.
>
> I had expected to do something similar with the LPC2000 parts. Feed the
> watchdog with 0xaa from the interrupt layer and then feed it with 0x55
> from the foreground. It was expected that even if the watchdog received
> something like this, it would still be fine:
>
> 0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
>
> What value is a watchdog that requires you to carefully feed it, it
> sounds as if the watchdog is very fragile? I noticed that Philips
> Applications people studiously stayed out of this thread about the
> watchdog! I suspect that the watchdog is severely broken and they did
> not want to comment on it?
>
> Regards,
>
> TomW
>
> --
> Tom Walsh - WN3L - Embedded Systems Consultant
> http://openhardware.net, http://cyberiansoftware.com
> "Windows? No thanks, I have work to do..."
> ----------------
>




Reply by brendanmurphy37 December 12, 2005

Robert,

Thanks for the reference to the watchdog paper - I hadn't seen it
before, and it's certainly very good input to watchdog design.

I'd agree with the argument that to be really robust, you need to
consider more than just an on-board watchdog with something giving it
a feed now and then.

Tom: putting together a system where the background does half the
watchdog task and interrupts do the other half is a good idea in principle.
However, it does need careful thought as to the detail. I have seen
such a scheme fail: the watchdog needed a pin to be toggled to be fed.
The main (system) loop put the pin one way and a timer interrupt put
it the other. The problem was the system could get into a state (after
a very heavy burst of RF noise) where the watchdog was being fed, but
the system was effectively dead as not much else was running.

Jack Ganssle's paper gives one way out of this (a supervisory task
monitoring the health of all other tasks). There are alternatives.
For example, we have a watchdog message passed from one active task
to the other (if it's in a happy state): if it reaches the watchdog
task, the watchdog gets set (from one and only one location that's
not in a loop). A new message is then generated by the watchdog task
with a one second delay. In other words, for the watchdog to be fed,
all the system's tasks/components have to be in a "happy" state, and task
scheduling, timers etc. all have to be working. It's really down to
striking a balance between good coverage and complexity of
implementation.
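
As a rough illustration of the message-passing idea, here is a tiny
host-side model in C. It is only a sketch: the names (wd_chain_t,
chain_step, task_ok) are invented, task_ok[] stands in for whatever
self-check each real task performs, a counter replaces the hardware feed,
and the one second delay before the new message is left out.

```c
#include <stdbool.h>

#define CHAIN_TASKS 3

/* Stand-in for each task's own self-check result. */
static bool task_ok[CHAIN_TASKS] = { true, true, true };

typedef struct {
    int      holder;  /* 0..CHAIN_TASKS-1: task holding the message,
                         CHAIN_TASKS: watchdog task, -1: message lost */
    unsigned feeds;   /* counts feeds in place of the hardware write */
} wd_chain_t;

/* One scheduling step: whoever holds the message acts on it. */
void chain_step(wd_chain_t *c)
{
    if (c->holder < 0) {
        return;                       /* message lost: nothing feeds */
    }
    if (c->holder == CHAIN_TASKS) {
        c->feeds++;                   /* the single feed location */
        c->holder = 0;                /* new message (delay omitted) */
        return;
    }
    if (task_ok[c->holder]) {
        c->holder++;                  /* happy: pass the message on */
    } else {
        c->holder = -1;               /* unhappy: drop the message */
    }
}
```

With all three tasks healthy, every fourth step produces one feed. As soon
as any task stops forwarding the message, the feeds stop and the hardware
watchdog is left to time out.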

For what it's worth, my own view is:

- the LPC2000 watchdog is fine for an on-board watchdog (with
corrected documentation!)

- to detect all failures, you have to go to an off-chip watchdog (or
at least one that doesn't use ANY shared component - in particular
the clock)

- you need to put some thought into the software to drive it:
simple-minded implementations will give better than nothing, but can be
greatly improved on

- try not to feed a watchdog from a software loop: there's a chance a
wayward processor will end up in that loop

- the AA, 55 (or similar) feed sequence is better than the
alternative of not having one (but not by much!)

Brendan





Reply by derbaier December 11, 2005
--- In lpc2000@lpc2..., "Ake Hedman, eurosource" <akhe@b...>
wrote:
>
> The absolute majority of watchdogs, internal or external, function by
> toggling a bit from time to time. Where in the code you do this
> toggling is something you have to investigate for each project. The
> common point in any case is that if the program passes a certain point in
> the flow we allow the application to live for n milliseconds more.

True enough. :-)

> The 0xAA, 0x55 sequence is nothing else either. It may look safer than
> toggling a bit but false watchdog triggers have never been a problem for
> an extended time during my years as an embedded developer. The chance
> that your crashed program starts to toggle a watchdog bit or write a
> 0xAA, 0x55 sequence is just minimal and in that light I think Tom's
> reasoning is fully correct and would not make the watchdog's
> functionality less good. At least IMHO.

Minimal is NOT zero, so conservative hardware design practices dictate
erring on the side of caution. IMHO, that reasoning makes the
software design task easier, but makes the watchdog less reliable when
atomic kick sequences are not required.

> But requiring the 0xAA,0x55 sequence is acceptable too, of course. The
> need to mask interrupts while doing it is not. That's really bad.

Bad is probably a matter of opinion?
More difficult, certainly.

The need for atomic code sequences is not all that unusual, so it is
just something that needs to be dealt with in the software design, IMHO.

-- Dave


Reply by Robert Adsett December 11, 2005
At 07:10 PM 12/11/05 +0100, Ake Hedman, eurosource wrote:
>The absolute majority of watchdogs, internal or external, function by
>toggling a bit from time to time. Where in the code you do this
>toggling is something you have to investigate for each project. The
>common point in any case is that if the program passes a certain point in
>the flow we allow the application to live for n milliseconds more.
>
>The 0xAA, 0x55 sequence is nothing else either. It may look safer than
>toggling a bit but false watchdog triggers have never been a problem for
>an extended time during my years as an embedded developer. The chance
>that your crashed program starts to toggle a watchdog bit or write a
>0xAA, 0x55 sequence is just minimal and in that light I think Tom's
>reasoning is fully correct and would not make the watchdog's
>functionality less good. At least IMHO.

Umm, minimal chances are what the watchdog is there to protect against. If
the chance isn't minimal you can make a very good case that it should be
covered in your application code and dealt with there.

>But requiring the 0xAA,0x55 sequence is acceptable too, of course. The
>need to mask interrupts while doing it is not. That's really bad.

This is far from unusual. To quote from the 8xc196MC user manual:

"We recommend that you disable interrupts before writing to the watchdog
register. If an interrupt occurs between two writes, the watchdog register
will not be cleared."

The Freescale M68HC11 family acts in the manner you suggest, while the
ST10/C167 family from ST and Infineon uses a single 'protected instruction',
which is effectively two instructions in series, to perform the
feed. Effectively ST/Infineon have integrated the feed and interrupt
protection into a single instruction. Note that the latter family also
implements oscillator watchdogs on many of its members.

Robert

" 'Freedom' has no meaning of itself. There are always restrictions, be
they legal, genetic, or physical. If you don't believe me, try to chew a
radio signal. " -- Kelvin Throop, III
http://www.aeolusdevelopment.com/


Reply by Ake Hedman, eurosource December 11, 2005
derbaier wrote:

> It sounds like you have described a really robust watchdog timer!!
> It detects and stops fragile software.
>
> Its function is NOT to make life easy for the software developer, it
> is to make the system more reliable for the end user by forcing good
> software and hardware design. The watchdog kick should always be
> consecutive instructions to reduce the possibility of random events
> from kicking the timer.
>
> At least, that was the philosophy of the ASIC teams that I have worked
> with.
>
> --Dave

The absolute majority of watchdogs, internal or external, function by
toggling a bit from time to time. Where in the code you do this
toggling is something you have to investigate for each project. The
common point in any case is that if the program passes a certain point in
the flow we allow the application to live for n milliseconds more.

The 0xAA, 0x55 sequence is nothing else either. It may look safer than
toggling a bit but false watchdog triggers have never been a problem for
an extended time during my years as an embedded developer. The chance
that your crashed program starts to toggle a watchdog bit or write a
0xAA, 0x55 sequence is just minimal and in that light I think Tom's
reasoning is fully correct and would not make the watchdog's
functionality less good. At least IMHO.

But requiring the 0xAA,0x55 sequence is acceptable too, of course. The
need to mask interrupts while doing it is not. That's really bad.

/Ake

--
---
Ake Hedman (YAP - Yet Another Programmer)
eurosource, Brattbergavägen 17, 820 50 LOS, Sweden
Phone: (46) 657 413430 Cellular: (46) 73 84 84 102
Company home: http://www.eurosource.se
Kryddor/Te/Kaffe: http://www.brattberg.com
Personal homepage: http://www.eurosource.se/akhe
Automated home: http://www.vscp.org


Reply by derbaier December 11, 2005
--- In lpc2000@lpc2..., Tom Walsh <tom@o...> wrote:
>
> I had expected to do something similar with the LPC2000 parts. Feed the
> watchdog with 0xaa from the interrupt layer and then feed it with 0x55
> from the foreground. It was expected that even if the watchdog received
> something like this, it would still be fine:
>
> 0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
>
> What value is a watchdog that requires you to carefully feed it, it
> sounds as if the watchdog is very fragile? I noticed that Philips
> Applications people studiously stayed out of this thread about the
> watchdog! I suspect that the watchdog is severely broken and they did
> not want to comment on it?
>
> Regards,
>
> TomW
>
> --
> Tom Walsh - WN3L - Embedded Systems Consultant
> http://openhardware.net, http://cyberiansoftware.com
> "Windows? No thanks, I have work to do..."
> ----------------
>

It sounds like you have described a really robust watchdog timer!!
It detects and stops fragile software.

Its function is NOT to make life easy for the software developer, it
is to make the system more reliable for the end user by forcing good
software and hardware design. The watchdog kick should always be
consecutive instructions to reduce the possibility of random events
from kicking the timer.

At least, that was the philosophy of the ASIC teams that I have worked
with.

--Dave


Reply by Robert Adsett December 11, 2005
At 01:09 AM 12/11/05 -0500, Tom Walsh wrote:
>I've been following this thread with interest as I expected to use the
>watchdog in my application. The old design used a MAX690 where you
>would toggle the input to keep the watchdog happy. This worked well for
>us as the foreground software would set the pin high and the interrupt
>routines would reset it low. This ensured that both sections of the
>program were operating okay. Foreground did non-time-critical tasks
>while the background serviced realtime critical tasks & communications
>over RS485.
>
>I had expected to do something similar with the LPC2000 parts. Feed the
>watchdog with 0xaa from the interrupt layer and then feed it with 0x55
>from the foreground. It was expected that even if the watchdog received
>something like this, it would still be fine:
>
>0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
>
>What value is a watchdog that requires you to carefully feed it, it
>sounds as if the watchdog is very fragile?


Actually, that is rather the point. The idea is to prevent being able to
accidentally satisfy the watchdog with writes from multiple places in the
program. Watchdogs are supposed to be somewhat twitchy. Even so an
internal watchdog doesn't usually catch events like loss of
oscillator. The Philips watchdog also appears to allow the trip time to be
reprogrammed during operation which is a big weakness.

BTW this is not an unusual requirement for microcontroller based watchdogs,
at least in my experience.

You may find (if you haven't read it already) Jack Ganssle's article on
watchdogs interesting. http://www.ganssle.com/watchdogs.pdf

>I noticed that Philips
>Applications people studiously stayed out of this thread about the
>watchdog! I suspect that the watchdog is severely broken and they did
>not want to comment on it?

Actually, from these accounts the watchdog is fine. The documentation
needs to be updated though. The line referring to the watchdog register space
probably needs to be updated to match the wording used for the PLLFEED
register, which explicitly refers to back-to-back VPB writes being
needed. I expect the two peripherals are using identical logic.

I wouldn't mind that period hole being fixed though, and maybe even seeing
a minimum time between feeds enforced.

Robert

" 'Freedom' has no meaning of itself. There are always restrictions, be
they legal, genetic, or physical. If you don't believe me, try to chew a
radio signal. " -- Kelvin Throop, III
http://www.aeolusdevelopment.com/