Robert,

What you suggest sounds like a good approach: the key thing it achieves is to get as much coverage as possible and minimise the risk of false feeds to the watchdog. The challenge, as always, is to balance this against complexity and time to implement.

A couple of other key ideas I'd add to the mix:

- Design the system as if there's no watchdog: if you don't do this, you can get into the (lazy) mode of always saying "oh, the watchdog will sort that out...". This is one reason I tend to encourage implementing the watchdog as the last feature, and not as one that the system depends on to do its basic function. That is, get your system fully functional and reliable, and then add in a watchdog (after fully testing it, of course).

- I'd strongly agree with Jack Ganssle's paper that, to be really effective, a watchdog must reset the processor and all its peripherals. I haven't come up against it on the LPC2000 series, but other MCUs can get some of their peripherals into various locked states that can only be cleared by an external MCU reset (i.e. the CPU can't clear them by resetting the device control registers). In other words, a simple "jump to zero" followed by re-initialising the peripherals isn't guaranteed to work.

It's this kind of stuff you really only learn from experience (and forums like this): they don't tend to mention it in device data sheets for some reason.

Brendan

--- In lpc2000@lpc2..., Robert Adsett <subscriptions@a...> wrote:
>
> At 02:57 PM 12/12/05 +0000, brendanmurphy37 wrote:
> >Jack Ganssle's paper gives one way out of this (a supervisory task
> >monitoring the health of all other tasks).
>
> I first started using that technique about 10 years ago. My usual way is
> to have the watchdog supervisor in a periodic interrupt (sometimes the main
> clock tick) watching a number of countdown watchdog timers and flags, one
> of each for each task or process being monitored. Each cycle of the
> supervisor decrements the countdown timers and, if none of them have overrun,
> feeds the watchdog. As soon as one is no longer fed, the watchdog no longer
> gets fed.
>
> As additional protection the task countdown timers are Hamming protected so
> overwrites are less likely to give valid results. The flags usually are
> required to have some sort of sequence or specific value (i.e. just setting
> them to any old value is not sufficient). As well, the task timers are not
> regarded as active until they have first been fed, but they cannot be turned
> off.
>
> They can be made fancier from there, but that is sufficient to put watchdogs
> in as many periodic tasks and interrupts as you can afford the overhead
> for. In my systems that usually only amounts to a small handful that you
> want to watch anyway.
>
> Robert
>
> " 'Freedom' has no meaning of itself. There are always restrictions, be
> they legal, genetic, or physical. If you don't believe me, try to chew a
> radio signal. " -- Kelvin Throop, III
> http://www.aeolusdevelopment.com/
>

At 02:57 PM 12/12/05 +0000, brendanmurphy37 wrote:
>Jack Ganssle's paper gives one way out of this (a supervisory task
>monitoring the health of all other tasks).

I first started using that technique about 10 years ago. My usual way is to have the watchdog supervisor in a periodic interrupt (sometimes the main clock tick) watching a number of countdown watchdog timers and flags, one of each for each task or process being monitored. Each cycle of the supervisor decrements the countdown timers and, if none of them have overrun, feeds the watchdog. As soon as one is no longer fed, the watchdog no longer gets fed.

As additional protection the task countdown timers are Hamming protected so overwrites are less likely to give valid results. The flags usually are required to have some sort of sequence or specific value (i.e. just setting them to any old value is not sufficient). As well, the task timers are not regarded as active until they have first been fed, but they cannot be turned off.

They can be made fancier from there, but that is sufficient to put watchdogs in as many periodic tasks and interrupts as you can afford the overhead for. In my systems that usually only amounts to a small handful that you want to watch anyway.

Robert

" 'Freedom' has no meaning of itself. There are always restrictions, be they legal, genetic, or physical. If you don't believe me, try to chew a radio signal. " -- Kelvin Throop, III
http://www.aeolusdevelopment.com/
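For readers who want to see the shape of such a supervisor, here is a minimal host-side sketch in C. It is not Robert's code. All names (`task_wd_feed`, `wd_supervisor_tick`, `hw_watchdog_feed`) are hypothetical, the hardware feed is replaced by a counter, and a simple stored-complement check stands in for the Hamming protection mentioned in the post. The sequence-checked flags are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS 3u

/* Hypothetical per-task record. The complement copy is a simpler stand-in
   for Hamming protection: a stray overwrite is unlikely to leave
   count and count_inv as a matching pair. */
typedef struct {
    uint16_t count;      /* supervisor ticks left before this task is overdue */
    uint16_t count_inv;  /* must always equal (uint16_t)~count */
    bool     active;     /* set on the first feed, never cleared afterwards */
} task_wd_t;

static task_wd_t task_wd[NUM_TASKS];
static unsigned  hw_feeds;                     /* stand-in for the real hardware */

static void hw_watchdog_feed(void) { hw_feeds++; }

/* Called by a monitored task to reload its own countdown. */
void task_wd_feed(unsigned id, uint16_t reload)
{
    assert(id < NUM_TASKS);
    task_wd[id].count     = reload;
    task_wd[id].count_inv = (uint16_t)~reload;
    task_wd[id].active    = true;
}

/* Called from the periodic interrupt. Feeds the hardware watchdog only if
   every active countdown is intact and has not run out. */
bool wd_supervisor_tick(void)
{
    bool healthy = true;

    for (unsigned i = 0; i < NUM_TASKS; i++) {
        task_wd_t *t = &task_wd[i];

        if (!t->active) {
            continue;                          /* not yet fed for the first time */
        }
        if (t->count != (uint16_t)~t->count_inv || t->count == 0) {
            healthy = false;                   /* corrupted or overrun */
            continue;
        }
        t->count--;
        t->count_inv = (uint16_t)~t->count;
    }

    if (healthy) {
        hw_watchdog_feed();
    }
    return healthy;
}
```

A task would call `task_wd_feed(id, n)` once per loop, with `n` chosen as the number of supervisor ticks it is allowed to take. Once any task lets its countdown run out, the supervisor stops feeding and the hardware watchdog is left to expire.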

> Jack Ganssle's paper gives one way out of this (a supervisory
> task monitoring the health of all other tasks). There are
> alternatives.

I have done something like this in the past for a multi-tasking system.

The supervisory task was responsible for feeding the hardware watchdog. It was one of the first tasks started up in the system. As other tasks are started, they each "register" with the supervisory task with a requested watchdog time. This means different tasks can register at different software pat rates (how often they promise to keep telling the supervisory task they're all ok), allowing for slow low-priority tasks or tasks that know they can take longer to get around their "loops".

Though it could be considered a slight hole, I also allowed tasks to adjust their pat time dynamically, or to de-register because they were just about to close. The advantages outweighed the risk in my instance, allowing coverage of all tasks, whether transitory or not. (Perhaps when a task first registers, it could specify whether it can later de-register or not, making that hole a little more secure.)

I have no idea if this sort of thing has been documented before; it was just my own design that rose from a requirement. It's a little bit "INIT"-like, but prompting a full system restart on failure rather than a single task restart.

Cheers,
Bruce
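A rough sketch of the registration idea, for illustration only. It is not Bruce's implementation. Every name here (`wd_register`, `wd_pat`, `wd_set_period`, `wd_deregister`, `wd_supervisor_poll`) is hypothetical, time is counted in supervisor polls, and the locking a real multi-tasking system would need around the table is omitted. It includes the suggested "may this task de-register later?" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CLIENTS 8

/* Hypothetical registration record, one per monitored task. */
typedef struct {
    bool     in_use;
    bool     may_deregister;  /* fixed at registration time */
    uint32_t period;          /* promised pat interval, in supervisor polls */
    uint32_t remaining;       /* polls left before the task is overdue */
} wd_client_t;

static wd_client_t clients[MAX_CLIENTS];

static bool valid(int h)
{
    return h >= 0 && h < MAX_CLIENTS && clients[h].in_use;
}

/* Register a task; returns a handle, or -1 if the table is full. */
int wd_register(uint32_t period, bool may_deregister)
{
    if (period == 0) {
        return -1;
    }
    for (int h = 0; h < MAX_CLIENTS; h++) {
        if (!clients[h].in_use) {
            clients[h] = (wd_client_t){ true, may_deregister, period, period };
            return h;
        }
    }
    return -1;
}

/* A task reports that it is still alive. */
void wd_pat(int h)
{
    assert(valid(h));
    clients[h].remaining = clients[h].period;
}

/* A task changes its promised pat interval at run time. */
bool wd_set_period(int h, uint32_t period)
{
    if (!valid(h) || period == 0) {
        return false;
    }
    clients[h].period    = period;
    clients[h].remaining = period;
    return true;
}

/* Only tasks that declared it up front may leave the scheme. */
bool wd_deregister(int h)
{
    if (!valid(h) || !clients[h].may_deregister) {
        return false;
    }
    clients[h].in_use = false;
    return true;
}

/* Run periodically by the supervisory task. Returns true only when every
   registered task is within its interval; the caller would feed the
   hardware watchdog on true and withhold the feed on false. */
bool wd_supervisor_poll(void)
{
    bool all_ok = true;

    for (int h = 0; h < MAX_CLIENTS; h++) {
        if (!clients[h].in_use) {
            continue;
        }
        if (clients[h].remaining == 0) {
            all_ok = false;
        } else {
            clients[h].remaining--;
        }
    }
    return all_ok;
}
```

Refusing `wd_deregister` for tasks registered with `may_deregister == false` narrows the hole mentioned in the post: a runaway task cannot silence the supervisor simply by calling the de-register function.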

>How I solved it:
>
>I have replaced all dog-feeds with
>
>    mycpsr = disableIRQ();
>    WDFEED = 0xAA; WDFEED = 0x55;
>    restoreIRQ( mycpsr );
>
>where disableIRQ and restoreIRQ are from R O Software

Precisely my point. It appears as if this is an extra undocumented feature.

Ken Wada

--- In lpc2000@lpc2..., Tom Walsh <tom@o...> wrote:
>
> Ake Hedman, eurosource wrote:
>
> >OK Guys,
> >
> >I can confirm that the WD problem is solved in my case. Thanks a lot
> >all! (Description of solution below.)
> >
> >I think this definitely should be mentioned in the errata sheets for
> >the affected processors (or clearly in the UM) as the resulting
> >symptoms are such that the problem is very hard to debug.
> >
> >Cheers
> >/Ake
> >
> >
> >How I solved it:
> >
> >I have replaced all dog-feeds with
> >
> >    mycpsr = disableIRQ();
> >    WDFEED = 0xAA; WDFEED = 0x55;
> >    restoreIRQ( mycpsr );
> >
> >where disableIRQ and restoreIRQ are from R O Software
> >
>
> I've been following this thread with interest as I expected to use the
> watchdog in my application. The old design used a MAX690 where you
> would toggle the input to keep the watchdog happy. This worked well for
> us as the foreground software would set the pin high and the interrupt
> routines would reset it low. This ensured that both sections of the
> program were operating okay. Foreground did non-time-critical tasks
> while the background serviced realtime critical tasks & communications
> over RS485.
>
> I had expected to do something similar with the LPC2000 parts. Feed the
> watchdog with 0xaa from the interrupt layer and then feed it with 0x55
> from the foreground. It was expected that even if the watchdog received
> something like this, it would still be fine:
>
> 0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
>
> What value is a watchdog that requires you to carefully feed it? It
> sounds as if the watchdog is very fragile. I noticed that Philips
> Applications people studiously stayed out of this thread about the
> watchdog! I suspect that the watchdog is severely broken and they did
> not want to comment on it?
>
> Regards,
>
> TomW
>
> --
> Tom Walsh - WN3L - Embedded Systems Consultant
> http://openhardware.net, http://cyberiansoftware.com
> "Windows? No thanks, I have work to do..."
> ----------------
>
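To make the failure mode concrete, the following is a small host-side model in C, not code for the real part. It assumes, based only on the symptoms reported in this thread, that any other peripheral access between the 0xAA and 0x55 writes cancels the feed; the actual hardware rule should be taken from the user manual or errata. `wdfeed_write`, `other_peripheral_access` and the interrupt flags are hypothetical stand-ins for WDFEED, an interrupt handler touching another peripheral, and the `disableIRQ`/`restoreIRQ` pair quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only: none of these are the real LPC2000 registers. */
static bool     feed_armed;          /* true once 0xAA has been written */
static unsigned reload_count;        /* number of feeds the model accepted */
static bool     irq_enabled = true;  /* models the CPSR interrupt mask */
static bool     irq_pending;         /* an interrupt is waiting to run */

/* ASSUMPTION: an unrelated peripheral access cancels a half-done feed. */
static void other_peripheral_access(void) { feed_armed = false; }

/* Model of a write to the feed register. */
static void wdfeed_write(uint8_t value)
{
    if (value == 0xAA) {
        feed_armed = true;
    } else {
        if (value == 0x55 && feed_armed) {
            reload_count++;
        }
        feed_armed = false;
    }
}

/* Run the pending interrupt handler if interrupts are not masked. */
static void service_pending_irq(void)
{
    if (irq_enabled && irq_pending) {
        irq_pending = false;
        other_peripheral_access();   /* e.g. the handler reads a UART */
    }
}

/* Feed without masking: an interrupt can land between the two writes. */
void feed_unprotected(void)
{
    wdfeed_write(0xAA);
    service_pending_irq();
    wdfeed_write(0x55);
}

/* Feed with interrupts masked, mirroring the disableIRQ/restoreIRQ fix. */
void feed_protected(void)
{
    bool saved = irq_enabled;

    irq_enabled = false;
    wdfeed_write(0xAA);
    service_pending_irq();           /* masked, so nothing runs here */
    wdfeed_write(0x55);
    irq_enabled = saved;
    service_pending_irq();           /* the deferred interrupt runs now */
}
```

In this model, a pending interrupt makes `feed_unprotected()` silently fail to reload the watchdog, while `feed_protected()` succeeds and the interrupt is merely delayed by two writes.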

Robert,

Thanks for the reference to the watchdog paper - I hadn't seen it before, and it's certainly very good input to watchdog design. I'd agree with the argument that to be really robust, you need to consider more than just an on-board watchdog with something giving it a feed now and then.

Tom: putting together a system where the background does half the watchdog task and interrupts do the other half is a good idea in principle. However, it does need careful thought as to the detail. I have seen such a scheme fail: the watchdog needed a pin to be toggled in order to be fed. The main (system) loop put the pin one way and a timer interrupt put it the other. The problem was that the system could get into a state (after a very heavy burst of RF noise) where the watchdog was being fed, but the system was effectively dead as not much else was running.

Jack Ganssle's paper gives one way out of this (a supervisory task monitoring the health of all other tasks). There are alternatives. For example, we have a watchdog message passed from one active task to the next (if it's in a happy state): if it reaches the watchdog task, the watchdog gets fed (from one and only one location that's not in a loop). A new message is then generated by the watchdog task with a one second delay. In other words, for the watchdog to be fed, all the system's tasks/components have to be in a "happy" state, and task scheduling, timers etc. all have to be working.

It's really down to the balance between good coverage and complexity of implementation etc.

For what it's worth, my own view is:

- the LPC2000 watchdog is fine for an on-board watchdog (with corrected documentation!)

- to detect all failures, you have to go to an off-chip watchdog (or at least one that doesn't use ANY shared component - in particular the clock)

- you need to put some thought into the software to drive it: simple-minded implementations will be better than nothing, but can be greatly improved on

- try not to feed a watchdog from a software loop: there's a chance a wayward processor will end up in that loop

- the AA, 55 (or similar) feed sequence is better than the alternative of not having one (but not by much!)

Brendan

--- In lpc2000@lpc2..., Robert Adsett <subscriptions@a...> wrote:
>
> At 01:09 AM 12/11/05 -0500, Tom Walsh wrote:
> >I've been following this thread with interest as I expected to use the
> >watchdog in my application. The old design used a MAX690 where you
> >would toggle the input to keep the watchdog happy. This worked well for
> >us as the foreground software would set the pin high and the interrupt
> >routines would reset it low. This ensured that both sections of the
> >program were operating okay. Foreground did non-time-critical tasks
> >while the background serviced realtime critical tasks & communications
> >over RS485.
> >
> >I had expected to do something similar with the LPC2000 parts. Feed the
> >watchdog with 0xaa from the interrupt layer and then feed it with 0x55
> >from the foreground. It was expected that even if the watchdog received
> >something like this, it would still be fine:
> >
> >0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
> >
> >What value is a watchdog that requires you to carefully feed it? It
> >sounds as if the watchdog is very fragile.
>
> Actually, that is rather the point. The idea is to prevent being able to
> accidentally satisfy the watchdog with writes from multiple places in the
> program. Watchdogs are supposed to be somewhat twitchy. Even so, an
> internal watchdog doesn't usually catch events like loss of
> oscillator. The Philips watchdog also appears to allow the trip time to be
> reprogrammed during operation, which is a big weakness.
>
> BTW this is not an unusual requirement for microcontroller-based watchdogs,
> at least in my experience.
>
> You may find (if you haven't read it already) Jack Ganssle's article on
> watchdogs interesting. http://www.ganssle.com/watchdogs.pdf
>
> >I noticed that Philips
> >Applications people studiously stayed out of this thread about the
> >watchdog! I suspect that the watchdog is severely broken and they did
> >not want to comment on it?
>
> Actually, from these accounts the watchdog is fine. The documentation
> needs to be updated though. The line referring to the watchdog register space
> probably needs to be updated to match the wording used for the PLLFEED
> register, which explicitly refers to back-to-back VPB writes being
> needed. I expect the two peripherals are using identical logic.
>
> I wouldn't mind that period hole being fixed though, and maybe even seeing
> a minimum time between feeds enforced.
>
> Robert
>
> " 'Freedom' has no meaning of itself. There are always restrictions, be
> they legal, genetic, or physical. If you don't believe me, try to chew a
> radio signal. " -- Kelvin Throop, III
> http://www.aeolusdevelopment.com/
>
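The message-passing scheme in Brendan's post above can be pictured with a heavily simplified C sketch. It is not his implementation. The token is modelled as a plain function call chain rather than real inter-task messages, the one second delay before the next token is left out, and every name (`relay_watchdog_token`, `comms_task_happy`, `control_task_happy`, `hw_watchdog_feed`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each monitored task exposes a hypothetical health check. */
typedef bool (*happy_fn)(void);

static unsigned feeds;                             /* stand-in for hardware */
static void hw_watchdog_feed(void) { feeds++; }    /* the only feed site */

/* Two example tasks with a flag each, purely for illustration. */
static bool comms_ok   = true;
static bool control_ok = true;
static bool comms_task_happy(void)   { return comms_ok; }
static bool control_task_happy(void) { return control_ok; }

/* Pass the watchdog token along the chain. A task forwards it only if it is
   happy; the watchdog is fed only if the token survives every hop. */
bool relay_watchdog_token(const happy_fn chain[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (chain[i] == NULL || !chain[i]()) {
            return false;                          /* token dropped here */
        }
    }
    hw_watchdog_feed();                            /* reached the watchdog task */
    return true;
}
```

In a real system each hop would be a message on a task's queue, so a dead scheduler or stuck timer also stops the token, which is the coverage the post describes. Note that the loop here only relays the token: the feed itself still happens at a single site outside any loop.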

--- In lpc2000@lpc2..., "Ake Hedman, eurosource" <akhe@b...> wrote:
>
> The absolute majority of watchdogs, internal or external, function by
> toggling a bit from time to time. Where in the code you do this
> toggling is something you have to investigate for each project. The
> common point in any case is that if the program passes a certain point in
> the flow we allow the application to live for n milliseconds more.

True enough. :-)

> The 0xAA, 0x55 sequence is nothing else either. It may look safer than
> toggling a bit but false watchdog triggers have never been a problem for
> an extended time during my years as an embedded developer. The chance
> that your crashed program starts to toggle a watchdog bit or write a
> 0xAA, 0x55 sequence is just minimal and in that light I think Tom's
> reasoning is fully correct and would not make the watchdog's
> functionality less good. At least IMHO.

Minimal is NOT zero, so conservative hardware design practices dictate erring on the side of caution. IMHO, that reasoning makes the software design task easier, but makes the watchdog less reliable when atomic kick sequences are not required.

> But requiring the 0xAA, 0x55 sequence is acceptable too, of course. The
> need to mask interrupts while doing it is not. That's really bad.

Bad is probably a matter of opinion? More difficult, certainly. The need for atomic code sequences is not all that unusual, so it is just something that needs to be dealt with in the software design, IMHO.

-- Dave

At 07:10 PM 12/11/05 +0100, Ake Hedman, eurosource wrote:
>The absolute majority of watchdogs, internal or external, function by
>toggling a bit from time to time. Where in the code you do this
>toggling is something you have to investigate for each project. The
>common point in any case is that if the program passes a certain point in
>the flow we allow the application to live for n milliseconds more.
>
>The 0xAA, 0x55 sequence is nothing else either. It may look safer than
>toggling a bit but false watchdog triggers have never been a problem for
>an extended time during my years as an embedded developer. The chance
>that your crashed program starts to toggle a watchdog bit or write a
>0xAA, 0x55 sequence is just minimal and in that light I think Tom's
>reasoning is fully correct and would not make the watchdog's
>functionality less good. At least IMHO.

Umm, minimal chances are what the watchdog is there to protect against. If the chance isn't minimal you can make a very good case that it should be covered in your application code and dealt with there.

>But requiring the 0xAA, 0x55 sequence is acceptable too, of course. The
>need to mask interrupts while doing it is not. That's really bad.

This is far from unusual. To quote from the 8xc196MC user manual: "We recommend that you disable interrupts before writing to the watchdog register. If an interrupt occurs between two writes, the watchdog register will not be cleared."

The Freescale M68HC11 family acts in the manner you suggest, while the ST10/C167 family from ST and Infineon uses a single 'protected instruction', which is effectively two instructions in series, to perform the feed. Effectively ST/Infineon have integrated the feed and interrupt protection into a single instruction. Note that the latter family also implements oscillator watchdogs on many of the family members.

Robert

" 'Freedom' has no meaning of itself. There are always restrictions, be they legal, genetic, or physical. If you don't believe me, try to chew a radio signal. " -- Kelvin Throop, III
http://www.aeolusdevelopment.com/

derbaier wrote:
> --- In lpc2000@lpc2..., Tom Walsh <tom@o...> wrote:
> >
> > I had expected to do something similar with the LPC2000 parts. Feed the
> > watchdog with 0xaa from the interrupt layer and then feed it with 0x55
> > from the foreground. It was expected that even if the watchdog received
> > something like this, it would still be fine:
> >
> > 0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
> >
> > What value is a watchdog that requires you to carefully feed it? It
> > sounds as if the watchdog is very fragile. I noticed that Philips
> > Applications people studiously stayed out of this thread about the
> > watchdog! I suspect that the watchdog is severely broken and they did
> > not want to comment on it?
> >
> > Regards,
> >
> > TomW
> >
> > --
> > Tom Walsh - WN3L - Embedded Systems Consultant
> > http://openhardware.net, http://cyberiansoftware.com
> > "Windows? No thanks, I have work to do..."
> > ----------------
> >
> It sounds like you have described a really robust watchdog timer!!
> It detects and stops fragile software.
>
> Its function is NOT to make life easy for the software developer, it
> is to make the system more reliable for the end user by forcing good
> software and hardware design. The watchdog kick should always be
> consecutive instructions to reduce the possibility of random events
> kicking the timer.
>
> At least, that was the philosophy of the ASIC teams that I have worked
> with.
>
> --Dave

The absolute majority of watchdogs, internal or external, function by toggling a bit from time to time. Where in the code you do this toggling is something you have to investigate for each project. The common point in any case is that if the program passes a certain point in the flow we allow the application to live for n milliseconds more.

The 0xAA, 0x55 sequence is nothing else either. It may look safer than toggling a bit but false watchdog triggers have never been a problem for an extended time during my years as an embedded developer. The chance that your crashed program starts to toggle a watchdog bit or write a 0xAA, 0x55 sequence is just minimal and in that light I think Tom's reasoning is fully correct and would not make the watchdog's functionality less good. At least IMHO.

But requiring the 0xAA, 0x55 sequence is acceptable too, of course. The need to mask interrupts while doing it is not. That's really bad.

/Ake

--
---
Ake Hedman (YAP - Yet Another Programmer)
eurosource, Brattbergavägen 17, 820 50 LOS, Sweden
Phone: (46) 657 413430 Cellular: (46) 73 84 84 102
Company home: http://www.eurosource.se
Kryddor/Te/Kaffe: http://www.brattberg.com
Personal homepage: http://www.eurosource.se/akhe
Automated home: http://www.vscp.org

--- In lpc2000@lpc2..., Tom Walsh <tom@o...> wrote:
>
> I had expected to do something similar with the LPC2000 parts. Feed the
> watchdog with 0xaa from the interrupt layer and then feed it with 0x55
> from the foreground. It was expected that even if the watchdog received
> something like this, it would still be fine:
>
> 0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
>
> What value is a watchdog that requires you to carefully feed it? It
> sounds as if the watchdog is very fragile. I noticed that Philips
> Applications people studiously stayed out of this thread about the
> watchdog! I suspect that the watchdog is severely broken and they did
> not want to comment on it?
>
> Regards,
>
> TomW
>
> --
> Tom Walsh - WN3L - Embedded Systems Consultant
> http://openhardware.net, http://cyberiansoftware.com
> "Windows? No thanks, I have work to do..."
> ----------------
>

It sounds like you have described a really robust watchdog timer!! It detects and stops fragile software.

Its function is NOT to make life easy for the software developer, it is to make the system more reliable for the end user by forcing good software and hardware design. The watchdog kick should always be consecutive instructions to reduce the possibility of random events kicking the timer.

At least, that was the philosophy of the ASIC teams that I have worked with.

--Dave

At 01:09 AM 12/11/05 -0500, Tom Walsh wrote:
>I've been following this thread with interest as I expected to use the
>watchdog in my application. The old design used a MAX690 where you
>would toggle the input to keep the watchdog happy. This worked well for
>us as the foreground software would set the pin high and the interrupt
>routines would reset it low. This ensured that both sections of the
>program were operating okay. Foreground did non-time-critical tasks
>while the background serviced realtime critical tasks & communications
>over RS485.
>
>I had expected to do something similar with the LPC2000 parts. Feed the
>watchdog with 0xaa from the interrupt layer and then feed it with 0x55
>from the foreground. It was expected that even if the watchdog received
>something like this, it would still be fine:
>
>0xaa, 0xaa, 0xaa, 0x55, 0xaa, 0xaa, 0xaa, 0xaa, 0x55 ....
>
>What value is a watchdog that requires you to carefully feed it? It
>sounds as if the watchdog is very fragile.

Actually, that is rather the point. The idea is to prevent being able to accidentally satisfy the watchdog with writes from multiple places in the program. Watchdogs are supposed to be somewhat twitchy. Even so, an internal watchdog doesn't usually catch events like loss of oscillator. The Philips watchdog also appears to allow the trip time to be reprogrammed during operation, which is a big weakness.

BTW this is not an unusual requirement for microcontroller-based watchdogs, at least in my experience.

You may find (if you haven't read it already) Jack Ganssle's article on watchdogs interesting. http://www.ganssle.com/watchdogs.pdf

>I noticed that Philips
>Applications people studiously stayed out of this thread about the
>watchdog! I suspect that the watchdog is severely broken and they did
>not want to comment on it?

Actually, from these accounts the watchdog is fine. The documentation needs to be updated though. The line referring to the watchdog register space probably needs to be updated to match the wording used for the PLLFEED register, which explicitly refers to back-to-back VPB writes being needed. I expect the two peripherals are using identical logic.

I wouldn't mind that period hole being fixed though, and maybe even seeing a minimum time between feeds enforced.

Robert

" 'Freedom' has no meaning of itself. There are always restrictions, be they legal, genetic, or physical. If you don't believe me, try to chew a radio signal. " -- Kelvin Throop, III
http://www.aeolusdevelopment.com/