EmbeddedRelated.com

how to debug reset caused by WDTCTL security key violation

Started by Xiaohui Liu October 28, 2012
On Sun, 11 Nov 2012 21:40:02 -0500, Xiaohui Liu wrote:

>Thanks for your kindly reply. My comments are inline.
>
>On Sun, Nov 11, 2012 at 5:28 PM, Jon Kirwan wrote:
>
>> On Sun, 11 Nov 2012 17:05:56 -0500, Xiaohui Liu wrote:
>>
>> >Sorry about the confusion, I mean the watchdog is *disabled* during
>> >initialization, not *enabled*. So case (1) of watchdog timer expiration
>> >cannot happen.
>>
>> You earlier mentioned that you logically believe there must
>> be a security key violation causing the reset. I followed
>> your logic and, assuming all of the code was yours, I'd agree
>> with the conclusion. The fact is, though, that you have other
>> software in your system and you haven't discussed whether or
>> not you've gone through all of that code as well to ensure
>> that your statements about not turning on the WDT are valid.
>> It's possible it is happening in code you didn't write, given
>> that you include such code in your application.
>>
>> Either way, why haven't you halted upon discovery of a
>> security key violation PUC, stopped immediately, and examined
>> RAM for the old stack information?
>>
>How to achieve this, namely, "halted upon discovery of a security key
>violation PUC"?

What will happen is a PUC; you can't avoid that. Who cares?
Make sure that your reset vector points to code you control.
Write an IF statement (assembly code is best here) that
checks to see if this was a PUC reset, or not. If not, go
ahead and do the usual stuff. If not, execute some NOPs.
Place a breakpoint there. The debugger should be able to stop
at that point. That will be a "halt upon discovery of" for
you. At that point, go examine registers and RAM and dump all
of it to the PC for later examination.
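A minimal sketch of the check described above, written as plain C so the decision logic can be tried on a PC. The function name and the hook shown in the closing comment are hypothetical. The bit position of WDTIFG in IFG1 (bit 0) is taken from the MSP430x1xx device headers as I recall them, so verify it against your own header. Note that WDTIFG alone does not distinguish a timer expiry from a security key violation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: WDTIFG is bit 0 of IFG1 (MSP430x1xx headers); check yours. */
#define WDTIFG_BIT 0x01u

/* Returns true when the IFG1 value read right after reset says the PUC
 * came from the watchdog module (expiry or security key violation). */
bool reset_was_watchdog(uint8_t ifg1_at_reset)
{
    return (ifg1_at_reset & WDTIFG_BIT) != 0;
}

/* Hypothetical use on the target, as early as possible in the reset path
 * and before anything touches RAM:
 *
 *     if (reset_was_watchdog(IFG1)) {
 *         for (;;) { }    // a few NOPs; set the breakpoint here
 *     }
 */
```

The point of doing this before the C runtime zeroes or copies anything is that the old stack contents are still in RAM when the debugger halts.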

>> It should still be there
>> (though of course upon reset you won't necessarily have the
>> stack register to examine, you can still go look at where you
>> know the old stack data to reside.) You can set the stack
>> information so that it is possible to determine the extent of
>> the stack at the time of WDT PUC and if you know enough about
>> the activation frames used, I suspect you can work out where
>> things were at by dumping out that data and examining it
>> manually.
>>
>> Is there a reason you haven't tried this? Do you have access
>> to the source code for the parts you didn't write, also, so
>> you can check for direct WDT references? How many places in
>> your code do you use indirection where it may result in this
>> problem? Have you "instrumented" that code so you compare
>> with the WDT address of interest before attempting an access?
>>
>For the reason mentioned above.
>Yes, I have access to every line of the code.
>The problem is that somewhere WDT address of interest, i.e., WDTCTL
>register, is accessed unintentionally because, e.g., array index is out of
>bound. And I'm having difficulty locating where this access occurs.

See above. And why not instrument every indirect reference as
I mentioned, too, while you are at it?
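One way to instrument an indirect reference is to test the destination range before the write happens. The sketch below is only an illustration: the function name is hypothetical, and the WDTCTL address (0120h, a 16-bit register) is the one quoted in the original post. It treats any overlap with the two bytes of WDTCTL as suspicious.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* WDTCTL address as given in the original post; 16 bits wide. */
#define WDTCTL_ADDR 0x0120u

/* Returns true if a write of `len` bytes starting at `dest` would overlap
 * either byte of WDTCTL. Intended to be called before every store made
 * through a pointer or a computed array index. */
bool touches_wdtctl(uintptr_t dest, size_t len)
{
    if (len == 0) {
        return false;
    }
    uintptr_t first = dest;
    uintptr_t last = dest + (len - 1);
    return first <= (WDTCTL_ADDR + 1u) && last >= WDTCTL_ADDR;
}

/* Hypothetical use on the target:
 *
 *     if (touches_wdtctl((uintptr_t)&buf[i], sizeof buf[i])) {
 *         for (;;) { }    // breakpoint here: the stack is still intact
 *     }
 *     buf[i] = value;
 */
```

Catching the access before it happens has the advantage that the PC and stack pointer are still valid when the breakpoint hits.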

Jon


On Sun, 11 Nov 2012 21:31:31 -0800, I wrote:

>If not, execute some NOPs.

I mean, "If so, execute some NOPs."

Sorry about that.

Jon
Just to be 100% clear: did you modify the startup code to not enable the WD timer?

By default it is ENABLED by many of the compilers, and you must make sure it is disabled by removing the code in the compiler startup code (NOT YOUR CODE). It is not enough to simply not enable it in your code. This is why many of the code examples start by disabling the WD timer. When you have a lot of variables and data to be copied/initialized, the timer can time out before it ever gets to the user code.


>________________________________
>From: Xiaohui Liu
>To: m...
>Sent: Sunday, November 11, 2012 5:05 PM
>Subject: Re: [msp430] Re: how to debug reset caused by WDTCTL security key violation
>
>Sorry about the confusion, I mean the watchdog is *disabled* during
>initialization, not *enabled*. So case (1) of watchdog timer expiration
>cannot happen.
>
>On Sun, Nov 11, 2012 at 4:31 PM, Joe Radomski wrote:
>
>> there is a good chance that all the initialization is taking too long.. in
>> that case you have to keep the watchdog disabled in the startup code..
>> >________________________________
>> >From: sinotrinity
>> >To: m...
>> >Sent: Sunday, November 11, 2012 4:15 PM
>> >Subject: [msp430] Re: how to debug reset caused by WDTCTL security key
>> violation
>>
>> >
>> >
>> >Hi,
>> >
>> >The watchdog is enabled during initialization.
>> >
>> >Any suggestion on how to locate this bug? I've been wrestling with this
>> bug for the past few weeks and still not found it. I'd really appreciate if
>> you can help.
>> >
>> >--- In mailto:msp430%40yahoogroups.com, Joe Radomski
>> wrote:
>> >>
>> >> you may not have enabled the watchdog, but most versions of the startup
>> code enable the watchdog timer by default.. even if the first line of your
>> code disabled it, it might not even get to your code before reseting.. this
>> can happen if you have alot of variables that have to be cleared and
>> initialized or data to be copied..
>> >>
>> >> In this case you need to modify the startup code (cstart.asm) to not
>> enable the wd timer and recompile..
>> >>
>> >>
>> >>
>> >> >________________________________
>> >> >From: Xiaohui Liu
>> >> >To: msp430
>> >> >Sent: Sunday, October 28, 2012 8:10 PM
>> >> >Subject: [msp430] how to debug reset caused by WDTCTL security key
>> violation
>> >> >
>> >> >Hi everyone,
>> >> >
>> >> >I'm working on a sensor project which uses TelosB
>> >> ><http://www.memsic.com/products/wireless-sensor-networks/wireless-modules.html>,
>> >> >based on msp430f1611 running TinyOS. My program is reset some time after boot
>> >> >up. After the PUC reset, IFG1 is found with WDTIFG bit set, indicating
>> the
>> >> >watchdog timer initiates the reset. This can happen under two cases:
>> >> >1) Watchdog timer expiration when in watchdog mode only.
>> >> >But watchdog timer is never started, so this cannot happen.
>> >> >2) Watchdog timer security key violation.
>> >> >There is no place that my program explicitly writes WDTCTL (i.e., 0120h).
>> >> >So there must be some memory access bug in my code, which illegally
>> writes
>> >> >WDTCTL and causes security key violation.
>> >> >Is there any debug tool to help locate where this happens? Or any
>> >> >suggestion on how I should proceed to locate the bug? My program is of
>> >> >thousands of lines, so manual check is non-trivial.
>> >> >
>> >> >Please weigh in if you have any suggestion. Thank you very much in
>> advance.
>> >> >
>> >> >More detailed information of the bug can be found
>> >> >here
>> >> >.
>> >> >
>> >> >-Xiaohui Liu
>
> Just to be 100% clear: did you modify the startup code to not enable the
> WD timer?
>
> By default it is ENABLED by many of the compilers, and you must make sure
> it is disabled by removing the code in the compiler startup code (NOT YOUR
> CODE). It is not enough to simply not enable it in your code. This is
> why many of the code examples start by disabling the WD timer. When you
> have a lot of variables and data to be copied/initialized, the timer can
> time out before it ever gets to the user code.

I think the WD is enabled by the hardware and the runtime system may, or may
not, disable it. And, as has been said before, if the runtime has a lot of
zeroing to do, main() may well not be entered before the WDT expires causing
endless resets.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
SolderCore Development Platform http://www.soldercore.com

Comments inline, please.

On Mon, Nov 12, 2012 at 12:31 AM, Jon Kirwan wrote:

> What will happen is a PUC; you can't avoid that. Who cares?
> Make sure that your reset vector points to code you control.
> Write an IF statement (assembly code is best here) that
> checks to see if this was a PUC reset, or not. If not, go
> ahead and do the usual stuff. If not, execute some NOPs.
> Place a breakpoint there. The debugger should be able to stop
> at that point. That will be a "halt upon discovery of" for
> you. At that point, go examine registers and RAM and dump all
> of it to the PC for later examination.
>
I was thinking along the same lines. One difficulty is that the program counter
(PC) is loaded with the address stored at the reset vector location (0FFFEh). So
even if I point the reset vector to code under my control, the old PC value is
irreversibly lost. Ideally, I want to trace back to the instruction or statement
causing the PUC. Any suggestions on how this may be achieved? Thanks again.


Please see in line.

On Mon, Nov 12, 2012 at 9:51 AM, Joe Radomski wrote:

> just to be 100% clear.. Did You modify the startup code to not enable the
> wd timer?
The default startup code disables the watchdog timer. I do NOT change
it. I'm sure the program enters the user code because the reset occurs in
user code.

> By default it is ENABLED by many of the compilers, and you must make sure
> it is disabled by removing the code in the compiler startup code (NOT YOUR
> CODE). It is not enough to simply not enable it in your code. This is
> why many of the code examples start by disabling the WD timer. When you
> have a lot of variables and data to be copied/initialized, the timer can
> time out before it ever gets to the user code.
>
On Mon, 12 Nov 2012 14:13:13 -0500, you wrote:

>I was thinking along the same lines. One difficulty is that the program counter
>(PC) is loaded with the address stored at the reset vector location (0FFFEh). So
>even if I point the reset vector to code under my control, the old PC value is
>irreversibly lost. Ideally, I want to trace back to the instruction or statement
>causing the PUC. Any suggestions on how this may be achieved? Thanks again.

Yes. Look at the stack activation frames. This will give you
a lot of information. Not EVERYTHING you want. But ALMOST
everything. You just need to use your brain, is all. The
stack will not tell you the last PC address, but it will tell
you everything that happened up until. This excludes a LOT,
making reasonable and probable assumptions. You are looking
for any clues. And this will give you many such clues.

I've done this many times on many processors going back many
years. It's productive.
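To make the manual stack inspection a little less tedious, a RAM dump can be pre-filtered for words that could be return addresses. The sketch below is an assumption-laden helper, not a real unwinder: the function names are hypothetical, and the code range you pass in (for example 4000h to FFDFh on a 48 KB flash part) should come from your own linker map. It relies only on the fact that MSP430 instructions are word aligned, so a return address pushed by CALL is even.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* True if `w` is even and lies inside [code_lo, code_hi], i.e. it could be
 * a return address pushed by CALL. False positives are expected. */
bool plausible_return_address(uint16_t w, uint16_t code_lo, uint16_t code_hi)
{
    return w >= code_lo && w <= code_hi && (w & 1u) == 0;
}

/* Scans a dump of stack words and records the indices of candidates in
 * `hits` (at most `max_hits`). Returns the number of candidates stored. */
size_t scan_for_return_addresses(const uint16_t *dump, size_t n_words,
                                 uint16_t code_lo, uint16_t code_hi,
                                 size_t *hits, size_t max_hits)
{
    size_t found = 0;
    for (size_t i = 0; i < n_words && found < max_hits; i++) {
        if (plausible_return_address(dump[i], code_lo, code_hi)) {
            hits[found++] = i;
        }
    }
    return found;
}
```

Each candidate can then be looked up in the map or listing file to see which function it falls in, which narrows down the call chain that was active at the time of the PUC.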

Jon
On Mon, 12 Nov 2012 15:21:34 -0000, Paul wrote:

>I think the WD is enabled by the hardware and the runtime system may, or may
>not, disable it.

It is enabled -- the documentation is quite clear on that
point, both in words and in state machine diagrams. But that
can be quickly disabled in the crt0 (or whatever it is
called) code, if for some reason waiting until main() is
called is a problem. The OP said that it is disabled
somewhere -- and I believe it is a reasonable assumption that
the OP knows it actually is before the next PUC. It's too
easy to check that.
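For reference, a small host-side model of the WDTCTL password rule, which is why a stray write resets the part even when the watchdog is held. The constant values (WDTPW = 5A00h, WDTHOLD = 0080h) are quoted from the MSP430x1xx headers as I remember them, and the function names are hypothetical, so treat this as a sketch and check your device header.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed values from the MSP430x1xx device headers; verify locally. */
#define WDTPW   0x5A00u   /* password, must be in the upper byte on writes */
#define WDTHOLD 0x0080u   /* stops the watchdog timer */

/* The word usually written first thing in main() or in crt0:
 * WDTCTL = WDTPW | WDTHOLD; */
uint16_t wdt_hold_value(void)
{
    return (uint16_t)(WDTPW | WDTHOLD);
}

/* Models the hardware rule: any write whose upper byte is not 5Ah is a
 * security key violation and causes a PUC. */
bool wdt_write_violates_key(uint16_t value)
{
    return (value & 0xFF00u) != WDTPW;
}
```

Under this rule an accidental write through a bad pointer is very unlikely to carry 5Ah in its upper byte, so it shows up as a key violation rather than as a changed watchdog configuration.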

Jon
Hi Xiaohui,

a little suggestion:

Xiaohui Liu :

> 2) Watchdog timer security key violation.

Probably stack overflow. Fill your RAM with a simple pattern (e.g.
0xDeadBeef), put a breakpoint at reset (before the RAM is initialized by the
startup code), and at the next break after reset take a look at the RAM
space. If there is no 0xDeadBeef anymore you have a problem. :-)
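A rough sketch of the fill-and-inspect idea, assuming the region to paint is handed in as an array of 32-bit words. The function names are hypothetical. On the target the region would be the unused RAM between the end of the data/bss sections and the top of the stack, and the painting would have to happen before the stack grows into it.

```c
#include <stdint.h>
#include <stddef.h>

#define FILL_PATTERN 0xDEADBEEFu

/* Paints `n` words with the fill pattern. */
void paint_region(uint32_t *region, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        region[i] = FILL_PATTERN;
    }
}

/* Counts how many words still hold the pattern. Comparing this with `n`
 * after a reset shows how much of the region was overwritten. */
size_t count_intact(const uint32_t *region, size_t n)
{
    size_t intact = 0;
    for (size_t i = 0; i < n; i++) {
        if (region[i] == FILL_PATTERN) {
            intact++;
        }
    }
    return intact;
}
```

If nothing (or almost nothing) is intact at the breakpoint after reset, the stack or a runaway write has walked through the whole region.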

M.

--- In m..., Matthias Weingart wrote:
>
> Hi Xiaohui,
>
> a little suggestion:
>
> Xiaohui Liu :
>
> > 2) Watchdog timer security key violation.
>
> Probably stack overflow. Fill your RAM with a simple pattern (e.g.
> 0xDeadBeef), put a breakpoint at reset (before the RAM is initialized by the
> startup code), and at the next break after reset take a look at the RAM
> space. If there is no 0xDeadBeef anymore you have a problem. :-)
>
> M.

Or you could use 0XDecafBad (as seen in some Stellaris Examples) :)