EmbeddedRelated.com

Lock up through power cycling.

Started by Kipton Moravec February 20, 2007
I have a prototype board with an MSP430F1610 that locks up after
running about 30 hours. When I cycle power it stays locked up until I
reload the software using the IAR tool set.

I do not have a lot of visual indication on this board. I have one LED
that turns on when it is time to send data, and turns off when the send
window is over. It turns on when a TimerB compare matches a certain
value, and turns off when another TimerB compare matches another value.
Both events are interrupt driven.

When the GPS is working it resets the A timer and B timer to zero every
5 seconds. If the GPS is not working, the A timer and B timer roll over
every 16 seconds (the timers count at 4096 Hz). This gives me a
visual indication that I have good GPS data: if the LED blinks once every
5 seconds the GPS is working; if every 16 seconds, the GPS has not locked
on to enough satellites.
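
The two blink periods follow from the numbers above if the timers are free-running 16-bit counters. That is an assumption, but it fits: 16 s at 4096 Hz is exactly 65536 counts. A small sketch of the arithmetic, with function names made up for illustration:

```c
#include <stdint.h>

/* Assumption: Timer A and Timer B run as free-running 16-bit counters. */
#define TIMER_MODULUS 65536UL   /* 2^16 counts before rollover */
#define TIMER_HZ      4096UL    /* count rate stated in the post */

/* Seconds until the counter rolls over when nothing resets it. */
static uint32_t rollover_seconds(void)
{
    return TIMER_MODULUS / TIMER_HZ;
}

/* Counter value reached after `seconds` of counting from zero. */
static uint32_t counts_after(uint32_t seconds)
{
    return seconds * TIMER_HZ;
}
```

Here `rollover_seconds()` gives 16 and `counts_after(5)` gives 20480. Under the same assumption, while the GPS resets the timers every 5 seconds the counter never passes 20480, so a compare value placed above that count would only match when the GPS is not resetting the timers.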

Every time it has locked up the LED stays on, and no data comes out.

If I cycle power it stays locked up with the LED on.

At the very beginning of the initialization, I check whether the flash
data matches my constants, and if it does not I rewrite about 20 words
starting at 1080h. That is the only place I write to flash.
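
A minimal sketch of the check-then-rewrite step described above. It is not the actual code from this board. The names `block_matches`, `ensure_constants`, and `ram_rewrite` are hypothetical, and `ram_rewrite` is a RAM stand-in so the logic can be exercised on a PC. On the target it would be replaced by the real erase-and-program routine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONST_WORDS 20u   /* "about 20 words" in the post */

/* Returns true when the stored block already matches the constants. */
static bool block_matches(const volatile uint16_t *stored,
                          const uint16_t *expected, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (stored[i] != expected[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical hook: erase the block, then program `words` words. */
typedef void (*flash_rewrite_fn)(volatile uint16_t *dst,
                                 const uint16_t *src, size_t words);

/* Host-side stand-in for the real flash routine (assumption: RAM only). */
static void ram_rewrite(volatile uint16_t *dst,
                        const uint16_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = 0xFFFFu;          /* erased flash reads as all ones */
    }
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}

/* Rewrites only on mismatch and reports whether a rewrite happened,
 * so the caller can count or log unexpected rewrites. */
static bool ensure_constants(volatile uint16_t *stored,
                             const uint16_t *expected, size_t words,
                             flash_rewrite_fn rewrite)
{
    if (block_matches(stored, expected, words)) {
        return false;
    }
    rewrite(stored, expected, words);
    return true;
}
```

Returning whether a rewrite happened is a design choice for debugging: if the rewrite path runs on a boot where the constants should already be in place, that points at the flash contents having changed.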

It seems like flash is getting changed somehow. I do not know that for
certain, but it is the only thing I can think of. On the next compile
I will comment out the subroutine that writes to flash at 1080h.

Beyond that I have no clue what could be causing it to lock up, or how
to find it. It has locked up twice, both times after running over 24
hours continuously. Both times I had to reprogram it to start it back
up.

Does anyone have any suggestions what I should do, or what I should look
for?

Thanks,
Kip
--
Kipton Moravec


Hello Kip,

Could there be any section in your code that might initiate a spurious
flash write? I am a hardware guy so my first look would be at the
supply voltage. Is the SVS/BOR configured properly (chapter 6 of
family spec)? If so, do you have a digital scope that features nifty
trigger qualifiers so you can set it to go off and log when the supply
voltage falls outside a window? Tektronix TDS series or something like
that.

I don't know what else is on your circuit board, but could there be
anything that can generate spikes beyond what the bypass capacitors
can muffle?

Regards, Joerg

http://www.analogconsultants.com/
Kip,

> If I cycle power it stays locked up with the LED on.
>
> At the very beginning of the initialization, I check if the flash data
> in matches my constants and if it does not I rewrite about 20 words
> starting at 1080h. That is the only place I write to FLASH.
>
> It seems like Flash is getting changed somehow. But I do not know for
> certain, but that is the only thing I can think of. On the next compile
> I will comment out the subroutine that writes to 1080h flash.
>
> Beyond that I have no clue what could be causing it to lock up, or how
> to find it. It has locked up twice, both times after running over 24
> hours continuously. Both times I had to reprogram it to start it back
> up.
>
> Does anyone have any suggestions what I should do, or what I should
> look for?

The best thing to do when it locks up is to verify the contents of the flash
against the program you flashed into it. If they are the same then
you have no flash corruption. I assume you can connect to a "failed"
running system with your toolset, so you should be able to diagnose what is
happening immediately after reset on a failed system and, therefore, track
down your problem.

The ability to effectively debug is a requirement of a software engineer.
;-)

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for ARM, MSP430, AVR, MAXQ, and now Cortex-M3 processors
On Tue, 2007-02-20 at 17:37 +0000, Paul Curtis wrote:
> Kip,
>
> > If I cycle power it stays locked up with the LED on.
> >
> > At the very beginning of the initialization, I check if the flash data
> > in matches my constants and if it does not I rewrite about 20 words
> > starting at 1080h. That is the only place I write to FLASH.
> >
> > It seems like Flash is getting changed somehow. But I do not know for
> > certain, but that is the only thing I can think of. On the next compile
> > I will comment out the subroutine that writes to 1080h flash.
> >
> > Beyond that I have no clue what could be causing it to lock up, or how
> > to find it. It has locked up twice, both times after running over 24
> > hours continuously. Both times I had to reprogram it to start it back
> > up.
> >
> > Does anyone have any suggestions what I should do, or what I should
> > look for?
>
> The best thing to do is when it locks up, verify the contents of the flash
> against the program you flashed into it. If the programs are the same then
> you have no flash corruption. I assume you can connect to a "failed"
> running system with your toolset, so you should be able to diagnose what is
> happening immediately after reset on a failed system and, therefore, track
> down your problem.

It did not connect when I tried to connect to a running system using
the IAR toolset. My only option was to reload the software through the
debugger. It seems to run fine for over 24 hours. It has failed twice
this way so far. It takes a long time to fail, which is why it has not
been seen before.

I am running from a 12V power supply, and the board has a switching
power supply to drop it to 3.3V. The chips are well decoupled: every chip
has at least a 0.1 uF decoupling cap, the processor and GPS have 0.1,
1, and 10 uF ceramic capacitors, and the board has two 100 uF
electrolytic caps for bulk capacitance. The board has both a 3.3V and a
ground plane, which measure as pretty quiet (< 20 mV ripple). The
board is about 6 square inches and draws 20 to 30 mA. I do not think it
is power, or I would expect it to show up sooner.
>
> The ability to effectively debug is a requirement of a software engineer.
> ;-)

Gee thanks. I have been programming since 1974 and earned a BSE in
Computer Engineering (which was half HW and half SW at the time) in
1983, so I do know a few things about debugging. :) But when it fails and I
have no way to examine what happened, I have to either ask what others
have seen or experienced, or try to remember what has happened on other
processors in the past (i.e. my experience).

Right now my experience tells me to look at power (if I were looking at
someone else's design), but I tend to overdesign power sections so I do
not have power problems later.

--
Kipton Moravec
Kip,

> > > Does anyone have any suggestions what I should do, or what I should
> > > look for?
> >
> > The best thing to do is when it locks up, verify the contents of the flash
> > against the program you flashed into it. If the programs are the same then
> > you have no flash corruption. I assume you can connect to a "failed"
> > running system with your toolset, so you should be able to diagnose what is
> > happening immediately after reset on a failed system and, therefore, track
> > down your problem.
>
> It did not connect, when I tried to connect to a running system using
> the IAR toolset. My only option was to reload the software through the
> debugger. It seems to run fine for over 24 hours. It has failed twice
> this way so far. It takes a long time to fail which is why it has not
> been seen before.

Are you using the RST pin as an NMI? Have you accidentally configured this
somehow? I assume you've not stomped on the SVS registers?

> > The ability to effectively debug is a requirement of a software engineer.
> > ;-)
>
> Gee thanks, since I have been programming since 1974, and have a BSE in
> Computer Engineering (which was half HW and half SW at the time) in
> 1983, I do know a few things about debugging. :) But when it fails and I
> have no way to examine what happened, I have to either ask what others
> have seen or experienced, or try to remember what has happened on other
> processors in the past (i.e. my experience).

I don't understand why you can't connect to the running CPU; this seems
counter to my experience and knowledge. If you can reprogram the device
over JTAG without swapping the chip you sure can read out the program over
JTAG. I've done it many times. If you can do this, then you can see if
flash is corrupt. I'm sure Anders can tell you how to do this.

-- Paul.
Hi,

In the past I've forgotten to lock flash after writing it - hate to
admit that but it may be something you'd like to check.

Check your flash timing generator?

Check your supply voltage when writing to flash? I know there is a lower
limit...

Maybe whack a CRC over your data and check it on startup and after
you program your data? This would help you narrow things down a bit. I
had a data corruption problem before and this is how I tracked it down.

Have you tried bounds checking the data you are reading from flash?

Maybe spit your data out the serial port?

Maybe enable the watchdog and spit the data out when the watchdog fires?

Too bad it only happens every 24 hours...

Hope this helps,

Michael
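
One way the CRC suggestion above could look in C. This is a sketch. The choice of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) and the names `crc16_ccitt` and `block_is_intact` are assumptions for illustration, since the thread does not name a particular CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. Bitwise, so it needs no table. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Returns 1 when the stored CRC matches the data block, 0 otherwise. */
static int block_is_intact(const uint8_t *data, size_t len,
                           uint16_t stored_crc)
{
    return crc16_ccitt(data, len) == stored_crc;
}
```

The CRC would be stored next to the data when it is programmed and checked at startup and again right after programming. A mismatch at startup with a match right after programming would suggest the data changed while the system was running.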

Thanks for the suggestions.

Currently we are running off of a 12V power supply, with a switching
power supply generating the 3.3V to run the processor.

There is nothing generating spikes that I am aware of. There is a 1610,
a compass, accelerometer, 2 logic level serial ports and 1 USB port, a
modem, and a GPS Module.

The board has a lot of capacitors. Everything is bypassed by a .1 uF and
the processor and GPS are bypassed with .1, 1, and 10 uF capacitors in
parallel, and there are 2 100 uF electrolytic bulk capacitors. The
board is 4 layers with a 3.3V power plane and a Ground Plane. I do not
see any noise on the planes. I usually overdo the power supply because
that is often the source of gremlins.

--
Kipton Moravec
I'm a paranoid type guy so I usually start by eliminating everything.
I'd try dropping a bit of code that just toggles an LED and see if it
locks up. Then start adding back stuff. I like to try and find the
simplest version that exhibits the problem. Makes it a little more
difficult with a 24 hour average fail time. From your description
though, it sounds like something with the flash write routine. I'd
spend some time looking through those. Then I'd find someone else to
look through them. Preferably someone who doesn't take anything for
granted. One thing I didn't see mentioned was the clock. Could your
clock be failing? As another option, how about the stack or a bad
array?

I've heard that many people are using .01 uF bypass caps now instead of
.1. Every data sheet says use .1, but the explanation was that it is a
holdover from slower chips. This probably isn't your problem though.

muzzey

Hi Kip,

Be careful with having such large caps around, like 100 uF, and with the accumulated capacitance on
the rails. If there were a brownout or P/S problems, you would have very high inrush currents to
deal with, which can be a source of gremlins alone...

From what I've followed on this thread, it sounds to me too that you might have a stray issue with
the flash writing.

Best Regards,
Kris

Kip,

Though power issues or flash corruption sound plausible, I'm going to throw
out another idea, if only to muddy the waters.

When I've had things happen only after running for a long time, it has
tended to be caused by values being put on a stack and not being removed
properly. Maybe a byte here or there, but it builds up. Eventually, you run
out of memory or make an illegal write. Or you overwrite a counter and
you're stuck in a loop.

Try checking the size of the stack and the extent of your memory usage when
you start out. Then check after an hour or so, then five hours. If it keeps
to the same size, you have some other problem.

Good luck!

Rachel Adamec

Norristown, PA, USA
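
A common way to do the stack check described above is to "paint" the unused stack area with a known pattern at startup and later count how much of the pattern survives. The sketch below shows the idea on an ordinary array so it can be tried on a PC. The names `stack_paint` and `stack_words_unused` and the 0xA5A5 pattern are made up for illustration. On the target, the region would come from the linker's stack bounds, which are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5u   /* arbitrary pattern, unlikely to occur by chance */

/* Fill the region with the pattern. On a target this would run once at
 * startup over the part of the stack that is not yet in use. */
static void stack_paint(uint16_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = STACK_FILL;
    }
}

/* Count words, starting at the low-address end, that still hold the
 * pattern. Assumes the stack grows downward (as on the MSP430), so
 * index 0 is the last word the stack would reach. */
static size_t stack_words_unused(const uint16_t *region, size_t words)
{
    size_t n = 0;
    while (n < words && region[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Logging `stack_words_unused()` after one hour and again after five hours gives the comparison suggested above: a count that keeps shrinking points at stack growth, while a stable count points elsewhere.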
