EmbeddedRelated.com

SOS: Data corruption of MSP 430 Flash

Started by maple_1982 March 16, 2008
Hi, Hugh
It's a pity that I am not in the US; otherwise I would like to talk with
you about this problem at a Starbucks in San Diego :)
This problem is potentially serious. About one thousand units are now
in the field. Only one board has shown this problem so far, but I am
not sure whether others will develop it in the future. If the code
flash is corrupted, we have to take the board back and reload the
code; the FAE has no way to repair it. That would cause lots of trouble.
I have read your suggestions several times and will check my code
accordingly. I will give a detailed description after the review. One
problem is that the failure is hard to reproduce.
Thanks so much for your reply. I have never met such a good group, with
so many kind-hearted and experienced engineers. I hope we can help each
other and progress together.

--- In m..., Hugh Molesworth wrote:
>
> I sent this email to your address earlier, but it got bounced.
>


Based on your description, I think the failures are most likely caused
by one or more of the following reasons:

(a) High temperature exposure to the device after it is erased and
programmed. This includes soldering, storage, shipping, etc.

(b) Marginal timing and power used during the erase/program process.

(c) Marginal flash retention time of those particular failed chips.

Note that boards affected by any of the above could probably pass your
QC tests and still show up as failures at a later time.

I think the marginal flash read feature is very valuable for minimizing
the above problems. Unfortunately, the chip you are using does not
support that feature.

--- In m..., "maple_1982" wrote:
>
> Hi, Stuart
>
> Thanks very much for your reply.
> Now the flash allocation is like this:
>
> 0x1000-0x1100 Info Flash for parameters
> 0x2500-0x4000 Bootloader
> 0x4000-0xFFFF Firmware
>
> While the firmware is running, if we want to upgrade it, we
> send a command to set an update flag stored in info flash,
> and then reset the board. The board enters the bootloader,
> which checks this flag; if it has been set to UPDATE mode,
> it begins the update process, receives data via data, and
> then programs it to flash. So the code never reprograms itself.
>
> I'll look for the possibility that flash program function
> writes data to a wrong address, which might cause the data
> corruption.
>
> Besides, now the corruption appears in both info flash and
> code flash (both bootloader and firmware).
>
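maple's plan to look for a flash routine that writes to the wrong address pairs naturally with Stuart's suggestion (quoted further down) to have the flash-writing routines double-check their range. A minimal, host-testable sketch of such a guard follows, assuming the memory map above with its upper bounds treated as exclusive. The names `flash_range_ok`, `flash_region_t`, `REGION_INFO` and `REGION_APP` are hypothetical; on the target, the check would sit in front of the existing erase and write code rather than replace it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Region limits copied from the memory map in the post above (inclusive).
 * These are the poster's figures, not values read from a device header. */
#define INFO_START 0x1000u
#define INFO_END   0x10FFu
#define BOOT_START 0x2500u   /* bootloader: listed for reference only, */
#define BOOT_END   0x3FFFu   /* it is never a valid erase/write target  */
#define APP_START  0x4000u
#define APP_END    0xFFFFu

/* Hypothetical names for the regions a caller is allowed to modify. */
typedef enum { REGION_INFO, REGION_APP } flash_region_t;

/* Returns true only if [addr, addr + len) lies entirely inside the region
 * the caller says it is updating. Call this before every segment erase and
 * before every write loop, and refuse the operation if it returns false. */
bool flash_range_ok(flash_region_t region, uint32_t addr, uint32_t len)
{
    uint32_t lo, hi;

    switch (region) {
    case REGION_INFO: lo = INFO_START; hi = INFO_END; break;
    case REGION_APP:  lo = APP_START;  hi = APP_END;  break;
    default:          return false;
    }

    if (len == 0u || addr < lo || addr > hi)
        return false;

    /* The last byte touched is addr + len - 1; written this way so the
     * comparison cannot overflow near the top of the address space. */
    return (len - 1u) <= (hi - addr);
}
```

For example, a 512-byte operation at 0x3F00 tagged `REGION_APP` is rejected because it starts inside the bootloader. Keeping the bootloader out of the enum entirely means no caller can even ask to modify it.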

In my experience, when weird things happen, check the code first. If the
problem is truly random, then check the power supply: monitor power and
ground with an oscilloscope as close to the processor as you can get.

Since you don't think it is a code bug (but remember, always check the
code first), tack a 10-100 µF capacitor between the power and ground
lines of the processor (at each power/ground pair).

If the problem goes away, you know it is a power issue.

I usually place a large (10-100 µF) capacitor near any processor, in
addition to whatever the data sheet recommends. I also make sure that
the power lines and the ground lines for the processor (if it has
multiple power and ground pins) are each tied together under the
processor.

I was once called in to help fix a project where a junior electrical
engineer had handed off the schematics and netlist to the layout person
and did not review anything until the populated 2-layer board came back
and he was trying to test it. All the grounds were connected together,
but like one big tree branch: some components sitting next to each other
had 4 inches of ground trace between them, and there were no redundant
ground connections. The processor had multiple ground pins, with 2-4
inches of copper trace between some of them. The point is that if this
is an electrical problem, it could also be a layout problem, which is
easy to get wrong on a 2-layer board.

Kip

On Mon, 2008-03-17 at 11:38 -0500, Dan Muzzey wrote:
> Long story short...
>
> We have had trouble using the MSP430 program space to store data. Some
> testing revealed that there would be an occasional bad flash write or
> bad erase; not very often, but it could happen. To combat this, we
> increased the bypass capacitance on Vcc, so that we now use a couple of
> 0.01 µF capacitors and a 1 µF capacitor directly on the Vcc pins.
> Additionally, we added a CRC to the data and a verify routine to make
> sure that we actually wrote what we intended to write.
>
> Several different people have reviewed the code without finding any
> errors that could cause the problem. The hardware has been reviewed
> several times without any clues.
>
> Dan
>
> ________________________________
>
> From: m... [mailto:m...] On Behalf
> Of Stuart_Rubin
> Sent: Monday, March 17, 2008 11:28 AM
> To: m...
> Subject: [msp430] Re: SOS: Data corruption of MSP 430 Flash
>
> It sounds like your bootloader scheme with the flag is reasonable. Do
> you have built-in protections against overwriting the bootloader?
>
> Another issue may be your clock speed. The User's Guide is very
> prescriptive about the flash write clock (usually derived from MCLK)
> as well as Vcc, as the previous responder mentioned. Being marginally
> beyond the limits may explain why you have problems with one device
> but not all of them.
>
> Stuart
>
> --- In m..., "maple_1982" wrote:
> >
> > Hi, Stuart
> >
> > Thanks very much for your reply.
> > Now the flash allocation is like this:
> >
> > 0x1000-0x1100 Info Flash for parameters
> > 0x2500-0x4000 Bootloader
> > 0x4000-0xFFFF Firmware
> >
> > While the firmware is running, if we want to upgrade it, we send a
> > command to set an update flag stored in info flash, and then reset
> > the board. The board enters the bootloader, which checks this flag;
> > if it has been set to UPDATE mode, it begins the update process,
> > receives data via data, and then programs it to flash.
> > So the code never reprograms itself.
> >
> > I'll look for the possibility that flash program function writes data
> > to a wrong address, which might cause the data corruption.
> >
> > Besides, now the corruption appears in both info flash and code flash
> > (both bootloader and firmware).
> >
> > --- In m..., "Stuart_Rubin" wrote:
> > >
> > > Does your code do any flash writing of its own? It's virtually
> > > impossible to accidentally write to flash on this processor.
> > >
> > > If you do have flash-writing capabilities, you probably have a bug
> > > causing the wrong address to be written. It may be hard to correlate
> > > the data you're reading with the corrupted data in flash because you
> > > can only "write" 1's to 0's and not 0's to 1's (without erasing the
> > > segment).
> > >
> > > We have done very strenuous "highly accelerated life testing" and
> > > other stress tests on the flash and have never found corrupted flash.
> > >
> > > As the previous tester suggested, redundancy or other error-detection
> > > information is a good tool (CRC, etc.), but it sounds like your
> > > product may already be deployed.
> > >
> > > Some other hints:
> > > Don't keep code in the same segments as areas that you write to.
> > > Then, customize your flash-writing routines to double-check the erase
> > > range for valid addresses.
> > > Encode all data with CRC, checksum, or some kind of error check.
> > >
> > > Stuart
> > >
> >
>
>
>
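Dan's approach (a CRC on the stored data plus a read-back verify) and Stuart's point that programming can only turn 1s into 0s can both be expressed in a few lines of portable C. This is a sketch, not Dan's actual code: he doesn't say which CRC he used, so CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is an assumed stand-in, and the function names are hypothetical. On the target, `flash` would point at the area that was just written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB first, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Programming can only clear bits (1 -> 0); setting any bit back to 1
 * needs a segment erase. True if `wanted` can be programmed over
 * `current` without erasing first. */
bool programmable_without_erase(uint8_t current, uint8_t wanted)
{
    return (uint8_t)(current & wanted) == wanted;
}

/* Read-back check after a write: compare flash contents byte by byte with
 * what was intended, then confirm the contents match the stored CRC. */
bool verify_record(const uint8_t *flash, const uint8_t *intended,
                   size_t len, uint16_t stored_crc)
{
    for (size_t i = 0; i < len; i++)
        if (flash[i] != intended[i])
            return false;
    return crc16_ccitt(flash, len) == stored_crc;
}
```

The CRC of the ASCII string "123456789" under this variant is 0x29B1, which is the usual check value for confirming the implementation. Checking the CRC again at boot, not just right after writing, is what would catch corruption that appears later in the field.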
Over the past few days I reviewed the code and found an error in the
flash initialization code. It uses SMCLK, which runs at 6 MHz, as the
clock source of the flash module, with a division factor of 1. As we
know, the flash clock frequency of the F169 should be between 257 kHz
and 476 kHz, so the flash was being clocked at more than 12 times the
maximum. I think that may be one factor causing this problem. However,
I have tried to reproduce the problem and failed; no such error occurs.
Still, I think it's necessary to modify the code to remove this risk.
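The fix amounts to choosing a divider that brings the flash clock into the 257-476 kHz band. At 6 MHz the valid dividers are 13 through 23 (6 MHz / 13 is about 462 kHz, 6 MHz / 23 is about 261 kHz). The sketch below computes that range and picks its middle for margin against DCO drift; the function name is hypothetical, and the limits are the ones quoted in the post. On the F169, the FNx bits of FCTL2 hold the divider minus one, so the result would be applied with something like `FCTL2 = FWKEY + FSSEL_2 + (div - 1);` (SMCLK as source). Those macro names should be checked against the device header actually in use.

```c
/* Flash timing-generator limits quoted for the F169 in the post above. */
#define FTG_MIN_HZ 257000UL
#define FTG_MAX_HZ 476000UL

/* Returns a divider (1..64) that puts src_hz / divider inside the allowed
 * band, choosing the middle of the valid range for margin against clock
 * drift. Returns 0 if no divider in 1..64 works for this source clock. */
unsigned flash_divider(unsigned long src_hz)
{
    unsigned long lo = (src_hz + FTG_MAX_HZ - 1UL) / FTG_MAX_HZ; /* ceil  */
    unsigned long hi = src_hz / FTG_MIN_HZ;                       /* floor */

    if (lo < 1UL)
        lo = 1UL;
    if (hi > 64UL)          /* FNx is 6 bits: divide by 1..64 */
        hi = 64UL;
    if (lo > hi)
        return 0u;          /* source too slow or too fast */
    return (unsigned)((lo + hi) / 2UL);
}
```

For the 6 MHz SMCLK in this thread, `flash_divider(6000000UL)` returns 18, giving roughly 333 kHz. A 32768 Hz ACLK returns 0, since it is too slow to use directly.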