My MSP430F5528 CPU gets stuck several times a day

Started by veraleh September 30, 2012
Hello Group,

I have a very weird problem.
I am using an MSP430F5528 CPU, running the FLL at 8MHz (fixed, we don't change it).
The system has a 3.0V LDO with 250mA current limitation.
From time to time (4 to 12 times in 24 hours), the CPU will become stuck and only an external NMI will be able to reset the device.

We have an external circuit that acts as a watchdog: if it sees no edge on a GPIO line from the MSP for 8 seconds (we toggle the line twice every second), it sends a RESET pulse to the MSP.

We have a "nasty" radio connected to the circuit, powered directly from the LDO, that uses 27mA for receive, 124mA peak current for transmit, and since it uses a 24MHz crystal for the PLL, when it just powers up, it can have a large in-rush current... in IDLE mode (most of the time) it will consume 15uA. The radio is an 802.15.4 radio.

Now the weird thing is:
1. If we are running with the DEBUGGER (IAR 5.30.1), it will not happen.
2. If we are running from an external power supply (even with a very-very large capacitor for stabilization), we'll have the RESET events.

More info:
1. We use TI's suggested code to find the cause of the RESET event; it is always NMI. If we deliberately generate a watchdog timeout or a watchdog key violation, we see it in the RESET cause. We also have a mechanism in place to detect and report a stack overrun.
2. The code is written in C, no C++, no heap.
3. The system goes to LPM1 and LPM3 accordingly (this is a very low power system, the average core consumption (without the radio) is around 20uA) and will wake up 16 times a second for an internal scheduler or when an HW interrupt is received (UART / SPI / radio / ADC / etc.)
4. We use a SMALL data model (only the lower 64KB of the flash, 16bit registers and addresses), and no compiler optimizations.
5. An earlier product, based on the MSP430F2370 and a different radio, never had these problems. The code was ported and adapted from that product.
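
For anyone following along: on the F5xx family, finding the reset cause usually means reading SYSRSTIV at startup until it returns zero. The sketch below is a host-side model in C, not TI's actual code. The vector values are as I recall them from the F552x device header and should be checked against your own header, and fake_read_sysrstiv is a made-up stand-in for the register so the logic can be run on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SYSRSTIV values, assumed from the F552x header; verify against yours. */
enum {
    RST_NONE   = 0x0000,
    RST_BOR    = 0x0002,  /* brownout */
    RST_RSTNMI = 0x0004,  /* RST/NMI pin */
    RST_SVSL   = 0x000C,  /* low-side supervisor */
    RST_SVSH   = 0x000E,  /* high-side supervisor */
    RST_WDTTO  = 0x0016,  /* watchdog timeout */
    RST_WDTKEY = 0x0018   /* watchdog key violation */
};

const char *reset_cause_name(uint16_t cause)
{
    switch (cause) {
    case RST_NONE:   return "none";
    case RST_BOR:    return "brownout";
    case RST_RSTNMI: return "RST/NMI pin";
    case RST_SVSL:   return "SVS low side";
    case RST_SVSH:   return "SVS high side";
    case RST_WDTTO:  return "watchdog timeout";
    case RST_WDTKEY: return "watchdog key violation";
    default:         return "other (see device header)";
    }
}

/* True for causes that point at the supply rather than at the code. */
bool reset_is_supply_related(uint16_t cause)
{
    return cause == RST_BOR || cause == RST_SVSL || cause == RST_SVSH;
}

/* Hypothetical stand-in for the register: each read returns the next
   pending cause, then RST_NONE once nothing is left. */
static const uint16_t fake_pending[] = { RST_RSTNMI, RST_WDTTO };
static size_t fake_next = 0;

uint16_t fake_read_sysrstiv(void)
{
    if (fake_next < sizeof fake_pending / sizeof fake_pending[0]) {
        return fake_pending[fake_next++];
    }
    return RST_NONE;
}

/* Read causes until the register reports none; returns how many were stored. */
size_t drain_reset_causes(uint16_t (*read_reg)(void), uint16_t *out, size_t max)
{
    size_t n = 0;
    assert(read_reg != NULL && out != NULL);
    while (n < max) {
        uint16_t cause = read_reg();
        if (cause == RST_NONE) {
            break;
        }
        out[n++] = cause;
    }
    return n;
}
```

On the target you would pass a function that returns SYSRSTIV instead of the stand-in. Separating the supply-related causes from the rest makes it obvious when the supervisors never fired, which matches the report here that the cause was always the RST/NMI pin.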

So how to catch the cause for the RESET when it will not happen in the debugger? Does this ring any bell to anyone?

- Ohad.

Never having worked with the 5528 I can only generalise.

The NMI is not a dead end. Write a proper NMI handler that attempts to
gather some system info and then attempts to recover the system. You can
use the NMI_ISR to do all sorts of things, like storing the NMI and
watchdog flag registers in flash before you let it force a reset.

That alone might help, but there are a few other things you can try. For
example, disconnect the radio and see if it still happens. Store the
system time in flash, in a reserved cyclic segment, immediately before
powering up the radio, every time it's used. Try storing the time
when the NMI occurs as well.
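
A rough sketch of what such a cyclic log could look like, written as a host-side model in C. Everything here is an assumption for illustration: the record layout, the event codes, the slot count, and the RAM array standing in for a reserved flash segment. Erased flash reads as 0xFF, so an event field of 0xFFFF marks a free slot. When the segment is full it is erased and reused, which loses older history (two segments used alternately would avoid that).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_SLOTS  8u
#define EVT_ERASED 0xFFFFu  /* what an erased flash word reads as */

enum { EVT_RADIO_ON = 1, EVT_NMI = 2 };  /* hypothetical event codes */

typedef struct {
    uint32_t time_ticks;  /* system time when the event was logged */
    uint16_t event;       /* EVT_ERASED means the slot is free */
    uint16_t aux;         /* spare field, e.g. a flags register */
} log_rec_t;

/* RAM array standing in for one reserved flash segment. */
static log_rec_t log_seg[LOG_SLOTS];

/* Models a segment erase: every byte becomes 0xFF. */
void log_erase(void)
{
    memset(log_seg, 0xFF, sizeof log_seg);
}

/* Writes into the first free slot; erases and wraps when full.
   Returns the slot index that was written. */
unsigned log_append(uint32_t time_ticks, uint16_t event, uint16_t aux)
{
    unsigned i;
    assert(event != EVT_ERASED);
    for (i = 0; i < LOG_SLOTS; i++) {
        if (log_seg[i].event == EVT_ERASED) {
            break;
        }
    }
    if (i == LOG_SLOTS) {
        log_erase();
        i = 0;
    }
    log_seg[i].time_ticks = time_ticks;
    log_seg[i].event = event;
    log_seg[i].aux = aux;
    return i;
}

/* Most recent record, or NULL if nothing has been logged since the erase. */
const log_rec_t *log_last(void)
{
    const log_rec_t *last = NULL;
    unsigned i;
    for (i = 0; i < LOG_SLOTS && log_seg[i].event != EVT_ERASED; i++) {
        last = &log_seg[i];
    }
    return last;
}
```

After a lockup and reset, log_last() tells you whether the last thing recorded was a radio power-up, and how long before the reset it happened.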

Not failing on the debugger suggests it might be power related, if the
debugger is supplying power, but not necessarily so. Put a
data logger on your power rail and see if it dips at all. Are you using the
SVS? If so, try disabling the PORON bit and see what happens, or disable
the SVS entirely.

Al

Thanks,

Here is some more info:
1. All the interrupts that I am not using are assigned handlers that output a debug trace to a terminal (using the UART of the MSP), including the SYSNMI_VECTOR you suggested. We never see any of these handlers being called. The CPU simply gets stuck, and the only way to get it "free" is to apply a RESET pulse (5 msec) on the RST/NMI pin of the MSP.
2. We assume this is a power issue, and we see different failure statistics with different power connections.
3. I will take a look at the SVS registers and their management.
4. We see power dips of up to 300mV for a duration of a few microseconds. This is the LDO trying to keep up with a change in the requested current.
5. Disconnecting the radio will render the product useless, and some code would also have to change. But it's worth a try.

- Ohad.

To me 300mV is an unacceptable power dip. I have a GSM design where the
cell modem draws up to 2A, the maximum dip is 50mV, even that is not as
good as I'd like.
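
As a back-of-envelope check on those numbers, the first-order relation C = dI * dt / dV gives the local capacitance needed to hold a rail within a given droop while the regulator catches up. The helper below is only a sketch under stated assumptions: it ignores capacitor ESR/ESL and the LDO's own response, and the 5 us duration is my stand-in for "a few microseconds". The 124mA step is the radio's quoted transmit peak.

```c
#include <assert.h>

/* First-order estimate: capacitance (farads) that limits the droop to
   droop_volts while supplying step_amps alone for hold_seconds. */
double bulk_cap_farads(double step_amps, double hold_seconds, double droop_volts)
{
    assert(droop_volts > 0.0);
    return step_amps * hold_seconds / droop_volts;
}
```

With a 124mA step held for an assumed 5 us, a 50mV droop works out to about 12.4 uF close to the load, while a 300mV droop corresponds to only about 2 uF of effective capacitance. Treat these as order-of-magnitude figures, not a design.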

I suggested you use a true NMI recovery process rather than a simple
die and reset, but yes, outputting to the UART may work, depending on
what went wrong. The same goes for writing flash. Writing RAM is more
reliable if you can freeze further RAM use.
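
One way to do the RAM variant is a small crash record guarded by a magic number and a checksum, checked after the next reset. The sketch below is a host-side model in C. The field names and the magic value are made up for illustration, and on the target the record would have to live in RAM that the C startup code does not clear (IAR provides the __no_init keyword for that; check your toolchain).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CRASH_MAGIC 0xC0DEu  /* arbitrary marker, an assumption */

typedef struct {
    uint16_t magic;      /* CRASH_MAGIC when a record has been stored */
    uint16_t nmi_flags;  /* e.g. a copy of the NMI/watchdog flag registers */
    uint16_t task_id;    /* what the scheduler was running */
    uint16_t tick;       /* system time at the fault */
    uint16_t check;      /* inverted XOR of the fields above */
} crash_rec_t;

static uint16_t crash_checksum(const crash_rec_t *r)
{
    return (uint16_t)~(r->magic ^ r->nmi_flags ^ r->task_id ^ r->tick);
}

/* Called from the fault handler: fill the record and seal it. */
void crash_store(crash_rec_t *r, uint16_t nmi_flags, uint16_t task_id, uint16_t tick)
{
    assert(r != NULL);
    r->magic = CRASH_MAGIC;
    r->nmi_flags = nmi_flags;
    r->task_id = task_id;
    r->tick = tick;
    r->check = crash_checksum(r);
}

/* Called early after reset: true only if a complete record survived. */
bool crash_valid(const crash_rec_t *r)
{
    assert(r != NULL);
    return r->magic == CRASH_MAGIC && r->check == crash_checksum(r);
}

/* Called once the record has been reported, so it is not reported twice. */
void crash_clear(crash_rec_t *r)
{
    assert(r != NULL);
    r->magic = 0;
    r->check = 0;
}
```

The checksum matters because RAM contents after a supply dip are not guaranteed; a record that fails the check should simply be ignored.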

When it locks up, can you attach IAR to the running system and then stop the
CPU? This may give an indication of what process it is in when it fails.

I'm not suggesting that you permanently disconnect the Rf system, but if
you suspect it is responsible for the faults then this is the obvious
place to attack the problem.

Firstly, turn it off completely but simulate running it as far as you
can, perhaps using the UART in place of the RF. If no failures occur, then
try cycling the power on and off at a far higher rate than normal, and
see if that causes the problem. Next, do rapid power on/transmit/off
cycles and, if possible, set the receiver up so that it alarms if a packet
is missed. Just use the shortest possible transaction.

Why do you assume it is a power issue? The SVS should catch it if it is.
Do you have any LED indicators on the board? When no ICE was available
an LED would always do if you got creative enough.

Al

Guys, thanks!

Well, I think disabling the SVS is a good idea.
I reviewed some of the documentation (again) and it turns out that the high-side SVS (the one monitoring DVCC) is initialized to normal-performance mode by default, and as such it does not cope well with fast transients on DVCC.
"Full-performance mode might be considered in applications in which the decoupling of the external power supply cannot adequately prevent fast spikes on DVCC from occurring, or when the application has a particular intolerance to failure. In such cases, full-performance mode provides an additional layer of protection"

Full-performance mode takes more power, and this is probably the reason we didn't go there to begin with. On the other hand, since we have an external mechanism to reset the device if the power drops too low, and we have an external LDO, we might as well disable the SVS and SVM altogether.

I took the precaution of both disabling the SVS and SVM (high and low) and setting VCore to level 2 (the default level is 0).
We'll see (24 hours from now) if this helps or not.
It should be O.K.: the batteries are 3.6V, the LDO is 3.0V, DVCC requires 2.2V and VCore 1.8V. So we don't really need these modules.

The MSP documentation also states that when the debugger is connected, the TEST pin of the JTAG might be logic HIGH or logic LOW, which in turn changes the real-time performance of the device.
This can explain why we don't see the same behavior (device gets stuck) when the debugger is active.

The trivial test of removing the radio indicates that the problem is indeed a power issue, as the device does not RESET when the radio is absent.

Stopping the MSP on the fly to check the register state is not trivial. There is an erratum for the family stating that the CPU might go crazy if we do that during a sleep period (and this is what our device does most of the time).

For those of you who want to check the documentation regarding this issue - open SLAU208H and read section 2.2.8 "SVS/SVM Performance Modes and Wakeup Times" - read it carefully...
- Ohad.

O.K., so the problem is solved.
Thank you for suggesting getting rid of the PMM supervision (SVS, SVM) in the MSP430. On top of that, it might have been a case where running at 8MHz with VCore level 0 is too marginal (although it is supported by the specification).

So shutting down the SVS and SVM and setting the VCore to level 2 solves the problem and the product is now stable.

-Ohad.

Hi Ohad, in my opinion this isn't a solution, it's simply a kludge or a
bandaid that hides the real issue, which would seem to be poor power
supply design. The SVS and SVM are there for a reason. I would seriously
look at developing a better power supply before commercially shipping this.

Al

On 4/10/2012 6:50 PM, veraleh wrote:
> O.K., So the problem is solved.
> Thank you for suggesting to get rid of the PMM (SVS,SVM) in the MSP430. On top of that, it might have been a case where running @ 8MHz and VCore level 0 is too marginal (although supported in the specification).
>
> So shutting down the SVS and SVM and setting the VCore to level 2 solves the problem and the product is now stable.
>
> -Ohad.
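For anyone checking the "8MHz at VCore level 0 is marginal" point above, here is a minimal host-side sketch of the margin arithmetic. The per-level frequency limits are an assumption quoted from memory of the MSP430F552x datasheet, and min_vcore_level() is a hypothetical helper, not a TI API. Please verify the numbers against the current datasheet before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum system frequency per PMMCOREV level, in Hz.
 * ASSUMPTION: values recalled from the MSP430F552x datasheet. */
static const uint32_t vcore_max_hz[4] = {
    8000000UL, 12000000UL, 20000000UL, 25000000UL
};

/* Hypothetical helper: lowest core voltage level whose frequency limit
 * still covers mclk_hz with margin_pct percent of headroom.
 * Returns -1 if no level provides that much headroom. */
int min_vcore_level(uint32_t mclk_hz, uint32_t margin_pct)
{
    assert(mclk_hz > 0u); /* a zero clock is a caller bug */
    for (int level = 0; level < 4; level++) {
        /* Compare mclk * (100 + margin) against limit * 100 in 64 bits
         * so the multiplication cannot overflow. */
        uint64_t need = (uint64_t)mclk_hz * (100u + margin_pct);
        uint64_t have = (uint64_t)vcore_max_hz[level] * 100u;
        if (need <= have) {
            return level;
        }
    }
    return -1;
}
```

With no margin, 8MHz fits level 0, which matches "supported in the specification". Asking for any headroom at all pushes the result to level 1 or higher, which is one way to read why raising VCore helped.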
>
> --- In m..., "veraleh" wrote:
>> Guys, thanks!
>>
>> Well, I think disabling the SVS is a good idea.
>> I reviewed some of the documentation (again) and it turns out that the SVS high side (for the DVCC) is initialized to normal-performance by default, and as such, it will suffer from fast power surges on the DVcc.
>> "Full-performance mode might be considered in applications in which the decoupling of the external power supply cannot adequately prevent fast spikes on DVCC from occurring, or when the application has a particular intolerance to failure. In such cases, full-performance mode provides an additional layer of protection"
>>
>> Full performance mode takes more power and this is probably the reason we didn't go there to begin with. On the other hand, since we have an external mechanism to reset the device if the power drops too low + we have an external LDO, we might as well disable the SVS and SVM altogether.
>>
>> I took the precaution of both disabling the SVS and SVM (high and low) + setting the VCORE to level 2 (default level is 0).
>> We'll see (24 hours from now) if this helps or not.
>> Should be o.k. - the batteries are 3.6V, the LDO is 3.0V and the DVCC requires 2.2V and VCore 1.8V. So we don't really need these modules.
>>
>> The MSP documentation also states that when the debugger is connected, the TEST pin of the JTAG might be logic HIGH or logic LOW, which in turn changes the real-time performance of the device.
>> This can explain why we don't see the same behavior (device gets stuck) when the debugger is active.
>>
>> The trivial test of removing the radio indicates that the problem is indeed a power issue, as the device does not RESET when the radio is absent.
>>
>> Stopping the MSP on the fly to check the register state is not trivial. There is an erratum for the family stating that the CPU might go crazy if we do that during a sleep period (and this is what our device does most of the time).
>>
>> For those of you who want to check the documentation regarding this issue - open SLAU208H and read section 2.2.8 "SVS/SVM Performance Modes and Wakeup Times" - read it carefully...
>> - Ohad.
>>
>> --- In m..., Onestone wrote:
>>> To me 300mV is an unacceptable power dip. I have a GSM design where the
>>> cell modem draws up to 2A, the maximum dip is 50mV, even that is not as
>>> good as I'd like.
>>>
>>> I suggested you use a true NMI recovery process rather than a simple
>>> die and reset, but yes outputting to the UART may work, depending on
>>> what went wrong. The same goes for writing Flash. Writing RAM is more
>>> reliable if you can freeze further RAM use.
>>>
>>> When it locks up, can you attach IAR to the running system and then stop
>>> the CPU? This may give an indication of what process it is in when it fails.
>>>
>>> I'm not suggesting that you permanently disconnect the Rf system, but if
>>> you suspect it is responsible for the faults then this is the obvious
>>> place to attack the problem.
>>>
>>> Firstly turn it off completely but simulate running it as far as you
>>> can, perhaps use the UART in place of the RF. If no failures occur then
>>> try cycling the power on and off at a far higher rate than normal, and
>>> see if that causes the problem. Next do rapid power on/transmit/off
>>> cycles, and, if possible, set the receiver up so that it alarms if a packet
>>> is missed. Just use the shortest possible transaction.
>>>
>>> Why do you assume it is a power issue? The SVS should catch it if it is.
>>> Do you have any LED indicators on the board? When no ICE was available
>>> an LED would always do if you got creative enough.
>>>
>>> Al
>>>
>>>
>>> On 1/10/2012 5:04 AM, veraleh wrote:
>>>> Thanks,
>>>>
>>>> Here is some more info:
>>>> 1. All the interrupts that I am not using are assigned and will output a debug trace to a terminal (using the UART of the MSP), including the SYSNMI_VECTOR you suggested. We don't see any call to these interrupts. The CPU simply gets stuck and the only way to get it "free" is to apply a RESET pulse (5 msec) on the RST/NMI pin of the MSP.
>>>> 2. We assume this is a power issue, and see different statistics with different power connections.
>>>> 3. I will take a look at the SVS register and management.
>>>> 4. We see power dips of up to 300mV for a duration of a few microseconds. This is the LDO trying to keep up with a change in the requested current.
>>>> 5. Disconnecting the Radio will render the product useless. Some code should also be changed. But it's worth a try.
>>>>
>>>> - Ohad.
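Since the reset cause keeps coming back as NMI, it may help to see how the other causes would show up. Below is a minimal host-side sketch that decodes a SYSRSTIV value and flags the supply-related ones. The vector values are an assumption taken from memory of the F5xx device header (the SYSRSTIV_* constants), and both function names are hypothetical. On target, SYSRSTIV would normally be read in a loop until it returns zero, because each read clears only the highest-priority pending cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a SYSRSTIV value to a readable name.
 * ASSUMPTION: values recalled from the F5xx header; check them against
 * the header shipped with your toolchain. */
const char *reset_cause_name(uint16_t sysrstiv)
{
    switch (sysrstiv) {
    case 0x0000: return "none pending";
    case 0x0002: return "BOR (brownout)";
    case 0x0004: return "RST/NMI pin";
    case 0x0006: return "software BOR";
    case 0x0008: return "LPMx.5 wakeup";
    case 0x000A: return "security violation";
    case 0x000C: return "SVSL (core supply)";
    case 0x000E: return "SVSH (DVCC supply)";
    case 0x0010: return "SVML overvoltage";
    case 0x0012: return "SVMH overvoltage";
    case 0x0014: return "software POR";
    case 0x0016: return "WDT timeout";
    case 0x0018: return "WDT password violation";
    default:     return "unknown/other";
    }
}

/* True for causes raised by the brownout or supply supervisor blocks. */
bool reset_cause_is_supply_related(uint16_t sysrstiv)
{
    switch (sysrstiv) {
    case 0x0002: /* BOR */
    case 0x000C: /* SVSL */
    case 0x000E: /* SVSH */
    case 0x0010: /* SVML overvoltage */
    case 0x0012: /* SVMH overvoltage */
        return true;
    default:
        return false;
    }
}
```

If the supervisor had actually reset the part, one would expect a supply-related value here rather than the RST/NMI pin, which is what the external watchdog circuit produces.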
>>>>
>>>> --- In m..., Onestone wrote:
>>>>> Never having worked with the 5528 I can only generalise.
>>>>>
>>>>> The NMI is not a dead end, write a proper NMI handler that attempts to
>>>>> gather some system info and then attempts to recover the system. You can
>>>>> use the NMI_ISR to do all sorts of things, like store the NMI and
>>>>> watchdog flags registers in flash before you let it force a reset.
>>>>>
>>>>> That alone might help, but there are a few other things you can try, for
>>>>> example disconnect the radio and see if it still happens. Store the
>>>>> system time immediately before powering up the radio, every time it's
>>>>> used, again use flash, in a reserved cyclic segment. Try storing time
>>>>> when the NMI occurs.
>>>>>
>>>>> Not failing on the debugger suggests it might be power related, if you
>>>>> have power from the debugger available, but not necessarily so. Put a
>>>>> data logger on your power and see if it dips at all. Are you using the
>>>>> SVS? If so, try disabling the PORON bit and see what happens, or disable
>>>>> the SVS entirely.
>>>>>
>>>>> Al
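The "reserved cyclic segment" idea above can be sketched on a host like this. It is only a model under stated assumptions: log_segment is a RAM array standing in for a reserved flash segment, LOG_SLOTS is an arbitrary size, and all function names are hypothetical. On target, the erase and the word write would go through the flash controller instead of plain assignments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_SLOTS   8u       /* hypothetical; size to your reserved segment */
#define SLOT_ERASED 0xFFFFu  /* erased flash reads back as all ones */

/* Host-side stand-in for the reserved flash segment. */
static uint16_t log_segment[LOG_SLOTS];

/* Model of a segment erase: every word returns to the erased value. */
void log_erase(void)
{
    for (size_t i = 0; i < LOG_SLOTS; i++) {
        log_segment[i] = SLOT_ERASED;
    }
}

/* Append one record to the first erased slot. When the segment is full,
 * erase it and start again at slot 0. Returns the slot index used. */
size_t log_append(uint16_t record)
{
    assert(record != SLOT_ERASED); /* 0xFFFF is reserved to mean "empty" */
    for (size_t i = 0; i < LOG_SLOTS; i++) {
        if (log_segment[i] == SLOT_ERASED) {
            log_segment[i] = record;
            return i;
        }
    }
    log_erase();
    log_segment[0] = record;
    return 0;
}

/* Most recent record, or SLOT_ERASED if nothing has been logged. */
uint16_t log_latest(void)
{
    uint16_t latest = SLOT_ERASED;
    for (size_t i = 0; i < LOG_SLOTS; i++) {
        if (log_segment[i] == SLOT_ERASED) {
            break;
        }
        latest = log_segment[i];
    }
    return latest;
}
```

Call log_erase() once before first use. Erasing on wrap throws away the older history, so a real design might alternate between two segments to keep the previous one readable. That choice is left open here.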
>>>>>
>>>>>
>>>>>
>>>>> On 1/10/2012 3:28 AM, veraleh wrote:
>>>>>> Hello Group,
>>>>>>
>>>>>> I have a very weird problem.
>>>>>> I am using an MSP430F5528 CPU, running the FLL at 8MHz (fixed, we don't change it).
>>>>>> The system has a 3.0V LDO with 250mA current limitation.
>>>>>> From time to time (4 to 12 times in 24 hours), the CPU will become stuck and only an external NMI will be able to reset the device.
>>>>>>
>>>>>> We have an external circuit that acts like a watchdog: if there is no edge change from the CPU (a GPIO line from the MSP to this circuit) for 8 seconds (and we toggle it twice every second), it will send a RESET pulse to the MSP.
>>>>>>
>>>>>> We have a "nasty" radio connected to the circuit, powered directly from the LDO, that uses 27mA for receive, 124mA peak current for transmit, and since it uses a 24MHz crystal for the PLL, when it just powers up, it can have a large in-rush current... in IDLE mode (most of the time) it will consume 15uA. The radio is an 802.15.4 radio.
>>>>>>
>>>>>> Now the weird thing is:
>>>>>> 1. If we are running with the DEBUGGER (IAR 5.30.1), it will not happen.
>>>>>> 2. If we are running from an external power supply (even with a very-very large capacitor for stabilization), we'll have the RESET events.
>>>>>>
>>>>>> More info:
>>>>>> 1. We use the suggested code from TI in order to find out the cause of the RESET event, it is always NMI. If we generate (by intent) a watchdog reset or a watchdog violation key, we see it in the RESET cause. We also have a mechanism in place to locate and report a stack-overrun.
>>>>>> 2. The code is written in C, no C++, no heap.
>>>>>> 3. The system goes to LPM1 and LPM3 accordingly (this is a very low power system, the average core consumption (without the radio) is around 20uA) and will wake up 16 times a second for an internal scheduler or when an HW interrupt is received (UART / SPI / radio / ADC / etc.)
>>>>>> 4. We use a SMALL data model (only the lower 64KB of the flash, 16bit registers and addresses), and no compiler optimizations.
>>>>>> 5. An earlier product, based on MSP430F2370 and a different radio, never had these problems. The code was ported and altered from this product.
>>>>>>
>>>>>> So how to catch the cause for the RESET when it will not happen in the debugger? Does this ring any bell to anyone?
>>>>>>
>>>>>> - Ohad.
