Hi Group,

First post so I hope you can help me. I have a battery-powered product that uses an MSP430F1481 as a pulse counter; the code is written entirely in assembly. The MSP spends 95% of its time in LPM3, with only low-frequency pulses on a couple of digital GPIO inputs and a 1Hz Real-Time-Clock interrupt waking the CPU from LPM3. The Real-Time-Clock is provided by the watchdog configured in interval timer mode (Timer A and Timer B are already in use for another task). The CPU is clocked by the DCO at approximately 1.2MHz. The MSP is never turned off or deliberately reset and the system is designed to run continuously for many years.

Typically once per day the MSP is waking up a GSM module on the same PCB and the MSP and the GSM module do serial comms for 2 to 3 minutes while the GSM module is connected on the GSM network. Generally this setup is fine and dandy, however, there appears to be a growing phenomenon of product returning from the field with a locked-up, partially locked-up or reset MSP.

There are some clues as to what is going on. The MSP maintains an event log in several spare Flash sectors where it records non-routine events. One event in particular that seems to be a recurring feature of the problem is a UART 0 overrun error. UART 0 is used to communicate with the GSM module at 2400bps. The overrun error always occurs around 20 seconds after turning on the GSM module and coincides with the GSM network TX and RX activity beginning.

The product has previously suffered from GSM interference causing significant dips on the power supply. VCC is normally 2.8V, to logic match the GSM module, and the dips have been observed going as low as 1.8V, i.e. at the minimum operating voltage! It is believed that a recent modification to the circuit has fixed this issue and no VCC dips are observed using a scope on modified examples.

Nevertheless, the issue of the MSP locking up or resetting persists. Sometimes it is apparent from the event log that interrupt level code is still running, apparently normally, and yet the foreground (main) thread is either not coming out of LPM3 or is stuck in an infinite loop.

I am aware that the VCC of 2.8V is close to the minimum Flash write and erase voltage of 2.7V; however, I have not observed problems writing or erasing Flash since the VCC dips were eliminated. The program in Flash also appears to verify correctly, so I don't believe the code is being corrupted in the Flash.

A few ideas I have had to improve system robustness, yet to be put into operation, are:
1. Add interrupt handlers for all interrupt sources (even though all unused interrupts are disabled and thus should not be generating interrupts anyway).
2. Record in event log whether reset was PUC or POR by testing oscillator fault flag immediately after reset (OFIFG is set by POR, unchanged by PUC and no clock in use is capable of setting OFIFG).
3. Record in event log the stack pointer after reset.
4. Change function of RST/NMI pin to NMI and record any occurrences of NMI interrupt in event log.
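
To make idea 2 concrete, here is a minimal sketch of the logic in C (the real firmware is assembly, so this is illustration only). It rests on the assumption stated in idea 2, that OFIFG is set by POR and left unchanged by PUC, which should be confirmed against the user's guide. The mask 0x02 for OFIFG in IFG1 is an assumption to be checked against the device header, and `classify_reset` and `clear_ofifg` are made-up names.

```c
#include <stdint.h>

/* Assumed mask for OFIFG in IFG1; verify against the device header. */
#define OFIFG_MASK 0x02u

typedef enum { RESET_PUC = 0, RESET_POR = 1 } reset_cause_t;

/* Classify the reset from a snapshot of IFG1 taken as the very first
 * action after reset, before anything else clears or sets the flag.
 * Assumption (from idea 2): OFIFG set means POR, clear means PUC. */
reset_cause_t classify_reset(uint8_t ifg1_snapshot)
{
    return (ifg1_snapshot & OFIFG_MASK) ? RESET_POR : RESET_PUC;
}

/* Value to write back to IFG1 so that the next PUC finds OFIFG clear. */
uint8_t clear_ofifg(uint8_t ifg1_snapshot)
{
    return (uint8_t)(ifg1_snapshot & (uint8_t)~OFIFG_MASK);
}
```

The scheme only works if the startup code clears OFIFG once the cause has been logged; otherwise every later PUC would also be recorded as a POR.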

I am really hoping some of you may have some ideas about what may be causing the problems I have described or have suggestions for improving the system immunity.

Thanks.


Reply by Augusto Einsfeldt ● November 19, 2009

Your planned actions are all sound. In fact, you should never leave
an unused interrupt vector without a proper handler. You can also
use those handlers to write to the event log.
A hardware reset can happen if you use an external reset chip; most
of them are prone to generating a reset in the presence of RFI. You
did not mention how you solved the VCC dip. Because these GSM modules
draw such high current, you need separate power supply rails, one for
the GSM module and one for the MSP. Fast dips can still happen and
are very difficult to filter, and they would produce resets as well.
It seems that you are not using the watchdog as it is meant to be
used. You should use the watchdog timer not just to count time but
also to check that the system is stable (for example by verifying RAM
integrity in some critical areas, the stack level, and other
occurrences). When the processor is stuck outside the main loop the
watchdog should detect it (by checking where the CPU was when it woke
to run the watchdog timer handler). Failure to get in and out of LPM3
should also be checked, using some dynamic information that changes
on every sleep/wake.
When you cover all the possibilities your system is truly robust
and can recover from almost any failure state.
An external, noise-proof watchdog would give the final touch,
covering cases where the MSP430 cannot service a watchdog timer
interrupt.
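
A minimal sketch of the "dynamic information" check, written in C only to show the logic (the same idea carries over to assembly). It assumes the 1Hz interrupt wakes the main loop on every tick, so a healthy main loop bumps a counter at least once per second. The names `liveness_t`, `main_loop_alive`, `tick_check` and the 5-second limit are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define STALL_LIMIT_S 5u   /* hypothetical: seconds without progress tolerated */

typedef struct {
    volatile uint16_t heartbeat;  /* bumped by the main loop on every pass */
    uint16_t last_seen;           /* value observed at the previous 1Hz tick */
    uint8_t  stalled_s;           /* consecutive ticks with no change */
} liveness_t;

/* Call from the main loop each time it wakes and completes a pass. */
void main_loop_alive(liveness_t *lv)
{
    lv->heartbeat++;
}

/* Call from the 1Hz interrupt. Returns true when the main loop has made
 * no progress for STALL_LIMIT_S consecutive ticks, i.e. when the handler
 * should log the event and force a reset. */
bool tick_check(liveness_t *lv)
{
    uint16_t now = lv->heartbeat;
    if (now != lv->last_seen) {
        lv->last_seen = now;
        lv->stalled_s = 0;
        return false;
    }
    if (lv->stalled_s < STALL_LIMIT_S) {
        lv->stalled_s++;
    }
    return lv->stalled_s >= STALL_LIMIT_S;
}
```

This catches the case described in the original post, where interrupt code keeps running but the foreground thread never comes out of LPM3 or spins in a loop that does not pass through the heartbeat.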
Nice to know you made it in assembly. I think it is easier to make a
fail proof design when using assembly instead of C.
best regards,
-Augusto

Reply by Hugh Molesworth ● November 19, 2009

Looks like you have some useful changes to look at, and having
handlers for every possible interrupt (enabled or not) is really
mandatory. I expect you know you can't just use a generic trap
interrupt for all unused interrupts, since some interrupts must be
specifically cleared in software otherwise they will continue to
interrupt on exit from the handler and will therefore lock up the MSP
(thrashing). If you have a spare i/o pin it is often useful to toggle
this pin within all interrupts while testing to see if when a hang-up
occurs it is indeed caused by an incorrectly handled interrupt.

Not using a true watchdog is no longer acceptable in a reliable
design; either an external watchdog reset chip or the internal
watchdog are really mandatory. At the minimum, try to free up the
internal watchdog timer and use it as a true watchdog function.

Be very aware of PUC/POR parameters. If V(min) is not met on a
brown-out (or low spike in Vcc) the cpu will hang forever. This is a
frequent problem in most MSP430F1xx designs and generally can only be
avoided by an external reset chip. A low-going spike can easily drop
below the cpu core operating voltage but fail to get below the
necessary 200mV to ensure proper reset. Also note Vpor is temperature
dependent. You can often demonstrate this problem by simply removing
and replacing the battery and waiting some variable time between the
two until you get the lock up. Once locked up in this mode you cannot
clear the lock-up without 1)external watchdog chip or 2) auto crowbar
circuit or 3) forcing Vcc below V(min).

Use Comparator_A to monitor Vcc and force an interrupt to generate a
watchdog or access fault if it drops too low. Consider (in future)
using a device with a proper SVS.

Reserve an area of ram that is not cleared on startup; use it to
maintain reset counters.
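
A rough sketch of such a counter in C, to illustrate the idea. How the structure is placed in a RAM region that the startup code leaves alone is toolchain-specific and not shown; the marker value and the names `noinit_t` and `noinit_on_reset` are hypothetical.

```c
#include <stdint.h>

#define NOINIT_MAGIC 0xA55Au  /* hypothetical marker value */

typedef struct {
    uint16_t magic;        /* NOINIT_MAGIC when the contents are trusted */
    uint16_t reset_count;  /* warm resets seen since the last cold start */
    uint16_t check;        /* magic XOR reset_count, guards against garbage */
} noinit_t;

/* Call once at startup, before the area is used for anything else.
 * Returns the number of warm resets counted so far (0 after a cold start
 * or when the contents fail the consistency check). */
uint16_t noinit_on_reset(noinit_t *n)
{
    if (n->magic == NOINIT_MAGIC &&
        n->check == (uint16_t)(n->magic ^ n->reset_count)) {
        n->reset_count++;            /* RAM survived: warm reset */
    } else {
        n->magic = NOINIT_MAGIC;     /* RAM lost or corrupt: start over */
        n->reset_count = 0;
    }
    n->check = (uint16_t)(n->magic ^ n->reset_count);
    return n->reset_count;
}
```

The check word matters because RAM contents after a genuine power loss are arbitrary, and a lone magic value could match by accident.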

Consider refreshing function registers (at infrequent intervals);
writing a register once then never again for the many years that the
cpu is powered is placing undue faith in the static memory (latch) of
that register not to lose that setting, and this is impacted by
ambient temperature and pressure, local magnetic and electric fields
and incident radiation both from outside and from within the chip.
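
One way to organise the refresh is a table of register addresses and the values they should hold, rewritten at a slow interval. The sketch below is in C and uses hypothetical names (`reg_setting_t`, `refresh_registers`); it is only suitable for static configuration registers whose read or write has no side effects.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    volatile uint8_t *reg;  /* address of the register */
    uint8_t value;          /* value it should always hold */
} reg_setting_t;

/* Rewrite every entry in the table. Returns how many registers were
 * found holding something other than the expected value, which is
 * worth counting as a diagnostic. */
size_t refresh_registers(const reg_setting_t *table, size_t count)
{
    size_t drifted = 0;
    for (size_t i = 0; i < count; i++) {
        if (*table[i].reg != table[i].value) {
            drifted++;
        }
        *table[i].reg = table[i].value;
    }
    return drifted;
}
```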

Trap NMI, oscillator fault and flash access violation faults and
ensure some kind of reset; increment counters in RAM to track this (not flash).

Consider migrating to a MSP430F2xxx part; they have much-improved
brown-out and reset handling (and they are often cheaper!).

Meantime, how are you handling the UART overruns? If you post code
for that area (or email it to me directly), I will have a look.
Receiver overruns are common when a device such as a GSM modem powers
up, since they often generate undefined outputs or break conditions
on power-up. Some generate total garbage, and are known to cause
problems with attached devices. The software should be designed to
handle such totally random (and lengthy) spurious transmissions.

Hugh


Reply by Mike Stovall ● November 19, 2009

FYI
TI Appnote
SLYA014A - May 2000


Reply by Joe Radomski ● November 20, 2009

first off enable the watchdog.. any failsafe device should have a watchdog running.. you can add an external device or repurpose one of your timers to do double duty and then use the wd for its intended use...

a suggestion is to make a timer based on the 32khz clock so that you can use it for RTC and also as a system timer.. I do this frequently.. depending on the responsiveness I need I either set a 1ms or 10 ms resolution.. then just use a variable to track "ticks" for other events
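
a rough sketch of the tick variable idea in C (the names `g_ticks`, `timer_tick_isr` and `ticks_elapsed` are made up for illustration, and the timer setup itself is not shown).. the unsigned subtraction keeps the comparison correct when the 16-bit counter wraps

```c
#include <stdint.h>
#include <stdbool.h>

volatile uint16_t g_ticks;  /* incremented by the periodic timer interrupt */

/* Body of the periodic timer interrupt (1 ms or 10 ms period). */
void timer_tick_isr(void)
{
    g_ticks++;
}

/* True once at least `timeout` ticks have passed since `start`.
 * Works across counter wrap-around as long as timeout < 65536. */
bool ticks_elapsed(uint16_t start, uint16_t timeout)
{
    return (uint16_t)(g_ticks - start) >= timeout;
}
```

usage is to save `g_ticks` in a local when an event starts and poll `ticks_elapsed` with that saved value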

upgrade to a 2xx part, they are much better with marginal power.. a 1xx device really demands an external voltage monitor, the 2xx parts do this reliably when enabled.. they are also cheaper! the money that you spend on a redesign should be recouped pretty quickly..


Reply by Joerg ● November 20, 2009

Your line breaks don't work, somehow. So let me answer on top:

Good pointers from the other posters. If all that still fails let me
give some suggestions.

You said that "it is believed" the 1.8V drop is no issue anymore. Are
you 100% sure? Such a drop is not normal, sounds like some regulator
going out of whack. Low-dropout (LDO) versions are known to be finicky.

Your statement "... occurs around 20 seconds after turning on the GSM
module and coincides with the GSM network TX and RX activity ..." Unless
your supply is collapsing under the GSM transmit stage demand that
sounds like RF interference. GSM is pretty hard on circuitry, especially
during the negotiating sequence with the next cell tower. Essentially
anything that has a forward-biased diode path or base-emitter junction
in there is suspect. Since the MSP430 is mostly CMOS (as far as I know)
you might want to take a look at circuits around it, including any
regulators.

--
Regards, Joerg

http://www.analogconsultants.com/

Reply by rwil...@technolog.com ● November 23, 2009

Many thanks to everyone who has responded, I am very grateful for all the useful advice dispensed.

To respond specifically to the question that was raised on the Vcc dips and the steps that have been taken to eliminate them;

Vcc is regulated by a Maxim MAX8880 adjustable LDO Linear Regulator. The o/p is adjusted to the desired value by 2 feedback resistors which divide the o/p voltage to a FB pin. It provides power for the MSP and associated IO that interfaces with the GSM module. The GSM module takes power directly from the battery and has its own internal regulation down to the same voltage as the MSP.

We had seen some Vcc pick-up in the first version of the product, which was eventually solved by decoupling the resistor in the ‘top leg’ of the voltage divider with a 47pF capacitor – so we knew the FB circuit of the regulator was prone to picking up the GSM TX burst. This helped us identify a solution when the pick-up returned on the current product version (which has a much stronger antenna).

Using various combinations of decoupling capacitors had no effect, so we decreased the impedance of the FB loop by dropping the values of the FB resistors.
Previous values were 510k (top leg) and 390k (bottom leg). These were dropped to 130k and 100k respectively.
This helped enormously, and removing the 47pF from across the top leg resistor finished the job.

We cannot now record any dips on Vcc, whereas before we were seeing massive drops in Vcc corresponding directly in timing and duration to the GSM TX burst.
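
In rough numbers, the change works out as follows. This is a back-of-envelope sketch that treats the regulator output as a low-impedance source, so that the FB node sees the two legs in parallel; the function names are for illustration only. The node resistance falls from about 221k to about 56.5k, the divider current at 2.8V rises from about 3.1uA to about 12.2uA, and the divider ratio is almost unchanged (0.433 before, 0.435 after).

```c
/* Resistance seen at the FB node in kilohms: top and bottom legs in
 * parallel, assuming the regulator output is a low-impedance source. */
double fb_node_kohm(double top_k, double bottom_k)
{
    return (top_k * bottom_k) / (top_k + bottom_k);
}

/* Fraction of the output voltage that appears at the FB pin. */
double fb_ratio(double top_k, double bottom_k)
{
    return bottom_k / (top_k + bottom_k);
}

/* Current through the divider in microamps for a given output voltage.
 * Volts divided by kilohms gives milliamps, hence the factor of 1000. */
double divider_ua(double vout, double top_k, double bottom_k)
{
    return vout / (top_k + bottom_k) * 1000.0;
}
```

For example, `fb_node_kohm(510.0, 390.0)` gives the old node resistance and `fb_node_kohm(130.0, 100.0)` the new one.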

Regarding the responses that suggest modifications to the hardware design, unfortunately it is very difficult to make changes, partly because the product is largely encapsulated, partly because it is subject to approvals and would require expensive re-certification. Firmware changes are the only realistic option for existing product.

We may be able to look at moving to the 2xx series for new product as I understand it is a pin-for-pin drop in replacement (some SFR changes aside). This has been looked at before purely from a cost benefit perspective but was rejected due to the firmware modifications that were required and that our existing production line programming software isn’t compatible with the 2xx family. There may be more factors in favour of changing now.

Using the watchdog for its intended purpose is also a possibility as I believe the work currently being done by both timers can be achieved with just timer B and some extra instructions, it will just be less elegant and slightly less efficient – a small price to pay for greater system robustness. This will free timer A for RTC duty and free the watchdog timer for watchdog duty.

A couple of posters mentioned inherent unreliability of the 1xx series on marginal power and in the POR circuit. Is there any official TI documentation regarding this issue? I would really like to understand it better.

The issue of the GSM module outputting garbage on the serial port during start up is an interesting one. Is it true to say that garbage received by the MSP can directly lead to overrun errors? With the baud rate generator set to a specific rate, is the UART actually capable of generating interrupts with what it believes to be valid serial characters faster than it normally would, thereby leading to overrun errors? I only see overruns, not framing errors.
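
For scale, here is the timing budget as a small C sketch. It assumes 8N1 framing (10 bits per character), which is an assumption made for the arithmetic, and the function names are illustrative. At 2400bps one character takes about 4.17ms, which is about 5000 cycles of a 1.2MHz DCO, so an overrun would need the receive buffer to go unread for roughly that long.

```c
#include <stdint.h>

/* Duration of one character frame in microseconds (integer division). */
uint32_t char_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    return (bits_per_frame * 1000000u) / baud;
}

/* CPU cycles that elapse while one back-to-back character is received. */
uint32_t cycles_per_char(uint32_t cpu_hz, uint32_t baud, uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)cpu_hz * bits_per_frame) / baud);
}
```

With the values above, `char_time_us(2400, 10)` and `cycles_per_char(1200000, 2400, 10)` give the figures quoted.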

Reply by Mike Stovall ● November 23, 2009

Please see this latch-up application note from TI from 2000. Sorry about the
font size. I had the comparator circuit causing latch-up in an MSP430F437.
http://focus.ti.com/lit/an/slya014a/slya014a.pdf

----- Original Message -----
From:
To:
Sent: Monday, November 23, 2009 7:58 AM
Subject: [msp430] Re: MSP430F1481 hang ups
> Many thanks to everyone who has responded, I am very grateful for all the
> useful advice dispensed.
>
> To respond specifically to the question that was raised on the Vcc dips
> and the steps that have been taken to eliminate them;
>
> Vcc is regulated by a Maxim MAX8880 adjustable LDO Linear Regulator. The
> o/p is adjusted to the desired value by 2 feedback resistors which divide
> the o/p voltage to a FB pin. It provides power for the MSP and associated
> IO that interfaces with the GSM module. The GSM module takes power
> directly from the battery and has its own internal regulation down to the
> same voltage as the MSP.
>
> We had seen some Vcc pick-up in the first version of the product, which
> was eventually solved by decoupling the resistor in the ~top legT of
> the voltage divider with a 47pF capacitor " so we knew the FB circuit of
> the regulator was prone to picking up the GSM TX burst. This helped us
> identify a solution when the pick-up returned on the current product
> version (which has a much stronger antenna).
>
> Using various combinations of decoupling capacitors had no effect, so we
> decreased the impedance of the FB loop by dropping the resistor values if
> the FB resistors.
> Previous values were 510k (top leg) and 390k (bottom leg). These were
> dropped to 130k and 100k respectively.
> This helped enormously, and removing the 47pF from across the top leg
> resistor finished the job.
>
> We cannot now record any dips on Vcc, whereas before we were seeing
> massive drops in Vcc corresponding directly in timing and duration to the
> GSM TX burst.
>
> Regarding the responses that suggest modifications to the hardware design,
> unfortunately it is very difficult to make changes, partly because the
> product is largely encapsulated, partly because it is subject to approvals
> and would require expensive re-certification. Firmware changes are the
> only realistic option for existing product.
>
> We may be able to look at moving to the 2xx series for new product as I
> understand it is a pin-for-pin drop in replacement (some SFR changes
> aside). This has been looked at before purely from a cost benefit
> perspective but was rejected due to the firmware modifications that were
> required and that our existing production line programming software
> isn't compatible with the 2xx family. There may be more factors in
> favour of changing now.
>
> Using the watchdog for its intended purpose is also a possibility as I
> believe the work currently being done by both timers can be achieved with
> just timer B and some extra instructions, it will just be less elegant and
> slightly less efficient - a small price to pay for greater system
> robustness. This will free timer A for RTC duty and free the watchdog
> timer for watchdog duty.
>
> A couple of posters mentioned inherent unreliability of the 1xx series on
> marginal power and in the POR circuit. Is there any official TI
> documentation regarding this issue? I would really like to understand it
> better.
>
> The issue of the GSM module outputting garbage on the serial port during
> start up is an interesting one. Is it true to say that garbage received
> by the MSP can directly lead to overrun errors? With the baud rate
> generator set to a specific rate, is the UART actually capable of
> generating interrupts with what it believes to be valid serial characters
> faster than it normally would, thereby leading to overrun errors? I only
> see overruns, not framing errors.
>
>

Reply by Hugh Molesworth ● November 23, 2009

You have an interesting problem, since I feel it
is so typical of instruments out in the field
which lack some detailed foresight, or
knowledge-based experience, in the design phase
followed by a testing phase which does not
identify the nascent problem areas. I mean no
offence by that remark, and often simple budget
constraints are the root cause. You need an
engineer who has suffered many failures over the
years to know how to avoid them :-)

The MAX8880 LDO has a poor power supply rejection
ratio at higher frequencies, such as those
encountered when GSM modems are driven from the
same battery. To help mitigate this, ensure that
the LDO output capacitors have extremely low
values of ESR. Not much help for units in the field, unfortunately.

So back to firmware solutions. The problem looks
like a cpu stall of some sort, but interestingly
interrupts must still be running since you report
that you can see results of overrun counters
incrementing as a result of the receiver
overruns. Were this a simple reset lock-up issue
due to brown-outs (known to be a problem on the
1xx series), then interrupts would not be able
to increment these counters since the cpu core
would have halted completely. As you surmise, it
is unlikely that GSM-modem transmitted garbage is
the cause of the overruns, since then one would
expect framing errors as well, which you report
that you don't see. Note here that since we have
not had sight of any code, we assume that you are
correctly handling both receiver overruns and
received frame errors, and often in code I have
reviewed this is not the case :-)

So we have a cpu still running, but apparently
stalled in a foreground task (not an interrupt,
or at least not a low-priority interrupt with GIE
reset) but not stopping interrupts. Either the
stall is intermittent with interrupts temporarily
disabled by code so that the receiver interrupt
cannot get access to empty the receiver
registers, or a higher-priority interrupt is
thrashing such that again the receiver interrupt
cannot get access; a thrashing lower-priority
interrupt would not cause this issue since it
would not block the receiver interrupt. Now
USART0 - which you indicated is the source of the
problem with overruns - can only be properly
eclipsed on the F1481 by the watchdog timer,
comparator A, Timer B7, NMI group (NMI, Flash,
Oscillator fault) or various forms of reset. So, which of these are enabled?

Now we have two options, foreground stall with
GIE reset or a high-priority interrupt thrashing.
There is a third option, where the cpu is stalled
with GIE set so that the USART0 interrupts are
running, but the code fails when the foreground
code does not process received data buffers such
that they fill and the receiver interrupt then
(erroneously) allows a receiver overrun condition
to occur because it incorrectly stops emptying
the USART Rx register. Only an inspection of your
code will reveal that :-) Actually of course
there is a fourth option as well, the RTC simply
never brings the cpu core out of sleep.
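
To illustrate that third option, here is a minimal sketch in C (your code is in assembly, so treat this as pseudocode for the logic only). All names and the buffer size are hypothetical. The point is that the receive interrupt must read the USART Rx register on every entry, even when the software buffer is full, and count the dropped byte rather than leave it in the hardware to be overrun. The `rxbuf_value` parameter stands in for that register read.

```c
#include <stdint.h>

#define RX_BUF_SIZE 8u   /* hypothetical size; must be a power of two */

volatile uint8_t  rx_buf[RX_BUF_SIZE];
volatile uint8_t  rx_head;      /* advanced only by the ISR */
volatile uint8_t  rx_tail;      /* advanced only by the foreground */
volatile uint16_t rx_dropped;   /* bytes discarded because the buffer was full */

/* Body of the receive interrupt. The caller has already read the
 * hardware receive register and passes the byte in, so the register is
 * emptied whether or not there is room to store the byte. */
void uart_rx_isr(uint8_t rxbuf_value)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == rx_tail) {
        rx_dropped++;           /* full: drop the byte and record the loss */
        return;
    }
    rx_buf[rx_head] = rxbuf_value;
    rx_head = next;
}

/* Foreground side: returns 1 and stores a byte in *out, or 0 if empty. */
int uart_rx_get(uint8_t *out)
{
    if (rx_tail == rx_head) {
        return 0;
    }
    *out = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return 1;
}
```

With this shape, a stalled foreground shows up in the event log as a growing `rx_dropped` count rather than as a hardware overrun, which helps separate "foreground not draining the buffer" from "interrupt not being serviced in time".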

So what do I mean by a "stall"? This is the point
where you look at all pending conditionals that
wait for a flag or a hardware bit or some such
condition that you know "must" happen in a short
period, and the answer is often that it simply
doesn't. To protect against this, and ensure much
more robust code, use time-outs on all pending
bit tests or acks from GSM modems or acks from
firmware modules such that when a bit or flag or
condition doesn't happen the ensuing timeout will
properly process the error, rather than just
hanging and freezing the instrument while still
recording things like GSM modem serial port
overruns. Is it a pain to code all these run-time
tests? Sure, but look what pain not coding them
can cause. As for possibly never waking up
on the watchdog timer, put a fall-back wakeup in
timers A and/or B as protection, and log
occurrence counts of this error. Can't do that
'cos timers are off? Drive a capacitor on a port
pin and use the charge or discharge to generate a port interrupt as a backup.
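
A rough sketch of such a time-out wrapper, again in C for readability and assuming a free-running tick counter that some periodic interrupt increments. The names `g_ticks`, `wait_for` and `g_wait_timeouts` are made up for illustration. The two `modem_...` functions are host-side stand-ins that advance the tick themselves so the logic can be exercised off-target; on the real part they would simply test a flag or port bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick counter, incremented by a periodic timer interrupt. */
volatile uint16_t g_ticks;

/* Occurrence count of expired waits, intended for the event log. */
uint16_t g_wait_timeouts;

typedef bool (*condition_fn)(void);
typedef enum { WAIT_OK, WAIT_TIMEOUT } wait_result;

/* Poll 'ready' until it returns true or 'timeout_ticks' have elapsed. */
wait_result wait_for(condition_fn ready, uint16_t timeout_ticks)
{
    uint16_t start = g_ticks;
    while (!ready()) {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        if ((uint16_t)(g_ticks - start) >= timeout_ticks) {
            g_wait_timeouts++;
            return WAIT_TIMEOUT;   /* caller handles the error path */
        }
        /* On the target the CPU could sleep here until the next interrupt. */
    }
    return WAIT_OK;
}

/* Host-side stand-ins: each poll advances the tick as the timer would. */
static unsigned polls;
bool modem_never_acks(void)         { g_ticks++; return false; }
bool modem_acks_on_third_poll(void) { g_ticks++; return ++polls >= 3u; }
```

Every wait on the modem or on a hardware bit then becomes something like `if (wait_for(modem_never_acks, 10) == WAIT_TIMEOUT) { /* log and recover */ }` instead of an open-ended loop.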

So why would a bit or whatever not do the
expected thing? Here are some known hardware bugs
in the F1481, and let's start with the bug that
can "lose" a hardware handshake from a GSM modem
(are you using either P1 or P2 to handshake with
the modem or anything else in your design? :-)
What happens when the handshake doesn't? ...)

PORT3 - Bug description:
Module: PORT1/2, Function: Port interrupts can get lost
Port interrupts can get lost if they occur during
CPU access of the P1IFG and P2IFG registers.
Workaround:
None

Here are some UART bugs on the MSP430F1xxx parts
(including the F1481 which you are using),
including "Unpredictable program execution"!!:

US13 - Bug description:
Module: USART0, USART1, Function: Unpredictable program execution
USART interrupts requested by URXS can result in
unpredictable program execution if this request
is not served within two bit times of the received data.
Workaround:
Ensure that the interrupt service routine is
entered within two bit times of the received data.

US15 - Bug description:
Module: USART0, USART1, Function: UART receive with two stop bits
USART hardware does not detect a missing second
stop bit when SPB = 1. The Framing Error Flag
(FE) will not be set under this condition and
erroneous data reception may occur.
Workaround:
None (Configure USART for a single stop bit, SPB = 0)
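
To put a number on the US13 "two bit times" window for this setup, here is a small worked calculation in C. It uses the figures from the original post (2400 bps, DCO at roughly 1.2MHz) and assumes 10 bits per character (1 start, 8 data, 1 stop), which has not been confirmed. The function name is made up for illustration.

```c
#include <stdint.h>

/* CPU cycles that elapse during 'bits' bit times at the given baud rate. */
uint32_t cycles_in_bit_times(uint32_t mclk_hz, uint32_t baud, uint32_t bits)
{
    return (uint32_t)(((uint64_t)mclk_hz * bits) / baud);
}

/* With the figures from the original post:
 *   cycles_in_bit_times(1200000, 2400, 2)  == 1000  (US13 window, about 833us)
 *   cycles_in_bit_times(1200000, 2400, 10) == 5000  (one character, about 4.17ms)
 */
```

So any stretch of code that runs with GIE cleared, or any higher-priority interrupt routine, that lasts more than about 1000 cycles puts you outside the US13 workaround, and anything approaching 5000 cycles risks an overrun outright. The DCO frequency is only approximate, so these are ballpark limits.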

Hugh

Reply by Joerg ● November 23, 2009

r...@technolog.com wrote:
> Many thanks to everyone who has responded, I am very grateful for all the useful advice dispensed.
>
> To respond specifically to the question that was raised on the Vcc dips and the steps that have been taken to eliminate them;
>
> Vcc is regulated by a Maxim MAX8880 adjustable LDO Linear Regulator. The o/p is adjusted to the desired value by 2 feedback resistors which divide the o/p voltage to a FB pin. It provides power for the MSP and associated IO that interfaces with the GSM module. The GSM module takes power directly from the battery and has its own internal regulation down to the same voltage as the MSP.
>

Careful with that MAX8880. Digikey shows no-stock which does not
surprise me.

Personally I stay away from LDO regulators because many exhibit
stability issues which are often scantily documented. Anyhow, with GSM
interference, much of the solution is in the layout. If you ever re-do
that, have an RF expert look it over. That should also include reviewing
interconnects, anything that has a length of more than a few tenths of
an inch.

[...]

--
Regards, Joerg

http://www.analogconsultants.com/
