EmbeddedRelated.com

Illegal command reboot fails

Started by Matti Ruusunen October 21, 2003
Hi folks,

I have run into a situation where the compiler produces code that works in one
build and fails in another. The only difference between the two builds is a
string inside a sprintf() call. The symptom shows up elsewhere: the watchdog
reset stops working, i.e. the reboot sequence fails after an illegal command (!).

The difference between the builds is trivial and should not affect the
procedure itself, so this malfunction is something I certainly did not expect.

I cannot think of any reasonable explanation for why the chip fails to reboot
after an illegal command but reboots fine from an external /RESET signal.

The chip is an F149 and I did check the errata sheets (to no avail). The
behavior may be related to the compiler, but I still cannot see any logical
error.

Does anyone have a clue?

I have earlier seen misbehavior in certain situations where the UART1 receive
register gets set when the IE register is written to.

But for this one I don't see any workaround.

Regards,
Matti Ruusunen


Hi,

I cannot give you that straight away. :| I had your idea in mind and decided to
check it out.
I placed an LED output (i.e. the port direction and port output settings) on the
very first line of main(), followed by a loop to slow it down. So if the reboot
gets from the initial memory setup to code execution, there should be a short
LED blink.


What I expected to be NO LED versus ONE LED BLINK turned out to be:

continuous LED blinks versus 2-3 LED blinks.

Hence, the chip _never_ actually booted up cleanly. I am astonished by this
behavior, but at least this POR versus PUC reset is now traceable. :)
Actually, the debugging now looks so trivial that I guess I am not really up to
industry standard, as I was stuck on it for several days. :-(

I will mail you all later about my findings, if any.

Regards,
Matti Ruusunen

> -----Original message-----
> From: Anders Lindgren [mailto:andersl@ande...]
> Sent: October 21, 2003 15:43
> To: Matti Ruusunen
> Subject: Re: [msp430] Illegal command reboot fails
> 
> 
> 
> Hi!
> 
> Could you post some more information about the sprintf problem?
> Without any kind of source code it is difficult to find the cause.
> 
> My guess is that the sprintf is writing to some kind of memory that it
> is not allowed to touch.  This could happen if you have allocated a
> buffer that is too small, or if you use a pointer that points to dead
> memory somehow.
> 
>     -- Anders
> -- 
> Disclaimer: Opinions expressed in this posting are strictly my own and
> not necessarily those of my employer.
> 
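
One way to rule out the too-small-buffer case Anders describes is to format through a bounded call and check the result. The following is only a minimal sketch: `format_value` and its format string are hypothetical, and it assumes the toolchain's C library provides the C99 `snprintf()`, which older embedded libraries may not.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats "value=<n>" into buf and never writes
 * more than size bytes, terminator included.
 * Returns 1 if the whole string fit, 0 if it was truncated or failed. */
int format_value(char *buf, size_t size, int value)
{
    /* snprintf returns the length the full string would have had */
    int n = snprintf(buf, size, "value=%d", value);

    return n >= 0 && (size_t)n < size;
}
```

If the function returns 0, the buffer was too small for that particular string, which is exactly the kind of difference a one-character change in a format string can make.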

--- In msp430@msp4..., "Matti Ruusunen" <matti.ruusunen@i...> wrote:
> I have crashed into situation where compiler produces code that in
> one instance works and in the other does not. The difference between
> the codes is a string inside a sprintf-command. This shows up
> elsewhere in function by that watchdog reset stops working/reboot
> sequence fails in illegal command.(!!!!)

Well, I noticed something very odd in IAR MSP430 C v2.10A: rather
than unloading the stack immediately after a call to sprintf() (which
places sprintf()'s arguments on the stack), this version (unlike
v1.26B) unloads the stack at the end of the function (!).

That means that multiple calls to sprintf() inside a function will
keep "deepening" the stack, only to be unloaded at the end of the
function.

Perhaps this is what's affecting your code?

--Andrew


Matti, my best guess is that you have a lot of
         strcat((char *)txbuf, str);
and possibly in one of them txbuf has not been initialized to a good 
state. You have lots of txbuf[0] but the function standard_low_current() 
has a straight call to strcat((char *)txbuf, str); and if txbuf is not 
initialized correctly, anything can happen.

This sort of random behavior definitely points to a random memory write 
error, which unfortunately is very difficult to nail down. Sorry I can't be
of more help, but I think more info is needed.
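
The strcat() hazard described above can be made visible with a small wrapper that refuses to append when the buffer has no terminator or not enough room. This is only a sketch under assumptions: the name `txbuf` comes from the thread, but its size, its type and the helpers `txbuf_reset` and `txbuf_append` are made up for illustration.

```c
#include <string.h>
#include <stddef.h>

#define TXBUF_SIZE 32                 /* assumed size, not from the thread */
unsigned char txbuf[TXBUF_SIZE];

/* Start a new message: make txbuf a valid empty string. */
void txbuf_reset(void)
{
    txbuf[0] = '\0';
}

/* Append str to txbuf. Returns 1 on success, 0 if txbuf holds no
 * terminator (never initialized) or str does not fit. */
int txbuf_append(const char *str)
{
    const unsigned char *end = memchr(txbuf, '\0', TXBUF_SIZE);
    size_t used, add;

    if (end == NULL)
        return 0;                     /* strcat() would run off the end here */

    used = (size_t)(end - txbuf);
    add = strlen(str);
    if (add >= TXBUF_SIZE - used)
        return 0;                     /* no room for str plus terminator */

    memcpy(txbuf + used, str, add + 1);
    return 1;
}
```

A caller would use txbuf_reset() once per message and txbuf_append() in place of each bare strcat(); a 0 return points at the call that would otherwise have written into foreign memory.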


// richard (This email is for mailing lists. To reach me directly, please 
use richard@rich...) 


Matti,

I am not sure what compiler you are using, but I have found the IAR compiler 
uses a ton of RAM when running sprintf.  When I did not allocate enough 
stack, it started to work in a strange way.  I found the problem in C-spy 
where it created strange strings.  Try increasing the available stack space 
if you can to see if it makes a difference.

Lou
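
Whether the stack is really too small can be checked by "painting" it with a known pattern and later counting how much of the pattern survived. The sketch below shows only the counting logic on an ordinary array; the fill pattern and both function names are assumptions, and finding the real stack region (linker symbols, current stack pointer) is toolchain-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5u            /* arbitrary, assumed fill pattern */

/* Fill a region with the pattern. On a target this would be the part
 * of the stack segment below the current stack pointer. */
void stack_paint(uint16_t *region, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        region[i] = STACK_FILL;
}

/* The MSP430 stack grows downward, so untouched words sit at the low
 * end of the region. Count them from the bottom up. */
size_t stack_unused_words(const uint16_t *region, size_t words)
{
    size_t n = 0;

    while (n < words && region[n] == STACK_FILL)
        n++;
    return n;
}
```

If the count drops to zero after exercising the sprintf() paths, the stack has reached the bottom of its region and increasing it, as Lou suggests, is the obvious next step.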





I was first in the process of describing my findings in detail but then felt
it was unnecessary. Please ignore the earlier mail; it was premature. I meant to
check a few more things first, but now that it has gone to the mailing list,
here are my findings.


I still don't know how this sensitivity to crashing started, and I have not yet
tried whether it can be reproduced in any other way. But for the piece of code
I have, the following happened, and I guess I can only say 'in certain
situations'.


Situation:

There has been a PUC. The chip (MSP430F149) tries to recover from it and begins
to execute code.

The following setups were tried in main(), with these results:


1) If the TBCTL register is not cleared BEFORE the watchdog register (WDTCTL)
is set (i.e. the watchdog is stopped), the chip keeps crashing.

2) If TBCTL is cleared but TBCCTL0 and TBCCTL1 are not cleared BEFORE the
watchdog (WDTCTL) is set, there are double crashes.

3) If TBCTL and TBCCTL0 are cleared but TBCCTL1 is not cleared BEFORE the
watchdog (WDTCTL) is set, there are again double crashes. (Yes, I tried this
one also ;)

4) If TBCTL, TBCCTL0 and TBCCTL1 are all cleared (again before WDTCTL), the PUC
and the recovery from it are successful.

If an IFG of a Timer B register was set before the PUC, that IFG was not
cleared by the PUC. The Timer B interrupts are also not masked automatically by
a PUC. The result was that the interrupt was serviced right after the PUC (I
cannot pinpoint the exact timing).


The workaround for all this is to clear the Timer B registers before setting
WDTCTL. Doing it afterwards crashes the chip.
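
The ordering described here can be sketched as follows. The register names and the WDTPW/WDTHOLD values are the standard MSP430 ones; the function name is hypothetical, and the host-side stand-in variables exist only so the sketch compiles off-target (this assumes the target compiler defines `__MSP430__` and ships `msp430.h`, which is not true of every toolchain).

```c
#include <stdint.h>

#if defined(__MSP430__)
#include <msp430.h>                   /* real register definitions on target */
#else
/* Host-side stand-ins, assumed here only for off-target compilation. */
volatile uint16_t WDTCTL, TBCTL, TBCCTL0, TBCCTL1;
#define WDTPW   0x5A00u
#define WDTHOLD 0x0080u
#endif

/* Hypothetical helper: call it as the very first statement of main(). */
void quiesce_timer_b_then_stop_wdt(void)
{
    /* Per the thread, a PUC leaves these registers as they were,
     * so clear them by hand. */
    TBCTL   = 0;                      /* stops the timer, clears TBIE and TBIFG */
    TBCCTL0 = 0;                      /* clears CCIE and CCIFG of channel 0 */
    TBCCTL1 = 0;                      /* clears CCIE and CCIFG of channel 1 */

    /* Only now touch the watchdog, matching the order that worked. */
    WDTCTL = WDTPW | WDTHOLD;
}
```

Only the registers named in the thread are cleared here; if other Timer B capture/compare channels are in use, their TBCCTLx registers would presumably need the same treatment.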


I don't remember reading about this behavior in the user manual.


Regards,
Matti Ruusunen

Well, I have to continue yesterday's mail (being a bit wiser again now). I
believe this is so close to the truth that there is no reason for me to study it
any further, only to remember it for future cases .. and CERTAINLY AVOID IT. :-)

The fact is: if Timer B was in use before the PUC, it is still fully armed after
the PUC. If its interrupts are enabled, it keeps requesting them after the PUC.
However, at reboot all interrupts are globally masked, so these requests do not
show up in code execution. And if Timer B is taken into use in the reboot
sequence, or at least cleared, things go OK. But if it is, for instance, used
only occasionally and its registers are not cleared at reboot .. then there is
this disaster.

In my case several things combined to make this feature disastrous for my chip.

In the code:

1) The Timer B interrupt vectors are served through function pointers. This
makes code execution quite painless and the code easier to read and understand.
At reboot these pointers are nulled (like the rest of the memory).

2) The code uses a few custom-made library functions. One of them enabled the
interrupts. This went unnoticed, mainly because it was largely intentional.
Later, while debugging, it stayed hidden and led me in the wrong direction.

3) If interrupts are enabled before the function pointers are set, code
execution jumps to an undefined address (0x0000?). This caused the crash.
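
The failure in points 1) to 3) can be guarded against by never letting the dispatch pointer be unusable. The following is a minimal sketch, not the code from the thread: all names are hypothetical, and the toolchain-specific interrupt vector declaration that would call `timer_b_isr_body()` is left out.

```c
#include <stddef.h>

typedef void (*isr_handler_t)(void);

/* Counts interrupts that arrived before a real handler was installed. */
volatile unsigned int spurious_count;

static void default_handler(void)
{
    spurious_count++;
}

/* Zeroed at startup like the rest of RAM, as described in the thread. */
static isr_handler_t timer_b_handler;

/* Example application handler, for illustration only. */
volatile unsigned int tick_count;

void tick_handler(void)
{
    tick_count++;
}

/* Install a handler; passing NULL removes the current one. */
void timer_b_set_handler(isr_handler_t h)
{
    timer_b_handler = h;
}

/* Body of the Timer B interrupt routine. */
void timer_b_isr_body(void)
{
    isr_handler_t h = timer_b_handler;

    if (h == NULL)
        h = default_handler;          /* never jump through a null pointer */
    h();
}
```

With this in place, an early _EINT() followed by a pending Timer B interrupt only bumps spurious_count instead of sending execution to address 0x0000, and a non-zero count after boot is a direct hint that something enabled interrupts too soon.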

Another peculiarity in the code was that there were two _EINT() calls, one in a
library function and one in main(). This is due to my lousy coding style and
the prototype status. :)


Now, back to the beginning. My problem was that a tiny difference of one space
' ' in a sprintf() call, far away from main(), Timer B and everything else,
caused a huge crash whenever there was a PUC. I still cannot explain it, because
while hunting it down I did not touch main() or the initialization routines.

One explanation could be that in the crashing case the compiler stopped
optimizing the first _EINT away.


In any case, in the future I will not expect the IE bits to be masked
automatically but will do that explicitly by hand. :)


Regards,
Matti Ruusunen


