EmbeddedRelated.com

AVR ATmega644 mysterious reset?

Started by Unknown May 25, 2007
Hello All,

I'm having a hard time with a reset of my application. The 
ATmega644 is used for an RFID application and handles communication + 
protocol and all RFID operations including an anticollision scheme. The 
anticollision is implemented via recursive calls.

The application is written in CodeVisionAVR C compiler vers. 1.25.2.

I'm debugging in AVRstudio vers. 4.12 build 460 using JTAG mkII ICE.

The problem is an uncontrolled reset of the ATmega644. The reset always 
occurs when I load the anticollision to a maximum by making the RFID 
reader handle way too many tags in the read area.

Now the mysterious part is:
* I have removed all watchdog enabling/resetting to eliminate the possibility 
of a watchdog reset. I have verified that the watchdog registers are not 
written during execution.

* I have disabled the "Brown-out enable" fuse.

* I verified that the application does not by accident jump into the undefined 
part of the program memory and just run until it wraps around to 0x0000 (Reset 
Vector). Furthermore, all volatile memory is cleared when the problem 
occurs, which tells me that my problem is a reset for sure!

* I have added a "reset source check" that tests the lowest five bits of the 
MCUSR register, which describe the cause of the last reset. I'm breaking 
program execution at 0x0000 so I can check this register first thing 
after a reset. But when my problem occurs the register value is always 
0x00. When resetting the device in other ways the register indicates the right 
reason, e.g. JTAG reset or External Reset.

One possibility could be a stack overflow caused by the recursion - but that 
should not generate a uC reset?

I have tried getting closer to the bug using EEPROM debug variables and so on, 
but the anticollision algorithm is very timing-critical, so this 
approach just corrupted the application.

Any ideas on how I can get closer to solving this annoying bug? Or does anyone 
have an idea why I see this problem?

Best Regards

--
Morten M. J.
Ba.Sci.EE

(this is also posted on avrfreaks.net) 


> A possibility could be a stack overflow caused by the recursion - but
> that should not generate a uC reset?

A stack overflow will do the strangest things...
On May 25, 10:11 am, "Morten M Jørgensen" <n...@fake.mail.com> wrote:
> I verified that the application does not by accident jump into the undefined part
> of the program memory and just run until it wraps around to 0x0000 (Reset
> Vector). Furthermore, all volatile memory is cleared when the problem
> occurs, which tells me that my problem is a reset for sure!
Nope, that just means that you have probably run a major portion of your startup code again. If you started from the 2nd, 3rd, or 4th instruction of your startup code, would the observable results be any different from a full reset? Startup code usually does other basic setup (such as setting the stack position) before clearing memory. Since much of that has already been done, you wouldn't notice if it had been skipped.
> A possibility could be a stack overflow caused by the recursion - but that
> should not generate a uC reset?
No, but it could still jump to your start location. All that has to happen is that the return address on the stack gets overwritten with the address of the reset vector; then on return you get something similar to a reset, missing only the hardware side effects.
> Any ideas on how I can get closer to solving this annoying bug? Or does anyone
> have an idea why I see this problem?
Limit the number of tags you'll process. Sneak up on the number that starts causing a problem. It may be easier to diagnose with a minimal case.

Do NOT ignore odd behaviour at quantities below that required to cause the failure. It may be an early sign of the root cause, and since you may still have a partially operating system it might be easier to diagnose.

And take a good look at whatever memory usage you have on a per-tag basis. If you are using dynamic memory allocation, particularly something from the *alloc family, there is a good chance the heap and stack are colliding, and if you are then you should probably switch to something more robust.

Robert
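
One common way to take that "good look" at memory usage is stack painting: fill the unused RAM with a known pattern at startup and later count how much of the pattern survives. This is a different technique from anything named in the thread, shown only as a minimal sketch in portable C. The names `paint_region`, `untouched_bytes` and the pattern value are hypothetical, and the region is passed in as a plain buffer; on a real target it would be the RAM between the end of data/heap and the stack pointer, and how to obtain those addresses is compiler-specific.

```c
#include <stddef.h>

#define STACK_PAINT 0xA5u  /* arbitrary fill pattern (assumption) */

/* Fill a region with the paint pattern. */
void paint_region(unsigned char *base, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        base[i] = STACK_PAINT;
}

/* Count untouched bytes starting at the low end of the region.
   The AVR stack grows downward, so the low end is used last. */
size_t untouched_bytes(const unsigned char *base, size_t len)
{
    size_t n = 0;
    while (n < len && base[n] == STACK_PAINT)
        n++;
    return n;
}
```

If the untouched count shrinks toward zero as more tags are added, the stack is approaching the data area. A byte that happens to be overwritten with the pattern value itself would be miscounted, so the result is an estimate.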
On 25/05/2007 Morten M Jørgensen wrote:

> I have added a "reset source check" that tests the lowest five bits
> of the MCUSR register, which describe the cause of the last reset.
> I'm breaking program execution at 0x0000 so I can check this
> register first thing after a reset. But when my problem occurs the
> register value is always 0x00. When resetting the device in other ways
> the register indicates the right reason, e.g. JTAG reset or External
> Reset.
The source of a reset can be found in the MCUSR register. Read it on startup and then clear it. If you are not getting a 'real' reset but just a jump to the reset vector, the register will remain cleared.

-- 
John B
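
The read-then-clear check can be sketched as below. This is a host-portable illustration, not target code: the function takes a saved copy of MCUSR as a plain byte, and the names `classify_reset` and `enum reset_cause` are hypothetical. The bit positions are those of the ATmega644 MCUSR register (PORF, EXTRF, BORF, WDRF, JTRF in bits 0 to 4). On the target, the first thing after startup would be to copy MCUSR into a variable and then write 0 to MCUSR, so that a later jump to 0x0000 shows up as "no flag set".

```c
/* Bit positions of the reset flags in MCUSR on the ATmega644. */
#define PORF_BIT  0  /* power-on reset  */
#define EXTRF_BIT 1  /* external reset  */
#define BORF_BIT  2  /* brown-out reset */
#define WDRF_BIT  3  /* watchdog reset  */
#define JTRF_BIT  4  /* JTAG reset      */

enum reset_cause {
    RESET_NONE,      /* no flag set: likely a software jump to 0x0000 */
    RESET_POWER_ON,
    RESET_EXTERNAL,
    RESET_BROWN_OUT,
    RESET_WATCHDOG,
    RESET_JTAG
};

/* Classify a saved copy of MCUSR. Several flags can be set at once
   (for example after power-up); the first match in this order wins. */
enum reset_cause classify_reset(unsigned char mcusr)
{
    if (mcusr & (1u << PORF_BIT))  return RESET_POWER_ON;
    if (mcusr & (1u << EXTRF_BIT)) return RESET_EXTERNAL;
    if (mcusr & (1u << BORF_BIT))  return RESET_BROWN_OUT;
    if (mcusr & (1u << WDRF_BIT))  return RESET_WATCHDOG;
    if (mcusr & (1u << JTRF_BIT))  return RESET_JTAG;
    return RESET_NONE;
}
```

A result of `RESET_NONE` matches the 0x00 reading described in the original post and points toward a jump to the reset vector rather than a hardware reset.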
"Morten M Jørgensen" <neax@fake.mail.com> wrote in message
news:4656ee6d$0$199$edfadb0f@dread11.news.tele.dk...
> I verified that the application does not by accident jump into the undefined part
> of the program memory and just run until it wraps around to 0x0000 (Reset
> Vector). Furthermore, all volatile memory is cleared when the problem
> occurs, which tells me that my problem is a reset for sure!
Nope.
> * I have added a "reset source check" that tests the lowest five bits of the
> MCUSR register, which describe the cause of the last reset. I'm breaking
> program execution at 0x0000 so I can check this register first thing
> after a reset. But when my problem occurs the register value is always
> 0x00. When resetting the device in other ways the register indicates the right
> reason, e.g. JTAG reset or External Reset.
>
> A possibility could be a stack overflow caused by the recursion - but that
> should not generate a uC reset?
Nope, but it can cause a return to 0000. Try to imagine this:

You run a recursive routine that, at some point, tests something and keeps calling itself until some result variable yields 0. Eventually your stack, which also contains return addresses, grows into the data area. Then your routine decides the test result is 0 and stores this value... on the stack. You have just overwritten your return address with 0; the routine ends and a return is executed to... 0. Bingo! It appears as if your processor resets, but none of the Reset Source bits is set.

Try to figure out the stack needs of your routine, implement a recursion iteration counter as a global variable and test/check/display this variable at reset, before the RAM gets zeroed. This should give you pretty sure evidence of whether and when your stack overflows.

You should ALWAYS limit the number of iterations of a recursive process.

Meindert
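
The counter and the depth limit can be sketched together as follows. This is a minimal illustration in portable C, not the poster's anticollision code: `resolve` is a hypothetical stand-in that simply splits a group of colliding tags in half, and `MAX_DEPTH` is an assumed value that would have to be derived from the free RAM and the stack use per call. On the target, the two globals would also have to be kept out of the memory that the startup code clears, which is compiler-specific and not shown here.

```c
#define MAX_DEPTH 8  /* hypothetical limit; derive from free RAM / frame size */

/* Globals so they can be inspected after an unexpected restart. */
unsigned char depth_now;       /* current recursion depth       */
unsigned char depth_max_seen;  /* deepest recursion reached yet */

/* Hypothetical stand-in for one anticollision step: split `tags`
   colliding tags into two groups until each group holds one tag.
   Returns the number of tags resolved, or -1 if the limit was hit. */
int resolve(int tags)
{
    int left, right;

    if (tags <= 1)
        return tags;
    if (depth_now >= MAX_DEPTH)
        return -1;                 /* refuse to recurse any deeper */

    depth_now++;
    if (depth_now > depth_max_seen)
        depth_max_seen = depth_now;

    left = resolve(tags / 2);
    right = (left < 0) ? -1 : resolve(tags - tags / 2);

    depth_now--;
    return (left < 0 || right < 0) ? -1 : left + right;
}
```

With the limit in place, too many tags produce an error return instead of a stack that grows into the data area, and `depth_max_seen` records how close normal operation comes to the limit.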
