EmbeddedRelated.com
The 2024 Embedded Online Conference

Virtual Watchdog Timeout

Started by "dan...@gmail.com [rabbit-semi]" November 6, 2015
I have very annoying Virtual Watchdog Timeout issues in my RCM6700 software, which otherwise works perfectly. Every once in a while I load a build onto an RCM (the same build runs fine, with zero WDTs, on another RCM in the programmer), and it just gives me VWDT errors over and over and crashes. This is very annoying when the RCM is installed in an enclosure that has to be taken apart to get to the card.

I have a crash handler, but it seems to give me the address of the crash handler itself, not the address of the code that is actually responsible. Is there any way to figure out exactly where the software is getting stuck from the address that the runtimeErrorHandler spits out?

My error handler prototype is:

void myRuntimeErrHandler(int error_code, int more_info, int xpc_val, int address);

Any help with this would be greatly appreciated. I'm losing faith in these RCM processors fast, given their inability to report what has actually gone wrong even when you write an error handler to report exactly that!
Daniel,

Can you provide some more information? You need to combine the XPC value and address to get the physical address of where the exception occurred. Look in your MAP file to track down the function that caused the problem. For example, I modified the ErrorHandling/Define_error_handler.c sample to throw exceptions from a function compiled to xmem, and got the following in the Stdio window:

Domain error (run time exception -710) occurred at 003:e2e9

- Execution being passed back to program...

Range error (run time exception -711) occurred at 004:eaa6

- Execution being passed back to program...

And when I look into the MAP file for the sample, I have the following:

0003:e2bf 180 acos \MATH.C 1765



0004:ea8f 138 fmod \MATH.C 2227

And sure enough, when I look at the program I see that those are the functions called to demonstrate exception handling.
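The first step described above, combining the XPC value with the address to get a physical address, is plain arithmetic. A minimal sketch, assuming the address lies in the XPC window (0xE000 to 0xFFFF) and that the MMU adds the segment value shifted left 12 bits and keeps only the low bits of the sum; the function name and the `phys_bits` parameter are hypothetical.

```c
#include <assert.h>

/* Sketch of Rabbit XPC-window address translation. Assumptions:
   - the logical address lies in the XPC window, 0xE000 to 0xFFFF;
   - the MMU adds (xpc << 12) to the logical address and keeps only the
     low phys_bits bits of the sum (20 on a Rabbit 2000/3000; the
     Rabbit 4000 and later have a wider physical address space). */
static unsigned long xpc_to_physical(unsigned xpc, unsigned address,
                                     unsigned phys_bits)
{
    unsigned long mask;

    assert(address >= 0xE000u && address <= 0xFFFFu);
    assert(phys_bits > 0u && phys_bits < 32u);
    mask = (1UL << phys_bits) - 1UL;

    /* Shifting the XPC left 12 bits lines its low 4 bits up with the top
       4 bits of the 16-bit address; the two are added, not concatenated. */
    return (((unsigned long)xpc << 12) + address) & mask;
}
```

With the Stdio output above, 003:e2e9 maps to 0x112E9 and the MAP entry 0003:e2bf for acos maps to 0x112BF, so addresses reported with different XPC values can be compared as plain numbers. In Dynamic C an int is 16 bits, so an address such as 0xE2E9 arrives in the handler as a negative value; mask it with 0xFFFF before converting.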

You might also want to generate an LST file when you compile your program, so you can see exactly what code exists near a given address.

Hope that helps you to narrow down the problem. If you’d like, you can email me off-list with the MAP for your program and a dump of the runtime exceptions being reported, and I’ll help you to track down the failures.

-Tom

Tom, thank you for your reply!

Can you help me understand the relationship between the addresses reported in the exceptions and the MAP file? You had address e2e9, but the MAP file had e2bf as the address of the acos function that was causing the exception. This is the problem I generally run into: there's no obvious relationship between the address that gets reported and anything meaningful in the MAP file...

I have an error handler, and it's reporting virtual watchdog timeouts, but it's VERY hard to figure out where the program's execution really is when it gets stuck.

I did wind up fixing the original problem I posted about. It was related to a network socket pointer that was getting overwritten in a funky way, which made http_handler() get stuck in one of its functions. But identifying that was a painstaking process of walking through the code with breakpoints and watches, and printing debug strings to Stdio. I would love to be able to read my run time exception handler messages and get more useful info out of them.

Thanks
Daniel,

Note that it isn't just that part of the address; it also includes the XPC value (or LXPC, I believe, on the Rabbit 4000 and later). The Rabbit uses segmented memory, so to get the physical address you shift the XPC left 12 bits and add it to the 16-bit address, which overlaps the bottom 4 bits of the XPC with the top 4 bits of the address. So looking at 03:e2e9 (0x3000 + 0xE2E9), you're actually at physical address 0x112E9.

By viewing the MAP file, I determined that the address 03:e2bf is the start of the function acos(). The next entry in the MAP file was at an address after the exception location, so I can be fairly confident that the exception occurred INSIDE the acos() function.

The exception happened at 03:e2e9, 42 bytes (0xE2E9 - 0xE2BF = 0x2A) into the function. If I had configured Dynamic C to generate an LST file for the compiled program, I could look in it and find the exact line of C code that threw the exception. The LST file shows the C code and all of the assembly generated for it.
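The lookup described above (find the last MAP entry that starts at or below the exception address, then take the distance into it) can be sketched in C. This is a minimal sketch, not a Dynamic C tool: the table is hand-copied from the two MAP lines quoted earlier, MAP-file parsing is left out, the struct and function names are hypothetical, and the translation assumes a 20-bit physical address (widen the mask for the Rabbit 4000 and later).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, hand-copied subset of a MAP file: XPC, logical start
   address and function name. A real MAP file has many more entries. */
struct map_entry {
    unsigned xpc;
    unsigned start;
    const char *name;
};

/* (xpc << 12) + address, truncated to 20 bits (Rabbit 2000/3000 assumption). */
static unsigned long phys(unsigned xpc, unsigned address)
{
    return (((unsigned long)xpc << 12) + address) & 0xFFFFFUL;
}

/* Returns the entry with the highest physical start address that is
   <= the exception address, or NULL if the address precedes every entry.
   The table does not need to be sorted. On success, *offset receives the
   number of bytes between the start of that entry and the exception. */
static const struct map_entry *find_function(const struct map_entry *map,
                                             size_t n, unsigned xpc,
                                             unsigned address,
                                             unsigned long *offset)
{
    unsigned long target = phys(xpc, address);
    const struct map_entry *best = NULL;
    unsigned long best_start = 0;
    size_t i;

    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long start = phys(map[i].xpc, map[i].start);
        if (start <= target && (best == NULL || start > best_start)) {
            best = &map[i];
            best_start = start;
        }
    }
    if (best != NULL && offset != NULL)
        *offset = target - best_start;
    return best;
}
```

Fed the two exceptions from the Stdio output, 003:e2e9 resolves to acos at offset 42 (0x2A) and 004:eaa6 to fmod at offset 23 (0x17); those offsets are what you would then look up in the LST file. The nearest preceding symbol is not guaranteed to contain the address (data or unlisted code may sit in between), so checking the next MAP entry, as above, is still worthwhile.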

Hope that helps.

You are correct that it can be very difficult to debug an issue where some value is overwritten, since the program doesn't crash until the code that uses the corrupted value runs.

-Tom
On Nov 19, 2015, at 7:54 AM, d...@gmail.com [rabbit-semi] wrote:
> Can you help me understand the relationship between the addresses being thrown as the exceptions and the MAP file? You had address e2e9, but then the MAP file had e2bf as the address for the acos function that was causing the exception. I find this to generally be the problem I see. No real relationship between the address that gets thrown and anything meaningful to look at in the map file...
