Interpreting SRR1 and OOPS

Started by Bill October 23, 2006
I am getting the OOPS message that follows and have been having a very
difficult time determining what is causing it.  According to "PowerPC
Microprocessor Family: The Programming Environments for 32-Bit
Microprocessors", "When an exception occurs, bits 1-4 and 10-15 of SRR1
are loaded with exception specific information."

SRR1 is 00089032, so bits 1-4 are 0000 and bits 10-15 are 001000.
Unfortunately, I cannot find anywhere what the "exception specific
information" contained in these bits is.
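
For anyone checking the arithmetic above, here is a minimal sketch of pulling a bit field out of SRR1 using PowerPC bit numbering, where bit 0 is the most significant bit. The helper name `ppc_bits` is made up for illustration and does not come from any manual or kernel header.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract bits first..last (inclusive) of a 32-bit register using
 * PowerPC (IBM) bit numbering: bit 0 is the most significant bit and
 * bit 31 is the least significant.
 * ppc_bits is a hypothetical helper name used only in this sketch.
 */
static uint32_t ppc_bits(uint32_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);

    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);

    /* Shift the field down so that bit 'last' lands in the LSB. */
    return (reg >> (31 - last)) & mask;
}
```

With SRR1 = 0x00089032, `ppc_bits(srr1, 1, 4)` gives 0 and `ppc_bits(srr1, 10, 15)` gives 0x08 (binary 001000), so the only bit set in that field is bit 12.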

Any information on this exception or interpreting an OOPS message in
general on PPC would be greatly appreciated.



Eclipse # Machine check in kernel mode.
Caused by SRR0=0xC0005D28
Caused by (from SRR1=89032): Machine check signal
Oops: machine check, sig: 7
NIP: C3095218 XER: 00000000 LR: C30951BC SP: C015E240 REGS: c015e190
TRAP: 0200    Not tainted
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c015c470[0] 'swapper' Last syscall: 120
last math c1db4000 last altivec 00000000
GPR00: 00000000 C015E240 C015C470 C32E6EB8 00001032 000000C6 0000008C
00000000
GPR08: C3110000 C36EF000 C310FA94 C0269600 00000175 1010E944 01FFD000
00000001
GPR16: FFFFFFFF 00000000 00000000 01FF7A0C 00001032 00000002 00000002
C3110000
GPR24: 00000001 C01B0000 C0140000 C0140000 00000002 00000002 00000000
00010000
Call backtrace:
C30951BC C30A81BC C001D25C C001D008 C0006D0C C0005B20 C00071D0
C00071EC C0003948 C01705D8 000035F0
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Please specify what PowerPC processor is involved.
For instance, if it is an MPC603e (or G2) core, then SRR1 bit 12 indicates
"Machine check signal caused exception" for vector 0x200, which is the
exception in your case.

David Gabbay
DoGav Systems

On Mon, 23 Oct 2006 10:31:05 -0700, Bill wrote:

> I am getting the OOPS message that follows and have been having a very
> difficult time determining what is causing it.  According to "PowerPC
> Microprocessor Family: The Programming Environments for 32-Bit
> Microprocessors", "When an exception occurs, bits 1-4 and 10-15 of SRR1
> are loaded with exception specific information."
>
> SRR1 is 00089032, so bits 1-4 are 0000 and bits 10-15 are 001000.
> Unfortunately, I cannot find anywhere what the "exception specific
> information" contained in these bits is.

See the chapter on exception processing, chapter 6.

> Any information on this exception or interpreting an OOPS message in
> general on PPC would be greatly appreciated.

Machine check exception is described in 6.4.2 in my copy:

<quote>
SRR1 Bit 30 is loaded from MSR[RI] if the processor is in a recoverable
state. Otherwise cleared. The setting of all other SRR1 bits is
implementation-dependent.
</quote>

So you may need to look at the user manual of your CPU.

> Eclipse # Machine check in kernel mode.
> Caused by SRR0=0xC0005D28
> Caused by (from SRR1=89032): Machine check signal
> Oops: machine check, sig: 7
> NIP: C3095218 XER: 00000000 LR: C30951BC SP: C015E240 REGS: c015e190
> TRAP: 0200    Not tainted
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c015c470[0] 'swapper' Last syscall: 120
> last math c1db4000 last altivec 00000000
> GPR00: 00000000 C015E240 C015C470 C32E6EB8 00001032 000000C6 0000008C
> 00000000
> GPR08: C3110000 C36EF000 C310FA94 C0269600 00000175 1010E944 01FFD000
> 00000001
> GPR16: FFFFFFFF 00000000 00000000 01FF7A0C 00001032 00000002 00000002
> C3110000
> GPR24: 00000001 C01B0000 C0140000 C0140000 00000002 00000002 00000000
> 00010000
> Call backtrace:
> C30951BC C30A81BC C001D25C C001D008 C0006D0C C0005B20 C00071D0
> C00071EC C0003948 C01705D8 000035F0
> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing

Rob
MPC8248.

I looked at section 6.4.2 but did not find it very helpful.  My
register settings do not match those listed.  I have:

POW 0    FP  0    BE  0    DR    1
ILE 0    ME  1    FE1 0    RI    1
EE  1    FE0 0    IP  0    LE    0
PR  0    SE  0    IR  1
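
The table above can be reproduced from the raw MSR value in the oops (00089032). This is a rough sketch only: the mask values follow the classic 32-bit PowerPC MSR layout (the same values the Linux kernel uses for its `MSR_*` macros), while `msr_bit` and `print_msr` are names made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* MSR bit masks for classic 32-bit PowerPC cores. */
#define MSR_POW 0x00040000u /* power management enable */
#define MSR_ILE 0x00010000u /* exception little-endian mode */
#define MSR_EE  0x00008000u /* external interrupt enable */
#define MSR_PR  0x00004000u /* problem (user) state */
#define MSR_FP  0x00002000u /* floating point available */
#define MSR_ME  0x00001000u /* machine check enable */
#define MSR_FE0 0x00000800u /* FP exception mode 0 */
#define MSR_SE  0x00000400u /* single-step trace enable */
#define MSR_BE  0x00000200u /* branch trace enable */
#define MSR_FE1 0x00000100u /* FP exception mode 1 */
#define MSR_IP  0x00000040u /* exception prefix */
#define MSR_IR  0x00000020u /* instruction address translation */
#define MSR_DR  0x00000010u /* data address translation */
#define MSR_RI  0x00000002u /* recoverable exception */
#define MSR_LE  0x00000001u /* little-endian mode */

struct msr_flag {
    const char *name;
    uint32_t mask;
};

static const struct msr_flag msr_flags[] = {
    { "POW", MSR_POW }, { "ILE", MSR_ILE }, { "EE",  MSR_EE  },
    { "PR",  MSR_PR  }, { "FP",  MSR_FP  }, { "ME",  MSR_ME  },
    { "FE0", MSR_FE0 }, { "SE",  MSR_SE  }, { "BE",  MSR_BE  },
    { "FE1", MSR_FE1 }, { "IP",  MSR_IP  }, { "IR",  MSR_IR  },
    { "DR",  MSR_DR  }, { "RI",  MSR_RI  }, { "LE",  MSR_LE  },
};

/* Return 1 if the given MSR flag is set, 0 otherwise (hypothetical helper). */
static int msr_bit(uint32_t msr, uint32_t mask)
{
    return (msr & mask) != 0;
}

/* Print one "NAME value" line per flag (hypothetical helper). */
static void print_msr(uint32_t msr)
{
    size_t i;

    for (i = 0; i < sizeof(msr_flags) / sizeof(msr_flags[0]); i++)
        printf("%-3s %d\n", msr_flags[i].name, msr_bit(msr, msr_flags[i].mask));
}
```

Calling `print_msr(0x00089032)` should show EE, ME, IR, DR and RI set and everything else clear, which agrees with the values listed above.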

From mpc603e UM (the core in your case):
0-11 Cleared
12 core_mcp-Machine check signal caused exception
Check the SIU's register TESCR1 (offset 0x10040) for the specific
cause.

David Gabbay
DoGav Systems
Should I add printing the value of this register to the OOPS message?
Is there a better way to read that register before a crash?
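
One possible shape for reading the register, as a sketch only. The offset 0x10040 is the one quoted above; everything else is an assumption: `read_reg32`, `read_tescr1` and `immr_base` are hypothetical names, and the base address of the internal register space is board-specific. The plain pointer dereference is a simplification; in the Linux kernel the block would normally be mapped with `ioremap()` and read with `in_be32()`.

```c
#include <stdint.h>

/* Offset of TESCR1 from the internal register base, as quoted in this thread. */
#define TESCR1_OFFSET 0x10040u

/*
 * Read a 32-bit register at base + offset with a single volatile access.
 * read_reg32 is a hypothetical helper; base must point at an already
 * mapped, 4-byte aligned register block.
 */
static uint32_t read_reg32(const volatile void *base, uint32_t offset)
{
    const volatile uint8_t *p = (const volatile uint8_t *)base + offset;

    return *(const volatile uint32_t *)p;
}

/*
 * Read TESCR1 relative to the mapped internal register base.
 * immr_base is an assumption: how the base is obtained depends on the board.
 */
static uint32_t read_tescr1(const volatile void *immr_base)
{
    return read_reg32(immr_base, TESCR1_OFFSET);
}
```

The returned value could then be printed from the machine check path alongside SRR0 and SRR1, so it is captured before the panic.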


dg@dogav.net wrote:
> From mpc603e UM (the core in your case):
> 0-11 Cleared
> 12 core_mcp-Machine check signal caused exception
> Check the SIU's register TESCR1 (offset 0x10040) for the specific
> cause.
>
> David Gabbay
> DoGav Systems

I would print it
David

Reading the TESCR1 revealed a PCI machine check.  Then, reading the ESR
showed that there was a PCI read data parity error, which had gone
undetected because the parity error response bit in the PCI Bus Command
Register was set to 0. Once this bit was set to 1, the presence of the
parity error was confirmed.
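
For reference, a small sketch of the bits involved, using the standard PCI configuration header layout: the Command register is at offset 0x04 with Parity Error Response at bit 6, and the Status register is at offset 0x06 with Master Data Parity Error at bit 8 and Detected Parity Error at bit 15. The two helper functions are made-up names that only manipulate values; how the registers are actually read and written on this board (through the bridge's own register interface or config cycles) is not shown and would differ.

```c
#include <stdint.h>

/* Standard PCI configuration header offsets. */
#define PCI_COMMAND_OFFSET         0x04u
#define PCI_STATUS_OFFSET          0x06u

/* Command register: Parity Error Response enable (bit 6). */
#define PCI_COMMAND_PARITY         0x0040u

/* Status register: Master Data Parity Error (bit 8), Detected Parity Error (bit 15). */
#define PCI_STATUS_PARITY          0x0100u
#define PCI_STATUS_DETECTED_PARITY 0x8000u

/* Return the Command register value with Parity Error Response enabled. */
static uint16_t pci_enable_parity_response(uint16_t command)
{
    return (uint16_t)(command | PCI_COMMAND_PARITY);
}

/* Return 1 if the Status register value reports any parity error. */
static int pci_parity_error_seen(uint16_t status)
{
    return (status & (PCI_STATUS_PARITY | PCI_STATUS_DETECTED_PARITY)) != 0;
}
```

The usual pattern is read-modify-write: read the Command register, pass the value through `pci_enable_parity_response()`, and write it back, leaving the other command bits unchanged.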

Thank you very much.  Now we know what is causing the oops and can go
about fixing it.



dg@dogav.net wrote:
> I would print it
> David

dg@dogav.net wrote:
> > I would print it

This is totally meaningless. Google is not usenet - it is only a poor
imitation of an interface to the system. Read the links in my sig.
below.

-- 
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>