EmbeddedRelated.com

Killing MCU by watching memory below 0x4000'0000 !?!

Started by xray450 June 26, 2006
Recently I was hunting a bug in my software. While doing so I observed
some inexplicable behaviour of the LPC.

Some background information first: I'm working on an LPC2292 (Rev. A,
Bootloader 1.64).
Only internal Flash / SRAM is used (external bus interface
disabled). Program is running from flash.
Debugger: Lauterbach Trace32 on PowerTrace device.
Flash programming and debugging over JTAG.

Let's go: In my debugging session I opened a memory dump window to
watch the RAM at 0x4000'0000, because I suspected a stack overflow.
By mistake I scrolled the window to an address lower than
0x4000'0000 (roughly 0x3F...... ).
There I saw some bytes flickering, although the CPU wasn't running.
I was astonished because I could not remember any memory or
registers at this location!
With the dump window still open, the debugger suddenly started to
behave strangely. It reported that the flash content had changed,
along with miscellaneous JTAG errors. When I tried to update the
flash, an erase timeout finally occurred. Retrying didn't help - no
response.
Then I tried to upload the program via the serial boot loader with
the Philips flash tool. But the controller did not respond there
either ("reading part ID failed")! I have rarely used the serial
bootloader, so I tried it on other identical boards - they all work
fine.

Then I remembered some threads on this board regarding similar
problems with the bootloader. Searching for further information I
had a look at jayasooriah's homepage
(http://www.cse.unsw.edu.au/~jayas/esdk/lpc2/boot-loader.html).
Well - this guy might be 'unordinary' in some way (sorry jaya...),
but one of his articles really surprised me. Philips seems to make
use of some undocumented registers starting at 0x3FFF8000 - the same
location my memory watch was pointing at!

I tried to reproduce this behavior on other brand new boards and
killed three more LPCs within an hour this way!!!

I also tried it with Keil uVision / ULINK using a simple hello world
program to make sure that it's not the fault of the Lauterbach
debugger or of a buggy software routine which accidentally writes to
a forbidden area or calls some IAP commands. But the same thing
happened: I simply started a debug session and opened a memory window
starting at address 0x3FFF'8000 (btw: no write access was made to
this area!). After stopping and restarting the session 2 or 3 times
the LPC did not respond anymore. I was really not amused and finally
gave up.

Summing up: the controller is unusable and can obviously be thrown
away. JTAG and serial bootloader are dead!

Is there anyone out there who can confirm this problem??? This would
be a really serious design bug! Is there any way to recover the
bootloader in flash? I wonder how the bootloader is programmed during
production??

Never before have I destroyed a controller by software. Meanwhile I'm
not very happy with Philips, and not only because of its boot loader
issues. We are seriously thinking of switching to another ARM7 maker
although we have invested much time in the development.

Thank you very much for your help!

Rainer






--- In l..., "xray450" wrote:
>
> Recently I was hunting a bug in my software. While doing so I observed
> some inexplicable behaviour of the LPC.
>
> Some background information first: I'm working on an LPC2292 (Rev. A,
> Bootloader 1.64).
> Only internal Flash / SRAM is used (external bus interface
> disabled). Program is running from flash.
> Debugger: Lauterbach Trace32 on PowerTrace device.
> Flash programming and debugging over JTAG.
>
> Let's go: In my debugging session I opened a memory dump window to
> watch the RAM at 0x4000'0000, because I suspected a stack overflow.
> By mistake I scrolled the window to an address lower than
> 0x4000'0000 (roughly 0x3F...... ).
> There I saw some bytes flickering, although the CPU wasn't running.
> I was astonished because I could not remember any memory or
> registers at this location!

Chapter 2 in the User Manual has some information concerning the
unpopulated internal memory areas. I have also accidentally slid a
debugger memory display window down below the internal RAM area in an
LPC2292, and seen interesting things happen. BUT, chapter 2 does tell
you what will happen if you do that. You will get "the appropriate bus
cycle abort exception". What did you have your exception handlers set
up to do??? I have never seen any physical chip damage caused by
reading one of the abort trigger memory regions, so I am curious what
you are seeing. Maybe someone else here can help figure it out?

-- Dave

--- In l..., "xray450" wrote:

> (http://www.cse.unsw.edu.au/~jayas/esdk/lpc2/boot-loader.html).
> Well - this guy might be 'unordinary' some way (sorry jaya...),
> but one of his articles really surprised me.

Thank you Rainer for being so kind :)

> Philips seems to make
> use of some undocumented registers starting at 0x3FFF8000

Ata Khah of Product Innovation at Philips Semiconductor appears to
have inadvertently disclosed the existence of these registers in his
article. Have a look at "special registers" in Figure 2 at:

http://www.arm.com/iqonline/mem_currentissue/features/8217.html

> I tried to reproduce this behavior on other brand new boards and
> killed three more LPCs within an hour this way!!!

Among other things, you can overwrite or erase any sector in flash
(including the 'protected' boot sector) using these special registers.

> Summing up: the controller is unusable and can obviously be thrown
> away. JTAG and serial bootloader are dead!

This is what happens when the boot sector is trashed.

> Is there anyone out there who can confirm this problem??? This would
> be a really serious design bug! Is there any way to recover the
> bootloader in flash? I wonder how the bootloader is programmed during
> production??

I am not sure what you need confirmed. It is easy to write a simple
program that will self-destruct an LPC with an on-chip flash boot
loader.

It is not difficult to recover using the JTAG interface and reload
the boot sector if you simply erased it. However, if your boot sector
is corrupted, recovery is not so straightforward, and Philips would
like you to believe that it is impossible for CRP reasons.

> Never before have I destroyed a controller by software.

AFAIK the LPC is the only microcontroller that can be destroyed by
software.

> Meanwhile I'm
> not very happy with Philips, and not only because of its boot loader
> issues. We are seriously thinking of switching to another ARM7 maker
> although we have invested much time in the development.

There are a number of other traps you can fall into in relation to the
LPC family, quite unlike any other microcontroller in the market.

Jaya

...
> > There I saw some bytes flickering, although the CPU wasn't running.
> > I was astonished because I could not remember any memory or
> > registers at this location!
>
> Chapter 2 in the User Manual has some information concerning the
> unpopulated internal memory areas. I have also accidentally slid a
> debugger memory display window down below the internal RAM area in
> an LPC2292, and seen interesting things happen. BUT, chapter 2 does
> tell you what will happen if you do that. You will get "the
> appropriate bus cycle abort exception". What did you have your
> exception handlers set up to do??? I have never seen any physical
> chip damage caused by reading one of the abort trigger memory
> regions, so I am curious what you are seeing. Maybe someone else
> here can help figure it out?
>
> -- Dave

All exception handlers are set up correctly and currently end in a
simple while(1) loop. But: The information in chapter 2 is obviously
not the whole truth. Doing a read access on the area from
0x3FFF'FFFF down to 0x3???'???? does NOT raise a data abort. I have
tried a read access on 0x3FFF'FFFC just now. I don't want to try
further addresses - I have already damaged 4 LPCs...
If registers in this area are actually used by the bootloader, this
would explain why no exception occurs.
It's annoying that these hidden registers are not documented (at
least I have not found any documentation), and apparently some of
these registers have system-critical functions.

regards
Rainer

Your problem probably has more to do with a "data abort" exception if
you are accessing locations below 0x3FFF8000. How are you handling that
exception?

This is a dump of the beginning and end of the "undocumented registers"
showing that simply reading them does not cause an exception.

3FFF.8000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3FFF.8010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

3FFF.FFE0: 00 00 FF 34 00 00 00 00 00 00 00 00 00 00 00 00
3FFF.FFF0: 00 00 00 00 00 00 00 00 00 00 A0 04 00 00 00 00

So I wrote a quick one-liner in Forth to scan through memory areas and
take my chances. My data abort handler simply skips the offending
instruction so that execution can proceed. The start address and
count are passed to the SCAN function, which reads and discards the
first word on every 256-byte boundary.

: SCAN ( adr cnt -- ) BOUNDS DO I @ DROP 100h +LOOP ; ok
3000.0000h 1000.0000h SCAN ok
1000.0000h 2000.0000h SCAN ok
No.5 is still alive, so I didn't kill it simply by reading memory.

*Peter*
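
For readers who do not use Forth, the SCAN word above can be sketched in
Python. This is only a host-side illustration of the loop, assuming that
BOUNDS turns (adr cnt) into the range adr .. adr+cnt and that the loop
steps by 100h. The names scan_addresses, scan and read_word are made up
for this sketch; read_word stands in for whatever actually reads a word
from the target.

```python
def scan_addresses(start, count, stride=0x100):
    """Return the addresses a SCAN-style loop would read: one word every
    `stride` bytes from `start` up to (not including) `start + count`."""
    return range(start, start + count, stride)


def scan(start, count, read_word, stride=0x100):
    """Read and discard one word per stride; return the number of reads.

    `read_word` is a hypothetical callback that performs the target read.
    """
    reads = 0
    for address in scan_addresses(start, count, stride):
        read_word(address)  # value is discarded, like `@ DROP` in the Forth word
        reads += 1
    return reads
```

Under these assumptions, the first call in the session above
(3000.0000h 1000.0000h) covers 0x30000000 to 0x3FFFFFFF, which includes
the window starting at 0x3FFF8000, and the second (1000.0000h 2000.0000h)
covers 0x10000000 to 0x2FFFFFFF.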

--- In l..., "xray450" wrote:
> Doing a read access on the area from
> 0x3FFF'FFFF down to 0x3???'???? does NOT raise a data abort.
...
> If registers in this area are actually used by the bootloader, this
> would explain why no exception occurs.

Yes, this is the "special registers" bank whose purpose I have
documented based on information from a variety of sources.

> It's annoying that these hidden registers are not documented (at
> least I have not found any documentation), and apparently some of
> these registers have system-critical functions.

One of the most critical things that happens when you write to
0x3fff8000 is that your RAM and ROM vanish at run time! What
happens next is unpredictable and often destructive.

I am, however, not aware that reading these registers causes harm,
although I have only read the registers that I have documented, and
those without problems.

Jaya

On Monday 26 June 2006 14:01, derbaier wrote:
> Chapter 2 in the User Manual has some information concerning the
> unpopulated internal memory areas. I have also accidentally slid a
> debugger memory display window down below the internal RAM area in an
> LPC2292, and seen interesting things happen. BUT, chapter 2 does tell
> you what will happen if you do that. You will get "the appropriate bus
> cycle abort exception". What did you have your exception handlers set
> up to do??? I have never seen any physical chip damage caused by
> reading one of the abort trigger memory regions, so I am curious what
> you are seeing. Maybe someone else here can help figure it out?
>
> -- Dave

That might be what the user manual says, but of course the user manual only
describes the view Philips wants you to have - this isn't necessarily a bad
thing, and using an undocumented way of flash writing, for example, might be
even worse, but in this case it doesn't make Philips look too good.

Using OpenOCD I looked at memory below the on-chip RAM address (<
0x40000000). I didn't get a data abort on addresses between 0x3fff8000
and 0x40000000, but I did on addresses below 0x3fff8000. The device could be
resumed without a problem after these tests, so I guess nothing got damaged.
OpenOCD executes the memory access, and if it encounters a data abort it
reports this, but doesn't do anything else, such as examining the reason for
the abort.

I believe what really happened depends on what's located at these addresses.
Jaya's page describes the registers at 0x3fff8000, but of course he can't
know about any side effects reading from these addresses could have, nor what
other registers might lurk between 0x3fff8000 and 0x40000000 (and possible
aliasing due to not decoding all address bits).
The actual behaviour of the debugger might also have side effects, e.g. when
trying to reconstruct stack frames - it might read addresses the program never
actually referenced.
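
The read behaviour described above for the area just below the on-chip
RAM can be summarised in a few lines of Python. This is a sketch of the
observations reported in this thread only, not of anything documented by
Philips. The names SPECIAL_BASE, RAM_BASE and reported_read_behaviour are
invented for the illustration, and it deliberately says nothing about
populated regions such as flash.

```python
SPECIAL_BASE = 0x3FFF_8000  # start of the undocumented registers, per this thread
RAM_BASE = 0x4000_0000      # start of the on-chip SRAM


def reported_read_behaviour(address):
    """Summarise what was reported in this thread for a read of `address`.

    Only meaningful for the otherwise unpopulated gap below the on-chip
    SRAM; populated regions such as flash are not modelled here.
    """
    if address >= RAM_BASE:
        return "on-chip RAM or above: not part of these tests"
    if address >= SPECIAL_BASE:
        return "no data abort reported"
    return "data abort reported"


# Size of the window in which reads were reported not to abort.
WINDOW_BYTES = RAM_BASE - SPECIAL_BASE
```

The window works out to 0x8000 bytes (32 KB). A read of 0x3FFF'FFFC
falls inside it, which matches Rainer's report that no data abort
occurred at that address.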

Regards,

Dominic

--- In l..., Peter Jakacki wrote:
>
> Your problem probably has more to do with a "data abort" exception
> if you are accessing locations below 0x3FFF8000. How are you
> handling that exception?
>
> This is a dump of the beginning and end of the "undocumented
> registers" showing that simply reading them does not cause an
> exception.
>
> 3FFF.8000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 3FFF.8010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 3FFF.FFE0: 00 00 FF 34 00 00 00 00 00 00 00 00 00 00 00 00
> 3FFF.FFF0: 00 00 00 00 00 00 00 00 00 00 A0 04 00 00 00 00
>
> So I wrote a quick one-liner in Forth to scan through memory areas
> and take my chances. My data abort handler simply skips the
> offending instruction so that execution can proceed. The start
> address and count are passed to the SCAN function, which reads and
> discards the first word on every 256-byte boundary.
>
> : SCAN ( adr cnt -- ) BOUNDS DO I @ DROP 100h +LOOP ; ok
> 3000.0000h 1000.0000h SCAN ok
> 1000.0000h 2000.0000h SCAN ok
> No.5 is still alive, so I didn't kill it simply by reading memory.

Peter,

I had already figured out that accesses to the area from 0x3FFF'8000
to 0x3FFF'FFFF do not raise a data abort. My problem definitely
has nothing to do with an erroneous exception handler.
All exception handlers are set up correctly (they currently end in a
simple while(1) loop doing nothing).
I also tried to read some addresses as you did. This did not lead to
a damaged MCU.
I don't know what side effects watching this area in the debugger
via JTAG has on the microcontroller. In fact NO write access was
made by me. Also, the death did not occur immediately after
watching this memory area in the debugger. It took some flash upload
attempts (approx. 2 or 3) until the last upload failed with a
timeout. But I don't believe it's a coincidence to kill 4 MCUs in
succession by simply loading a 'hello world' program while having a
read-only watch window open in the debugger (I even tried two
different debuggers)!

Rainer

> > (http://www.cse.unsw.edu.au/~jayas/esdk/lpc2/boot-loader.html).
> > Well - this guy might be 'unordinary' some way (sorry jaya...),
> > but one of his articles really surprised me.
>
> Thank you Rainer for being so kind :)

You're welcome ;)

> > Philips seems to make
> > use of some undocumented registers starting at 0x3FFF8000
>
> Ata Khah of Product Innovation at Philips Semiconductor appears to
> have inadvertently disclosed the existence of these registers in
> his article. Have a look at "special registers" in Figure 2 at:
>
> http://www.arm.com/iqonline/mem_currentissue/features/8217.html
>

This seems to be a password-protected link...

> > I tried to reproduce this behavior on other brand new boards and
> > killed three more LPCs within an hour this way!!!
>
> Among other things, you can overwrite or erase any sector in flash
> (including 'protected' boot sector) using these special registers.

I always wondered why Philips states the boot sector is write
protected and on the other hand you can update it with the flash
tool...

...

> There are a number of other traps you can fall into in relation to
> the LPC family, quite unlike any other microcontroller in the market.

Yes, I already had some nice hours with this micro...

Rainer

--- In l..., Dominic Rath wrote:
...
> That might be what the user manual says, but of course the user
> manual only describes the view Philips wants you to have - this
> isn't necessarily a bad thing, and using an undocumented way of
> flash writing, for example, might be even worse, but in this case it
> doesn't make Philips look too good.
>
> Using OpenOCD I looked at memory below the on-chip RAM address
> (< 0x40000000). I didn't get a data abort on addresses between
> 0x3fff8000 and 0x40000000, but I did on addresses below 0x3fff8000.
> The device could be resumed without a problem after these tests, so
> I guess nothing got damaged.
> OpenOCD executes the memory access, and if it encounters a data
> abort it reports this, but doesn't do anything else, such as
> examining the reason for the abort.
>
> I believe what really happened depends on what's located at these
> addresses.
> Jaya's page describes the registers at 0x3fff8000, but of course he
> can't know about any side effects reading from these addresses could
> have, nor what other registers might lurk between 0x3fff8000 and
> 0x40000000 (and possible aliasing due to not decoding all address
> bits).
> The actual behaviour of the debugger might also have side effects,
> e.g. when trying to reconstruct stack frames - it might read
> addresses the program never actually referenced.

Dominic,

I also verified that simply reading the affected 'special purpose
registers' from software evidently does not lead to any
corruption. But I've tried it more than once: updating the flash in
a debugging session (at least with uVision and Trace32) while
watching the affected address area in a memwatch window leads to
unrecoverable corruption of the boot sector after a few successful
upload attempts. As you said, I also suspect side effects caused by
the debugger.

Rainer

