
Silicon Bug?

Started by Robert Higgins August 31, 2010
I've run into a problem with a little project I've been working on.
The project is a flash loader utility for the TMS470 that runs from
RAM and programs the flash with data coming in over a serial port.
The micro is a TMS470R1A288, which is based on an ARM7TDMI core.
There is 16K of on-chip SRAM starting at address 0x00400000; the flash
loader code runs entirely from this SRAM.

The flash loader is very straightforward: it receives data over a
serial port, erases the affected flash sector(s), programs the
received data into the flash, and then verifies that the programming
took place correctly by reading the data back out of the flash and
comparing it to what was written. The first thing I noticed was that
the verification step was failing. 

In the example shown in the screen capture (at
http://i35.tinypic.com/xdgyz4.jpg), we have just erased sector 0 of
the flash, programmed in the first 16 bytes of new data, and we are in
the middle of the verification routine. 

Note: the flash library code which exhibits the bug was provided by TI
and is being used as-is with only minimal changes. AFAICT, this code
does work correctly when I have used it in other projects (on
different flavors of the TMS470).

Although the verification routine is returning FALSE, indicating
verification failure, the routine should *not* be failing; in actual
fact, the programming was entirely successful. Indeed, if I disable
the calls to the verification routine, the code I programmed into the
flash runs correctly, and spot-checks of the flash contents show that
the flash contains the correct data.

The specific line of code that is failing is highlighted in green in
the C code pane. The buff[] array contains the data we just
programmed, and j contains the first 32-bit word read out of the
newly-programmed flash. If the programming was successful, these two
values should match, and indeed they do: if you look at the watch
window, you'll see that buff[0] and j both contain the value
0xea0017ee. So if the values match, why does this test fail?

The answer lies in the disassembly pane. In the screenshot, we have
just executed the ldr instruction at address 00401F4C. r3 contains the
address of slot 0 of the buff[] array, and the ldr instruction we just
executed should take the 32-bit word from the memory address pointed
to by r3 and load it into r2. However, after executing this
instruction, r2 does not contain the expected value (0xea0017ee), but
rather 0xea000010. This causes the compare instruction at 00401F54 to
fail, and we bail out of the verification routine with a FALSE result.

Now, I could be wrong, but this doesn't appear to be a compiler bug:
the code that is being generated looks reasonable and correct.
(CrossStudio for ARM uses gcc, BTW). ISTM that this is a bug in the
microcontroller itself - it is not executing the instructions
correctly for some reason. I did check the silicon errata sheet for
the chip but nothing there seemed relevant to this issue.

So is my conclusion correct, or is there some other cause for this bug
which I'm missing? And, of course, what the hell do I do about it? How
can I trust a microcontroller that doesn't execute code correctly??
I suspect that buff is not aligned properly for this comparison,
which would explain why only 2 bytes of the value come out correct.
Usually long values need to be word aligned (on a 4-byte boundary)
for ARM7 processors. The compiler can work around this if it knows
that the value is not word aligned, but when the data is accessed
through an array it may not know. It looks like buff starts at
address 0x00403556, which is certainly not a word boundary.
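
To illustrate why a misaligned word load can return a value that is only partly right: on ARM7TDMI-class cores, an ldr from an address that is not a multiple of 4 does not fault. It reads the aligned word containing that address and rotates it right by 8 bits for each byte of misalignment. The sketch below models that behaviour on a host machine. The function name arm7_ldr and the byte-image parameter are hypothetical, it assumes little-endian memory, and it is not meant to reproduce the exact values in the screenshot.

```c
#include <stdint.h>

/* Hypothetical host-side model of a little-endian ARM7TDMI LDR.
 * mem  : byte image of memory (assumed large enough for the access)
 * addr : byte address passed to the load, possibly unaligned */
static uint32_t arm7_ldr(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;               /* hardware ignores addr[1:0] for the fetch */
    uint32_t word = (uint32_t)mem[base]
                  | ((uint32_t)mem[base + 1] << 8)
                  | ((uint32_t)mem[base + 2] << 16)
                  | ((uint32_t)mem[base + 3] << 24);
    unsigned rot = 8u * (addr & 3u);          /* rotate right by 8 * misalignment */

    if (rot == 0)
        return word;                          /* aligned: value is returned as stored */
    return (word >> rot) | (word << (32u - rot));
}
```

With the bytes 11 22 ee 17 00 ea 33 44 in memory, the value 0xea0017ee sits at offset 2. A model load from offset 2 returns 0x221117ee: the low half is right, and the high half comes from the two bytes in front of the intended word.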

The debugger may handle these accesses differently, so it shows the correct
result.

Try to put buff on a word aligned boundary, and see if this fixes your
problem.
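
Two common ways to get there in C are sketched below. This is an illustration only, not the TI library code: the names word_aligned_buf, load_u32 and verify_words are made up for the example, and the 16-byte size is just the block size mentioned in the original post. The first option gives the buffer a 32-bit member so the compiler must place it on a word boundary. The second keeps a byte buffer wherever it is and copies each word out with memcpy, which lets the compiler choose an access that is safe for any alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Option 1: a buffer that is word aligned by construction.
 * The uint32_t member forces the whole union onto a 4-byte boundary. */
typedef union {
    uint32_t words[4];
    uint8_t  bytes[16];
} word_aligned_buf;

/* Option 2: alignment-safe 32-bit load from an arbitrary byte pointer. */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* no assumption about the alignment of p */
    return v;
}

/* Compare len bytes of (word-aligned) flash against a byte buffer that
 * may start at any address. Returns 1 on match, 0 on mismatch. */
static int verify_words(const uint32_t *flash, const uint8_t *data, size_t len)
{
    assert(len % 4 == 0);      /* sketch handles whole words only */
    for (size_t i = 0; i < len / 4; i++) {
        if (flash[i] != load_u32(data + 4 * i))
            return 0;
    }
    return 1;
}
```

Option 1 is the smaller change if the buffer declaration is under your control. Option 2 is the safer choice when the data arrives in a byte stream whose position you cannot dictate, such as a serial receive buffer.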

Mike

On Tue, 31 Aug 2010 01:34:05 -0600, in comp.arch.embedded "Michael
Anton" <manton@nospamcompusmart.ab.ca> wrote:

>Try to put buff on a word aligned boundary, and see if this fixes your
>problem.

Yup, that did the trick! Thanks for your sharp eyes and keen insight.
