OFFTOPIC?: arm-linux-gnueabi-gdb error with cortex-m3 code

So I have encountered a very odd gdb error that I can't make sense of.
I am using version 4.4.5 of the gcc tools (arm-linux-gnueabi)
and version 7.0.1 of the gdb (arm-linux-gnueabi) debugger. I am using
an stm32f103 cortex-m3 board.

Basically gdb seems to be clobbering the values passed to functions. Here's
an example:

Breakpoint 1, main () at apps/core/core_test.c:46
46        wdTemp = wdTemp;        /*dummy ins for breakpoint*/
(gdb) n
47        tclib_printf("\r%d", wdTemp);
(gdb) p wdTemp
$1 = 0
(gdb) s
tclib_printf (ptrString=0x0, wdValue=536874884) at tclib/IE_tclib.c:140
140    while ((*ptrString) != NULL)
(gdb) p strSystick
$2 = {dwMsTick = 0, dwSeconds = 1, dwMsTotal = 1000, ptrFunc = 0}
(gdb) 

ptrString should be an address in the range (0x2000 0000 to 0x2000 5000);
see the disassembly below. wdTemp, passed as wdValue, should be 0.

A disassembly of the lines just before the call to my tclib_printf()
routine shows that r0 and r1 are initialized as needed; they carry
the only two arguments to the function.

20000e0a:       687b            ldr     r3, [r7, #4]
20000e0c:       607b            str     r3, [r7, #4]
20000e0e:       687b            ldr     r3, [r7, #4]
20000e10:       f640 6050       movw    r0, #3664       ; 0xe50
20000e14:       f2c2 0000       movt    r0, #8192       ; 0x2000
20000e18:       4619            mov     r1, r3
20000e1a:       f7ff f9a3       bl      20000164 <tclib_printf>
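
As a cross-check on the movw/movt pair above, the two 16-bit immediates can
be combined by hand. A minimal sketch in C (the helper names movw_movt and
in_sram are made up for illustration), assuming the usual semantics: movw
writes the low halfword and clears the upper half, movt then replaces the
upper halfword.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value left in a register after
 *   movw rX, #lo16   (writes bits 15:0, clears bits 31:16)
 *   movt rX, #hi16   (writes bits 31:16, leaves bits 15:0 alone) */
static uint32_t movw_movt(uint32_t lo16, uint32_t hi16)
{
    assert(lo16 <= 0xFFFFu && hi16 <= 0xFFFFu);
    return (hi16 << 16) | lo16;
}

/* Nonzero if addr lies in the SRAM window quoted in the post,
 * 0x20000000 up to (not including) 0x20005000. */
static int in_sram(uint32_t addr)
{
    return addr >= 0x20000000u && addr < 0x20005000u;
}
```

movw_movt(0xe50, 0x2000) gives 0x20000e50, which in_sram() accepts, so r0
holds a plausible string address at the bl. For comparison, the bogus
wdValue of 536874884 is 0x20000f84 in hex, which looks like a RAM address
rather than the integer 0.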
 


A disassembly of the tclib_printf() routine shows that it starts up as
expected and does nothing special to the values passed. What gives?
I am completely stumped. The stack is at the top of memory and there is
no issue there, since these parameters are passed in r0 and r1.


20000164 <tclib_printf>:
20000164:       b580            push    {r7, lr}
20000166:       b086            sub     sp, #24
20000168:       af00            add     r7, sp, #0
2000016a:       6078            str     r0, [r7, #4]
2000016c:       6039            str     r1, [r7, #0]
 
I am stumped!!!
Is there something in gdb's setup or view of this object file that I am
omitting?

Reply by Tauno Voipio ●August 22, 2013


Please check that your stack is initially aligned on a two-fullword
boundary (8 bytes). The EABI specification assumes an 8-byte-aligned stack.

Another question is whether the library code is compiled with optimization.
Certain optimization options make the code very difficult for the
debugger to follow. You can check the register contents at the breakpoint
(info reg).

-- 

Tauno Voipio

Reply by jackbenimble ●August 22, 2013

Thanks for replying ... my response below.

> Please check that your stack is initially aligned on two-fullword
> boundary (8 bytes). The EABI specification assumes 8 byte aligned stack.

There is 20k worth of RAM on this chip, and I have set my linker script
to
MEMORY  {
        STM32_RAM    : ORIGIN = 0x20000000,   LENGTH = (20480 - 1024)
        }

and my ivt table to


.global stm32_ivt
.equ    STM32_SRAM_BASE,0x20000000
.thumb
.extern main
.data
stm32_ivt:      .word   STM32_SRAM_BASE + (20 * 1024)
                .word   (main + 1)
                .skip   (14 * 4)
                .skip   (60 * 4)
.text
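
The numbers in the MEMORY block and the first vector-table word can be
checked by hand. A rough sketch in C, using only the constants quoted above
(the macro and function names are invented for illustration): it computes
where the linker-visible RAM region ends, where the initial stack pointer
sits, and whether that stack pointer is 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE   0x20000000u                /* ORIGIN in the MEMORY block */
#define RAM_LENGTH  (20480u - 1024u)           /* LENGTH in the MEMORY block */
#define INITIAL_SP  (SRAM_BASE + 20u * 1024u)  /* first word of stm32_ivt     */

/* First address past the region the linker may place sections in. */
static uint32_t ram_region_end(void)
{
    return SRAM_BASE + RAM_LENGTH;
}

/* Bytes between the end of the linker region and the initial SP;
 * the stack grows downward into this gap first. */
static uint32_t stack_reserve(void)
{
    assert(INITIAL_SP >= ram_region_end());
    return INITIAL_SP - ram_region_end();
}

/* The AAPCS expects the stack pointer to be a multiple of 8
 * at public interfaces. */
static int is_8byte_aligned(uint32_t addr)
{
    return (addr & 7u) == 0;
}
```

With these values the region ends at 0x20004c00, the initial SP is
0x20005000 (8-byte aligned), and 1024 bytes sit above the linker region for
the stack. None of this says where .data, .text and the vector table land
inside the region; that depends on the rest of the linker script.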


In addition, I have bit 9 (STKALIGN) of the NVIC CCR set:
(gdb) monitor mdw 0xe000ed14
0xe000ed14: 00000210 
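
The value read back, 0x00000210, can be decoded bit by bit. A minimal
sketch in C, assuming the Cortex-M3 Configuration and Control Register
layout in which STKALIGN is bit 9 and DIV_0_TRP is bit 4; the macro and
function names are chosen for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M3 CCR bit masks (names chosen for this sketch). */
#define CCR_DIV_0_TRP  (1u << 4)   /* trap on divide by zero                    */
#define CCR_STKALIGN   (1u << 9)   /* 8-byte stack alignment on exception entry */

/* Nonzero if every bit in mask is set in value. */
static int ccr_has(uint32_t value, uint32_t mask)
{
    assert(mask != 0);
    return (value & mask) == mask;
}
```

ccr_has(0x00000210, CCR_STKALIGN) is nonzero, and the only other set bit is
DIV_0_TRP. Note that STKALIGN only governs how the hardware aligns the stack
on exception entry; it does not fix up a misaligned initial stack pointer.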

There are no issues with void functions ... just functions that pass
arguments.


> Another question is if the library code is compiled with optimization.
> Certain optimization options make the code very difficult for the
> debugger. You can check the register contents at the breakpoint (info
> reg).


No optimization here - at least by habit whenever I use -g
CFLAGS = -g -c -Wall -nostdlib -mcpu=cortex-m3 -mlittle-endian -mthumb \
         -I core/include -I tclib \
         -mabi=aapcs -O0
LDFLAGS= -nostdlib -e main -Map flash.map -L linker -T IE_stm32.ld --cref

Reply by George Neuner ●August 23, 2013

On Thu, 22 Aug 2013 17:06:13 GMT, jackbenimble
<jackbenimble@mindyourhusiness.com> wrote:

>The stack is at the top of memory and there is
>no issue there since these parameters are passed on r0 and r1

>There is 20k worth of ram on this chip and I have set my linker script
>to 
>MEMORY  {
>        STM32_RAM    : ORIGIN = 0x20000000,   LENGTH = (20480 - 1024)
>        }

Are you linking in the GDB stub?  If so, you're likely blowing the
stack and corrupting your heap ... the stub itself may use up to
several KB of stack [chip and I/O dependent].

If you're not using the stub, then I'm out - I don't work with ARM and
I haven't otherwise run into this particular GDB problem.

Good luck!
George

Reply by Luis Filipe Rossi ●August 23, 2013

Once I was working on a project with an STM32F100 and had a problem where my variables were getting corrupted randomly. After hours of stepping through the assembly, the problem turned out to be the stack pointer initialization. That was not supposed to happen, as the startup code was initializing it as it should. The real problem was a hardware issue. For flexibility, we had left the option on our board to pull one of the boot pins either up or down. For some reason both resistors were fitted during board assembly, and that was causing problems during boot. After removing one of the resistors the system worked as it should. I am not sure whether you have this problem, but it is worth mentioning.

Regards,

Reply by rombios ●August 23, 2013

> Are you linking in the GDB stub?  If so, you're likely blowing the stack
> and corrupting your heap ... the stub itself may use up to several KB of
> stack [chip and I/O dependent].

It's weird because none of the parameters are on the stack. As you know, the
ARM procedure calling convention uses r0-r3 for the first four parameters.
Somehow execution under gdb corrupts r0 and r1 (basically any parameters
passed to a function).

Here's a debugging session to highlight what I mean.
The gdb (layout asm) and stepi commands clearly show r0 and r1
being initialized correctly before the call to tclib_printf (prologue, as it
were):

46		tclib_printf("\r%d", wdTemp);


   │0x20000e0e <main+50>            ldr    r3, [r7, #4]
   │0x20000e14 <main+56>            movw   r0, #3668       ; 0xe54
   │0x20000e18 <main+60>            movt   r0, #8192       ; 0x2000
   │0x20000e1c <main+64>            mov    r1, r3
   │0x20000e1e <main+66>            bl     0x20000388 <tclib_printf>


Here is a disassembly of the first few lines of tclib_printf:
   │0x20000388 <tclib_printf>       lsls   r1, r6, #26
   │0x2000038a <tclib_printf+2>     movs   r0, #0
   │0x2000038c <tclib_printf+4>     lsls   r1, r6, #26
   │0x2000038e <tclib_printf+6>     movs   r0, #0
   │0x20000390 <tclib_printf+8>     lsls   r1, r6, #26
   │0x20000392 <tclib_printf+10>    movs   r0, #0
These bear NO RESEMBLANCE to the objdump -d disassembly of the .out file.


THIS HAS ME STUMPED. I don't know how those instructions got there. Here's
the C code of the first few lines of tclib_printf and the objdump of the
.out file before loading it into gdb:

void
tclib_printf(char *ptrString, int wdValue)
{
unsigned char   sbString[9];
int             wdTemp;

while ((*ptrString) != NULL)
        {
        wdTemp =*ptrString;
        switch((char)wdTemp)
                {
                case    '%':
                        {
                        wdTemp = *(++ptrString);
                        switch(wdTemp


arm-linux-gnueabi-objdump -d core_test.out |grep tclib_printf


20000388 <tclib_printf>:
20000388:       b580            push    {r7, lr}
2000038a:       b086            sub     sp, #24
2000038c:       af00            add     r7, sp, #0
2000038e:       6078            str     r0, [r7, #4]
20000390:       6039            str     r1, [r7, #0]
20000392:       e0f6            b.n     20000582 <tclib_printf+0x1fa>
20000394:       687b            ldr     r3, [r7, #4]
20000396:       781b            ldrb    r3, [r3, #0]
20000398:       60bb            str     r3, [r7, #8]
2000039a:       68bb            ldr     r3, [r7, #8]
2000039c:       b2db            uxtb    r3, r3
2000039e:       2b25            cmp     r3, #37 ; 0x25
200003a0:       d002            beq.n   200003a8 <tclib_printf+0x20>
200003a2:       2b5c            cmp     r3, #92 ; 0x5c
200003a4:       d073            beq.n   2000048e <tclib_printf+0x106>
200003a6:       e0b9            b.n     2000051c <tclib_printf+0x194>


objdump matches the C code, but somehow arm-linux-gnueabi-gdb has
replaced the instructions in the code with manipulations of r0 and r1
that clobber their values.

I can't for the life of me figure out why this is happening ...

So I decided to dump the binary values in memory at tclib_printf:
(gdb) p tclib_printf
$1 = {void (char *, int)} 0x20000388 <tclib_printf>

(gdb) monitor mdh 0x20000388 20
0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25 
            d002 2b5c d073 e0b9 
0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25 
            d002 2b5c d073 e0b9 
0x200003a8: 687b f103 0301 607b 

So these start out fine after the code is loaded and before gdb runs it!


I set a breakpoint at line 45 again (just before the tclib_printf call),
then at the breakpoint I dump the memory again:
monitor mdh 0x20000388 10 
0x20000388: 06b1 2000 06b1 2000 06b1 2000 06b1 2000 06b1 2000 
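
The two halfwords in this dump can be decoded by hand to confirm that they
are exactly what the asm layout showed at tclib_printf. A minimal sketch in
C, covering only the two 16-bit Thumb encodings needed here (LSLS by
immediate and MOVS immediate); the struct and function names are invented
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit Thumb "LSLS Rd, Rm, #imm5" (top five bits 00000). */
struct thumb_lsls { unsigned rd, rm, imm5; };

/* Decode hw as LSLS (immediate).
 * Returns 1 on success, 0 if hw is some other encoding. */
static int decode_lsls_imm(uint16_t hw, struct thumb_lsls *out)
{
    assert(out != 0);
    if ((hw >> 11) != 0x00u)
        return 0;
    out->imm5 = (hw >> 6) & 0x1Fu;
    out->rm   = (hw >> 3) & 0x7u;
    out->rd   = hw & 0x7u;
    return 1;
}

/* Decode hw as "MOVS Rd, #imm8" (top five bits 00100).
 * Returns 1 on success, 0 if hw is some other encoding. */
static int decode_movs_imm(uint16_t hw, unsigned *rd, unsigned *imm8)
{
    assert(rd != 0 && imm8 != 0);
    if ((hw >> 11) != 0x04u)
        return 0;
    *rd   = (hw >> 8) & 0x7u;
    *imm8 = hw & 0xFFu;
    return 1;
}
```

0x06b1 decodes to lsls r1, r6, #26 and 0x2000 to movs r0, #0, the pair that
repeats in the asm layout. In other words, gdb is faithfully disassembling
whatever has been written over the function's first bytes.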


AND the instructions have changed. Now any casual observer would reach
the conclusion that somehow, somewhere after execution starts I am
overwriting these values. But I assure you that's not the case. I am not
doing anything to clobber memory. I am almost certain of that - everything
prior to this has been initialization of the core. To prove it, I change
the layout back to source and set a breakpoint at line 140 of
tclib_printf:

   │138     int             wdTemp;
   │139
B+>│140     while ((*ptrString) != NULL)
   │141             {
   │142             wdTemp =*ptrString;
   │143             switch((char)wdTemp)
   │144                     {
   │145                     case    '%':


No issues there ... but the program will segfault on invalid parameters if
I continue. So it's only the first few instructions of ANY function that
are being clobbered ...

Stumped!
I never saw this when I was working with the arm7tdmi - but I probably had
another version of the GNU dev tools ...

currently using
gcc 4.4.5
gdb 7.0.1
gnueabi-

Reply by George Neuner ●August 24, 2013

On Fri, 23 Aug 2013 22:27:47 GMT, rombios <rombios@hereonearth.com>
wrote:

>> Are you linking in the GDB stub?  If so, you're likely blowing the stack
>> and corrupting your heap ... the stub itself may use up to several KB of
>> stack [chip and I/O dependent].
>
>Its weird because non of the parameters are on stack. As you know the
>arm procedure calling convention uses r0-r3 for the first four parameters.
>Somehow execution under gdb corrupts r0 and r1 (basically any parameters
>passed to a function)

Sorry, I don't work with ARM.  However, it's clear that R0 is being
loaded with the address of the format string ... are you certain that
the format string in memory is valid?

More to the point, does the code work if you just run it as a release
compile or as a debug compile but without using the debugger?

>So I decide to dump the binary values in memory after tclib_printf
>(gdb) p tclib_printf
>$1 = {void (char *, int)} 0x20000388 <tclib_printf>
>
>(gdb) monitor mdh 0x20000388 20
>0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25 
>            d002 2b5c d073 e0b9 
>0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25 
>            d002 2b5c d073 e0b9 
>0x200003a8: 687b f103 0301 607b 
>
>So these start out fine!!! after the code is loaded and before gdb runs.
>
>I set a breakpoint at line 45 again (tclib_printf) then at the breakpoint
>I dump the memory again
>monitor mdh 0x20000388 10 
>0x20000388: 06b1 2000 06b1 2000 06b1 2000 06b1 2000 06b1 2000 
>
>
>AND the instructions have changed. Now any casual observer would reach 
>the conclusion that somehow/somewhere after execution I am overwriting 
>these values. But I assure thats not the case. I am not doing anything 
>to clobber memory. 

That you know of.  

The bit of linker script you provided didn't specify stack or BSS
(uninitialized data) segments.  You did mention the location of your
stack, but it's generally a good idea to explicitly define the areas
you want to use for BSS, code, heap and stack in your script.

The GDB stubs I'm familiar with [not for ARM but for other chips]
allocate a pair of large static buffers (>= 1KB each) for I/O and also
use a fair amount of stack when in operation ... up to 6KB of stack on
one platform I've used.

If you don't include space for the debugger's static buffers in your
BSS segment, then even just initializing the debugger stub may corrupt
your code.  BSS data and code normally are adjacent in memory, but
where each is placed is up to the linker/loader.

Note that the compiler and/or linker will correctly size the BSS
segment, but directives in the linker script override computed values.
Since you didn't specify a BSS segment, the generated load file itself
may be bad [not corrupt per se, but lacking necessary information].
You may need to define the BSS area and specify that it be sized using
computed values [this is toolchain dependent].

And of course, if you don't allow sufficient extra space for the stack
[or better, a separate stack if possible], using the debugger may blow
the stack and corrupt adjacent memory.

Check the linker's output map file and make sure there is no overlap
between the BSS data and code segments.  Allow the program at least a
few KB of stack and then see what happens.
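
The overlap check suggested here is plain interval arithmetic once the
start address and size of each section have been read out of the map file.
A sketch in C; the struct, the function name and all addresses in the note
below are placeholders, not values from the poster's flash.map.

```c
#include <assert.h>
#include <stdint.h>

/* One section as listed in a linker map: start address and size in bytes. */
struct section { uint32_t start, size; };

/* Nonzero if the half-open ranges [start, start+size) of a and b share
 * at least one byte. Empty sections never overlap anything. */
static int sections_overlap(struct section a, struct section b)
{
    /* 64-bit ends so that start + size cannot wrap around. */
    uint64_t a_end = (uint64_t)a.start + a.size;
    uint64_t b_end = (uint64_t)b.start + b.size;

    /* A section must fit inside the 32-bit address space. */
    assert(a_end <= 0x100000000ull && b_end <= 0x100000000ull);

    if (a.size == 0 || b.size == 0)
        return 0;
    return a.start < b_end && b.start < a_end;
}
```

For instance, a made-up .text at 0x20000100 of 0x1000 bytes and a made-up
.bss at 0x20001000 of 0x200 bytes overlap, while moving that .bss to
0x20001100 clears it. The same test applies to any pair of regions, such as
a vector table copied into RAM and the code next to it.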

>... To prove it
>
>So I change the layout back to source set a breakpoint at line 140 of the
>tclib_printf and
>
>   │138     int             wdTemp;
>   │139
>B+>│140     while ((*ptrString) != NULL)
>   │141             {
>   │142             wdTemp =*ptrString;
>   │143             switch((char)wdTemp)
>   │144                     {
>   │145                     case    '%':
>
>
>No issues there ... but the program will segfault on invalid parameters if
>I continue. So its only the first few instructions of ANY function thats 
>being clobbered ...

That doesn't prove anything - your disassembly showed that the code
bytes corresponding to your main() function were ok.  In any event,
the C code listing will appear to be correct regardless of whether
memory has been corrupted: GDB isn't showing you a decompilation of
the code bytes in memory, it is reading from the project file(s) on
your build system.  With memory corruption, a breakpoint set on the C
code may never be hit or may break into unrecognizable assembly code.

George

Reply by jackbenimble ●August 24, 2013

Sorry for wasting your time. I have found the error after all. It wasn't gdb
so much as my linker script and the location of my Interrupt Vector Table.

I had a chance to revisit this with a clear head tonight, and the clue
should have been apparent: the repeating sequence of 0x200006b1, which
is the value of my stm32_nvic_unknown_isr handler, written by my attempt to
rebuild the table in memory before changing the vector table.

Time to revisit the linker script ...
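
To make the clue concrete: mdh prints 16-bit halfwords, so on a
little-endian Cortex-M3 each pair "06b1 2000" is one 32-bit word with the
low half first. A minimal sketch in C of reassembling the pair and reading
it as a vector-table entry (the function names are illustrative only).

```c
#include <assert.h>
#include <stdint.h>

/* Combine two consecutive halfwords from a little-endian dump
 * (lower address first) into the 32-bit word they form. */
static uint32_t word_from_halfwords(uint16_t low_addr_hw, uint16_t high_addr_hw)
{
    return ((uint32_t)high_addr_hw << 16) | low_addr_hw;
}

/* Handler entries in a Cortex-M vector table have bit 0 set to mark
 * Thumb code; the handler's first instruction is at the entry with
 * bit 0 cleared. */
static uint32_t handler_address(uint32_t vector_entry)
{
    assert(vector_entry & 1u);   /* a valid handler entry is odd */
    return vector_entry & ~1u;
}
```

word_from_halfwords(0x06b1, 0x2000) is 0x200006b1, and handler_address() of
that is 0x200006b0. A run of identical words like this is what filling a
vector table with one default handler looks like, which fits the diagnosis
above: the fill landed on top of tclib_printf.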

Reply by Tauno Voipio ●August 24, 2013

On 24.8.13 1:27 , rombios wrote:

> AND the instructions have changed. Now any casual observer would reach
> the conclusion that somehow/somewhere after execution I am overwriting
> these values. But I assure thats not the case. I am not doing anything
> to clobber memory. I am almost certain of that - prior to this has been
                           ^^^^^^^^^^^^^^
> initialization or the core. To prove it

If you're running from RAM, find the piece of code overwriting
the code with 0x200006b1, which seems to be a data pointer.

-- 

-Tauno
