
OFFTOPIC?: arm-linux-gnueabi-gdb error with cortex-m3 code

Started by jackbenimble August 22, 2013
So I have encountered a very odd gdb error that I can't make sense of.
I am using version 4.4.5 of the gcc tools (arm-linux-gnueabi)
and version 7.0.1 of the gdb (arm-linux-gnueabi) debugger. I am using
an stm32f103 cortex-m3 board.

Basically gdb seems to be clobbering the values passed to functions. Here's
an example:

Breakpoint 1, main () at apps/core/core_test.c:46
46        wdTemp = wdTemp;        /*dummy ins for breakpoint*/
(gdb) n
47        tclib_printf("\r%d", wdTemp);
(gdb) p wdTemp
$1 = 0
(gdb) s
tclib_printf (ptrString=0x0, wdValue=536874884) at tclib/IE_tclib.c:140
140    while ((*ptrString) != NULL)
(gdb) p strSystick
$2 = {dwMsTick = 0, dwSeconds = 1, dwMsTotal = 1000, ptrFunc = 0}
(gdb) 

ptrString should be an address in the range 0x20000000 to 0x20005000
(see the disassembly below), and wdTemp, passed as wdValue, should be 0.

A disassembly of the lines just before the call to my tclib_printf()
routine shows that r0 and r1 are initialized as needed, since they carry
the only two arguments to the function:

20000e0a:       687b            ldr     r3, [r7, #4]
20000e0c:       607b            str     r3, [r7, #4]
20000e0e:       687b            ldr     r3, [r7, #4]
20000e10:       f640 6050       movw    r0, #3664       ; 0xe50
20000e14:       f2c2 0000       movt    r0, #8192       ; 0x2000
20000e18:       4619            mov     r1, r3
20000e1a:       f7ff f9a3       bl      20000164 <tclib_printf>
 


A disassembly of the tclib_printf() routine shows that it starts up as
expected and does nothing special to the values passed. What gives?
I am completely stumped. The stack is at the top of memory and there is
no issue there, since these parameters are passed in r0 and r1.


20000164 <tclib_printf>:
20000164:       b580            push    {r7, lr}
20000166:       b086            sub     sp, #24
20000168:       af00            add     r7, sp, #0
2000016a:       6078            str     r0, [r7, #4]
2000016c:       6039            str     r1, [r7, #0]
 
I am stumped!!!
Is there something in gdb's setup or view of this object file I am
omitting?
On 22.8.13 1:01, jackbenimble wrote:
> So I have encountered a very odd gdb error that I cant make sense of.
> [...]
> Basically gdb seems to be clobbering the values passed to functions.
> [...]
Please check that your stack is initially aligned on a two-fullword
boundary (8 bytes). The EABI specification assumes an 8-byte-aligned stack.

Another question is whether the library code is compiled with optimization.
Certain optimization options make the code very difficult for the
debugger. You can check the register contents at the breakpoint (info reg).

--
Tauno Voipio
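The 8-byte alignment check suggested above is simple arithmetic on the stack address. A minimal host-side sketch follows; the helper names `is_aligned8`, `align_down8` and `check_stack_alignment` are made up for illustration, and 0x20005000 is only the upper RAM bound quoted in the first post, assumed here to be the initial stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when addr is a multiple of 8. */
static int is_aligned8(uint32_t addr)
{
    return (addr & 0x7u) == 0u;
}

/* Hypothetical helper: round addr down to the nearest multiple of 8. */
static uint32_t align_down8(uint32_t addr)
{
    return addr & ~(uint32_t)0x7u;
}

/* Usage sketch; 0x20005000 is assumed to be the initial stack pointer. */
void check_stack_alignment(void)
{
    assert(is_aligned8(0x20005000u));
    assert(!is_aligned8(0x20004ffcu));               /* only 4-byte aligned */
    assert(align_down8(0x20004ffcu) == 0x20004ff8u); /* nearest 8-byte slot */
}
```

On the target itself the same test would be applied to the sp value reported by `info reg` at the breakpoint.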
Thanks for replying ... my response below.

> Please check that your stack is initially aligned on two-fullword
> boundary (8 bytes). The EABI specification assumes 8 byte aligned stack.
There is 20k worth of RAM on this chip and I have set my linker script to

MEMORY {
    STM32_RAM : ORIGIN = 0x20000000, LENGTH = (20480 - 1024)
}

and my IVT table to

        .global stm32_ivt
        .equ    STM32_SRAM_BASE,0x20000000
        .thumb
        .extern main
        .data
stm32_ivt:
        .word   STM32_SRAM_BASE + (20 * 1024)
        .word   (main + 1)
        .skip   (14 * 4)
        .skip   (60 * 4)
        .text

In addition, I have bit 9 of the NVIC CCR (STKALIGN) set:

(gdb) monitor mdw 0xe000ed14
0xe000ed14: 00000210

There are no issues with void functions ... just functions that pass
arguments.
> Another question is if the library code is compiled with optimization.
> Certain optimization options make the code very difficult for the
> debugger. You can check the register contents at the breakpoint (info
> reg).
No optimization here - at least by habit whenever I use -g

CFLAGS = -g -c -Wall -nostdlib -mcpu=cortex-m3 -mlittle-endian -mthumb -I core/include -I tclib \
         -mabi=aapcs -O0
LDFLAGS= -nostdlib -e main -Map flash.map -L linker -T IE_stm32.ld --cref
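The two values quoted in the reply above (the first vector table word and the CCR readback) can be checked with a few lines of C. This is only a host-side sketch: the function names are invented for illustration, and the bit position of STKALIGN is taken from the post itself.

```c
#include <assert.h>
#include <stdint.h>

#define CCR_STKALIGN_BIT 9u   /* bit position as stated in the post */

/* Nonzero when the STKALIGN bit is set in a CCR readback value. */
static int ccr_stkalign_set(uint32_t ccr)
{
    return (int)((ccr >> CCR_STKALIGN_BIT) & 1u);
}

/* First vector table word, following the post's expression
 * STM32_SRAM_BASE + (20 * 1024). */
static uint32_t initial_sp(uint32_t sram_base, uint32_t sram_kib)
{
    return sram_base + sram_kib * 1024u;
}

/* Usage sketch with the values quoted in the post. */
void check_post_values(void)
{
    assert(ccr_stkalign_set(0x00000210u));                /* mdw readback   */
    assert(initial_sp(0x20000000u, 20u) == 0x20005000u);  /* top of 20K RAM */
    assert(initial_sp(0x20000000u, 20u) % 8u == 0u);      /* 8-byte aligned */
}
```

With these numbers both of Tauno's conditions hold, which matches the poster's conclusion that stack alignment is not the cause.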
On Thu, 22 Aug 2013 17:06:13 GMT, jackbenimble
<jackbenimble@mindyourhusiness.com> wrote:

>The stack is at the top of memory and there is
>no issue there since these parameters are passed on r0 and r1

>There is 20k worth of ram on this chip and I have set my linker script
>to
>MEMORY {
>    STM32_RAM : ORIGIN = 0x20000000, LENGTH = (20480 - 1024)
>  }
Are you linking in the GDB stub? If so, you're likely blowing the stack
and corrupting your heap ... the stub itself may use up to several KB of
stack [chip and I/O dependent].

If you're not using the stub, then I'm out - I don't work with ARM and I
haven't otherwise run into this particular GDB problem.

Good luck!
George
Once I was working on a project with an STM32F100 and had a problem where
my variables were getting corrupted randomly. After hours of stepping
through the assembly, the problem turned out to be the stack pointer
initialization. But that was not supposed to happen, as the startup code
was initializing it as it should.

The real problem was a hardware issue. In order to have flexibility, we
left the option on our board to pull one of the boot pins up or down. For
some reason, both resistors were fitted during board assembly, and that
caused problems during boot. After removing one of the resistors the
system worked as it should.

Not sure if you have this problem, but it is worth mentioning.

Regards,
On Thursday, August 22, 2013 7:01:22 AM UTC-3, jackbenimble wrote:
> So I have encountered a very odd gdb error that I cant make sense of.
> [...]
> Are you linking in the GDB stub? If so, you're likely blowing the stack
> and corrupting your heap ... the stub itself may use up to several KB of
> stack [chip and I/O dependent].
It's weird because none of the parameters are on the stack. As you know, the
ARM procedure calling convention uses r0-r3 for the first four parameters.
Somehow execution under gdb corrupts r0 and r1 (basically any parameters
passed to a function).

Here's a debugging session to highlight what I mean. The gdb (layout asm)
and stepi commands clearly show r0 and r1 being initialized correctly
before the call to tclib_printf (prologue as it were):

46        tclib_printf("\r%d", wdTemp);
|0x20000e0e <main+50>            ldr    r3, [r7, #4]
|0x20000e14 <main+56>            movw   r0, #3668       ; 0xe54
|0x20000e18 <main+60>            movt   r0, #8192       ; 0x2000
|0x20000e1c <main+64>            mov    r1, r3
|0x20000e1e <main+66>            bl     0x20000388 <tclib_printf>

Here is a disassembly of the first few lines of tclib_printf:

|0x20000388 <tclib_printf>       lsls   r1, r6, #26
|0x2000038a <tclib_printf+2>     movs   r0, #0
|0x2000038c <tclib_printf+4>     lsls   r1, r6, #26
|0x2000038e <tclib_printf+6>     movs   r0, #0
|0x20000390 <tclib_printf+8>     lsls   r1, r6, #26
|0x20000392 <tclib_printf+10>    movs   r0, #0

These bear NO RESEMBLANCE to the objdump -d disassembly of the out file.
THIS HAS ME STUMPED. I don't know how those instructions got there.
Here's the C code of the first few lines of tclib_printf and the objdump
of the .out file before loading into gdb:

void tclib_printf(char *ptrString, int wdValue)
{
    unsigned char sbString[9];
    int wdTemp;

    while ((*ptrString) != NULL)
    {
        wdTemp =*ptrString;
        switch((char)wdTemp)
        {
            case '%':
            {
                wdTemp = *(++ptrString);
                switch(wdTemp

arm-linux-gnueabi-objdump -d core_test.out |grep tclib_printf

20000388 <tclib_printf>:
20000388:       b580            push    {r7, lr}
2000038a:       b086            sub     sp, #24
2000038c:       af00            add     r7, sp, #0
2000038e:       6078            str     r0, [r7, #4]
20000390:       6039            str     r1, [r7, #0]
20000392:       e0f6            b.n     20000582 <tclib_printf+0x1fa>
20000394:       687b            ldr     r3, [r7, #4]
20000396:       781b            ldrb    r3, [r3, #0]
20000398:       60bb            str     r3, [r7, #8]
2000039a:       68bb            ldr     r3, [r7, #8]
2000039c:       b2db            uxtb    r3, r3
2000039e:       2b25            cmp     r3, #37 ; 0x25
200003a0:       d002            beq.n   200003a8 <tclib_printf+0x20>
200003a2:       2b5c            cmp     r3, #92 ; 0x5c
200003a4:       d073            beq.n   2000048e <tclib_printf+0x106>
200003a6:       e0b9            b.n     2000051c <tclib_printf+0x194>

objdump matches the C code, but somehow arm-linux-gnueabi-gdb has replaced
the instructions in the code with manipulations of r0 and r1 that clobber
their values. I can't for the life of me figure out why this is happening ...

So I decided to dump the binary values in memory after tclib_printf:

(gdb) p tclib_printf
$1 = {void (char *, int)} 0x20000388 <tclib_printf>

(gdb) monitor mdh 0x20000388 20
0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25
 d002 2b5c d073 e0b9
0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25
 d002 2b5c d073 e0b9
0x200003a8: 687b f103 0301 607b

So these start out fine!!! after the code is loaded and before gdb runs.

I set a breakpoint at line 45 again (tclib_printf), then at the breakpoint
I dump the memory again:

monitor mdh 0x20000388 10
0x20000388: 06b1 2000 06b1 2000 06b1 2000 06b1 2000 06b1 2000

AND the instructions have changed.
Now any casual observer would reach the conclusion that somehow/somewhere
after execution I am overwriting these values. But I assure you that's not
the case. I am not doing anything to clobber memory. I am almost certain of
that - prior to this has been initialization of the core. To prove it:

So I change the layout back to source, set a breakpoint at line 140 of
tclib_printf and

   |138         int wdTemp;
   |139
B+>|140         while ((*ptrString) != NULL)
   |141         {
   |142                 wdTemp =*ptrString;
   |143                 switch((char)wdTemp)
   |144                 {
   |145                         case '%':

No issues there ... but the program will segfault on invalid parameters if
I continue. So it's only the first few instructions of ANY function that
are being clobbered ...

Stumped! Never saw this when I was working with the arm7tdmi - but probably
had another version of the GNU dev tools ... currently using
gcc 4.4.5 gdb 7.0.1 gnueabi-
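The bogus disassembly and the second memory dump in the post above are two views of the same bytes: gdb is faithfully decoding the halfwords 06b1 and 2000 as Thumb instructions. The sketch below decodes just those two 16-bit Thumb encodings (shift-left by immediate and move immediate). The function names are made up for illustration, and this is not a general disassembler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode two 16-bit Thumb encodings only:
 *   00000 imm5 Rm Rd  -> lsls Rd, Rm, #imm5 (movs Rd, Rm when imm5 == 0)
 *   00100 Rd imm8     -> movs Rd, #imm8
 * Returns 1 if the opcode was recognised, 0 otherwise. */
int decode_thumb16(uint16_t op, char *buf, size_t len)
{
    unsigned top5 = (unsigned)op >> 11;

    if (top5 == 0x00u) {
        unsigned imm5 = (op >> 6) & 0x1fu;
        unsigned rm = (op >> 3) & 0x7u;
        unsigned rd = op & 0x7u;
        if (imm5 == 0u)
            snprintf(buf, len, "movs r%u, r%u", rd, rm);
        else
            snprintf(buf, len, "lsls r%u, r%u, #%u", rd, rm, imm5);
        return 1;
    }
    if (top5 == 0x04u) {
        unsigned rd = (op >> 8) & 0x7u;
        unsigned imm8 = op & 0xffu;
        snprintf(buf, len, "movs r%u, #%u", rd, imm8);
        return 1;
    }
    return 0;
}

/* Usage sketch: the two halfwords from the second "mdh" dump. */
void check_bogus_prologue(void)
{
    char text[32];

    assert(decode_thumb16(0x06b1u, text, sizeof text));
    assert(strcmp(text, "lsls r1, r6, #26") == 0);
    assert(decode_thumb16(0x2000u, text, sizeof text));
    assert(strcmp(text, "movs r0, #0") == 0);
}
```

So the debugger's view is consistent with memory at the breakpoint; what differs is memory at load time versus memory after the program has run for a while.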
On Fri, 23 Aug 2013 22:27:47 GMT, rombios <rombios@hereonearth.com>
wrote:

>> Are you linking in the GDB stub? If so, you're likely blowing the stack
>> and corrupting your heap ... the stub itself may use up to several KB of
>> stack [chip and I/O dependent].
>
>Its weird because non of the parameters are on stack. As you know the
>arm procedure calling convention uses r0-r3 for the first four parameters.
>Somehow execution under gdb corrupts r0 and r1 (basically any parameters
>passed to a function)
Sorry, I don't work with ARM. However, it's clear that R0 is being loaded with the address of the format string ... are you certain that the format string in memory is valid? More to the point, does the code work if you just run it as a release compile or as a debug compile but without using the debugger?
>So I decide to dump the binary values in memory after tclib_printf
>(gdb) p tclib_printf
>$1 = {void (char *, int)} 0x20000388 <tclib_printf>
>
>(gdb) monitor mdh 0x20000388 20
>0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25
> d002 2b5c d073 e0b9
>0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25
> d002 2b5c d073 e0b9
>0x200003a8: 687b f103 0301 607b
>
>So these start out fine!!! after the code is loaded and before gdb runs.
>
>I set a breakpoint at line 45 again (tclib_printf) then at the breakpoint
>I dump the memory again
>monitor mdh 0x20000388 10
>0x20000388: 06b1 2000 06b1 2000 06b1 2000 06b1 2000 06b1 2000
>
>
>AND the instructions have changed. Now any casual observer would reach
>the conclusion that somehow/somewhere after execution I am overwriting
>these values. But I assure thats not the case. I am not doing anything
>to clobber memory.
That you know of.

The bit of linker script you provided didn't specify stack or BSS
(uninitialized data) segments. You did mention the location of your
stack, but it's generally a good idea to explicitly define the areas you
want to use for BSS, code, heap and stack in your script.

The GDB stubs I'm familiar with [not for ARM but for other chips]
allocate a pair of large static buffers (>= 1KB each) for I/O and also
use a fair amount of stack when in operation ... up to 6KB of stack on
one platform I've used.

If you don't include space for the debugger's static buffers in your BSS
segment, then even just initializing the debugger stub may corrupt your
code. BSS data and code normally are adjacent in memory, but where each
is placed is up to the linker/loader.

Note that the compiler and/or linker will correctly size the BSS segment,
but directives in the linker script override computed values. Since you
didn't specify a BSS segment, the generated load file itself may be bad
[not corrupt per se, but lacking necessary information]. You may need to
define the BSS area and specify that it be sized using computed values
[this is toolchain dependent].

And of course, if you don't allow sufficient extra space for the stack
[or better, a separate stack if possible], using the debugger may blow
the stack and corrupt adjacent memory.

Check the linker's output map file and make sure there is no overlap
between the BSS data and code segments. Allow the program at least a few
KB of stack and then see what happens.
>... To prove it
>
>So I change the layout back to source set a breakpoint at line 140 of the
>tclib_printf and
>
>   |138         int wdTemp;
>   |139
>B+>|140         while ((*ptrString) != NULL)
>   |141         {
>   |142                 wdTemp =*ptrString;
>   |143                 switch((char)wdTemp)
>   |144                 {
>   |145                         case '%':
>
>
>No issues there ... but the program will segfault on invalid parameters if
>I continue. So its only the first few instructions of ANY function thats
>being clobbered ...
That doesn't prove anything - your disassembly showed that the code bytes
corresponding to your main() function were ok.

In any event, the C code listing will appear to be correct regardless of
whether memory has been corrupted: GDB isn't showing you a decompilation
of the code bytes in memory, it is reading from the project file(s) on
your build system. With memory corruption, a breakpoint set on the C code
may never be hit or may break into unrecognizable assembly code.

George
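George's suggestion to check the map file for overlapping areas comes down to an interval test on address ranges. The sketch below shows that test on the host. The `struct region` type and function names are invented for illustration. The RAM bounds come from the MEMORY line quoted earlier; treating the remaining 1 KB as the stack reserve is an assumption, and the `bss` range is purely hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct region {
    const char *name;
    uint32_t start;
    uint32_t end;
};

/* Nonzero when the two ranges share at least one address. */
static int regions_overlap(const struct region *a, const struct region *b)
{
    return a->start < b->end && b->start < a->end;
}

/* Usage sketch. */
void check_regions(void)
{
    /* From the quoted linker script: ORIGIN 0x20000000, LENGTH 20480 - 1024. */
    const struct region ram = { "STM32_RAM", 0x20000000u, 0x20004c00u };
    /* Assumption: the 1 KB left out of STM32_RAM is the stack reserve. */
    const struct region stack = { "stack", 0x20004c00u, 0x20005000u };
    /* Purely hypothetical data area, chosen to straddle 0x20000388. */
    const struct region bss = { "bss", 0x20000300u, 0x20000400u };
    /* First bytes of tclib_printf as shown by objdump. */
    const struct region func = { "tclib_printf", 0x20000388u, 0x200003a8u };

    assert(!regions_overlap(&ram, &stack));
    assert(regions_overlap(&bss, &func));
}
```

In practice the start and end addresses would be read from the linker's map file (flash.map in the LDFLAGS quoted earlier) rather than typed in by hand.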
Sorry for the time waste. I have found the error after all. It wasn't gdb
so much as my script file and the location of my Interrupt Vector Table.

I had a chance to revisit this with a clear head tonight and the clue
should have been apparent as the repeating sequence of 0x200006b1 which
is the value of my stm32_nvic_unknown_isr handler and my attempt to 
rebuild it in memory before changing the vector table.

Time to revisit the linker script ... 
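The clue described above can be checked mechanically: read as little-endian 32-bit words, the halfwords from the second dump are one value repeated. A minimal sketch follows, with function names invented for illustration; the dump values are copied from the earlier post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine pairs of halfwords (in ascending address order, little-endian)
 * into 32-bit words. Returns the number of words written to out. */
static size_t halfwords_to_words(const uint16_t *hw, size_t n_hw, uint32_t *out)
{
    size_t n_words = n_hw / 2;

    for (size_t i = 0; i < n_words; i++)
        out[i] = (uint32_t)hw[2 * i] | ((uint32_t)hw[2 * i + 1] << 16);
    return n_words;
}

/* Nonzero when n > 0 and every word equals the first one. */
static int all_words_equal(const uint32_t *w, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (w[i] != w[0])
            return 0;
    return n > 0;
}

/* Usage sketch with the output of "monitor mdh 0x20000388 10". */
void check_dump(void)
{
    const uint16_t dump[10] = {
        0x06b1, 0x2000, 0x06b1, 0x2000, 0x06b1,
        0x2000, 0x06b1, 0x2000, 0x06b1, 0x2000
    };
    uint32_t words[5];
    size_t n = halfwords_to_words(dump, 10, words);

    assert(n == 5);
    assert(all_words_equal(words, n));
    assert(words[0] == 0x200006b1u);
    /* An odd address in SRAM is what a Thumb vector table entry looks like. */
    assert((words[0] & 1u) == 1u);
}
```

A run of identical odd addresses is what a block of default vector table entries looks like, so finding it on top of a function prologue points at whatever code fills in the table.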
On 24.8.13 1:27 , rombios wrote:

> AND the instructions have changed. Now any casual observer would reach
> the conclusion that somehow/somewhere after execution I am overwriting
> these values. But I assure thats not the case. I am not doing anything
> to clobber memory. I am almost certain of that - prior to this has been
                          ^^^^^^^^^^^^^^
> initialization or the core. To prove it
If you're running from RAM, find the piece of code overwriting the code
with 0x200006b1, which seems to be a data pointer.

--
-Tauno
