EmbeddedRelated.com

Examining The Stack For Fun And Profit

Steve Branam, February 19, 2020

Well, maybe not so much for profit, but certainly for fun. This is a wandering journey of exploration and discovery, learning a variety of interesting and useful things.

One of the concerns with an embedded system is how much memory it needs, known as the memory footprint. This consists of the persistent storage needed for the program (i.e. the flash memory or filesystem space that stores the executable image), and the volatile storage needed to hold the data while executing over long periods of runtime (i.e. the RAM in all its flavors).

The RAM consists of 3 areas: static memory, heap, and stack. For the C language:

  • Static memory is the fixed allocation of memory used to store global variables, file-scope static variables, and function-scope static variables. These form the .bss and .data segments.
  • Heap is the dynamic memory allocated via malloc() and deallocated via free().
  • Stack is the dynamic memory allocated automatically by calling functions, consisting of stack frames for each function in the current call stack, containing function local variables and return information (i.e. the address to return to when the function returns). It can also be allocated via the alloca() function, extending the current stack frame. In either case, stack is deallocated automatically when the enclosing function returns.
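As a rough illustration of the three areas, here's a small C sketch. The function names are made up for this example. The static counter lives in .bss and keeps its value between calls. The local counter belongs to the caller's stack frame and starts over on every call. The array comes from the heap and persists until it's explicitly freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Static memory: zero-initialized, lives in .bss for the whole run. */
static int call_count;

int static_counter(void)
{
    return ++call_count;
}

/* Stack: 'local' belongs to this function's stack frame (in an
   unoptimized build; the optimizer may keep it in a register) and is
   recreated and reinitialized on every call. */
int stack_counter(void)
{
    int local = 0;
    return ++local;
}

/* Heap: the block outlives this function's stack frame; the caller
   must release it with free(). calloc() returns zero-filled memory. */
int *heap_array(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(int));
}
```

Calling static_counter() twice returns 1 and then 2, while stack_counter() returns 1 every time.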

While heap and stack are both used dynamically, their allocation pools are themselves allocated as fixed, static memory regions that have been reserved for them. The sizes of those regions are defined by the runtime environment, and in some cases can be adjusted from their defaults, particularly in embedded systems running on bare metal or under an RTOS (Real-Time Operating System).

There are two competing requirements when it comes to dynamic memory sizing. First, it's often desirable to run with the minimum amount of memory so that the product can be shipped with smaller memory chips, or smaller capacity memory built into the microcontroller, to minimize cost.

Second, the regions need to be large enough to handle the maximum dynamic allocations that will be needed for the system to run properly over long periods, for all code paths, no matter what it does.

Bad things can happen when dynamic allocations overflow, exceeding the space reserved for them. The system may crash, or it may continue to run, but with corrupted information that causes it to misbehave. Depending on what the system is controlling, this can have serious real-world consequences. A misbehaving music player may be annoying, but a misbehaving engine or factory control system can kill people.

These conditions are known as heap exhaustion and stack overflow. In order to avoid them, you need to know how much heap and stack a system will need, so a common task during embedded system development is measuring memory consumption (note that in some embedded systems, use of heap is prohibited, due to the risks of fragmentation, non-deterministic allocation time, and exhaustion).
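To make the idea of a fixed pool concrete, here's a minimal sketch of a bump allocator carved out of a static array. Some heap-averse embedded code uses a scheme like this instead of malloc(). The names pool_alloc(), pool_reset(), pool_high_water() and POOL_SIZE are hypothetical. Exhaustion shows up as a NULL return rather than as corrupted memory. The high-water mark is exactly the kind of measurement that drives sizing decisions.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE  64u   /* hypothetical pool size in bytes */
#define POOL_ALIGN 8u    /* every request is rounded up to this */

/* The pool itself is ordinary static memory, reserved at build time. */
static _Alignas(POOL_ALIGN) unsigned char pool[POOL_SIZE];
static size_t pool_used;   /* bytes handed out so far */
static size_t pool_peak;   /* largest value pool_used has reached */

void *pool_alloc(size_t size)
{
    if (size == 0 || size > POOL_SIZE)
        return NULL;

    /* Round up so every block starts on a POOL_ALIGN boundary. */
    size_t rounded = (size + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    if (rounded > POOL_SIZE - pool_used)
        return NULL;       /* pool exhausted */

    void *block = &pool[pool_used];
    pool_used += rounded;
    if (pool_used > pool_peak)
        pool_peak = pool_used;
    assert(pool_used <= POOL_SIZE);
    return block;
}

/* Releases everything at once; a bump allocator can't free single blocks. */
void pool_reset(void)
{
    pool_used = 0;
}

/* Peak usage since startup, the number you'd use to size the pool. */
size_t pool_high_water(void)
{
    return pool_peak;
}
```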

These measurements can help drive decisions about what size chips to buy and what changes to make to the software. For situations where the chip sizes have already been established by cost and hardware requirements, they can help drive decisions about whether to use the software, or find alternative software with better measurements.

Different systems have different tools for making measurements. Here, I'm using a Raspberry Pi running Raspbian Linux. These specific techniques should be applicable to any Linux platform. Other, non-Linux systems should have similar capabilities that allow generally analogous techniques.

The advantage of doing this on Linux is that it's a very easy platform to work with. There are a variety of good tools built in, and you can do native development, i.e. development and testing on the same device. That's not always true with other embedded system environments, where you have to do cross-development (i.e. the system where you do development, the development system, is completely different from the system that runs the code, the target system).

The disadvantage of doing this on Linux is that Linux has its own set of libraries and specific way of doing things, so this information doesn't always apply as directly to other systems. In particular, it's a general-purpose operating system and runtime environment, not an embedded system. So just bear that in mind when working on those other systems.

In this particular case, I found that a system was using significantly more stack in its initialization than in the rest of its operation. That's unfortunate, because it means that even though it could run for a long period with a small stack, it needed a larger stack for a brief period, which would then be wasted for the rest of the time. Sometimes you just have to live with that, but it's worth digging into to understand and see if there's anything you can do about it.

What I discovered was that the getaddrinfo() function used for setting up Linux network socket connections consumes a lot of stack, especially when you consider it's processing what's probably a short hostname string. Investigating further, I realized there was nothing I could do about it, and the Raspberry Pi has plenty of memory for stack, but it serves as a useful exercise for illustrating debugging and analysis techniques.

Start with a simple program to exercise the function. I'll run this under gdb, the GNU debugger, to examine the stack, using some gdb scripting to help. Then I'll use the information I find to examine the source code of glibc, the GNU C runtime library. Along the way, we'll learn not only about the internals of the function and what it calls, but also something about how dynamic library loading works under Linux.

Full documentation on gdb is at Debugging with GDB. But it's easiest to learn by example. Gdb commands can be abbreviated, so while I'll use some full commands, I'll use a lot of the abbreviations. I'll explain each command briefly as I use it. Hitting return at the prompt repeats or continues the last command (I've added extra blank lines in the listings below for readability, but if you see a prompt with nothing else on the line, that's where I've just hit return for the repeat/continue effect). Gdb is a fantastic tool, well worth learning, widely supported on a range of platforms. In some cases, other tools are built on top of it.

Many of these techniques work on bare-metal embedded systems as well. You can run a cross-tool version of gdb against a remote target device connected via some kind of communications channel. For instance, I do similar things on an ST Micro Nucleo board connected to my laptop via a USB cable. I run openOCD on the laptop, which communicates with the built-in ST-LINK on the Nucleo over the cable. Together, openOCD and the cable provide the communications channel for the ARM cross-tool version of gdb, also running on the laptop, to do debugging of the remote target. I'll be covering that in a later blog post.

Like the Nucleo, the Raspberry Pi processor is also an ARM chip, so it helps to know ARM assembly language and EABI (Embedded Application Binary Interface) for debugging. You don't need to be an expert in it, but you need to be able to read it at a rough level; any previous experience with assembly language helps. I found this tutorial to be just what I needed.

Here's the program (test-getaddrinfo.c), just enough to run the function under test and provide some gdb breakpoint targets (main() doesn't even need any parameters):

#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

int
main()
{
    struct addrinfo  hints;
    struct addrinfo* address_list;
    
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;
    
    int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
    return result;
}

Build it with debug symbols via the -g option:

pi@raspberrypi:~/Projects/test-getaddrinfo $ gcc test-getaddrinfo.c -o test-getaddrinfo -g

Here's a set of gdb script functions (stack_functions.gdb) for playing around with the stack. The gdb scripting language is pretty simple, and very worth knowing; here's documentation on it.

# Functions for examining and manipulating the stack in gdb.

# Script constants.
set $one_kb        = 1024.0
set $safety_margin = 16

# Raspbian Linux stack parameters.
set $stack_start   = 0x7efdf000
set $stack_end     = 0x7f000000
set $stack_size    = $stack_end - $stack_start

define stack_args
    if $argc < 2
        printf "Usage: stack_args <offset|start> <length|end>\n"
    else
        if $arg0 < $stack_start
            # Assume arg0 is a relative offset from start of stack.
            set $offset = (int)$arg0
        else
            # Assume arg0 is an absolute address, so compute its offset.
            set $offset = (int)$arg0 - $stack_start
        end
        
        if $arg1 < $stack_start
            # Assume arg1 is a relative length.
            set $length = (int)$arg1
        else
            # Assume arg1 is an absolute address, so compute its length.
            set $length = (int)$arg1 - $stack_start - $offset
        end
    end
end

document stack_args
Usage: stack_args <offset|start> <length|end>

Set stack region offset and length from arguments.
end

define dump_stack
    if $argc < 2
        printf "Usage: dump_stack <offset|start> <length|end>\n"
    else
        stack_args $arg0 $arg1
        
        set $i = 0
        while $i < $length
            set $addr = $stack_start + $offset + $i
            x/4wx $addr
            set $i = $i + 16
        end
    end
end

document dump_stack
Usage: dump_stack <offset|start> <length|end>

Dumps stack starting at <offset|start> bytes, 4 longwords at a time,
for <length|end> bytes.
end

define clear_stack
    if $argc < 2
        printf "Usage: clear_stack <offset|start> <length|end>\n"
    else
        stack_args $arg0 $arg1
        
        if $stack_start + $offset + $safety_margin >= $sp
            printf "Error: start is in active stack.\n"
        else
            if $stack_start + $offset + $length + $safety_margin >= $sp
                printf "Error: end is in active stack.\n"
            else
                set $i = 0
                while $i < $length
                    set $addr = $stack_start + $offset + $i
                    set *((int *) $addr) = 0
                    set $i = $i + 4
                    
                    # Takes a while, so give some feedback.
                    if $i % 10000 == 0
                        printf "Cleared %d\n", $i
                    end
                end
            end
        end
    end
end

document clear_stack
Usage: clear_stack <offset|start> <length|end>

Clears stack starting at <offset|start> bytes, one longword at a time,
for <length|end> bytes.
end

define stack_offset
    if $argc < 1
        printf "Usage: stack_offset <address>\n"
    else
        # Cast to int is needed to set $depth when $arg0 is $sp.
        set $addr   = (int)$arg0
        set $offset = $addr - $stack_start
        set $depth  = $stack_end - $addr
        
        printf "Address    %10d = 0x%08x\n", $addr, $addr
        
        if $addr < $stack_start || $addr >= $stack_end
            printf "Warning: address is not in stack.\n"
        end
        
        printf "Stack size   %6d = 0x%05x = %5.1fKB, 0x%x-0x%x\n", $stack_size, $stack_size, $stack_size / $one_kb, $stack_start, $stack_end
        printf "Stack offset %6d = 0x%05x = %5.1fKB\n", $offset, $offset, $offset / $one_kb
        printf "Stack depth  %6d = 0x%05x = %5.1fKB\n", $depth, $depth, $depth / $one_kb
    end
end

document stack_offset
Usage: stack_offset <address>

Shows stack offset and depth represented by address.
end

define scan_stack
    if $argc < 2
        printf "Usage: scan_stack <offset|start> <length|end>\n"
    else
        stack_args $arg0 $arg1
        
        set $addr = $stack_start + $offset
        set $i    = 0
        while $i < $length && *((int *) $addr) == 0
            set $addr = $stack_start + $offset + $i
            set $i = $i + 4
            
            # Takes a while, so give some feedback.
            if $i % 10000 == 0
                printf "Scanned %d\n", $i
            end
        end

        if *((int *) $addr) != 0
            if $addr < $sp
                set $offset = $sp - $addr
                printf "Found data %d bytes deeper than current stack frame (0x%x).\n", $offset, $sp
            else
                printf "Stack is clear up to current stack frame (0x%x), it is deepest stack usage.\n", $sp
            end
            
            stack_offset $addr
            dump_stack $addr-$stack_start 64
        else
            printf "Stack is clear in requested range.\n"
        end
    end
end

document scan_stack
Usage: scan_stack <offset|start> <length|end>

Scans stack for non-zero contents starting at <offset|start> bytes, one
longword at a time, for <length|end> bytes.
end

define stack_walk
    set $first_sp = $sp
    set $last_sp  = $sp
    set $total    = 0
    frame
    printf "Top stack frame 0x%08x\n\n", $last_sp
    
    # Loop will error out gracefully when there are no more frames.
    while 1
        up
        set $delta   = $sp - $last_sp
        set $total   = $total + $delta
        printf "Last stack frame 0x%08x, current 0x%08x, size of last %4d = 0x%03x, total deeper %6d = 0x%05x = %5.1fKB\n\n", $last_sp, $sp, $delta, $delta, $total, $total, $total / $one_kb
        set $last_sp = $sp
    end
end

document stack_walk
Usage: stack_walk

Walks stack frames upward from currently selected frame and computes
incremental and cumulative size of frames, so that stack consumption
can be attributed to specific functions.

Use "f 0" to select deepest frame of call stack, or "f <n>" to select
frame <n> higher up in stack.
end

How do I know where the stack boundaries are for the $stack_start and $stack_end variables? On Linux, the file /proc/<pid>/maps lists the addresses for the various memory sections (the proc file system is actually a pseudofile system that acts as the user interface to kernel data). You can grep for "stack" to see the address range. This information is also available in a gdb session for a running program with the info proc map command. Since Linux uses virtual memory, each process gets its own address space. Because gdb disables address space layout randomization (ASLR) by default, the stack lands at the same addresses on every run under gdb. Outside gdb, ASLR normally moves it from run to run.
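The same lookup can be done from inside a program. Here's a minimal Linux-only sketch that reads /proc/self/maps and pulls out the range on the [stack] line, which describes the main thread's stack. find_stack_range() is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Finds the [stack] mapping of the calling process.
   Returns 0 and fills *start / *end on success, -1 otherwise. */
int find_stack_range(unsigned long *start, unsigned long *end)
{
    assert(start != NULL && end != NULL);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    /* Each line looks like: start-end perms offset dev inode [pathname] */
    char line[4096];
    int found = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[stack]") != NULL &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 0;
            break;
        }
    }

    fclose(maps);
    return found;
}
```

Printing the result with printf("0x%lx-0x%lx\n", start, end) should match the [stack] line that info proc map shows for the same process.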

This type of thing is very system-specific, so a different Linux platform might use different addresses. For a bare-metal system such as the Nucleo board, and possibly for an RTOS, these addresses would be found in the linker control script (.ld file).

The Linux stack grows downward, from end to start (i.e. from higher addresses to lower addresses). The amount of stack in use, measured down from the end, is known as its depth. It consists of a series of stack frames, one per function call in the current chain of calls (think of a stack of plates building up, but using frames instead of plates). Each frame consists of all the temporary storage that a function needs. This includes saved processor registers that need to be preserved across calls, and any local variables. In some cases, function parameters may be passed via the stack, but the ARM EABI passes the first few arguments (up to four 32-bit words) in registers R0-R3, and only the rest go on the stack.
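A quick way to confirm the growth direction on a given platform is to compare the address of a local variable in a caller with one in a function it calls. This sketch relies on the GCC/Clang noinline attribute to keep the callee in its own frame. The C standard itself says nothing about stack direction. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* noinline (a GCC/Clang extension) keeps this a real call with its
   own stack frame instead of being merged into the caller. */
__attribute__((noinline))
static void record_local_address(uintptr_t *out)
{
    volatile char local = 0;   /* volatile + address taken: lives in memory */
    *out = (uintptr_t)&local;
}

/* Returns 1 if the callee's frame sits at a lower address than the
   caller's, i.e. the stack grows downward. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    uintptr_t callee_addr = 0;

    record_local_address(&callee_addr);
    assert(callee_addr != 0);
    return callee_addr < (uintptr_t)&caller_local;
}
```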

The stack is created as zero-filled memory at process creation. The fact that it's initialized to known values makes it easy to find the deepest point of consumption by searching for the first non-zero location.
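The same search is easy to express in C. This sketch scans a region from its low end, where an unused downward-growing stack is still untouched, and reports how many bytes from the top have been written. stack_high_water() is a hypothetical name. The sketch takes the fill value as a parameter, because some systems paint their stacks with a nonzero pattern instead of relying on zero-filled memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the peak usage, in bytes, of a downward-growing stack region
   of 'words' 32-bit words whose never-used words still hold 'fill'. */
size_t stack_high_water(const uint32_t *region, size_t words, uint32_t fill)
{
    assert(region != NULL || words == 0);

    size_t i = 0;
    /* Untouched words sit at the low (deep) end of the region. */
    while (i < words && region[i] == fill)
        i++;

    /* Everything from the first changed word up to the top counts as used. */
    return (words - i) * sizeof(uint32_t);
}
```

Like scan_stack, it can under-report if the deepest word ever written happened to hold the fill value.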

Two registers are important for tracking the stack, SP (Stack Pointer) and FP (Frame Pointer). The FP is actually R11. Gdb identifies these symbolically as $sp and $r11.

Pushing data onto the stack and popping data off it automatically changes the SP. Offset values can also be subtracted from the SP and added to it to bulk-allocate and deallocate space.

Here's an enormously important note about memory allocated by subtracting from the SP: this does not change the values of the memory locations in the allocated space. It simply moves the stack boundary to include them, and the memory has whatever values were previously stored there. Thus, the variables or data structures this space maps to in the program are uninitialized. That's why you have to assign values to your local variables in some way before you read them. Otherwise you read random, unknown data left there by whoever wrote to those locations last. This is a common source of bugs.

At certain points in a function, the FP is set from the SP to mark the frame. The actual mechanics of how and when that is done depend on the compiler and the ABI conventions it follows.

To analyze the stack usage, start the program under gdb and pull in the stack functions (the -q option here is quiet mode to suppress boilerplate startup messages):

pi@raspberrypi:~/Projects/test-getaddrinfo $ gdb -q ./test-getaddrinfo 
Reading symbols from ./test-getaddrinfo...done.

(gdb) source stack_functions.gdb

The program isn't actually running yet. List the program source and set breakpoints on the main() function and the line containing the final return statement:

(gdb) list
1 #include <sys/socket.h>
2 #include <netdb.h>
3 #include <string.h>
4 
5 int
6 main()
7 {
8     struct addrinfo  hints;
9     struct addrinfo* address_list;
10 
(gdb) 
11     memset(&hints, 0, sizeof(hints));
12     hints.ai_family   = AF_UNSPEC;
13     hints.ai_socktype = SOCK_STREAM;
14     hints.ai_protocol = IPPROTO_TCP;
15 
16     int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
17     return result;
18 }

(gdb) b main
Breakpoint 1 at 0x10480: file test-getaddrinfo.c, line 11.

(gdb) b 17
Breakpoint 2 at 0x104c4: file test-getaddrinfo.c, line 17.

Run the program. When it stops at the first breakpoint, the first executable line of the main() function, show the process memory map to verify the stack addresses (look for the [stack] line in the command output):

(gdb) r
Starting program: /home/pi/Projects/test-getaddrinfo/test-getaddrinfo 

Breakpoint 1, main () at test-getaddrinfo.c:11
11     memset(&hints, 0, sizeof(hints));

(gdb) info proc map
process 10163
Mapped address spaces:
 Start Addr   End Addr       Size     Offset objfile
    0x10000    0x11000     0x1000        0x0 /home/pi/Projects/test-getaddrinfo/test-getaddrinfo
    0x20000    0x21000     0x1000        0x0 /home/pi/Projects/test-getaddrinfo/test-getaddrinfo
    0x21000    0x22000     0x1000     0x1000 /home/pi/Projects/test-getaddrinfo/test-getaddrinfo
 0x76e64000 0x76f8e000   0x12a000        0x0 /lib/arm-linux-gnueabihf/libc-2.24.so
 0x76f8e000 0x76f9d000     0xf000   0x12a000 /lib/arm-linux-gnueabihf/libc-2.24.so
 0x76f9d000 0x76f9f000     0x2000   0x129000 /lib/arm-linux-gnueabihf/libc-2.24.so
 0x76f9f000 0x76fa0000     0x1000   0x12b000 /lib/arm-linux-gnueabihf/libc-2.24.so
 0x76fa0000 0x76fa3000     0x3000        0x0 
 0x76fb8000 0x76fbd000     0x5000        0x0 /usr/lib/arm-linux-gnueabihf/libarmmem.so
 0x76fbd000 0x76fcc000     0xf000     0x5000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
 0x76fcc000 0x76fcd000     0x1000     0x4000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
 0x76fcd000 0x76fce000     0x1000     0x5000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
 0x76fce000 0x76fef000    0x21000        0x0 /lib/arm-linux-gnueabihf/ld-2.24.so
 0x76ff9000 0x76ffb000     0x2000        0x0 
 0x76ffb000 0x76ffc000     0x1000        0x0 [sigpage]
 0x76ffc000 0x76ffd000     0x1000        0x0 [vvar]
 0x76ffd000 0x76ffe000     0x1000        0x0 [vdso]
 0x76ffe000 0x76fff000     0x1000    0x20000 /lib/arm-linux-gnueabihf/ld-2.24.so
 0x76fff000 0x77000000     0x1000    0x21000 /lib/arm-linux-gnueabihf/ld-2.24.so
 0x7efdf000 0x7f000000    0x21000        0x0 [stack]
 0xffff0000 0xffff1000     0x1000        0x0 [vectors]

Those are the addresses I used in the stack functions. Scan the stack from its zero offset for its full length to find the first non-zero value (the stack functions accept either offset values from start of stack or absolute memory addresses):

(gdb) scan_stack 0 $stack_size
Scanned 10000
Scanned 20000
Scanned 30000
Scanned 40000
Scanned 50000
Scanned 60000
Scanned 70000
Scanned 80000
Scanned 90000
Scanned 100000
Scanned 110000
Scanned 120000
Found data 4660 bytes deeper than current stack frame (0x7effeeb0).
Address    2130697340 = 0x7effdc7c
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 126076 = 0x1ec7c = 123.1KB
Stack depth    9092 = 0x02384 =   8.9KB
0x7effdc7c: 0x00000020 0x00002e41 0x61656100 0x01006962
0x7effdc8c: 0x00000024 0x06003605 0x09010806 0x12020a01
0x7effdc9c: 0x14011304 0x16011501 0x18031701 0x1c021a01
0x7effdcac: 0x00012201 0x00000000 0x7effe8f4 0x00000000

The scan found a non-zero 32-bit word at 4660 bytes deeper into the stack than the current stack frame. It prints out the stack size and addresses, the offset of the word, and the total depth that offset represents. Then it dumps the contents of that memory for 16 words. Note that one of the words contains a value that is itself a stack address (i.e. is in the range of the stack addresses 0x7efdf000-0x7f000000).
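The numbers in that output are plain address arithmetic against the stack bounds from info proc map. Here's a sketch in C that reproduces them. The bounds are hard-coded from this particular Raspberry Pi run, an assumption that won't hold on other systems. stack_position() and bytes_deeper() are made-up names that mirror the stack_offset and scan_stack gdb functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bounds of [stack] from "info proc map" on this Raspberry Pi. */
#define STACK_START 0x7efdf000u
#define STACK_END   0x7f000000u

typedef struct {
    uint32_t offset;   /* bytes above STACK_START (as in stack_offset) */
    uint32_t depth;    /* bytes below STACK_END, i.e. how deep it reaches */
    bool     in_stack; /* false if addr is outside [STACK_START, STACK_END) */
} stack_pos;

stack_pos stack_position(uint32_t addr)
{
    stack_pos p;
    p.offset   = addr - STACK_START;
    p.depth    = STACK_END - addr;
    p.in_stack = addr >= STACK_START && addr < STACK_END;
    return p;
}

/* How far below the current SP a written word lies (scan_stack's
   "bytes deeper than current stack frame"). */
uint32_t bytes_deeper(uint32_t sp, uint32_t addr)
{
    assert(addr <= sp);
    return sp - addr;
}
```

For example, stack_position(0x7effdc7c) gives an offset of 126076 and a depth of 9092, and bytes_deeper(0x7effeeb0, 0x7effdc7c) gives 4660, matching the gdb output above.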

You might be wondering why there's stuff on the stack deeper than the current frame, when the program hasn't even begun the body of main() yet. That's the pre-main code at work. Every system has some startup code that runs before your actual program code. It could be doing library initialization, data initialization (for instance, copying the contents of the .data segment from the executable image into the static memory space set aside for initialized global and static variables), setting up registers, etc. Again, this is very system-dependent.

So where is the current stack frame? Look at the backtrace, which is the list of stack frames for all the functions currently in the call tree up to the breakpoint, and look at the stack offsets represented by the current SP and FP:

(gdb) ba
#0  main () at test-getaddrinfo.c:11

(gdb) stack_offset $sp
Address    2130702000 = 0x7effeeb0
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130736 = 0x1feb0 = 127.7KB
Stack depth    4432 = 0x01150 =   4.3KB

(gdb) stack_offset $r11
Address    2130702044 = 0x7effeedc
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130780 = 0x1fedc = 127.7KB
Stack depth    4388 = 0x01124 =   4.3KB

The backtrace is really short, just main(), with no indication of who called it (though we know that something in the pre-main code got us this far). The SP and FP differ by 44 bytes (0x7effeedc - 0x7effeeb0 = 0x2c), indicating the stack frame contains a small amount of data. Overall, the stack depth is 4.3KB, consisting of the main() stack frame plus whatever the pre-main startup code and process creation put on the stack above it.

Look at the actual assembly code that the compiler generated for main(). Since it exists as binary machine code in memory, we need to disassemble it back to assembly language (gdb doesn't support decompiling assembly language back to the original C source code, but it does know which lines of source code represent which ranges of assembly instructions via the debug symbols included by the gcc -g option):

(gdb) disassemble 
Dump of assembler code for function main:
   0x00010474 <+0>: push {r11, lr}
   0x00010478 <+4>: add r11, sp, #4
   0x0001047c <+8>: sub sp, sp, #40 ; 0x28
=> 0x00010480 <+12>: sub r3, r11, #40 ; 0x28
   0x00010484 <+16>: mov r2, #32
   0x00010488 <+20>: mov r1, #0
   0x0001048c <+24>: mov r0, r3
   0x00010490 <+28>: bl 0x10328 <memset@plt>
   0x00010494 <+32>: mov r3, #0
   0x00010498 <+36>: str r3, [r11, #-36] ; 0xffffffdc
   0x0001049c <+40>: mov r3, #1
   0x000104a0 <+44>: str r3, [r11, #-32] ; 0xffffffe0
   0x000104a4 <+48>: mov r3, #6
   0x000104a8 <+52>: str r3, [r11, #-28] ; 0xffffffe4
   0x000104ac <+56>: sub r3, r11, #44 ; 0x2c
   0x000104b0 <+60>: sub r2, r11, #40 ; 0x28
   0x000104b4 <+64>: ldr r1, [pc, #24] ; 0x104d4 <main+96>
   0x000104b8 <+68>: ldr r0, [pc, #24] ; 0x104d8 <main+100>
   0x000104bc <+72>: bl 0x10334 <getaddrinfo@plt>
   0x000104c0 <+76>: str r0, [r11, #-8]
   0x000104c4 <+80>: ldr r3, [r11, #-8]
   0x000104c8 <+84>: mov r0, r3
   0x000104cc <+88>: sub sp, r11, #4
   0x000104d0 <+92>: pop {r11, pc}
   0x000104d4 <+96>: andeq r0, r1, r12, asr #10
   0x000104d8 <+100>: andeq r0, r1, r0, asr r5
End of assembler dump.
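The two ldr instructions at main+64 and main+68 fetch their operands PC-relative. In ARM (not Thumb) state, reading the PC yields the address of the current instruction plus 8, so the effective address is the instruction address + 8 + the offset. This small C sketch reproduces the addresses gdb annotates in the comments, 0x104d4 <main+96> and 0x104d8 <main+100>. arm_pc_relative_address() is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an ARM-state "ldr rX, [pc, #offset]".
   Reading PC in ARM state gives the instruction's own address + 8. */
uint32_t arm_pc_relative_address(uint32_t insn_addr, int32_t offset)
{
    assert(insn_addr % 4 == 0);   /* ARM instructions are word-aligned */
    return insn_addr + 8u + (uint32_t)offset;
}
```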

Every function consists of a prologue, a body, and an epilogue. The prologue and epilogue are all the automatic code that the compiler generates to enter and exit the function, according to the EABI. The body is the code the compiler generates to implement the logic of the function, according to the C language statements.

The specific details of these vary a bit from function to function, but in general:

  • The prologue saves off registers that need to be preserved while they get reused by the function and sets up space for local variables.
  • The epilogue sets up the function return value, restores saved registers, and deallocates local variables.

When source code is available to gdb, the disassemble command takes option /s to intermingle source and assembly. This makes it easier to see the distinct parts of the function, and is a useful way to learn how the compiler translates C source constructs to assembly; it gets even more interesting with optimized code.

(gdb) disassemble /s
Dump of assembler code for function main:
test-getaddrinfo.c:
7 {
   0x00010474 <+0>: push {r11, lr}
   0x00010478 <+4>: add r11, sp, #4
   0x0001047c <+8>: sub sp, sp, #40 ; 0x28
8     struct addrinfo  hints;
9     struct addrinfo* address_list;

10 
11     memset(&hints, 0, sizeof(hints));
   0x00010480 <+12>: sub r3, r11, #40 ; 0x28
   0x00010484 <+16>: mov r2, #32
   0x00010488 <+20>: mov r1, #0
   0x0001048c <+24>: mov r0, r3
   0x00010490 <+28>: bl 0x10328 <memset@plt>

12     hints.ai_family   = AF_UNSPEC;
   0x00010494 <+32>: mov r3, #0
   0x00010498 <+36>: str r3, [r11, #-36] ; 0xffffffdc

13     hints.ai_socktype = SOCK_STREAM;
   0x0001049c <+40>: mov r3, #1
   0x000104a0 <+44>: str r3, [r11, #-32] ; 0xffffffe0

14     hints.ai_protocol = IPPROTO_TCP;
   0x000104a4 <+48>: mov r3, #6
   0x000104a8 <+52>: str r3, [r11, #-28] ; 0xffffffe4

15 
16     int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
   0x000104ac <+56>: sub r3, r11, #44 ; 0x2c
   0x000104b0 <+60>: sub r2, r11, #40 ; 0x28
   0x000104b4 <+64>: ldr r1, [pc, #24] ; 0x104d4 <main+96>
   0x000104b8 <+68>: ldr r0, [pc, #24] ; 0x104d8 <main+100>
   0x000104bc <+72>: bl 0x10334 <getaddrinfo@plt>
=> 0x000104c0 <+76>: str r0, [r11, #-8]

17     return result;
   0x000104c4 <+80>: ldr r3, [r11, #-8]

18 }
   0x000104c8 <+84>: mov r0, r3
   0x000104cc <+88>: sub sp, r11, #4
   0x000104d0 <+92>: pop {r11, pc}
   0x000104d4 <+96>: andeq r0, r1, r12, asr #10
   0x000104d8 <+100>: andeq r0, r1, r0, asr r5
End of assembler dump.

Here's the prologue:

   0x00010474 <+0>: push {r11, lr}
   0x00010478 <+4>: add r11, sp, #4
   0x0001047c <+8>: sub sp, sp, #40 ; 0x28

This pushes the FP and the LR (Link Register), which contains the caller's return address, onto the stack. It then sets the FP to SP + 4, so the FP points at the saved LR and marks the base of the new frame. Finally, it subtracts 40 bytes from the SP to reserve space for the local variables.

Here's the epilogue:

   0x000104c8 <+84>: mov r0, r3
   0x000104cc <+88>: sub sp, r11, #4
   0x000104d0 <+92>: pop {r11, pc}
   0x000104d4 <+96>: andeq r0, r1, r12, asr #10
   0x000104d8 <+100>: andeq r0, r1, r0, asr r5

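This sets up the return value in R0 as specified by the EABI, subtracts 4 bytes from the FP and stores the result in the SP, then pops the saved FP and LR into R11 and the PC. The two andeq lines that follow aren't part of the epilogue at all. They're a literal pool: data words holding the addresses of the "80" and "test.example.com" strings, which the ldr r1, [pc, #24] and ldr r0, [pc, #24] instructions in the body load as arguments for getaddrinfo(). Gdb's disassembler just decodes those data words as if they were instructions.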

But there's some subtlety to this. What about the 40 bytes that were subtracted from the SP in the prologue? And why is the LR popped back into the PC?

The FP actually contains the SP value from before the 40 bytes were subtracted, plus 4. That marks the boundary of the frame, so just subtracting 4 from it is sufficient to restore the SP to its value right after the push, pointing to the saved FP and LR.

By popping the saved LR directly into the PC, the processor automatically resumes executing at that saved address. It's equivalent to popping the saved value into the LR, then jumping to the address in the LR.
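To check the bookkeeping, here's a toy C model of main()'s prologue and epilogue. The cpu struct, the memory layout, and the function names are all invented for illustration. It models only these few instructions, on a full-descending stack of 32-bit words. After the prologue, FP - SP is 44 bytes, the same gap as between $r11 (0x7effeedc) and $sp (0x7effeeb0) at the first breakpoint, and the frame occupies 48 bytes in total. The epilogue restores SP and FP exactly and leaves the saved LR in the PC.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64   /* size of the simulated stack, in 32-bit words */

/* Toy CPU state: just the registers the prologue/epilogue touch. */
typedef struct {
    uint32_t mem[MEM_WORDS]; /* simulated memory; mem[0] is at address base */
    uint32_t base;
    uint32_t sp, r11, lr, pc;
} cpu;

static uint32_t *word_at(cpu *c, uint32_t addr)
{
    assert(addr >= c->base && addr < c->base + 4u * MEM_WORDS);
    assert(addr % 4 == 0);
    return &c->mem[(addr - c->base) / 4];
}

/* push {r11, lr}: full-descending store, lower register at lower address. */
static void push_r11_lr(cpu *c)
{
    c->sp -= 8;
    *word_at(c, c->sp)     = c->r11;
    *word_at(c, c->sp + 4) = c->lr;
}

/* pop {r11, pc}: the saved LR goes straight into the PC. */
static void pop_r11_pc(cpu *c)
{
    c->r11 = *word_at(c, c->sp);
    c->pc  = *word_at(c, c->sp + 4);
    c->sp += 8;
}

/* main()'s prologue from the disassembly above. */
void run_prologue(cpu *c)
{
    push_r11_lr(c);          /* push {r11, lr}   */
    c->r11 = c->sp + 4;      /* add r11, sp, #4  */
    c->sp -= 40;             /* sub sp, sp, #40  */
}

/* main()'s epilogue (setting the return value in r0 is not modeled). */
void run_epilogue(cpu *c)
{
    c->sp = c->r11 - 4;      /* sub sp, r11, #4  */
    pop_r11_pc(c);           /* pop {r11, pc}    */
}
```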

These are the kinds of things that the compiler does to generate efficient code. But note that this is unoptimized code, that is, code that directly implements the C statements. It's not always obvious that the assembly instructions are a direct implementation, because sometimes they produce side effects in registers that subsequent statements then use.

But the compiler has many more tricks up its sleeve. You can have it optimize for speed or for space, which causes it to implement the C code in slightly different ways. There are always multiple ways of accomplishing things, with various tradeoffs. Optimization affects those tradeoffs to achieve a particular goal while still following the C source code logic.

By default, gcc generates unoptimized code, because that allows you to track execution directly in gdb. Optimized code can do weird things from a debugging standpoint, making debugging more difficult. It's still quite possible, it's just more complex.

Look in the body of the function, the code between the prologue and epilogue. Note the arrow pointing to the instruction at offset +12. That's where the current breakpoint is holding execution. Look at the bl instructions:

...
   0x00010490 <+28>: bl 0x10328 <memset@plt>
...
   0x000104bc <+72>: bl 0x10334 <getaddrinfo@plt>

Those are the function calls that main() makes. These are branch-and-link instructions: branch to the named function and link, i.e. save the return address (the address of the instruction following the bl) in the LR.

The function names match the C source code. But what's that @plt stuff? That's the Procedure Linkage Table, part of library loading. We'll look at PLT trampolines in a bit. You didn't realize this included gymnastics, did you?

But for now we know about the stack as it relates to main(). Continue until the next breakpoint, right before main() exits:

(gdb) c
Continuing.

Breakpoint 2, main () at test-getaddrinfo.c:17
17     return result;

That means the program has called the memset() and getaddrinfo() functions, doing whatever stack manipulation they needed, and returned back to this line, where main() is about to return its result, ending the program.

What does the stack look like now?

(gdb) stack_offset $sp
Address    2130702000 = 0x7effeeb0
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130736 = 0x1feb0 = 127.7KB
Stack depth    4432 = 0x01150 =   4.3KB

(gdb) stack_offset $r11
Address    2130702044 = 0x7effeedc
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130780 = 0x1fedc = 127.7KB
Stack depth    4388 = 0x01124 =   4.3KB

SP and FP look like they did before. That confirms that whatever happened to the stack, it's returned to the state that main() expects; it's maintained the context of main(). What does a scan show?

(gdb) scan_stack 0 $stack_size
Scanned 10000
Scanned 20000
Scanned 30000
Scanned 40000
Scanned 50000
Scanned 60000
Scanned 70000
Scanned 80000
Scanned 90000
Scanned 100000
Scanned 110000
Found data 11648 bytes deeper than current stack frame (0x7effeeb0).
Address    2130690352 = 0x7effc130
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 119088 = 0x1d130 = 116.3KB
Stack depth   16080 = 0x03ed0 =  15.7KB
0x7effc130: 0x76ff94b0 0x7effc1a8 0x76e66c28 0x000004b0
0x7effc140: 0x7effc1ac 0x76fd8548 0x00000001 0x76e6c754
0x7effc150: 0x000004b0 0x76e70804 0x76ff94b0 0x7effc1ac
0x7effc160: 0x7effc1a8 0x00000000 0x76ffecf0 0x76e70804

Something went a lot deeper into the stack. There's data 11,648 bytes deeper in it, for a maximum depth of 15.7KB.
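The scan relies on unused stack still being zero, which is what Linux gives a process for freshly mapped stack pages. On a bare-metal or RTOS target you would typically arrange the same thing yourself by "painting" the stack with a known pattern at startup, then later searching for the deepest word that no longer matches. Here's a minimal C sketch of that idea. The function names and the 0xA5A5A5A5 pattern are my own choices for illustration (not the article's gdb scripts), and it assumes a downward-growing stack whose lowest address is the deepest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers illustrating stack "painting" for high-water-mark
   measurement. The stack region is words[0..nwords), with words[0] at the
   lowest (deepest) address and words[nwords - 1] next to the top, since
   the stack grows downward. */

#define STACK_FILL 0xA5A5A5A5u  /* assumed fill pattern */

/* Fill the whole region before the stack is used (e.g. at task creation). */
static void stack_paint(uint32_t *words, size_t nwords, uint32_t fill) {
    for (size_t i = 0; i < nwords; i++)
        words[i] = fill;
}

/* Scan from the deepest end toward the top; the first word that no longer
   holds the fill pattern marks the deepest point ever written. Returns the
   maximum depth in bytes, measured from the top of the region. */
static size_t stack_high_water(const uint32_t *words, size_t nwords,
                               uint32_t fill) {
    assert(words != NULL);
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] != fill)
            return (nwords - i) * sizeof words[0];
    }
    return 0;  /* nothing below the top was ever overwritten */
}
```

A word that happens to be written with the fill value itself goes unnoticed, so the result is a lower bound. The gdb scan has the same limitation: any stack slot that was only ever written with zero looks untouched.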

How can we find out who did that? The answer is to set a watchpoint, which is essentially a data breakpoint. The existing breakpoints are execution breakpoints, where gdb interrupts the program when it executes a particular address. For a data breakpoint, i.e. a watchpoint, gdb interrupts the program when it writes a new value to a particular address; it watches to see when the address gets written.

Restart the program. When it breaks at main(), set a watchpoint on the stack address that the scan found, casting it as a pointer to an int (working with watchpoints can be a bit finicky, so make sure these steps are in exactly this order, with this exact syntax):

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/pi/Projects/test-getaddrinfo/test-getaddrinfo 

Breakpoint 1, main () at test-getaddrinfo.c:11
11     memset(&hints, 0, sizeof(hints));

(gdb) watch *(int*)0x7effc130
Hardware watchpoint 3: *(int*)0x7effc130

Continue from there, and BAM! Caught the writer in the act:

(gdb) c
Continuing.

Hardware watchpoint 3: *(int*)0x7effc130

Old value = 0
New value = 1996461232
check_match (undef_name=undef_name@entry=0x76df8116 "strcasecmp", ref=0x76df775c, ref@entry=0x59c2869, version=0x22e80, version@entry=0x76fffabc, flags=1, flags@entry=2, type_class=type_class@entry=1, sym=0x76e6c754, 
    sym@entry=0x770037f0, symidx=symidx@entry=1200, strtab=0x76e70804 "", strtab@entry=0x0, map=map@entry=0x76ff94b0, versioned_sym=versioned_sym@entry=0x7effc1ac, num_versions=num_versions@entry=0x7effc1a8) at dl-lookup.c:92
92 dl-lookup.c: No such file or directory.

What is the program doing at this point? The backtrace will show that. It's quite a bit longer than the previous backtrace, with a deep call stack:

(gdb) ba
#0  check_match (undef_name=undef_name@entry=0x76df8116 "strcasecmp", ref=0x76df775c, ref@entry=0x59c2869, version=0x22e80, version@entry=0x76fffabc, flags=1, flags@entry=2, type_class=type_class@entry=1, sym=0x76e6c754, 
    sym@entry=0x770037f0, symidx=symidx@entry=1200, strtab=0x76e70804 "", strtab@entry=0x0, map=map@entry=0x76ff94b0, versioned_sym=versioned_sym@entry=0x7effc1ac, num_versions=num_versions@entry=0x7effc1a8) at dl-lookup.c:92
#1  0x76fd8548 in do_lookup_x (undef_name=0xb3850d3a <error: Cannot access memory at address 0xb3850d3a>, undef_name@entry=0x76df8116 "strcasecmp", new_hash=1994852356, new_hash@entry=3011841338, old_hash=0x76fec84c, 
    old_hash@entry=0x7effc218, ref=0x59c2869, result=<optimized out>, result@entry=0x7effc220, scope=0x76fffabc, i=<optimized out>, version=<optimized out>, version@entry=0x22e80, flags=flags@entry=1, skip=skip@entry=0x0, 
    type_class=type_class@entry=1, undef_map=undef_map@entry=0x22ac0) at dl-lookup.c:423
#2  0x76fd8b20 in _dl_lookup_symbol_x (undef_name=0x76df8116 "strcasecmp", undef_map=0x22ac0, ref=0x7effc28c, ref@entry=0x7effc284, symbol_scope=0x22c78, version=0x22e80, type_class=type_class@entry=1, flags=1, 
    skip_map=skip_map@entry=0x0) at dl-lookup.c:833
#3  0x76fde10c in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at dl-runtime.c:111
#4  0x76fe5320 in _dl_runtime_resolve () at ../sysdeps/arm/dl-trampoline.S:57
#5  0x76e05eec in __GI_ns_samename (a=a@entry=0x7effcaf8 "test.example.com", b=0x7effcf38 "test.example.com", b@entry=0x402 <error: Cannot access memory at address 0x402>) at ns_samedomain.c:196
#6  0x76dff850 in __GI___res_nameinquery (name=0x402 <error: Cannot access memory at address 0x402>, name@entry=0x0, type=1, class=1, buf=buf@entry=0x7effe088 "~\027\201\203", 
    eom=eom@entry=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:287
#7  0x76dff984 in __GI___res_queriesmatch (buf1=0x7effd4e0 "~\027\001", buf1@entry=0x7effd40c "n", eom1=0x7effd502 "", eom1@entry=0x7effd40c "n", buf2=0x7effe088 "~\027\201\203", eom2=0x7effe888 "_nss_dns_get\210\340\377~")
    at res_send.c:342
#8  0x76e00578 in send_dg (ansp2_malloced=<optimized out>, resplen2=<optimized out>, anssizp2=<optimized out>, ansp2=<optimized out>, anscp=<optimized out>, gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, 
    ns=0, terrno=0x7effe088, anssizp=0x7effd4c0, ansp=0x7effd3fc, buflen2=<optimized out>, buf2=<optimized out>, buflen=<optimized out>, buf=<optimized out>, statp=0x7effd420) at res_send.c:1422
#9  __libc_res_nsend (statp=statp@entry=0x76fa1b50 <_res>, buf=0x7effd40c "n", buf@entry=0x7effd4e0 "~\027\001", buflen=0, buflen@entry=34, buf2=0x0, buf2@entry=0x7effd504 "s\372\001", buflen2=buflen2@entry=34, 
    ans=<optimized out>, ans@entry=0x7effe088 "~\027\201\203", anssiz=<optimized out>, anssiz@entry=2048, ansp=ansp@entry=0x7effe894, ansp2=ansp2@entry=0x7effe898, nansp2=nansp2@entry=0x7effe89c, 
    resplen2=resplen2@entry=0x7effe8a0, ansp2_malloced=ansp2_malloced@entry=0x7effe8a4) at res_send.c:533
#10 0x76dfdd70 in __GI___libc_res_nquery (statp=statp@entry=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=class@entry=1, type=439963904, type@entry=0, answer=0x7effe088 "~\027\201\203", answer@entry=0x0, 
    anslen=2048, anslen@entry=439963904, answerp=0x7effe894, answerp@entry=0x76ffece8 <__stack_chk_guard>, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, 
    answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:222
#11 0x76dfe37c in __libc_res_nquerydomain (statp=statp@entry=0x76fa1b50 <_res>, name=0x7effe088 "~\027\201\203", name@entry=0x10550 "test.example.com", domain=domain@entry=0x0, class=1, class@entry=0, type=439963904, 
    type@entry=2130700444, answer=0x7effe088 "~\027\201\203", answer@entry=0x9d <error: Cannot access memory at address 0x9d>, anslen=2048, anslen@entry=1994515264, answerp=answerp@entry=0x7effe894, 
    answerp2=answerp2@entry=0x7effe898, nanswerp2=0x7effe89c, nanswerp2@entry=0x7effe8a4, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:592
#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700444, type@entry=439963904, answer=0x7effe088 "~\027\201\203", anslen=anslen@entry=2048, 
    answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:376
#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe998, buffer=0x7effea88 "\177", buflen=1024, errnop=errnop@entry=0x7effe99c, herrnop=herrnop@entry=0x7effe9ac, 
    ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
#14 0x76f1dee0 in gaih_inet (name=<optimized out>, name@entry=0x10550 "test.example.com", service=<optimized out>, req=0x7effeeb4, pai=pai@entry=0x7effea40, naddrs=<optimized out>, naddrs@entry=0x7effea4c, 
    tmpbuf=<optimized out>, tmpbuf@entry=0x7effea80) at ../sysdeps/posix/getaddrinfo.c:848
#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeeb0) at ../sysdeps/posix/getaddrinfo.c:2391
#16 0x000104c0 in main () at test-getaddrinfo.c:16

Looking at the stack offset that the current SP represents, we can see this is indeed the deep point:

(gdb) stack_offset $sp
Address    2130690352 = 0x7effc130
Stack size   135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 119088 = 0x1d130 = 116.3KB
Stack depth   16080 = 0x03ed0 =  15.7KB

Walk the stack, unwinding the stack frame and computing the size of each one (the set height command disables pagination, so gdb doesn't prompt to continue partway through the output):

(gdb) set height 0
(gdb) stack_walk
#0  check_match (undef_name=undef_name@entry=0x76df8116 "strcasecmp", ref=0x76df775c, ref@entry=0x59c2869, version=0x22e80, version@entry=0x76fffabc, flags=1, flags@entry=2, type_class=type_class@entry=1, sym=0x76e6c754, 
    sym@entry=0x770037f0, symidx=symidx@entry=1200, strtab=0x76e70804 "", strtab@entry=0x0, map=map@entry=0x76ff94b0, versioned_sym=versioned_sym@entry=0x7effc1ac, num_versions=num_versions@entry=0x7effc1a8) at dl-lookup.c:92
92 in dl-lookup.c
Top stack frame 0x7effc130

#1  0x76fd8548 in do_lookup_x (undef_name=0xb3850d3a <error: Cannot access memory at address 0xb3850d3a>, undef_name@entry=0x76df8116 "strcasecmp", new_hash=1994852356, new_hash@entry=3011841338, old_hash=0x76fec84c, 
    old_hash@entry=0x7effc218, ref=0x59c2869, result=<optimized out>, result@entry=0x7effc220, scope=0x76fffabc, i=<optimized out>, version=<optimized out>, version@entry=0x22e80, flags=flags@entry=1, skip=skip@entry=0x0, 
    type_class=type_class@entry=1, undef_map=undef_map@entry=0x22ac0) at dl-lookup.c:423
423 in dl-lookup.c
Last stack frame 0x7effc130, current 0x7effc148, size of last   24 = 0x018, total deeper     24 = 0x00018 =   0.0KB

#2  0x76fd8b20 in _dl_lookup_symbol_x (undef_name=0x76df8116 "strcasecmp", undef_map=0x22ac0, ref=0x7effc28c, ref@entry=0x7effc284, symbol_scope=0x22c78, version=0x22e80, type_class=type_class@entry=1, flags=1, 
    skip_map=skip_map@entry=0x0) at dl-lookup.c:833
833 in dl-lookup.c
Last stack frame 0x7effc148, current 0x7effc1d8, size of last  144 = 0x090, total deeper    168 = 0x000a8 =   0.2KB

#3  0x76fde10c in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at dl-runtime.c:111
111 dl-runtime.c: No such file or directory.
Last stack frame 0x7effc1d8, current 0x7effc278, size of last  160 = 0x0a0, total deeper    328 = 0x00148 =   0.3KB

#4  0x76fe5320 in _dl_runtime_resolve () at ../sysdeps/arm/dl-trampoline.S:57
57 ../sysdeps/arm/dl-trampoline.S: No such file or directory.
Last stack frame 0x7effc278, current 0x7effc2a8, size of last   48 = 0x030, total deeper    376 = 0x00178 =   0.4KB

#5  0x76e05eec in __GI_ns_samename (a=a@entry=0x7effcaf8 "test.example.com", b=0x7effcf38 "test.example.com", b@entry=0x402 <error: Cannot access memory at address 0x402>) at ns_samedomain.c:196
196 ns_samedomain.c: No such file or directory.
Last stack frame 0x7effc2a8, current 0x7effc2c0, size of last   24 = 0x018, total deeper    400 = 0x00190 =   0.4KB

#6  0x76dff850 in __GI___res_nameinquery (name=0x402 <error: Cannot access memory at address 0x402>, name@entry=0x0, type=1, class=1, buf=buf@entry=0x7effe088 "~\027\201\203", 
    eom=eom@entry=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:287
287 res_send.c: No such file or directory.
Last stack frame 0x7effc2c0, current 0x7effcae8, size of last 2088 = 0x828, total deeper   2488 = 0x009b8 =   2.4KB

#7  0x76dff984 in __GI___res_queriesmatch (buf1=0x7effd4e0 "~\027\001", buf1@entry=0x7effd40c "n", eom1=0x7effd502 "", eom1@entry=0x7effd40c "n", buf2=0x7effe088 "~\027\201\203", eom2=0x7effe888 "_nss_dns_get\210\340\377~")
    at res_send.c:342
342 in res_send.c
Last stack frame 0x7effcae8, current 0x7effcf28, size of last 1088 = 0x440, total deeper   3576 = 0x00df8 =   3.5KB

#8  0x76e00578 in send_dg (ansp2_malloced=<optimized out>, resplen2=<optimized out>, anssizp2=<optimized out>, ansp2=<optimized out>, anscp=<optimized out>, gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, 
    ns=0, terrno=0x7effe088, anssizp=0x7effd4c0, ansp=0x7effd3fc, buflen2=<optimized out>, buf2=<optimized out>, buflen=<optimized out>, buf=<optimized out>, statp=0x7effd420) at res_send.c:1422
1422 in res_send.c
Last stack frame 0x7effcf28, current 0x7effd368, size of last 1088 = 0x440, total deeper   4664 = 0x01238 =   4.6KB

#9  __libc_res_nsend (statp=statp@entry=0x76fa1b50 <_res>, buf=0x7effd40c "n", buf@entry=0x7effd4e0 "~\027\001", buflen=0, buflen@entry=34, buf2=0x0, buf2@entry=0x7effd504 "s\372\001", buflen2=buflen2@entry=34, 
    ans=<optimized out>, ans@entry=0x7effe088 "~\027\201\203", anssiz=<optimized out>, anssiz@entry=2048, ansp=ansp@entry=0x7effe894, ansp2=ansp2@entry=0x7effe898, nansp2=nansp2@entry=0x7effe89c, 
    resplen2=resplen2@entry=0x7effe8a0, ansp2_malloced=ansp2_malloced@entry=0x7effe8a4) at res_send.c:533
533 in res_send.c
Last stack frame 0x7effd368, current 0x7effd368, size of last    0 = 0x000, total deeper   4664 = 0x01238 =   4.6KB

#10 0x76dfdd70 in __GI___libc_res_nquery (statp=statp@entry=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=class@entry=1, type=439963904, type@entry=0, answer=0x7effe088 "~\027\201\203", answer@entry=0x0, 
    anslen=2048, anslen@entry=439963904, answerp=0x7effe894, answerp@entry=0x76ffece8 <__stack_chk_guard>, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, 
    answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:222
222 res_query.c: No such file or directory.
Last stack frame 0x7effd368, current 0x7effd4c0, size of last  344 = 0x158, total deeper   5008 = 0x01390 =   4.9KB

#11 0x76dfe37c in __libc_res_nquerydomain (statp=statp@entry=0x76fa1b50 <_res>, name=0x7effe088 "~\027\201\203", name@entry=0x10550 "test.example.com", domain=domain@entry=0x0, class=1, class@entry=0, type=439963904, 
    type@entry=2130700444, answer=0x7effe088 "~\027\201\203", answer@entry=0x9d <error: Cannot access memory at address 0x9d>, anslen=2048, anslen@entry=1994515264, answerp=answerp@entry=0x7effe894, 
    answerp2=answerp2@entry=0x7effe898, nanswerp2=0x7effe89c, nanswerp2@entry=0x7effe8a4, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:592
592 in res_query.c
Last stack frame 0x7effd4c0, current 0x7effd780, size of last  704 = 0x2c0, total deeper   5712 = 0x01650 =   5.6KB

#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700444, type@entry=439963904, answer=0x7effe088 "~\027\201\203", anslen=anslen@entry=2048, 
    answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:376
376 in res_query.c
Last stack frame 0x7effd780, current 0x7effdbe0, size of last 1120 = 0x460, total deeper   6832 = 0x01ab0 =   6.7KB

#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe998, buffer=0x7effea88 "\177", buflen=1024, errnop=errnop@entry=0x7effe99c, herrnop=herrnop@entry=0x7effe9ac, 
    ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
326 nss_dns/dns-host.c: No such file or directory.
Last stack frame 0x7effdbe0, current 0x7effe068, size of last 1160 = 0x488, total deeper   7992 = 0x01f38 =   7.8KB

#14 0x76f1dee0 in gaih_inet (name=<optimized out>, name@entry=0x10550 "test.example.com", service=<optimized out>, req=0x7effeeb4, pai=pai@entry=0x7effea40, naddrs=<optimized out>, naddrs@entry=0x7effea4c, 
    tmpbuf=<optimized out>, tmpbuf@entry=0x7effea80) at ../sysdeps/posix/getaddrinfo.c:848
848 ../sysdeps/posix/getaddrinfo.c: No such file or directory.
Last stack frame 0x7effe068, current 0x7effe8e0, size of last 2168 = 0x878, total deeper  10160 = 0x027b0 =   9.9KB

#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeeb0) at ../sysdeps/posix/getaddrinfo.c:2391
2391 in ../sysdeps/posix/getaddrinfo.c
Last stack frame 0x7effe8e0, current 0x7effe9e8, size of last  264 = 0x108, total deeper  10424 = 0x028b8 =  10.2KB

#16 0x000104c0 in main () at test-getaddrinfo.c:16
16     int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
Last stack frame 0x7effe9e8, current 0x7effeeb0, size of last 1224 = 0x4c8, total deeper  11648 = 0x02d80 =  11.4KB

Initial frame selected; you cannot go up.

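For each stack frame, this reports the size of the previous frame (the next-deeper one, listed just before it) and the total deeper stack (the cumulative space deeper in the stack). A size of 0, as on frame 9's line, means two frames share the same SP; here that's most likely because send_dg() was inlined into __libc_res_nsend(). What we're looking for is large frames.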

Scrolling through this, there are several where "size of last" is over 1000 bytes. Generally speaking, anything over a hundred bytes is pretty big. That's especially true on embedded systems, where the entire stack may be just a kilobyte or two, whether for bare metal or per RTOS thread.
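The stack walk's arithmetic is just subtraction of consecutive SP values. As a sketch, here it is in C, using the SP values copied from the walk above (innermost frame first). The function name and the 1000-byte threshold are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* SP of each frame from the stack walk above, frame #0 first. On a
   downward-growing stack, frame i occupies [sp[i], sp[i + 1]). */
static const uint32_t walk_sp[] = {
    0x7effc130, 0x7effc148, 0x7effc1d8, 0x7effc278, 0x7effc2a8, 0x7effc2c0,
    0x7effcae8, 0x7effcf28, 0x7effd368, 0x7effd368, 0x7effd4c0, 0x7effd780,
    0x7effdbe0, 0x7effe068, 0x7effe8e0, 0x7effe9e8, 0x7effeeb0,
};

/* Prints every frame whose size is at least 'threshold' bytes, stores how
   many were found in *nlarge, and returns the total stack used by all the
   frames deeper than the outermost one. */
static uint32_t report_large_frames(const uint32_t *sp, size_t nsp,
                                    uint32_t threshold, size_t *nlarge) {
    assert(nlarge != NULL);
    *nlarge = 0;
    for (size_t i = 0; i + 1 < nsp; i++) {
        uint32_t size = sp[i + 1] - sp[i];
        if (size >= threshold) {
            printf("frame #%zu: %u bytes\n", i, (unsigned)size);
            (*nlarge)++;
        }
    }
    return nsp ? sp[nsp - 1] - sp[0] : 0;
}
```

Called with a threshold of 1000, it reports frames 5, 6, 7, 11, 12, 13, and 15, with a total of 11648 bytes, the same seven suspects and total that come out of reading the walk by hand.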

Then looking at the preceding frame in each case, we see the following suspects (I've edited the stack walk output to line up the "size of last" lines with their corresponding frames, and eliminated everything under 1000 bytes):

#5  0x76e05eec in __GI_ns_samename (a=a@entry=0x7effcaf8 "test.example.com", b=0x7effcf38 "test.example.com", b@entry=0x402 <error: Cannot access memory at address 0x402>) at ns_samedomain.c:196
5 Last stack frame 0x7effc2c0, current 0x7effcae8, size of last 2088 = 0x828, total deeper   2488 = 0x009b8 =   2.4KB

#6  0x76dff850 in __GI___res_nameinquery (name=0x402 <error: Cannot access memory at address 0x402>, name@entry=0x0, type=1, class=1, buf=buf@entry=0x7effe088 "~\027\201\203", 
    eom=eom@entry=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:287
6 Last stack frame 0x7effcae8, current 0x7effcf28, size of last 1088 = 0x440, total deeper   3576 = 0x00df8 =   3.5KB

#7  0x76dff984 in __GI___res_queriesmatch (buf1=0x7effd4e0 "~\027\001", buf1@entry=0x7effd40c "n", eom1=0x7effd502 "", eom1@entry=0x7effd40c "n", buf2=0x7effe088 "~\027\201\203", eom2=0x7effe888 "_nss_dns_get\210\340\377~")
    at res_send.c:342
7 Last stack frame 0x7effcf28, current 0x7effd368, size of last 1088 = 0x440, total deeper   4664 = 0x01238 =   4.6KB

#11 0x76dfe37c in __libc_res_nquerydomain (statp=statp@entry=0x76fa1b50 <_res>, name=0x7effe088 "~\027\201\203", name@entry=0x10550 "test.example.com", domain=domain@entry=0x0, class=1, class@entry=0, type=439963904, 
    type@entry=2130700444, answer=0x7effe088 "~\027\201\203", answer@entry=0x9d <error: Cannot access memory at address 0x9d>, anslen=2048, anslen@entry=1994515264, answerp=answerp@entry=0x7effe894, 
    answerp2=answerp2@entry=0x7effe898, nanswerp2=0x7effe89c, nanswerp2@entry=0x7effe8a4, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:592
11 Last stack frame 0x7effd780, current 0x7effdbe0, size of last 1120 = 0x460, total deeper   6832 = 0x01ab0 =   6.7KB

#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700444, type@entry=439963904, answer=0x7effe088 "~\027\201\203", anslen=anslen@entry=2048, 
    answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:376
12 Last stack frame 0x7effdbe0, current 0x7effe068, size of last 1160 = 0x488, total deeper   7992 = 0x01f38 =   7.8KB

#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe998, buffer=0x7effea88 "\177", buflen=1024, errnop=errnop@entry=0x7effe99c, herrnop=herrnop@entry=0x7effe9ac, 
    ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
13 Last stack frame 0x7effe068, current 0x7effe8e0, size of last 2168 = 0x878, total deeper  10160 = 0x027b0 =   9.9KB

#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeeb0) at ../sysdeps/posix/getaddrinfo.c:2391
15 Last stack frame 0x7effe9e8, current 0x7effeeb0, size of last 1224 = 0x4c8, total deeper  11648 = 0x02d80 =  11.4KB

This is all a result of calling getaddrinfo(). What's it doing that takes so much space?

Switch to frame 15 and disassemble the current function, which is __GI_getaddrinfo(). Notice that gdb says there's no such file, since this is a prebuilt library function, so leave off the /s option. The function is long, so I've just shown the prologue:

(gdb) f 15
#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeec0) at ../sysdeps/posix/getaddrinfo.c:2391
2391 ../sysdeps/posix/getaddrinfo.c: No such file or directory.

(gdb) disassemble 
Dump of assembler code for function __GI_getaddrinfo:
   0x76f1eef0 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   0x76f1eef4 <+4>: add r11, sp, #32
   0x76f1eef8 <+8>: ldr r6, [pc, #2712] ; 0x76f1f998 <__GI_getaddrinfo+2728>
   0x76f1eefc <+12>: sub sp, sp, #1184 ; 0x4a0

The prologue saves off a number of registers that the function will be using (7 plus the usual FP and LR, 9 in all). The last line extends the frame by 1184 bytes, so there's the bulk of the 1224 bytes we saw listed in the stack walk. The register save adds 36 bytes (9 registers at 4 bytes each; the #32 in the add instruction just points FP at the saved LR, the highest of those 9 slots), for a total of 1220. That's within 4 bytes, close enough for now.
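This accounting is simple enough to automate. On 32-bit ARM, push is an alias for stmdb sp! with a 16-bit register mask in the low bits of the instruction word, so the bytes pushed are 4 times the number of bits set. The sketch below assumes the A32 encoding 0xE92D4FF0 for push {r4-r11, lr} (gdb's plain disassemble doesn't print raw instruction words; disassemble /r does), and takes the sub immediate straight from the listing. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the number of bytes pushed by an A32 "push {reglist}"
   (STMDB SP!, {reglist}), or -1 if the word isn't that instruction.
   Assumes ARM (not Thumb) encoding. Single-register pushes are encoded
   as STR with writeback instead, and are not recognized here. */
static int push_bytes(uint32_t insn) {
    if ((insn & 0x0FFF0000u) != 0x092D0000u)
        return -1;
    uint32_t mask = insn & 0xFFFFu;  /* one bit per register r0..r15 */
    int count = 0;
    while (mask) {
        count += (int)(mask & 1u);
        mask >>= 1;
    }
    return 4 * count;  /* each register is 4 bytes */
}

/* Static frame size: registers pushed plus the "sub sp, sp, #imm". */
static int prologue_frame_bytes(uint32_t push_insn, int sub_imm) {
    assert(sub_imm >= 0);
    int pushed = push_bytes(push_insn);
    return pushed < 0 ? -1 : pushed + sub_imm;
}
```

For frame 15, prologue_frame_bytes(0xE92D4FF0, 1184) gives 1220. Anything the function does to SP after the prologue, such as alloca(), has to be added on top.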

Why does this function need such a large stack frame? We don't have the source...but we have the source file name and partial path. A quick Internet search for "sysdeps/posix/getaddrinfo.c" turns up a website that lists a version of this file: Woboq getaddrinfo source code. Awesome!

Gdb says we're currently at line 2391 of the file, where it calls gaih_inet (frame 14). Searching the source listing webpage, getaddrinfo() calls gaih_inet() at a different line, 2265:

2263       struct scratch_buffer tmpbuf;
2264       scratch_buffer_init (&tmpbuf);
2265       last_i = gaih_inet (name, pservice, hints, end, &naddrs, &tmpbuf);

That means this listing isn't for the exact same version of the library we're running. Again, this is close enough for now.

One of the arguments to gaih_inet() is tmpbuf, a local variable defined on line 2263. Any time you see a large allocation, things called "buffers" are good candidates for investigation. Clicking on scratch_buffer takes us to its structure declaration:

64 /* Scratch buffer.  Must be initialized with scratch_buffer_init
65    before its use.  */
66 struct scratch_buffer {
67   void *data;    /* Pointer to the beginning of the scratch area.  */
68   size_t length; /* Allocated space at the data pointer, in bytes.  */
69   union { max_align_t __align; char __c[1024]; } __space;
70 };

BAM again! It contains a character array of 1024 bytes.
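The point of a structure like this is a small-buffer optimization: the embedded array serves most requests without touching the heap, at the cost of 1024-plus bytes wherever the structure lives, which for a local variable means the stack. Here's a minimal sketch of the pattern in C, modeled on the declaration above. The demo_ names and functions are hypothetical; this is not glibc's actual scratch_buffer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical re-creation of the scratch-buffer idea: start with an
   embedded 1024-byte array (which lives wherever the struct lives, e.g.
   on the stack) and fall back to the heap only when more is needed. */
struct demo_scratch_buffer {
    void *data;      /* points at space.c or at a heap block */
    size_t length;   /* usable bytes at data */
    union { max_align_t align; char c[1024]; } space;
};

static void demo_scratch_init(struct demo_scratch_buffer *b) {
    assert(b != NULL);
    b->data = b->space.c;
    b->length = sizeof b->space;
}

static void demo_scratch_free(struct demo_scratch_buffer *b) {
    if (b->data != b->space.c)
        free(b->data);
}

/* Discards the contents and ensures at least 'needed' bytes. Returns false
   on allocation failure, leaving the buffer reset to the embedded array. */
static bool demo_scratch_reserve(struct demo_scratch_buffer *b,
                                 size_t needed) {
    if (needed <= b->length)
        return true;
    demo_scratch_free(b);
    void *p = malloc(needed);
    if (p == NULL) {
        demo_scratch_init(b);
        return false;
    }
    b->data = p;
    b->length = needed;
    return true;
}
```

Declaring one of these as a local, as getaddrinfo() does with tmpbuf, costs at least 1032 bytes of stack on 32-bit ARM (two 4-byte fields plus the 1024-byte union) even when the request is tiny.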

Now that we have navigable source code, we can follow this procedure on down the stack. Frame 13, _nss_dns_gethostbyname4_r(), is the next large one reported by the stack walk, with 2168 bytes. Examine its prologue:

(gdb) f 13
#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe9a8, buffer=0x7effea98 "\177", buflen=1024, 
    errnop=errnop@entry=0x7effe9ac, herrnop=herrnop@entry=0x7effe9bc, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
326 nss_dns/dns-host.c: No such file or directory.

(gdb) disassemble 
Dump of assembler code for function _nss_dns_gethostbyname4_r:
   0x76e1e268 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   0x76e1e26c <+4>: add r11, sp, #32
   0x76e1e270 <+8>: ldr r4, [pc, #812] ; 0x76e1e5a4 <_nss_dns_gethostbyname4_r+828>
   0x76e1e274 <+12>: sub sp, sp, #76 ; 0x4c

Again, 9 registers get pushed, but then the frame is only extended by 76 bytes. So something different is going on.

Search for the function name in the Woboq search box. Even though the line numbers don't match exactly due to version skew, they're close, and the right file names are showing up, matching what gdb reports, helping to confirm that we're looking at the right code.

Examining that code, the function doesn't have one of those big scratch_buffer structures, but another suspicious-looking line is this one, allocating 2048 bytes. Adding the 36 bytes for register saves and the 76 bytes of frame extension gives 2160, close to the reported frame size of 2168:

364   host_buffer.buf = orig_host_buffer = (querybuf *) alloca (2048);

That looks similar to malloc(), right? But malloc() allocates from the heap, while alloca() is a stack allocator, as described here: it extends the current stack frame, and the space is released automatically when the function returns.

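A little further down the disassembly, we see this manipulation of the SP, so that must be the implementation of alloca(). GCC treats alloca() as a builtin, so instead of a function call it's just an SP adjustment, here 2048 bytes plus another 8, which accounts for the rest of the frame (36 + 76 + 2048 + 8 = 2168):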

   0x76e1e2bc <+84>: sub r3, sp, #2048 ; 0x800
   0x76e1e2c0 <+88>: ldr r1, [pc, #736] ; 0x76e1e5a8 <_nss_dns_gethostbyname4_r+832>
   0x76e1e2c4 <+92>: ldr r4, [pc, #736] ; 0x76e1e5ac <_nss_dns_gethostbyname4_r+836>
   0x76e1e2c8 <+96>: sub sp, r3, #8
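To see the difference from malloc() for yourself, here's a small C sketch (my own, not from glibc) that calls alloca() and measures how far below a local variable the allocated block starts. It assumes glibc's <alloca.h>, a downward-growing stack, and a GCC-style compiler that places fixed locals above the dynamic alloca() area; alloca() isn't part of ISO C or POSIX.

```c
#include <alloca.h>   /* glibc; alloca() is not ISO C or POSIX */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Allocates n bytes with alloca(), touches them, and returns how many bytes
   below the local 'marker' the block starts. The block is released
   automatically when this function returns; there is no free(). */
static size_t alloca_gap(size_t n) {
    assert(n > 0);
    volatile char marker = 0;
    char *buf = alloca(n);
    memset(buf, 0xA5, n);
    return (size_t)((uintptr_t)&marker - (uintptr_t)buf);
}
```

On such a toolchain, alloca_gap(2048) should return a little over 2048: the same kind of SP adjustment seen in the disassembly, plus alignment padding. On a small embedded system, a fixed-size static or caller-provided buffer is usually the safer choice, since alloca() gives no indication of failure when it runs off the end of the stack.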

For frame 12, __GI___libc_res_nsearch():

(gdb) f 12
#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700460, type@entry=439963904, 
    answer=0x7effe098 "<\202\201\203", anslen=anslen@entry=2048, answerp=answerp@entry=0x7effe8a4, answerp2=answerp2@entry=0x7effe8a8, 
    nanswerp2=nanswerp2@entry=0x7effe8ac, resplen2=resplen2@entry=0x7effe8b0, answerp2_malloced=answerp2_malloced@entry=0x7effe8b4) at res_query.c:376
376 res_query.c: No such file or directory.

(gdb) disassemble 
Dump of assembler code for function __GI___libc_res_nsearch:
   0x76dfe640 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   0x76dfe644 <+4>: sub sp, sp, #1120 ; 0x460

That looks like a familiar stack frame expansion of 1120 bytes. But searching Woboq for "__GI___libc_res_nsearch" doesn't find a match. Try stripping what looks like a library prefix from the name and search for just "res_nsearch". That turns up several results, one of which is res_query.c, the file gdb listed as containing the function. Frame 11, __libc_res_nquerydomain(), is in the same source file. While it's not quite clear from the source file what's going on with frame 12, we can see there's a function res_nquerydomain() in it.

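Why the naming confusion? I think it has something to do with glibc naming conventions and library symbol formation. For example, glibc uses the __GI_ prefix for the internal (hidden) aliases it calls within the library itself.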

Looking around at frames 11 and 10, we seem to be in the right place, but the function that frame 10 calls, __libc_res_nsend(), doesn't show up as a call in the source file, even when stripping the name down. So we seem to be getting off track. Perhaps the library version difference is catching up with us.

But looking around at some of the other functions in the file, we can see that in this version of the file, res_nquerydomain() calls context_querydomain_common(), which calls __res_context_querydomain(), which has this local variable, another "buffer":

568         char nbuf[MAXDNAME];

How big is MAXDNAME? Clicking on it shows this:

79 #define MAXDNAME        NS_MAXDNAME

Searching for NS_MAXDNAME reveals this:

59 #define NS_MAXDNAME        1025        /*%< maximum domain name */

That's another BAM! These two symbols show up as local buffer sizes in several functions. So that's probably what's going on in our version of the library, another large local buffer, sized for a large worst-case maximum domain name string.

Move on to frame 7, __GI___res_queriesmatch() in file res_send.c. Entering that file name in the Woboq search box gets us to it. Searching for the trailing part of the function name, we find res_queriesmatch(). Scrolling through it, this jumps right out:

377                 char tname[MAXDNAME+1];

It's the same story for frame 6, __GI___res_nameinquery(). So these giant buffers are getting allocated repeatedly all the way down the call stack. Frame 5, __GI_ns_samename() in ns_samedomain.c, has this line:

191         char ta[NS_MAXDNAME], tb[NS_MAXDNAME];

GAAAAH, two of them! Ironically, the string we're using is just "test.example.com". The buffers need to be sized for the maximum possible name, but assuming that at every level is pretty wasteful.
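Those two buffers exist because ns_samename() canonicalizes each name into its own buffer (handling backslash escapes and the trailing root dot) before calling strcasecmp(). Together they account for 2 × 1025 = 2050 of frame 5's 2088 bytes. If you can live without escape handling, the comparison can work in place. Here's a simplified, hypothetical C sketch, not a drop-in replacement for ns_samename():

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified domain name comparison: case-insensitive, treats
   one trailing dot as insignificant, and does NOT handle backslash escapes
   (which the real ns_samename() does). It uses no local buffers, so its
   stack cost is a few words instead of 2 * NS_MAXDNAME bytes. */
static size_t name_len_without_root_dot(const char *s) {
    size_t n = strlen(s);
    if (n > 1 && s[n - 1] == '.')
        n--;
    return n;
}

static bool same_domain_name(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t na = name_len_without_root_dot(a);
    size_t nb = name_len_without_root_dot(b);
    if (na != nb)
        return false;
    for (size_t i = 0; i < na; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    }
    return true;
}
```

The design choice is the usual embedded trade: give up generality (escaped dots in labels) in exchange for a bounded, tiny stack footprint.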

We have our answer about how an extra 11.4KB of stack gets sucked up translating a domain name to an IP address. For a general-purpose OS like Linux, that's not really a big deal. But that would never fly on a small embedded system.

That's why you see custom libraries for embedded systems, streamlined TCP/IP stacks and such. Among other things, they would probably constrain domain names to much shorter strings. This illustrates one of the differences between general-purpose coding, such as for desktops and servers, and embedded systems.

Now we have another tangent. The source for ns_samename() shows it calling strcasecmp(). But the gdb backtrace shows it calling _dl_runtime_resolve() in file sysdeps/arm/dl-trampoline.S. Interesting, an assembly language file with a strange name!

Woboq search reveals a number of architecture-specific versions of dl-trampoline.S. We want the ARM version. It starts with this line:

1 /* PLT trampolines.  ARM version.

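What the heck is a PLT trampoline? It's the code behind the procedure linkage table. The first time a library function is called through the PLT, it invokes the dynamic linker to look up the function's address (the _dl_fixup() and _dl_lookup_symbol_x() frames in the backtrace) and patch it into the GOT, then jumps to the function in the library, as described in PLT and GOT - the key to code sharing and dynamic libraries.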

Why the term "trampoline"? I guess because the first caller to the function runs into the stub, which triggers the symbol lookup and fixup operations, then bounces into the actual function. Subsequent calls still go through the PLT entry, but the GOT now holds the function's real address, so they jump straight to it and skip the resolver.
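The lazy-binding idea can be modeled in plain C, with a function pointer standing in for the GOT entry. This is a toy with hypothetical names throughout; the real work happens in the dynamic linker (_dl_runtime_resolve() and _dl_fixup() in the backtrace), which patches GOT entries in memory rather than C variables. It also hints at why the resolver showed up at the deepest point of our stack: in this run, the call to strcasecmp() from ns_samename() was apparently the first through that library's PLT entry, so it paid for the lookup on top of all those name buffers.

```c
#include <assert.h>

/* Toy model of PLT/GOT lazy binding; every name here is hypothetical. */
typedef int (*cmp_fn)(const char *, const char *);

static int resolver_calls = 0;  /* how many times the "resolver" ran */

/* The library function we eventually want to reach. */
static int real_compare(const char *a, const char *b) {
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

static int lazy_stub(const char *a, const char *b);

/* "GOT entry": initially points at the stub, not the real function. */
static cmp_fn got_compare = lazy_stub;

/* "Trampoline": runs only until the entry is patched. It stands in for the
   dynamic linker's symbol lookup and fixup, then bounces into the real
   function. */
static int lazy_stub(const char *a, const char *b) {
    resolver_calls++;
    got_compare = real_compare;  /* the "fixup": patch the GOT entry */
    return got_compare(a, b);
}

/* "PLT entry": every call goes indirectly through the GOT entry. */
static int plt_compare(const char *a, const char *b) {
    return got_compare(a, b);
}
```

The first plt_compare() call runs the stub once; every later call goes through the same indirect jump but lands directly in real_compare().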

Sounds like something else fun to learn about!
