Reply by October 18, 2008
Bill <jobhunts02@aol.com> writes:

> I am trying to use DUMA to find the problem and am getting the
> following error when I try to link to the DUMA library:

It amazes me that in all this time you never managed to answer a simple question: is the crash *in* malloc, or is it in your own code? You are continuing to debug this as if there is malloc corruption, and this will prove futile if the crash is (as I suspect) in your own code instead.

> # LD_PRELOAD=/lib/libduma.a /bin/snmpd &
> # ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
> preloaded: ignored.

You can only preload shared libraries.

Besides, reading the man page for libduma, I see that it uses the *exact* same strategy as efence: a guard page after every allocation. So, once you manage to build a shared libduma.so and preload it, it will most likely fail just like efence did, because the overhead of guard pages is too great for the majority of real-world (non-toy) applications.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
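For readers unfamiliar with the guard-page strategy mentioned above, here is a minimal sketch in C. It is not taken from the Electric Fence or DUMA sources: the names guarded_alloc and guarded_footprint are made up for illustration, alignment and freeing are ignored, and a Linux-like system with mmap and MAP_ANONYMOUS is assumed.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bytes of address space that one guarded allocation of n bytes consumes:
   the data pages plus one inaccessible guard page. */
size_t guarded_footprint(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (n + page - 1) / page;

    return (data_pages + 1) * page;
}

/* Hypothetical allocator: returns n bytes whose end touches a PROT_NONE
   page, so an access to the first byte past the block faults at once.
   Returns NULL on failure or for n == 0. */
void *guarded_alloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = guarded_footprint(n);
    unsigned char *base;

    if (n == 0)
        return NULL;

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make the last page inaccessible: this is the guard. */
    if (mprotect(base + total - page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }

    /* Place the block so that it ends exactly where the guard begins. */
    return base + (total - page) - n;
}
```

With 4 KiB pages, a 16-byte request handled this way occupies two pages (8 KiB) of address space, which is the kind of overhead referred to above.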
Reply by John Reiser October 17, 2008
> # LD_PRELOAD=/lib/libduma.a /bin/snmpd &
> # ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
> preloaded: ignored.

Only an executable object (.e_type is ET_DYN or ET_EXEC) can be pre-loaded. An archive library (*.a) that contains ET_REL files cannot be pre-loaded.
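To check which kind of file you are handing to LD_PRELOAD, you can look at its first few bytes. The following is only a sketch: the enum and the function classify_header are names invented for this example, and it assumes a Linux system that provides <elf.h>. It recognises the ar archive magic and otherwise decodes e_type from the ELF header, honouring the byte order recorded in the file.

```c
#include <elf.h>      /* Linux/glibc header: EI_*, ELFMAG, ET_* constants */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum file_kind {
    KIND_UNKNOWN,
    KIND_AR_ARCHIVE,  /* static library, e.g. libduma.a */
    KIND_ELF_REL,     /* relocatable object (.o) */
    KIND_ELF_EXEC,    /* fixed-address executable */
    KIND_ELF_DYN,     /* shared object (or position-independent executable) */
    KIND_ELF_OTHER
};

/* Classify a file from its first len bytes (18 bytes are enough). */
enum file_kind classify_header(const unsigned char *buf, size_t len)
{
    uint16_t e_type;

    /* Every ar archive starts with this 8-byte magic string. */
    if (len >= 8 && memcmp(buf, "!<arch>\n", 8) == 0)
        return KIND_AR_ARCHIVE;

    if (len < EI_NIDENT + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return KIND_UNKNOWN;

    /* e_type is the 16-bit field right after e_ident, stored in the
       byte order named by e_ident[EI_DATA]. */
    if (buf[EI_DATA] == ELFDATA2LSB)
        e_type = (uint16_t)(buf[EI_NIDENT] | (buf[EI_NIDENT + 1] << 8));
    else if (buf[EI_DATA] == ELFDATA2MSB)
        e_type = (uint16_t)((buf[EI_NIDENT] << 8) | buf[EI_NIDENT + 1]);
    else
        return KIND_UNKNOWN;

    switch (e_type) {
    case ET_REL:  return KIND_ELF_REL;
    case ET_EXEC: return KIND_ELF_EXEC;
    case ET_DYN:  return KIND_ELF_DYN;
    default:      return KIND_ELF_OTHER;
    }
}
```

Read the first 18 bytes of the file (with fread, for instance) and pass them in. A result of KIND_AR_ARCHIVE corresponds to the libduma.a case that ld.so rejected above.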
Reply by Bill October 17, 2008
I am trying to use DUMA to find the problem and am getting the
following error when I try to link to the DUMA library:


# LD_PRELOAD=/lib/libduma.a /bin/snmpd &
# ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
preloaded: ignored.

What can cause this error when using LD_PRELOAD?
Reply by Rainer Weikusat October 16, 2008
David Schwartz <davids@webmaster.com> writes:
> On Oct 15, 6:35 am, John Reiser <jrei...@BitWagon.com> wrote:
>
>> #include <stdio.h>
>>
>> char *f(a)
>> {
>>         return 0;
>>
>> }
>>
>> main()
>> {
>>         char *p = f(10);
>>         if (NULL==p) {
>>                 printf("f(10) is NULL.\n");
>>                 /* fflush(stdout);    THE FIX */
>>         }
>>         return *p;}
>
> This is still not very good. What if 'printf' needs to allocate memory
> to do its job? What if 'fflush' does? In an error handler like this,
> you are better off calling 'write' directly.

Another option is to force a segfault, i.e. assign through the pointer when it is null. The values of the various CPU registers, especially the program counter, can then (in combination with a disassembler) be used to determine the location of the crash and hence the condition at the time of the test.
Reply by Rainer Weikusat October 16, 2008
Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> writes:
> Nate Eldredge <nate@vulcan.lan> writes:
>>> Is the
>>> problem that address 0x2d is not in the ranges shown in pmap?
>>
>> Well, sort of. 0x2d isn't in that range because that page isn't mapped.
>> But it's not supposed to be mapped. The first page of virtual memory is
>> always unmapped, so that NULL pointer dereferences generate faults. So
>> it's an address that can't possibly be valid. If the crash is inside
>> malloc, as you said earlier, then most likely some pointer in malloc's
>> data structures got overwritten with 0x0000002d.
>
> The OP stated that he doesn't actually know that, only deduces this
> from lack of printed message (which, as John Reiser aptly suggests, may
> be due to naive use of stdout buffering; where stderr was likely
> called for).
>
> Much more likely than pointer being overwritten is that malloc()
> in fact returned NULL, and OP then did (an equivalent of):

This means roughly 'it is more likely that the system ran out of memory than that the application contained a programming error'. But this is again a question which can be answered very simply: Check the PC/IP value at the time of the segfault. That's either within malloc (as the OP has repeatedly claimed) or within application code. So, what is it?
Reply by David Schwartz October 16, 2008
On Oct 15, 6:35 am, John Reiser <jrei...@BitWagon.com> wrote:

> #include <stdio.h>
>
> char *f(a)
> {
>         return 0;
>
> }
>
> main()
> {
>         char *p = f(10);
>         if (NULL==p) {
>                 printf("f(10) is NULL.\n");
>                 /* fflush(stdout);    THE FIX */
>         }
>         return *p;}

This is still not very good. What if 'printf' needs to allocate memory to do its job? What if 'fflush' does? In an error handler like this, you are better off calling 'write' directly.

DS
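A rough sketch of what calling write directly could look like follows. The helper names write_all and report are invented for this example. The loop handles short writes and EINTR, and nothing here goes through stdio buffering or the allocator.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying after EINTR and short writes.
   Returns 0 on success, -1 on any other error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Report a fixed message on stderr without touching stdio. */
int report(const char *msg)
{
    return write_all(STDERR_FILENO, msg, strlen(msg));
}
```

In the quoted test program, the printf call (and the fflush that was added as the fix) would then become report("f(10) is NULL.\n").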
Reply by October 16, 2008
Nate Eldredge <nate@vulcan.lan> writes:

>> Is the
>> problem that address 0x2d is not in the ranges shown in pmap?
>
> Well, sort of. 0x2d isn't in that range because that page isn't mapped.
> But it's not supposed to be mapped. The first page of virtual memory is
> always unmapped, so that NULL pointer dereferences generate faults. So
> it's an address that can't possibly be valid. If the crash is inside
> malloc, as you said earlier, then most likely some pointer in malloc's
> data structures got overwritten with 0x0000002d.

The OP stated that he doesn't actually know that, only deduces this from lack of printed message (which, as John Reiser aptly suggests, may be due to naive use of stdout buffering; where stderr was likely called for).

Much more likely than pointer being overwritten is that malloc() in fact returned NULL, and OP then did (an equivalent of):

    struct Foo *p = malloc(sizeof(struct Foo));
    p->some_field_at_offset_0x2d = 1;

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
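To make the arithmetic behind that guess concrete, here is a small sketch. The struct layout is entirely hypothetical (nothing in the thread says what the OP's structure looks like); it only shows that a store through a NULL struct pointer touches the address equal to the field's offset, and how a checked allocation avoids it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a field that happens to sit 0x2d bytes into the
   structure. */
struct foo {
    char header[0x2d];
    char flag;              /* offsetof(struct foo, flag) == 0x2d */
};

/* Address that a store to p->flag would touch. For p == NULL this is
   just the field offset, i.e. 0x2d on the usual platforms where a null
   pointer converts to the integer 0. */
uintptr_t flag_address(const struct foo *p)
{
    return (uintptr_t)p + offsetof(struct foo, flag);
}

/* Checked version of the allocation: report failure to the caller
   instead of writing through NULL. */
struct foo *foo_new(void)
{
    struct foo *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->flag = 1;
    return p;
}
```

If the faulting code looked like this, the si_addr of 0x2d would be explained without any heap corruption at all.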
Reply by CBFalconer October 15, 2008
Andrew Smallshaw wrote:
> Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> wrote:
>
>> Efence adds 1 page guard to every malloc.
>> It is very rarely helpful in debugging non-toy applications.
>
> That may be your experience but personally I find it incredibly
> useful for certain classes of problems. Maybe not high-level
> stuff or full applications but for low level data structure test
> beds I find you can literally do in a morning what may take a
> week otherwise.

Take a look at the description of the debug facilities in nmalloc.txh. That is part of nmalloc.zip, and is the source for the info documentation of nmalloc. nmalloc, in turn, is almost pure standard C, but relies on the system sbrk() to get memory space, and makes some (quite usual) assumptions about memory. See:

<http://cbfalconer.home.att.net/download/nmalloc.zip>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Reply by Nate Eldredge October 15, 2008
Bill <jobhunts02@aol.com> writes:

> When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR
Page fault when accessing an unmapped page.
> and si_addr of 0x2d. What does address 0x2d represent?
The address that the program tried to access.
> Is the
> problem that address 0x2d is not in the ranges shown in pmap?

Well, sort of. 0x2d isn't in that range because that page isn't mapped. But it's not supposed to be mapped. The first page of virtual memory is always unmapped, so that NULL pointer dereferences generate faults. So it's an address that can't possibly be valid. If the crash is inside malloc, as you said earlier, then most likely some pointer in malloc's data structures got overwritten with 0x0000002d.

If you have a core dump, you might be able to trace backwards a little ways to figure out where this pointer itself is located. If you recognize the data around it, it might suggest to you what part of your program could be guilty of overwriting it. (As a start, 0x2d is ASCII '-'. Any part of your program use hyphens?)
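If it helps, the si_code and si_addr values discussed above can be printed from a handler installed with sigaction and SA_SIGINFO. What follows is a sketch under assumptions, not a drop-in for snmpd: the function names are invented for this example, and only write is used for output so that the handler does not depend on stdio or malloc.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x<hex>" into out, which must hold at least
   2 + 2 * sizeof v + 1 bytes. Uses no stdio. Returns the string length. */
size_t fmt_hex(uintptr_t v, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {
        tmp[n++] = digits[v & 0xf];   /* least significant digit first */
        v >>= 4;
    } while (v != 0);

    out[len++] = '0';
    out[len++] = 'x';
    while (n > 0)
        out[len++] = tmp[--n];        /* reverse into the output */
    out[len] = '\0';
    return len;
}

/* Readable name for the two common SIGSEGV codes. */
const char *segv_code_name(int code)
{
    switch (code) {
    case SEGV_MAPERR: return "SEGV_MAPERR (address not mapped)";
    case SEGV_ACCERR: return "SEGV_ACCERR (invalid permissions)";
    default:          return "other si_code";
    }
}

static void put(const char *s, size_t n)
{
    if (write(STDERR_FILENO, s, n) < 0) {
        /* Nothing sensible can be done about a failed write here. */
    }
}

#define PUT_LIT(s) put((s), sizeof(s) - 1)

void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char addr[2 + 2 * sizeof(uintptr_t) + 1];
    const char *name = segv_code_name(info->si_code);
    size_t n = fmt_hex((uintptr_t)info->si_addr, addr);

    (void)sig;
    (void)ctx;
    PUT_LIT("SIGSEGV: si_addr=");
    put(addr, n);
    PUT_LIT(" si_code=");
    put(name, strlen(name));
    PUT_LIT("\n");
    /* SA_RESETHAND restored the default action on entry. For a real
       memory fault, returning re-executes the faulting instruction, so
       the process then dies with the default action at the original
       program counter. */
}

/* Install the reporter; returns 0 on success, -1 on error. */
int install_segv_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call install_segv_reporter() early in main. For the crash described in this thread, the handler would print a line along the lines of "SIGSEGV: si_addr=0x2d si_code=SEGV_MAPERR (address not mapped)" before the process is terminated.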
Reply by Bill October 15, 2008
On Oct 15, 9:05 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > Below is what pmap -x gives for the process (snmpd) upon failing at a
> > call to malloc for 65536 bytes.  Does anything here indicate a
> > possible problem trying to malloc 65536 bytes?
>
> [...]
>
> > Address   Kbytes     RSS    Anon  Locked Mode   Mapping
> > 0f8b8000      64       -       -       - r-x--  libresolv-2.6.so
>
> [...]
>
> > 10000000    1192       -       -       - r-x--  snmpd
> > 10169000      32       -       -       - rwx--  snmpd
> > 10171000     552       -       -       - rwx--    [ anon ]
>
> The last line should describe the 'regular heap' of the application
> (the area used by brk/sbrk). Its present size is 552K and it could
> grow by about another 510M until it would 'hit' ld-2.6.so (brk/sbrk
> would fail then).
>
> > 30000000     116       -       -       - r-x--  ld-2.6.so
> > 3001d000      24       -       -       - rw---    [ anon ]
> > 30023000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 30024000       4       -       -       - rw---    [ anon ]
> > 30025000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 3005c000       4       -       -       - r----  ld-2.6.so
> > 3005d000       4       -       -       - rwx--  ld-2.6.so
> > 3005e000       4       -       -       - -----    [ anon ]
> > 3005f000    8188       -       -       - rw---    [ anon ]
> > 3085e000       4       -       -       - -----    [ anon ]
> > 3085f000    8188       -       -       - rw---    [ anon ]
> > 7ff61000     332       -       -       - rw---    [ stack ]
> > -------- ------- ------- ------- -------
> > total kB   25084       -       -       -
>
> The two 8188K segments preceded by a single page w/ 'no access' are
> most likely (userspace) NPTL-stacks for two threads (default NPTL
> thread stack size is 8M, the lowest 4K are used as guard page so that
> an access beyond the bounds of one stack causes a [MMU] exception
> instead of overwriting data on the other stack). These stacks are
> allocated by calling mmap with MAP_ANON. There is still plenty of
> space for other anonymous mappings between the highest used address
> (0x3105e000) and the lowest presently used address of the
> conventional 'stack segment'.
>
> Unless I am very much mistaken, this process should certainly be
> capable of allocating more virtual memory using either brk/sbrk or
> mmap.

When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR and si_addr of 0x2d. What does address 0x2d represent? Is the problem that address 0x2d is not in the ranges shown in pmap?
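The sizes quoted in the pmap discussion above can be checked with a few lines of arithmetic. This is just a sketch with invented helper names; the addresses and sizes plugged into it come from the pmap listing itself.

```c
#include <stdint.h>

#define KIB UINT64_C(1024)
#define MIB (UINT64_C(1024) * KIB)

/* First address past a mapping that starts at start and spans kbytes KiB
   (the "Kbytes" column of pmap -x). */
uint64_t mapping_end(uint64_t start, uint64_t kbytes)
{
    return start + kbytes * KIB;
}

/* Whole MiB of unmapped space between the end of one mapping and the
   start of the next one above it. */
uint64_t gap_mib(uint64_t end_of_lower, uint64_t start_of_upper)
{
    return (start_of_upper - end_of_lower) / MIB;
}
```

For example, gap_mib(mapping_end(0x10171000, 552), 0x30000000) evaluates to 510, matching the "about another 510M" of room for the heap, and each 4K guard page plus its 8188K segment adds up to exactly 8M.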