Bill <jobhunts02@aol.com> writes:

> I am trying to use DUMA to find the problem and am getting the
> following error when I try to link to the DUMA library:

It amazes me that in all this time you never managed to answer a
simple question: is the crash *in* malloc, or is it in your own code?

You are continuing to debug this as if there is malloc corruption,
and this will prove futile if the crash is (as I suspect) in your
own code instead.

> # LD_PRELOAD=/lib/libduma.a /bin/snmpd &
> # ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
> preloaded: ignored.

You can only preload shared libraries.

Besides, reading the man page for libduma, I see that it uses the *exact*
same strategy as efence: a guard page after every allocation.

So, once you manage to build a shared libduma.so and preload it,
it will most likely fail just like efence did, because the overhead
of guard pages is too great for the majority of real-world (non-toy)
applications.

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

> # LD_PRELOAD=/lib/libduma.a /bin/snmpd &
> # ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
> preloaded: ignored.

Only an executable object (.e_type is ET_DYN or ET_EXEC) can be pre-loaded.
An archive library (*.a) that contains ET_REL files cannot be pre-loaded.

--

I am trying to use DUMA to find the problem and am getting the
following error when I try to link to the DUMA library:


# LD_PRELOAD=/lib/libduma.a /bin/snmpd &
# ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
preloaded: ignored.

What can cause this error when using LD_PRELOAD?

David Schwartz <davids@webmaster.com> writes:
> On Oct 15, 6:35 am, John Reiser <jrei...@BitWagon.com> wrote:
>
>> #include <stdio.h>
>>
>> char *f(a)
>> {
>>         return 0;
>>
>> }
>>
>> main()
>> {
>>         char *p = f(10);
>>         if (NULL==p) {
>>                 printf("f(10) is NULL.\n");
>>                 /* fflush(stdout);    THE FIX */
>>         }
>>         return *p;}
>
> This is still not very good. What if 'printf' needs to allocate memory
> to do its job? What if 'fflush' does? In an error handler like this,
> you are better off calling 'write' directly.

Another option is to force a segfault, i.e. assign through the pointer
when it is null. The values of the various CPU registers, especially
the program counter, can then (in combination with a disassembler) be
used to determine the location of the crash and hence the condition
at the time of the test.

Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> writes:
> Nate Eldredge <nate@vulcan.lan> writes:
>>>  Is the
>>> problem that address 0x2d is not in the ranges shown in pmap?
>>
>> Well, sort of.  0x2d isn't in that range because that page isn't mapped.
>> But it's not supposed to be mapped.  The first page of virtual memory is
>> always unmapped, so that NULL pointer dereferences generate faults.  So
>> it's an address that can't possibly be valid.  If the crash is inside
>> malloc, as you said earlier, then most likely some pointer in malloc's
>> data structures got overwritten with 0x0000002d.
>
> The OP stated that he doesn't actually know that, only deduces this
> from lack of printed message (which, as John Reiser aptly suggests, may
> be due to naive use of stdout buffering; where stderr was likely
> called for).
>
> Much more likely than pointer being overwritten is that malloc()
> in fact returned NULL, and OP then did (an equivalent of):

This means roughly 'it is more likely that the system ran out of
memory than that the application contained a programming error'.
But this is again a question which can be answered very simply: Check
the PC/IP value at the time of the segfault. That's either within
malloc (as the OP has repeatedly claimed) or within application code.

So, what is it?

On Oct 15, 6:35 am, John Reiser <jrei...@BitWagon.com> wrote:

> #include <stdio.h>
>
> char *f(a)
> {
>         return 0;
>
> }
>
> main()
> {
>         char *p = f(10);
>         if (NULL==p) {
>                 printf("f(10) is NULL.\n");
>                 /* fflush(stdout);    THE FIX */
>         }
>         return *p;}

This is still not very good. What if 'printf' needs to allocate memory
to do its job? What if 'fflush' does? In an error handler like this,
you are better off calling 'write' directly.

DS

Nate Eldredge <nate@vulcan.lan> writes:

>>  Is the
>> problem that address 0x2d is not in the ranges shown in pmap?
>
> Well, sort of.  0x2d isn't in that range because that page isn't mapped.
> But it's not supposed to be mapped.  The first page of virtual memory is
> always unmapped, so that NULL pointer dereferences generate faults.  So
> it's an address that can't possibly be valid.  If the crash is inside
> malloc, as you said earlier, then most likely some pointer in malloc's
> data structures got overwritten with 0x0000002d.

The OP stated that he doesn't actually know that, only deduces this
from lack of printed message (which, as John Reiser aptly suggests, may
be due to naive use of stdout buffering; where stderr was likely
called for).

Much more likely than pointer being overwritten is that malloc()
in fact returned NULL, and OP then did (an equivalent of):

  struct Foo *p = malloc(sizeof(struct Foo));
  p->some_field_at_offset_0x2d = 1;

Cheers,

Andrew Smallshaw wrote:
> Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> wrote:
> 
>> Efence adds 1 page guard to every malloc.
>> It is very rarely helpful in debugging non-toy applications.
> 
> That may be your experience but personally I find it incredibly
> useful for certain classes of problems.  Maybe not high-level
> stuff or full applications but for low level data structure test
> beds I find you can literally do in a morning what may take a
> week otherwise.

Take a look at the description of the debug facilities in
nmalloc.txh.  That is part of nmalloc.zip, and is the source for
the info documentation of nmalloc.  nmalloc, in turn, is almost
pure standard C, but relies on the system sbrk() to get memory
space, and makes some (quite usual) assumptions about memory.  See:

  <http://cbfalconer.home.att.net/download/nmalloc.zip>

-- 
 [mail]: Chuck F (cbfalconer at maineline dot net) 
 [page]: <http://cbfalconer.home.att.net>
            Try the download section.

Bill <jobhunts02@aol.com> writes:

> When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR

Page fault when accessing an unmapped page.

> and si_addr of  0x2d.  What does address 0x2d represent?

The address that the program tried to access.

>  Is the
> problem that address 0x2d is not in the ranges shown in pmap?

Well, sort of.  0x2d isn't in that range because that page isn't mapped.
But it's not supposed to be mapped.  The first page of virtual memory is
always unmapped, so that NULL pointer dereferences generate faults.  So
it's an address that can't possibly be valid.  If the crash is inside
malloc, as you said earlier, then most likely some pointer in malloc's
data structures got overwritten with 0x0000002d.

If you have a core dump, you might be able to trace backwards a little
ways to figure out where this pointer itself is located.  If you
recognize the data around it, it might suggest to you what part of your
program could be guilty of overwriting it.  (As a start, 0x2d is ASCII
'-'.  Any part of your program use hyphens?)

On Oct 15, 9:05 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > Below is what pmap -x gives for the process (snmpd) upon failing at a
> > call to malloc for 65536 bytes.  Does anything here indicate a
> > possible problem trying to malloc 65536 bytes?
>
> [...]
>
> > Address   Kbytes     RSS    Anon  Locked Mode   Mapping
> > 0f8b8000      64       -       -       - r-x--  libresolv-2.6.so
>
> [...]
>
> > 10000000    1192       -       -       - r-x--  snmpd
> > 10169000      32       -       -       - rwx--  snmpd
> > 10171000     552       -       -       - rwx--    [ anon ]
>
> The last line should describe the 'regular heap' of the application
> (the area used by brk/sbrk). Its present size is 552K and it could
> grow by about another 510M until it would 'hit' ld-2.6.so (brk/sbrk
> would then fail with ENOMEM rather than return a valid address).
>
> > 30000000     116       -       -       - r-x--  ld-2.6.so
> > 3001d000      24       -       -       - rw---    [ anon ]
> > 30023000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 30024000       4       -       -       - rw---    [ anon ]
> > 30025000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 3005c000       4       -       -       - r----  ld-2.6.so
> > 3005d000       4       -       -       - rwx--  ld-2.6.so
> > 3005e000       4       -       -       - -----    [ anon ]
> > 3005f000    8188       -       -       - rw---    [ anon ]
> > 3085e000       4       -       -       - -----    [ anon ]
> > 3085f000    8188       -       -       - rw---    [ anon ]
> > 7ff61000     332       -       -       - rw---    [ stack ]
> > -------- ------- ------- ------- -------
> > total kB   25084       -       -       -
>
> The two 8188K segments, each preceded by a single page w/ 'no access',
> are most likely (userspace) NPTL stacks for two threads (default NPTL
> thread stack size is 8M, the lowest 4K are used as guard page so that
> an access beyond the bounds of one stack causes a [MMU] exception
> instead of overwriting data on the other stack). These stacks are
> allocated by calling mmap with MAP_ANON. There is still plenty of
> space for other anonymous mappings between the highest used address
> (0x3105e000) and the lowest presently used address of the
> conventional 'stack segment'.
>
> Unless I am very much mistaken, this process should certainly be
> capable of allocating more virtual memory using either brk/sbrk or
> mmap.


When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR
and si_addr of  0x2d.  What does address 0x2d represent?  Is the
problem that address 0x2d is not in the ranges shown in pmap?