EmbeddedRelated.com

ElectricFence Exiting: mprotect() failed: Cannot allocate memory

Started by Bill October 13, 2008
Bill wrote:
> Below is what pmap -x gives for the process (snmpd) upon failing
> at a call to malloc for 65536 bytes. Would anything here
> indicate a possible problem trying to malloc 65536 bytes? It
> should be noted that a call to pmap -x before the failure, while
> snmpd was still running, gave identical results. Therefore, I
> wonder if the cause of the problem can be seen here?
Please do not top-post, but do snip properly. Your answer belongs after (or intermixed with) the quoted material to which you reply, after snipping all irrelevant material. This gives prospective repliers a fighting chance at understanding the thread. See the following links:

<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Bill <jobhunts02@aol.com> writes:

> After about 3 hours, the program seg faults when trying to do a malloc
> 65K bytes.
Are you *sure* above description is accurate?

Is it that the application gets SIGSEGV *while* trying to do a malloc (IOW, it crashes *inside* malloc), or is it that the application gets NULL from malloc and gets SIGSEGV when it attempts to use returned memory?

The former implies heap corruption, the latter heap exhaustion.

MALLOC_CHECK_, Valgrind, efence all help with the former, but are useless for the latter.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
On Oct 14, 9:05 pm, Paul Pluzhnikov <ppluzhnikov-...@gmail.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > After about 3 hours, the program seg faults when trying to do a malloc
> > 65K bytes.
>
> Are you *sure* above description is accurate?
>
> Is it that the application gets SIGSEGV *while* trying to do a
> malloc (IOW, it crashes *inside* malloc), or is it that the
> application gets NULL from malloc and gets SIGSEGV when it attempts
> to use returned memory?
A backtrace in the SIGSEGV signal handler I put into the application points to the line where the malloc occurs. There is an if statement to check for a NULL pointer and print a message if malloc returned a NULL pointer. No message is printed.
> The former implies heap corruption, the latter heap exhaustion.
>
> MALLOC_CHECK_, Valgrind, efence all help with the former, but are
> useless for the latter.
1. Valgrind slows down the application too much to be effective.
2. efence exits during initialization with the "Exiting: mprotect() failed: Cannot allocate memory" error.
3. I am running a test right now with MALLOC_CHECK_=2 and will examine the results in the morning.
> A backtrace in the SIGSEGV signal handler I put into the application
> points to the line where the malloc occurs. There is an if statement
> to check for a NULL pointer and print a message if malloc returned a
> NULL pointer. No message is printed.
Beware of the possibility of buffering. Consider the program below. When run interactively with stdout connected to a terminal, you see:

  f(10) is NULL.
  Segmentation fault

where the first line is line-buffered stdout from the program, and the second line is unbuffered stderr from the shell. When run with stdout redirected into a regular file, you see only:

  Segmentation fault

on stderr, and the file is *empty* ["No message is printed."] because the buffer was not flushed. So remember fflush().

-----
#include <stdio.h>

char *f(int a)
{
    (void)a;            /* argument is unused; f always "fails" */
    return 0;
}

int main(void)
{
    char *p = f(10);
    if (NULL == p) {
        printf("f(10) is NULL.\n");
        /* fflush(stdout);  THE FIX */
    }
    return *p;          /* deliberate NULL dereference: crashes here */
}
-----
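A complementary way to avoid the buffering trap described above is to report allocation failures on stderr, which is unbuffered by default. The following is only a minimal sketch: `checked_malloc` and its `where` parameter are hypothetical names introduced here for illustration, not something taken from snmpd or from the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the thread): allocate `size` bytes and, on
 * failure, report on stderr. Because stderr is unbuffered by default, the
 * message is written out even if the process crashes right afterwards.
 */
void *checked_malloc(size_t size, const char *where)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "%s: malloc(%zu) returned NULL\n", where, size);
        fflush(stderr);  /* harmless; covers a stderr that was re-buffered */
    }
    return p;
}
```

A call such as `checked_malloc(65536, "snmp_handler")` (the label is made up) would then leave a trace in the log whenever malloc really does return NULL, regardless of where stdout is redirected.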
Bill <jobhunts02@aol.com> writes:
> Below is what pmap -x gives for the process (snmpd) upon failing at a
> call to malloc for 65536 bytes. Would anything here indicate a
> possible problem trying to malloc 65536 bytes?
[...]
> Address   Kbytes     RSS    Anon  Locked Mode   Mapping
> 0f8b8000      64       -       -       - r-x--  libresolv-2.6.so
[...]
> 10000000    1192       -       -       - r-x--  snmpd
> 10169000      32       -       -       - rwx--  snmpd
> 10171000     552       -       -       - rwx--    [ anon ]
The last line should describe the 'regular heap' of the application (the area used by brk/sbrk). Its present size is 552K and it could grow by about another 510M until it would 'hit' ld-2.6.so (brk/sbrk would then fail with ENOMEM instead of extending the heap).
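The "about another 510M" figure can be checked from the pmap columns. A minimal sketch of that arithmetic, where `headroom_bytes` is a hypothetical helper name and the inputs are the start address and size of one mapping plus the start address of the next:

```c
#include <stdint.h>

/*
 * Bytes of unmapped address space between the end of one mapping
 * (start address plus size in KB, as printed by pmap) and the start of
 * the next mapping. Returns 0 if the two already touch or overlap.
 */
uint64_t headroom_bytes(uint64_t start, uint64_t size_kb, uint64_t next_start)
{
    uint64_t end = start + size_kb * 1024u;
    return next_start > end ? next_start - end : 0;
}
```

With the values from the listing, `headroom_bytes(0x10171000, 552, 0x30000000)` yields 0x1fe05000 bytes, which is 510M plus 20K.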
> 30000000     116       -       -       - r-x--  ld-2.6.so
> 3001d000      24       -       -       - rw---    [ anon ]
> 30023000       4       -       -       - r--s-    [ shmid=0x0 ]
> 30024000       4       -       -       - rw---    [ anon ]
> 30025000       4       -       -       - r--s-    [ shmid=0x0 ]
> 3005c000       4       -       -       - r----  ld-2.6.so
> 3005d000       4       -       -       - rwx--  ld-2.6.so
> 3005e000       4       -       -       - -----    [ anon ]
> 3005f000    8188       -       -       - rw---    [ anon ]
> 3085e000       4       -       -       - -----    [ anon ]
> 3085f000    8188       -       -       - rw---    [ anon ]
> 7ff61000     332       -       -       - rw---    [ stack ]
> -------- ------- ------- ------- -------
> total kB   25084       -       -       -
The two 8188K segments preceded by a single page w/ 'no access' are most likely (userspace) NPTL-stacks for two threads (default NPTL thread stack size is 8M, the lowest 4K are used as guard page so that an access beyond the bounds of one stack causes a [MMU] exception instead of overwriting data on the other stack). These stacks are allocated by calling mmap with MAP_ANON. There is still plenty of space for other anonymous mappings between the end of the last of these mappings (0x3105e000) and the lowest presently used address of the conventional 'stack segment'.

Unless I am very much mistaken, this process should certainly be capable of allocating more virtual memory using either brk/sbrk or mmap.

BTW, while getting non-spam e-mails at least occasionally is nice :-), I usually read postings in the groups I frequent, except insofar 'certain posters', whom I deem to be more of an annoyance than an information source, will be filtered by my newsreader.
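The "inaccessible page below a read/write region" layout described above can be reproduced with mmap and mprotect. This is a rough sketch for Linux, not NPTL's actual code: `map_with_guard_page` is a hypothetical name, and the caller is assumed to pass a size that is a multiple of the page size.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `usable` bytes of read/write memory with one inaccessible guard page
 * directly below it. Returns the start of the usable area, or NULL on
 * failure. `usable` is assumed to be a multiple of the page size.
 */
void *map_with_guard_page(size_t usable)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    /* Reserve guard page plus usable area, initially with no access. */
    size_t total = (size_t)page + usable;
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Open up everything above the lowest page; the guard stays PROT_NONE. */
    if (mprotect(base + page, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + page;
}
```

In pmap output, a region created this way should show up much like the pairs in the listing: one 4K entry with mode `-----` followed by a larger `rw---` entry. Electric Fence relies on the same mechanism, but places a guard page next to every allocation.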
On 2008-10-14, Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> wrote:
> Efence adds 1 page guard to every malloc.
> It is very rarely helpful in debugging non-toy applications.
That may be your experience but personally I find it incredibly useful for certain classes of problems. Maybe not high-level stuff or full applications but for low level data structure test beds I find you can literally do in a morning what may take a week otherwise.

--
Andrew Smallshaw
andrews@sdf.lonestar.org
On Oct 15, 9:05 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > Below is what pmap -x gives for the process (snmpd) upon failing at a
> > call to malloc for 65536 bytes. Would anything here indicate a
> > possible problem trying to malloc 65536 bytes?
>
> [...]
>
> > Address   Kbytes     RSS    Anon  Locked Mode   Mapping
> > 0f8b8000      64       -       -       - r-x--  libresolv-2.6.so
>
> [...]
>
> > 10000000    1192       -       -       - r-x--  snmpd
> > 10169000      32       -       -       - rwx--  snmpd
> > 10171000     552       -       -       - rwx--    [ anon ]
>
> The last line should describe the 'regular heap' of the application
> (the area used by brk/sbrk). Its present size is 552K and it could
> grow by about another 510M until it would 'hit' ld-2.6.so (brk/sbrk
> would then fail with ENOMEM instead of extending the heap).
>
> > 30000000     116       -       -       - r-x--  ld-2.6.so
> > 3001d000      24       -       -       - rw---    [ anon ]
> > 30023000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 30024000       4       -       -       - rw---    [ anon ]
> > 30025000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 3005c000       4       -       -       - r----  ld-2.6.so
> > 3005d000       4       -       -       - rwx--  ld-2.6.so
> > 3005e000       4       -       -       - -----    [ anon ]
> > 3005f000    8188       -       -       - rw---    [ anon ]
> > 3085e000       4       -       -       - -----    [ anon ]
> > 3085f000    8188       -       -       - rw---    [ anon ]
> > 7ff61000     332       -       -       - rw---    [ stack ]
> > -------- ------- ------- ------- -------
> > total kB   25084       -       -       -
>
> The two 8188K segments preceded by a single page w/ 'no access' are
> most likely (userspace) NPTL-stacks for two threads (default NPTL
> thread stack size is 8M, the lowest 4K are used as guard page so that
> an access beyond the bounds of one stack causes a [MMU] exception
> instead of overwriting data on the other stack). These stacks are
> allocated by calling mmap with MAP_ANON. There is still plenty of
> space for other anonymous mappings between the end of the last of
> these mappings (0x3105e000) and the lowest presently used address of
> the conventional 'stack segment'.
>
> Unless I am very much mistaken, this process should certainly be
> capable of allocating more virtual memory using either brk/sbrk or
> mmap.
When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR and si_addr of 0x2d. What does address 0x2d represent? Is the problem that address 0x2d is not in the ranges shown in pmap?
Bill <jobhunts02@aol.com> writes:

> When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR
Page fault when accessing an unmapped page.
> and si_addr of 0x2d. What does address 0x2d represent?
The address that the program tried to access.
> Is the
> problem that address 0x2d is not in the ranges shown in pmap?
Well, sort of. 0x2d isn't in that range because that page isn't mapped. But it's not supposed to be mapped. The first page of virtual memory is always unmapped, so that NULL pointer dereferences generate faults. So it's an address that can't possibly be valid.

If the crash is inside malloc, as you said earlier, then most likely some pointer in malloc's data structures got overwritten with 0x0000002d. If you have a core dump, you might be able to trace backwards a little ways to figure out where this pointer itself is located. If you recognize the data around it, it might suggest to you what part of your program could be guilty of overwriting it. (As a start, 0x2d is ASCII '-'. Any part of your program use hyphens?)
Andrew Smallshaw wrote:
> Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> wrote:
>
>> Efence adds 1 page guard to every malloc.
>> It is very rarely helpful in debugging non-toy applications.
>
> That may be your experience but personally I find it incredibly
> useful for certain classes of problems. Maybe not high-level
> stuff or full applications but for low level data structure test
> beds I find you can literally do in a morning what may take a
> week otherwise.
Take a look at the description of the debug facilities in nmalloc.txh. That is part of nmalloc.zip, and is the source for the info documentation of nmalloc. nmalloc, in turn, is almost pure standard C, but relies on the system sbrk() to get memory space, and makes some (quite usual) assumptions about memory. See:

<http://cbfalconer.home.att.net/download/nmalloc.zip>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Nate Eldredge <nate@vulcan.lan> writes:

>> Is the
>> problem that address 0x2d is not in the ranges shown in pmap?
>
> Well, sort of. 0x2d isn't in that range because that page isn't mapped.
> But it's not supposed to be mapped. The first page of virtual memory is
> always unmapped, so that NULL pointer dereferences generate faults. So
> it's an address that can't possibly be valid. If the crash is inside
> malloc, as you said earlier, then most likely some pointer in malloc's
> data structures got overwritten with 0x0000002d.
The OP stated that he doesn't actually know that, only deduces this from the lack of a printed message (which, as John Reiser aptly suggests, may be due to naive use of stdout buffering, where stderr was likely called for).

Much more likely than a pointer being overwritten is that malloc() in fact returned NULL, and the OP then did (an equivalent of):

  struct Foo *p = malloc(sizeof *p);
  p->some_field_at_offset_0x2d = 1;

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
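To make the last point concrete: when a structure pointer is NULL, accessing a member touches the address equal to that member's offset. A small sketch, in which the layout of `struct Foo` is invented purely so that one member lands at offset 0x2d (the real structure in snmpd, if any, is unknown), and `fault_address_from_null` is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that one member sits at offset 0x2d. */
struct Foo {
    char header[0x2d];
    char some_field_at_offset_0x2d;
};

/* Address that is touched when `offset` bytes past a NULL base are accessed. */
uintptr_t fault_address_from_null(size_t offset)
{
    return (uintptr_t)0 + offset;
}
```

Here `fault_address_from_null(offsetof(struct Foo, some_field_at_offset_0x2d))` evaluates to 0x2d, which is the si_addr one would expect from a store through a NULL `struct Foo *`. Comparing the reported si_addr against the member offsets of structures allocated near the crash site is one way to test this hypothesis.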