
Memleak in basic system

Started by wimpunk September 22, 2016
Hi,

I'm stuck with a nasty problem.  I'm running a 4.4.13-based kernel with
BusyBox on an ARM system, but it seems to leak memory... during working
hours.
I know it is not the best approach, but I've been monitoring the MemFree
value from /proc/meminfo and I'm losing 2 MB/hour.  That's not much, but
with only 100 MB of free memory it means a crash after about two days.
Or at least that's what I think makes it crash; I need more monitoring
data to be sure.

So I'm wondering: how would you monitor this kind of problem?  How do
you find out whether it is a kernel issue or related to some running
program?

Kind regards,

wimpunk.
On 9/22/2016 1:51 AM, wimpunk wrote:
> Hi,
>
> I'm stuck with a nasty problem.  I'm running a 4.4.13-based kernel with
> BusyBox on an ARM system, but it seems to leak memory... during working
> hours.
This suggests there are "non-working hours" (?).  Does it incur losses
at those times?  (Or, is it actually NOT working/running?)

What's it *doing* during the working hours?  What *might* be calling
for additional memory in that time?

Is there a persistent store, or are you logging to a memory disk, etc.?
> I know it is not the best approach, but I've been monitoring the MemFree
> value from /proc/meminfo and I'm losing 2 MB/hour.  That's not much, but
> with only 100 MB of free memory it means a crash after about two days.
> Or at least that's what I think makes it crash; I need more monitoring
> data to be sure.
>
> So I'm wondering: how would you monitor this kind of problem?  How do
> you find out whether it is a kernel issue or related to some running
> program?
Where does /proc/meminfo show INCREASING memory usage?

When you kill(8) off "your" processes (i.e., anything that was not part
of the standard "system"), is the memory recovered correctly by the
kernel?  Said another way, if you created a cron(8) task to kill off
your processes every hour and restart them immediately thereafter,
would the problem "go away" (i.e., be limited to a maximum,
non-accumulating loss of ~2MB)?

Once you know which process(es) are responsible for the loss, you can
explore them in greater detail.
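For illustration, the cron-based restart idea might look roughly like
this ("myapp" and the log path are placeholders, not anything known
about the actual system; the crontab location depends on how the
BusyBox crond is configured):

    # root crontab entries (sketch): restart the suspect process hourly
    # and log MemFree at the same time so the effect shows in the graphs
    0 * * * *  killall myapp; sleep 2; /usr/bin/myapp &
    0 * * * *  (date; grep MemFree /proc/meminfo) >> /var/log/memfree.log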
On Thu, 22 Sep 2016 10:51:39 +0200, wimpunk wrote:

> I'm stuck with a nasty problem.  I'm running a 4.4.13-based kernel with
> BusyBox on an ARM system, but it seems to leak memory... during working
> hours.  I've been monitoring the MemFree value from /proc/meminfo and
> I'm losing 2 MB/hour.
> [...]
Can't you look at memory usage on a task-by-task basis with ps?  How
about periodically running it, and looking for a task that's blowing up?

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work!  See my website if you're interested
http://www.wescottdesign.com
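A rough sketch of that idea (the -o and --sort options below are
procps-style; a BusyBox ps may only support a subset of these columns):

    # every minute, log the 20 largest processes by resident size
    while true; do
        date >> /var/log/ps-mem.log
        ps -eo pid,rss,vsz,comm --sort=-rss | head -n 20 >> /var/log/ps-mem.log
        sleep 60
    done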
On 09/22/2016 10:51 AM, wimpunk wrote:
> I'm stuck with a nasty problem.  I'm running a 4.4.13-based kernel with
> BusyBox on an ARM system, but it seems to leak memory... during working
> hours.  I've been monitoring the MemFree value from /proc/meminfo and
> I'm losing 2 MB/hour.
> [...]
Not ARM here, but old x86, so maybe not helpful:

this box (512 MB RAM) has      MemFree: 13740 kB
file server (128 MB RAM) has   MemFree: 2248 kB

It's the Linux VM caching all sorts of stuff in RAM.  Sooner or later
forks will fail, or modules might not load (basically anything that
wants a bigger chunk of contiguous memory).

    echo 3 > /proc/sys/vm/drop_caches

frees some memory.  See if that helps (use periodically).

There's another tunable (/proc/sys/vm/user_reserve_kbytes, I think)
which is supposed to help with that, but when I tried it in the past,
module loading would still sometimes fail and the box would go into
swap storms all the time.  Idk... perhaps they fixed that by now...
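If you do try that periodically, a root crontab entry is probably the
simplest way (a sketch only; whether dropping caches actually helps in
this case is a separate question):

    # once an hour, drop the page cache, dentries and inodes
    0 * * * *  echo 3 > /proc/sys/vm/drop_caches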
Johann Klammer <klammerj@NOSPAM.a1.net> wrote:

> On 09/22/2016 10:51 AM, wimpunk wrote:
>> [...]
>
> Not arm here, but old x86, so maybe not helpful:
> [...]
>
> echo 3 > /proc/sys/vm/drop_caches
>
> frees some memory.  See if that helps (use periodically).
There is no point in doing that.  The kernel will automatically drop
caches if processes need the memory.

Bye Jack

--
Yoda of Borg am I!  Assimilated shall you be!  Futile resistance is, hmm?
On 22/09/16 10:51, wimpunk wrote:
> I'm stuck with a nasty problem.  I'm running a 4.4.13-based kernel with
> BusyBox on an ARM system, but it seems to leak memory... during working
> hours.  I've been monitoring the MemFree value from /proc/meminfo and
> I'm losing 2 MB/hour.
> [...]
What /exactly/ are you monitoring from /proc/meminfo?

If you are looking at MemFree, then you can expect it to go down
regularly - once a system has been used for a while, you don't want
MemFree to be more than about 10% of the system's memory.  Remember,
Linux uses free memory for disk cache.  It will clear out old disk
cache if it needs the memory for something else, but if the memory is
not being used by processes, then it is always best to store file data
in the spare RAM.

So if your system is doing nothing but writing logs to the disk, then
it will use steadily more memory for disk caching of the log files.  It
may not be particularly useful to have the log files in cache, but it
is more useful than having nothing at all in memory.

Your key figure for the memory in use by processes (and therefore the
memory that might be leaking) is MemTotal - MemFree - Buffers - Cached.
Equivalently, watch MemFree + Buffers + Cached rather than MemFree
alone.
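As a sketch of how one might log that figure instead of MemFree alone
(assuming awk is available on the target; the field names are taken
straight from /proc/meminfo):

    # memory in use by processes, in kB:
    # MemTotal - MemFree - Buffers - Cached
    awk '/^MemTotal:/ {t=$2}
         /^MemFree:/  {f=$2}
         /^Buffers:/  {b=$2}
         /^Cached:/   {c=$2}
         END {print t - f - b - c, "kB"}' /proc/meminfo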
On 22/09/16 16:38, Don Y wrote:
> On 9/22/2016 1:51 AM, wimpunk wrote:
>> Hi,
>>
>> I'm stuck with a nasty problem.  I'm running a 4.4.13-based kernel with
>> BusyBox on an ARM system, but it seems to leak memory... during working
>> hours.
>
> This suggests there are "non-working hours" (?).  Does it incur losses
> at those times?  (Or, is it actually NOT working/running?)
By non-working hours I mean not between 8 in the morning and 7 in the
evening.
> What's it *doing* during the working hours?  What *might* be calling
> for additional memory in that time?
>
> Is there a persistent store, or are you logging to a memory disk, etc.?
We are saving the MemFree value on a monitoring server.
>> I know it is not the best approach, but I've been monitoring the MemFree
>> value from /proc/meminfo and I'm losing 2 MB/hour.
>> [...]
>
> Where does /proc/meminfo show INCREASING memory usage?
>
> When you kill(8) off "your" processes (i.e., anything that was not part
> of the standard "system"), is the memory recovered correctly by the
> kernel?  Said another way, if you created a cron(8) task to kill off
> your processes every hour and restart them immediately thereafter,
> would the problem "go away" (i.e., be limited to a maximum,
> non-accumulating loss of ~2MB)?
>
> Once you know which process(es) are responsible for the loss, you can
> explore them in greater detail.
Actually, the box is doing nothing, so there is very little to kill.
There is an ssh server to which we regularly connect to read
/proc/meminfo; the MemFree value is added to our monitoring system.
After monitoring MemFree for two days on two different systems, this is
what we got: https://imagebin.ca/v/2w2uH4yCnAGu
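For what it's worth, that kind of polling can be reduced to something
like this on the monitoring side (a sketch; the host name and log file
are made-up placeholders):

    # append a timestamped MemFree sample every 5 minutes
    while true; do
        printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')" >> memfree.log
        ssh root@target-box 'grep MemFree /proc/meminfo' >> memfree.log
        sleep 300
    done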
On 22/09/16 16:53, Tim Wescott wrote:
> On Thu, 22 Sep 2016 10:51:39 +0200, wimpunk wrote:
>> [...]
>
> Can't you look at memory usage on a task-by-task basis with ps?  How
> about periodically running it, and looking for a task that's blowing up?
Hm, I didn't know ps could show me the memory used per process...  I've
been searching, but I only found a way to show the percentage of memory,
and I don't think that is accurate enough to see much difference.
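If the BusyBox ps turns out to be too limited, the per-process resident
size can also be read straight from /proc; a rough sketch needing only
/bin/sh (VmRSS is absent for kernel threads, so those are skipped):

    # print PID, name and resident set size (kB) for every user process
    for d in /proc/[0-9]*; do
        name=$(cat "$d/comm" 2>/dev/null)
        rss=$(grep VmRSS "$d/status" 2>/dev/null)
        [ -n "$rss" ] && echo "${d##*/} $name $rss"
    done

Run it periodically and diff the output; whichever process's VmRSS keeps
growing is the one to look at.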
On 22/09/16 18:02, Johann Klammer wrote:
> On 09/22/2016 10:51 AM, wimpunk wrote:
>> [...]
>
> Not arm here, but old x86, so maybe not helpful:
> [...]
>
> It's the Linux VM caching all sorts of stuff in RAM.  Sooner or later
> forks will fail, or modules might not load (basically anything that
> wants a bigger chunk of contiguous memory).
>
> echo 3 > /proc/sys/vm/drop_caches
>
> frees some memory.  See if that helps (use periodically).
> [...]
I could use the drop_caches trick while monitoring, but according to
top the caches are pretty stable.  I don't think I'm being trapped by
the kernel cache.
On 22/09/16 21:17, Jack wrote:
> Johann Klammer <klammerj@NOSPAM.a1.net> wrote:
>> [...]
>>
>> echo 3 > /proc/sys/vm/drop_caches
>>
>> frees some memory.  See if that helps (use periodically).
>
> There is no point in doing that.  The kernel will automatically drop
> caches if processes need the memory.
>
> Bye Jack
Still, I consider it a good idea.  It could be that I hadn't taken the
cache into account.