EmbeddedRelated.com

Memleak in basic system

Started by wimpunk September 22, 2016
On 23/09/16 09:18, David Brown wrote:
> On 22/09/16 10:51, wimpunk wrote:
>> Hi,
>>
>> I'm stuck with a nasty problem. I'm running a 4.4.13-based kernel with
>> busybox on an ARM system, but it seems to leak memory... during
>> working hours. I know it is not the best way to do it, but I've been
>> monitoring the MemFree value from /proc/meminfo and I'm losing 2M/hour.
>> It's not that big, but having 100M of memory free makes it crash after
>> two days. Or at least that's what I think makes it crash; I need more
>> monitoring data to be sure.
>>
>> So I'm wondering: how would you monitor this kind of problem? How do
>> you find out if it is a kernel issue or related to some running program?
>>
>> Kind regards,
>>
>> wimpunk.
>
> What /exactly/ are you monitoring from /proc/meminfo? If you are
> looking at MemFree, then you can expect it to go down regularly - once a
> system has been used for a while, you don't want MemFree to be more than
> about 10% of the system's memory. Remember, Linux uses free memory for
> disk cache. It will clear out old disk cache if it needs the memory for
> something else, but if the memory is not being used for processes, then
> it is always best to store file data in the spare RAM.
True, I know it's not a good idea to monitor MemFree; it is influenced by tmpfs and caches. But I also watched the contents of my tmpfs and caches, and they almost don't change. On the system there's my ssh server listening, logging to the busybox version of syslog, which logs to memory.
> So if your system is doing nothing but writing logs to the disk, then
> it will use steadily more memory for disk caching of the log files. It
> may not be particularly useful to have the log files in cache, but it
> is more useful than having nothing at all in memory.
It logs to memory, but as far as I understand it just allocates its circular buffer at start and doesn't allocate more memory while running.
> Your key figure for the memory in use by processes (and therefore the
> memory that might be leaking) is MemTotal - MemFree - Buffers - Cached.
Do you have any suggestion on how to find out how much is used per process?
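[As an aside: the "in use" figure discussed above can be computed with a short awk filter. This is just a sketch, assuming the usual /proc/meminfo field names; on a live box you would pipe in /proc/meminfo instead of the demo values.]

```shell
# Sketch: "memory actually in use" = MemTotal - MemFree - Buffers - Cached,
# computed from meminfo-style input on stdin (all values in kB).
mem_in_use() {
    awk '/^MemTotal:/ {t=$2}
         /^MemFree:/  {f=$2}
         /^Buffers:/  {b=$2}
         /^Cached:/   {c=$2}
         END          {print t - f - b - c " kB"}'
}

# On a live system:  mem_in_use < /proc/meminfo
# Demo with fixed numbers:
printf 'MemTotal: 102400 kB\nMemFree: 51200 kB\nBuffers: 10240 kB\nCached: 20480 kB\n' | mem_in_use
# prints "20480 kB"
```

Note the anchored `^Cached:` pattern, so SwapCached is not accidentally picked up.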
On 23/09/16 10:44, wimpunk wrote:
> On 22/09/16 21:17, Jack wrote:
>> Johann Klammer <klammerj@NOSPAM.a1.net> wrote:
>>> On 09/22/2016 10:51 AM, wimpunk wrote:
>>>> [original post snipped]
>>>
>>> Not arm here, but old x86, so maybe not helpful:
>>>
>>> this box (512 MB ram) has MemFree: 13740 kB
>>> file server (128 MB ram) has MemFree: 2248 kB
>>>
>>> It's the linux vm caching all sorts of stuff in ram.
>>> Sooner or later forks will fail, or modules might not load
>>> (basically anything that wants a bigger chunk of contiguous memory).
>>>
>>>   echo 3 > /proc/sys/vm/drop_caches
>>>
>>> frees some memory. See if that helps.
>>> (use periodically)
>>
>> There is no point in doing that.
>> The kernel will automatically drop caches if processes need it.
>>
>> Bye Jack
>
> But I consider it a good idea. It could have happened that I didn't
> take the cache into account.
No, it is not a good idea to drop the caches manually - it is extremely rare that this is useful outside of disk benchmarking. It /is/ a good idea to take them into account when monitoring memory, of course.
On 23/09/16 10:50, wimpunk wrote:
> On 23/09/16 09:18, David Brown wrote:
>> [earlier discussion of MemFree and disk cache snipped]
>>
>> Your key figure for the memory in use by processes (and therefore the
>> memory that might be leaking) is MemTotal - MemFree - Buffers - Cached.
>
> True, I know it's not a good idea to monitor MemFree; it is influenced
> by tmpfs and caches. But I also watched the contents of my tmpfs and
> caches, and they almost don't change. On the system there's my ssh
> server listening, logging to the busybox version of syslog, which logs
> to memory.
>
> It logs to memory, but as far as I understand it just allocates its
> circular buffer at start and doesn't allocate more memory while running.
>
> Do you have any suggestion on how to find out how much is used per
> process?
Maybe look at /proc/<pid>/status? The VmPeak and VmSize lines would be of particular interest.
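[For example, a sketch of pulling those two lines out of a status file; the demo uses a synthetic file, but on a live system you would point it at /proc/<pid>/status and diff successive snapshots to spot a grower:]

```shell
# Sketch: print the VmPeak/VmSize lines from a status-format file.
# On a live system:  vm_sizes /proc/1234/status
# or loop:           for s in /proc/[0-9]*/status; do vm_sizes "$s"; done
vm_sizes() {
    grep -E '^Vm(Peak|Size):' "$1"
}

# Demo on a synthetic status file:
cat > /tmp/status.demo <<'EOF'
Name:   syslogd
VmPeak:     2844 kB
VmSize:     2844 kB
VmRSS:       480 kB
EOF
vm_sizes /tmp/status.demo
```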
On 09/23/2016 10:50 AM, wimpunk wrote:
> Do you have any suggestion on how to find out how much is used per
> process?
top can do that: watch the VIRT column. To sort on it, run top, press f, move the cursor down to VIRT with the arrow keys, press s to make it the sort field, then ESC.
On 9/23/2016 1:35 AM, wimpunk wrote:
>>> I'm stuck with a nasty problem. I'm running a 4.4.13-based kernel
>>> with busybox on an ARM system, but it seems to leak memory... during
>>> working hours.
>>
>> This suggests there are "non-working hours" (?). Does it incur losses
>> at those times? (Or is it actually NOT working/running?)
>
> It means not between 8 in the morning and 7 in the evening.
My point was that there is some aspect of "working" vs "nonworking" that the device responds to. It's either doing something during "working hours" that it is NOT doing during "nonworking hours", or /vice versa/.
>> What's it *doing* during the working hours? What *might* be calling
>> for additional memory in that time?
>>
>> Is there a persistent store, or are you logging to a memory disk, etc.?
>
> We are saving the MemFree on a monitoring server.
Is anything on the device expecting to "save stuff" anywhere "locally"? Log files, "black boxes", etc. Memory doesn't get consumed unless something is being SAVED.
>>> [original post snipped]
>>
>> Where does /proc/meminfo show INCREASING memory usage?
>>
>> When you kill(8) off "your" processes (i.e., anything that was not
>> part of the standard "system"), is the memory recovered correctly
>> by the kernel? Said another way, if you created a cron(8) task
>> to kill off your processes every hour and restart them immediately
>> thereafter, would the problem "go away" (i.e., be limited to a
>> maximum, non-accumulating loss of ~2MB)?
>>
>> Once you know which process(es) are responsible for the loss, you
>> can explore them in greater detail.
>
> Actually, the box is doing nothing, so there is pretty little to kill.
> There is an ssh server to which we regularly connect to get
> /proc/meminfo. The contents of MemFree is added to our monitoring
> system. After monitoring MemFree for two days on two different
> systems, this is what we got: https://imagebin.ca/v/2w2uH4yCnAGu
So, you lost 10 *K* in two days?? That's considerably different from
2 *M* /hour... Or is the legend in your graph (above) inaccurate?

   "It's not that big but having 100M memory free makes it crash
    after two days."

At the end of the aforementioned graph ("two days"), it appears there is
90 *K* left "free". Did it crash at that time?

Presumably there are two hosts involved (karo107 and karo209). Are they
identical? Had they been running for the same length of time at the
start of the graph? I.e., why the ~6K difference between them? What's
happening at 18:00 on karo107 that looks like a sudden memory release
(but that is not apparent on karo209)?

What other values reported by meminfo are changing alongside "MemFree"?
Memory is "conserved", so if it's not "free" it's being "used" somehow;
how does meminfo claim it is being used? This will give you a clue as to
WHERE it is being used and how/if it can be reclaimed.
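[One way to capture several meminfo fields side by side - a sketch using a handful of common field names, so the "lost" MemFree can be matched against growth in Buffers/Cached/Slab - is to condense them onto one loggable line:]

```shell
# Sketch: condense selected meminfo fields into one line for logging.
# On a live system:  meminfo_line < /proc/meminfo >> /var/log/mem.log
# (prefix with `date` for a timestamped record)
meminfo_line() {
    awk '/^(MemFree|Buffers|Cached|Slab):/ {s = s sep $1 $2; sep = " "}
         END {print s}'
}

# Demo with fixed numbers:
printf 'MemFree: 51200 kB\nBuffers: 10240 kB\nCached: 20480 kB\nSlab: 4096 kB\n' | meminfo_line
# prints "MemFree:51200 Buffers:10240 Cached:20480 Slab:4096"
```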
On 09/23/16 08:41, wimpunk wrote:
> On 22/09/16 16:53, Tim Wescott wrote:
>> On Thu, 22 Sep 2016 10:51:39 +0200, wimpunk wrote:
>>> [original post snipped]
>>
>> Can't you look at memory usage on a task-by-task basis with ps? How
>> about periodically running it, and looking for a task that's blowing
>> up?
>
> Hm, I didn't know ps could show me the used memory...
> I've been searching, but I only found a way to show the percentage of
> memory. I don't think that is accurate enough to see much difference.
Can you run top, which shows individual process memory dynamically?

Regards,

Chris
wimpunk <wimpunk+news@gmail.com> wrote:

> So I'm wondering: how would you monitor this kind of problem? How do
> you find out if it is a kernel issue or related to some running program?
Run the application on a desktop computer and use valgrind, LeakSanitizer
or any of the other monitoring tools available.

-a
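[Command-line sketch of that approach; `./myapp` is a hypothetical name for a desktop build of the suspect program, and this assumes valgrind (or a leak-sanitizer-capable gcc/clang) is installed:]

```shell
# Hypothetical: build the suspect program natively for the desktop
# (no cross-compiling needed) and let valgrind track every allocation;
# definite leaks are reported with stack traces at exit.
valgrind --leak-check=full --show-leak-kinds=definite ./myapp

# Or compile with LeakSanitizer and just run it; a leak report is
# printed when the process exits:
#   gcc -fsanitize=leak -g -o myapp myapp.c && ./myapp
```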
On 22/09/16 10:51, wimpunk wrote:
> [original post snipped]
All,

Thanks for the suggestions. It looks like it was just some normal Linux
activity going on. Letting the system run for a week showed us no
memleak; MemFree just goes a little up and down, and I was panicking a
little too early. Now I can add our normal programs and try to find out
which one is leaking. We got issues on running systems, which is why I
started looking at my basic system in the first place.

Kind regards,

wimpunk.
On 28.9.16 19:24, wimpunk wrote:
> On 22/09/16 10:51, wimpunk wrote:
>> [original post snipped]
>
> All,
>
> Thanks for the suggestions. It looks like it was just some normal linux
> actions which were going on. Letting the system run for a week showed
> us no memleak. [rest of closing snipped]
Get yourself a good book on the Linux kernel and read the chapters about
memory management, so you'll understand. Unused RAM is excess RAM.

One possible book is: Understanding the Linux Kernel.

--
-TV
On 28/09/16 20:03, Tauno Voipio wrote:
> On 28.9.16 19:24, wimpunk wrote:
>> [closing summary snipped]
>
> Get yourself a good book on the Linux kernel and read the chapters
> about memory management, so you'll understand. Unused RAM is excess
> RAM.
>
> One possible book is: Understanding the Linux Kernel.
Nah, I just needed more coffee so that I'd watch the correct values. I made the wrong associations and thought the problem was bigger than it was because of the scaling of the monitoring system. Thanks for the suggestion about the book, though. The next thing on my shortlist is fixing a bug in a USB wifi kernel module.
