Is there a way to locate which part of the process used the most memory, looking only at a generated core file?

Question

I have a process (restarted by a watchdog whenever it stops for some reason) that normally uses about 200 MB of memory. At some point I saw it eating up memory, with usage around 1.5-2 GB, which almost certainly means a "memory leak" somewhere. The quotes are because it is not a real memory leak (allocated memory that is never freed and is unreachable); note that only smart pointers are used. So I suspect some huge container (which I have not found) or something similar.

Later, the process crashed because of the high memory usage, and a core dump of about 2 GB was generated. The problem is that I can't reproduce the issue, so valgrind won't help here (I guess). It happens very rarely and I can't "catch" it.

So, my question is: is there a way, using the executable and the core file, to locate which part of the process used most of the memory?

I took a look at the core file with gdb, and there is nothing unusual. But the core is big, so there must be something. Is there a clever way to understand what has happened, or is guessing the only option (hard for such a big executable: 12 threads, about 50-100 classes, maybe more, etc.)?

It's a C++ application, running on RHEL5U3.

Answer

Open this coredump in hexadecimal format (as bytes/words/dwords/qwords). Starting from the middle of the file, look for any repeating pattern. If you find one, try to determine the starting address and the length of the possible data structure. Using the length and contents of this structure, try to guess what it might be. Using the address, try to find a pointer to this structure. Repeat until you reach either the stack or some global variable. In the case of a stack variable, you will easily know in which function this chain starts. In the case of a global variable, you at least know its type.

If you cannot find any pattern in the coredump, chances are the leaking structure is very big. Just compare what you see in the file with the possible contents of all large structures in the program.

Update

If your coredump has a valid call stack, you can start by inspecting its functions. Search for anything unusual. Check whether memory allocations near the top of the call stack request too much. Check for possible infinite loops in the functions on the call stack.

The words "only smart pointers are used" frighten me. If a significant part of these smart pointers are shared pointers (shared_ptr, intrusive_ptr, ...), then instead of searching for huge containers, it is worth searching for shared-pointer cycles.
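To illustrate the kind of cycle meant here, a minimal sketch (the Node type and function names are made up for the example): two objects holding shared_ptr references to each other stay alive after every external reference is gone, which looks exactly like this kind of "leak"; a weak_ptr back-reference breaks the cycle.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;   // strong reference; can form a cycle
    std::weak_ptr<Node>   prev;   // weak back-reference; cannot
};

// Returns the use_count seen through a weak_ptr after the last external
// shared_ptr is dropped: non-zero means the objects were never destroyed.
long leak_with_strong_cycle() {
    std::weak_ptr<Node> watch;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;              // strong cycle: a <-> b
        watch = a;
    }                             // a and b leave scope, but...
    return watch.use_count();     // ...the cycle keeps them alive
}

long no_leak_with_weak_backref() {
    std::weak_ptr<Node> watch;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;              // weak back-reference: no cycle
        watch = a;
    }
    return watch.use_count();     // 0: both nodes were destroyed
}
```

If many such cycles accumulate (for example, every watchdog-triggered restart or retry creates one), memory grows steadily even though "everything is in smart pointers".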

Update 2

Try to determine where your heap ends in the corefile (the brk value). Run the coredumped process under gdb and use the pmap command (from another terminal). gdb should also know this value, but I have no idea how to ask it... If most of the process's memory is above brk, you can limit your search to large memory allocations (most likely std::vector).

To improve the chances of finding leaks in the heap area of the existing coredump, some coding may be used (I didn't do it myself, just a theory):


  • Read the coredump file, interpreting each value as a pointer (ignore the code segment, unaligned values, and pointers to non-heap areas). Sort the list and calculate the differences of adjacent elements.
  • At this point the whole memory is split into many possible structures. Compute a histogram of the structures' sizes and drop any insignificant values.
  • Calculate the differences between the addresses of the pointers and of the structures those pointers belong to. For each structure size, compute a histogram of the pointers' displacements, again dropping any insignificant values.
  • Now you have enough information to guess the structure types or to construct a directed graph of the structures. Find the source nodes and cycles of this graph. You can even visualize this graph, or use it to list "cold" memory areas.

The coredump file is in ELF format. Only the start and size of the data segment are needed from its header. To simplify the process, just read it as a linear file, ignoring the structure.
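The first two steps above can be sketched roughly as follows (a toy in-memory version; the function name and all details are my own, and a real tool would read the PT_LOAD segments of the core instead of a vector): treat the memory image as an array of qwords, keep only aligned values that point back into the image's own address range, sort them, and histogram the gaps between adjacent pointers. A dominant gap size hints at the size of a repeated, possibly leaking, structure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// image: raw memory contents read as qwords; base: address the image
// was mapped at. Returns a histogram: gap size (bytes) -> occurrences.
std::map<uint64_t, size_t>
gap_histogram(const std::vector<uint64_t>& image, uint64_t base) {
    const uint64_t lo = base;
    const uint64_t hi = base + image.size() * sizeof(uint64_t);

    // Keep only aligned values that look like pointers into the image.
    std::vector<uint64_t> ptrs;
    for (uint64_t v : image)
        if (v >= lo && v < hi && v % sizeof(uint64_t) == 0)
            ptrs.push_back(v);

    // Sort and histogram the differences of adjacent pointer values.
    std::sort(ptrs.begin(), ptrs.end());
    std::map<uint64_t, size_t> hist;
    for (size_t i = 1; i < ptrs.size(); ++i)
        ++hist[ptrs[i] - ptrs[i - 1]];
    return hist;
}
```

For example, a linked list of equally sized nodes scattered through the image shows up as one dominant gap entry equal to the node size.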
