Is there a way to locate which part of the process used the most memory, looking only at a generated core file?

Problem Description

I have a process (restarted by a watchdog whenever it stops for some reason) that usually uses about 200MB of memory. Once I saw it eating up memory, with usage around 1.5-2GB, which definitely means a "memory leak" somewhere ("memory leak" in quotes, as it is not a real memory leak - allocated memory that is never freed and unreachable - note that only smart pointers are used). So I suspect some huge container (which I didn't find) or something like that.

Later, the process crashed because of the high memory usage, and a core dump was generated - about 2GB. The problem is that I can't reproduce the issue, so valgrind won't help here (I guess). It happens very rarely and I can't "catch" it.

So, my question is: is there a way, using only the exe and the core file, to locate which part of the process used most of the memory?

I took a look at the core file with gdb and there's nothing unusual. But the core is big, so there must be something. Is there a clever way to understand what has happened, or is guessing the only option (hard for such a big exe: 12 threads, about 50-100 classes (maybe more), etc.)?

It's a C++ application, running on RHEL5U3.

Answer

Open this coredump in a hexadecimal viewer (as bytes/words/dwords/qwords). Starting from the middle of the file, try to notice any repeating pattern. If you find anything, try to determine the starting address and the length of a possible data structure. Using the length and contents of this structure, try to guess what it might be. Using the address, try to find some pointer to this structure. Repeat until you come to either the stack or some global variable. In the case of a stack variable, you'll easily know in which function this chain starts. In the case of a global variable, you at least know its type.
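
As a crude shortcut for the pattern-spotting step, something like the following could be used (a hypothetical helper of my own, not part of the original answer; for a 2GB core the hash table itself gets large, so sampling or filtering may be needed in practice):

```cpp
// Hypothetical helper, not part of the original answer: histogram every
// aligned 8-byte value in the dump and print the most frequent ones.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <unordered_map>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <corefile>\n", argv[0]);
        return 1;
    }
    std::ifstream in(argv[1], std::ios::binary);
    std::unordered_map<uint64_t, size_t> hist;   // value -> occurrences

    uint64_t v;
    while (in.read(reinterpret_cast<char*>(&v), sizeof v))
        if (v != 0)           // zero padding dominates every core, skip it
            ++hist[v];

    // Sort by count, descending, and show the top 20. Values repeated an
    // abnormal number of times are often vtable pointers or repeated
    // fields of a single leaked structure type.
    std::vector<std::pair<size_t, uint64_t>> top;
    for (const auto& kv : hist)
        top.push_back(std::make_pair(kv.second, kv.first));
    std::sort(top.rbegin(), top.rend());
    for (size_t i = 0; i < top.size() && i < 20; ++i)
        std::printf("%10zu x 0x%016llx\n", top[i].first,
                    (unsigned long long)top[i].second);
    return 0;
}
```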

If you cannot find any pattern in the coredump, chances are that the leaking structure is very big. Just compare what you see in the file with the possible contents of all large structures in the program.

Update

If your coredump has a valid call stack, you can start by inspecting its functions. Search for anything unusual. Check whether memory allocations near the top of the call stack request too much. Check for possible infinite loops in the call-stack functions.

Words "only smart pointers are used" frighten me. If significant part of these smart pointers are shared pointers (shared_ptr, intrusive_ptr, ...), instead of searching for huge containers, it is worth to search for shared pointer cycles.

Update 2

Try to determine where your heap ends in the corefile (the brk value). Run the coredumped process under gdb and use the pmap command (from another terminal). gdb should also know this value, but I have no idea how to ask it... If most of the process's memory is above brk, you can limit your search to large memory allocations (most likely, std::vector).
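
One way to find those boundaries without gdb's cooperation (a hedged sketch of my own, not from the answer; readelf -l <core> prints the same information) is to walk the core's program headers and list the PT_LOAD segments; the biggest mappings are usually the heap and large mmap'd allocations:

```cpp
// Hypothetical helper, not from the answer: list the PT_LOAD segments of
// the core and their sizes, so the biggest mappings stand out. Assumes a
// 64-bit little-endian core; error handling is minimal.
#include <elf.h>
#include <cstdint>
#include <cstdio>
#include <fstream>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <corefile>\n", argv[0]);
        return 1;
    }
    std::ifstream in(argv[1], std::ios::binary);
    Elf64_Ehdr eh;
    in.read(reinterpret_cast<char*>(&eh), sizeof eh);

    // Each PT_LOAD program header describes one memory segment of the
    // crashed process, saved in the core at offset p_offset.
    for (unsigned i = 0; i < eh.e_phnum; ++i) {
        Elf64_Phdr ph;
        in.seekg(eh.e_phoff + (uint64_t)i * eh.e_phentsize);
        in.read(reinterpret_cast<char*>(&ph), sizeof ph);
        if (ph.p_type == PT_LOAD)
            std::printf("vaddr 0x%012llx  size %10llu KB  file offset 0x%llx\n",
                        (unsigned long long)ph.p_vaddr,
                        (unsigned long long)(ph.p_memsz / 1024),
                        (unsigned long long)ph.p_offset);
    }
    return 0;
}
```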

To improve the chances of finding leaks in the heap area of the existing coredump, some coding may be used (I didn't do it myself, just a theory; a rough sketch follows the notes below):

  • Read the coredump file, interpreting each value as a pointer (ignore the code segment, unaligned values, and pointers outside the heap area). Sort the list and calculate the differences between adjacent elements.
  • At this point the whole memory is split into many possible structures. Compute a histogram of structure sizes and drop any insignificant values.
  • Calculate the difference between each pointer's address and the address of the structure the pointer belongs to. For each structure size, compute a histogram of pointer displacements, again dropping any insignificant values.
  • Now you have enough information to guess structure types, or to construct a directed graph of structures. Find the source nodes and cycles of this graph. You can even visualize this graph, as in "list 'cold' memory areas".

The coredump file is in ELF format. Only the start and size of the data segment are needed from its header. To simplify the process, just read it as a linear file, ignoring the structure.
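
Here is a rough, untested sketch of the first two list items (my own code, in the spirit of the "just a theory" disclaimer above; the heap bounds are assumed to come from pmap or the segment listing shown earlier):

```cpp
// Rough sketch: read the core linearly, keep every aligned value that
// points into the heap range, sort them, and histogram the gaps between
// neighbouring targets. Each gap approximates the size of one allocated
// object, so dominant gaps are candidate sizes of a leaked structure.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <map>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 4) {
        std::fprintf(stderr,
                     "usage: %s <corefile> <heap_start_hex> <heap_end_hex>\n",
                     argv[0]);
        return 1;
    }
    uint64_t heap_start = std::strtoull(argv[2], nullptr, 16);
    uint64_t heap_end   = std::strtoull(argv[3], nullptr, 16);

    std::ifstream in(argv[1], std::ios::binary);
    std::vector<uint64_t> ptrs;

    // Pass 1: collect candidate pointers (aligned values inside the heap).
    uint64_t v;
    while (in.read(reinterpret_cast<char*>(&v), sizeof v))
        if (v >= heap_start && v < heap_end && (v % 8) == 0)
            ptrs.push_back(v);

    // Pass 2: histogram the gaps between neighbouring pointer targets.
    std::sort(ptrs.begin(), ptrs.end());
    std::map<uint64_t, size_t> gap_hist;
    for (size_t i = 1; i < ptrs.size(); ++i)
        ++gap_hist[ptrs[i] - ptrs[i - 1]];

    // Report the 20 most frequent gaps: candidate structure sizes.
    std::vector<std::pair<size_t, uint64_t>> top;
    for (const auto& kv : gap_hist)
        top.push_back(std::make_pair(kv.second, kv.first));
    std::sort(top.rbegin(), top.rend());
    for (size_t i = 0; i < top.size() && i < 20; ++i)
        std::printf("gap %8llu bytes: %zu times\n",
                    (unsigned long long)top[i].second, top[i].first);
    return 0;
}
```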
