Track Memory Usage in C++ and Evaluate Memory Consumption


Problem description

I came across the following problem with my code: I was using Valgrind and gperftools to perform heap checking and heap profiling to see whether I release all the memory that I allocate. The output of these tools looks good, and it seems I'm not losing memory. However, when I look at top and the output of ps, I'm confused, because the numbers there do not match what I observe with Valgrind and gperftools.

Here are the numbers:

  • top reports: RES 150M
  • Valgrind (Massif) reports: 23M peak usage
  • gperftools Heap Profiler reports: 22.7M peak usage

My question is: where does the difference come from? I also tried to track the stack usage in Valgrind, but without any success.

Some more details:

  • The process basically loads data from MySQL via the C API into an in-memory storage
  • Performing a leak check and breaking shortly after the loading is done shows 144 bytes definitely lost and 10M reachable, which fits the amount that is currently allocated
  • The library performs no complex IPC; it starts a few threads, but only one of the threads is executing the work
  • It does not load other complex system libraries
  • The PSS size from /proc/pid/smaps corresponds to the RES size in top and ps
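For reference, the PSS figure mentioned in the last bullet can be obtained by summing the "Pss:" fields of /proc/<pid>/smaps. The following is a minimal C++ sketch of that summation, not code from the question; the helper names sum_pss_kb and current_process_pss_kb are hypothetical, and it assumes the usual Linux smaps layout of "Key: value kB" lines.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Sums all "Pss:" fields (in kB) found in smaps-formatted text.
// Hypothetical helper for illustration; not part of any library.
inline std::uint64_t sum_pss_kb(std::istream& smaps) {
    std::uint64_t total_kb = 0;
    std::string line;
    while (std::getline(smaps, line)) {
        std::istringstream fields(line);
        std::string key;
        std::uint64_t value_kb = 0;
        // A matching line looks like: "Pss:                  12 kB".
        // Mapping headers and other keys (Rss:, Pss_Anon:, ...) are skipped.
        if (fields >> key >> value_kb && key == "Pss:") {
            total_kb += value_kb;
        }
    }
    return total_kb;
}

// Convenience wrapper for the calling process (Linux only).
// Returns 0 if /proc/self/smaps cannot be opened.
inline std::uint64_t current_process_pss_kb() {
    std::ifstream smaps("/proc/self/smaps");
    return smaps ? sum_pss_kb(smaps) : 0;
}
```

Calling current_process_pss_kb() at the point where the heap profiler reports its peak gives a number that can be compared directly against the profiler's figure.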

Do you have any idea where this difference in reported memory consumption comes from? How can I validate that my program is behaving correctly, and how could I investigate this issue further?

Solution

I was finally able to solve the problem and will happily share my findings. In general, the best tool for evaluating the memory consumption of a program is, from my perspective, the Massif tool from Valgrind. It allows you to profile the heap consumption and gives you a detailed analysis.

To profile the heap of your application, run valgrind --tool=massif prog. This gives you basic access to all information about the typical memory allocation functions such as malloc and friends. However, to dig deeper I activated the option --pages-as-heap=yes, which makes Massif report the underlying page-level system calls as well. As an example, here is an excerpt from my profiling session:

 67  1,284,382,720      978,575,360      978,575,360             0            0
100.00% (978,575,360B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->87.28% (854,118,400B) 0x8282419: mmap (syscall-template.S:82)
| ->84.80% (829,849,600B) 0x821DF7D: _int_malloc (malloc.c:3226)
| | ->84.36% (825,507,840B) 0x821E49F: _int_memalign (malloc.c:5492)
| | | ->84.36% (825,507,840B) 0x8220591: memalign (malloc.c:3880)
| | |   ->84.36% (825,507,840B) 0x82217A7: posix_memalign (malloc.c:6315)
| | |     ->83.37% (815,792,128B) 0x4C74F9B: std::_Rb_tree_node<std::pair<std::string const, unsigned int> >* std::_Rb_tree<std::string, std::pair<std::string const, unsigned int>, std::_Select1st<std::pair<std::string const, unsigned int> >, std::less<std::string>, StrategizedAllocator<std::pair<std::string const, unsigned int>, MemalignStrategy<4096> > >::_M_create_node<std::pair<std::string, unsigned int> >(std::pair<std::string, unsigned int>&&) (MemalignStrategy.h:13)
| | |     | ->83.37% (815,792,128B) 0x4C7529F: OrderIndifferentDictionary<std::string, MemalignStrategy<4096>, StrategizedAllocator>::addValue(std::string) (stl_tree.h:961)
| | |     |   ->83.37% (815,792,128B) 0x5458DC9: var_to_string(char***, unsigned long, unsigned long, AbstractTable*) (AbstractTable.h:341)
| | |     |     ->83.37% (815,792,128B) 0x545A466: MySQLInput::load(std::shared_ptr<AbstractTable>, std::vector<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*, std::allocator<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*> > const*, Loader::params const&) (MySQLLoader.cpp:161)
| | |     |       ->83.37% (815,792,128B) 0x54628F2: Loader::load(Loader::params const&) (Loader.cpp:133)
| | |     |         ->83.37% (815,792,128B) 0x4F6B487: MySQLTableLoad::executePlanOperation() (MySQLTableLoad.cpp:60)
| | |     |           ->83.37% (815,792,128B) 0x4F8F8F1: _PlanOperation::execute_throws() (PlanOperation.cpp:221)
| | |     |             ->83.37% (815,792,128B) 0x4F92B08: _PlanOperation::execute() (PlanOperation.cpp:262)
| | |     |               ->83.37% (815,792,128B) 0x4F92F00: _PlanOperation::operator()() (PlanOperation.cpp:204)
| | |     |                 ->83.37% (815,792,128B) 0x656F9B0: TaskQueue::executeTask() (TaskQueue.cpp:88)
| | |     |                   ->83.37% (815,792,128B) 0x7A70AD6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | |     |                     ->83.37% (815,792,128B) 0x6BAEEFA: start_thread (pthread_create.c:304)
| | |     |                       ->83.37% (815,792,128B) 0x8285F4B: clone (clone.S:112)
| | |     |                         
| | |     ->00.99% (9,715,712B) in 1+ places, all below ms_print's threshold (01.00%)
| | |     
| | ->00.44% (4,341,760B) in 1+ places, all below ms_print's threshold (01.00%)

As you can see, ~85% of my memory allocation comes from a single branch, and the question is why the memory consumption is so high if the original heap profiling showed normal consumption. The example shows why. For allocation I used posix_memalign to make sure allocations happen on useful boundaries. This allocator was then passed down from the outer class to the inner member variables (a map in this case) so that they use it for heap allocation. However, the boundary I chose, 4096, was too large in my case. This means you allocate 4 bytes using posix_memalign, but the system has to set aside a full page to align the block correctly. If you allocate many small values this way, you end up with lots of unused memory. This memory is not reported by normal heap profiling tools, since they only count the bytes you requested, while the system allocation routines allocate more and hide the rest.
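The effect described above can be observed directly. The following is a minimal sketch, not the code from my project: the function name min_spacing_of_aligned_blocks is hypothetical. It allocates many small blocks with posix_memalign and reports the smallest distance between any two returned addresses. With an alignment of 4096, every live block starts on its own 4096-byte boundary, so that distance can never be below 4096 bytes, however few bytes were requested. Whether the gaps in between can be reused by other allocations depends on the allocator.

```cpp
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Allocates `count` blocks of `size` bytes, each aligned to `alignment`,
// and returns the smallest distance between two of the returned addresses.
// `alignment` must be a power of two and a multiple of sizeof(void*).
// Returns the maximum std::size_t value if fewer than two blocks were obtained.
inline std::size_t min_spacing_of_aligned_blocks(std::size_t count,
                                                 std::size_t size,
                                                 std::size_t alignment) {
    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* p = nullptr;
        if (posix_memalign(&p, alignment, size) != 0) break;  // out of memory
        blocks.push_back(p);
    }

    // Sort the addresses so that neighbours in memory are adjacent in the vector.
    std::vector<std::uintptr_t> addrs;
    addrs.reserve(blocks.size());
    for (void* p : blocks) addrs.push_back(reinterpret_cast<std::uintptr_t>(p));
    std::sort(addrs.begin(), addrs.end());

    std::size_t min_gap = std::numeric_limits<std::size_t>::max();
    for (std::size_t i = 1; i < addrs.size(); ++i) {
        min_gap = std::min<std::size_t>(min_gap, addrs[i] - addrs[i - 1]);
    }

    for (void* p : blocks) free(p);
    return min_gap;
}
```

For example, min_spacing_of_aligned_blocks(1000, 4, 4096) requests only 4000 bytes in total, yet the blocks span at least 999 * 4096 bytes of address space.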

To solve this problem, I switched to a smaller boundary and could thus drastically reduce the memory overhead.
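One way to make the boundary easy to change is to turn it into a template parameter of the allocator. The sketch below is an assumption about how this could look, not my actual StrategizedAllocator or MemalignStrategy code: AlignedAllocator and Dictionary are hypothetical names, and 64 bytes is only an example value for a smaller boundary.

```cpp
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <new>
#include <string>
#include <utility>

// Minimal allocator that aligns every allocation to `Alignment` bytes.
// Hypothetical illustration; the alignment is a compile-time parameter so
// that switching from 4096 to a smaller boundary is a one-line change.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;

    static_assert((Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    static_assert(Alignment % sizeof(void*) == 0,
                  "Alignment must be a multiple of sizeof(void*)");

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // An explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        static_assert(Alignment >= alignof(T), "Alignment too small for T");
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// All instances are interchangeable, since the allocator holds no state.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// A string-to-id map whose nodes are aligned to 64 bytes instead of 4096.
using Dictionary =
    std::map<std::string, unsigned, std::less<std::string>,
             AlignedAllocator<std::pair<const std::string, unsigned>, 64>>;
```

With a 64-byte boundary each tree node can waste at most a few dozen bytes on alignment, compared with almost a full page per node at 4096.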

As a conclusion of my hours spent in front of Massif & Co., I can only recommend using this tool for deep profiling, since it gives you a very good understanding of what is happening and makes tracking down errors easy. For posix_memalign the situation is different: there are cases where it is really necessary, but in most cases you will be just fine with a normal malloc.
