Memory fragmentation issue


Problem description

Dear development community,

In an unmanaged C++ application, memory allocation times increase as we execute a set of tasks across multiple threads. Each thread allocates large (1-5 GB) nested std::vector objects, which we found to be the bottleneck: each new allocation takes longer than the last, even when previous allocations have been freed.

The solution we found was to allocate the memory objects beforehand for each thread and reuse them (clearing them) for each run, which reduced the observed effect.
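To make the workaround concrete, here is a minimal sketch of what "allocate once per thread, clear between runs" could look like. The names Workspace and run_task and the sizes are hypothetical, not taken from our actual code; the only thing it relies on is that std::vector::clear() destroys the elements but keeps the capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread workspace: a nested vector that is sized once
// and then reused across runs instead of being destroyed and rebuilt.
struct Workspace {
    std::vector<std::vector<double>> rows;

    // Allocate once, up front. row_count and row_capacity are
    // illustrative parameters, not values from the original post.
    Workspace(std::size_t row_count, std::size_t row_capacity) : rows(row_count) {
        for (auto& row : rows) row.reserve(row_capacity);
    }

    // clear() destroys the elements but keeps each row's capacity,
    // so the next run does not go back to the heap.
    void reset() {
        for (auto& row : rows) row.clear();
    }
};

// One "run" of a task: refills every row and returns the number of
// elements written. per_row must fit in the reserved capacity.
std::size_t run_task(Workspace& ws, std::size_t per_row) {
    ws.reset();
    std::size_t written = 0;
    for (auto& row : ws.rows) {
        assert(per_row <= row.capacity());  // stays inside the pre-allocated block
        for (std::size_t i = 0; i < per_row; ++i) {
            row.push_back(static_cast<double>(i));
        }
        written += row.size();
    }
    return written;
}
```

Each worker thread would own one Workspace (for example as a local in the thread function) and call run_task on it repeatedly, so no buffer is shared between threads.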

My questions:

A) Has anyone had similar experiences, and is pre-allocation the standard solution to such problems?

B) Is there another good way to approach problems like this (e.g. using special OS calls to avoid memory fragmentation / a special library to use)?

If you have experience with memory problems in C++ on Windows and have some valuable insights to share, we'd appreciate it.

Thanks,

Regards,

Peter

Solution

In general, allocating all of your memory up front is the best option.

There are some things to remember about how programs operate that make reallocation of large blocks of memory slow.

1) Once a request goes over a certain size threshold, the Windows heap manager switches from its small-block allocation strategy to its large-block strategy: instead of carving the request out of pre-allocated blocks, it calls VirtualAlloc. The memory returned by VirtualAlloc is not backed by physical memory until you first touch it. This is what allows processes with 128 TiB address spaces not to use 128 TiB of physical memory each.

The issue is that bringing each page of memory into the process on first touch is slow. So if you are not pre-allocating the vector, then each time the vector reallocates it goes back to the heap allocation functions, which allocate a new block using VirtualAlloc, and you pay that delay again when the vector writes to the new memory while initialising its elements.

2) As you probably know, when a vector is first initialised it has a block of memory that it can write to, and once that block is full it has to reallocate. This involves transferring the elements from the current block to the newly allocated block. If the element type has a noexcept move constructor, the vector will use the move constructor; if not (and the type is copyable), it will use the copy constructor.

Combine this with the previous point: a reallocation triggers another block of memory being allocated with VirtualAlloc, and each page of it then has to be brought into the process. This takes a while even once; doing it repeatedly wastes a lot of time.
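Point 2 can be checked with a small experiment. This is only a sketch: NoexceptMove, ThrowingMove, Stats and grow are made-up names for illustration. It counts how often the vector's buffer changes and whether the existing elements were moved or copied each time, with and without a reserve() up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tallies how the vector transferred existing elements while it grew.
struct Stats {
    std::size_t reallocations = 0;  // buffer changes, including the first allocation
    std::size_t copies = 0;
    std::size_t moves = 0;
};

// Hypothetical element type with a noexcept move constructor.
struct NoexceptMove {
    Stats* stats;
    explicit NoexceptMove(Stats* s) : stats(s) {}
    NoexceptMove(const NoexceptMove& o) : stats(o.stats) { ++stats->copies; }
    NoexceptMove(NoexceptMove&& o) noexcept : stats(o.stats) { ++stats->moves; }
};

// Same type, but the move constructor is not declared noexcept,
// so the vector falls back to copying when it reallocates.
struct ThrowingMove {
    Stats* stats;
    explicit ThrowingMove(Stats* s) : stats(s) {}
    ThrowingMove(const ThrowingMove& o) : stats(o.stats) { ++stats->copies; }
    ThrowingMove(ThrowingMove&& o) : stats(o.stats) { ++stats->moves; }
};

// Appends n elements and reports what the growth cost.
template <typename T>
Stats grow(std::size_t n, bool reserve_first) {
    Stats stats;
    {
        std::vector<T> v;
        if (reserve_first) v.reserve(n);  // one allocation, done before counting starts
        std::size_t capacity = v.capacity();
        for (std::size_t i = 0; i < n; ++i) {
            v.emplace_back(&stats);  // constructs in place, no copy or move of the new element
            if (v.capacity() != capacity) {
                ++stats.reallocations;
                capacity = v.capacity();
            }
        }
        assert(v.size() == n);
    }
    return stats;
}
```

Calling grow<NoexceptMove>(1000, false) should report several reallocations with moves only, grow<ThrowingMove>(1000, false) copies only, and either type with reserve_first set to true should report no reallocations, copies or moves at all. The exact number of reallocations depends on the standard library's growth factor.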

So my answers would be:

A) Pre-allocation is really the best option here. You will only take the allocation hit once per vector.

B) The only option beyond this is to use a pool allocator, where you pre-allocate all of the memory you will need at the start of the application. But this will not give you much of an improvement, because all you gain is one large allocation in place of several smaller ones. The biggest thing that eats time, bringing the pages of memory into the process, still has to occur regardless.
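For reference, a pool allocator in the sense of B) could be as simple as the bump-pointer arena sketched below. Arena and ArenaAllocator are hypothetical names written for this answer, not a library API, and the sketch is single-threaded; you would give each thread its own arena. Note that the arena's pages still have to be faulted in the first time they are touched, which is why this does not remove the main cost.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical bump-pointer arena: one big block obtained up front,
// handed out piece by piece, and released all at once with reset().
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : storage_(new unsigned char[bytes]), size_(bytes) {
        assert(bytes > 0);
    }

    void* allocate(std::size_t bytes, std::size_t alignment) {
        void* p = storage_.get() + used_;
        std::size_t space = size_ - used_;
        // std::align bumps p up to the next aligned address, or returns
        // nullptr if the request does not fit in the remaining space.
        if (!std::align(alignment, bytes, p, space)) throw std::bad_alloc();
        used_ = static_cast<std::size_t>(
                    static_cast<unsigned char*>(p) - storage_.get()) + bytes;
        return p;
    }

    void reset() { used_ = 0; }  // only safe once no container uses the arena
    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::size_t size_;
    std::size_t used_ = 0;
};

// Minimal standard-conforming allocator that forwards to an Arena.
template <typename T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}  // memory comes back only via Arena::reset()
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}
```

A vector then takes the allocator as its second template argument, e.g. std::vector<int, ArenaAllocator<int>> v(alloc) with alloc built from a pointer to the arena. Because deallocate does nothing, a vector that grows without reserve() leaves its old buffers behind in the arena, so reserving up front matters even more here.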

Unfortunately, the biggest bottleneck is the kernel's performance when linking freshly allocated virtual memory pages to physical memory. You can't do much beyond minimising the number of times you do this.
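If you want to see the first-touch cost on your own machine, a rough, portable sketch is to time one write per page on a fresh buffer and then repeat the pass on the same buffer. The 4096-byte page size and the names touch_pages and measure_first_touch are assumptions for illustration; the numbers depend heavily on the OS, the allocator and whether the heap happens to hand back memory that was already touched, so treat them as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

struct TouchTimes {
    std::chrono::nanoseconds first;   // pass over never-touched memory
    std::chrono::nanoseconds second;  // same pass once the pages are resident
};

// Writes one byte per page and returns how long the pass took.
// kPageSize = 4096 is an assumption; query the OS for the real value.
std::chrono::nanoseconds touch_pages(volatile char* buf, std::size_t bytes) {
    constexpr std::size_t kPageSize = 4096;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < bytes; i += kPageSize) buf[i] = 1;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

TouchTimes measure_first_touch(std::size_t bytes) {
    assert(bytes > 0);
    // new char[n] default-initialises, i.e. it does not write to the buffer,
    // so the pages are typically untouched when the first pass starts.
    std::unique_ptr<char[]> buf(new char[bytes]);
    const std::chrono::nanoseconds first = touch_pages(buf.get(), bytes);
    const std::chrono::nanoseconds second = touch_pages(buf.get(), bytes);
    return TouchTimes{first, second};
}
```

With a buffer of a few hundred megabytes, the first pass is usually noticeably slower than the second, and that difference is the cost you pay again on every fresh large allocation.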

