智能关于矢量内存分配 [英] Being Smart About Vector Memory Allocation

查看:82
本文介绍了智能关于矢量内存分配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我必须遍历一个潜在的非常大的数字向量,并将偶数和奇数元素复制到新的单独向量中。 (源向量可以具有任何比例的evens对赔率;它可以是所有evens,所有赔率,或者之间的某个地方。)

Let's say I have to iterate over a potentially very large vector of numbers and copy the even and odd elements into new, separate vectors. (The source vector may have any proportion of evens to odds; it could be all evens, all odds, or somewhere in-between.)

为简单起见, push_back 通常用于这类事情:

For simplicity, push_back is often used for this sort of thing:

for (std::size_t Index; Index < Source.size(); Index++)
{
    if (Source[Index] % 2) Odds.push_back(Source[Index]);
    else Evens.push_back(Source[Index]);
}

但是,我担心这将是低效的,作为实现的一部分,像排序算法,性能是至关重要的。例如,QuickSort涉及分隔这样的元素。

However, I'm worried that this will be inefficient and be harmful if it's used as part of the implementation for something like a sorting algorithm, where performance is paramount. QuickSort, for example, involves separating elements much like this.

您可以使用 reserve() -hand,所以只需要一个分配,但是你必须遍历整个源向量两次 - 一次计算需要排序多少个元素,并再次为实际复制。

You could use reserve() to allocate memory before-hand so only one allocation is needed, but then you have to iterate over the entire source vector twice - once to count how many elements will need to be sorted out, and once more for the actual copying.

当然,你可以分配与源向量的大小相同的空间,因为两个新向量都不需要拥有更多的空间,但这看起来有点浪费。

You could, of course, allocate the same amount of space as the source vector's size, since neither new vector will need to hold more than that, but that seems somewhat wasteful.

有没有更好的方法,我错过了?是 push_back()通常被信任管理这种事情的程序员,或者它会变得麻烦敏感的算法?

Is there a better method that I'm missing? Is push_back() usually trusted to manage this sort of thing for the programmer, or can it become burdensome for sensitive algorithms?

推荐答案

我要回答我认为你真的要问的问题,这是应该 push_back()避免在内部循环的重算法?而不是别人似乎已经读入你的帖子,这是如果我调用push_back之前做一个大的向量无关的排序重要吗?此外,我将从我的经验中回答,而不是花时间追逐引文和同行评审的文章。

I'm going to answer the question I think you really meant to ask, which is "should push_back() be avoided in the inner loops of heavy algorithms?" rather than what others seem to have read into your post, which is "does it matter if I call push_back before doing an unrelated sort on a large vector?" Also, I'm going to answer from my experience rather than spend time chasing down citations and peer-reviewed articles.

你的例子基本上是做两件事,总CPU成本:它读取和操作输入向量中的元素,然后必须将元素插入到输出向量中。您关心插入元素的成本,因为:

Your example is basically doing two things that add up to the total CPU cost: it's reading and operating on elements in the input vector, and then it has to insert the elements into the output vector. You're concerned about the cost of inserting elements because:


  1. push_back为一个附加元素预留的空间,但是当你用完剩余空间时速度慢。

  2. 分配内存是很昂贵的( malloc )只是缓慢的,即使小学生假装是不同的)

  3. 在重新分配后将向量的数据从一个区域复制到另一个区域也很慢 :当push_back()发现它没有足够的空间时,它必须去分配一个更大的向量,然后复制所有元素。 (在理论上,对于具有大量OS页面的向量,STL的神奇实现可以使用VMM在虚拟地址空间中移动它们,而不在实际中复制我从来没有见过一个可以。)

  4. 过度分配输出向量导致问题:它导致碎片,使未来的分配变慢;它烧毁数据缓存,使一切都更慢;

  5. 分配输出向量会导致问题,因为重新分配向量是一个O (n)操作,因此重新分配它 m 次是O(m× n)。如果STL的默认分配器使用指数重新分配(使向量的保留是每次重新分配时的先前大小的两倍),这使得您的线性算法O(n + n log m)。

  1. push_back() is constant time (instantaneous, really) when a vector has enough space pre-reserved for an additional element, but slow when you've run out of reserved space.
  2. Allocating memory is costly (malloc() is just slow, even when pedants pretend that new is something different)
  3. Copying a vector's data from one region to another after reallocation is also slow: when push_back() finds it hasn't got enough space, it has to go and allocate a bigger vector, then copy all the elements. (In theory, for vectors that are many OS pages in size, a magic implementation of the STL could use the VMM to move them around in the virtual address space without copying — in practice I've never seen one that could.)
  4. Over-allocating the output vectors causes problems: it causes fragmentation, making future allocations slower; it burns data cache, making everything slower; if persistent, it ties up scarce free memory, leading to disk paging on a PC and a crash on embedded platforms.
  5. Under-allocating the output vectors causes problems because reallocating a vector is an O(n) operation, so reallocating it m times is O(m×n). If the STL's default allocator uses exponential reallocation (making the vector's reserve twice its previous size every time you realloc), that makes your linear algorithm O(n + n log m).

因此,你的直觉是正确的:总是为可能的向量预留空间,而不是因为push_back很慢,而是因为它可以触发 >慢。此外,如果你看看 shrink_to_fit 的实现,你会看到它也做了一个副本重新分配,暂时加倍你的内存成本,并导致进一步碎片。

Your instinct, therefore, is correct: always pre-reserve space for your vectors where possible, not because push_back is slow, but because it can trigger a reallocation that is slow. Also, if you look at the implementation of shrink_to_fit, you'll see it also does a copy reallocation, temporarily doubling your memory cost and causing further fragmentation.

这里的问题是你不总是知道输出向量需要多少空间;通常的反应是使用启发式和可能自定义分配器。默认情况下,为每个输出向量保留n / 2 + k个输入大小,其中k是一些安全余量。这样,你通常有足够的空间输出,只要你的输入是合理的平衡,push_back可以重新分配在罕见的情况下,它不是。如果你发现push_back的指数行为是浪费太多的内存(导致你保留2n元素,当你只需要n + 2),你可以给它一个自定义分配器,扩大向量大小在更小的线性块。但当然,在向量真的不平衡,并且你最终做了很多调整大小的情况下,这将是很慢的。

Your problem here is that you don't always know exactly how much space you'll need for the output vectors; the usual response is to use a heuristic and maybe a custom allocator. Reserve n/2+k of the input size for each of your output vectors by default, where k is some safety margin. That way you'll usually have enough space for the output, so long as your input is reasonably balanced, and push_back can reallocate in the rare cases where it's not. If you find that push_back's exponential behavior is wasting too much memory ( causing you to reserve 2n elements when really you just needed n+2 ), you can give it a custom allocator that expands the vector size in smaller, linear chunks — but of course that will be much slower in cases where the vectors are really unbalanced and you end up doing lots of resizes.

没有办法总是保留准确的正确金额的空间,而不预先走动输入元件;但是如果你知道通常的平衡是什么样子,你可以使用启发式方法来对其进行很好的猜测,以便在许多迭代中获得统计性能。

There's no way to always reserve the exact right amount of space without walking the input elements in advance; but if you know what the balance usually looks like, you can use a heuristic to make a good guess at it for a statistical performance gain over many iterations.

这篇关于智能关于矢量内存分配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆