vector vs map performance confusion


Problem Description



edit: I am specifically comparing std::vector's linear search operations to std::map's binary search operations, because that is what Herb's claim seemed to relate to. I know that using a binary search will move the performance from O(N) to O(log N), but that would not be testing Herb's claim.

Bjarne Stroustrup and Herb Sutter have both recently talked about how awesome std::vector is in situations where one would expect std::list to be used, owing to the cost of cache misses during linked-list traversal (see http://channel9.msdn.com/Events/Build/2014/2-661 at the 48-minute mark).

Herb made a further claim, however, that operations on an ordered vector were even faster than on std::map (see http://i.imgur.com/zX07TZR.png, taken from the 51:30 mark of the above Channel 9 video), which I found difficult to fathom. So I created a small test to demonstrate this, and I had difficulty reproducing those results: https://ideone.com/MN7DYK

This is the test code:

// 'timer' is presumably an RAII stopwatch that reports elapsed time when it
// goes out of scope; 'insert' and 'remove' are per-container overloads. All
// three are defined in the full ideone source linked above.
template <typename C>
void test(std::string name, std::vector<int> shuffledInputValues, C& c)
{
   // fill container 'c' with values from 'shuffledInputValues', then erase them all
   {
      std::cout << "testing " << name << "..." << std::endl;
      timer t;

      for (auto val : shuffledInputValues) insert(c, val);
      for (auto val : shuffledInputValues) remove(c, val);
   }
}

// output:
// testing vector...99.189ms
// testing deque...120.848ms
// testing set...4.077ms

Notice how the std::vector performs an order of magnitude slower than std::set. Of course this is the result I expected, but I am confused about the claim that Herb was trying to make.

What am I doing wrong? Or am I misunderstanding Herb's claim?

Notes on my test app:

  • it must use linear operations - the whole point of the exercise is to demonstrate CPU cache magic; these are the constraints Herb and Bjarne put on the exercise
  • I didn't try any tricky loop-unravelling for my vector iteration, but I believe that the iteration isn't affecting performance much anyway
  • I limited the loop to 10K items because ideone times out on larger sets, but increasing the size doesn't change the results

edit: see https://ideone.com/916fVd for a modified example that only compares the performance of lookups. Linear searching exhibits the same performance.
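
A minimal sketch of such a lookup-only comparison (this is not the ideone code; the size, seed, and timing scaffolding are illustrative):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <set>
#include <vector>

// Sketch: time N lookups via linear scan of a vector vs. std::set::find.
int main()
{
    const int N = 10000;                     // kept small, matching the 10K limit above
    std::vector<int> values(N);
    std::iota(values.begin(), values.end(), 0);
    std::shuffle(values.begin(), values.end(), std::mt19937{42});

    std::vector<int> vec(values.begin(), values.end());
    std::set<int> s(values.begin(), values.end());

    using clock = std::chrono::steady_clock;
    long long hits = 0;                      // keeps the optimizer honest

    auto t0 = clock::now();
    for (int v : values)
        hits += std::find(vec.begin(), vec.end(), v) != vec.end();  // O(N) each
    auto t1 = clock::now();
    for (int v : values)
        hits += s.find(v) != s.end();                               // O(log N) each
    auto t2 = clock::now();

    std::cout << "vector linear: "
              << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
              << "set find:      "
              << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n"
              << "(hits = " << hits << ")\n";
}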

Solution

I found the slides for easier reference (I can't see the graphs, but I guess that might be because of a proprietary file format). The relevant slide is number 39, which describes the problem being solved:

§ Generate N random integers and insert them into a sequence so that each is inserted in its proper position in the numerical order.

§ Remove elements one at a time by picking a random position in the sequence and removing the element there.

Now, it should be rather obvious that a linked list is not a good choice for this problem. Even though a list is much better than a vector for inserting/removing at the beginning or in the middle, it's not good for inserting/removing at a random position because of the need for a linear search. And linear search is much faster with vectors because of better cache efficiency.
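
For concreteness, the vector-based approach boils down to something like the following sketch (my reconstruction of the exercise, not Sutter's code): a binary search finds the position, but insert/erase still shift elements, so each step is O(n) overall.

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of the benchmark from slide 39 using a sorted vector:
// std::lower_bound finds the insertion point in O(log n), but
// insert/erase still shift elements, so each step is O(n) overall.
void run(int n, std::mt19937& rng)
{
    std::vector<int> seq;
    for (int i = 0; i < n; ++i) {
        int value = static_cast<int>(rng());
        seq.insert(std::lower_bound(seq.begin(), seq.end(), value), value);
    }
    while (!seq.empty()) {
        std::uniform_int_distribution<std::size_t> pick(0, seq.size() - 1);
        seq.erase(seq.begin() + pick(rng));   // erase at a random *position*
    }
}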

Sutter suggests that a map (or a tree in general) would seem a natural choice for this algorithm because you get O(log n) search. And indeed, it does beat the vector quite easily for large N values in the insertion part.

Here comes the but. You need to remove the nth element (for random n). This is where I believe your code is cheating. You remove the elements in the order they were inserted, effectively using the input vector as a lookup table for finding the value of an element at a "random" position, so that you can search for it in O(log n). So you're really using a combination of a set and a vector to solve the problem.
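
In code, the "cheating" removal amounts to something like this sketch (a hypothetical helper, not the asker's actual source):

#include <set>
#include <vector>

// The "cheating" pattern: values are replayed from the input vector,
// so each erase is an O(log n) lookup by value -- the random-position
// search the problem statement asks for never happens.
void remove_by_insertion_order(std::set<int>& s, const std::vector<int>& input)
{
    for (int v : input)
        s.erase(v);
}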

A regular binary search tree such as the one used for std::map or std::set (which I assume Sutter used) doesn't have a fast algorithm for finding the nth element. Here's one which is claimed to be O(log n) on average and O(n) in the worst case. But std::map and std::set don't provide access to the underlying tree structure, so for those you're stuck with in-order traversal (correct me if I'm wrong), which is a linear search again! I'm actually surprised that the map version is competitive with the vector one in Sutter's results.
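
By contrast, an honest removal at a random position with std::set has to walk the iterator node by node, as in this sketch:

#include <cstddef>
#include <iterator>
#include <random>
#include <set>

// Honest version for std::set: picking the k-th element means advancing
// the in-order iterator k steps, which is O(n) per removal.
void remove_by_random_position(std::set<int>& s, std::mt19937& rng)
{
    while (!s.empty()) {
        std::uniform_int_distribution<std::size_t> pick(0, s.size() - 1);
        s.erase(std::next(s.begin(), pick(rng)));
    }
}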

For O(log n) complexity, you need a structure such as an order statistic tree, which is unfortunately not provided by the standard library. There's the GNU policy-based STL map, as shown here.
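
For reference, the policy-based tree can be used roughly like this (GCC-only sketch; in recent libstdc++ the empty mapped type is spelled __gnu_pbds::null_type):

#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>
#include <random>

// GCC-only: a red-black tree with order statistics. find_by_order(k)
// returns an iterator to the k-th smallest element in O(log n).
using ost = __gnu_pbds::tree<
    int, __gnu_pbds::null_type, std::less<int>,
    __gnu_pbds::rb_tree_tag,
    __gnu_pbds::tree_order_statistics_node_update>;

void remove_by_random_position(ost& t, std::mt19937& rng)
{
    while (!t.empty()) {
        std::uniform_int_distribution<std::size_t> pick(0, t.size() - 1);
        t.erase(t.find_by_order(pick(rng)));  // O(log n) instead of O(n)
    }
}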

Here is some quick test code I made for vector vs. set vs. OST (plus vector with binary search for good measure): https://ideone.com/DoqL4H. The set is much slower, while the other tree-based structure is faster than the vector, which is not in line with Sutter's results.

order statistics tree: 15.958ms
vector binary search: 99.661ms
vector linear search: 282.032ms
set: 2004.04ms

(N = 20000; the difference is only going to grow in favor of the OST with bigger values)

In short, I came to the same conclusion: Sutter's original results seem odd, but for a slightly different reason. It seems to me that better asymptotic complexity beats lower constant factors this time.

Note that the problem description doesn't exclude the possibility of duplicate random values, so using map/set instead of multimap/multiset cheats a bit in favor of the map/set, but I assume that has only minor significance when the value domain is much larger than N. Also, pre-reserving the vector doesn't improve the performance significantly (around 1% when N = 20000).
