矢量vs阵列性能 [英] Vector vs Array Performance

查看:149
本文介绍了矢量vs阵列性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在另一个线程中,我开始讨论关于矢量和数组,其中我主要玩魔鬼的倡导者,按下按钮。然而,在这过程中,我偶然发现了一个测试案例,让我有些困惑,我想对它进行真正的讨论,关于我的虐待我正在玩魔鬼的倡导者,开始一个真正的讨论这个线程现在是不可能的。然而,特别的例子让我很感兴趣,我不能令人满意地向我解释。

In another thread I started a discussion about Vectors and Arrays, in which I was largely playing devil's advocate, to push buttons. However, during the course of this, I stumbled onto a test case that has me a little perplexed, and I would like to have a real discussion about it, over the "abuse" I'm getting for playing devil's advocate, starting a real discussion on that thread is now impossible. However, the particular example has me intrigued, and I cannot explain it to myself satisfactorily.

讨论是关于Vector vs数组的一般性能,忽略动态元素。例如:显然不断使用push_back()在向量中会减慢它。我们假设向量和数组都预先填充了数据。我提出的并随后由线程中的个人修改的示例如下:

The discussion is about the general performance of Vector vs Arrays, ignoring dynamic elements. Ex: obviously continually using push_back() in a vector is going to slow it down. We're assuming that the vector and array are pre-populated with data. The example I presented, and subsequently modified by an individual in the thread, was the following:

#include <iostream>
#include <vector>
#include <type_traits>
using namespace std;

const int ARRAY_SIZE = 500000000;

// http://stackoverflow.com/a/15975738/500104
template <class T>
class no_init_allocator
{
public:
    typedef T value_type;

    no_init_allocator() noexcept {}
    template <class U>
        no_init_allocator(const no_init_allocator<U>&) noexcept {}
    T* allocate(std::size_t n)
        {return static_cast<T*>(::operator new(n * sizeof(T)));}
    void deallocate(T* p, std::size_t) noexcept
        {::operator delete(static_cast<void*>(p));}
    template <class U>
        void construct(U*) noexcept
        {
            // libstdc++ doesn't know 'is_trivially_default_constructible', still has the old names
            static_assert(is_trivially_default_constructible<U>::value,
            "This allocator can only be used with trivally default constructible types");
        }
    template <class U, class A0, class... Args>
        void construct(U* up, A0&& a0, Args&&... args) noexcept
        {
            ::new(up) U(std::forward<A0>(a0), std::forward<Args>(args)...);
        }
};

int main() {
    srand(5);  //I use the same seed, we just need the random distribution.
    vector<char, no_init_allocator<char>> charArray(ARRAY_SIZE);
    //char* charArray = new char[ARRAY_SIZE];
    for(int i = 0; i < ARRAY_SIZE; i++) {
        charArray[i] = (char)((i%26) + 48) ;
    }

    for(int i = 0; i < ARRAY_SIZE; i++) {
        charArray[i] = charArray[rand() % ARRAY_SIZE];
    }
}



当我在我的机器上运行时,跟随端子输出。第一次运行是用向量线取消注释的,第二次是用数组线取消注释。我使用最高水平的优化,给向量最好的成功机会。下面是我的结果,前两个运行与数组行取消注释,第二个与向量行。

When I run this on my machine, I get the following terminal output. The first run is with the vector line uncommented, the second is with the array line uncommented. I used the highest level of optimization, to give the vector the best chance of success. Below are my results, the first two runs with the array line uncommented, the second two with the vector line.

//Array run # 1
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m20.287s
user    0m20.068s
sys 0m0.175s

//Array run # 2
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m21.504s
user    0m21.267s
sys 0m0.192s

//Vector run # 1
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m28.513s
user    0m28.292s
sys 0m0.178s

//Vector run # 2
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out

real    0m28.607s
user    0m28.391s
sys 0m0.178s

数组超越向量并不让我感到惊讶,但是,差别是50%的数量惊喜我,我希望他们可以忽略不计,我感觉这个测试用例的性质我会模糊结果的性质。当对较小的阵列大小运行此测试时,性能差异显着消失。

That arrays outperform vectors does not surprise me, however, that the difference is on the order of 50% surprises me very much, I would expect that they would be negligible, and I feel like the nature of this test case me be obscuring the nature of the results. When you run this test on array sizes that are smaller, the performance differences dissipate dramatically.

我的解释:

vector的附加实现指令会导致向量指令在内存中对齐,也许在这个例子中,一个分裂在一个真正的坏点在2个不同的块。这导致内存在高速缓存与数据高速缓存与指令高速缓存之间来回跳跃的频率比您预期的要高。我也怀疑LLVM编译器可能夸大了弱点,并且由于一些较新的C ++ 11元素,优化较差,虽然我没有理由除了假设和猜想这些解释之一。

The additional implementation instructions for vector are causing the vector instructions to align in memory poorly, perhaps even on this example, a split at a really bad point on 2 different "blocks". This is causing memory to jump back and forth between levels of cache vs data cache vs instruction cache more frequently than you would expect. I also suspect that the LLVM compiler may be exaggerating weaknesses, and optimizing poorly, due to some of the newer C++11 elements, though I have no reason for either of these explanations besides hypothesis and conjecture.

我有兴趣,如果A:有人可以复制我的结果和B:如果有人有更好的解释如何计算机运行这个特定的基准和为什么矢量是这样在这种情况下显着性能不佳的数组。

I am interested if A: that someone can replicate my results and B: if someone has a better explanation for how the computer is running this particular benchmark and why vector is so dramatically underperforming arrays in this instance.

我的设置: http:// www .newegg.com / Product / Product.aspx?Item = N82E16834100226

推荐答案

我可以保证LLVM misoptimize std :: vector(如果你事实上是优化的),至少现在。它不正确地内联许多涉及的函数调用。你将获得更好的性能与GCC。

I can guarantee that LLVM does infact misoptimize std::vector (if you are in fact optimising at all), at least as of right now. It does not correctly inline many of the function calls involved. You will get better performance with GCC.

这篇关于矢量vs阵列性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆