Intrinsic array access is much faster than std::vector access -- Black Magic?

Question

I set up a test program to compare array access performance with that of std::vector. I found several similar questions, but none seems to address my specific concern. For some time I was scratching my head over why array access seemed to be 6 times faster than vector access, when I had read that the two should be equivalent. As it turns out, this seems to depend on the Intel compiler (v12) and its optimization level (it occurs with anything above -O1): with gcc v4.1.2 I see better performance from std::vector, and with gcc v4.4.4 the array has only a 2x advantage. I am running the tests on a RHEL 5.8 machine with Xeon X5355 cores. As an aside, I have found iterators to be faster than element access.

I am compiling with the following commands:

icpc -fast test.cc
g++44 -O3 test.cc

Can anyone explain the dramatic improvement in speed?

#include <ctime>     // clock(), clock_t, CLOCKS_PER_SEC
#include <iostream>
#include <vector>

using namespace std;

int main() {
  int sz = 100;
  clock_t start,stop;
  int ncycle=1000;
  float temp  = 1.1;

  // Set up and initialize vector
  vector< vector< vector<float> > > A(sz, vector< vector<float> >(sz,  vector<float>(sz, 1.0)));

  // Set up and initialize array
  float*** a = new float**[sz];
  for( int i=0; i<sz; ++i) {
    a[i] = new float*[sz];
    for( int j=0; j<sz; ++j) {
      a[i][j] = new float[sz]();
      for( int k=0; k<sz; ++k)
        a[i][j][k] = 1.0;
    }
  }

  // Time the array
  start = clock();
  for( int n=0; n<ncycle; ++n )
    for( int i=0; i<sz; ++i )
      for( int j=0; j<sz; ++j )
        for( int k=0; k<sz; ++k )
          a[i][j][k] *= temp;

  stop = clock();
  std::cout << "STD ARRAY: " << double((stop - start)) / CLOCKS_PER_SEC << " seconds" << std::endl;

  // Time the vector (iterator version)
  start = clock();
  for( int n=0; n < ncycle; ++n )
    for (vector<vector<vector<float> > >::iterator it1 = A.begin(); it1 != A.end(); ++it1)
      for (vector<vector<float> >::iterator it2 = it1->begin(); it2 != it1->end(); ++it2)
        for (vector<float>::iterator it3 = it2->begin(); it3 != it2->end(); ++it3)
          *it3 *= temp;
  /* Alternative: element access by index
     for( int n=0; n < ncycle; ++n )
       for( int i=0; i < sz; ++i )
         for( int j=0; j < sz; ++j )
           for( int k=0; k < sz; ++k )
             A[i][j][k] *= temp;
  */

  stop = clock();
  std::cout << "VECTOR: " << double((stop - start)) / CLOCKS_PER_SEC << " seconds" << std::endl;


  // Free the array, innermost dimension first (use sz, not a hard-coded 100)
  for( int i=0; i<sz; ++i) {
    for( int j=0; j<sz; ++j)
      delete[] a[i][j];
    delete[] a[i];
  }
  delete[] a;
  return 0;
}

Resolved

After noting Bo's point that the compiler "knows everything" about the loop and can therefore optimize it more than the vector case, I replaced the multiplications by "temp" with multiplications by the result of a call to "rand()". This leveled the playing field and in fact seems to give std::vector a slight lead. Timings for the various scenarios are as follows:

ARRAY (flat): 111.15 seconds
ARRAY (flat): 0.011115 seconds per cycle
ARRAY (3d): 111.73 seconds
ARRAY (3d): 0.011173 seconds per cycle
VECTOR (flat): 110.51 seconds
VECTOR (flat): 0.011051 seconds per cycle
VECTOR (3d): 118.05 seconds
VECTOR (3d): 0.011805 seconds per cycle
VECTOR (flat iterator): 108.55 seconds
VECTOR (flat iterator): 0.010855 seconds per cycle
VECTOR (3d iterator): 111.93 seconds
VECTOR (3d iterator): 0.011193 seconds per cycle
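
The modified benchmark code is not shown in the post. As a rough sketch only, the rand() substitution could look like the following. The function name scale_by_rand and the scaling of rand() into the range [0, 1] are assumptions made here so the values stay finite; the post only says the multiplier came from a call to rand().

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from the original post): multiply each element by
// a value obtained at run time, so the compiler cannot precompute the factor.
void scale_by_rand(std::vector<float>& v) {
    for (std::vector<float>::iterator it = v.begin(); it != v.end(); ++it) {
        // Assumption: rand() is scaled into [0, 1] to avoid overflow.
        const float factor =
            static_cast<float>(std::rand()) / static_cast<float>(RAND_MAX);
        *it *= factor;
    }
}
```

Note that with this change the call to rand() is itself part of the measured work in every variant, which may be one reason the timings above are so close to each other.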

The takeaway seems to be that vectors are just as fast as arrays, and slightly faster when flattened (contiguous memory) and used with iterators. My experiment averaged over only 10,000 iterations, so it could be argued that these are all roughly equivalent and that the choice should be determined by whichever is easiest to use; in my case, that would be the "3d iterator" case.
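
The post does not include the "flat" variants, so the following is only a minimal sketch of what a flattened layout might look like. The names flat_index and scale_flat are hypothetical, and row-major ordering of (i, j, k) is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: maps (i, j, k) to an offset in one contiguous
// buffer of sz*sz*sz floats, assuming row-major order.
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k,
                              std::size_t sz) {
    return (i * sz + j) * sz + k;
}

// Scales every element with a single iterator pass over contiguous memory,
// replacing the three nested loops of the 3d version.
void scale_flat(std::vector<float>& data, float factor) {
    for (std::vector<float>::iterator it = data.begin(); it != data.end(); ++it)
        *it *= factor;
}
```

A buffer for the benchmark would be created as std::vector<float> data(sz * sz * sz, 1.0f), and a single element would be reached with data[flat_index(i, j, k, sz)]. The whole cube then lives in one allocation instead of sz*sz separate inner vectors.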

Recommended answer

There is no black magic here; it is just too easy for the compiler to see that in this loop

for( int n=0; n<ncycle; ++n )
  for( int i=0; i<sz; ++i )
    for( int j=0; j<sz; ++j )
      for( int k=0; k<sz; ++k )
        a[i][j][k] *= temp;

everything is known at compile time, so it can easily unroll the loop to speed it up.
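
One way to probe this explanation, which the answer itself does not propose, is to hide the loop bound and the multiplier from the optimizer and see whether the gap between array and vector shrinks. The sketch below is an assumption-laden illustration: opaque and scaled_sum are hypothetical names, and reading through a volatile object is just one common way to force a value to be loaded at run time.

```cpp
#include <vector>

// Hypothetical helper (not in the original post): copying through a volatile
// object forces a real load at run time, so the optimizer cannot treat the
// returned value as a compile-time constant.
template <typename T>
T opaque(T value) {
    volatile T hidden = value;
    return hidden;
}

// Scales sz elements by temp, with both values hidden from the optimizer,
// and returns the sum so the work has an observable result.
double scaled_sum(int sz_in, float temp_in) {
    const int sz = opaque(sz_in);
    const float temp = opaque(temp_in);
    std::vector<float> a(sz, 1.0f);
    for (int k = 0; k < sz; ++k)
        a[k] *= temp;
    double total = 0.0;
    for (int k = 0; k < sz; ++k)
        total += a[k];
    return total;
}
```

In the benchmark from the question, the same idea would amount to initializing sz, ncycle and temp through opaque(...) instead of from literals. Whether this changes the measured ratio depends on the compiler and flags, so it is a diagnostic to try rather than a guaranteed result.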
