向量< bool>的性能差. VS2012在64位目标中 [英] Poor performance of vector<bool> in 64-bit target with VS2012

查看:104
本文介绍了向量< bool>的性能差. VS2012在64位目标中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为该课程设定基准:

struct Sieve {
  std::vector<bool> isPrime;

  Sieve (int n = 1) {
    isPrime.assign (n+1, true);
    isPrime[0] = isPrime[1] = false;

    for (int i = 2; i <= (int)sqrt((double)n); ++i)
      if (isPrime[i]) 
        for (int j = i*i; j <= n; j += i)
          isPrime[j] = false;
  }
};

当调用大量构造函数时,例如64位二进制版本与32位版本(发布版本)相比,我的性能(CPU时间)降低了3倍以上.

I'm getting over 3 times worse performance (CPU time) with 64-bit binary vs. 32-bit version (release build) when calling a constructor for a large number, e.g.

Sieve s(100000000);

我测试了sizeof(bool),两个版本都为1. 当我用vector<char>替换vector<bool>时,对于64位和32位版本,性能变得相同.为什么会这样?

I tested sizeof(bool) and it is 1 for both versions. When I substitute vector<bool> with vector<char> performance becomes the same for 64-bit and 32-bit versions. Why is that?

以下是S(100000000)的运行时间(发布模式,先是32位,后是64位):

Here are the run times for S(100000000) (release mode, 32-bit first, 64-bit second)):

vector<bool> 0.97s 3.12s vector<char> 0.99s 0.99s vector<int> 1.57s 1.59s

vector<bool> 0.97s 3.12s vector<char> 0.99s 0.99s vector<int> 1.57s 1.59s

我还对VS2010(Wouter Huysentruit的响应提示)进行了健全性测试,结果为0.98s 0.88s.因此,VS2012实现存在问题.

I also did a sanity test with VS2010 (prompted by Wouter Huysentruit's response), which produced 0.98s 0.88s. So there is something wrong with VS2012 implementation.

我向 编辑

下面的许多答案都评论了使用int进行索引的缺陷.这可能是正确的,但即使是大巫师本人也在他的书中使用了标准的for (int i = 0; i < v.size(); ++i),因此这种模式也不会造成明显的性能损失.此外,在Going Native 2013大会期间提出了这个问题,C ++专家小组的主持人评论了他们早期的建议,即使用size_t进行索引以及将返回类型作为size()作为错误.他们说:我们很抱歉,我们还很年轻..."

Many answers below comment on deficiencies of using int for indexing. This may be true, but even the Great Wizard himself is using a standard for (int i = 0; i < v.size(); ++i) in his books, so such a pattern should not incur a significant performance penalty. Additionally, this issue was raised during Going Native 2013 conference and the presiding group of C++ gurus commented on their early recommendations of using size_t for indexing and as a return type of size() as a mistake. They said: "we are sorry, we were young..."

该问题的标题可以改为:从VS2010升级到VS2012时,此代码的性能下降超过3倍.

The title of this question could be rephrased to: Over 3 times performance drop on this code when upgrading from VS2010 to VS2012.

编辑

我进行了一次粗略的尝试,以找到索引ij的内存对齐,并发现此检测版本:

I made a crude attempt at finding memory alignment of indexes i and j and discovered that this instrumented version:

struct Sieve {
  vector<bool> isPrime;

  Sieve (int n = 1) {
    isPrime.assign (n+1, true);
    isPrime[0] = isPrime[1] = false;

    for (int i = 2; i <= sqrt((double)n); ++i) {
      if (i == 17) cout << ((int)&i)%16 << endl;
      if (isPrime[i]) 
        for (int j = i*i; j <= n; j += i) {
          if (j == 4) cout << ((int)&j)%16 << endl;
          isPrime[j] = false;
        }
    }
  }
};

现在可以自动快速运行(仅比32位版本慢10%).这种性能和VS2010的性能使我们难以接受一种优化器的理论,该理论在处理int索引而不是size_t时存在固有的问题.

auto-magically runs fast now (only 10% slower than 32-bit version). This and VS2010 performance makes it hard to accept a theory of optimizer having inherent problems dealing with int indexes instead of size_t.

推荐答案

我已经在VS2010中使用vector<bool>对此进行了测试:在i3上,32位需要1452ms,而64位需要1264ms.

I've tested this with vector<bool> in VS2010: 32-bit needs 1452ms while 64-bit needs 1264ms to complete on a i3.

VS2012中的同一测试(这次是在i7上)需要700ms(32位)和2730ms(64位),因此VS2012中的编译器出了点问题.也许您可以将此测试用例作为错误报告给Microsoft.

The same test in VS2012 (on i7 this time) needs 700ms (32-bit) and 2730ms (64-bit), so there is something wrong with the compiler in VS2012. Maybe you can report this test case as a bug to Microsoft.

更新

问题在于,当使用int作为迭代器时,VS2012编译器会对内部for循环中的部分代码使用临时堆栈变量.下面列出的组装零件是<vector>+= operator<vector>内部代码的一部分.

The problem is that the VS2012 compiler uses a temporary stack variable for a part of the code in the inner for-loop when using int as iterator. The assembly parts listed below are part of the code inside <vector>, in the += operator of the std::vector<bool>::iterator.

size_t作为迭代器

使用size_t作为迭代器时,部分代码如下所示:

When using size_t as iterator, a part of the code looks like this:

or  rax, -1
sub rax, rdx
shr rax, 5
lea rax, QWORD PTR [rax*4+4]
sub r8, rax

这里,所有指令都使用非常快的CPU寄存器.

Here, all instructions use CPU registers which are very fast.

int作为迭代器

使用int作为迭代器时,该部分看起来像这样:

When using int as iterator, that same part looks like this:

or  rcx, -1
sub rcx, r8
shr rcx, 5
shl rcx, 2
mov rax, -4
sub rax, rcx
mov rdx, QWORD PTR _Tmp$6[rsp]
add rdx, rax

在这里,您看到正在使用_Tmp $ 6堆栈变量,这会导致速度变慢.

Here you see the _Tmp$6 stack variable being used, which causes the slowdown.

将编译器指向正确的方向

有趣的是,您可以直接使用vector<bool>::iterator将编译器指向正确的方向.

The funny part is that you can point the compiler into the right direction by using the vector<bool>::iterator directly.

struct Sieve {
  std::vector<bool> isPrime;

  Sieve (int n = 1) {
    isPrime.assign(n + 1, true);

    std::vector<bool>::iterator it1 = isPrime.begin();
    std::vector<bool>::iterator end = it1 + n;
    *it1++ = false;
    *it1++ = false;
    for (int i = 2; i <= (int)sqrt((double)n); ++it1, ++i)
        if (*it1)
            for (std::vector<bool>::iterator it2 = isPrime.begin() + i * i; it2 <= end; it2 += i)
                *it2 = false;
  }
};

这篇关于向量&lt; bool&gt;的性能差. VS2012在64位目标中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆