向量< bool>的性能差. VS2012在64位目标中 [英] Poor performance of vector<bool> in 64-bit target with VS2012
问题描述
为该课程设定基准:
struct Sieve {
std::vector<bool> isPrime;
Sieve (int n = 1) {
isPrime.assign (n+1, true);
isPrime[0] = isPrime[1] = false;
for (int i = 2; i <= (int)sqrt((double)n); ++i)
if (isPrime[i])
for (int j = i*i; j <= n; j += i)
isPrime[j] = false;
}
};
当调用大量构造函数时,例如64位二进制版本与32位版本(发布版本)相比,我的性能(CPU时间)降低了3倍以上.
I'm getting over 3 times worse performance (CPU time) with 64-bit binary vs. 32-bit version (release build) when calling a constructor for a large number, e.g.
Sieve s(100000000);
我测试了sizeof(bool)
,两个版本都为1
.
当我用vector<char>
替换vector<bool>
时,对于64位和32位版本,性能变得相同.为什么会这样?
I tested sizeof(bool)
and it is 1
for both versions.
When I substitute vector<bool>
with vector<char>
performance becomes the same for 64-bit and 32-bit versions. Why is that?
以下是S(100000000)
的运行时间(发布模式,先是32位,后是64位):
Here are the run times for S(100000000)
(release mode, 32-bit first, 64-bit second)):
vector<bool>
0.97s 3.12s
vector<char>
0.99s 0.99s
vector<int>
1.57s 1.59s
vector<bool>
0.97s 3.12s
vector<char>
0.99s 0.99s
vector<int>
1.57s 1.59s
我还对VS2010(Wouter Huysentruit的响应提示)进行了健全性测试,结果为0.98s 0.88s.因此,VS2012实现存在问题.
I also did a sanity test with VS2010 (prompted by Wouter Huysentruit's response), which produced 0.98s 0.88s. So there is something wrong with VS2012 implementation.
我向 编辑
下面的许多答案都评论了使用int
进行索引的缺陷.这可能是正确的,但即使是大巫师本人也在他的书中使用了标准的for (int i = 0; i < v.size(); ++i)
,因此这种模式也不会造成明显的性能损失.此外,在Going Native 2013大会期间提出了这个问题,C ++专家小组的主持人评论了他们早期的建议,即使用size_t
进行索引以及将返回类型作为size()
作为错误.他们说:我们很抱歉,我们还很年轻..."
Many answers below comment on deficiencies of using int
for indexing. This may be true, but even the Great Wizard himself is using a standard for (int i = 0; i < v.size(); ++i)
in his books, so such a pattern should not incur a significant performance penalty. Additionally, this issue was raised during Going Native 2013 conference and the presiding group of C++ gurus commented on their early recommendations of using size_t
for indexing and as a return type of size()
as a mistake. They said: "we are sorry, we were young..."
该问题的标题可以改为:从VS2010升级到VS2012时,此代码的性能下降超过3倍.
The title of this question could be rephrased to: Over 3 times performance drop on this code when upgrading from VS2010 to VS2012.
编辑
我进行了一次粗略的尝试,以找到索引i
和j
的内存对齐,并发现此检测版本:
I made a crude attempt at finding memory alignment of indexes i
and j
and discovered that this instrumented version:
struct Sieve {
vector<bool> isPrime;
Sieve (int n = 1) {
isPrime.assign (n+1, true);
isPrime[0] = isPrime[1] = false;
for (int i = 2; i <= sqrt((double)n); ++i) {
if (i == 17) cout << ((int)&i)%16 << endl;
if (isPrime[i])
for (int j = i*i; j <= n; j += i) {
if (j == 4) cout << ((int)&j)%16 << endl;
isPrime[j] = false;
}
}
}
};
现在可以自动快速运行(仅比32位版本慢10%).这种性能和VS2010的性能使我们难以接受一种优化器的理论,该理论在处理int
索引而不是size_t
时存在固有的问题.
auto-magically runs fast now (only 10% slower than 32-bit version). This and VS2010 performance makes it hard to accept a theory of optimizer having inherent problems dealing with int
indexes instead of size_t
.
推荐答案
我已经在VS2010中使用vector<bool>
对此进行了测试:在i3上,32位需要1452ms,而64位需要1264ms.
I've tested this with vector<bool>
in VS2010: 32-bit needs 1452ms while 64-bit needs 1264ms to complete on a i3.
VS2012中的同一测试(这次是在i7上)需要700ms(32位)和2730ms(64位),因此VS2012中的编译器出了点问题.也许您可以将此测试用例作为错误报告给Microsoft.
The same test in VS2012 (on i7 this time) needs 700ms (32-bit) and 2730ms (64-bit), so there is something wrong with the compiler in VS2012. Maybe you can report this test case as a bug to Microsoft.
更新
问题在于,当使用int作为迭代器时,VS2012编译器会对内部for循环中的部分代码使用临时堆栈变量.下面列出的组装零件是<vector>
的+= operator
中<vector>
内部代码的一部分.
The problem is that the VS2012 compiler uses a temporary stack variable for a part of the code in the inner for-loop when using int as iterator. The assembly parts listed below are part of the code inside <vector>
, in the += operator
of the std::vector<bool>::iterator
.
size_t作为迭代器
使用size_t
作为迭代器时,部分代码如下所示:
When using size_t
as iterator, a part of the code looks like this:
or rax, -1
sub rax, rdx
shr rax, 5
lea rax, QWORD PTR [rax*4+4]
sub r8, rax
这里,所有指令都使用非常快的CPU寄存器.
Here, all instructions use CPU registers which are very fast.
int作为迭代器
使用int
作为迭代器时,该部分看起来像这样:
When using int
as iterator, that same part looks like this:
or rcx, -1
sub rcx, r8
shr rcx, 5
shl rcx, 2
mov rax, -4
sub rax, rcx
mov rdx, QWORD PTR _Tmp$6[rsp]
add rdx, rax
在这里,您看到正在使用_Tmp $ 6堆栈变量,这会导致速度变慢.
Here you see the _Tmp$6 stack variable being used, which causes the slowdown.
将编译器指向正确的方向
有趣的是,您可以直接使用vector<bool>::iterator
将编译器指向正确的方向.
The funny part is that you can point the compiler into the right direction by using the vector<bool>::iterator
directly.
struct Sieve {
std::vector<bool> isPrime;
Sieve (int n = 1) {
isPrime.assign(n + 1, true);
std::vector<bool>::iterator it1 = isPrime.begin();
std::vector<bool>::iterator end = it1 + n;
*it1++ = false;
*it1++ = false;
for (int i = 2; i <= (int)sqrt((double)n); ++it1, ++i)
if (*it1)
for (std::vector<bool>::iterator it2 = isPrime.begin() + i * i; it2 <= end; it2 += i)
*it2 = false;
}
};
这篇关于向量< bool>的性能差. VS2012在64位目标中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!