VC++ 2005 SP1 - poor code-generation problem for store forwarding
Problem description
All,
We were attempting to compute "partial sums" in VC++ and hit the following performance issue. Below are a hand-written assembly version and a C version of the code. We are compiling in "Release" mode with /O2 optimization.
The hand-written assembly code (forward_psum_asmbly()) runs more than 3x faster than the C code. The reason is the "stupid" code generated by VC++; what makes it stupid is explained below the code.
void forward_psum_asmbly(float *A, int n)
{
    __asm {
        push ecx
        mov eax, A
        mov ecx, n
        dec ecx
        fld dword ptr [eax]
    L_0:
        add eax, 4
        fadd dword ptr [eax]
        fst dword ptr [eax]
        dec ecx
        cmp ecx, 0
        jnz L_0
        fstp dword ptr [eax]
        pop ecx
    }
}

int forward_psum_c(float *B, int n)
{
    for (int i = 0; i < (n - 1); i++)
    {
        B[i+1] = B[i+1] + B[i];
    }
    return 0;
}
The code the compiler generates for the C version has loop unrolling in place (4 unrolls); note that the hand-written assembly has none.
Here is a sample of the generated code:
00401020 fld dword ptr [eax-8]
00401023 add eax,10h
00401026 sub ecx,1
00401029 fadd dword ptr [eax-14h]
0040102C fstp dword ptr [esp+4]
00401030 fld dword ptr [esp+4]
00401034 fst dword ptr [eax-14h]
00401037 fadd dword ptr [eax-10h]
0040103A fstp dword ptr [esp+4]
0040103E fld dword ptr [esp+4]
00401042 fst dword ptr [eax-10h]
One can see the dummy stores to and loads from the [esp+xx] stack slot, which I presume are meant as a store-forwarding optimization. However, this optimization is useless, because none of the fadd instructions load from that stack area; they load directly from the array.
Has this behaviour been fixed in any later release? (We use VS 2005 SP1 on Windows XP Pro, Service Pack 2.)
Kindly advise,
Best Regards,
Sarnath
Recommended answer
You could try the partial_sum function of the STL: http://msdn.microsoft.com/en-us/library/dfcy77sd.aspx
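As a rough illustration of the partial_sum suggestion, here is a minimal sketch of an in-place prefix sum over a float array. The function names forward_psum_stl and forward_psum_loop are hypothetical and not from the original post; forward_psum_loop simply repeats the recurrence of forward_psum_c so the two can be compared. Whether std::partial_sum actually produces better code than the plain loop on VC++ 2005 is not something this sketch establishes; it would have to be measured.

```cpp
#include <numeric>

// Same recurrence as forward_psum_c in the question:
// B[i+1] = B[i+1] + B[i], for i = 0 .. n-2.
void forward_psum_loop(float *B, int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        B[i + 1] = B[i + 1] + B[i];
    }
}

// Hypothetical wrapper: in-place prefix sum via std::partial_sum.
// The output iterator is allowed to equal the input iterator,
// so the array A is overwritten with its running sums.
void forward_psum_stl(float *A, int n)
{
    if (n <= 0)
    {
        return; // nothing to do for an empty range
    }
    std::partial_sum(A, A + n, A);
}
```

For example, calling forward_psum_stl on the array {1, 2, 3, 4} with n = 4 leaves {1, 3, 6, 10} in place, the same result the loop version gives.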