VC++ 2005 SP1 - poor code-generation problem for store forwarding

Problem description

All,
We were attempting to write "partial sums" in VC++ and hit the following performance issue. Below are an assembly version and a C version of the code. We are compiling in "Release" mode with /O2 optimization.

The hand-written assembly code (forward_psum_asmbly()) runs more than 3x faster than the C code. The reason is the "stupid" code generated by VC++; why it is stupid is explained below the code...
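For reference, this is what both routines below compute in place, writing $x_0, \dots, x_{n-1}$ for the original array contents and $s_i$ for the value left in element $i$ afterwards:

```latex
s_i = \sum_{k=0}^{i} x_k, \qquad 0 \le i < n,
\qquad\text{computed as}\qquad
s_0 = x_0, \quad s_i = s_{i-1} + x_i \quad (1 \le i < n).
```

Each $s_i$ depends on $s_{i-1}$, so the additions form one serial dependency chain; unrolling the loop does not shorten that chain.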


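void forward_psum_asmbly(float *A, int n)
{
    if (n < 2) return;              // the loop below assumes at least 2 elements

    __asm {
        push    ecx
        mov     eax, A              ; eax -> A[0]
        mov     ecx, n
        dec     ecx                 ; n-1 additions to perform
        fld     dword ptr [eax]     ; ST(0) = running sum = A[0]

    L_0:
        add     eax, 4              ; advance to the next element
        fadd    dword ptr [eax]     ; running sum += A[i]
        fst     dword ptr [eax]     ; A[i] = running sum (sum stays in ST(0))
        dec     ecx
        cmp     ecx, 0
        jnz     L_0
        fstp    dword ptr [eax]     ; pop the x87 stack (stores the last sum again)
        pop     ecx
    }
}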


int forward_psum_c(float *B, int n)
{
    // In-place partial sums: after the loop, B[i] holds B[0] + ... + B[i]
    for (int i = 0; i < (n - 1); i++)
    {
        B[i+1] = B[i+1] + B[i];
    }
    return 0;
}
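A variant worth trying, sketched here as an assumption rather than a verified fix: carry the running sum in a local variable, as the hand-written loop does with ST(0), instead of re-reading B[i] from the array on every iteration. The name forward_psum_local is hypothetical. Because sum is a float, every step is still rounded to single precision, so the results should match forward_psum_c() exactly; whether VC++ 2005 then drops the [esp+xx] round-trip (with the default /fp:precise, or with /fp:fast) has to be checked in the generated assembly and is not established by the thread.

```cpp
// Hypothetical variant: keep the running sum in a local instead of
// reloading B[i] from memory on every iteration.
void forward_psum_local(float *B, int n)
{
    if (n < 2) return;          // nothing to accumulate

    float sum = B[0];           // running sum, a float just like B[i]
    for (int i = 1; i < n; ++i)
    {
        sum += B[i];            // same arithmetic as B[i] = B[i] + B[i-1]
        B[i] = sum;             // write the prefix sum back in place
    }
}
```

The early return also makes the n < 2 case explicit, which the C loop handles implicitly through its `i < (n-1)` bound.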

The assembly code generated by the compiler has the loop unrolled 4 times
(note: the hand-written assembly has no unrolling). Here is an excerpt:



00401020  fld   dword ptr [eax-8]
00401023  add   eax,10h
00401026  sub   ecx,1
00401029  fadd  dword ptr [eax-14h]
0040102C  fstp  dword ptr [esp+4]
00401030  fld   dword ptr [esp+4]
00401034  fst   dword ptr [eax-14h]
00401037  fadd  dword ptr [eax-10h]
0040103A  fstp  dword ptr [esp+4]
0040103E  fld   dword ptr [esp+4]
00401042  fst   dword ptr [eax-10h]



One can see the dummy store to and load from the [esp+xx] stack slot, which I presume is meant as a store-forwarding optimization. However, this optimization is USELESS, because none of the "fadd"s load from that stack area; they load directly from the array.
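One alternative explanation worth checking, offered as an assumption and not as a confirmed diagnosis: under the default /fp:precise model, VC++ rounds every value assigned to a float to single precision, and on x87 the usual way to round a register value is to store it to a 32-bit memory slot and load it back, which is what the fstp/fld pair on [esp+4] does. The hand-written loop skips that rounding: FST rounds the stored copy, but ST(0) keeps the wider value (assuming the default 53-bit x87 precision control). The sketch below models the two behaviours; both function names are hypothetical, and the double accumulator stands in for ST(0).

```cpp
#include <cstddef>
#include <vector>

// Running sum rounded to float after every step: matches the C loop,
// because each B[i] is a float object.
std::vector<float> psum_rounded_each_step(const std::vector<float>& a)
{
    std::vector<float> out(a.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i];
        out[i] = sum;
    }
    return out;
}

// Running sum kept in double and rounded only when stored: a model of the
// hand-written x87 loop, where FST rounds the stored copy but not ST(0).
std::vector<float> psum_wide_accumulator(const std::vector<float>& a)
{
    std::vector<float> out(a.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i];
        out[i] = static_cast<float>(sum);
    }
    return out;
}
```

With the input {16777216.0f, 1.0f, 1.0f}, the first function leaves the last element at 16777216 while the second gives 16777218. If this explanation is right, the two versions are not strictly equivalent, and a compiler honouring /fp:precise cannot simply keep the unrounded sum in a register the way forward_psum_asmbly() does.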

Is this behaviour fixed in any of the latest releases? (We use VS 2005 SP1 on Win XP Pro Service Pack 2.)

Kindly advise,
Best Regards,
Sarnath

Recommended answer

You could try the STL partial_sum function: http://msdn.microsoft.com/en-us/library/dfcy77sd.aspx
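A minimal sketch of that suggestion, assuming the same in-place `float *B, int n` interface as forward_psum_c() (the wrapper name forward_psum_stl is hypothetical). std::partial_sum from <numeric> allows the output iterator to be the same as the input's first iterator, so the sums can be written back into the array. It accumulates in the input's value type (float here), so it keeps the C loop's rounding behaviour; whether VC++ 2005 generates faster code for it than for the hand-written loop is not established in the thread.

```cpp
#include <numeric>

// Hypothetical wrapper: in-place forward partial sums via the standard
// library. std::partial_sum permits the destination to be the source range.
void forward_psum_stl(float *B, int n)
{
    if (n <= 0) return;                  // B + n must not point before B
    std::partial_sum(B, B + n, B);       // B[i] = B[0] + ... + B[i]
}
```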

