Performance issue with the default copy assignment operator


Problem description

I've encountered an unexpected loss of performance when relying on the default copy assignment operator provided by the compiler (VS2010, x64).

The class in question is a simple 3-component vector with nothing but 3 floats as members and an operator+ function defined. In my test case, I iterate over three large arrays a, b, c of these vector structs, adding the corresponding entries of b and c and assigning the result to a:

a[i] = b[i] + c[i];

It seems to me that whenever the default assignment operator is used, the compiler stores the result of b[i] + c[i] to the stack first (i.e., one store for each of the 3 component floats) only to immediately load them back to a register before storing them to the actual target array "a".

When I provide my own canonical assignment operator that simply copies the 3 struct members, the unnecessary data movement is omitted and the result is directly written to the target array.
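At the language level, the two variants differ in one respect that can be checked directly: the implicitly generated operator= of a struct like this is trivial, while a user-provided one is not. The following is a minimal sketch of that check, assuming a C++11 compiler with `<type_traits>` (newer than the VS2010 toolchain in the question). The names `Vec3Default`, `Vec3Custom`, `assignment_copies_all_members` and `run_checks` are made up for illustration. It only shows that both operators copy the same three members; it does not by itself explain the code the compiler generates.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical variant that relies on the implicit copy assignment operator.
struct Vec3Default
{
	float x, y, z;
};

// Hypothetical variant with the handwritten memberwise operator= from the question.
struct Vec3Custom
{
	Vec3Custom& operator=(const Vec3Custom& v)
	{
		x = v.x; y = v.y; z = v.z;
		return *this;
	}

	float x, y, z;
};

// The implicit operator= is trivial; a user-provided one never is.
static_assert(std::is_trivially_copy_assignable<Vec3Default>::value,
              "implicit copy assignment should be trivial");
static_assert(!std::is_trivially_copy_assignable<Vec3Custom>::value,
              "user-provided copy assignment is not trivial");

// Returns true if assigning src to dst copies all three members.
template <typename V>
bool assignment_copies_all_members()
{
	V src;
	src.x = 1.0f; src.y = 2.0f; src.z = 3.0f;
	V dst;
	dst.x = 0.0f; dst.y = 0.0f; dst.z = 0.0f;
	dst = src;
	return dst.x == 1.0f && dst.y == 2.0f && dst.z == 3.0f;
}

// Checks that both variants behave identically as far as the result goes.
void run_checks()
{
	assert(assignment_copies_all_members<Vec3Default>());
	assert(assignment_copies_all_members<Vec3Custom>());
}
```

Calling `run_checks()` from `main` should pass silently for both variants, so any difference between them is a matter of code generation rather than of semantics.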

Here's a simple test program I've used to reproduce the behavior:

#include <cstdio>

struct Vec3
{
	Vec3 operator+(const Vec3& v) const
	{
		Vec3 res;
		res.x = x + v.x; res.y = y + v.y; res.z = z + v.z;
		return res;
	}

	// !!! Remove this to generate slowdown !!!
	Vec3& operator=(const Vec3& v)
	{ 
		x = v.x; y = v.y; z = v.z; 
		return *this; 
	}

	float x, y, z;
};

const int size = 1000000;
Vec3 a[size], b[size], c[size];

int main(int argc, char** argv)
{
	for(int i = 0; i != size; i++)
	{
		a[i] = b[i] + c[i];
	}

	// Print a single float member; passing the whole struct to %f is not valid.
	printf("%f\n", a[0].x);
	return 0;
}

With the handwritten operator=, the loop compiles to this neat code:

(13F7F1010h):
movss		xmm2,dword ptr [rax+rcx+0B750E0h]  
movss		xmm1,dword ptr [rax+rcx+0B750E4h]  
movss		xmm0,dword ptr [rax+rcx+0B750E8h]  
add			rax,0Ch  
addss		xmm2,dword ptr [rax+rcx+35D4h]  
addss		xmm1,dword ptr [rax+rcx+35D8h]  
addss		xmm0,dword ptr [rax+rcx+35DCh]  
movss		dword ptr [rax+rcx+16E6BD4h],xmm2  
movss		dword ptr [rax+rcx+16E6BD8h],xmm1  
movss		dword ptr [rax+rcx+16E6BDCh],xmm0  
cmp  		rax,0B71B00h  
jne  			main+10h (13FA11010h)

 

Without my own operator=, it blows up to this:

(13F7F1010h):
movss  		xmm0,dword ptr [rcx+rdx+0B750E0h]  
add    		rcx,0Ch  
addss  		xmm0,dword ptr [rcx+rdx+35D4h]  
movss  		dword ptr [rsp+20h],xmm0  
movss  		xmm0,dword ptr [rcx+rdx+0B750D8h]  
mov    		eax,dword ptr [rsp+20h]  
mov    		dword ptr [rcx+rdx+16E6BD4h],eax  
addss  		xmm0,dword ptr [rcx+rdx+35D8h]  
movss  		dword ptr [rsp+24h],xmm0  
movss  		xmm0,dword ptr [rcx+rdx+0B750DCh]  
mov    		eax,dword ptr [rsp+24h]  
mov    		dword ptr [rcx+rdx+16E6BD8h],eax  
addss  		xmm0,dword ptr [rcx+rdx+35DCh]  
movss  		dword ptr [rsp+28h],xmm0  
mov    		eax,dword ptr [rsp+28h]  
mov    		dword ptr [rcx+rdx+16E6BDCh],eax  
cmp    		rcx,0B71B00h  
jne    		main+10h (13F7F1010h)   

 

When iterating over large arrays, the performance hit is quite remarkable.
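To put a number on the slowdown, the loop can be timed for both variants. Below is a rough sketch, assuming a C++11 compiler with `<chrono>` (which VS2010 does not ship). The names `Vec3Default`, `Vec3Custom` and `time_add_loop_ms` are hypothetical, and `std::vector` stands in for the global arrays of the test program. Treat the numbers as indicative only: they depend on compiler, flags and hardware, and a newer compiler may well generate the same code for both.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical variant relying on the compiler-generated operator=.
struct Vec3Default
{
	Vec3Default operator+(const Vec3Default& v) const
	{
		Vec3Default res;
		res.x = x + v.x; res.y = y + v.y; res.z = z + v.z;
		return res;
	}

	float x, y, z;
};

// Hypothetical variant with the handwritten operator= from the question.
struct Vec3Custom
{
	Vec3Custom operator+(const Vec3Custom& v) const
	{
		Vec3Custom res;
		res.x = x + v.x; res.y = y + v.y; res.z = z + v.z;
		return res;
	}

	Vec3Custom& operator=(const Vec3Custom& v)
	{
		x = v.x; y = v.y; z = v.z;
		return *this;
	}

	float x, y, z;
};

// Times a[i] = b[i] + c[i] over n elements and returns milliseconds.
// The checksum is written out so the loop result is actually used.
template <typename V>
double time_add_loop_ms(std::size_t n, float* checksum)
{
	assert(n > 0 && checksum != 0);
	std::vector<V> a(n), b(n), c(n);
	for (std::size_t i = 0; i != n; ++i)
	{
		b[i].x = 1.0f; b[i].y = 2.0f; b[i].z = 3.0f;
		c[i].x = 4.0f; c[i].y = 5.0f; c[i].z = 6.0f;
	}

	std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
	for (std::size_t i = 0; i != n; ++i)
	{
		a[i] = b[i] + c[i];
	}
	std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();

	*checksum = a[n - 1].x + a[n - 1].y + a[n - 1].z;
	return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A caller would run, for example, `time_add_loop_ms<Vec3Default>(1000000, &sum)` and `time_add_loop_ms<Vec3Custom>(1000000, &sum)` several times each in an optimized build and compare the timings.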

What's going on here? I thought the compiler generated assignment operator should look similar, if not identical to my handwritten one...or am I mistaken?

 

Recommended answer

The code generated for your handwritten version uses three registers (xmm0, xmm1, xmm2), because your operator= tells the compiler explicitly which members of the struct to assign. Without it, the compiler uses a single register and goes through the members of the struct one at a time.

Finally, is this the assembly of a debug build? What does the assembly of a release build look like with /O2?

