Is there any performance difference between x+=y and x=x+y (x and y are both simple types) in C#?
Question
In C/C++,
The compound-assignment operators combine the simple-assignment operator with another binary operator. Compound-assignment operators perform the operation specified by the additional operator, then assign the result to the left operand. For example, a compound-assignment expression such as
expression1 += expression2
can be understood as
expression1 = expression1 + expression2
However, the compound-assignment expression is not equivalent to the expanded version because the compound-assignment expression evaluates expression1 only once, while the expanded version evaluates expression1 twice: in the addition operation and in the assignment operation.
(Quoted from Microsoft Docs)
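To illustrate the "evaluates expression1 only once" point from the quote, here is a minimal C++ sketch. The names slot, g_calls and g_data are hypothetical and exist only for this illustration; the left operand is a function call so that the number of evaluations can be counted.

```cpp
#include <cassert>

// Hypothetical counter and storage, introduced only for this sketch.
static int g_calls = 0;
static int g_data[2] = {10, 20};

// Returns a reference to an element and records that it was called.
int& slot(int index) {
    ++g_calls;
    return g_data[index];
}

// Compound form: the left operand slot(0) is evaluated once.
int calls_for_compound() {
    g_calls = 0;
    slot(0) += 5;
    return g_calls;
}

// Expanded form: slot(0) is evaluated twice, once for the read
// on the right-hand side and once for the assignment target.
int calls_for_expanded() {
    g_calls = 0;
    slot(0) = slot(0) + 5;
    return g_calls;
}

// Usage example: checks the evaluation counts.
void check_evaluation_counts() {
    assert(calls_for_compound() == 1);
    assert(calls_for_expanded() == 2);
}
```

When the left operand is a plain local variable, as in the rest of this question, there is no side effect to count, so this difference is not observable.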
For example:
- For i+=2; the variable i would be modified directly, without any new object being created.
- For i=i+2; a copy of i would be created first. The copy would be modified and then assigned back to i:
i_copied = i;
i_copied += 2;
i = i_copied;
Without any compiler optimizations, the second method would construct a useless instance, which degrades performance.
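The "useless instance" concern can be made concrete for a C++ class type with the sketch below. Counter and g_constructed are hypothetical names used only for illustration, and the constructors count every object that gets created. This is an assumption-laden toy, not a statement about how int or double behave.

```cpp
#include <cassert>

// Hypothetical global that counts constructed Counter objects.
static int g_constructed = 0;

struct Counter {
    int value;
    explicit Counter(int v) : value(v) { ++g_constructed; }
    Counter(const Counter& other) : value(other.value) { ++g_constructed; }
    Counter& operator=(const Counter&) = default;

    // In-place addition: modifies this object, constructs nothing.
    Counter& operator+=(int rhs) {
        value += rhs;
        return *this;
    }
};

// Binary addition: has to build a new Counter to hold the result.
Counter operator+(const Counter& lhs, int rhs) {
    return Counter(lhs.value + rhs);
}

// Number of objects constructed by "c += 2".
int constructions_for_compound() {
    Counter c(1);
    g_constructed = 0;
    c += 2;
    return g_constructed;
}

// Number of objects constructed by "c = c + 2".
int constructions_for_expanded() {
    Counter c(1);
    g_constructed = 0;
    c = c + 2;
    return g_constructed;
}

// Usage example: the expanded form creates at least one temporary.
void check_constructions() {
    assert(constructions_for_compound() == 0);
    assert(constructions_for_expanded() >= 1);
}
```

The temporary survives here only because its constructor has an observable side effect (the counter). For a built-in type there is nothing observable to preserve.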
In C#, operators like += are not permitted to be overloaded, and all simple types like int or double are declared as readonly struct. (Does that mean all structs in C# are in fact immutable?)
I wonder whether, in C#, there is an expression that forces an object to be modified directly (at least for simple types), without any useless instances being created.
Also, is it possible for the C# compiler to optimize the expression x=x+y to x+=y as expected, if there are no side effects from constructors and destructors?
C#
When you compile C# into a .NET assembly, the code is in MSIL (Microsoft Intermediate Language), which makes it portable. The .NET runtime then JIT-compiles it for execution.
MSIL is a stack-based language. It does not know details of the target hardware (such as how many registers the CPU has). There is only one way to write that addition:
ldloc.0
ldloc.1
add
stloc.0
Load the first local onto the stack, load the second, add※ them, and store the result from the stack into the first local.
※: add pops two elements from the stack, adds them, and pushes the result back onto the stack.
Thus, both x=x+y and x+=y will yield the same code.
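To show how those four instructions behave on an evaluation stack, here is a toy interpreter sketched in C++. It is only a model under simplifying assumptions (two int locals, four opcodes) and not how the CLR is implemented. The names Op, run and add_locals are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Toy opcodes mirroring ldloc.0, ldloc.1, add and stloc.0.
enum class Op { LdLoc0, LdLoc1, Add, StLoc0 };

// Runs a program on a toy evaluation stack with two int locals
// and returns the final value of local 0.
int run(const std::vector<Op>& program, int local0, int local1) {
    std::vector<int> stack;
    for (Op op : program) {
        switch (op) {
            case Op::LdLoc0:
                stack.push_back(local0);  // push local 0
                break;
            case Op::LdLoc1:
                stack.push_back(local1);  // push local 1
                break;
            case Op::Add: {
                // Pop two values, push their sum.
                int rhs = stack.back(); stack.pop_back();
                int lhs = stack.back(); stack.pop_back();
                stack.push_back(lhs + rhs);
                break;
            }
            case Op::StLoc0:
                // Pop the top of the stack into local 0.
                local0 = stack.back(); stack.pop_back();
                break;
        }
    }
    return local0;
}

// The single instruction sequence that stands for both x=x+y and x+=y.
int add_locals(int x, int y) {
    return run({Op::LdLoc0, Op::LdLoc1, Op::Add, Op::StLoc0}, x, y);
}

// Usage example.
void check_add_locals() {
    assert(add_locals(3, 4) == 7);
}
```

The point of the sketch is that the stack model has no separate notion of "add in place". Both source forms reduce to the same load, load, add, store sequence.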
Of course, there are optimizations that happen afterwards. The JIT compiler will convert that into actual machine code.
This is what I see with SharpLab:
mov ecx, [ebp-4]
add ecx, [ebp-8]
mov [ebp-4], ecx
So, we copy [ebp-4] into ecx, add [ebp-8] to it, and then copy ecx back to [ebp-4].
So... is the register ecx a useless instance?
Well, that is SharpLab, and that is JIT. A different compiler could in theory convert the code into something different on a different platform.
You can compile .NET code AOT to a native image, which will be more aggressive with optimizations. Although, I do not see how you are going to improve upon a simple addition. Oh, I know, it might see that you do not use this value and remove it, or may see that you are always adding the same values and replace it with a constant.
It might be worth noting that the modern .NET JIT can continue to optimize code during execution, a feature known as tiered compilation: it first quickly produces a poorly optimized native version of the code and later, once a better version is ready, replaces it. This design comes from the fact that on a JIT runtime, performance depends on both the time it takes to create the native code and the time that native code takes to run.
C++
Let us see what C++ does. This is what I see for both x = x + y and x += y using godbolt (default settings※):
mov eax, DWORD PTR [rbp-8]
add DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-4]
The instructions mov, add, mov match the ones we got from SharpLab, with a different choice of registers.
※: x86-64 gcc 9.3 with -g -o /tmp/compiler-explorer-compiler2020424-22672-17cap6k.bjoj/output.s -masm=intel -S -fdiagnostics-color=always /tmp/compiler-explorer-compiler2020424-22672-17cap6k.bjoj/example.cpp
Adding the compiler option -O made the code go away, which makes sense because I was not using the result.
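If you want to repeat the comparison yourself, a pair of functions like the following could be pasted into godbolt. This is only a sketch with hypothetical function names. Returning x means the result is used, so the addition should not be discarded as dead code when optimizations are enabled.

```cpp
#include <cassert>

// Expanded form.
int add_expanded(int x, int y) {
    x = x + y;
    return x;
}

// Compound form.
int add_compound(int x, int y) {
    x += y;
    return x;
}

// Usage example: both forms compute the same value.
void check_same_result() {
    assert(add_expanded(2, 3) == 5);
    assert(add_compound(2, 3) == 5);
}
```

Comparing the assembly of the two functions side by side, with and without -O, is a quick way to check whether a given compiler treats them differently.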