Bitwise operators, not vs xor use in branching
Question
After asking this SO question, I received a very interesting comment from @AndonM.Coleman that I had to verify.
Since your disassembled code is written for x86, it is worth pointing out that XOR will set/clear the Zero Flag whereas NOT will not (sometimes useful if you want to perform a bitwise operation without affecting jump conditions that rely on flags from previous operations). Now, considering you're not writing assembly directly, you really have no access to this flag in a meaningful way so I doubt this is the reason for favoring one over the other.
His comment made me curious whether the following code would produce the same assembly instructions:
```cpp
#include <iostream>

int main()
{
    unsigned int val = 0;
    std::cout << "Enter a numeric value: ";
    std::cin >> val;

    // Invert by XOR'ing with all ones, then compare with zero
    if ( (val ^ ~0U) == 0)
    {
        std::cout << "Value inverted is zero" << std::endl;
    } else
    {
        std::cout << "Value inverted is not zero" << std::endl;
    }

    // Invert with bitwise NOT, then compare with zero
    if ( (~val) == 0)
    {
        std::cout << "Value inverted is zero" << std::endl;
    } else
    {
        std::cout << "Value inverted is not zero" << std::endl;
    }
    return 0;
}
```
For the following two operations:
if ( (val ^ ~0U) == 0 )
and
if ( (~val) == 0 )
The unoptimized build in Visual Studio 2010 produces the following disassembly:
```
if ( (val ^ ~0U) == 0)
00AD1501  mov eax,dword ptr [val]
00AD1504  xor eax,0FFFFFFFFh
00AD1507  jne main+86h (0AD1536h)
if ( (~val) == 0)
00AD1561  mov eax,dword ptr [val]
00AD1564  not eax
00AD1566  test eax,eax
00AD1568  jne main+0E7h (0AD1597h)
```
My question is about optimisation. Is it better to write
if ( (val ^ ~0U) == 0)
or
if ( (~val) == 0)
Solution
This depends on a lot of things, but mostly on what (if anything) you tell the compiler to optimize for.
If the compiler is set to optimize for size (smallest bytecode), then it will sometimes use XOR in seemingly strange places. For instance, because of the variable-length encoding scheme x86 uses, XOR'ing a register with itself sets it to 0 in fewer bytes of code than the equivalent MOV instruction would require.

Consider the code that uses XOR:

if ( (val ^ ~0U) == 0 ) /* 3 bytes to invert and test (x86) */
XOR eax,0FFFFFFFFh requires 3 bytes AND sets/clears the Zero Flag (ZF).

Now, consider the code that uses NOT:

if ( (~val) == 0) /* 4 bytes to invert and test (x86) */
NOT eax is encoded as a 2-byte instruction, but does not affect CPU flags.
TEST eax,eax adds another 2 bytes, and is necessary to set/clear the Zero Flag (ZF).
NOT is also a simple instruction, but since it does not affect any CPU flags, you must issue a TEST instruction afterwards to use it for branching, as seen in your code. This actually produces larger bytecode, so a smart compiler set to optimize for size would probably try to avoid using NOT. How many cycles these two instructions take together varies between CPU generations, and a smart compiler would also factor this into its decision-making when told to optimize for speed.
If you are not writing hand-tuned assembly, it is best to use whatever is clearest to a human and hope that the compiler is smart enough to choose instructions, scheduling, etc. that optimize for size or speed as requested at compile time. Compilers have a smart set of heuristics for choosing and scheduling instructions, and they know more about the target CPU architecture than the average coder.

If you find out later that this branch really is a bottleneck and there is no higher-level way around the problem, then you could do some low-level tuning. However, this is a trivial thing to focus on these days unless you are targeting something like a low-power embedded CPU or a memory-limited device. The only places where I have ever squeezed out enough performance by hand-tuning to make it worthwhile were algorithms that benefited from data parallelism, where the compiler was not smart enough to make effective use of specialized instruction sets like MMX/SSE.