Why does MSVC emit a useless MOVSX before performing this Bit Test?
Question
Compiling the following code in MSVC 2013, 64-bit release build, /O2 optimization:
while (*s == ' ' || *s == ',' || *s == '\r' || *s == '\n') {
    ++s;
}
I get the following code, which has a really cool optimization using a 64-bit register as a lookup table with the bt (bit test) instruction.
mov rcx, 17596481020928 ; 0000100100002400H
npad 5
$LL82@myFunc:
movzx eax, BYTE PTR [rsi]
cmp al, 44 ; 0000002cH
ja SHORT $LN81@myFunc
movsx rax, al
bt rcx, rax
jae SHORT $LN81@myFunc
inc rsi
jmp SHORT $LL82@myFunc
$LN81@myFunc:
; code after loop...
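To make the lookup-table idea concrete, here is a minimal C++ sketch of what the listing appears to compute. It assumes C++14 or later; make_mask, kDelimMask, is_delim and skip_delims are names invented for this illustration, not anything MSVC emits or exposes.

```cpp
#include <cassert>
#include <cstdint>

// Build a 64-bit lookup table: bit N is set when character code N is in the set.
// Only valid when every character in the set has a code below 64.
constexpr std::uint64_t make_mask(const char* chars) {
    std::uint64_t mask = 0;
    for (; *chars != '\0'; ++chars) {
        mask |= std::uint64_t{1} << static_cast<unsigned char>(*chars);
    }
    return mask;
}

constexpr std::uint64_t kDelimMask = make_mask(" ,\r\n");
static_assert(kDelimMask == 0x0000100100002400ULL,
              "same constant as the mov rcx in the listing");

// Rough equivalent of the cmp/ja guard followed by bt/jae.
inline bool is_delim(char ch) {
    const unsigned char c = static_cast<unsigned char>(ch);
    // The range check comes first, so the shift count never reaches 64.
    return c <= 44 && ((kDelimMask >> c) & 1u) != 0;
}

// Hypothetical helper: the while loop from the question, written against the mask.
inline const char* skip_delims(const char* s) {
    assert(s != nullptr);
    while (is_delim(*s)) {
        ++s;
    }
    return s;
}
```

The terminating NUL has bit 0 clear in the mask, so skip_delims stops at the end of the string without a separate check.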
My question, though, is: what is the purpose of the movsx rax, al after the first branch?
First we load a byte from the string into rax and zero-extend it:
movzx eax, BYTE PTR [rsi]
Then the cmp/ja pair performs an unsigned comparison between al and 44, and branches forwards if al is greater.
So now, we know 0 <= al <= 44 in unsigned numbers. Therefore, the highest bit of al could not possibly be set!
Nonetheless, the next instruction is movsx rax, al. This is a sign-extended move. But since:
- al is the lowest byte of rax
- we already know the other 7 bytes of rax are zeroed
- we just proved that al's highest bit could not possibly be set
this movsx must be a no-op.
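The no-op claim can be checked exhaustively over the values that survive the cmp/ja guard. The sketch below models movzx and movsx as C++ conversions; zero_extend, sign_extend and check_guarded_range are hypothetical names, and the cast to std::int8_t assumes the usual two's-complement wraparound (guaranteed from C++20, and what mainstream compilers do before that).

```cpp
#include <cassert>
#include <cstdint>

// movzx-style widening: the byte is treated as unsigned.
constexpr std::uint64_t zero_extend(std::uint8_t b) {
    return static_cast<std::uint64_t>(b);
}

// movsx-style widening: the byte is treated as signed, then the 64-bit
// result is viewed as raw bits.
constexpr std::uint64_t sign_extend(std::uint8_t b) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int8_t>(b)));
}

// The two widenings differ as soon as bit 7 of the byte is set.
static_assert(sign_extend(0x80) != zero_extend(0x80),
              "sign extension smears bit 7 into the upper bytes");

// Every byte value that can reach the movsx in the listing (0..44).
inline void check_guarded_range() {
    for (unsigned v = 0; v <= 44; ++v) {
        const std::uint8_t b = static_cast<std::uint8_t>(v);
        assert(zero_extend(b) == sign_extend(b));
    }
}
```

Since rax already holds the zero-extended byte at that point, a sign extension that produces the same bits leaves the register unchanged.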
Why does MSVC do it? I'm assuming it's not for padding, since in that case another npad would make the meaning clearer. Is it flushing data dependencies or something?
(By the way, this bt optimization really makes me happy. Some interesting facts: it runs in 0.6x the time of the 4 cmp/je pairs you might expect, it's way faster than strspn or std::string::find_first_not_of, and it only happens in 64-bit builds even if the characters of interest have values under 32.)
Recommended answer
You surely recognize that this optimization was produced by very specific code in the optimizer that looked for the pattern. Just the generation of the bit-mask gives it away. Yes, nice trick.
There are two basic codegen cases here. The first one is the more universal one, where (charmax - charmin <= 64) but charmax >= 64. The optimizer needs to generate different code from what you saw: it needs to subtract charmin. That version does not have the MOVSX instruction. You can see it by replacing *s == ' ' by *s == 'A'.
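As a rough illustration of that general case, the sketch below rebases the character codes by charmin before indexing the mask, using the set from the question with 'A' swapped in for ' '. This is only a C++ model of the idea, not the code MSVC generates; kCharMin, kCharMax, bit_for, kRebasedMask, in_set and skip_set are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kCharMin = '\n';  // 10, smallest code in the set
constexpr unsigned kCharMax = 'A';   // 65, largest code in the set
static_assert(kCharMax - kCharMin < 64, "rebased codes must fit in one 64-bit mask");

// Bit position of a character after subtracting kCharMin.
constexpr std::uint64_t bit_for(char ch) {
    return std::uint64_t{1} << (static_cast<unsigned char>(ch) - kCharMin);
}

constexpr std::uint64_t kRebasedMask =
    bit_for('A') | bit_for(',') | bit_for('\r') | bit_for('\n');

inline bool in_set(char ch) {
    // Unsigned subtraction: codes below kCharMin wrap around to huge values,
    // so one comparison rejects both too-small and too-large codes.
    const unsigned idx = static_cast<unsigned char>(ch) - kCharMin;
    return idx <= kCharMax - kCharMin && ((kRebasedMask >> idx) & 1u) != 0;
}

// The loop from the question, with 'A' in place of ' '.
inline const char* skip_set(const char* s) {
    assert(s != nullptr);
    while (in_set(*s)) {
        ++s;
    }
    return s;
}
```

When charmin happens to be 0 the subtraction disappears, which corresponds to the special case discussed next.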
Then there's the special case that you tested, where all the character codes to test happen to be < 64. The Microsoft programmer did deal with this in his code: he made sure not to generate a silly SUB EAX,0 instruction. But he overlooked that generating the MOVSX wasn't necessary, surely because he only checked for optimal code in the general case. It is probably a general function call in the optimizer's own code, so easy to overlook; note how the instruction changes to MOVZX when you compile with /J. It is also easily deemed necessary otherwise: there is no BT instruction that takes an 8-bit register as the 2nd operand, so the AL register load isn't enough by itself.
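The /J observation fits how C++ widens a plain char: /J makes char unsigned by default in MSVC, and the widening of a char to a 64-bit index is a sign or zero extension depending on that signedness. A small sketch of the language-level behavior follows; widen, expected_for_0x80 and check_widening are hypothetical names, and the sketch says nothing about which instruction any particular compiler picks.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Widen a plain char to 64 bits. Whether this is a sign extension or a zero
// extension depends on whether char is a signed type in this build.
inline std::int64_t widen(char c) {
    return static_cast<std::int64_t>(c);
}

// What widen() should return for the byte 0x80 under the current char signedness.
inline std::int64_t expected_for_0x80() {
    return std::is_signed<char>::value ? -128 : 128;
}

inline void check_widening() {
    // Bytes with bit 7 set are where the two conventions disagree.
    assert(widen(static_cast<char>(0x80)) == expected_for_0x80());
    // For codes below 128, such as ',' (44), signedness makes no difference.
    assert(widen(',') == 44);
}
```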
There could be a hypothetical post-optimizer optimizer that optimizes the optimized code generated by the optimizer, and that decided to keep the MOVSX to improve superscalar execution. I seriously doubt it exists.