Why does MSVC emit a useless MOVSX before performing this Bit Test?


Problem description

Compiling the following code in MSVC 2013, 64-bit release build, /O2 optimization:

while (*s == ' ' || *s == ',' || *s == '\r' || *s == '\n') {
    ++s;
}

I get the following code - which has a really cool optimization using a 64-bit register as a lookup table with the bt (bit test) instruction.

    mov     rcx, 17596481020928             ; 0000100100002400H
    npad    5
$LL82@myFunc:
    movzx   eax, BYTE PTR [rsi]
    cmp     al, 44                          ; 0000002cH
    ja      SHORT $LN81@myFunc
    movsx   rax, al
    bt      rcx, rax
    jae     SHORT $LN81@myFunc
    inc     rsi
    jmp     SHORT $LL82@myFunc
$LN81@myFunc:
    ; code after loop...
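
For readers less used to assembly, the listing above can be rendered back into C roughly as follows. This is a minimal sketch, not compiler output: the names `make_mask` and `skip_separators` are hypothetical, and the per-line comments map each statement to the instruction it is assumed to correspond to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set one bit per character of `chars`.
   Every character value must be below 64 to fit in the mask. */
uint64_t make_mask(const char *chars) {
    uint64_t mask = 0;
    for (; *chars != '\0'; ++chars) {
        unsigned char c = (unsigned char)*chars;
        assert(c < 64);
        mask |= UINT64_C(1) << c;
    }
    return mask;
}

/* Hypothetical name: a C rendering of the loop in the listing. */
const char *skip_separators(const char *s) {
    const uint64_t mask = UINT64_C(17596481020928); /* 0000100100002400H */
    for (;;) {
        unsigned char c = (unsigned char)*s;   /* movzx eax, BYTE PTR [rsi] */
        if (c > 44) break;                     /* cmp al, 44 / ja          */
        if (((mask >> c) & 1) == 0) break;     /* bt rcx, rax / jae        */
        ++s;                                   /* inc rsi                  */
    }
    return s;
}
```

Here `make_mask(" ,\r\n")` sets bits 32, 44, 13 and 10, which is exactly the constant loaded into rcx. The terminating NUL stops the loop because bit 0 of the mask is clear.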

However, my question is: what is the purpose of the movsx rax, al after the first branch?

First we load a byte from the string into rax and zero-extend it:

movzx eax, BYTE PTR [rsi]

Then the cmp/ja pair performs an unsigned comparison between al and 44, and branches forwards if al is greater.

So now, we know 0 <= al <= 44 in unsigned numbers. Therefore, the highest bit of al could not possibly be set!

Nonetheless, the next instruction is movsx rax, al. This is a sign-extended move. But since:


  • al is the lowest byte of rax
  • we already know the other 7 bytes of rax are zeroed
  • we just proved that al's highest bit could not possibly be set

the movsx must be a no-op.
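
The argument above can be checked by brute force. The following is a small sketch with hypothetical function names; it assumes the usual two's complement narrowing when a value above 127 is converted to `int8_t`, which is implementation-defined in C but holds on common compilers.

```c
#include <stdint.h>

/* What movsx rax, al computes: widen the byte as a signed value. */
int64_t sign_extend8(uint8_t al) { return (int64_t)(int8_t)al; }

/* What a zero-extension of the same byte would compute. */
int64_t zero_extend8(uint8_t al) { return (int64_t)al; }

/* Returns 1 if both extensions agree for every byte value in [0, limit]. */
int extensions_agree_up_to(unsigned limit) {
    for (unsigned v = 0; v <= limit && v <= 255; ++v) {
        if (sign_extend8((uint8_t)v) != zero_extend8((uint8_t)v))
            return 0;
    }
    return 1;
}
```

Under that assumption, `extensions_agree_up_to(44)` holds, and the two extensions only start to differ at 128, where the top bit of the byte becomes set.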

Why does MSVC do it? I'm assuming it's not for padding, since in that case another npad would make the meaning clearer. Is it flushing data dependencies or something?

(By the way, this bt optimization really makes me happy. Some interesting facts: it runs in 0.6x the time of the 4 cmp/je pairs you might expect, it's way faster than strspn or std::string::find_first_not_of, and it only happens in 64-bit builds even if the characters of interest have values under 32.)

Recommended answer

You surely recognize that this optimization was produced by very specific code in the optimizer that looked for the pattern. Just the generation of the bit-mask gives it away. Yes, nice trick.

There are two basic codegen cases here. The first one is the more universal one, where (charmax - charmin < 64) but charmax >= 64. The optimizer needs to generate different code from what you saw: it needs to subtract charmin. That version does not have the MOVSX instruction. You can see it by replacing *s == ' ' by *s == 'A'.
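
To illustrate the general case, here is a sketch of a bit-mask lookup that subtracts charmin first, for a character set whose span fits in a 64-bit register. It shows the idea only, not what MSVC actually emits; the `CharSet` type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a character set whose span fits in 64 bits. */
typedef struct {
    unsigned char charmin;  /* smallest character in the set */
    unsigned char span;     /* charmax - charmin             */
    uint64_t mask;          /* bit i set <=> charmin + i is in the set */
} CharSet;

CharSet make_char_set(const char *chars) {
    unsigned char lo = 255, hi = 0;
    for (const char *p = chars; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (c < lo) lo = c;
        if (c > hi) hi = c;
    }
    assert(lo <= hi);      /* the set must not be empty     */
    assert(hi - lo < 64);  /* bit indices must stay in 0..63 */
    CharSet set = { lo, (unsigned char)(hi - lo), 0 };
    for (const char *p = chars; *p != '\0'; ++p)
        set.mask |= UINT64_C(1) << ((unsigned char)*p - lo);
    return set;
}

int char_set_contains(const CharSet *set, unsigned char c) {
    /* The subtraction wraps to a huge value when c < charmin, so a single
       unsigned comparison rejects characters on both sides of the range. */
    unsigned idx = (unsigned)c - set->charmin;
    if (idx > set->span) return 0;
    return (int)((set->mask >> idx) & 1);
}
```

For the set 'A', ',', '\r', '\n', charmin is 10 and charmax is 65, so the bit indices run from 0 to 55 even though 'A' itself is above 63.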

Then there's the special case that you tested, where all character codes to test happen to be < 64. The Microsoft programmer did deal with this in his code: he made sure not to generate a silly SUB EAX,0 instruction, but overlooked that generating the MOVSX wasn't necessary. It was surely missed by only checking for optimal code in the general case. The extension probably comes from a general function call in the code, so it is easy to overlook; note how the instruction changes to MOVZX when you compile with /J. Otherwise it is easily deemed necessary: there is no BT instruction that takes an 8-bit register as the 2nd operand, so the AL register load isn't enough by itself.
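
The /J observation follows from how plain char is widened. In the sketch below (hypothetical function names), the same widening yields a sign-extension when char is signed and a zero-extension when it is unsigned, which is what /J selects in MSVC. The conversion `(char)0xE9` is implementation-defined when char is signed; the usual two's complement result is assumed.

```c
#include <limits.h>

/* Hypothetical helper: widen a plain char to 64 bits, as the compiler
   must do before it can use the value as a bit index. */
long long widen_char(char c) { return (long long)c; }

/* 1 when plain char is signed in this build, 0 when it is unsigned. */
int char_is_signed(void) { return CHAR_MIN < 0; }
```

For any character below 128, such as ',' (44), both settings give the same result, which is why the choice of extension cannot matter after the cmp/ja check.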

There could be a hypothetical post-optimizer optimizer that optimizes the optimized code generated by the optimizer, and that decided to keep the MOVSX to improve superscalar execution. I seriously doubt it exists.
