为什么用字面量而不是反转(按位不是) [英] Why use xor with a literal instead of inversion (bitwise not)

查看:197
本文介绍了为什么用字面量而不是反转(按位不是)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到了这个CRC32代码,好奇为什么作者会选择使用

  crc = crc ^〜0U; 

而不是

  crc =〜crc; 

据我所知,它们是等价的。



我甚至在Visual Studio 2010中拆解了两个版本。



未优化的构建

  crc = crc ^〜0U; 
009D13F4 mov eax,dword ptr [crc]
009D13F7 xor eax,0FFFFFFFFh
009D13FA mov dword ptr [crc],eax

crc =〜crc;
011C13F4 mov eax,dword ptr [crc]
011C13F7 not eax
011C13F9 mov dword ptr [crc],eax

我也不能通过考虑每个指令占用的周期数来证明代码,因为两个指令都需要1个周期来完成。事实上, xor 可能会因为从某处加载文字而受到惩罚,但我不确定。





编辑1:这是一个非常简单的方法,



由于我刚刚意识到 crc 变量的类型可能很重要包括整个代码(减去查找表,方式太大)在这里,所以你不必跟随链接。

  uint32_t crc32(uint32_t crc,const void * buf,size_t size)
{
const uint8_t * p;

p = buf; $ b $ c crc = crc ^〜0U;

while(size--)
{
crc = crc32_tab [(crc ^ * p ++)& 0xFF] ^(crc>> 8);
}

return crc ^〜0U;
}

编辑2:



strong>优化版本:



请注意,整个函数(包含在最后一次编辑中)已内联。

  // crc = crc ^〜0U; 
zeroCrc = 0;
zeroCrc = crc32(zeroCrc,zeroBufferSmall,sizeof(zeroBufferSmall));
00971148 mov ecx,14h
0097114D lea edx,[ebp-40h]
00971150或eax,0FFFFFFFFh
00971153 movzx esi,byte ptr [edx]
00971156 xor esi,eax
00971158和esi,0FFh
0097115E shr eax,8
00971161 xor eax,dword ptr ___ defaultmatherr + 4(973018h)[esi * 4]
00971168 add edx, ebx
0097116A sub ecx,ebx
0097116C jne main + 153h(971153h)
0097116E not eax
00971170 mov ebx,eax

// crc = 〜crc;
zeroCrc = 0;
zeroCrc = crc32(zeroCrc,zeroBufferSmall,sizeof(zeroBufferSmall));
01251148 mov ecx,14h
0125114D lea edx,[ebp-40h]
01251150或eax,0FFFFFFFFh
01251153 movzx esi,byte ptr [edx]
01251156 xor esi,eax
01251158和esi,0FFh
0125115E shr eax,8
01251161 xor eax,dword ptr ___ defaultmatherr + 4(1253018h)[esi * 4]
01251168 add edx, ebx
0125116A sub ecx,ebx
0125116C jne main + 153h(1251153h)
0125116E not eax
01251170 mov ebx,eax


解决方案

没有人提到的东西;如果该代码是在具有16位 unsigned int 的机器上编译的,则这两个代码片段不同



crc 指定为32位无符号整数类型。 〜crc 将反转所有位,但如果 unsigned int 为16位,则 crc = crc ^ 〜0U 只会反转低16位。



我不知道CRC算法是否是有意的bug,也许hivert可以澄清;虽然看着OP发布的示例代码,但它确实对随后的循环产生了影响。



注意。很抱歉,发布这是一个答案,因为它不是一个答案,但它太大,不适合在评论:)


I have come across this CRC32 code and was curious why the author would choose to use

crc = crc ^ ~0U;

instead of

crc = ~crc;

As far as I can tell, they are equivalent.

I have even disassembled the two versions in Visual Studio 2010.

Not optimized build:

    crc = crc ^ ~0U;
009D13F4  mov         eax,dword ptr [crc]  
009D13F7  xor         eax,0FFFFFFFFh  
009D13FA  mov         dword ptr [crc],eax 

    crc = ~crc;
011C13F4  mov         eax,dword ptr [crc]  
011C13F7  not         eax  
011C13F9  mov         dword ptr [crc],eax  

I also cannot justify the code by thinking about the number of cycles that each instruction takes since both should be taking 1 cycle to complete. In fact, the xor might have a penalty by having to load the literal from somewhere, though I am not certain of this.

So I'm left thinking that it is possibly just a preferred way to describe the algorithm, rather than an optimization... Would that be correct?

Edit 1:

Since I just realized that the type of the crc variable is probably important to mention I am including the whole code (less the lookup table, way too big) here so you don't have to follow the link.

uint32_t crc32(uint32_t crc, const void *buf, size_t size)
{
    const uint8_t *p;

    p = buf;
    crc = crc ^ ~0U;

    while (size--)
    {
        crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    }

    return crc ^ ~0U;
}

Edit 2:

Since someone has brought up the fact that an optimized build would be of interest, I have made one and included it below.

Optimized build:

Do note that the whole function (included in the last edit below) was inlined.

// crc = crc ^ ~0U;
    zeroCrc = 0;
    zeroCrc = crc32(zeroCrc, zeroBufferSmall, sizeof(zeroBufferSmall));
00971148  mov         ecx,14h  
0097114D  lea         edx,[ebp-40h]  
00971150  or          eax,0FFFFFFFFh  
00971153  movzx       esi,byte ptr [edx]  
00971156  xor         esi,eax  
00971158  and         esi,0FFh  
0097115E  shr         eax,8  
00971161  xor         eax,dword ptr ___defaultmatherr+4 (973018h)[esi*4]  
00971168  add         edx,ebx  
0097116A  sub         ecx,ebx  
0097116C  jne         main+153h (971153h)  
0097116E  not         eax  
00971170  mov         ebx,eax  

// crc = ~crc;
    zeroCrc = 0;
    zeroCrc = crc32(zeroCrc, zeroBufferSmall, sizeof(zeroBufferSmall));
01251148  mov         ecx,14h  
0125114D  lea         edx,[ebp-40h]  
01251150  or          eax,0FFFFFFFFh  
01251153  movzx       esi,byte ptr [edx]  
01251156  xor         esi,eax  
01251158  and         esi,0FFh  
0125115E  shr         eax,8  
01251161  xor         eax,dword ptr ___defaultmatherr+4 (1253018h)[esi*4]  
01251168  add         edx,ebx  
0125116A  sub         ecx,ebx  
0125116C  jne         main+153h (1251153h)  
0125116E  not         eax  
01251170  mov         ebx,eax  

解决方案

Something nobody's mentioned yet; if this code is being compiled on a machine with 16 bit unsigned int then these two code snippets are different.

crc is specified as a 32-bit unsigned integral type. ~crc will invert all bits, but if unsigned int is 16bit then crc = crc ^ ~0U will only invert the lower 16 bits.

I don't know enough about the CRC algorithm to know whether this is intentional or a bug, perhaps hivert can clarify; although looking at the sample code posted by OP, it certainly does make a difference to the loop that follows.

NB. Sorry for posting this as an "answer" because it isn't an answer, but it's too big to just fit in a comment :)

这篇关于为什么用字面量而不是反转(按位不是)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆