Why is memcmp(a, b, 4) only sometimes optimized to a uint32 comparison?


Problem description

Given this code:

    #include <string.h>

    int equal4(const char* a, const char* b)
    {
        return memcmp(a, b, 4) == 0;
    }

    int less4(const char* a, const char* b)
    {
        return memcmp(a, b, 4) < 0;
    }

GCC 7 on x86_64 introduced an optimization for the first case (Clang has done it for a long time):

    mov     eax, DWORD PTR [rsi]
    cmp     DWORD PTR [rdi], eax
    sete    al
    movzx   eax, al

But the second case still calls memcmp():

    sub     rsp, 8
    mov     edx, 4
    call    memcmp
    add     rsp, 8
    shr     eax, 31
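The final shr eax, 31 is how the compiler turns memcmp's int result into the boolean < 0. Shifting the 32-bit value right by 31 leaves only the sign bit. The following is a minimal C sketch of the same trick. It assumes a 32-bit int result, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper mirroring "shr eax, 31": a logical right shift of
   the 32-bit pattern by 31 keeps only the sign bit, which is 1 exactly
   when the signed value is negative. */
int sign_bit_is_set(int32_t r)
{
    /* Converting to uint32_t is well defined (modulo 2^32), and shifting
       an unsigned type is a logical shift, so no sign extension occurs. */
    return (int)((uint32_t)r >> 31);
}
```

So sign_bit_is_set(memcmp(a, b, 4)) computes the same value as memcmp(a, b, 4) < 0.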

Could a similar optimization be applied to the second case? What's the best assembly for this, and is there any clear reason why it isn't being done (by GCC or Clang)?

See it on Godbolt's Compiler Explorer: https://godbolt.org/g/jv8fcf

Solution

If you generate code for a little-endian platform, you cannot optimize a four-byte memcmp whose result is used for ordering (such as memcmp(a, b, 4) < 0) into a single plain DWORD comparison.

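When memcmp compares the bytes, it goes from the lowest address to the highest, regardless of the platform. The sign of a nonzero result is determined by the first pair of bytes that differ, with both bytes interpreted as unsigned char.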
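A byte-at-a-time reference version makes this ordering concrete. It is a sketch of the semantics the C standard specifies for memcmp, not how any particular libc implements it. The name bytewise_cmp is hypothetical.

```c
#include <string.h>

/* Hypothetical reference comparison: returns -1, 0, or 1 with the same
   sign as memcmp(a, b, n). Bytes are visited from the lowest address
   upward and read as unsigned char; the first differing pair decides. */
int bytewise_cmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;   /* all n bytes equal */
}
```

Consider comparing {0x01, 0xFF, 0xFF, 0xFF} with {0x02, 0x00, 0x00, 0x00}. The result is negative because the lowest-addressed byte (0x01 versus 0x02) decides, even though every later byte of the first block is larger.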

For memcmp to return zero, all four bytes must be identical, so the order in which they are compared does not matter. That is why the DWORD optimization is valid for == 0: the sign of the result is ignored.
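In C, the equality case corresponds to loading each 4-byte block as a uint32_t and comparing the two values once. The sketch below assumes a hypothetical function name. It uses memcpy for the loads so that it has no alignment or strict-aliasing problems. Compilers commonly turn such fixed-size memcpy calls into single 32-bit loads, but that is an optimization, not a guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical equivalent of equal4(): one 32-bit load per operand and
   one compare. */
int equal4_u32(const char *a, const char *b)
{
    uint32_t x, y;
    memcpy(&x, a, sizeof x);   /* native byte order, whatever it is */
    memcpy(&y, b, sizeof y);
    /* uint32_t has exactly 32 value bits and no padding, so x == y holds
       exactly when all four bytes match, independent of endianness. */
    return x == y;
}
```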

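However, when the sign of memcmp's result matters (a positive or negative return value), byte ordering matters too. Implementing the same comparison as a single 32-bit DWORD comparison therefore requires a specific endianness. The platform must be big-endian, so that the lowest-addressed byte is the most significant one, and the comparison must be unsigned. Otherwise, the result of the comparison would be incorrect.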
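The sketch below contrasts two approaches. The first is a naive native-order load, which gives the wrong answer on a little-endian target. The second builds each value in big-endian order, which matches memcmp on any platform. Both function names are hypothetical. On little-endian x86, building the big-endian value amounts to a load followed by a byte swap. GCC and Clang can often recognize the shift-and-or pattern and emit a 32-bit load plus bswap, but whether they do depends on the compiler version and flags.

```c
#include <stdint.h>
#include <string.h>

/* Naive version: native-order loads, then an unsigned compare. On a
   little-endian target the lowest-addressed byte becomes the LEAST
   significant one, so the wrong byte dominates and this can disagree
   with memcmp(a, b, 4) < 0. */
int less4_native(const char *a, const char *b)
{
    uint32_t x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return x < y;
}

/* Assemble a big-endian value: the lowest-addressed byte becomes the most
   significant one, the same priority memcmp gives it. Bytes are read as
   unsigned char, matching memcmp's rule. */
static uint32_t load_be32(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return ((uint32_t)u[0] << 24) | ((uint32_t)u[1] << 16) |
           ((uint32_t)u[2] << 8)  |  (uint32_t)u[3];
}

/* Endianness-independent equivalent of memcmp(a, b, 4) < 0: an unsigned
   compare of the big-endian values. */
int less4_be(const char *a, const char *b)
{
    return load_be32(a) < load_be32(b);
}
```

For the bytes {0x01, 0x00, 0x00, 0x02} and {0x02, 0x00, 0x00, 0x01}, memcmp reports the first block as smaller. On a little-endian machine, less4_native loads 0x02000001 and 0x01000002 and concludes the opposite.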
