Why is memcmp(a, b, 4) only sometimes optimized to a uint32 comparison?
Problem description
Given this code:
#include <string.h>
int equal4(const char* a, const char* b)
{
return memcmp(a, b, 4) == 0;
}
int less4(const char* a, const char* b)
{
return memcmp(a, b, 4) < 0;
}
GCC 7 on x86_64 introduced an optimization for the first case (Clang has done it for a long time):
mov eax, DWORD PTR [rsi]
cmp DWORD PTR [rdi], eax
sete al
movzx eax, al
But the second case still calls memcmp():
sub rsp, 8
mov edx, 4
call memcmp
add rsp, 8
shr eax, 31
Could a similar optimization be applied to the second case? What's the best assembly for this, and is there any clear reason why it isn't being done (by GCC or Clang)?
See it on Godbolt's Compiler Explorer: https://godbolt.org/g/jv8fcf
If you generate code for a little-endian platform, optimizing a four-byte memcmp ordering test (such as < 0) to a single DWORD comparison is invalid.
When memcmp compares individual bytes, it goes from low-addressed bytes to high-addressed bytes, regardless of the platform.
For memcmp to return zero, all four bytes must be identical, so the order in which they are compared does not matter. The DWORD optimization is therefore valid for the equality test, because the sign of the result is ignored.
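The equality case can be sketched in portable C as follows. This is only an illustration of the idea, not the compiler's actual transformation; load_u32 and equal4_word are hypothetical names introduced here, and memcpy is used for the load so that unaligned pointers and aliasing rules are not a concern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read four bytes into a uint32_t in native byte order.
   memcpy avoids alignment and strict-aliasing problems. */
static uint32_t load_u32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Same result as memcmp(a, b, 4) == 0 on any endianness:
   two words are equal exactly when all four bytes are equal. */
int equal4_word(const char *a, const char *b)
{
    return load_u32(a) == load_u32(b);
}
```

Because only equality is tested, it makes no difference which byte ends up in the most significant position of the loaded word.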
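However, when the buffers differ, the sign of memcmp's result is decided by the first differing byte, starting from the lowest address and treating each byte as an unsigned char, so byte ordering matters. Hence, implementing the same comparison with a single 32-bit DWORD comparison requires a specific endianness: the platform must be big-endian and the comparison unsigned, otherwise the result would be incorrect.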
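One way to get a word-sized ordering comparison that is still correct on a little-endian machine is to put the lowest-addressed byte in the most significant position before comparing. The sketch below assumes nothing beyond standard C; load_be32, load_native32, less4_word and less4_native are hypothetical names used only for illustration. Optimizing compilers commonly turn the shift-and-or pattern into a plain load plus a byte swap on x86_64, but that is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble four bytes as a big-endian value, so the
   byte at the lowest address becomes the most significant byte. */
static uint32_t load_be32(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return ((uint32_t)u[0] << 24) | ((uint32_t)u[1] << 16) |
           ((uint32_t)u[2] << 8)  |  (uint32_t)u[3];
}

/* Same result as memcmp(a, b, 4) < 0 on any endianness:
   an unsigned compare of the big-endian words orders the buffers
   by their first differing byte. */
int less4_word(const char *a, const char *b)
{
    return load_be32(a) < load_be32(b);
}

/* For contrast only: a native-order load. On a little-endian platform
   the LAST byte is the most significant, so this ordering can disagree
   with memcmp. */
static uint32_t load_native32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

int less4_native(const char *a, const char *b)
{
    return load_native32(a) < load_native32(b);
}
```

For the buffers {1, 0, 0, 0} and {0, 0, 0, 1}, memcmp reports the first as greater, and less4_word agrees by returning 0. On a little-endian machine less4_native would instead see the values 1 and 0x01000000 and return 1, which is exactly the mismatch described above.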