How to make GCC generate bswap instruction for big endian store without builtins?
Question
Update: This was fixed in GCC 8.1.
I'm working on a function that stores a 64-bit value into memory in big endian format. I was hoping that I could write portable C99 code that works on both little and big endian platforms and have modern x86 compilers generate a bswap instruction automatically without any builtins or intrinsics. So I started with the following function:
#include <stdint.h>

void
encode_bigend_u64(uint64_t value, void *vdest) {
    uint8_t *bytes = (uint8_t *)vdest;
    bytes[0] = value >> 56;
    bytes[1] = value >> 48;
    bytes[2] = value >> 40;
    bytes[3] = value >> 32;
    bytes[4] = value >> 24;
    bytes[5] = value >> 16;
    bytes[6] = value >> 8;
    bytes[7] = value;
}
This works fine for clang, which compiles this function to:
bswapq %rdi
movq %rdi, (%rsi)
retq
But GCC fails to detect the byte swap. I tried a couple of different approaches but they only made things worse. I know that GCC can detect byte swaps using bitwise-and, shift, and bitwise-or, but why doesn't it work when writing bytes?
I found the corresponding GCC bug.
Recommended answer
This seems to do the trick:
#include <stdint.h>
#include <string.h> /* memcpy */

void encode_bigend_u64(uint64_t value, void* dest)
{
    /* Reverse the byte order in a register; compilers recognize this pattern. */
    value =
        ((value & 0xFF00000000000000u) >> 56u) |
        ((value & 0x00FF000000000000u) >> 40u) |
        ((value & 0x0000FF0000000000u) >> 24u) |
        ((value & 0x000000FF00000000u) >>  8u) |
        ((value & 0x00000000FF000000u) <<  8u) |
        ((value & 0x0000000000FF0000u) << 24u) |
        ((value & 0x000000000000FF00u) << 40u) |
        ((value & 0x00000000000000FFu) << 56u);
    memcpy(dest, &value, sizeof(uint64_t));
}
clang with -O3:
encode_bigend_u64(unsigned long, void*):
bswapq %rdi
movq %rdi, (%rsi)
retq
clang with -O3 -march=native:
encode_bigend_u64(unsigned long, void*):
movbeq %rdi, (%rsi)
retq
gcc with -O3:
encode_bigend_u64(unsigned long, void*):
bswap %rdi
movq %rdi, (%rsi)
ret
gcc with -O3 -march=native:
encode_bigend_u64(unsigned long, void*):
movbe %rdi, (%rsi)
ret
Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/ (so I don't know exactly which processor is underneath for -march=native, but I strongly suspect a recent x86_64 processor).
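The mask-and-shift expression is easy to get wrong by one shift count, so it can help to check it in isolation. Below is a minimal sketch that factors the swap into a pure helper and asserts its result on known values. The names swap_bytes_u64 and check_swap_bytes_u64 are hypothetical and not part of the original answer; the checks only look at integer values, so they do not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the same mask-and-shift byte reversal as in the
   answer, written as a pure function so it can be tested on its own. */
uint64_t swap_bytes_u64(uint64_t value)
{
    return ((value & 0xFF00000000000000u) >> 56) |
           ((value & 0x00FF000000000000u) >> 40) |
           ((value & 0x0000FF0000000000u) >> 24) |
           ((value & 0x000000FF00000000u) >>  8) |
           ((value & 0x00000000FF000000u) <<  8) |
           ((value & 0x0000000000FF0000u) << 24) |
           ((value & 0x000000000000FF00u) << 40) |
           ((value & 0x00000000000000FFu) << 56);
}

/* Hypothetical self-check: compares integer values only, so it holds on
   both little endian and big endian hosts. */
void check_swap_bytes_u64(void)
{
    /* Each byte moves to the mirrored position. */
    assert(swap_bytes_u64(0x0102030405060708u) == 0x0807060504030201u);
    /* Swapping twice gives back the original value. */
    assert(swap_bytes_u64(swap_bytes_u64(0xDEADBEEFCAFEF00Du)) ==
           0xDEADBEEFCAFEF00Du);
}
```

Calling check_swap_bytes_u64() once at startup or from a unit test is enough; it says nothing about which instructions the compiler emits, which still has to be checked in the generated assembly.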
If you want a function which works for big endian architectures too, you can use the answers from here to detect the endianness of the system and add an if. Both the union and the pointer-cast versions work and are optimized by both gcc and clang, resulting in the exact same assembly (no branches). Full code on godbolt:
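int is_big_endian(void)
{
    /* Look at the first byte in memory of a known 32-bit pattern. */
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};
    return bint.c[0] == 1;
}

void encode_bigend_u64_union(uint64_t value, void* dest)
{
    if (!is_big_endian()) {
        //... (byte-swap value here; elided in the original)
    }
    memcpy(dest, &value, sizeof(uint64_t));
}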
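For reference, here is one way the pieces could fit together into a complete function. This is a sketch under the assumption that the elided part is the same mask-and-shift swap shown earlier in the answer; the names encode_bigend_u64_portable and check_encode_bigend_u64_portable are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime endianness check using a union, as in the answer. */
int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};
    return bint.c[0] == 1;
}

/* Hypothetical complete version: swap only on little endian hosts,
   then copy the eight bytes to the destination. */
void encode_bigend_u64_portable(uint64_t value, void *dest)
{
    if (!is_big_endian()) {
        /* Assumed to be the swap that the original elides with "//...". */
        value = ((value & 0xFF00000000000000u) >> 56) |
                ((value & 0x00FF000000000000u) >> 40) |
                ((value & 0x0000FF0000000000u) >> 24) |
                ((value & 0x000000FF00000000u) >>  8) |
                ((value & 0x00000000FF000000u) <<  8) |
                ((value & 0x0000000000FF0000u) << 24) |
                ((value & 0x000000000000FF00u) << 40) |
                ((value & 0x00000000000000FFu) << 56);
    }
    memcpy(dest, &value, sizeof value);
}

/* Hypothetical self-check: the most significant byte must land first in
   memory regardless of the host's byte order. */
void check_encode_bigend_u64_portable(void)
{
    const uint8_t expected[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint8_t buf[8];
    encode_bigend_u64_portable(0x0102030405060708u, buf);
    assert(memcmp(buf, expected, sizeof buf) == 0);
}
```

Because is_big_endian() evaluates to a constant once inlined, an optimizing compiler can normally drop the branch, which matches the "no branches" observation above; that is worth confirming in the assembly for the specific compiler and flags in use.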
Intel® 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A):
MOVBE: Move Data After Swapping Bytes
Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]
The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.
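Since MOVBE covers reads from memory as well as writes, the mirror-image operation is a big endian load. The sketch below is not from the original post: decode_bigend_u64 and its check are hypothetical names, and the function assembles the value byte by byte so it is correct on any host. Whether a particular compiler collapses it into a single movbe or a load plus bswap depends on the compiler version and flags, and has not been verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counterpart to the encoder: read eight bytes stored in
   big endian order and return them as a host-order integer. */
uint64_t decode_bigend_u64(const void *vsrc)
{
    const uint8_t *bytes = (const uint8_t *)vsrc;
    return ((uint64_t)bytes[0] << 56) |
           ((uint64_t)bytes[1] << 48) |
           ((uint64_t)bytes[2] << 40) |
           ((uint64_t)bytes[3] << 32) |
           ((uint64_t)bytes[4] << 24) |
           ((uint64_t)bytes[5] << 16) |
           ((uint64_t)bytes[6] <<  8) |
            (uint64_t)bytes[7];
}

/* Hypothetical self-check: the first byte in memory is the most
   significant byte of the result. */
void check_decode_bigend_u64(void)
{
    const uint8_t src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(decode_bigend_u64(src) == 0x0102030405060708u);
}
```

The casts to uint64_t before shifting matter: without them each byte would be promoted to int, and shifting by 32 or more would be undefined behavior.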