How to make GCC generate a bswap instruction for a big-endian store without builtins?
Question
I'm working on a function that stores a 64-bit value into memory in big-endian format. I was hoping I could write portable C99 code that works on both little-endian and big-endian platforms and have modern x86 compilers generate a bswap instruction automatically, without any builtins or intrinsics. So I started with the following function:
#include <stdint.h>

void
encode_bigend_u64(uint64_t value, void *vdest) {
    uint64_t bigend;
    uint8_t *bytes = (uint8_t*)&bigend;

    bytes[0] = value >> 56;
    bytes[1] = value >> 48;
    bytes[2] = value >> 40;
    bytes[3] = value >> 32;
    bytes[4] = value >> 24;
    bytes[5] = value >> 16;
    bytes[6] = value >> 8;
    bytes[7] = value;

    uint64_t *dest = (uint64_t*)vdest;
    *dest = bigend;
}
This works fine for clang, which compiles this function to:
bswapq %rdi
movq %rdi, (%rsi)
retq
But GCC fails to detect the byte swap (https://gcc.godbolt.org/#compilers:!((compiler:g492,options:'-O2',sourcez:MQSwdgxgNgrgJgUwAQB4DOAXO4MDoAWAfAFDEBuA9iHMQpBYgPoBGIA5nXIzAGwAsAChg5%2BjDEjIBDWAgA0EqnCQAqMokwBKJAG9iSfUmFgMo8aw5g4Abj0GjGABxiVzAJ4YEaJAF4kQnE4YyhoAZOacNgZIbh5oANoADAC6PhLSMMiEhEgArDyRBjGecQCMKb5SMkhZSHwOBfpF8QBM5WlVNXwJDdHuxQDMbZUZ1dn9zT1NcXxD6ZnZzXyTffE5sx3ZJfm2jStxPOsjNfU7vbFxAOyHCD32pirq4r7%2BxqbBap4YPcqPqeGWNgAvsQgA)),filterAsm:(commentOnly:!t,directives:!t,labels:!t),version:3). I tried a couple of different approaches, but they only made things worse. I know that GCC can detect byte swaps written with bitwise-and, shift, and bitwise-or, so why doesn't it work when writing bytes?
Recommended answer
This seems to do the trick:
void encode_bigend_u64(uint64_t value, void* dest)
{
    *(uint64_t*)dest =
        ((value & 0xFF00000000000000u) >> 56u) |
        ((value & 0x00FF000000000000u) >> 40u) |
        ((value & 0x0000FF0000000000u) >> 24u) |
        ((value & 0x000000FF00000000u) >>  8u) |
        ((value & 0x00000000FF000000u) <<  8u) |
        ((value & 0x0000000000FF0000u) << 24u) |
        ((value & 0x000000000000FF00u) << 40u) |
        ((value & 0x00000000000000FFu) << 56u);
}
clang with -O3
encode_bigend_u64(unsigned long, void*):
bswapq %rdi
movq %rdi, (%rsi)
retq
clang with -O3 -march=native
encode_bigend_u64(unsigned long, void*):
movbeq %rdi, (%rsi)
retq
gcc with -O3
encode_bigend_u64(unsigned long, void*):
bswap %rdi
movq %rdi, (%rsi)
ret
gcc with -O3 -march=native
encode_bigend_u64(unsigned long, void*):
movbe %rdi, (%rsi)
ret
Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/ (so I don't know exactly which processor is underneath for -march=native, but I strongly suspect a recent x86_64 processor).
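The function above stores through a cast pointer, which relies on dest being suitably aligned for a uint64_t. As a minimal sketch of one alternative, the same mask-and-shift swap can be paired with a fixed-size memcpy for the store. The names swap_u64 and encode_bigend_u64_memcpy are hypothetical (not from the answer), the code assumes a little-endian host, and the generated assembly has not been checked here, so treat the codegen as something to verify on godbolt.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 64-bit value using the same
   mask-and-shift pattern as the answer's function. */
static uint64_t swap_u64(uint64_t value)
{
    return ((value & 0xFF00000000000000u) >> 56u) |
           ((value & 0x00FF000000000000u) >> 40u) |
           ((value & 0x0000FF0000000000u) >> 24u) |
           ((value & 0x000000FF00000000u) >>  8u) |
           ((value & 0x00000000FF000000u) <<  8u) |
           ((value & 0x0000000000FF0000u) << 24u) |
           ((value & 0x000000000000FF00u) << 40u) |
           ((value & 0x00000000000000FFu) << 56u);
}

/* Hypothetical variant: store via memcpy instead of a pointer cast,
   so dest does not need to be 8-byte aligned.
   Assumption: the host is little-endian, so the swap is always needed. */
void encode_bigend_u64_memcpy(uint64_t value, void *dest)
{
    uint64_t swapped = swap_u64(value);
    memcpy(dest, &swapped, sizeof swapped);
}
```

Calling encode_bigend_u64_memcpy(0x0102030405060708u, buf) on a little-endian machine should leave the bytes 01 02 ... 08 in buf, in that order.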
If you want a function that also works on big-endian architectures, you can use the answers from here (http://stackoverflow.com/questions/1001307/detecting-endianness-programmatically-in-a-c-program) to detect the endianness of the system and add an if. Both the union and the pointer-cast versions work and are optimized by both gcc and clang, resulting in the exact same assembly (no branches). Full code on godbolt:
int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    return bint.c[0] == 1;
}
void encode_bigend_u64_union(uint64_t value, void* dest)
{
    if (!is_big_endian())
        //...
    else
        *(uint64_t*)dest = value;
}
or
void encode_bigend_u64_ptr_cast(uint64_t value, void* dest)
{
    const uint16_t endian_test = 1;

    if (*(uint8_t*)(&endian_test) == 1)
        //..
    else
        *(uint64_t*)dest = value;
}
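The two snippets above elide the little-endian branch. As a hedged sketch of how the pieces might fit together, here is one self-contained version that combines the union-based check with the mask-and-shift swap from earlier in the answer. The name encode_bigend_u64_portable is hypothetical, and filling the elided branch with the swap is an assumption about what was intended, not a quote from the linked code.

```c
#include <stdint.h>

/* Union-based endianness check, as in the text:
   returns 1 on a big-endian host, 0 on a little-endian one. */
static int is_big_endian(void)
{
    union {
        uint32_t i;
        unsigned char c[4];
    } bint = {0x01020304};

    return bint.c[0] == 1;
}

/* Hypothetical complete version: swap only when the host is
   little-endian, then store. Like the original, this assumes
   dest is suitably aligned for a uint64_t. */
void encode_bigend_u64_portable(uint64_t value, void *dest)
{
    if (!is_big_endian()) {
        value = ((value & 0xFF00000000000000u) >> 56u) |
                ((value & 0x00FF000000000000u) >> 40u) |
                ((value & 0x0000FF0000000000u) >> 24u) |
                ((value & 0x000000FF00000000u) >>  8u) |
                ((value & 0x00000000FF000000u) <<  8u) |
                ((value & 0x0000000000FF0000u) << 24u) |
                ((value & 0x000000000000FF00u) << 40u) |
                ((value & 0x00000000000000FFu) << 56u);
    }
    *(uint64_t *)dest = value;
}
```

Because the check is a compile-time constant for any given target, an optimizing compiler can be expected to fold it away, which matches the "no branches" observation above.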
Intel® 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A) (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf):
MOVBE: Move Data After Swapping Bytes
Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]
The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.
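Since MOVBE covers reads from memory as well as writes, the load direction can be sketched the same way. The function below is a hypothetical counterpart named decode_bigend_u64 (it does not appear in the question or the answer). It assembles the value from individual bytes, so it gives the same result on any host endianness. Whether a particular compiler turns it into movbe or a mov plus bswap depends on the version and flags, so that part is an assumption worth checking on godbolt.

```c
#include <stdint.h>

/* Hypothetical counterpart to the encode function: read a 64-bit
   big-endian value from src. Works regardless of host endianness
   and of the alignment of src, because it reads byte by byte. */
uint64_t decode_bigend_u64(const void *src)
{
    const unsigned char *bytes = (const unsigned char *)src;

    return ((uint64_t)bytes[0] << 56) |
           ((uint64_t)bytes[1] << 48) |
           ((uint64_t)bytes[2] << 40) |
           ((uint64_t)bytes[3] << 32) |
           ((uint64_t)bytes[4] << 24) |
           ((uint64_t)bytes[5] << 16) |
           ((uint64_t)bytes[6] <<  8) |
            (uint64_t)bytes[7];
}
```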