How to make GCC generate bswap instruction for big endian store without builtins?

Question

Update: This was fixed in GCC 8.1.

I'm working on a function that stores a 64-bit value into memory in big endian format. I was hoping that I could write portable C99 code that works on both little and big endian platforms and have modern x86 compilers generate a bswap instruction automatically without any builtins or intrinsics. So I started with the following function:

#include <stdint.h>

void
encode_bigend_u64(uint64_t value, void *vdest) {
    uint8_t *bytes = (uint8_t *)vdest;
    bytes[0] = value >> 56;
    bytes[1] = value >> 48;
    bytes[2] = value >> 40;
    bytes[3] = value >> 32;
    bytes[4] = value >> 24;
    bytes[5] = value >> 16;
    bytes[6] = value >> 8;
    bytes[7] = value;
}

This works fine for clang which compiles this function to:

bswapq  %rdi
movq    %rdi, (%rsi)
retq

But GCC fails to detect the byte swap. I tried a couple of different approaches but they only made things worse. I know that GCC can detect byte swaps using bitwise-and, shift, and bitwise-or, but why doesn't it work when writing bytes?

I found the corresponding GCC bug.

Recommended answer

This seems to do the trick:

#include <stdint.h>
#include <string.h>

void encode_bigend_u64(uint64_t value, void* dest)
{
  /* Reverse the byte order with masks and shifts, then store the result
     with memcpy. This version swaps unconditionally, so it assumes a
     little-endian host. */
  value =
      ((value & 0xFF00000000000000u) >> 56u) |
      ((value & 0x00FF000000000000u) >> 40u) |
      ((value & 0x0000FF0000000000u) >> 24u) |
      ((value & 0x000000FF00000000u) >>  8u) |
      ((value & 0x00000000FF000000u) <<  8u) |
      ((value & 0x0000000000FF0000u) << 24u) |
      ((value & 0x000000000000FF00u) << 40u) |
      ((value & 0x00000000000000FFu) << 56u);
  memcpy(dest, &value, sizeof(uint64_t));
}

clang with -O3

encode_bigend_u64(unsigned long, void*):
        bswapq  %rdi
        movq    %rdi, (%rsi)
        retq

clang with -O3 -march=native

encode_bigend_u64(unsigned long, void*):
        movbeq  %rdi, (%rsi)
        retq

gcc with -O3

encode_bigend_u64(unsigned long, void*):
        bswap   %rdi
        movq    %rdi, (%rsi)
        ret

gcc with -O3 -march=native

encode_bigend_u64(unsigned long, void*):
        movbe   %rdi, (%rsi)
        ret

Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/ (so I don't know exactly what processor is underneath (for the -march=native) but I strongly suspect a recent x86_64 processor)

If you want a function which works for big endian architectures too, you can use the answers from here to detect the endianness of the system and add an if. Both the union and the pointer cast versions work and are optimized by both gcc and clang, resulting in the exact same assembly (no branches). Full code on godbolt:

int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    return bint.c[0] == 1;
}

void encode_bigend_u64_union(uint64_t value, void* dest)
{
  if (!is_big_endian()) {
    //... (byte swap of value, as in encode_bigend_u64 above)
  }
  memcpy(dest, &value, sizeof(uint64_t));
}
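
The elided branch above can be filled in along the following lines. This is only a sketch of the idea described in the answer, not the author's full godbolt code: the names swap_bytes_u64 and encode_bigend_u64_portable are hypothetical, and the endianness probe uses memcpy instead of the union shown above. Whether a given compiler folds the check away and emits bswap or movbe depends on its version and flags.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 on a little-endian one.
   Same idea as the union probe above, written with memcpy. */
static int is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Hypothetical helper: reverse the byte order of a 64-bit value
   using the same mask-and-shift pattern as the answer. */
static uint64_t swap_bytes_u64(uint64_t v)
{
    return ((v & 0xFF00000000000000u) >> 56u) |
           ((v & 0x00FF000000000000u) >> 40u) |
           ((v & 0x0000FF0000000000u) >> 24u) |
           ((v & 0x000000FF00000000u) >>  8u) |
           ((v & 0x00000000FF000000u) <<  8u) |
           ((v & 0x0000000000FF0000u) << 24u) |
           ((v & 0x000000000000FF00u) << 40u) |
           ((v & 0x00000000000000FFu) << 56u);
}

/* Hypothetical name: store value at dest in big-endian byte order
   on either kind of host. */
void encode_bigend_u64_portable(uint64_t value, void *dest)
{
    if (!is_big_endian())
        value = swap_bytes_u64(value);  /* only needed on little-endian hosts */
    memcpy(dest, &value, sizeof value);
}
```

Calling encode_bigend_u64_portable(0x0102030405060708, buf) on an 8-byte buffer should leave the bytes 01 02 03 04 05 06 07 08 in memory order, regardless of the host's endianness.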

Intel® 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A):

MOVBE: Move Data After Swapping Bytes

Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]

The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.
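
Since MOVBE covers reads as well as writes, the same mask-and-shift pattern can be applied in the load direction. The sketch below is an illustration, not part of the original answer: decode_bigend_u64 is a hypothetical name, and the answer only tested the store direction, so whether a particular compiler turns this into a single movbe load (or a load followed by bswap) is an assumption that depends on the compiler version and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counterpart to encode_bigend_u64: read a 64-bit
   big-endian value from src and return it in host byte order. */
uint64_t decode_bigend_u64(const void *src)
{
    uint64_t value;
    memcpy(&value, src, sizeof value);

    /* Probe the host byte order: on a little-endian host the low byte
       of the probe is stored first. */
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);

    if (first == 1) {
        /* Little-endian host: reverse the bytes with masks and shifts. */
        value =
            ((value & 0xFF00000000000000u) >> 56u) |
            ((value & 0x00FF000000000000u) >> 40u) |
            ((value & 0x0000FF0000000000u) >> 24u) |
            ((value & 0x000000FF00000000u) >>  8u) |
            ((value & 0x00000000FF000000u) <<  8u) |
            ((value & 0x0000000000FF0000u) << 24u) |
            ((value & 0x000000000000FF00u) << 40u) |
            ((value & 0x00000000000000FFu) << 56u);
    }
    return value;
}
```

Given a buffer holding the bytes 01 02 03 04 05 06 07 08, decode_bigend_u64 should return 0x0102030405060708 on both little-endian and big-endian hosts.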
