Why can't GCC generate an optimal operator== for a struct of two int32s?

Question

A colleague showed me code that I thought wouldn't be necessary, but sure enough, it was. I would expect most compilers to treat all of these equality tests as equivalent:

#include <cstdint>
#include <cstring>

struct Point {
    std::int32_t x, y;
};

[[nodiscard]]
bool naiveEqual(const Point &a, const Point &b) {
    return a.x == b.x && a.y == b.y;
}

[[nodiscard]]
bool optimizedEqual(const Point &a, const Point &b) {
    // Why can't the compiler produce the same assembly in naiveEqual as it does here?
    std::uint64_t ai, bi;
    static_assert(sizeof(Point) == sizeof(ai));
    std::memcpy(&ai, &a, sizeof(Point));
    std::memcpy(&bi, &b, sizeof(Point));
    return ai == bi;
}

[[nodiscard]]
bool optimizedEqual2(const Point &a, const Point &b) {
    return std::memcmp(&a, &b, sizeof(a)) == 0;
}


[[nodiscard]]
bool naiveEqual1(const Point &a, const Point &b) {
    // Let's try avoiding any jumps by using bitwise and:
    return (a.x == b.x) & (a.y == b.y);
}

But to my surprise, only the ones with memcpy or memcmp get turned into a single 64-bit compare by GCC. Why? (https://godbolt.org/z/aP1ocs)

Isn't it obvious to the optimizer that checking equality on two contiguous pairs of four bytes is the same as comparing all eight bytes?
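To make the premise of the question concrete, here is a minimal sketch (assuming a C++17 compiler) of when the two comparisons are actually interchangeable. Byte-wise equality matches member-wise == only if the struct has no padding and its members compare by plain bit equality. That holds for std::int32_t but not for float, where -0.0 == 0.0 and NaN != NaN. The names memberwiseEqual and bytewiseEqual are illustrative and not from the original code. The static_asserts encode those assumptions at compile time.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Point {
    std::int32_t x, y;
};

// Byte-wise comparison is only equivalent to member-wise == when every byte
// of the object is part of its value (no padding) and member == is plain
// bit equality (true for integers, not for float/double).
static_assert(sizeof(Point) == 2 * sizeof(std::int32_t), "no padding expected");
static_assert(std::has_unique_object_representations_v<Point>,
              "Point must have no padding bits for byte-wise equality to be valid");

// Hypothetical name: same logic as naiveEqual in the question.
[[nodiscard]] bool memberwiseEqual(const Point &a, const Point &b) {
    return a.x == b.x && a.y == b.y;
}

// Hypothetical name: same logic as optimizedEqual in the question.
[[nodiscard]] bool bytewiseEqual(const Point &a, const Point &b) {
    std::uint64_t ai, bi;
    static_assert(sizeof(Point) == sizeof(ai));
    std::memcpy(&ai, &a, sizeof ai);
    std::memcpy(&bi, &b, sizeof bi);
    return ai == bi;
}
```

Because both functions return the same result for every input, a compiler is free to emit the single 64-bit compare for either one. The question is only about why GCC doesn't do so for the member-wise form.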

An attempt to avoid booleanizing the two parts separately compiles somewhat more efficiently (one fewer instruction and no false dependency on EDX), but it still does two separate 32-bit operations.

bool bithackEqual(const Point &a, const Point &b) {
    // a^b == 0 only if they're equal
    return ((a.x ^ b.x) | (a.y ^ b.y)) == 0;
}

GCC and Clang both have the same missed optimizations when passing the structs by value (so a is in RDI and b is in RSI because that's how x86-64 System V's calling convention packs structs into registers): https://godbolt.org/z/v88a6s. The memcpy / memcmp versions both compile to cmp rdi, rsi / sete al, but the others do separate 32-bit operations.
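Here is a minimal sketch of the by-value case just described. It assumes the x86-64 System V ABI, which passes an 8-byte struct of two int32 values in a single integer register. The function names are hypothetical. According to the question, the memcmp variant compiles to cmp rdi, rsi / sete al, while the member-wise variant does two 32-bit comparisons. The exact output depends on compiler version and flags, so it is worth checking on Compiler Explorer rather than relying on it.

```cpp
#include <cstdint>
#include <cstring>

struct Point {
    std::int32_t x, y;
};

// Passed by value: under x86-64 System V, a arrives in RDI and b in RSI,
// each holding both 32-bit fields packed into one 64-bit register.
[[nodiscard]] bool naiveEqualByValue(Point a, Point b) {
    return a.x == b.x && a.y == b.y;  // reported to stay as two 32-bit ops
}

[[nodiscard]] bool memcmpEqualByValue(Point a, Point b) {
    // Reported to become a single 64-bit register compare.
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```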

struct alignas(uint64_t) Point surprisingly still helps in the by-value case where arguments are in registers, optimizing both naiveEqual versions for GCC, but not the bithack XOR/OR. (https://godbolt.org/z/ofGa1f). Does this give us any hints about GCC's internals? Clang isn't helped by alignment.

Answer

If you "fix" the alignment, all of the versions give the same assembly output with GCC:

struct alignas(std::int64_t) Point {
    std::int32_t x, y;
};

Demo
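The following is a self-contained sketch of the fix described in this answer. The static_asserts document that alignas(std::int64_t) leaves the size at 8 bytes and raises the alignment from 4 to 8. The answer reports that GCC then emits the same code for every version. It does not explain why, and the result may vary between GCC releases. The question also notes that Clang is not helped by this change.

```cpp
#include <cstdint>

// Same two int32 members as before, but the whole struct is 8-byte aligned.
struct alignas(std::int64_t) Point {
    std::int32_t x, y;
};

static_assert(sizeof(Point) == 8, "alignment must not add padding here");
static_assert(alignof(Point) == 8, "alignas should raise alignment to 8");

// Unchanged member-wise comparison; only the type's alignment differs.
[[nodiscard]] bool naiveEqual(const Point &a, const Point &b) {
    return a.x == b.x && a.y == b.y;
}
```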

As a side note, memcpy is one of the correct/legal ways to do certain things (such as type punning), so it seems logical for compilers to have specific (or more aggressive) optimizations for code that uses that function.
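The following sketch illustrates the type-punning point, assuming C++17 as the baseline. punBits is a hypothetical helper. It uses std::bit_cast (C++20, header <bit>) when the standard library provides it, and falls back to the memcpy idiom otherwise. Both approaches are well-defined for trivially copyable types of the same size. Reading the object through a reinterpret_cast'd uint64_t pointer would violate strict aliasing.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#if __has_include(<bit>)
#include <bit>
#endif

struct Point {
    std::int32_t x, y;
};

// Hypothetical helper: reinterpret the bytes of `from` as a `To`.
template <class To, class From>
[[nodiscard]] To punBits(const From &from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable_v<To> &&
                  std::is_trivially_copyable_v<From>,
                  "punning is only defined for trivially copyable types");
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<To>(from);      // C++20 path
#else
    To to;                               // C++17 fallback: the memcpy idiom
    std::memcpy(&to, &from, sizeof to);
    return to;
#endif
}

// Hypothetical name: equality via one 64-bit compare of the punned bits.
[[nodiscard]] bool bitcastEqual(const Point &a, const Point &b) {
    return punBits<std::uint64_t>(a) == punBits<std::uint64_t>(b);
}
```

Whether the bit_cast path is optimized as well as the memcpy path is an assumption worth checking on Compiler Explorer.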
