Why can't GCC generate an optimal operator== for a struct of two int32s?

Posted: 2021/4/19 20:47:56. Tags: c++ gcc x86-64 compiler-optimization micro-optimization

Question

A colleague showed me code that I thought wouldn't be necessary, but sure enough, it was. I would expect most compilers to treat all of these equality tests as equivalent:

#include <cstdint>
#include <cstring>

struct Point {
    std::int32_t x, y;
};

[[nodiscard]]
bool naiveEqual(const Point &a, const Point &b) {
    return a.x == b.x && a.y == b.y;
}

[[nodiscard]]
bool optimizedEqual(const Point &a, const Point &b) {
    // Why can't the compiler produce the same assembly in naiveEqual as it does here?
    std::uint64_t ai, bi;
    static_assert(sizeof(Point) == sizeof(ai));
    std::memcpy(&ai, &a, sizeof(Point));
    std::memcpy(&bi, &b, sizeof(Point));
    return ai == bi;
}

[[nodiscard]]
bool optimizedEqual2(const Point &a, const Point &b) {
    return std::memcmp(&a, &b, sizeof(a)) == 0;
}


[[nodiscard]]
bool naiveEqual1(const Point &a, const Point &b) {
    // Let's try avoiding any jumps by using bitwise and:
    return (a.x == b.x) & (a.y == b.y);
}

But to my surprise, only the ones with memcpy or memcmp get turned into a single 64-bit compare by GCC. Why? (https://godbolt.org/z/aP1ocs)

Isn't it obvious to the optimizer that checking equality on two contiguous pairs of four bytes is the same as comparing all eight bytes?
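
Treating the two 4-byte comparisons as one 8-byte comparison is only valid because Point has no padding: all eight bytes belong to x or y. A minimal C++17 sketch of making that assumption explicit with std::has_unique_object_representations before relying on memcmp; the helpers bytewiseEqual and bytewiseEqualDemo and the PaddedPoint type are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Point {
    std::int32_t x, y;
};

// Hypothetical type for contrast: on typical targets x is followed by
// 3 padding bytes so that y is 4-byte aligned. Padding bytes have
// unspecified values, so two member-wise equal PaddedPoints may still
// differ byte-wise.
struct PaddedPoint {
    std::int8_t x;
    std::int32_t y;
};

static_assert(std::has_unique_object_representations_v<Point>,
              "every byte of Point is part of its value");
static_assert(!std::has_unique_object_representations_v<PaddedPoint>,
              "PaddedPoint contains padding bytes");

// Hypothetical helper: compares object representations, and refuses to
// compile for types where that could differ from member-wise equality.
template <class T>
[[nodiscard]] bool bytewiseEqual(const T &a, const T &b) {
    static_assert(std::has_unique_object_representations_v<T>,
                  "byte-wise comparison would read padding bytes");
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}

// Usage: for Point this agrees with a.x == b.x && a.y == b.y.
void bytewiseEqualDemo() {
    assert(bytewiseEqual(Point{1, 2}, Point{1, 2}));
    assert(!bytewiseEqual(Point{1, 2}, Point{1, 3}));
    assert(!bytewiseEqual(Point{-1, 0}, Point{0, -1}));
}
```

Calling bytewiseEqual on a PaddedPoint is rejected at compile time by the static_assert inside the template, which is the point of the design: the byte-wise shortcut is only offered where it is provably equivalent.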

An attempt to avoid separately booleanizing the two parts compiles somewhat more efficiently (one fewer instruction and no false dependency on EDX), but it still does two separate 32-bit operations:

bool bithackEqual(const Point &a, const Point &b) {
    // a^b == 0 only if they're equal
    return ((a.x ^ b.x) | (a.y ^ b.y)) == 0;
}
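
The same XOR/OR idea can be written as a single 64-bit operation by first packing both members into one integer. A minimal sketch, assuming a hypothetical helper packPoint that builds the 64-bit value with shifts rather than memcpy, so the result does not depend on the struct's memory layout or on endianness. Whether GCC recognizes this as one 64-bit load and compare is not something this sketch guarantees; it only shows that the transformation is semantically valid.

```cpp
#include <cstdint>

struct Point {
    std::int32_t x, y;
};

// Hypothetical helper: y goes in the high 32 bits, x in the low 32 bits.
// Converting int32_t to uint32_t is well defined (modulo 2^32), so
// negative values pack without undefined behavior.
[[nodiscard]] constexpr std::uint64_t packPoint(const Point &p) {
    return (std::uint64_t{static_cast<std::uint32_t>(p.y)} << 32) |
           static_cast<std::uint32_t>(p.x);
}

// packPoint is injective, so the 64-bit XOR is zero exactly when both
// 32-bit halves match, i.e. exactly when bithackEqual returns true.
[[nodiscard]] constexpr bool packedEqual(const Point &a, const Point &b) {
    return (packPoint(a) ^ packPoint(b)) == 0;
}

static_assert(packPoint(Point{1, 2}) == 0x0000000200000001ULL);
static_assert(packPoint(Point{-1, 0}) == 0x00000000FFFFFFFFULL);
static_assert(packedEqual(Point{5, -5}, Point{5, -5}));
static_assert(!packedEqual(Point{1, 2}, Point{2, 1}));
```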

GCC and Clang both have the same missed optimizations when passing the structs by value (so a is in RDI and b is in RSI because that's how x86-64 System V's calling convention packs structs into registers): https://godbolt.org/z/v88a6s. The memcpy / memcmp versions both compile to cmp rdi, rsi / sete al, but the others do separate 32-bit operations.
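
Under the x86-64 System V ABI, an 8-byte struct containing only integers is classified as INTEGER and passed in a single general-purpose register that holds the struct's bytes. On a little-endian target such as x86-64, x (offset 0) therefore lands in the low 32 bits and y in the high 32 bits. A sketch of the by-value forms, assuming a little-endian target; registerImage, byValueEqual, naiveEqualByValue and registerImageDemo are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Point {
    std::int32_t x, y;
};

// Hypothetical helper: the 64-bit value whose bytes are the bytes of p.
// On x86-64 System V this is what RDI holds when p is the first argument.
[[nodiscard]] std::uint64_t registerImage(Point p) {
    static_assert(sizeof(Point) == sizeof(std::uint64_t));
    std::uint64_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// By-value memcpy comparison; the question reports that memcpy versions
// like this compile to cmp rdi, rsi / sete al with both GCC and Clang.
[[nodiscard]] bool byValueEqual(Point a, Point b) {
    return registerImage(a) == registerImage(b);
}

// By-value member-wise comparison: the form that still gets two 32-bit ops.
[[nodiscard]] bool naiveEqualByValue(Point a, Point b) {
    return a.x == b.x && a.y == b.y;
}

// Assumes a little-endian target: x occupies the low half of the register.
void registerImageDemo() {
    assert(registerImage(Point{1, 2}) == ((std::uint64_t{2} << 32) | 1u));
    assert(byValueEqual(Point{3, 4}, Point{3, 4}) ==
           naiveEqualByValue(Point{3, 4}, Point{3, 4}));
}
```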

Surprisingly, struct alignas(uint64_t) Point still helps in the by-value case where the arguments are in registers: it lets GCC optimize both naiveEqual versions, but not the bithack XOR/OR version (https://godbolt.org/z/ofGa1f). Does this give us any hints about GCC's internals? Clang isn't helped by alignment.
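
A sketch of the aligned variant described above. alignas(std::uint64_t) raises the struct's alignment to that of uint64_t without changing its size or member offsets, and an 8-byte all-integer struct is still passed in one register either way, which is why its effect on the by-value code is surprising. The type name AlignedPoint and the function naiveEqualAligned are hypothetical; the code-generation claims in the comments are the question's observations, not something this snippet checks.

```cpp
#include <cstdint>

// Hypothetical name for the question's alignas(uint64_t) Point.
struct alignas(std::uint64_t) AlignedPoint {
    std::int32_t x, y;
};

// Alignment goes up to that of uint64_t; the size stays 8 bytes.
static_assert(alignof(AlignedPoint) == alignof(std::uint64_t));
static_assert(sizeof(AlignedPoint) == 2 * sizeof(std::int32_t));

// The question's naiveEqual, by value, on the aligned type. The question
// reports GCC merging this into one 64-bit compare, while Clang does not.
[[nodiscard]] constexpr bool naiveEqualAligned(AlignedPoint a, AlignedPoint b) {
    return a.x == b.x && a.y == b.y;
}

static_assert(naiveEqualAligned(AlignedPoint{1, 2}, AlignedPoint{1, 2}));
static_assert(!naiveEqualAligned(AlignedPoint{1, 2}, AlignedPoint{1, 3}));
```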