std::min vs ternary gcc auto vectorization with #pragma GCC optimize ("O3")

Question

I know that "why is my compiler doing this" isn't the best type of question, but this one is really bizarre to me and I'm thoroughly confused.

I had thought that std::min() was the same as the handwritten ternary (with maybe some compile-time template stuff), and it seems to compile down to the same operation when used normally. However, when I try to get a "min and sum" loop to autovectorize, they don't seem to be the same, and I would love it if someone could help me figure out why. Here is a small example that reproduces the issue:

#pragma GCC target ("avx2")
#pragma GCC optimize ("O3")

#include <cstdio>
#include <cstdlib>
#include <algorithm>

#define N (1<<20)
char a[N], b[N];

int main() {
    for (int i=0; i<N; ++i) {
        a[i] = rand()%100;
        b[i] = rand()%100;
    }

    int ans = 0;
    #pragma GCC ivdep
    for (int i=0; i<N; ++i) {
        //ans += std::min(a[i], b[i]);
        ans += a[i]>b[i] ? a[i] : b[i];
    }
    printf("%d\n", ans);
}

I compile this on gcc 9.3.0 with the command g++ -o test test.cpp -ftree-vectorize -fopt-info-vec-missed -fopt-info-vec-optimized -funsafe-math-optimizations.

With the code exactly as shown above, the compiler reports this during compilation:

test.cpp:19:17: optimized: loop vectorized using 32 byte vectors

In contrast, if I comment out the ternary and uncomment the std::min line, I get this:

test.cpp:19:17: missed: couldn't vectorize loop
test.cpp:20:35: missed: statement clobbers memory: _9 = std::min<char> (_8, _7);

So std::min() seems to be doing something unusual that prevents gcc from understanding that it is just a min operation. Is this something caused by the standard? Or is it an implementation failure? Or is there some compile flag that would make this work?

Recommended answer

Summary: don't use #pragma GCC optimize. Use -O3 on the command line instead, and you'll get the behavior you expect.

GCC's documentation on #pragma GCC optimize says:

Each function that is defined after this point is treated as if it had been declared with one optimize(string) attribute for each string argument.

And the optimize attribute is documented as:

The optimize attribute is used to specify that a function is to be compiled with different optimization options than specified on the command line. [...] The optimize attribute should be used for debugging purposes only. It is not suitable in production code. [Emphasis added, thanks Peter Cordes for spotting the last part.]

So, don't use it.

In particular, it looks like specifying #pragma GCC optimize ("O3") at the top of your file is not actually equivalent to using -O3 on the command line. It turns out that the former doesn't result in std::min being inlined, so the compiler has to treat it as an opaque call and assume that it might modify global memory, such as your a and b arrays. This naturally inhibits vectorization, and it is what the "statement clobbers memory" message in your output is reporting.

A careful reading of the documentation for __attribute__((optimize)) makes it look like each of the functions main() and std::min() will be compiled as if with -O3. But that's not the same as compiling the two of them together with -O3, as only in the latter case would interprocedural optimizations like inlining be available.
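To make that per-function reading concrete, here is a rough sketch of what the pragma is documented to amount to, written with explicit attributes. The helper names smaller() and sum_of_smaller() are made up for illustration and do not come from the question.

```cpp
// Sketch only: per the GCC manual quoted above, "#pragma GCC optimize ("O3")"
// treats every later function as if it carried this attribute individually.
// smaller() and sum_of_smaller() are hypothetical names.

__attribute__((optimize("O3")))
int smaller(int x, int y) {
    // Same shape as std::min: return y only if it is strictly less than x.
    return y < x ? y : x;
}

__attribute__((optimize("O3")))
int sum_of_smaller(const int* a, const int* b, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        // Each function is optimized on its own; whether this call is
        // inlined is exactly what the answer says should not be assumed.
        total += smaller(a[i], b[i]);
    }
    return total;
}
```

The point of the sketch is only the shape: two functions, each tagged separately, with nothing that promises optimization across the call between them.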

Here is a very simple example on godbolt. With #pragma GCC optimize ("O3") the functions foo() and please_inline_me() are each optimized, but please_inline_me() does not get inlined. But with -O3 on the command line, it does.
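The godbolt snippet itself isn't reproduced here, so the following is a guess at its general shape. Only the names foo() and please_inline_me() come from the answer; the function bodies are assumptions.

```cpp
// Reconstruction sketch; the bodies are assumed, not copied from the
// original godbolt link.
#pragma GCC optimize ("O3")

// A function small enough that -O3 on the command line would
// normally inline it into its caller.
int please_inline_me(int x) {
    return x * 2 + 1;
}

int foo(int x) {
    // With only the pragma (and no -O flag on the command line), the answer
    // reports that these remain real calls instead of being inlined.
    return please_inline_me(x) + please_inline_me(x + 1);
}
```

One way to check is to compile to assembly (for example with g++ -S) and look for a call to please_inline_me inside foo. Based on the observation above, the call should remain with the pragma alone, and should disappear if you drop the pragma and pass -O3 on the command line.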

A guess would be that the optimize attribute, and by extension #pragma GCC optimize, causes the compiler to treat the function as if its definition were in a separate source file which was being compiled with the specified option. And indeed, if std::min() and main() were defined in separate source files, you could compile each one with -O3 but you wouldn't get inlining.

Arguably the GCC manual should document this more explicitly, though I guess if it's only meant for debugging, it might be fair to assume it's intended for experts who would be familiar with the distinction.

If you really do compile your example with -O3 on the command line, you get identical (vectorized) assembly for both versions, or at least I did. (That is after fixing the backwards comparison: your ternary code is computing max instead of min.)
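As a minimal sketch of that setup, the two reductions can be pulled out into functions, with optimization requested on the command line instead of through a pragma. The function names, the file name in the comment, and the use of -mavx2 in place of #pragma GCC target ("avx2") are my assumptions, not part of the original question.

```cpp
// Assumed build line (min_sum.cpp is a hypothetical file name):
//   g++ -O3 -mavx2 -fopt-info-vec-optimized -c min_sum.cpp
// There is deliberately no "#pragma GCC optimize" in this file.
#include <algorithm>
#include <cstddef>

// Sum of element-wise minima using std::min.
int sum_min_std(const char* a, const char* b, std::size_t n) {
    int ans = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ans += std::min(a[i], b[i]);
    }
    return ans;
}

// Same reduction with a hand-written ternary. Note the '<': the
// question's version used '>' and therefore computed a max.
int sum_min_ternary(const char* a, const char* b, std::size_t n) {
    int ans = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ans += a[i] < b[i] ? a[i] : b[i];
    }
    return ans;
}
```

Both functions should return the same sum for the same inputs, and the -fopt-info-vec-optimized output is where you would look to confirm that each loop was vectorized on your compiler version.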
