GCC seemingly misses simple optimization

Question

I am trying to introduce a generic function with the semantics of the ternary operator: E1 ? E2 : E3. I see that the compiler is able to eliminate the calculation of one of E2 or E3, depending on the E1 condition, for the ternary operator. However, GCC misses this optimization in the case of a ternary function call (even when E2/E3 have no side effects).

In the listing below, the function ternary is written to behave similarly to the ternary operator. However, GCC emits a potentially heavy call to function f, which it seems could be eliminated for some input values (exactly as is done for the ternary operator) because f is declared with the pure attribute. Please look at the Godbolt link for the assembly code generated by GCC.

Is this something that could be improved in GCC (room for optimization), or does the C++ standard explicitly prohibit this kind of optimization?

// Very heavy function
int f() __attribute__ ((pure));

inline int ternary(bool cond, int n1, int n2) {
    return cond ? n1 : n2;
}

int foo1(int i) {
    return i == 0 ? f() : 0;
}

int foo2(int i) {
    return ternary(i == 0, f(), 0);
}

Assembly listing with -O3 -std=c++11:

foo1(int):
  test edi, edi
  jne .L2
  jmp f()
.L2:
  xor eax, eax
  ret
foo2(int):
  push rbx
  mov ebx, edi
  call f()
  test ebx, ebx
  mov edx, 0
  pop rbx
  cmovne eax, edx
  ret

https://godbolt.org/z/HfpNzo

Recommended answer

"I see that compiler is able to eliminate calculation of one of E2 or E3 depending on E1 condition (as long as E2/E3 has no side effects) for the ternary operator."

The compiler doesn't eliminate it; it just never optimizes it into a cmov in the first place. The C++ abstract machine doesn't evaluate the unused side of the ternary operator.

int a, b;
void foo(int sel) {
    sel ? a++ : b++;
}

The a++ and b++ operands above each have a side effect, so the compiler cannot simply evaluate both and then select one result. The ternary operator can only optimize to an asm cmov if neither input has any side effects. Otherwise the two forms are not exactly equivalent.
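To make the difference in evaluation rules concrete, here is a minimal sketch that counts how often an operand runs under the built-in operator versus a wrapper function shaped like the question's ternary(). The names calls, counted, calls_with_operator, calls_with_function and check_evaluation_counts are hypothetical helpers invented for this illustration; they are not part of the original code.

```cpp
#include <cassert>

// Hypothetical counter, used only to observe how often the operand runs.
static int calls = 0;

// Stand-in for an expensive operand. The increment is an observable side
// effect, so this function is deliberately NOT pure.
static int counted() {
    ++calls;
    return 42;
}

// Same shape as the question's ternary(): both value arguments are
// evaluated by the caller before the function body runs.
inline int ternary(bool cond, int n1, int n2) {
    return cond ? n1 : n2;
}

// Number of times counted() runs when the built-in operator is used.
int calls_with_operator(bool cond) {
    calls = 0;
    int r = cond ? counted() : 0;
    (void)r;
    return calls;
}

// Number of times counted() runs when the wrapper function is used.
int calls_with_function(bool cond) {
    calls = 0;
    int r = ternary(cond, counted(), 0);
    (void)r;
    return calls;
}

// Self-check of the evaluation rules described above.
void check_evaluation_counts() {
    assert(calls_with_operator(true) == 1);   // selected operand runs once
    assert(calls_with_operator(false) == 0);  // unselected operand never runs
    assert(calls_with_function(true) == 1);   // argument is always evaluated
    assert(calls_with_function(false) == 1);  // ...even when its value is unused
}
```

Because counted() has a visible side effect, no conforming compiler may drop the call in calls_with_function; the question is only about what is allowed once the callee is declared pure or const.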

In the C++ abstract machine (i.e. the input to gcc's optimizer), your foo2 always calls f(), while your foo1 doesn't. It's no surprise that foo1 compiles the way it does.

For foo2 to compile that way, gcc would have to optimize away the call to f(). In the source, f() is always called to create an argument for ternary().
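If the goal is a function with the operator's evaluation semantics regardless of what the optimizer does, one possible approach (not something proposed in the original answer) is to pass callables instead of values, so that only the selected branch is ever invoked. The sketch below assumes C++11; lazy_ternary, heavy, heavy_calls, foo2_lazy and check_lazy are hypothetical names, and heavy() is a stand-in for the question's f().

```cpp
#include <cassert>

// Hypothetical helper: takes two callables and invokes only the selected one.
// Both callables are assumed to return the same type.
template <typename F1, typename F2>
inline auto lazy_ternary(bool cond, F1 on_true, F2 on_false)
    -> decltype(on_true()) {
    return cond ? on_true() : on_false();
}

// Stand-in for the question's heavy f(); the counter exists only so the
// sketch can show whether the call happened.
static int heavy_calls = 0;
static int heavy() {
    ++heavy_calls;
    return 7;
}

// Analogue of foo2 in which the heavy call is wrapped in a lambda.
int foo2_lazy(int i) {
    return lazy_ternary(i == 0,
                        [] { return heavy(); },
                        [] { return 0; });
}

// Self-check: heavy() runs only when the condition selects it.
void check_lazy() {
    heavy_calls = 0;
    assert(foo2_lazy(1) == 0);
    assert(heavy_calls == 0);
    assert(foo2_lazy(0) == 7);
    assert(heavy_calls == 1);
}
```

The trade-off is a noisier call site: every operand has to be wrapped in a lambda, which is the price for moving the evaluation inside the function.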

There is a missed optimization here, which you should report on GCC's bugzilla (use the missed-optimization keyword as a tag). https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc

A call to int f() __attribute__ ((pure)); should be able to be optimized away. A pure function can read globals, but must not have any side effects. (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html)

As @melpomene discovered in comments, int f() __attribute__ ((const)); does give you the optimization you're looking for. An __attribute__((const)) function can't even read globals, only its args. (Thus, with no args, it must always return a constant.)
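As a sketch of that variant, the question's code with the attribute swapped looks like the following. The definition of f() returning 42 is an assumption added here only so the example links and runs (the original only declares f), and noinline is added so the compiler still sees a real call rather than inlining the constant. check_foo is a hypothetical helper. Whether the call actually becomes conditional depends on the compiler and version, so it is worth confirming in the generated assembly.

```cpp
#include <cassert>

// const instead of pure: the result may depend only on the arguments.
// With no arguments, f() must therefore always return the same value.
int f() __attribute__((const, noinline));

// Assumed stand-in definition so the sketch is self-contained.
int f() { return 42; }

inline int ternary(bool cond, int n1, int n2) {
    return cond ? n1 : n2;
}

int foo1(int i) {
    return i == 0 ? f() : 0;
}

int foo2(int i) {
    return ternary(i == 0, f(), 0);
}

// Self-check: both forms must agree on the result, whatever the codegen.
void check_foo() {
    assert(foo1(0) == 42 && foo2(0) == 42);
    assert(foo1(1) == 0 && foo2(1) == 0);
}
```

Note that const is a promise to the compiler: applying it to a function that does read global state would be a bug in the program, not just a missed optimization.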

HVD points out that gcc doesn't have any cost info for f(). Even if it could have optimized away the call to the ((pure)) f() as well as to the ((const)) f(), maybe it didn't because it didn't know the call was more expensive than a conditional branch? Possibly compiling with profile-guided optimization would convince gcc to do something?

But given that it made the call to the ((const)) f() conditional in foo2, gcc may just not know that it can optimize away calls to ((pure)) functions? Maybe it can only CSE them (if no globals have been written), but not optimize them away entirely from a basic block? Or maybe the current optimizer just fails to take advantage. Like I said, it looks like a missed-optimization bug.
