Do C++ compilers optimize repeated function calls?


Question

Do compilers (generally or in particular) optimize repeated function calls?

For example, consider this case.

struct foo {
  member_type m;
  return_type f() const; // returns by value
};

The function definition is in one translation unit:

return_type foo::f() const {
  /* do some computation using the value of m */
  /* return by value */
}

The repeated function calls are in another unit:

foo bar;

some_other_function_a(bar.f());
some_other_function_b(bar.f());

Would the code in the second translation unit be converted to this?

foo bar;

const return_type _tmp_bar_f = bar.f();

some_other_function_a(_tmp_bar_f);
some_other_function_b(_tmp_bar_f);

The computation that f performs may be expensive, while the returned type may be something very small (think of a mathematical function returning a double). Do compilers do this? Are there cases where they do or don't? You can consider a generalized version of this question, not just member functions or functions with no arguments.

Clarification per @BaummitAugen's suggestion:

I'm more interested in the theoretical aspect of the question here, and not so much in whether one could rely on this to make real world code run faster. I'm particularly interested in GCC on x86_64 with Linux.

Answer

GCC absolutely optimizes across compilation units if you have Link Time Optimization (LTO) on and the optimization level is high enough; see here: https://gcc.gnu.org/wiki/LinkTimeOptimization

Apart from longer compilation time, there is really no reason not to enable both.

Additionally, you can always help the compiler along by marking the function with the appropriate attributes. You probably want to mark the function with the attribute const as follows:

struct foo {
  member_type m;
  return_type f() const __attribute__((const)); // returns by value
};

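Take a look at GCC's documentation here to see which attribute is appropriate: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

Note that const is the stricter of the two candidates. GCC's documentation advises against it for functions that examine data through pointer arguments, and a member function that reads m does so through this, so pure is likely the safer choice for f().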
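As a minimal sketch of how the two attributes differ, assuming GCC or Clang (the attribute syntax is a compiler extension): the names circle, area, square and repeated_calls below are hypothetical and not from the question. pure suits a function that has no side effects but may read memory, such as a member function reading its object; const suits a function that depends only on the values of its arguments. Whether the compiler actually merges the repeated calls still depends on the compiler version and optimization level.

```cpp
#include <cassert>

// GCC/Clang extension; expands to nothing elsewhere (assumption: the code
// stays correct without the attributes, it is only harder to optimize).
#if defined(__GNUC__)
#define ATTR_PURE __attribute__((pure))
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_PURE
#define ATTR_CONST
#endif

// Hypothetical type for illustration.
struct circle {
    double radius;
    // pure: no side effects, but may read memory (here, *this).
    double area() const ATTR_PURE;
};

double circle::area() const { return 3.141592653589793 * radius * radius; }

// const: no side effects and reads nothing except its by-value argument.
double square(double v) ATTR_CONST;
double square(double v) { return v * v; }

// Identical calls with no intervening writes: the compiler is allowed
// (not required) to evaluate each function once and reuse the result.
double repeated_calls(const circle& c) {
    double a = c.area() + c.area();
    double s = square(c.radius) + square(c.radius);
    return a + s;
}

// Small self-check: the attributes must never change the results.
void check_attributes() {
    circle c{2.0};
    assert(square(3.0) == 9.0);
    assert(c.area() == c.area());
}
```

Comparing the assembly of repeated_calls with and without the attributes, while keeping the definitions in a separate translation unit, is one way to see whether a given compiler takes advantage of them.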

In a more general sense, this is very easy for a compiler to detect; it actually performs transformations that are much less obvious. The reason Link Time Optimization matters, though, is that once GCC has generated actual machine code for a translation unit, it no longer really knows what is safe to do with calls into it. Your function could, for example, modify data (outside your class) or access a volatile variable.
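To illustrate why the compiler must rule out side effects before merging calls, here is a minimal sketch with a hypothetical function next_id that modifies data outside its arguments (all names below are illustrative). Collapsing its two calls into one would change the program's result, so a compiler that cannot see the body, and has no attribute to rely on, has to keep both calls.

```cpp
#include <cassert>

// Hypothetical state outside any class, modified by the function below.
static int g_counter = 0;

// Not a candidate for call merging: every call changes g_counter and
// returns a different value.
int next_id() { return ++g_counter; }

// Calls next_id() twice; the result depends on both calls happening.
int sum_of_two_ids() {
    int first = next_id();
    int second = next_id();
    return first + second;
}

// What an incorrect "merge the calls" rewrite would compute instead.
int sum_if_calls_were_merged() {
    int only = next_id();
    return only + only;
}

// Small self-check showing that the two versions disagree.
void check_side_effects() {
    g_counter = 0;
    assert(sum_of_two_ids() == 3);            // 1 + 2
    g_counter = 0;
    assert(sum_if_calls_were_merged() == 2);  // 1 + 1
}
```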

EDIT:

GCC most definitely can do this optimization. With this code and the flags -O3 -fno-inline:

C++ code:

#include <iostream>

int function(int c){
  for(int i = 0; i != c; ++i){
    c += i;
  }
  return c;
}

int main(){
  char c;
  ::std::cin >> c;
  return function(c) + function(c) + function(c) + function(c) + function(c);
}

Assembly Output:

4006a0: 48 83 ec 18             sub    rsp,0x18
4006a4: bf 80 0c 60 00          mov    edi,0x600c80
4006a9: 48 8d 74 24 0f          lea    rsi,[rsp+0xf]
4006ae: e8 ad ff ff ff          call   400660 <_ZStrsIcSt11char_traitsIcEERSt13basic_istreamIT_T0_ES6_RS3_@plt>
4006b3: 0f b6 7c 24 0f          movzx  edi,BYTE PTR [rsp+0xf]
4006b8: e8 13 01 00 00          call   4007d0 <_Z8functioni>
4006bd: 48 83 c4 18             add    rsp,0x18
4006c1: 8d 04 80                lea    eax,[rax+rax*4]
4006c4: c3                      ret    
4006c5: 66 66 2e 0f 1f 84 00    data32 nop WORD PTR cs:[rax+rax*1+0x0]
4006cc: 00 00 00 00 

It does, however, fail to do this when the function is in a separate compilation unit and the -flto option is not specified. Just to clarify, this line calls the function:

call   4007d0 <_Z8functioni>

And this line multiplies the result by 5 (adding together five copies):

lea    eax,[rax+rax*4]
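In source terms, the effect of that assembly is roughly the rewrite sketched below. This is an illustration only: triangle is a hypothetical side-effect-free function standing in for the one above, five_calls mirrors what was written, and one_call_times_five mirrors what the optimizer effectively emitted.

```cpp
#include <cassert>

// Hypothetical side-effect-free function: sum of 0..n-1 (small n only).
int triangle(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// As written in the source: five identical calls.
int five_calls(int n) {
    return triangle(n) + triangle(n) + triangle(n) + triangle(n) + triangle(n);
}

// As emitted: one call, then result + result * 4, which is what the
// lea instruction computes.
int one_call_times_five(int n) {
    int r = triangle(n);
    return r + r * 4;
}

// Small self-check that both forms agree for a range of inputs.
void check_equivalence() {
    for (int n = 0; n < 100; ++n) {
        assert(five_calls(n) == one_call_times_five(n));
    }
}
```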
