Why can't gcc inline function pointers that can be determined?


Problem description

The following program was compiled with gcc 4.6.2 on CentOS using -O3:

#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;

template <typename T>
class F {
public:
     typedef void (T::*Func)();

     F(Func f) : f_(f) {}

     void operator()(T& t) {
         (t.*f_)();
     }
private:
     Func f_;
};

struct X {
    X() : x_(0) {}

    void f(){
        ++x_;
    }

    int x_;
};

int main()
{
     const int N = 100000000;
     vector<X> xv(N);
     auto begin = clock();
     for_each (xv.begin(), xv.end(), F<X>(&X::f));
     auto end = clock();
     cout << end - begin << endl;
}

objdump -D shows that the generated code for the loop is:

  40097c:       e8 57 fe ff ff          callq  4007d8 <clock@plt>
  400981:       49 89 c5                mov    %rax,%r13
  400984:       0f 1f 40 00             nopl   0x0(%rax)
  400988:       48 89 ef                mov    %rbp,%rdi
  40098b:       48 83 c5 04             add    $0x4,%rbp
  40098f:       e8 8c ff ff ff          callq  400920 <_ZN1X1fEv>
  400994:       4c 39 e5                cmp    %r12,%rbp
  400997:       75 ef                   jne    400988 <main+0x48>
  400999:       e8 3a fe ff ff          callq  4007d8 <clock@plt>

Obviously gcc doesn't inline the function. Why isn't gcc capable of this optimization? Is there any compiler flag that can make gcc do the desired optimization?

Solution

Some good reading material on this is Scott Meyers' Effective C++ (Third Edition), Item 30: "Understand the ins and outs of inlining", where he claims that a call through a function pointer is never inlined. The third edition was published in 2005, and I have indeed been able to get gcc to inline a function call made through a compile-time-constant pointer starting with gcc 4.6, which came out in 2011. However, this was in C, and it is tricky. In one scenario, I had to declare the calling function __attribute__((flatten)) before it would inline the call (in that situation, I stored the function pointer as a member of a struct, whose pointer I then passed to an inline function that made the call through the pointer, and that call got inlined).
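The scenario described above can be sketched roughly as follows. This is not the original C code, only a minimal reconstruction under assumptions: the names Ops, add_one, run, and caller are hypothetical, and flatten is a GCC/Clang extension that requests inlining without guaranteeing it for every call.

```cpp
// Hypothetical names throughout; a sketch of the pattern, not the original code.
struct Ops {
    int (*apply)(int);   // function pointer stored as a struct member
};

static int add_one(int v) { return v + 1; }

// Inline helper that receives a pointer to the struct and calls through it.
static inline int run(const Ops* ops, int v) {
    return ops->apply(v);
}

// flatten asks the compiler to inline, where it can, the calls made
// inside this function, including the indirect one reached through run().
__attribute__((flatten)) int caller(int v) {
    static const Ops ops = { add_one };  // the pointer is a compile-time constant
    return run(&ops, v);
}
```

Whether the call to add_one actually disappears depends on the compiler version and flags, so the generated code should be checked with objdump -d, as in the question.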

So in short, no, this isn't a bug in gcc, but that doesn't mean that gcc (and/or other compilers) won't be able to inline this some day. But the real issue, I think, is that you don't understand what's really happening here. To get that understanding, you have to think more like an assembly programmer or a compiler writer.

You're passing an object of type F<X> and initializing it with a pointer to a member function of another class. You have not declared your F<X> instance constant, nor its Func f_ member, nor your void F::operator()(T& t) member function. At the C++ language level, the compiler has to treat all of them as non-constant. That doesn't mean it can't determine later, at the optimization stage, that your function pointer never changes, but you're making that incredibly hard. At least the object is a local. If your F<X> object had been a global and not declared static, the compiler could not treat it as constant at all.
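For illustration, here is a sketch of what declaring those three things constant could look like. ConstF and bump_all are hypothetical names introduced only for this example, and nothing here guarantees that gcc will inline the call. It only removes the language-level reasons to treat the pointer as mutable.

```cpp
#include <algorithm>
#include <vector>

struct X {
    X() : x_(0) {}
    void f() { ++x_; }
    int x_;
};

// Hypothetical const-qualified variant of F from the question.
template <typename T>
class ConstF {
public:
    typedef void (T::*Func)();

    explicit ConstF(Func f) : f_(f) {}

    // const: invoking the functor cannot modify f_
    void operator()(T& t) const { (t.*f_)(); }

private:
    Func const f_;   // const: the pointer is fixed after construction
};

// Calls X::f on every element through a const functor,
// then returns the sum of all x_ values.
int bump_all(std::vector<X>& xv) {
    const ConstF<X> call_f(&X::f);
    std::for_each(xv.begin(), xv.end(), call_f);
    int sum = 0;
    for (const X& x : xv) sum += x.x_;
    return sum;
}
```

Note that std::for_each takes the functor by value, so the compiler still works with a copy of call_f inside the algorithm.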

Hopefully, you're doing this as an exercise in inlining through function pointers and not as a real solution for indirection. When you want real performance out of C++, you use the power of types. Specifically, when I declare a template parameter as a member function pointer, it isn't just a constant, it's part of the type. I've never seen a case where this technique generates a function call.

#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;

template <typename T, void (T::*f_)()>
class F {
public:
     void operator()(T& t) {
         (t.*f_)();
     }
};

struct X {
    X() : x_(0) {}

    void f(){
        ++x_;
    }

    int x_;
};

int __attribute__((flatten)) main()
{
     const int N = 100000000;
     vector<X> xv(N);

     auto begin = clock();
     for_each (xv.begin(), xv.end(), F<X, &X::f>());
     auto end = clock();
     cout << end - begin << endl;

}
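To make the "part of the type" point concrete, the following sketch checks two properties at compile time. The second member function X::g and the helper apply_both are hypothetical additions for contrast and are not part of the original program.

```cpp
#include <type_traits>

struct X {
    X() : x_(0) {}
    void f() { ++x_; }
    void g() { x_ += 2; }   // hypothetical second member, added for contrast
    int x_;
};

template <typename T, void (T::*f_)()>
class F {
public:
    void operator()(T& t) { (t.*f_)(); }
};

// Different member functions produce different, unrelated types.
static_assert(!std::is_same<F<X, &X::f>, F<X, &X::g>>::value,
              "the member pointer is part of the type");

// The functor has no data member, so no pointer is stored at run time.
static_assert(std::is_empty<F<X, &X::f>>::value,
              "no data member holds the pointer");

// Hypothetical helper: applies both functors to one object and returns x_.
int apply_both(X& x) {
    F<X, &X::f> call_f;
    F<X, &X::g> call_g;
    call_f(x);   // adds 1
    call_g(x);   // adds 2
    return x.x_;
}
```

Because the target function is fixed by the type itself, the optimizer does not have to prove anything about a run-time value before it considers inlining the call.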
