Understanding the overhead of lambda functions in C++11

Problem description

This was already touched on in "Why C++ lambda is slower than ordinary function when called multiple times?" and "C++0x Lambda overhead", but I think my example is a bit different from the discussion in the former and contradicts the result in the latter.

While searching for a bottleneck in my code, I found a recursive template function that processes a variadic argument list with a given processor function, for example one that copies each value into a buffer.

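```cpp
#include <functional>

// Base case: no arguments left to process.
template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}

// Recursive case: process the first argument, then recurse on the rest.
template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}
```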
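To see the head-first order in which this recursion hands each argument to the processor, here is a minimal sketch that reuses the two overloads above. The `CollectInOrder` helper and its `std::vector<int>` sink are hypothetical stand-ins for the raw buffer.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Base case: no arguments left to process.
template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}

// Recursive case: process the first argument, then recurse on the rest.
template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

// Hypothetical helper: records the arguments in the order they are processed.
std::vector<int> CollectInOrder()
{
  std::vector<int> seen;
  ProcessArguments<int>([&seen](const int &v) { seen.push_back(v); }, 1, 2, 3);
  return seen;
}
```

`CollectInOrder()` returns `{1, 2, 3}`: the call for `1` runs before the recursion reaches `2` and `3`.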

I compared the runtime of a program that uses this code with a lambda function against one that uses a global function; both copy the arguments into a global buffer using a moving pointer:

```cpp
int buffer[10];
int main(int argc, char **argv)
{
  int *p = buffer;

  for (unsigned long int i = 0; i < 10E6; ++i)
  {
    p = buffer;
    ProcessArguments<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  }
}
```

Compiled with g++ 4.6 and -O3 and measured with the `time` tool, this takes more than 6 seconds on my machine, while this version, which uses a global function,

```cpp
int buffer[10];
int *p = buffer;

// Copies one value into the global buffer and advances the global pointer.
void CopyIntoBuffer(const int &value)
{
  *p++ = value;
}

int main(int argc, char **argv)
{
  for (unsigned long int i = 0; i < 10E6; ++i)
  {
    p = buffer; // reset the global write pointer used by CopyIntoBuffer
    ProcessArguments<int>(CopyIntoBuffer, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  }

  return 0;
}
```

takes about 1.4 seconds.

I do not understand what is going on behind the scenes that explains this overhead, and I am wondering whether I can change something so that I can use lambda functions without paying for it at runtime.

Recommended answer

The problem here is your use of std::function. You pass it by copy, so its contents are copied, and this happens again at every level of the recursion as you unwind the parameters.

Now, for a pointer to a function, the contents are, well, just the function pointer. For a lambda, the contents are at least a pointer to the function plus the reference you captured, which is twice as much to copy. In addition, because of std::function's type erasure, copying that data will most likely be slower, since the copy cannot be inlined.
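The claim that passing `std::function` by copy re-copies its contents at every level of the recursion can be checked directly. The sketch below assumes a hypothetical `CopyCounter` functor whose copy constructor increments a counter (moves are not counted) and runs it through the question's by-value signature with ten arguments. Copying a `std::function` copies its target, and each recursive call copies `process`, so at least ten copies are expected; the exact count may depend on the standard library.

```cpp
#include <cassert>
#include <functional>

// Hypothetical functor whose copy constructor counts copies; moves are not counted.
struct CopyCounter
{
  static int copies;
  CopyCounter() = default;
  CopyCounter(const CopyCounter &) { ++copies; }
  CopyCounter(CopyCounter &&) = default;
  void operator()(const int &) const {}
};
int CopyCounter::copies = 0;

// The question's signatures: std::function taken by value.
template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}

template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...); // copies the std::function, and with it the stored functor
}

// Hypothetical helper: counts the functor copies made while processing ten arguments.
int CopiesThroughStdFunctionByValue()
{
  CopyCounter::copies = 0;
  ProcessArguments<int>(CopyCounter{}, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  return CopyCounter::copies;
}
```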

There are several options here. The best is probably to pass the callable as a template parameter instead of as a std::function. The benefits are that the call is more likely to be inlined, no std::function type erasure takes place, and nothing is copied. Like this:

```cpp
template <typename TFunc>
void ProcessArguments(const TFunc& process)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(const TFunc& process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}
```
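A minimal usage sketch of these overloads, mirroring one iteration of the question's loop: the lambda captures the write pointer by reference and is passed straight through the recursion without being wrapped in a `std::function`. `FillWithLambda` is a hypothetical helper name.

```cpp
#include <cassert>

// Base case: nothing left to process.
template <typename TFunc>
void ProcessArguments(const TFunc &process)
{}

// Recursive case: the callable is passed on by reference, so it is never copied,
// and the compiler can see (and usually inline) the lambda's body.
template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(const TFunc &process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

int buffer[10];

// Hypothetical helper: fills the buffer once and returns one past the last written element.
int *FillWithLambda()
{
  int *p = buffer;
  ProcessArguments([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  return p;
}
```

After one call, `buffer` holds `1` through `10` and the returned pointer equals `buffer + 10`.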

Second, you could do the same but pass the process by copy. Now copying does happen, but it is still neatly inlined. Equally important, the body of process can also be inlined, especially for a lambda:

```cpp
template <typename TFunc>
void ProcessArguments(TFunc process)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(TFunc process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}
```
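To make "copying does happen" concrete, the sketch below uses a hypothetical copy-counting functor with the by-value overloads. Ten arguments give exactly ten copies, one per recursive call, because the named parameter `process` is an lvalue and must be copied into the next call; the initial temporary is moved or elided rather than copied. With a lambda that captures only `&p`, each such copy is typically just a pointer-sized copy that the compiler can inline.

```cpp
#include <cassert>

// Hypothetical functor whose copy constructor counts copies; moves are not counted.
struct CopyCounter
{
  static int copies;
  CopyCounter() = default;
  CopyCounter(const CopyCounter &) { ++copies; }
  CopyCounter(CopyCounter &&) = default;
  void operator()(const int &) const {}
};
int CopyCounter::copies = 0;

// Base case: nothing left to process.
template <typename TFunc>
void ProcessArguments(TFunc)
{}

// Recursive case: `process` is taken by value, so passing it on copies it.
template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(TFunc process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

// Hypothetical helper: counts the copies made while processing ten arguments.
int CopiesThroughTemplateByValue()
{
  CopyCounter::copies = 0;
  ProcessArguments(CopyCounter{}, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  return CopyCounter::copies;
}
```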

Third, you could pass the std::function<> by reference. This at least avoids the copies, but the calls will still not be inlined.
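A sketch of this third option, assuming a hypothetical functor that counts both copies and calls: the `std::function` is built once from the temporary functor (the standard specifies that the target is initialized by moving the functor in) and only references are passed down the recursion. The copy counter should therefore stay at zero, while all ten calls still go through the type-erased wrapper.

```cpp
#include <cassert>
#include <functional>

// Hypothetical functor that counts copy constructions and calls.
struct CountingFunctor
{
  static int copies;
  static int calls;
  CountingFunctor() = default;
  CountingFunctor(const CountingFunctor &) { ++copies; }
  CountingFunctor(CountingFunctor &&) = default;
  void operator()(const int &) const { ++calls; }
};
int CountingFunctor::copies = 0;
int CountingFunctor::calls = 0;

// Base case: nothing left to process.
template <typename T>
void ProcessArguments(const std::function<void(const T &)> &)
{}

// Recursive case: the std::function is passed on by const reference, so it is not copied.
template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(const std::function<void(const T &)> &process, const HEAD &head, const TAIL &... tail)
{
  process(head); // still an indirect call through the type-erased wrapper
  ProcessArguments(process, tail...);
}

// Hypothetical helper: resets the counters and processes ten arguments.
void RunThroughStdFunctionReference()
{
  CountingFunctor::copies = 0;
  CountingFunctor::calls = 0;
  ProcessArguments<int>(CountingFunctor{}, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
```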

Here are some performance results (using ideone's C++11 compiler). Note that, as expected, the inlined lambda body gives the best performance:

| Variant | Global function (s) | Lambda (s) |
|---|---|---|
| Original (`std::function` by copy) | 0.483035 | 1.94531 |
| Template, by copy | 0.094748 | 0.0264867 |
| Template, by reference | 0.0892594 | 0.0264201 |
| `std::function` by reference | 0.0891776 | 0.09 |
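The answer does not include its benchmark code. The sketch below is a hypothetical harness (the names `ProcessViaStdFunction`, `ProcessViaTemplate`, `TimeSeconds`, and the two `Time...` wrappers are assumptions) that times the original by-value `std::function` variant against the template-by-reference variant using `std::chrono::steady_clock`. It claims no particular numbers: results depend on the compiler, flags, and machine, and an optimizer may simplify a loop that rewrites the same constant values, so the generated code should be checked before trusting a measurement.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

int buffer[10];

// Original variant: std::function taken by value at every level.
template <typename T>
void ProcessViaStdFunction(std::function<void(const T &)>)
{}

template <typename T, typename HEAD, typename ... TAIL>
void ProcessViaStdFunction(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessViaStdFunction(process, tail...);
}

// Template variant: callable taken by const reference.
template <typename TFunc>
void ProcessViaTemplate(const TFunc &)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessViaTemplate(const TFunc &process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessViaTemplate(process, tail...);
}

// Runs `body` `iterations` times and returns the elapsed wall-clock time in seconds.
template <typename Body>
double TimeSeconds(Body body, unsigned long iterations)
{
  const auto start = std::chrono::steady_clock::now();
  for (unsigned long i = 0; i < iterations; ++i)
    body();
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

double TimeStdFunctionByValue(unsigned long iterations)
{
  return TimeSeconds([] {
    int *p = buffer;
    ProcessViaStdFunction<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  }, iterations);
}

double TimeTemplateByReference(unsigned long iterations)
{
  return TimeSeconds([] {
    int *p = buffer;
    ProcessViaTemplate([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  }, iterations);
}
```

Calling both wrappers with `10000000` iterations in an `-O3` build roughly mirrors the question's loop; comparing the two return values is the point, not their absolute size.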
