Why does wrapping a function into a lambda potentially make the program faster?

Views: 52 Published: 2021/5/28 20:07:58 Tags: c++ performance lambda stl inlining

Question

The title may be too general. I am benchmarking the following two statements on a large vector<unsigned> v:

sort(v.begin(), v.end(), l);

sort(v.begin(), v.end(), [](unsigned a, unsigned b) { return l(a, b); });

where l is defined as

bool l(unsigned a, unsigned b) { return a < b; }

The result surprises me: the second is as fast as sort(v.begin(), v.end()); or sort(v.begin(), v.end(), std::less<>());, while the first is significantly slower.

My question is why wrapping the function in a lambda speeds up the program.

Moreover, sort(v.begin(), v.end(), [](unsigned a, unsigned b) { return l(b, a); }); is just as fast.

Relevant code:

#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <functional>
#include <algorithm>

using std::cout;
using std::endl;
using std::vector;

bool l(unsigned a, unsigned b) { return a < b; }

int main(int argc, char** argv)
{
    auto random = std::default_random_engine();
    vector<unsigned> d;
    for (unsigned i = 0; i < 100000000; ++i)
        d.push_back(random());
    auto t0 = std::chrono::high_resolution_clock::now();
    std::sort(d.begin(), d.end());
    auto t1 = std::chrono::high_resolution_clock::now();
    cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << endl;


    d.clear();
    for (unsigned i = 0; i < 100000000; ++i)
        d.push_back(random());
    t0 = std::chrono::high_resolution_clock::now();
    std::sort(d.begin(), d.end(), l);
    t1 = std::chrono::high_resolution_clock::now();
    cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << endl;

    d.clear();
    for (unsigned i = 0; i < 100000000; ++i)
        d.push_back(random());
    t0 = std::chrono::high_resolution_clock::now();
    std::sort(d.begin(), d.end(), [](unsigned a, unsigned b) {return l(a, b); });
    t1 = std::chrono::high_resolution_clock::now();
    cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << endl;
    return 0;
}

Tested on both g++ and MSVC.

Update:

I found that the lambda version generates exactly the same assembly code as the default one (sort(v.begin(), v.end())), while the one using a function is different. But I do not know assembly and so cannot investigate further.

Recommended answer

sort is potentially a big function, so it is usually not inlined. Instead, each instantiation is compiled on its own, as a separate out-of-line function. Consider sort:

template <typename RanIt, typename Pred>
void sort(RanIt, RanIt, Pred)
{
}

If Pred is bool (*)(unsigned, unsigned), the comparator call inside sort cannot be inlined: a function pointer type does not uniquely identify a function. There is only one sort<It, bool (*)(unsigned, unsigned)>, and every call that passes a function pointer of that type goes through it, whichever function the pointer refers to. The user passes l, but inside sort that is just an ordinary run-time argument. The compiler therefore has nothing to inline there, unless it also inlines sort itself into the caller.
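The sharing described above can be sketched as follows. This is only an illustration under stated assumptions: g, call_pred, instantiation_calls and demo are hypothetical names that do not appear in the original post, call_pred stands in for sort, and the code assumes C++17. Each template instantiation gets its own static counter, so both calls landing on one counter shows that l and g go through the same instantiation.

```cpp
#include <cassert>
#include <type_traits>

bool l(unsigned a, unsigned b) { return a < b; }
// Hypothetical second comparator, added only for illustration.
bool g(unsigned a, unsigned b) { return a > b; }

// The type that both l and g decay to when passed by value.
using FnPtr = bool (*)(unsigned, unsigned);
static_assert(std::is_same_v<decltype(&l), FnPtr>);
static_assert(std::is_same_v<decltype(&g), FnPtr>);

// One counter per distinct Pred type, i.e. one per instantiation.
template <typename Pred>
int& instantiation_calls() {
    static int n = 0;
    return n;
}

// Hypothetical stand-in for sort: Pred is deduced from a by-value argument,
// the same way std::sort deduces its comparator type.
template <typename Pred>
bool call_pred(Pred p, unsigned a, unsigned b) {
    ++instantiation_calls<Pred>();
    return p(a, b);  // indirect call: p is only known at run time here
}

// Calls the stand-in once with each function and reports how many calls
// the shared FnPtr instantiation has counted so far.
int demo() {
    bool r1 = call_pred(l, 1u, 2u);  // 1 < 2
    bool r2 = call_pred(g, 2u, 1u);  // 2 > 1
    assert(r1 && r2);
    return instantiation_calls<FnPtr>();
}
```

Calling demo() once returns 2, because both calls were counted by the single call_pred<FnPtr> instantiation.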

If Pred is a lambda, inlining the function call is trivial: the lambda's type uniquely identifies a function. Every call to this instantiation of sort invokes the same (lambda) function, so the function-pointer problem does not arise. The lambda itself contains a direct call to l, which is also easy to inline. Therefore, the compiler inlines all the function calls and generates the same code as a sort with no predicate.
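A small sketch of why the lambda case differs, assuming C++17. The names wrap1, wrap2 and sort_with_wrapper are hypothetical and not from the original post. Even two textually identical lambdas have distinct closure types, so each one selects its own instantiation of sort, and inside that instantiation the call target is fixed by the type.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

bool l(unsigned a, unsigned b) { return a < b; }

// Two textually identical wrappers (hypothetical names).
auto wrap1 = [](unsigned a, unsigned b) { return l(a, b); };
auto wrap2 = [](unsigned a, unsigned b) { return l(a, b); };

// Each lambda expression has its own unique closure type...
static_assert(!std::is_same_v<decltype(wrap1), decltype(wrap2)>);
// ...and that type is not the shared function pointer type.
static_assert(!std::is_same_v<decltype(wrap1), bool (*)(unsigned, unsigned)>);

// Sorts a copy of v using the wrapper. Pred is the closure type here, so
// this uses a different instantiation of std::sort from the one selected
// by std::sort(first, last, l).
std::vector<unsigned> sort_with_wrapper(std::vector<unsigned> v) {
    std::sort(v.begin(), v.end(), wrap1);
    return v;
}
```

Whether the compiler actually inlines the calls still depends on the optimization level, so this shows only what the type system makes possible, not a guarantee about generated code.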

The case of a function object type (std::less<>) is similar: the behavior of calling a std::less<> is fully known when sort is compiled, so inlining is trivial.
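For comparison, here is a hand-written function object that plays the same role as the lambda wrapper. CallL, sort_with_functor and sort_with_less are hypothetical names chosen for this sketch. As with a lambda or std::less<>, the type alone determines which operator() runs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

bool l(unsigned a, unsigned b) { return a < b; }

// Hypothetical hand-written counterpart of the lambda wrapper: a class
// type whose operator() makes a direct call to l.
struct CallL {
    bool operator()(unsigned a, unsigned b) const { return l(a, b); }
};

// Sorts a copy of v with the hand-written function object.
std::vector<unsigned> sort_with_functor(std::vector<unsigned> v) {
    std::sort(v.begin(), v.end(), CallL{});
    return v;
}

// Sorts a copy of v with the standard transparent comparator.
std::vector<unsigned> sort_with_less(std::vector<unsigned> v) {
    std::sort(v.begin(), v.end(), std::less<>{});
    return v;
}
```

Both functions produce the same ordering as a plain std::sort(v.begin(), v.end()) on unsigned values.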
