Difference in performance: std::accumulate vs std::inner_product vs Loop


Question

Today I want to share something that baffled me when I tried to implement a simple operation: summing the squares of the elements of a container.

I found different ways to perform the same operation:

  1. Using std::inner_product.
  2. Implementing a predicate and using std::accumulate.
  3. Using a C-style loop.

I wanted to run some benchmarks using Quick Bench with all optimizations enabled.

First, I compared the two C++ alternatives on floating-point values. This is the code using std::accumulate:

const auto predicate = [](const double previous, const double current) {
    return previous + current * current;
};
const auto result = std::accumulate(input.cbegin(), input.cend(), 0, predicate);

I compared it against this code using std::inner_product:

const auto result = std::inner_product(input.cbegin(), input.cend(), input.cbegin(), 1);

After running the benchmark with all optimizations enabled, both algorithms seemed to reach the same performance. I wanted to go further and try the C implementation:

double result = 0;
for (auto i = 0; i < input.size(); ++i) {
  result += input[i] * input[i];
}

Surprisingly, the result was different. I was not expecting it and was sure something was wrong, so I checked the GCC implementation:

template<typename _InputIterator1, typename _InputIterator2, typename _Tp>
inline _Tp
inner_product(_InputIterator1 __first1, _InputIterator1 __last1,
      _InputIterator2 __first2, _Tp __init)
{
  // concept requirements
  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator1>)
  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator2>)
  __glibcxx_requires_valid_range(__first1, __last1);

  for (; __first1 != __last1; ++__first1, (void)++__first2)
    __init = __init + (*__first1 * *__first2);
  return __init;
}

I found that it does the same thing as the C implementation. After reviewing it, I discovered something odd (or at least something I did not expect to have such a significant impact): in every internal accumulation step, it converts from the iterator's value_type to the type of the initial value.

In my case, I was passing 0 or 1 as the initial value, so it was treated as an integer, and the compiler performed the conversion in every accumulation step. In my test cases the input array stores truncated floating-point values, so the result did not change.
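The effect is easier to see with values that are not whole numbers. The following is a minimal sketch, not the benchmarked code: the helper names (add_square, sum_squares_int_init, sum_squares_double_init, dot_self_int_init, check_truncation) are made up for illustration, and the input values are chosen so that every square is exactly representable as a double.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Same operation as the predicate in the question (hypothetical name).
inline double add_square(const double previous, const double current) {
    return previous + current * current;
}

// Init is the int literal 0, so the accumulator type T is int and every
// partial sum is converted back to int after each step.
inline auto sum_squares_int_init(const std::vector<double>& input) {
    return std::accumulate(input.cbegin(), input.cend(), 0, add_square);
}

// Init is the double literal 0.0, so the accumulator stays a double.
inline auto sum_squares_double_init(const std::vector<double>& input) {
    return std::accumulate(input.cbegin(), input.cend(), 0.0, add_square);
}

// std::inner_product has the same property: T comes from the init argument.
inline auto dot_self_int_init(const std::vector<double>& input) {
    return std::inner_product(input.cbegin(), input.cend(), input.cbegin(), 0);
}

inline void check_truncation() {
    const std::vector<double> input{0.5, 1.5, 2.5};  // squares: 0.25, 2.25, 6.25
    assert(sum_squares_int_init(input) == 8);        // 0.25 -> 0, 2.25 -> 2, 8.25 -> 8
    assert(sum_squares_double_init(input) == 8.75);  // no truncation
    assert(dot_self_int_init(input) == 8);           // truncated the same way
}
```

With whole-number inputs, as in the test cases described above, both variants return the same value, which is why the conversion only showed up in the timings.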

After changing the initial value to a double:

const auto result = std::accumulate(input.cbegin(), input.cend(), 0.0, predicate);

And:

const auto result = std::inner_product(input.cbegin(), input.cend(), input.cbegin(), 0.0);

I got the expected results.

Now, I understand that letting the initial value have a type independent of the iterator's underlying type may make the function more flexible and allow it to do more things. But:

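If I am accumulating elements of an array, I expect to get the same type as a result. The same goes for the inner product.

Should that be the default behavior?

Why did the standard decide to do it this way?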
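The behavior asked about here can be approximated with a thin wrapper. This is only a sketch under assumptions: sum_of_squares is a hypothetical name, not a standard function, and it requires forward iterators because the range is read through two iterators at once.

```cpp
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical wrapper: the initial value is a value-initialized object of
// the iterator's value_type, so the result has the element type.
template <typename ForwardIt>
auto sum_of_squares(ForwardIt first, ForwardIt last) {
    using value_type = typename std::iterator_traits<ForwardIt>::value_type;
    return std::inner_product(first, last, first, value_type{});
}
```

For a std::vector<double> this returns a double, so no per-step conversion takes place. The trade-off is that the caller can no longer choose a wider or different accumulator type.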

Recommended answer

If I am accumulating elements of an array, I am expecting to get the same type as a result.

Your expectation is wrong (though it is not quite clear what "the same type as a result" means), as you can clearly see from the std::accumulate documentation:

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
              BinaryOperation op );

The return type is exactly the type you use for the initial value. You get the same effect with the loop:

auto result = 0; // vs auto result = 0.0;
for (auto i = 0; i < input.size(); ++i) {
  result += input[i] * input[i];
}
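The type deduction can be checked at compile time. The sketch below assumes a std::vector<double> input; the alias Iter and the two loop helpers are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

using Iter = std::vector<double>::const_iterator;

// The result type of std::accumulate is the type of the init argument.
static_assert(std::is_same<decltype(std::accumulate(std::declval<Iter>(),
                                                    std::declval<Iter>(), 0)),
                           int>::value,
              "int init gives an int result");
static_assert(std::is_same<decltype(std::accumulate(std::declval<Iter>(),
                                                    std::declval<Iter>(), 0.0)),
                           double>::value,
              "double init gives a double result");

// Loop equivalent: auto deduces int from the literal 0.
inline auto loop_int_accumulator(const std::vector<double>& input) {
    auto result = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        result += input[i] * input[i];  // double product stored back into an int
    }
    return result;
}

// Loop equivalent: auto deduces double from the literal 0.0.
inline auto loop_double_accumulator(const std::vector<double>& input) {
    auto result = 0.0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        result += input[i] * input[i];
    }
    return result;
}
```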

Why did the standard decide to perform it in this way?

Because this way you decide which type is used for the aggregation. Note that std::accumulate can be used as a general left fold, and cases where T differs from std::iterator_traits<InputIt>::value_type are no less common (probably even more common) than cases where they match.
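Two illustrative folds where T intentionally differs from the element type are sketched below. The function names join_ints and sum_wide are hypothetical, and the examples assume a std::vector<int> input.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Left fold from int elements into a std::string accumulator.
inline std::string join_ints(const std::vector<int>& values) {
    return std::accumulate(values.cbegin(), values.cend(), std::string{},
                           [](std::string acc, const int value) {
                               if (!acc.empty()) {
                                   acc += ',';
                               }
                               acc += std::to_string(value);
                               return acc;
                           });
}

// Summing int elements into a long long accumulator, so the total may
// exceed the range of int.
inline long long sum_wide(const std::vector<int>& values) {
    return std::accumulate(values.cbegin(), values.cend(), 0LL);
}
```

Neither fold could be written if the accumulator type were forced to match the element type.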
