What is the optimal way to concatenate two vectors whilst transforming elements of one vector?


Problem description

Suppose I have

std::vector<T1> vec1 {/* filled with T1's */};
std::vector<T2> vec2 {/* filled with T2's */};

and some function T1 f(T2) which could of course be a lambda. What is the optimal way to concatenate vec1 and vec2 whilst applying f to each T2 in vec2?

The apparently obvious solution is std::transform, i.e.

vec1.reserve(vec1.size() + vec2.size());
std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), f);

but I say this is not optimal as std::back_inserter must make an unnecessary capacity check on each inserted element. What would be optimal is something like

vec1.insert(vec1.end(), vec2.begin(), vec2.end(), f);

which could get away with a single capacity check. Sadly this is not valid C++. Essentially this is the same reason why std::vector::insert is optimal for vector concatenation, see this question and the comments in this question for further discussion on this point.

So:

  1. Is std::transform the optimal method using the STL?
  2. If so, can we do better?
  3. Is there a good reason why the insert function described above was left out of the STL?
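One Boost-free alternative, which comes from neither the question nor the answer, is to grow vec1 once with resize() and let std::transform write straight into the new tail, so no per-element capacity check is involved. The sketch below is only an illustration: the name resize_then_transform is hypothetical, and it assumes T1 is default-constructible and that vec1 and vec2 are distinct objects. The trade-off is that the new elements are value-initialized before being overwritten.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends f(x) for every x in vec2 to vec1.
// Assumes T1 is default-constructible and vec1 is not the same object as vec2.
template <typename T1, typename T2, typename F>
void resize_then_transform(std::vector<T1>& vec1, const std::vector<T2>& vec2, F f)
{
    assert(static_cast<const void*>(&vec1) != static_cast<const void*>(&vec2));

    const std::size_t old_size = vec1.size();

    // One growth step; the new tail elements are value-initialized here.
    vec1.resize(old_size + vec2.size());

    // Overwrite the tail in place through plain iterators.
    std::transform(vec2.begin(), vec2.end(),
                   vec1.begin() + static_cast<std::ptrdiff_t>(old_size), f);
}
```

Whether the extra value-initialization pass costs more than the capacity checks it removes depends on T1 and on the compiler, so it would need to be measured.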

UPDATE

I've had a go at verifying if the multiple capacity checks do have any noticeable cost. To do this I basically just pass the id function (f(x) = x) to the std::transform and push_back methods discussed in the answers. The full code is:

#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <cstddef>
#include <chrono>
#include <numeric>
#include <random>

using std::size_t;

std::vector<int> generate_random_ints(size_t n)
{
    std::default_random_engine generator;
    auto seed1 = std::chrono::system_clock::now().time_since_epoch().count();
    generator.seed((unsigned) seed1);
    std::uniform_int_distribution<int> uniform {};
    std::vector<int> v(n);
    std::generate_n(v.begin(), n, [&] () { return uniform(generator); });
    return v;
}

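// Runs f num_tests times and returns the mean wall-clock time per run.
template <typename D=std::chrono::nanoseconds, typename F>
D benchmark(F f, unsigned num_tests)
{
    using clock = std::chrono::steady_clock;  // monotonic, unlike system_clock
    clock::duration total {0};
    for (unsigned i = 0; i < num_tests; ++i) {
        auto start = clock::now();
        f();
        auto end = clock::now();
        total += end - start;
    }
    // Convert once at the end so each run is not truncated to whole units of D.
    return std::chrono::duration_cast<D>(total / num_tests);
}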

template <typename T>
void std_insert(std::vector<T> vec1, const std::vector<T> &vec2)
{
    vec1.insert(vec1.end(), vec2.begin(), vec2.end());
}

template <typename T1, typename T2, typename UnaryOperation>
void push_back_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    vec1.reserve(vec1.size() + vec2.size());
    for (const auto& x : vec2) {
        vec1.push_back(op(x));
    }
}

template <typename T1, typename T2, typename UnaryOperation>
void transform_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    vec1.reserve(vec1.size() + vec2.size());
    std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), op);
}

int main(int argc, char **argv)
{
    unsigned num_tests {1000};
    size_t vec1_size {10000000};
    size_t vec2_size {10000000};

    auto vec1 = generate_random_ints(vec1_size);
    auto vec2 = generate_random_ints(vec2_size);

    auto f_std_insert = [&vec1, &vec2] () {
        std_insert(vec1, vec2);
    };
    auto f_push_back_id = [&vec1, &vec2] () {
        push_back_concat(vec1, vec2, [] (int i) { return i; });
    };
    auto f_transform_id = [&vec1, &vec2] () {
        transform_concat(vec1, vec2, [] (int i) { return i; });
    };

    auto std_insert_time   = benchmark<std::chrono::milliseconds>(f_std_insert, num_tests).count();
    auto push_back_id_time = benchmark<std::chrono::milliseconds>(f_push_back_id, num_tests).count();
    auto transform_id_time = benchmark<std::chrono::milliseconds>(f_transform_id, num_tests).count();

    std::cout << "std_insert: " << std_insert_time << "ms" << std::endl;
    std::cout << "push_back_id: " << push_back_id_time << "ms" << std::endl;
    std::cout << "transform_id: " << transform_id_time << "ms" << std::endl;

    return 0;
}

Compiled with:

g++ vector_insert_demo.cpp -std=c++11 -O3 -o vector_insert_demo

Output:

std_insert: 44ms
push_back_id: 61ms
transform_id: 61ms

The compiler will have inlined the lambda, so that cost can safely be discounted. Unless anyone else has a viable explanation for these results (or is willing to check the assembly), I think it's reasonable to conclude that the multiple capacity checks have a noticeable cost.

Solution

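UPDATE: The performance difference is due to the reserve() calls, which, in libstdc++ at least, set the capacity to exactly what you request instead of using the exponential growth factor. Because the test program below appends to the same vector repeatedly, Versions 2 and 3 end up reallocating on every call.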
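The effect of an exact-size reserve() on repeated appends can be observed by counting how often the capacity changes. The following is a rough sketch rather than part of the original test program; count_capacity_changes is a hypothetical helper, and the counts it reports depend on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `chunk` to a vector `rounds` times and returns
// how many of those rounds changed the vector's capacity (i.e. reallocated).
// If call_reserve is true, reserve(size + chunk.size()) is called each round.
inline std::size_t count_capacity_changes(const std::vector<int>& chunk,
                                          unsigned rounds,
                                          bool call_reserve)
{
    assert(!chunk.empty());

    std::vector<int> v(chunk);
    std::size_t changes = 0;
    for (unsigned i = 0; i < rounds; ++i) {
        const std::size_t before = v.capacity();
        if (call_reserve) {
            // May allocate exactly the requested amount, defeating geometric growth.
            v.reserve(v.size() + chunk.size());
        }
        v.insert(v.end(), chunk.begin(), chunk.end());
        if (v.capacity() != before) {
            ++changes;
        }
    }
    return changes;
}
```

Calling it once with call_reserve set to true and once with false, for the same chunk and number of rounds, gives two counts to compare. With libstdc++ one would expect the reserve variant to report a change in nearly every round and the plain insert variant to report only a handful.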


I did some timing tests, with interesting results. Using std::vector::insert along with boost::transform_iterator was the fastest way I found by a large margin:

Version 1:

void
  appendTransformed1(
    std::vector<int> &vec1,
    const std::vector<float> &vec2
  )
{
  auto v2begin = boost::make_transform_iterator(vec2.begin(),f);
  auto v2end   = boost::make_transform_iterator(vec2.end(),f);
  vec1.insert(vec1.end(),v2begin,v2end);
}
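For readers without Boost, the idea behind Version 1 can be approximated with a small hand-written iterator. This is a minimal sketch and not Boost's implementation: transforming_iterator and append_transformed are hypothetical names, the iterator keeps a non-owning pointer to the callable, and it is tagged as a forward iterator even though operator* returns by value, which is not strictly conforming to the C++11 ForwardIterator requirements although common implementations accept it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper type: applies *fn_ to each element of an underlying range.
template <typename It, typename F>
class transforming_iterator {
public:
    using iterator_category = std::forward_iterator_tag;  // assumption, see lead-in
    using value_type = typename std::decay<
        decltype(std::declval<const F&>()(*std::declval<const It&>()))>::type;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using pointer = const value_type*;
    using reference = value_type;  // returned by value

    transforming_iterator() : it_(), fn_(nullptr) {}
    transforming_iterator(It it, const F& fn) : it_(it), fn_(&fn) {}

    reference operator*() const
    {
        assert(fn_ != nullptr);
        return (*fn_)(*it_);
    }
    transforming_iterator& operator++() { ++it_; return *this; }
    transforming_iterator operator++(int)
    {
        transforming_iterator old(*this);
        ++it_;
        return old;
    }

    friend bool operator==(const transforming_iterator& a, const transforming_iterator& b)
    {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const transforming_iterator& a, const transforming_iterator& b)
    {
        return !(a == b);
    }

private:
    It it_;
    const F* fn_;  // non-owning; the callable must outlive the iterator
};

// Appends f(x) for every x in src to dst with a single range insert.
template <typename T1, typename T2, typename F>
void append_transformed(std::vector<T1>& dst, const std::vector<T2>& src, const F& f)
{
    using Iter = transforming_iterator<typename std::vector<T2>::const_iterator, F>;
    dst.insert(dst.end(), Iter(src.begin(), f), Iter(src.end(), f));
}
```

A call such as append_transformed(vec1, vec2, f) then plays the role of the vec1.insert(vec1.end(), vec2.begin(), vec2.end(), f) that the question wished for. The callable is taken by const reference, so it only needs to stay alive for the duration of the call.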

Version 2:

void
  appendTransformed2(
    std::vector<int> &vec1,
    const std::vector<float> &vec2
  )
{
  vec1.reserve(vec1.size()+vec2.size());
  for (auto x : vec2) {
    vec1.push_back(f(x));
  }
}

Version 3:

void
  appendTransformed3(
    std::vector<int> &vec1,
    const std::vector<float> &vec2
  )
{
  vec1.reserve(vec1.size()+vec2.size());
  std::transform(vec2.begin(),vec2.end(),std::inserter(vec1,vec1.end()),f);
}

Timing:

    Version 1: 0.59s
    Version 2: 8.22s
    Version 3: 8.42s
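Given the reserve() explanation at the top of this answer, one way to keep the explicit reserve() of Version 2 without giving up geometric growth is to never request less than double the current capacity. The sketch below is an assumption-laden variant, not one of the timed versions: the name append_transformed_geometric and the growth factor of 2 are chosen here for illustration, and no timing is claimed for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of Version 2: reserves room for the new elements,
// but always asks for at least twice the current capacity when it must grow.
template <typename T1, typename T2, typename F>
void append_transformed_geometric(std::vector<T1>& vec1,
                                  const std::vector<T2>& vec2,
                                  F f)
{
    const std::size_t needed = vec1.size() + vec2.size();
    if (needed > vec1.capacity()) {
        // The factor 2 is an assumption; any factor above 1 keeps growth geometric.
        vec1.reserve(std::max<std::size_t>(needed, 2 * vec1.capacity()));
    }
    for (const auto& x : vec2) {
        vec1.push_back(f(x));
    }
    assert(vec1.size() == needed);
}
```

Because each growth step at least doubles the capacity, repeated calls on the same vector reallocate only a logarithmic number of times, which is the behavior the plain reserve(size + n) call loses.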

main.cpp:

#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>   // atoi, exit, EXIT_FAILURE
#include <iterator>
#include <iostream>
#include <numeric>   // std::accumulate
#include <random>
#include <vector>
#include "appendtransformed.hpp"

using std::cerr;

template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
  auto distribution = std::uniform_int_distribution<int>(0,999);
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<int>();
  std::generate_n(std::inserter(vec,vec.end()),n,generator);
  return vec;
}

template <typename Engine>
static std::vector<float> randomFloats(Engine &engine,size_t n)
{
  auto distribution = std::uniform_real_distribution<float>(0,1000);
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<float>();
  std::generate_n(std::inserter(vec,vec.end()),n,generator);
  return vec;
}

static auto
  appendTransformedFunction(int version) ->
    void(*)(std::vector<int>&,const std::vector<float> &)
{
  switch (version) {
    case 1: return appendTransformed1;
    case 2: return appendTransformed2;
    case 3: return appendTransformed3;
    default:
      cerr << "Unknown version: " << version << "\n";
      exit(EXIT_FAILURE);
  }

  return nullptr;  // not reached: every case above returns or exits
}

int main(int argc,char **argv)
{
  if (argc!=2) {
    cerr << "Usage: appendtest (1|2|3)\n";
    exit(EXIT_FAILURE);
  }
  auto version = atoi(argv[1]);
  auto engine = std::default_random_engine();
  auto vec1_size = 1000000u;
  auto vec2_size = 1000000u;
  auto count = 100;
  auto vec1 = randomInts(engine,vec1_size);
  auto vec2 = randomFloats(engine,vec2_size);
  namespace chrono = std::chrono;
  using chrono::system_clock;
  auto appendTransformed = appendTransformedFunction(version);
  auto start_time = system_clock::now();
  for (auto i=0; i!=count; ++i) {
    appendTransformed(vec1,vec2);
  }
  auto end_time = system_clock::now();
  assert(vec1.size() == vec1_size+count*vec2_size);
  auto sum = std::accumulate(vec1.begin(),vec1.end(),0u);
  auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();

  cerr << "Using version " << version << ":\n";
  cerr << "  sum=" << sum << "\n";
  cerr << "  elapsed: " << elapsed_seconds << "s\n";
}

Compiler: g++ 4.9.1

Options: -std=c++11 -O2
