Passing too many arguments by reference could be inefficient?

Question

Disclaimer: I'm using Intel Compiler 2017; if you want to know why I'm doing this, go to the end of the question.

I have this code:

class A{
  vector<float> v;
  ...
  void foo();
  void bar();
};

void A::foo(){
  for(int i=0; i<bigNumber;i++){
    //something very expensive
    //call bar() many times per cycle;
  }
}

void A::bar(){
  //...
  v.push_back(/*something*/);
}

Now, let's suppose I want to parallelize foo() since it's very expensive. However, I can't simply use #pragma omp parallel for because of v.push_back().

To my knowledge, there are two alternatives here:

  1. We use #pragma omp critical
  2. We create a local version of v for each thread and then we join them at the end of the parallel section, more or less as explained here.

Solution 1. is often considered a bad solution because protecting against the race condition this way creates considerable overhead.
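
For reference, a minimal sketch of what solution 1. would look like (not from the original post; it simply wraps the shared push_back in a critical section):

void A::foo(){
  #pragma omp parallel for
  for(int i=0; i<bigNumber; i++){
    //something very expensive
    //call bar() many times per cycle;
  }
}

void A::bar(){
  //...
  #pragma omp critical
  {
    v.push_back(/*something*/);
  }
}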

However, solution 2. requires modifying bar() in this way:

class A{
  vector<float> v;
  ...
  void foo();
  void bar(std::vector<float> &local_v);
};

void A::foo(){
  #pragma omp parallel
  {
    std::vector<float> local_v;
    #pragma omp for
    for(int i=0; i<bigNumber;i++){
      //something very expensive
      //call bar(local_v) many times per cycle;
    }
    #pragma omp critical
    {
      v.insert(v.end(), local_v.begin(), local_v.end());
    }
  }
}

void A::bar(std::vector<float> &local_v){
  //...
  local_v.push_back(/*something*/); // push into the thread-local vector, not the shared v
}

So far so good. Now, let's suppose that there is not only v, but 10 vectors, say v1, v2, ..., v10, or anyway 10 shared variables. In addition, let's suppose that bar isn't called directly inside foo() but only after many nested calls: something like foo() calling foo1(std::vector<float> &v1, ..., std::vector<float> &v10), which calls foo2(std::vector<float> &v1, ..., std::vector<float> &v10), repeating this nesting many more times until the last one finally calls bar(std::vector<float> &v1, ..., std::vector<float> &v10).
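
To make that concrete, the chain of signatures would look roughly like this (a sketch based on the question's description; the names foo1/foo2 and the number of levels are illustrative):

void A::foo();                                                    // starts the expensive loop
void foo1(std::vector<float> &v1, ..., std::vector<float> &v10);  // forwards all 10 references
void foo2(std::vector<float> &v1, ..., std::vector<float> &v10);  // forwards them again
// ...many more nesting levels...
void bar (std::vector<float> &v1, ..., std::vector<float> &v10);  // finally does the push_back calls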

So, this looks like a maintainability nightmare (I would have to modify the headers and the call sites of all the nested functions)... But even more important: we agree that passing by reference is efficient, but it is still a pointer copy at every call. As you can see, a lot of pointers get copied here, many times over. Is it possible that all these copies add up to a real inefficiency?

Actually, what I care most about here is performance, so if you tell me "nah, it's fine because compilers are super intelligent and do some sorcery, so you can copy a trillion references with no drop in performance", then great, but I don't know whether such sorcery exists or not.

Why I'm doing this: I'm trying to parallelize this code. In particular, I'm rewriting the while here as a for that can be parallelized, but if you follow the code you'll find that the callback onAffineShapeFound from here gets called, and it modifies the state of the shared object keys. This happens for many other variables too, but this is the "deepest" case in this code.

Answer

In a direct comparison between A::bar() and A::bar(std::vector<float> &v), the difference is that the second version has to grow the stack by an additional 8 bytes over what the original version does. In terms of performance this is a pretty minimal effect: the stack pointer has to be adjusted whether or not the function takes arguments (so the only real difference is a single pointer copy, which might even be optimized away depending on the compiler). As for the actual work inside the function, constantly adding elements to a std::vector is a far more expensive operation, especially if the vector ever needs to be reallocated (which will probably happen frequently, depending on how big the vector needs to get), so those costs will far exceed the cost of the pointer copies.
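
As a rough, self-contained illustration of that point (my own sketch, not part of the original answer): each forwarded reference is just one pointer-sized value, while a push_back that triggers a reallocation has to move every element already stored.

#include <cstdio>
#include <vector>

// Forwarding a reference costs one machine word per argument per call.
void bar_by_ref(std::vector<float> &local_v){
  local_v.push_back(1.0f); // amortized O(1), but a reallocation copies the whole buffer
}

int main(){
  std::vector<float> v;
  std::printf("one forwarded reference ~ %zu bytes\n", sizeof(&v)); // typically 8 on a 64-bit ABI
  bar_by_ref(v);
  return 0;
}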

So, short version: Go nuts with the references.
