同步push_back和std :: thread [英] Synchronise push_back and std::thread

查看:133
本文介绍了同步push_back和std :: thread的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的代码

void build(std::vector<RKD <DivisionSpace> >& roots, ...) {
  try {
    // using a local lock_guard to lock mtx guarantees unlocking on destruction / exception:
    std::lock_guard<std::mutex> lck (mtx);
    roots.push_back(RKD<DivisionSpace>(...));
  }
  catch (const std::bad_alloc&) {
    std::cout << "[exception caught when constructing tree]\n";
    return;
  }
}

现在,实际工作应按顺序进行,而不是

Now, the actual work should be done serially, not in parallel.

RKD 的构造函数可以与 RKD的其他构造函数并行运行。但是,将对象推回到 std :: Vector 是关键部分,对吧?

The constructor of RKD can run in parallel with other constructors of RKD. However, pushing the objects back in std::Vector is a critical section, right?

我将要构建的对象是已知的。实际上,它将在[2,16]范围内。从理论上讲,它可以是任何正数。

The number of the objects I am going to be build is known. It is going to be something in the range [2, 16] in practise. In theory it could be any positive number.

此外,对于将它们插入容器中的顺序,我也不感兴趣。

Also, I am not interesting for the order they will be inserted in the container.

所以我可以做类似的事情:

So I could do something like:

RKD tree = RKD(...);
mutex_lock(...);
roots.push_back(tree);

但是这意味着要复制吗?

However this would imply copying, wouldn't it?

我应该怎么做才能使代码并行?

What should I do to make my code parallel?

我决定使用锁(而不是锁)。只是一个互斥锁),原因是答案。

I decided to use a lock (instead of just a mutex) because of this answer.

推荐答案

Tomasz Lewowski在他的评论中提出的建议,我已经扩展了,这一建议非常简单,并基于以下观察:A push_back std :: vector 可能需要重新分配后备存储并复制(或最好移动)元素。这是一个需要同步的关键部分。

The suggestion that Tomasz Lewowski has brought up in his comment and I have expanded upon is pretty simple and based upon the following observation: A push_back on a std::vector potentially needs to re-allocate the backing store and copy (or, preferably, move) the elements. This constitutes a critical section that needs to be synchronized.

在下面的示例中,假设我们要让矢量填充前12个素数,但我们不在乎关于他们的订购。 (我在这里只是对数字进行了硬编码,但假定它们是通过一些有意义的并行并行计算获得的。)在以下情况下存在危险的竞态条件。

For the next examples, assume we want to have a vector filled with the first 12 primes but we don't care about their ordering. (I have just hard-coded the numbers here but assume they are obtained via some expensive computation that makes sense to do in parallel.) There is a dangerous race condition in the following scenario.

std::vector<int> numbers {};  // an empty vector

// thread A             // thread B             // thread C

numbers.push_back( 2);  numbers.push_back(11);  numbers.push_back(23);
numbers.push_back( 3);  numbers.push_back(13);  numbers.push_back(27);
numbers.push_back( 5);  numbers.push_back(17);  numbers.push_back(29);
numbers.push_back( 7);  numbers.push_back(19);  numbers.push_back(31);

push_back 还有另一个问题。如果两个线程同时调用它,它们都将尝试以相同的索引构造对象,从而可能造成灾难性的后果。因此,在分叉线程之前,不能通过 reserve(n)解决问题。

There is also another problem with the push_back. If two threads call it simultaneously, they will both attempt to construct an object at the same index with potentially disastrous consequences. So the problem is not solved with a reserve(n) before forking the threads.

但是,因为您知道预先设置元素的数量,您只需将它们分配 std :: vector 内的特定位置即可。如果您不更改大小,则没有关键部分。因此,在以下情况下不会出现竞赛。

However, since you know the number of elements in advance, you can simply assign them to a specific location inside a std::vector without changing its size. If you don't change the size, there is no critical section. Therefore, there is no race in the following scenario.

std::vector<int> numbers(12);  // 12 elements initialized with 0

// thread A          // thread B          // thread C

numbers[ 0] =  2;    numbers[ 1] =  3;    numbers[ 2] =  5;
numbers[ 3] =  7;    numbers[ 4] = 11;    numbers[ 5] = 13;
numbers[ 6] = 17;    numbers[ 7] = 19;    numbers[ 8] = 23;
numbers[ 9] = 29;    numbers[10] = 31;    numbers[11] = 37;

当然,如果两个线程都尝试写入相同索引,比赛将再次出现。幸运的是,在实践中保护这一点并不困难。如果您的向量具有 n 个元素并且您具有 p 个线程,则线程 i 仅写入元素[ i n / p ,( i +1) n / p )。请注意,这比让线程 i 仅在 j mod p = i ,因为它导致更少的缓存失效。因此,上面示例中的访问模式不是最佳的,最好是这样。

Of course, if both threads attempt to write to the same index, the race will be there again. Fortunately, protecting against this is not difficult in practice. If your vector has n elements and you have p threads, thread i writes only to elements [i n / p, (i + 1) n / p). Note that this is preferable over having thread i write to elements at index j only if j mod p = i because it leads to fewer cache invalidations. So the access pattern in the above example is sub-optimal and had better been like this.

std::vector<int> numbers(12);  // 12 elements initialized with 0

// thread A          // thread B          // thread C

numbers[ 0] =  2;    numbers[ 4] = 11;    numbers[ 8] = 23;
numbers[ 1] =  3;    numbers[ 5] = 13;    numbers[ 9] = 29;
numbers[ 2] =  5;    numbers[ 6] = 17;    numbers[10] = 31;
numbers[ 3] =  7;    numbers[ 7] = 19;    numbers[11] = 37;

到目前为止很好。但是,如果您没有 std :: vector< int> 而又没有 std :: vector< Foo> ?如果 Foo 没有默认的构造函数,则

So far so good. But what if you don't have a std::vector<int> but a std::vector<Foo>? If Foo does not have a default constructor, then

std::vector<Foo> numbers(10);

将无效。即使有一个对象,创建许多昂贵的默认构造的对象只是在不获取值的情况下尽快对其进行重新分配也将是非常荒谬的。

will be invalid. And even if it has one, it would be outrageous to create many expensive default-constructed objects just to re-assign them soon without ever retrieving the value.

当然,大多数设计良好的类都应具有非常便宜的默认构造函数。例如, std :: string 默认情况下构造为不需要内存分配的空字符串。一个好的实现可以将默认构造字符串的成本降低到

Of course, most well-designed classes should have a very cheap default constructor. For example, a std::string is default constructed to an empty string that requires no memory allocation. A good implementation will reduce the cost of default-constructing a string to just

std::memset(this, 0, sizeof(std::string));

如果编译器足够聪明,可以弄清楚我们正在分配和初始化整个 std :: vector< std :: string>(n),也许可以通过单次调用

And if the compiler is smart enough to figure out that we are allocating and initializing an entire std::vector<std::string>(n), it might be able to optimize this further to a single call to

std::calloc(n, sizeof(std::string));

因此,如果有机会,您可以使 Foo 廉价地默认可构造和可分配,您就完成了。但是,如果发现这很困难,则可以通过将其移至堆来避免此问题。智能指针可以廉价地默认构造,因此

So if there is any chance you can make Foo be cheaply default-constructible and assignable, you are done. However, if this turns out to be difficult, you can avoid the problem by moving it to the heap. A smart pointer is cheaply default-constructible, so

std::vector<std::unique_ptr<Foo>> foos(n);

最终将减少为a

std::calloc(n, sizeof(std::unique_ptr<Foo>));

您无需对 Foo 做任何事情。当然,此便利是以为每个元素分配动态内存为代价的。

without you doing anything to Foo. Of course, this convenience comes at the price of a dynamic memory allocation for each element.

std::vector<std::unique_ptr<Foo>> foos(n);

// thread A                    // thread B                           // thread C

foos[0].reset(new Foo {...});  foos[n / 3 + 0].reset(new Foo {...});  foos[2 * n / 3 + 0].reset(new Foo {...});
foos[1].reset(new Foo {...});  foos[n / 3 + 1].reset(new Foo {...});  foos[2 * n / 3 + 1].reset(new Foo {...});
foos[2].reset(new Foo {...});  foos[n / 3 + 2].reset(new Foo {...});  foos[2 * n / 3 + 2].reset(new Foo {...});
...                            ...                                    ...

这可能不如您糟糕可能会想,因为虽然动态内存分配不是免费的,但是 sizeof std :: unique_ptr 很小,因此如果 sizeof(Foo)很大,您可以获得更紧凑的向量的好处,该向量迭代速度更快。当然,这完全取决于您打算如何使用数据。

This might not be as bad as you might think because while dynamic memory allocations are not free, the sizeof a std::unique_ptr is very small so if sizeof(Foo) is large, you get the bonus of a more compact vector that is faster to iterate. It all depends of course how you intend to use your data.

如果您事先不知道确切的元素数,或者担心会弄乱数据索引,还有另一种方法:让每个线程填充自己的向量,并在最后合并它们。继续素数示例,我们得到这个。

If you don't know the exact number of elements in advance or are afraid you'll mess up the indexing, there is yet another way to do it: Have each thread fill its own vector and merge them at the end. Continuing the primes example, we get this.

std::vector<int> numbersA {};  // private store for thread A
std::vector<int> numbersB {};  // private store for thread B
std::vector<int> numbersC {};  // private store for thread C

// thread A              // thread B              // thread C

numbersA.push_back( 2);  numbersB.push_back(11);  numbersC.push_back(23);
numbersA.push_back( 3);  numbersB.push_back(13);  numbersC.push_back(27);
numbersA.push_back( 5);  numbersB.push_back(17);  numbersC.push_back(29);
numbersA.push_back( 7);  numbersB.push_back(21);  numbersC.push_back(31);

// Back on the main thread after A, B and C are joined:

std::vector<int> numbers(
    numbersA.size() + numbersB.size() + numbersC.size());
auto pos = numbers.begin();
pos = std::move(numbersA.begin(), numbersA.end(), pos);
pos = std::move(numbersB.begin(), numbersB.end(), pos);
pos = std::move(numbersC.begin(), numbersC.end(), pos);
assert(pos == numbers.end());

// Now dispose of numbersA, numbersB and numbersC as soon as possible
// in order to release their no longer needed memory.

std :: move 用于上面的代码是算法库中的代码。)

(The std::move used in the above code is the one from the algorithms library.)

此方法是所有方法中最理想的内存访问方式,因为 numbersA numbersB numbersC 正在写入完全独立分配的内存。当然,我们必须进行附加的顺序工作以加入中间结果。请注意,效率在很大程度上取决于以下事实:与查找/创建元素相比,移动分配元素的成本可以忽略不计。至少如上所述,代码还假定您的类型具有便宜的默认构造函数。当然,如果您的类型不是这种情况,则可以再次使用智能指针。

This approach has the most desirable memory access pattern of all because numbersA, numbersB and numbersC are writing to completely independently allocated memory. Of course, we got to do the additional sequential work of joining the intermediate results. Note that the efficiency relies heavily on the fact that the cost of move-assigning an element is negligible compared to the cost of finding / creating it. At least as written above, the code also assumes that your type has a cheap default constructor. Of course, if this is not the case for your type, you can again use smart pointers.

我希望这可以为您提供足够的想法来优化您的问题。

I hope this provided you with enough ideas to optimize your problem.

如果您以前从未使用过智能指针,请查看 RAII和C ++中的智能指针 ,并查看标准库的动态内存管理库。上面显示的技术当然也可以与 std :: vector< Foo *> 一起使用,但是在现代C ++中,我们不再使用像这样的拥有原始指针的资源

If you have never uses smart pointers before, have a look at "RAII and smart pointers in C++" and check out the standard library's dynamic memory management library. The techniques shown above would of course also work with a std::vector<Foo *> but we don't use resource owning raw pointers like this in modern C++ any more.

这篇关于同步push_back和std :: thread的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆