Threadpool / Queueing system in C++


Problem description

I have a situation in which I need to do some heavy computation. I found that subdividing my data and then merging it back together is the fastest approach (computation time grows faster than the data size, so splitting is the logical choice).

I should be able to give the application a data size, for example one million double values.

What I have in place now sends data created from this size off to some function, gets it back after the computation, and then loops over the returned values to unload them into the main vector.

I want to send parts of 200 elements, plus one "last" part. For example, size = 1000005 will run this function 5000 times on full parts, and then once more on the last part with data of size 5.

int size = 1000000;
int times = size / 200; // 5000
int leftover = size % 200; // 0 here, so the leftover step is not performed

QVector<double> x(size);
QVector<double> y(size);

x = createData(size);
y = createData(size);

for (int i = 0; i < times; i++)
{
    QVector<double> tempx = x.mid(i*200, 200);
    QVector<double> tempy = y.mid(i*200, 200);
    // for now, myfunction just returns `tempy`
    QVector<double> holder = myfunction(tempx, tempy, 200);
    for (int j = 0; j < 200; j++)
    {
        y[i*200 + j] = holder[j];
    }
}
// leftover function here, really similar to this part before.

// plotting function here

At the end, x will remain as initialized, and y will hold the result of the computation.
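The "leftover" step described above does not need a second copy of the loop body: if each chunk's end index is clamped to the data size, the last chunk simply comes out shorter. The following is a minimal sketch under stated assumptions: it uses std::vector instead of QVector, and `process_chunk` and `process_in_chunks` are hypothetical names standing in for `myfunction` and the surrounding loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for the real computation (an assumption for this sketch):
// it receives one chunk of x and y and returns the new y values.
std::vector<double> process_chunk(const std::vector<double>& xs,
                                  const std::vector<double>& ys) {
    std::vector<double> out(ys.size());
    for (std::size_t k = 0; k < ys.size(); ++k) out[k] = xs[k] + ys[k];
    return out;
}

// Walk over the data in chunks of `chunk` elements. The last chunk is
// shorter when the size is not a multiple of `chunk`.
// Returns the number of chunks that were processed.
std::size_t process_in_chunks(const std::vector<double>& x,
                              std::vector<double>& y,
                              std::size_t chunk) {
    assert(chunk > 0 && x.size() == y.size());
    std::size_t calls = 0;
    for (std::size_t begin = 0; begin < y.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, y.size());
        const std::vector<double> xs(x.begin() + begin, x.begin() + end);
        const std::vector<double> ys(y.begin() + begin, y.begin() + end);
        const std::vector<double> result = process_chunk(xs, ys);
        // Unload the result into the matching range of the main vector.
        std::copy(result.begin(), result.end(), y.begin() + begin);
        ++calls;
    }
    return calls;
}
```

With size = 1000005 and chunk = 200, this loop makes 5001 calls: 5000 on full chunks and one on the final chunk of 5 elements.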

The following further characterizes the situation:


  • These function calls are independent of each other; I only want to plot the result at the end, once the vectors are complete.
  • The completion time of each call will vary a lot.
  • The number of calls should be variable.

I read that the number of threads should be about the number of cores (at least as a starting point), since using too many threads could slow the process down. Given the situation, a queueing system / thread pool seems to make sense, so that no time is lost while one thread has only easy jobs and the others are held up by harder ones.
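As a starting point for that rule of thumb, standard C++11 can report the number of hardware threads. This is only a sketch; `worker_count_for` is a hypothetical helper name, and the best thread count for a given workload still has to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: pick how many worker threads to start.
// std::thread::hardware_concurrency() may return 0 when the value cannot
// be determined, so fall back to one worker in that case. Never start
// more workers than there are jobs.
unsigned worker_count_for(std::size_t job_count) {
    assert(job_count > 0);
    const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    return static_cast<unsigned>(std::min<std::size_t>(cores, job_count));
}
```

For the 5000 chunks in the question, this returns the number of hardware threads reported by the system, or 1 if that number is unknown.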

While dozens of tutorials make it look easy to print some messages from a few (usually 2) threads, could anyone provide more detailed help on how to return vectors and unload them thread-safely into a main function, and how to create a thread pool so that no time is wasted?

Using Ubuntu 13.04, Qt and C++11, though it should not matter.

Recommended answer

First of all, writing a thread pool is hard. If you really want to learn how to write one, the book C++ Concurrency in Action by Anthony Williams teaches you how to do it.
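To give a rough idea of what is involved, here is a minimal sketch of a fixed-size pool in standard C++11 that returns each job's vector through a std::future. It is not the design from the book or from any library: `SimplePool`, `submit` and `run` are names invented for this sketch, and it leaves out the parts that make a production pool hard (exception safety during startup, work stealing, load balancing).

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool (illustrative only).
class SimplePool {
public:
    explicit SimplePool(unsigned workers) {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Let the workers drain the queue, then join them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    // Queue a job that produces a vector; the future hands it to the caller.
    std::future<std::vector<double>> submit(
            std::function<std::vector<double>()> job) {
        auto task = std::make_shared<
            std::packaged_task<std::vector<double>()>>(std::move(job));
        std::future<std::vector<double>> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        wake_.notify_one();
        return result;
    }

private:
    // Worker loop: sleep until a job arrives or the pool shuts down.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

In this sketch the main thread would call `submit` once per chunk, keep the returned futures in order, and then call `get()` on each one to copy that chunk's result into the main vector. Only the main thread touches the main vector, so no extra locking is needed there.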

However, your case looks like one where a simple parallel_for fits perfectly. So I suggest using the Intel Threading Building Blocks (TBB) library. The advantage of that library is that it has a very good thread pool and works quite nicely with C++11 features.

Example code:

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include <algorithm>
#include <cstddef>
#include <vector>

int main() {
  // TBB initializes its scheduler automatically
  // (one worker per hardware thread by default).
  std::vector<double> a(1000);
  std::vector<double> c(1000);
  std::vector<double> b(1000);

  std::fill(b.begin(), b.end(), 1);
  std::fill(c.begin(), c.end(), 1);

  // Each call handles one sub-range; sub-ranges never overlap,
  // so writing to a[j] needs no locking.
  auto f = [&](const tbb::blocked_range<size_t>& r) {
    for(size_t j=r.begin(); j!=r.end(); ++j) a[j] = b[j] + c[j];
  };
  // Grain size: a hint for how many iterations form one unit of work.
  size_t hint_number_iterations_per_thread = 100;
  tbb::parallel_for(tbb::blocked_range<size_t>(0, 1000, hint_number_iterations_per_thread), f);
  return 0;
}

Done! Intel TBB has a very good thread pool that will try to adjust the workload of each thread. As long as hint_number_iterations_per_thread (the grain size of the blocked_range) is not a crazy number, the result will be very close to the optimal solution.
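For readers who cannot add TBB, the same idea of handing out work dynamically can be approximated with the standard library alone. The sketch below is not how TBB works internally (TBB uses work stealing); it is a simpler scheme in which workers claim the next chunk from a shared atomic counter, so a thread that finishes an easy chunk immediately picks up another one. `compute_chunk` and `parallel_chunks` are hypothetical names, and the computation is a stand-in for the real one.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-chunk computation; writes only y[begin, end).
void compute_chunk(const std::vector<double>& x, std::vector<double>& y,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) y[i] = x[i] + y[i];
}

// Process y in chunks of `chunk` elements using `workers` threads.
void parallel_chunks(const std::vector<double>& x, std::vector<double>& y,
                     std::size_t chunk, unsigned workers) {
    assert(chunk > 0 && workers > 0 && x.size() == y.size());
    // Round up so a shorter leftover chunk is included.
    const std::size_t chunk_count = (y.size() + chunk - 1) / chunk;
    std::atomic<std::size_t> next(0);

    auto worker = [&]() {
        for (;;) {
            const std::size_t c = next.fetch_add(1);  // claim one chunk
            if (c >= chunk_count) return;             // nothing left
            const std::size_t begin = c * chunk;
            const std::size_t end = std::min(begin + chunk, y.size());
            // Chunks never overlap, so no lock is needed for y.
            compute_chunk(x, y, begin, end);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workers; ++t) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();  // y is complete after this
}
```

Because every chunk writes to its own disjoint range of y, the results land directly in the main vector, and the plotting step can run right after `parallel_chunks` returns.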

By the way: Intel TBB is an open source library that works with the majority of compilers!
