Parallelize output using OpenMP

Question

I've written a C++ app that has to process a lot of data. Using OpenMP, I parallelized the processing phase quite well and, embarrassingly, found that writing the output is now the bottleneck. I decided to use a parallel for there as well, since the order in which I output items is irrelevant; each item just needs to be output as a coherent chunk.

Below is a simplified version of the output code, showing all the variables except for two custom iterators in the "collect data in related" loop. My question is: is this the correct and optimal way to solve this problem? I read about the barrier pragma; do I need that?

long i, n = nrows();

#pragma omp parallel for
for (i=0; i<n; i++) {
    std::vector<MyData> related;
    for (size_t j=0; j < data[i].size(); j++)
        related.push_back(data[i][j]);
    sort(related.rbegin(), related.rend());

    #pragma omp critical
    {
        std::cout << data[i].label << "\n";
        for (size_t j=0; j<related.size(); j++)
            std::cout << "    " << related[j].label << "\n";
    }
}

(I tagged this question c because I imagine OpenMP is very similar in C and C++. Please correct me if I'm wrong.)

Recommended answer

One way to get around output contention is to write each item's output to a thread-local string stream (which can be done in parallel) and then push the contents to cout (which requires synchronization).

Something like this:

#pragma omp parallel for
for (i=0; i<n; i++) {
    std::vector<MyData> related;
    for (size_t j=0; j < data[i].size(); j++)
        related.push_back(data[i][j]);
    sort(related.rbegin(), related.rend());

    std::stringstream buf;
    buf << data[i].label << "\n";
    for (size_t j=0; j<related.size(); j++)
        buf << "    " << related[j].label << "\n";

    #pragma omp critical
    std::cout << buf.rdbuf();
}

This offers much more fine-grained locking: the critical section now covers only a single write of an already formatted chunk, so performance should increase accordingly. On the other hand, this still uses locking. So another way would be to use an array of stream buffers, one per thread, and push them to cout sequentially after the parallel loop. This avoids costly locks, and the output to cout must be serialized anyway.
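The per-thread buffer idea could be sketched roughly as follows. This is only an illustration under assumptions: `MyData`, `Row`, and `write_rows` are hypothetical stand-ins for the question's types and output routine, and the `#ifdef _OPENMP` guards are there so the sketch also builds serially when OpenMP is not enabled.

```cpp
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-ins for the question's types (not from the original).
struct MyData {
    std::string label;
    bool operator<(const MyData& other) const { return label < other.label; }
};

struct Row {
    std::string label;
    std::vector<MyData> items;
};

// Formats each row into the buffer owned by the executing thread, then
// writes the buffers to `out` one after another once the loop is done.
void write_rows(const std::vector<Row>& data, std::ostream& out) {
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();  // upper bound on the team size
#endif
    std::vector<std::ostringstream> bufs(nthreads);

    const long n = static_cast<long>(data.size());
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        std::vector<MyData> related(data[i].items);
        std::sort(related.rbegin(), related.rend());  // descending order

        // Only this thread touches bufs[tid], so no lock is needed here.
        std::ostringstream& buf = bufs[tid];
        buf << data[i].label << "\n";
        for (const MyData& d : related)
            buf << "    " << d.label << "\n";
    }

    // Sequential phase: a single thread pushes every buffer to the stream.
    for (const std::ostringstream& buf : bufs)
        out << buf.str();
}
```

Calling `write_rows(data, std::cout)` would then produce every row as a coherent chunk. The rows are grouped by the thread that processed them rather than by index, which matches the question's statement that output order is irrelevant. The trade-off is memory: the whole output is held in the buffers until the loop finishes.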

On the other hand, you can even try omitting the critical section in the code above. In my experience this works, since the underlying streams have their own way of controlling concurrency. But I believe this behaviour is implementation-defined and not portable. (Since C++11, concurrent writes to the synchronized standard streams such as std::cout are guaranteed not to cause a data race, but characters written by different threads may still interleave, so a chunk is not guaranteed to stay coherent without the critical section.)
