C++ OpenMP Tasks - passing by reference issue

Question

I am currently working on a system that reads a file of over 200 million records (lines), so I am buffering the records and using OpenMP tasks to process each batch while continuing to read input. Each record in the buffer takes roughly 60 μs to process in work_on_data and generates a string result. To avoid critical regions, I create a vector for results and pass record placeholders (which I insert into this vector) by address to the work_on_data function:

int i = 0;
string buffer[MAX_SIZE];
vector<string> task_results;

#pragma omp parallel shared(map_a, task_results), num_threads(X) 
#pragma omp single
{
    while (getline(fin, line) && !fin.eof())
    {
        buffer[i] = line;
        if (++i == MAX_SIZE)
        {
            string result = "";
            task_results.push_back(result);
#pragma omp task firstprivate(buffer)
            work_on_data(buffer, map_a, result);
            i = 0;
        }
    }
}

// eventually merge records in task_results

At the end of work_on_data, each result passed in is no longer the empty string it was initialized to. However, when merging results, every entry in task_results is still an empty string. I may be doing something stupid here regarding scoping/addressing, but I don't see what the problem is. Any thoughts?

Thanks in advance.

Recommended answer

Pushing something into a vector causes a copy of it to be constructed inside the vector. So your work_on_data function doesn't get a reference to the string inside the vector, but to the string inside the if block. To fix this you could rewrite your code to give it access to the last element after the push_back, like so:

if (++i == MAX_SIZE)
{
    task_results.push_back("");
#pragma omp task firstprivate(buffer)
    work_on_data(buffer, map_a, task_results.back());
    i = 0;
}
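
The copy semantics described above can be reproduced without any OpenMP at all. The following is a minimal sketch, not the asker's code: fill_result is a hypothetical stand-in for work_on_data, and the two wrapper functions exist only to contrast writing to the local variable with writing to the element stored in the vector.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for work_on_data: writes its result through the reference.
void fill_result(std::string& out) { out = "processed"; }

// Reproduces the original bug: the vector holds a copy, so writing to the
// local variable leaves the stored element unchanged.
std::string write_to_local_copy() {
    std::vector<std::string> results;
    std::string result = "";
    results.push_back(result);    // copies `result` into the vector
    fill_result(result);          // modifies the local, not the element
    return results[0];
}

// Writes to the element that actually lives in the vector.
std::string write_to_element() {
    std::vector<std::string> results;
    results.push_back("");
    fill_result(results.back());  // reference to the stored element
    return results[0];
}
```

Calling write_to_local_copy() returns an empty string, while write_to_element() returns the processed value. This single-threaded version is safe only because nothing else touches the vector between back() and the write.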

Edit

I had forgotten about iterator invalidation on vector reallocation, and additionally the call to back() leads to race conditions. With (smart) pointers (as the comments are suggesting) and a dedicated counter this works for me with no segfault:

vector<shared_ptr<string>> task_results;

int ctr = 0;
...
if (++i == MAX_SIZE) {
    task_results.push_back(make_shared<string>());
    // ctr is firstprivate, so each task indexes the slot created for its own batch
#pragma omp task firstprivate(buffer, ctr)
    work_on_data(buffer, map_a, *task_results[ctr]);
    i = 0;
    ++ctr;
}

I think the back() version segfaults because back() is evaluated inside each task, possibly by many threads at the same time. If the main thread manages to push_back somewhere in between, several tasks can end up working on the same element, or on a reference that a reallocation has invalidated.
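
One way to keep the tasks away from the vector entirely is to hand each task its own copy of the shared_ptr. The sketch below is an assumption-laden variant rather than the code from the answer: process_batch is a hypothetical stand-in for work_on_data, run_batches and slots are illustrative names, the input is an in-memory vector instead of a file, and a trailing partial batch is dropped as in the question's loop.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for work_on_data: joins a batch into one result string.
void process_batch(const std::vector<std::string>& batch, std::string& out) {
    for (const std::string& rec : batch) out += rec + ";";
}

// Splits `lines` into batches of `batch_size` and processes each full batch
// in its own task. A trailing partial batch is ignored in this sketch.
std::vector<std::string> run_batches(const std::vector<std::string>& lines,
                                     std::size_t batch_size) {
    std::vector<std::shared_ptr<std::string>> slots;
    std::vector<std::string> batch;

    #pragma omp parallel
    #pragma omp single
    {
        for (const std::string& line : lines) {
            batch.push_back(line);
            if (batch.size() == batch_size) {
                // Create the slot on the producer thread and keep a handle to it.
                std::shared_ptr<std::string> slot = std::make_shared<std::string>();
                slots.push_back(slot);
                // firstprivate copies the batch and the shared_ptr into the task,
                // so the task never reads `slots` while it may be reallocating.
                #pragma omp task firstprivate(batch, slot)
                process_batch(batch, *slot);
                batch.clear();
            }
        }
        #pragma omp taskwait  // all tasks finish before results are merged
    }

    // Merge step: runs after the parallel region, so no task is still writing.
    std::vector<std::string> merged;
    for (const auto& s : slots) merged.push_back(*s);
    return merged;
}
```

For example, run_batches({"a", "b", "c", "d", "e"}, 2) yields the two results "a;b;" and "c;d;" in submission order. Because each string lives on the heap and is owned by a shared_ptr, reallocating the vector moves only the pointers, never the strings the tasks write to. Compiled without OpenMP support, the pragmas are ignored and the same code runs sequentially.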
