OpenMP: multiple threads updating the same array
Question
I have the following code in my program and I want to accelerate it using OpenMP.
...
for (i = curr_index; i < curr_index + rx_size; i += 2) {
    int64_t tgt = rcvq[i];
    int64_t src = rcvq[i+1];
    if (!TEST(tgt)) {
        pred[tgt] = src;
        newq[newq_count++] = tgt;
    }
}
Currently, my OpenMP version looks like this:
...
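chunk = rx_sz / omp_nthreads;   // must be even so each block starts on a (tgt, src) pair
#pragma omp parallel for num_threads(omp_nthreads)
for (ii = 0; ii < omp_nthreads; ii++) {
    int start = curr_index + ii * chunk;
    // index is declared inside the loop so each thread gets its own copy
    // (a variable declared outside the parallel region is shared by default)
    for (int index = start; index < start + chunk; index += 2) {
        int64_t tgt = rcvq[index];
        int64_t src = rcvq[index+1];
        if (!TEST(tgt)) {
            pred[tgt] = src;
            #pragma omp critical
            newq[newq_count++] = tgt;
        }
    }
}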
When I run the OpenMP version, it is much slower than the original serial version. I suspect the "omp critical" section is the cause, since it forces the threads to append to newq one at a time. What could I improve in my code to get better performance than the serial version? In the code, rx_sz is always a multiple of omp_nthreads.
Recommended answer
I'm pretty sure the omp critical section is what limits your performance at this point: every thread has to wait its turn before it can append to newq.
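One lighter-weight alternative to omp critical is to reserve each output slot with "#pragma omp atomic capture". This does not come from the answer. With it, only the counter increment is synchronized. All threads still contend on the single counter, and with several threads the order of newq is no longer deterministic. The minimal sketch below makes several assumptions. TEST(tgt) is replaced by a hypothetical "visited" vector. Each tgt appears at most once in the range, so the pred writes do not race. newq is already large enough to hold every hit. The function name process_pairs_atomic is also hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: processes rcvq[curr_index .. curr_index + rx_sz) as
// (tgt, src) pairs. Each hit reserves one slot in newq with an atomic
// fetch-and-increment instead of a critical section.
// newq must already be sized for every possible hit.
// Returns the updated value of newq_count.
int64_t process_pairs_atomic(const std::vector<int64_t>& rcvq, int64_t curr_index,
                             int64_t rx_sz, const std::vector<bool>& visited,
                             std::vector<int64_t>& pred, std::vector<int64_t>& newq,
                             int64_t newq_count) {
    assert(rx_sz % 2 == 0);          // rcvq holds whole (tgt, src) pairs
    const int64_t npairs = rx_sz / 2;

    #pragma omp parallel for
    for (int64_t p = 0; p < npairs; ++p) {
        const int64_t tgt = rcvq[curr_index + 2 * p];
        const int64_t src = rcvq[curr_index + 2 * p + 1];
        if (!visited[tgt]) {         // hypothetical stand-in for TEST(tgt)
            pred[tgt] = src;         // assumes each tgt occurs at most once
            int64_t slot;
            #pragma omp atomic capture
            slot = newq_count++;     // only this increment is synchronized
            newq[slot] = tgt;
        }
    }
    return newq_count;
}
```

Compile with -fopenmp to run the loop in parallel. Without that flag, the pragmas are ignored and the loop runs serially.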
I'd recommend collecting the results into separate per-thread buffers/vectors and merging them after the parallel processing is done (provided, of course, that the order doesn't matter to you):
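#include <cstdint>
#include <vector>
#include <omp.h>
// One result buffer per thread the parallel region can use.
std::vector<std::vector<int64_t>> res(omp_get_max_threads());
#pragma omp parallel for
for (int64_t index = 0; index < rx_sz/2; ++index) {
    // Pair number index starts at curr_index, as in the serial loop.
    int64_t tgt = rcvq[curr_index + 2*index];
    int64_t src = rcvq[curr_index + 2*index + 1];
    if (!TEST(tgt)) {
        pred[tgt] = src;
        // Each thread appends only to its own vector, so no lock is needed.
        res[omp_get_thread_num()].push_back(tgt);
    }
}
// Merge the per-thread vectors into newq (order may differ from the serial run).
for (const auto& part : res)
    for (int64_t tgt : part)
        newq[newq_count++] = tgt;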
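Below is a self-contained sketch of the same per-buffer idea. It makes several assumptions. TEST(tgt) is replaced by a hypothetical "visited" vector. newq is a std::vector that results are appended to. Each tgt occurs at most once per call. The names process_pairs, visited, nblocks and buf are illustrative only.

The sketch does not index buffers by omp_get_thread_num(). Instead, it splits the pairs into nblocks contiguous blocks, much like the question's ii loop, and gives each block its own buffer. Merging the buffers in block order then reproduces the serial order of newq, whichever thread happened to run which block.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: processes rcvq[curr_index .. curr_index + rx_sz) as
// (tgt, src) pairs split into nblocks contiguous blocks. Each block collects
// its hits in its own buffer, so the parallel loop needs no critical section
// or atomic. Returns the number of entries appended to newq.
int64_t process_pairs(const std::vector<int64_t>& rcvq, int64_t curr_index,
                      int64_t rx_sz, const std::vector<bool>& visited,
                      std::vector<int64_t>& pred, std::vector<int64_t>& newq,
                      int nblocks) {
    assert(rx_sz % 2 == 0 && nblocks > 0);   // whole pairs, at least one block
    const int64_t npairs = rx_sz / 2;
    std::vector<std::vector<int64_t>> buf(nblocks);

    #pragma omp parallel for schedule(static)
    for (int b = 0; b < nblocks; ++b) {
        // Block b owns the pairs [lo, hi); blocks never split a pair.
        const int64_t lo = npairs * b / nblocks;
        const int64_t hi = npairs * (b + 1) / nblocks;
        for (int64_t p = lo; p < hi; ++p) {
            const int64_t tgt = rcvq[curr_index + 2 * p];
            const int64_t src = rcvq[curr_index + 2 * p + 1];
            if (!visited[tgt]) {     // hypothetical stand-in for TEST(tgt)
                pred[tgt] = src;     // assumes each tgt occurs at most once
                buf[b].push_back(tgt);
            }
        }
    }

    // Sequential merge in block order keeps the serial order of the hits.
    int64_t appended = 0;
    for (const auto& part : buf) {
        newq.insert(newq.end(), part.begin(), part.end());
        appended += static_cast<int64_t>(part.size());
    }
    return appended;
}
```

When nblocks equals the number of threads, schedule(static) gives each thread exactly one block. A larger nblocks only adds more buffers. Compile with -fopenmp (GCC/Clang) to run the loop in parallel. Without that flag, the pragma is ignored and the loop runs serially.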