OpenMP in C array reduction / parallelize the code
Problem description
I have a problem with my code: it should print the number of appearances of a certain number.
I want to parallelize this code with OpenMP, and I tried to use a reduction for arrays, but it obviously didn't work as I wanted.
The error is: "segmentation fault". Should some variables be private? Or is the problem with the way I'm trying to use the reduction?
I think each thread should count some part of the array, and then the partial counts should be merged somehow.
#pragma omp parallel for reduction (+: reasult[:i])
for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        if (numbers[j] == i) {
            result[i]++;
        }
    }
}
Where N is a big number telling how many numbers I have, numbers is the array of all the numbers, and result is the array holding the count of each number.
Recommended answer
First, you have a typo in the name:
#pragma omp parallel for reduction (+: reasult[:i])
It should actually be "result", not "reasult".
Nonetheless, why are you sectioning the array with result[:i]? Based on your code, it seems that you want to reduce the entire array, namely:
/* j is declared outside the loop, so it must be made private explicitly */
#pragma omp parallel for private(j) reduction (+: result)
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        if (numbers[j] == i)
            result[i]++;
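Two details are worth spelling out here. The whole-array form reduction (+: result) only compiles when result has array type; if result is a pointer (for example, memory obtained from malloc), the array section has to give its length explicitly, as in result[:M]. In the question, the section length was i, whose value is not meaningful at the point where the clause is evaluated, which is a plausible (though unconfirmed) source of the segmentation fault. The following is a minimal sketch under those assumptions; the function name and its parameters are hypothetical and not part of the original post.

```c
/* Hypothetical helper, not from the original post.
   Adds to result[i] the number of times i occurs in numbers[0..n-1],
   for every i in [0, m). The caller zero-initializes result. */
void count_occurrences_reduction(const int *numbers, long n, int *result, int m)
{
    if (m <= 0)
        return; /* a zero-length array section is not allowed in a reduction */

    /* result is a pointer here, so the section length m is spelled out. */
    #pragma omp parallel for reduction(+: result[:m])
    for (int i = 0; i < m; i++) {
        for (long j = 0; j < n; j++) {   /* j is declared inside, so it is private */
            if (numbers[j] == i)
                result[i]++;
        }
    }
}
```

Each thread gets its own private copy of the reduced section, and implementations commonly place that copy on the thread's stack, so a very large M may also require a larger stack size (for example via the OMP_STACKSIZE environment variable).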
When one's compiler does not support the OpenMP 4.5 array reduction feature, one can alternatively implement the reduction explicitly (check this SO thread to see how).
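An explicit reduction along the lines the question describes ("each thread should count some part of the array, and then merge") could look roughly like the sketch below. It is not taken from the linked thread. Note that it swaps the nested loops for a single pass over numbers, with each thread counting into its own buffer before merging inside a critical section. The function name, its parameters, and the return convention are assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the original post.
   Adds to result[v] the number of times v occurs in numbers[0..n-1],
   for every v in [0, m). The caller zero-initializes result.
   Returns 0 on success, or -1 if a per-thread buffer could not be
   allocated (result is then incomplete). */
int count_occurrences_manual(const int *numbers, long n, int *result, int m)
{
    int failed = 0;

    if (m <= 0)
        return 0;

    #pragma omp parallel
    {
        /* Each thread counts into its own zero-initialized buffer. */
        int *local = calloc((size_t)m, sizeof *local);

        if (local == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* Every thread must reach the work-sharing loop,
           even one whose allocation failed. */
        #pragma omp for
        for (long j = 0; j < n; j++) {
            int v = numbers[j];
            if (local != NULL && v >= 0 && v < m)
                local[v]++;
        }

        /* Merge step: one thread at a time adds its counts to result. */
        if (local != NULL) {
            #pragma omp critical
            for (int i = 0; i < m; i++)
                result[i] += local[i];
            free(local);
        }
    }

    return failed ? -1 : 0;
}
```

Allocating the per-thread buffers on the heap avoids depending on the thread stack size when M is large, at the cost of one allocation per thread.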
As pointed out by @Hristo Iliev in the comments:
Provided that M * sizeof(result[0]) / #threads is a multiple of the cache line size (and even if it isn't, when the value of M is large enough), there is absolutely no need to involve a reduction in the process, unless the program is running on a NUMA system.
Assuming that the aforementioned conditions are met, if you analyze the code carefully, you will see that the iterations of the outermost loop (i.e., the values of the variable i) are distributed among the threads. Since i is also the index used to access the result array, each thread updates different positions of the result array. Therefore, you can simplify your code to:
/* j is declared outside the loop, so it must be made private explicitly */
#pragma omp parallel for private(j)
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        if (numbers[j] == i)
            result[i]++;
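On the question of which variables should be private: in C, only the loop variable of the loop attached to the parallel for directive (here i) is made private automatically. An inner loop counter such as j that is declared outside the construct is shared between threads by default, which is a data race. A minimal sketch of the reduction-free version that sidesteps this by declaring the counters inside the loops is shown below; the function name, its parameters, and the local accumulator count are hypothetical additions for illustration.

```c
/* Hypothetical helper, not from the original post.
   Adds to result[i] the number of times i occurs in numbers[0..n-1],
   for every i in [0, m). The caller zero-initializes result. */
void count_occurrences_no_reduction(const int *numbers, long n, int *result, int m)
{
    /* Iterations of the i loop are split among the threads, and each
       iteration writes only result[i], so no reduction is needed. */
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        int count = 0;                 /* declared inside: private to the thread */
        for (long j = 0; j < n; j++) { /* j is declared inside, so it is private */
            if (numbers[j] == i)
                count++;
        }
        result[i] += count;            /* single write per element of result */
    }
}
```

Accumulating into count and writing result[i] once per outer iteration also keeps writes to neighbouring elements of result infrequent, which limits false sharing when the cache-line condition mentioned above does not hold.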