OpenMP average of an array

Question

I'm trying to learn OpenMP for a program I'm writing. As part of it, I'm trying to implement a function that finds the average of a large array. Here is my code:

double mean(double* mean_array){
    double mean = 0;

    omp_set_num_threads( 4 );
    #pragma omp parallel for reduction(+:mean)
    for (int i=0; i<aSize; i++){
        mean = mean + mean_array[i];
    }

    printf("hello %d\n", omp_get_thread_num());

    mean = mean/aSize;

    return mean;
}

However, when I run the code, it runs slower than the sequential version. Also, the print statement outputs:

hello 0
hello 0

This doesn't make much sense to me. Shouldn't there be 4 hellos?

Any help would be appreciated.

Recommended answer

First, you are not seeing 4 "hello"s because the only part of a program that executes in parallel is the so-called parallel region, the code enclosed by a #pragma omp parallel directive. In your code that region is just the for loop (the omp parallel for directive is attached to the for statement), so the printf sits in the sequential part of the program. There it runs once per call, and omp_get_thread_num() returns 0 outside a parallel region.

Rewriting the code as follows would do the trick:

#pragma omp parallel num_threads(4)
{
  #pragma omp for reduction(+:mean)
  for (int i=0; i<aSize; i++) {
     mean = mean + mean_array[i];
  }
  printf("hello %d\n", omp_get_thread_num());
}
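For reference, the fragment above can be wrapped into a complete function. This is only a sketch under a few assumptions: the name mean_with_hello, the n parameter (standing in for aSize from the question), and the hello_count output are hypothetical additions for illustration. The runtime may also provide fewer than the 4 requested threads, and if the file is compiled without OpenMP enabled the pragmas are ignored and the code runs on one thread.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: n replaces the global aSize from the question,
 * and hello_count reports how many threads reached the printf. */
double mean_with_hello(const double *mean_array, int n, int *hello_count)
{
    double mean = 0.0;   /* shared; each thread gets a private copy in the loop */
    int hellos = 0;

    #pragma omp parallel num_threads(4)
    {
        /* The loop iterations are divided among the threads of the team.
         * The private partial sums are combined into mean when the loop ends. */
        #pragma omp for reduction(+:mean)
        for (int i = 0; i < n; i++) {
            mean += mean_array[i];
        }

        /* Still inside the parallel region: executed once by every thread. */
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        printf("hello %d\n", tid);

        #pragma omp atomic
        hellos++;
    }

    *hello_count = hellos;
    return n > 0 ? mean / n : 0.0;
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. The order of the "hello" lines is not deterministic, since the threads reach the printf independently.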

Second, your program may be running slower than the sequential version for several reasons. First of all, make sure the array is large enough that the overhead of creating the threads (which usually happens when the parallel region is entered) is negligible compared with the work done in the loop. Also, for small arrays you may be running into cache "false sharing", in which threads compete for the same cache line, causing performance degradation.
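One way to keep thread overhead away from small inputs is OpenMP's if clause, which makes the region run on a single thread when its condition is false. The sketch below is an illustration, not a tuned implementation: the function name mean_if_large and the PARALLEL_THRESHOLD value are hypothetical, and a sensible cutoff has to be found by measuring on the target machine.

```c
/* Hypothetical cutoff: below this size the loop runs on one thread.
 * The value is a placeholder and should be chosen by measurement. */
#define PARALLEL_THRESHOLD 100000

double mean_if_large(const double *data, long n)
{
    if (n <= 0) {
        return 0.0;
    }

    double sum = 0.0;

    /* The if clause applies to the parallel construct: when the condition
     * is false, only one thread executes the loop, so small arrays do not
     * pay for distributing work across a team of threads. */
    #pragma omp parallel for reduction(+:sum) if(n > PARALLEL_THRESHOLD)
    for (long i = 0; i < n; i++) {
        sum += data[i];
    }

    return sum / (double)n;
}
```

Because each thread accumulates into its own private copy of sum under the reduction clause, the threads do not write to a shared accumulator inside the loop.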
