OpenMP and C parallel for loop: why does my code slow down when using OpenMP?

Question

I'm new here and a beginner-level C programmer. I'm having trouble using OpenMP to speed up a for loop. Below is a simple example:

#include <stdlib.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>  /* declares gsl_ran_gamma_mt */
#include <omp.h>

gsl_rng *rng;

int main(void)
{
int i, M=100000000;
double tmp;

/* initialize RNG */
gsl_rng_env_setup();
rng = gsl_rng_alloc (gsl_rng_taus);
gsl_rng_set (rng,(unsigned long int)791526599);

// option 1: parallel        
  #pragma omp parallel for default(shared) private( i, tmp ) schedule(dynamic)
  for(i=0;i<=M-1;i++){
     tmp=gsl_ran_gamma_mt(rng, 4, 1./3 );
  }


// option 2: sequential       
  for(i=0;i<=M-1;i++){
     tmp=gsl_ran_gamma_mt(rng, 4, 1./3 );
  }
}

The code draws from a gamma distribution for M iterations. It turns out that the parallel version with OpenMP (option 1) takes about 1 minute, while the sequential version (option 2) takes only 20 seconds. While the OpenMP version is running, CPU usage is at 800% (the server I'm using has 8 CPUs). The system is Linux with GCC 4.1.3, and the compile command is gcc -fopenmp -lgsl -lgslcblas -lm (I'm using GSL).

Am I doing something wrong? Please help me! Thanks!

P.S. As some users pointed out, it might be caused by rng. But even if I replace

tmp=gsl_ran_gamma_mt(rng, 4, 1./3 );

with, say,

tmp=1000*10000;

the problem is still there...

Recommended answer

gsl_ran_gamma_mt probably locks on rng to prevent concurrency issues (if it didn’t, your parallel code probably contains a race condition and thus yields wrong results). The solution then would be to have a separate rng instance for each thread, thus avoiding locking.
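The "one generator per thread" idea might be sketched as follows. This is a minimal illustration, not the asker's code: to keep it self-contained it does not link against GSL, and the names thread_rng, thread_rng_seed, thread_rng_uniform and parallel_mean are hypothetical. The small xorshift generator stands in for a per-thread gsl_rng; with GSL, each thread would presumably call gsl_rng_alloc and gsl_rng_set (with a distinct seed) inside the parallel region and gsl_rng_free before leaving it. The sketch also uses schedule(static) and a reduction, both of which are choices made here rather than part of the answer above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a per-thread generator state.
   With GSL this role would be played by one gsl_rng* per thread. */
typedef struct { uint64_t s; } thread_rng;

/* Mix the base seed with the stream (thread) number so that every
   thread starts from a different, nonzero state. */
static void thread_rng_seed(thread_rng *r, uint64_t seed, int stream)
{
    uint64_t z = seed + 0x9E3779B97F4A7C15ULL * (uint64_t)(stream + 1);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    r->s = z ? z : 1;   /* xorshift state must never be zero */
}

/* One xorshift64* step, mapped to a double in [0, 1). */
static double thread_rng_uniform(thread_rng *r)
{
    r->s ^= r->s >> 12;
    r->s ^= r->s << 25;
    r->s ^= r->s >> 27;
    uint64_t x = r->s * 0x2545F4914F6CDD1DULL;
    return (double)(x >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Draw n uniform samples in parallel and return their mean.
   Each thread owns its generator, so no state is shared between threads;
   the per-thread partial sums are combined by the reduction clause. */
double parallel_mean(long n, uint64_t seed)
{
    double sum = 0.0;
    assert(n > 0);

    #pragma omp parallel reduction(+:sum)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        thread_rng rng;                 /* private: declared inside the region */
        thread_rng_seed(&rng, seed, tid);

        long i;
        #pragma omp for schedule(static)
        for (i = 0; i < n; i++) {
            sum += thread_rng_uniform(&rng);
        }
    }
    return sum / (double)n;
}
```

Calling parallel_mean(100000000, 791526599) from main and compiling with gcc -fopenmp should exercise all cores without any shared generator state. Two design notes: schedule(dynamic) without a chunk size hands out one iteration at a time, which adds scheduling overhead when each iteration is cheap, so schedule(static) is used here; and because the samples feed into the returned sum, the compiler cannot discard the loop body the way it may for an unused tmp.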
