Parallelization for Monte Carlo pi approximation


Question

I am writing a C program to parallelize a pi approximation with OpenMP. I think my code works fine and produces convincing output. I am running it with 4 threads at the moment. What I am not sure about is whether this code is vulnerable to a race condition, and if it is, how do I coordinate the thread actions in this code?

Here is the code:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <math.h>
#include <omp.h>

double sample_interval(double a, double b) {

  double x = ((double) rand())/((double) RAND_MAX);
  return (b-a)*x + a;

}

int main (int argc, char **argv) {


  int N = atoi( argv[1] ); // convert command-line input to N = number of points
  int i;
  int NumThreads = 4;
  const double pi = 3.141592653589793;
  double x, y, z;
  double counter = 0;



  #pragma omp parallel firstprivate(x, y, z, i) reduction(+:counter) num_threads(NumThreads)
  {
    srand(time(NULL));
    for (int i = 0; i < N; ++i)
    {
      x = sample_interval(-1., 1.);
      y = sample_interval(-1., 1.);
      z = ((x*x) + (y*y));

      if (z <= 1)
      {
        counter++;
      }
    }
  }
  double approx_pi = 4.0 * counter/ (double)N;

  printf("%i %1.6e %1.6e\n ", N, 4.0 * counter/ (double)N, fabs(4.0 * counter/ (double)N - pi) / pi);


  return 0;

}

I was also wondering whether the seed for the random number generator should be set inside or outside the parallel region. My output looks like this:

10 3.600000e+00 1.459156e-01
100 3.160000e+00 5.859240e-03
1000 3.108000e+00 1.069287e-02
10000 3.142400e+00 2.569863e-04
100000 3.144120e+00 8.044793e-04
1000000 3.142628e+00 3.295610e-04
10000000 3.141379e+00 6.794439e-05
100000000 3.141467e+00 3.994585e-05
1000000000 3.141686e+00 2.971945e-05

This looks OK so far. Your suggestions regarding the race condition and the seed placement are most welcome.

Answer

There are a few problems in your code that I can see. The main one, from my standpoint, is that it isn't parallelized. Or, more precisely, you didn't enable the parallelism you introduced with OpenMP when compiling it. Here is how one can see that:

The way the code is parallelized, the main for loop is executed in full by every thread (there is no worksharing here: no #pragma omp parallel for, only a plain #pragma omp parallel). Therefore, since you set the number of threads to 4, the global number of iterations is 4*N: counter accumulates roughly (Pi/4)*4*N hits, so 4*counter/N should slowly converge towards 4*Pi, not towards Pi.

Indeed, I tried your code on my laptop, compiled it with OpenMP support, and that is pretty much what I get. However, when I don't enable OpenMP, I get an output similar to yours. So, in conclusion, you need to:

  1. Enable OpenMP at compilation time to get a parallel version of your code.
  2. Divide your result by NumThreads to get a "valid" approximation of Pi (or distribute the loop over N, with a #pragma omp for for example; see the sketch right after this list).
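
As a concrete illustration of the second point, here is a minimal sketch (my own variant of the question's code, not the final version given further below) that distributes the loop over N with #pragma omp for, so N samples are drawn in total and the result is divided by N only. It uses rand_r() with one seed per thread so that the sketch itself is thread-safe; why plain rand() is a problem, and the limitations that remain even with rand_r(), are discussed next.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <omp.h>

int main(int argc, char **argv) {
    int N = atoi(argv[1]);   // number of points, from the command line as before
    int NumThreads = 4;
    double counter = 0;
    unsigned int base_seed = (unsigned int) time(NULL);

    #pragma omp parallel reduction(+:counter) num_threads(NumThreads)
    {
        // One seed per thread: rand_r() keeps its state in this local variable.
        unsigned int seed = base_seed + (unsigned int) omp_get_thread_num();

        // The worksharing construct splits the N iterations among the threads,
        // so N samples are drawn in total and no division by NumThreads is needed.
        #pragma omp for
        for (int i = 0; i < N; ++i) {
            double x = 2.0 * ((double) rand_r(&seed) / RAND_MAX) - 1.0;
            double y = 2.0 * ((double) rand_r(&seed) / RAND_MAX) - 1.0;
            if (x*x + y*y <= 1.0) {
                counter++;
            }
        }
    }

    double approx_pi = 4.0 * counter / (double) N;
    printf("%i %1.6e\n", N, approx_pi);
    return 0;
}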

But that is assuming the rest of your code is correct, which it isn't yet. As BitTickler already hinted, rand() isn't thread-safe. So you have to go for another random number generator, one that allows you to privatize its state; that could be rand_r() for example. That said, this still has quite a few issues:

  1. rand() / rand_r() is a terrible RNG in terms of randomness and periodicity. As you increase the number of tries, you will rapidly go past the period of the RNG and repeat the same sequence over and over again. You need something more robust to do anything remotely serious.
  2. Even with a "good" RNG, the parallelism aspect can be an issue in the sense that you want the parallel sequences to be uncorrelated with each other. Just using a different seed value per thread doesn't guarantee that (although with a wide enough RNG, you have a bit of headroom).

Anyway, the bottom line is:

  • Use a better thread-safe RNG (I find drand48_r() or random_r() to be OK for toy codes on Linux).
  • Initialize its state per thread, based on the thread id for example, while keeping in mind that this won't ensure a proper decorrelation of the random series in some circumstances (and the more often you call the functions, the more likely you are to eventually get overlapping series).

With this done (along with a few minor fixes), your code becomes, for example, the following:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <math.h>
#include <omp.h>

typedef struct drand48_data RNGstate;

double sample_interval(double a, double b, RNGstate *state) {
    double x;
    drand48_r(state, &x);
    return (b-a)*x + a;
}

int main (int argc, char **argv) {

    int N = atoi( argv[1] ); // convert command-line input to N = number of points
    int NumThreads = 4;
    const double pi = 3.141592653589793;
    double x, y, z;
    double counter = 0;
    time_t ctime = time(NULL);

    #pragma omp parallel private(x, y, z) reduction(+:counter) num_threads(NumThreads)
    {
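        // Per-thread RNG state (a struct drand48_data), seeded below with the
        // shared timestamp plus the thread id so each thread gets a different stream.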
        RNGstate state;
        srand48_r(ctime+omp_get_thread_num(), &state);
        for (int i=0; i < N; ++i) {
            x = sample_interval(-1, 1, &state);
            y = sample_interval(-1, 1, &state);
            z = ((x*x)+(y*y));

            if (z<= 1) {
                counter++;
            }
        }

    } 
    double approx_pi = 4.0 * counter / (NumThreads * N);

    printf("%i %1.6e %1.6e\n ", N, approx_pi, fabs(approx_pi - pi) / pi);

    return 0;
}

I compile it like this:

gcc -std=gnu99 -fopenmp -O3 -Wall pi.c -o pi_omp
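
Assuming the executable is called pi_omp as in the command above, a run over one million points would then simply be:

./pi_omp 1000000

which prints N, the approximation of Pi and the relative error, in the same format as the output table in the question.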

