How to generate random numbers in parallel?


Question

I want to generate pseudorandom numbers in parallel using OpenMP, something like this:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < 100; i++)
    {
        printf("%d %d %d\n", i, omp_get_thread_num(), rand());
    }
    return 0;
}

I've tested it on Windows and got a huge speedup, but each thread generated exactly the same numbers. I've also tested it on Linux and got a huge slowdown: the parallel version on an 8-core processor was about 10 times slower than the sequential one, but each thread generated different numbers.

Is there any way to have both speedup and different numbers?

Edit 27.11.2010
I think I've solved it using an idea from Jonathan Dursi's post. The following code seems to run fast on both Linux and Windows, and the numbers are also pseudorandom. What do you think about it?

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int seed[10];

int main(int argc, char **argv)
{
    int i, s;
    for (i = 0; i < 10; i++)
        seed[i] = rand();

    #pragma omp parallel private(s)
    {
        s = seed[omp_get_thread_num()];
        #pragma omp for
        for (i = 0; i < 1000; i++)
        {
            printf("%d %d %d\n", i, omp_get_thread_num(), s);
            s = s * 17931 + 7391; // these constants should be chosen more carefully
        }
        seed[omp_get_thread_num()] = s;
    }
    return 0;
}
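(A side note, not part of the original post: the per-thread update s = s*17931 + 7391 above is a linear congruential generator. One widely used 32-bit parameter set is a = 1664525, c = 1013904223 from Numerical Recipes; a minimal sketch of the step with those constants, assuming unsigned state so the arithmetic wraps modulo 2^32, could look like this.)

/* Hypothetical sketch (not from the original post): per-thread LCG step with
 * the Numerical Recipes constants; unsigned arithmetic wraps modulo 2^32. */
static unsigned int lcg_step(unsigned int *state)
{
    *state = 1664525u * (*state) + 1013904223u;
    return *state;
}
/* Inside the loop above one would then call: s = lcg_step(&state); */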

PS: I haven't accepted any answer yet, because I need to be sure that this idea is good.

Answer

I'll post here what I posted to Concurrent random number generation:

I think you're looking for rand_r(), which explicitly takes the current RNG state as a parameter. Then each thread should have its own copy of the seed data (whether you want each thread to start off with the same seed or different ones depends on what you're doing; here you want them to be different, or you'd get the same row again and again). There's some discussion of rand_r() and thread safety here: whether rand_r is real thread safe?

So say you wanted each thread to have its seed start off with its thread number (which is probably not what you want, as it would give the same results every time you ran with the same number of threads, but just as an example):

#pragma omp parallel default(none)
{
    int i;
    unsigned int myseed = omp_get_thread_num();
    #pragma omp for
    for (i = 0; i < 100; i++)
        printf("%d %d %d\n", i, omp_get_thread_num(), rand_r(&myseed));
}
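(An aside, not part of the original answer: if you also want different numbers from run to run, a common approach is to mix something run-dependent, such as the wall-clock time, into each thread's seed. A minimal sketch under that assumption, using time() from <time.h>:)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>

/* Hypothetical variation (not from the original answer): read the time once,
 * then offset it by the thread number so each thread and each run gets a
 * distinct rand_r() seed. */
int main(void)
{
    unsigned int base = (unsigned int)time(NULL);

    #pragma omp parallel default(none) shared(base)
    {
        int i;
        unsigned int myseed = base ^ ((unsigned int)omp_get_thread_num() << 16);
        #pragma omp for
        for (i = 0; i < 100; i++)
            printf("%d %d %d\n", i, omp_get_thread_num(), rand_r(&myseed));
    }
    return 0;
}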

Edit: Just on a lark, I checked to see if the rand_r() version above would get any speedup. The full code was:

#define NRANDS 1000000
int main(int argc, char **argv) {

    struct timeval t;
    int a[NRANDS];

    tick(&t);
    #pragma omp parallel default(none) shared(a)
    {
        int i;
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for(i=0; i<NRANDS; i++)
                a[i] = rand_r(&myseed);
    }
    double sum = 0.;
    double time=tock(&t);
    for (long int i=0; i<NRANDS; i++) {
        sum += a[i];
    }
    printf("Time = %lf, sum = %lf\n", time, sum);

    return 0;
}

where tick and tock are just wrappers around gettimeofday(), and tock() returns the difference in seconds. The sum is printed just to make sure that nothing gets optimized away, and to demonstrate a small point: you will get different numbers with different numbers of threads, because each thread uses its own thread number as a seed; if you run the same code again and again with the same number of threads, you'll get the same sum, for the same reason. Anyway, timing (running on an 8-core Nehalem box with no other users):

$ export OMP_NUM_THREADS=1
$ ./rand
Time = 0.008639, sum = 1074808568711883.000000

$ export OMP_NUM_THREADS=2
$ ./rand
Time = 0.006274, sum = 1074093295878604.000000

$ export OMP_NUM_THREADS=4
$ ./rand
Time = 0.005335, sum = 1073422298606608.000000

$ export OMP_NUM_THREADS=8
$ ./rand
Time = 0.004163, sum = 1073971133482410.000000

So there is a speedup, if not a great one; as @ruslik points out, this is not really a compute-intensive process, and other issues like memory bandwidth start playing a role. Hence, only a shade over 2x speedup on 8 cores.
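(For completeness, and not part of the original answer: tick() and tock() are not shown above. A minimal sketch consistent with their description as wrappers around gettimeofday(), with tock() returning the elapsed time in seconds, might look like this.)

#include <sys/time.h>

/* Hypothetical helpers (not shown in the original answer): tick() records the
 * current time; tock() returns the seconds elapsed since the recorded time. */
void tick(struct timeval *t)
{
    gettimeofday(t, NULL);
}

double tock(const struct timeval *t)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return (double)(now.tv_sec - t->tv_sec)
         + (double)(now.tv_usec - t->tv_usec) / 1e6;
}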
