How to generate random numbers in parallel?

Problem description

I want to generate pseudorandom numbers in parallel using OpenMP, something like this:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < 100; i++)
    {
        printf("%d %d %d\n", i, omp_get_thread_num(), rand());
    }
    return 0;
}

I've tested it on Windows and got a huge speedup, but each thread generated exactly the same numbers. I've also tested it on Linux and got a huge slowdown: the parallel version on an 8-core processor was about 10 times slower than the sequential one, but each thread generated different numbers.

Is there any way to have both speedup and different numbers?

Edit 27.11.2010
I think I've solved it using an idea from Jonathan Dursi's post. It seems that the following code works fast on both Linux and Windows. The numbers are also pseudorandom. What do you think about it?

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int seed[10];

int main(int argc, char **argv)
{
    int i, s;
    for (i = 0; i < 10; i++)
        seed[i] = rand();

    #pragma omp parallel private(s)
    {
        s = seed[omp_get_thread_num()];
        #pragma omp for
        for (i = 0; i < 1000; i++)
        {
            printf("%d %d %d\n", i, omp_get_thread_num(), s);
            s = (s * 17931 + 7391); // these numbers should be chosen more carefully
        }
        seed[omp_get_thread_num()] = s;
    }
    return 0;
}

PS.: I haven't accepted any answer yet, because I need to be sure that this idea is good.

Answer

I'll post here what I posted to Concurrent random number generation:

I think you're looking for rand_r(), which explicitly takes the current RNG state as a parameter. Then each thread should have its own copy of the seed data (whether you want each thread to start off with the same seed or different ones depends on what you're doing; here you want them to be different, or you'd get the same row again and again). There's some discussion of rand_r() and thread safety here: whether rand_r is real thread safe?

So say you wanted each thread to have its seed start off with its thread number (which is probably not what you want, as it would give the same results every time you ran with the same number of threads, but just as an example):

#pragma omp parallel default(none)
{
    int i;
    unsigned int myseed = omp_get_thread_num();
    #pragma omp for
    for(i=0; i<100; i++)
            printf("%d %d %d\n",i,omp_get_thread_num(),rand_r(&myseed));
}

Edit: Just on a lark, I checked to see whether the above would get any speedup. The full code was

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <omp.h>

/* tick() and tock() are simple gettimeofday() wrappers, described below. */

#define NRANDS 1000000

int main(int argc, char **argv) {

    struct timeval t;
    int a[NRANDS];

    tick(&t);
    #pragma omp parallel default(none) shared(a)
    {
        int i;
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for (i = 0; i < NRANDS; i++)
            a[i] = rand_r(&myseed);
    }
    double sum = 0.;
    double time = tock(&t);
    for (long int i = 0; i < NRANDS; i++) {
        sum += a[i];
    }
    printf("Time = %lf, sum = %lf\n", time, sum);

    return 0;
}

where tick and tock are just wrappers around gettimeofday(), and tock() returns the difference in seconds. The sum is printed just to make sure nothing gets optimized away, and to demonstrate a small point: you will get different numbers with different numbers of threads, because each thread gets its own thread number as a seed; if you run the same code again and again with the same number of threads, you'll get the same sum, for the same reason. Anyway, timing (running on an 8-core Nehalem box with no other users):

$ export OMP_NUM_THREADS=1
$ ./rand
Time = 0.008639, sum = 1074808568711883.000000

$ export OMP_NUM_THREADS=2
$ ./rand
Time = 0.006274, sum = 1074093295878604.000000

$ export OMP_NUM_THREADS=4
$ ./rand
Time = 0.005335, sum = 1073422298606608.000000

$ export OMP_NUM_THREADS=8
$ ./rand
Time = 0.004163, sum = 1073971133482410.000000

So speedup, if not great; as @ruslik points out, this is not really a compute-intensive process, and other issues like memory bandwidth start playing a role. Thus, only a shade over 2x speedup on 8 cores.
