Generating the same random numbers with threads in OMP


Question


I am attempting to multithread some code with OMP. Currently my sequential version uses rand() to generate a set of random numbers with a consistent seed, so that they return the same results on each run. I want to parallelise my code, but rand() is not thread safe. Can someone please show me how I would go about using a random number generator that works across threads, so I can produce the same data set on each test, similar to using a seed with rand()? The code I'm parallelising is as follows:

    long linkCnt;
    long j;
    long t;
    srand(randInit);

    linkCnt=0; //Keep track of # of outgoing edges
 #pragma omp parallel for schedule(runtime) private(t)
    for(j = 0; j < G->N; j++)
    {

        if(i == j){
            t = NO_CONN;
        } else {
            t = (rand() % ((MAX_EDGE_WEIGTH-1) * 2)+1); //50% chance of having no connection
            if(t > MAX_EDGE_WEIGTH){
                t = NO_CONN; //Like no connection
            } else {
                linkCnt++; //NB: this unsynchronized increment is also a data race
                G->visited[j] = VISITED; //Do this to find isolated nodes that have no incoming edge
            }
        }

        G->node[i][j] = t;
    }

Solution

There seem to be a couple of problems getting conflated here.

First, the non-thread-safe nature of the rand() function means that calling rand() concurrently from different threads may yield different values than calling it sequentially. It's probably easiest to explain this with a simple example, so let's look at a 32-bit version of PCG, since it's nice and simple. It has 32 bits of state and generates 32-bit numbers like this:

static uint32_t state = ...; /* seeded once at startup */

static uint32_t generate(void) {
  uint32_t s = state;
  /* PCG-style output permutation: variable xorshift, then multiply */
  uint32_t v = ((s >> ((s >> 28) + 4)) ^ s) * (277803737U);
  v ^= v >> 22;
  /* advance the LCG state for the next call */
  state = state * 747796405U + 1729U;
  return v;
}

Now think about what happens if two threads call generate() at roughly the same time. Maybe they both get the same value for state, and so generate the same random number twice. Maybe one updates state before the other reads it, so they get different values.

We can eliminate the problem by protecting the generate() function with a mutex or, in the case of 32-bit PCG (and this is why I use it for reproducible numbers), by using atomics. If we do that, we'll always get the same numbers in the same order.
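A minimal sketch of the atomic variant using C11 stdatomic (the seed value and the compare-and-swap loop are my additions; the constants are the ones from the generator above):

```c
#include <stdint.h>
#include <stdatomic.h>

/* Thread-safe variant of the generator above: a compare-and-swap loop
   claims each state value exactly once, so no two threads can ever
   consume the same state. The seed here is arbitrary (an assumption). */
static _Atomic uint32_t state = 0x12345678u;

static uint32_t generate(void) {
  uint32_t s, next;
  do {
    s = atomic_load(&state);
    next = s * 747796405U + 1729U;              /* advance the LCG */
  } while (!atomic_compare_exchange_weak(&state, &s, next));
  /* mix the claimed state into an output value, as before */
  uint32_t v = ((s >> ((s >> 28) + 4)) ^ s) * 277803737U;
  v ^= v >> 22;
  return v;
}
```

This gives the same numbers, in the same order, as the sequential version; it just prevents two threads from reading one state value twice.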

Part two of the problem is what happens when the order of the callers gets mixed up in your code. Let's say you have two threads (called A and B), and they each have to run two iterations of your loop. Even if you're getting your random numbers from a thread-safe source, the order of the calls could be AABB, ABAB, ABBA, BBAA, BABA, or BAAB, each of which would cause your code to generate a different result.

There are several straightforward ways around this. First, you could use synchronization primitives to ensure that each iteration calls generate in the order you want. The easiest way would probably be to use a queue, but you'll waste a lot of time synchronizing and you'll lose some opportunities for parallelism (and you have to restructure your code significantly).
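With OpenMP specifically, the `ordered` construct is one such synchronization primitive: it forces the draws to happen in iteration order while the rest of each iteration can still run in parallel. A sketch (the generator here is a plain LCG of my choosing; the ordered region serializes access to it, so even an unprotected generator is safe inside this loop):

```c
#include <stdint.h>

/* Plain (not thread-safe) generator; the ordered region below
   serializes calls to it, so no race occurs within this loop. */
static uint32_t rng_state = 1u;
static uint32_t my_generate(void) {
  rng_state = rng_state * 747796405U + 1729U;
  return rng_state;
}

void demo(uint32_t out[], int len) {
  #pragma omp parallel for ordered schedule(static, 1)
  for (int i = 0; i < len; i++) {
    uint32_t v;
    #pragma omp ordered      /* draws happen in iteration order */
    v = my_generate();
    out[i] = v;              /* the rest of the body may run concurrently */
  }
}
```

As the text says, you pay for this in synchronization: threads queue up at the ordered region, so there is little speedup unless the per-iteration work outside it dominates.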

If you have a relatively small number of iterations, you might consider generating an array ahead of time. Think:

int i;
int nums[LEN];
for (i = 0 ; i < LEN ; i++)
  nums[i] = generate();
#pragma omp parallel for ...
for (i = 0 ; i < LEN ; i++) {
  do_stuff(nums[i]);
}
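Applied to the loop from the question, the pregeneration approach might look something like this (a sketch: the graph is reduced to a single row, `N`, `MAX_EDGE_WEIGTH`, and `NO_CONN` are stand-ins for the asker's macros, and a reduction is added so the `linkCnt` increment is no longer a race):

```c
#include <stdlib.h>

enum { N = 1000, MAX_EDGE_WEIGTH = 10, NO_CONN = -1 };

/* Fill one adjacency row reproducibly: draw all random numbers
   sequentially first, then consume them in the parallel loop. */
long fill_row(long row[N], int i, unsigned seed) {
  long nums[N];
  srand(seed);                                   /* sequential, reproducible */
  for (int j = 0; j < N; j++)
    nums[j] = rand() % ((MAX_EDGE_WEIGTH - 1) * 2) + 1;

  long linkCnt = 0;
  #pragma omp parallel for reduction(+:linkCnt)
  for (int j = 0; j < N; j++) {
    long t = nums[j];
    if (i == j) {
      t = NO_CONN;
    } else if (t > MAX_EDGE_WEIGTH) {
      t = NO_CONN;                               /* ~50% chance: no connection */
    } else {
      linkCnt++;                                 /* safe: reduction variable */
    }
    row[j] = t;
  }
  return linkCnt;
}
```

Because all the draws happen before the parallel region, the result is identical regardless of thread count or scheduling.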

A better solution, though, might be to just ditch the idea of generating random numbers altogether and instead use hashing. https://burtleburtle.net/bob/hash/integer.html has some options. An example of that would be something like

for (int i = 0 ; i < LEN ; i++) {
  do_stuff(hash(i));
}

Of course you can throw in some salt, maybe even use rand() to generate your salt.
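As an illustration, here is one possible integer hash built from the same mixing steps as the PCG output function above (the constants and the salt-folding are my choices, not a function taken from that page):

```c
#include <stdint.h>

/* Map a loop index to a well-mixed 32-bit value. Stateless, so it is
   trivially thread-safe and independent of iteration order. */
static uint32_t hash(uint32_t x) {
  x = x * 747796405U + 1729U;                    /* spread the low bits */
  x = ((x >> ((x >> 28) + 4)) ^ x) * 277803737U; /* xorshift + multiply */
  x ^= x >> 22;
  return x;
}

/* Optional salt: fold it in before mixing, so each "seed" yields a
   different but still reproducible sequence. */
static uint32_t salted_hash(uint32_t i, uint32_t salt) {
  return hash(i ^ salt);
}
```

Each iteration computes its own value from its index alone, so the loop parallelizes with no synchronization at all and still produces the same data set every run.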

