Is CUDA CURAND susceptible to data races?


Problem description

I launch my Setup() kernel with 1 block of 256 threads to initialize an array RNGstates holding 256 CURAND states:

__global__ void Setup(curandState *RNGstates, long seed) {
    int tid = threadIdx.x;
    curand_init(seed, tid, 0, &RNGstates[tid]);
}

Next, I launch my Generate() kernel with 1000 blocks of 256 threads to fill the array result with 256,000 random numbers. However, I use only the 256 states in RNGstates, so each state is accessed by 1000 threads (one from each block):

__global__ void Generate(curandState *RNGstates, float *result) {
    int tid = blockIdx.x*blockDim.x + threadIdx.x;
    float rnd = curand_uniform(&RNGstates[threadIdx.x]);
    result[tid] = rnd;
}

I know that calling curand_uniform() updates the state somehow, so I presume a write operation is taking place.

So should I be worried about data races occurring when the 1000 threads mapped to each of the 256 CURAND states try to update that state implicitly through curand_uniform()? Will this affect the quality of my random numbers (for example, by producing frequent duplicate values)?

Thanks very much.

Recommended answer

I think sharing states will definitely hurt the quality. Duplicate values are the best case when states are shared: threads that read the same state before any of them writes it back will produce the same number. In the worst case, a data race could corrupt the states entirely.

You could keep one state for each of your threads.

With 1000 blocks of 256 threads, your case requires 256,000 states. The code should look like this:

__global__ void Setup(curandState *RNGstates, long seed) {
  int tid = blockIdx.x*blockDim.x + threadIdx.x;
  curand_init(seed, tid, 0, &RNGstates[tid]);
}

__global__ void Generate(curandState *RNGstates, float *result) {
  int tid = blockIdx.x*blockDim.x + threadIdx.x;
  float rnd = curand_uniform(&RNGstates[tid]);
  result[tid] = rnd;
}
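As a rough host-side sketch of how the per-thread version might be driven: the kernels mirror the two above, while the helper name generate_per_thread, the renamed kernels, and the assertion-based error handling are assumptions made for illustration, not part of the original answer.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One state per thread, indexed by the global thread id.
__global__ void SetupPerThread(curandState *states, unsigned long long seed) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Same seed, a distinct subsequence for every thread.
    curand_init(seed, tid, 0, &states[tid]);
}

__global__ void GeneratePerThread(curandState *states, float *result) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Each thread reads and updates only its own state, so no state is shared.
    result[tid] = curand_uniform(&states[tid]);
}

// Hypothetical host helper: returns blocks * threads uniform samples.
// Error handling is reduced to assertions to keep the sketch short.
std::vector<float> generate_per_thread(int blocks, int threads,
                                       unsigned long long seed) {
    assert(blocks > 0 && threads > 0);
    size_t n = static_cast<size_t>(blocks) * threads;

    curandState *d_states = nullptr;
    float *d_result = nullptr;
    cudaError_t err = cudaMalloc(&d_states, n * sizeof(curandState));
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_result, n * sizeof(float));
    assert(err == cudaSuccess);

    SetupPerThread<<<blocks, threads>>>(d_states, seed);
    GeneratePerThread<<<blocks, threads>>>(d_states, d_result);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    // cudaMemcpy waits for the preceding kernels on the default stream.
    std::vector<float> host(n);
    err = cudaMemcpy(host.data(), d_result, n * sizeof(float),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_states);
    cudaFree(d_result);
    return host;
}
```

Calling generate_per_thread(1000, 256, 1234ULL) would correspond to the launch configuration in the question, with one state allocated for each of the 256,000 threads.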

To reduce the memory required when many blocks are involved, you could limit the number of blocks to a small value and have each thread generate multiple random numbers instead of one. The kernel below draws 10,000 values per thread from a local copy of its state and counts how many exceed 0.5:

__global__ void generate_uniform_kernel(curandState *state, 
                                unsigned int *result)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int count = 0;
    float x;
    /* Copy state to local memory for efficiency */
    curandState localState = state[id];
    /* Generate pseudo-random uniforms */
    for(int n = 0; n < 10000; n++) {
        x = curand_uniform(&localState);
        /* Check if > .5 */
        if(x > .5) {
            count++;
        }
    }
    /* Copy state back to global memory */
    state[id] = localState;
    /* Store results */
    result[id] += count;
}
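Applied to the question's goal of filling an array, the same idea could be sketched with a grid-stride loop, so that a small fixed set of states covers all 256,000 outputs. The names SetupStates, FillUniform, and fill_uniform are hypothetical, and the 4 x 256 launch shape in the usage note is only an example.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void SetupStates(curandState *states, unsigned long long seed) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, id, 0, &states[id]);
}

// Each thread owns exactly one state and writes every stride-th element.
__global__ void FillUniform(curandState *states, float *result, size_t n) {
    size_t id = blockIdx.x * blockDim.x + threadIdx.x;
    size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    // Work on a local copy of the state, as in the kernel above.
    curandState local = states[id];
    for (size_t i = id; i < n; i += stride) {
        result[i] = curand_uniform(&local);
    }
    // Save the advanced state so a later launch continues the sequence.
    states[id] = local;
}

// Hypothetical host helper: fills n samples using blocks * threads states.
// Error handling is reduced to assertions to keep the sketch short.
std::vector<float> fill_uniform(size_t n, int blocks, int threads,
                                unsigned long long seed) {
    assert(n > 0 && blocks > 0 && threads > 0);
    size_t num_states = static_cast<size_t>(blocks) * threads;

    curandState *d_states = nullptr;
    float *d_result = nullptr;
    cudaError_t err = cudaMalloc(&d_states, num_states * sizeof(curandState));
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_result, n * sizeof(float));
    assert(err == cudaSuccess);

    SetupStates<<<blocks, threads>>>(d_states, seed);
    FillUniform<<<blocks, threads>>>(d_states, d_result, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    // cudaMemcpy waits for the preceding kernels on the default stream.
    std::vector<float> host(n);
    err = cudaMemcpy(host.data(), d_result, n * sizeof(float),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_states);
    cudaFree(d_result);
    return host;
}
```

For instance, fill_uniform(256000, 4, 256, 1234ULL) would produce the 256,000 values from the question with 1,024 states, each thread generating 250 numbers from its own state.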

See the Device API Examples section of the cuRAND reference manual for complete examples of how to handle multiple blocks.
