在生成一个不同的范围Cuda的内核中的随机数 [英] Generating random number within Cuda kernel in a varying range

查看:571
本文介绍了在生成一个不同的范围Cuda的内核中的随机数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想生成CUDA内核中的随机数的随机数。我想从均匀分布,并在整数形式生成的随机数,从1到8的随机数开始将是为每个线程的不同。最多可产生随机数的范围也将发生变化从一个线程到另一个。在一个线程的范围内的最大可能低至2或在其他线程它可以高达8,但不高于。所以,我在下面提供的我想如何产生的数字为例:

I am trying to generate random number random numbers within the cuda kernel. I wish to generate the random numbers from uniform distribution and in the integer form, starting from 1 up to 8. The random numbers would be different for each of the threads. The range up to which random number can be generated would also vary from one thread to another. The maximum of the range in one thread might be as low as 2 or in the other thread it can be high as 8, but not higher than that. So, I am providing an example below of how I want the numbers to get generated :

In thread#1 --> maximum of the range is 2 and so the random number should be between 1 and 2
In thread#2 --> maximum of the range is 6  and so the random number should be between 1 and 6
In thread#3 --> maximum of the range is 5 and so the random number should be between 1 and 5

等等...

任何帮助将非常AP preciated。谢谢你。

Any help would be very much appreciated. Thank you.

推荐答案

编辑:我已经编辑我的答案来解决的一些不足,在其他的答案中指出(@tudorturcu)和评论

I've edited my answer to fix some of the deficiencies pointed out in the other answers (@tudorturcu) and comments.


  1. 使用CURAND生成一个制服
    分布

    0.0和1.0之间

  2. 然后由所需的范围(乘以这个最大值 - 最小
    值+ 0.999999)。

  3. 然后添加偏移量(+最小值)。

  4. 然后截断为整数。

像这样的事情在你的设备code:

Something like this in your device code:

int idx = threadIdx.x+blockDim.x*blockIdx.x;
// assume have already set up curand and generated state for each thread...
// assume ranges vary by thread index
float myrandf = curand_uniform(&(my_curandstate[idx]));
myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999);
myrandf += min_rand_int[idx];
int myrand = (int)truncf(myrandf);

您应该:​​

#include <math.h>

truncf

下面是一个完整的工作例如:

Here's a fully worked example:

$ cat t527.cu
#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>
#include <math.h>
#include <assert.h>
#define MIN 2
#define MAX 7
#define ITER 10000000

__global__ void setup_kernel(curandState *state){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  curand_init(1234, idx, 0, &state[idx]);
}

__global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int,  unsigned int *result){

  int idx = threadIdx.x + blockDim.x*blockIdx.x;

  int count = 0;
  while (count < n){
    float myrandf = curand_uniform(my_curandstate+idx);
    myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999);
    myrandf += min_rand_int[idx];
    int myrand = (int)truncf(myrandf);

    assert(myrand <= max_rand_int[idx]);
    assert(myrand >= min_rand_int[idx]);
    result[myrand-min_rand_int[idx]]++;
    count++;}
}

int main(){

  curandState *d_state;
  cudaMalloc(&d_state, sizeof(curandState));
  unsigned *d_result, *h_result;
  unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int;
  cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
  h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));
  cudaMalloc(&d_max_rand_int, sizeof(unsigned));
  h_max_rand_int = (unsigned *)malloc(sizeof(unsigned));
  cudaMalloc(&d_min_rand_int, sizeof(unsigned));
  h_min_rand_int = (unsigned *)malloc(sizeof(unsigned));
  cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));
  setup_kernel<<<1,1>>>(d_state);

  *h_max_rand_int = MAX;
  *h_min_rand_int = MIN;
  cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
  cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
  generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result);
  cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);
  printf("Bin:    Count: \n");
  for (int i = MIN; i <= MAX; i++)
    printf("%d    %d\n", i, h_result[i-MIN]);

  return 0;
}


$ nvcc -arch=sm_20 -o t527 t527.cu -lcurand
$ cuda-memcheck ./t527
========= CUDA-MEMCHECK
Bin:    Count:
2    1665496
3    1668130
4    1667644
5    1667435
6    1665026
7    1666269
========= ERROR SUMMARY: 0 errors
$

这篇关于在生成一个不同的范围Cuda的内核中的随机数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆