CUDA kernel - nested for loop

Problem description

Hello, I'm trying to write a CUDA kernel to perform the following piece of code.

for (n = 0; n < (total-1); n++)
{
  a = values[n];

  for ( i = n+1; i < total ; i++)
  {
    b = values[i] - a;
    c = b*b;

    if (c < 10)
      newvalues[i] = c;
  }
}
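
When checking a GPU version, it can help to have the loop above wrapped as a host function to compare against. The following is a minimal sketch, not code from the original post: the name `referenceCalc` and the use of `std::vector` are assumptions made for illustration, and `newvalues` is assumed to be at least as long as `values`.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference for the nested loop above.
// `referenceCalc` is a hypothetical helper name, not from the original post.
// Entries of newvalues that are never written keep their incoming value.
void referenceCalc(const std::vector<float>& values, std::vector<float>& newvalues) {
    const std::size_t total = values.size();
    for (std::size_t n = 0; n + 1 < total; ++n) {
        const float a = values[n];
        for (std::size_t i = n + 1; i < total; ++i) {
            const float b = values[i] - a;
            const float c = b * b;
            if (c < 10.0f) {
                newvalues[i] = c;  // a later n may overwrite this entry
            }
        }
    }
}
```

Copying the kernel's output back to the host and comparing it element by element with this reference is one way to see where the two diverge.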

This is what I have currently, but it does not seem to give the correct results. Does anyone know what I'm doing wrong? Cheers

__global__ void calc(int total, float *values, float *newvalues){

    float a, b, c;

    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Grid-stride loop over the outer index n.
    for (int n = idx; n < (total-1); n += blockDim.x*gridDim.x){
        a = values[n];

        for (int i = n+1; i < total; i++){
            b = values[i] - a;
            c = b*b;

            if (c < 10)
                newvalues[i] = c;
        }
    }
}


Recommended answer

Treat this as a 2D problem and launch your kernel with 2D thread blocks. The number of threads in each of the x and y dimensions should be at least total. The kernel code should look like this:

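__global__ void calc(float *values, float *newvalues, int total){

    float a, b, c;

    // One thread per (n, i) pair: n comes from the y dimension, i from the x dimension.
    int n = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The grid is rounded up, so skip threads that fall outside the data.
    // Note: the original loop only visits i > n; add "|| i <= n" here to match it.
    if (n >= total || i >= total)
        return;

    a = values[n];
    b = values[i] - a;
    c = b*b;
    if (c < 10)
        newvalues[i] = c;

    // I don't know your problem statement, but I think it should be something like: newvalues[n*total+i] = c;

}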
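
One reason the suggestion to write to newvalues[n*total+i] may matter: with a 1D output, several values of n can write the same newvalues[i], so the result depends on which write lands last, and the order in which GPU threads run is not defined. The host-side sketch below illustrates this by visiting n in ascending and then descending order. The function name `calcWithOrder` and the -1 "never written" marker are assumptions for illustration, not part of the original post.

```cpp
#include <vector>

// Runs the pairwise update with the outer index n visited in ascending or
// descending order. `calcWithOrder` is a hypothetical name for illustration.
std::vector<float> calcWithOrder(const std::vector<float>& values, bool ascending) {
    const int total = static_cast<int>(values.size());
    std::vector<float> newvalues(total, -1.0f);  // -1 marks "never written"
    for (int step = 0; step < total - 1; ++step) {
        const int n = ascending ? step : (total - 2 - step);
        for (int i = n + 1; i < total; ++i) {
            const float b = values[i] - values[n];
            const float c = b * b;
            if (c < 10.0f) {
                newvalues[i] = c;  // last writer wins
            }
        }
    }
    return newvalues;
}
```

For the input {0, 1, 2}, the last element is 1 when n ascends and 4 when n descends, because both n = 0 and n = 1 write to index 2. Giving every (n, i) pair its own output slot removes that dependence on ordering.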

Update:

This is how you should call the kernel:

dim3 block(16, 16);
dim3 grid((total + 15) / 16, (total + 15) / 16);
// val and newval are device pointers (float *); T is the element count (total).
calc<<<grid, block>>>(val, newval, T);

Also make sure you add this bounds check to the kernel (see the updated kernel above):

if (n>=total || i>=total)
return;
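
The bounds check is needed because (total+15)/16 rounds the block count up, so the grid usually contains more threads than there are elements. A small host-side sketch of that arithmetic follows; `blocksFor` is a hypothetical helper name, not something from the original answer.

```cpp
// Number of blocks needed to cover `total` threads with `blockSize` threads
// per block, rounding up. `blocksFor` is a hypothetical helper name.
constexpr int blocksFor(int total, int blockSize) {
    return (total + blockSize - 1) / blockSize;
}
```

With total = 100 and a block size of 16 this gives 7 blocks per dimension, which is 112 threads, so the last 12 indices in each dimension fall outside the data and must return early.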
