Cuda triple nested for loop assignment

Views: 659

Problem Description


I'm trying to convert C++ code into CUDA code, and I've got the following triple nested for loop that fills an array for further OpenGL rendering (I'm simply creating an array of coordinate vertices):

for (int z = 0; z < 263; z++) {
    for (int y = 0; y < 170; y++) {
        for (int x = 0; x < 170; x++) {
            g_vertex_buffer_data_3[i]   = (float)x + 0.5f;
            g_vertex_buffer_data_3[i+1] = (float)y + 0.5f;
            g_vertex_buffer_data_3[i+2] = -(float)z + 0.5f;
            i += 3;
        }
    }
}


I would like to get faster operations, so I'll use CUDA for operations like the one listed above. I want to create one block for each iteration of the outermost loop and, since the inner loops have 170 * 170 = 28900 total iterations, assign one thread to each innermost-loop iteration. I converted the C++ code into this (it's just a small program I made to understand how to use CUDA):

__global__ void mykernel(int k, float *buffer) {
    int idz = blockIdx.x;
    int idx = threadIdx.x;
    int idy = threadIdx.y;

    buffer[k]   = idx + 0.5;
    buffer[k+1] = idy + 0.5;
    buffer[k+2] = idz + 0.5;
    k += 3;
}

int main(void) {
  int dim=3*170*170*263;
  float* g_vertex_buffer_data_2 = new float[dim];
  float* g_vertex_buffer_data_3;
  int i=0;

  HANDLE_ERROR(cudaMalloc((void**)&g_vertex_buffer_data_3, sizeof(float)*dim));

  dim3 dimBlock(170, 170);

  dim3 dimGrid(263);

  mykernel<<<dimGrid, dimBlock>>>(i, g_vertex_buffer_data_3);

  HANDLE_ERROR(cudaMemcpy(&g_vertex_buffer_data_2,g_vertex_buffer_data_3,sizeof(float)*dim,cudaMemcpyDeviceToHost));

  for(int j=0;j<100;j++){
    printf("g_vertex_buffer_data_2[%d]=%f\n",j,g_vertex_buffer_data_2[j]);
  }
  cudaFree(g_vertex_buffer_data_3);

  return 0;

}


Trying to launch it I get a segmentation fault. Do you know what I am doing wrong? I think the problem is that threadIdx.x and threadIdx.y grow at the same time, while I would like threadIdx.x to be the inner one and threadIdx.y to be the outer one.

Answer


There is a lot wrong here, but the source of the segfault is this:

cudaMemcpy(&g_vertex_buffer_data_2,g_vertex_buffer_data_3,
                sizeof(float)*dim,cudaMemcpyDeviceToHost);

You want either:

cudaMemcpy(&g_vertex_buffer_data_2[0], g_vertex_buffer_data_3,
                sizeof(float)*dim, cudaMemcpyDeviceToHost);

or:

cudaMemcpy(g_vertex_buffer_data_2, g_vertex_buffer_data_3,
                sizeof(float)*dim, cudaMemcpyDeviceToHost);


Once you fix that, you will notice that the kernel never actually launches, failing with an invalid launch error. This is because a block size of (170, 170) is illegal: CUDA has a limit of 1024 threads per block on all current hardware.


There might well be other problems in your code. I stopped looking after I found these two.
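One of those other problems is that every thread receives the same k, so all 28900 threads in a block write to the same three slots. A corrected kernel sketch (my assumption, not part of the original answer: each thread derives its own offset from its indices, using a 170 × 263 grid of 170-thread blocks) could look like this:

```
// Sketch only: assumes a CUDA-capable device and error checking via HANDLE_ERROR.
__global__ void mykernel(float *buffer) {
    int x = threadIdx.x;  // innermost loop, 0..169
    int y = blockIdx.x;   // middle loop,    0..169
    int z = blockIdx.y;   // outermost loop, 0..262

    // Each thread computes its own offset instead of sharing k.
    int k = 3 * ((z * 170 + y) * 170 + x);
    buffer[k]     = (float)x + 0.5f;
    buffer[k + 1] = (float)y + 0.5f;
    buffer[k + 2] = -(float)z + 0.5f;  // matches the CPU loop's sign on z
}

// Launch: 170 x 263 blocks of 170 threads, within the 1024-thread limit.
// mykernel<<<dim3(170, 263), 170>>>(g_vertex_buffer_data_3);
```

The offset formula mirrors how the sequential loops advanced i: z is the slowest-varying index, x the fastest.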
