Cuda grid size limitations

Published: 2020/10/13 1:42:14 | Tags: cuda

Question

Are there limitations on what I can set the grid size of a CUDA kernel to? I ran into a problem where kernels would not launch with a grid size of 33 x 33 but did launch with a grid size of 32 x 32. Is there a reason for this? Or is it likely that changing the number of blocks from 32 x 32 to 33 x 33 broke some other constraint?

dim3 blockSize(8, 8);
dim3 gridSize(32, 32);

cudaDeviceSynchronize();
set_start<<<gridSize, blockSize>>>(some_params);

The above works.

dim3 blockSize(8, 8);
dim3 gridSize(33, 33);

cudaDeviceSynchronize();
set_start<<<gridSize, blockSize>>>(some_params);

The above does not work.

Kernel and main:

__global__
void set_start(double * const H, double * const HU, double * const HV,
               double * const E, const int Na)
{
    int j = threadIdx.x + blockIdx.x*blockDim.x + 1;
    int i = threadIdx.y + blockIdx.y*blockDim.y + 1;

    if (i >= Na-1 || j >= Na-1)
        return;

    H[i*Na+j]  = 1.0 + exp(-100.0*((E[j-1]-0.75)*(E[j-1]-0.75) + (E[i-1]-0.75)*(E[i-1]-0.75)))
               + 0.5*exp(-100.0*((E[j-1]-0.75)*(E[j-1]-0.75) + (E[i-1]-0.25)*(E[i-1]-0.25)));
    HU[i*Na+j] = 0;
    HV[i*Na+j] = 0;
}

int main(int argc, char** argv){

    double* E_d;
    cudaMalloc(&E_d, sizeof(double) * (Nh+1));
    set_E<<<64, (Nh/64) + 1>>>(E_d, dx, Nh);

    int Na = 259;
    double *H_d, *HU_d, *HV_d, *Ht_d, *HUt_d, *HVt_d;

    cudaMalloc(&H_d , sizeof(double) * Na * Na);
    cudaMalloc(&HU_d, sizeof(double) * Na * Na);
    cudaMalloc(&HV_d, sizeof(double) * Na * Na);

    dim3 blockSize(8, 8);
    //dim3 gridSize(((Na-1)/blockSize.x) + 1, ((Na-1)/blockSize.y) + 1);
    //dim3 gridSize(33, 33);
    dim3 gridSize(32, 32);

    cudaDeviceSynchronize();
    set_start<<<blockSize, gridSize>>>(H_d, HU_d, HV_d, E_d, Na);
}

This was on CUDA 7.0.

Recommended answer

You have block size and grid size mixed up when calling the kernel.

set_start<<<blockSize, gridSize>>>(H_d, HU_d, HV_d, E_d, Na);

It should read:

set_start<<<gridSize, blockSize>>>(H_d, HU_d, HV_d, E_d, Na);
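With the arguments in this order, the grid can also be derived from Na instead of being hard-coded. The sketch below is plain host-side C++, so it compiles inside a .cu file or with any C++ compiler. The helper names `ceilDiv`, `setStartGridDim` and `indicesCovered` are hypothetical and are not CUDA API. The sketch assumes, as the kernel above implies (the +1 offset on the thread index and the `>= Na-1` guard), that each dimension must cover the interior indices 1 through Na-2. Under that assumption, Na = 259 needs ceil(257 / 8) = 33 blocks of 8 per dimension. This is also what the commented-out `((Na-1)/blockSize.x) + 1` line in the question computes. A 32 x 32 grid of 8 x 8 blocks would leave index 257 unwritten.

```cpp
#include <cassert>

// Ceiling division: the smallest number of blocks of `block` threads that covers `n` threads.
// Hypothetical helper, not part of the CUDA runtime API.
unsigned ceilDiv(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Blocks per dimension for set_start, assuming each dimension must cover the
// interior indices 1 .. Na-2, i.e. Na-2 indices in total.
unsigned setStartGridDim(int Na, unsigned block) {
    assert(Na >= 3);
    return ceilDiv(static_cast<unsigned>(Na - 2), block);
}

// Emulates, on the host, the kernel's index mapping for one dimension and counts
// how many interior indices a launch of numBlocks x threadsPerBlock would write.
int indicesCovered(int Na, unsigned numBlocks, unsigned threadsPerBlock) {
    int covered = 0;
    for (unsigned b = 0; b < numBlocks; ++b) {
        for (unsigned t = 0; t < threadsPerBlock; ++t) {
            int j = static_cast<int>(t + b * threadsPerBlock) + 1;  // same mapping as the kernel
            if (j >= Na - 1) continue;                               // same guard as the kernel
            ++covered;
        }
    }
    return covered;
}
```

For Na = 259 and 8-thread blocks, `setStartGridDim(259, 8)` gives 33. `indicesCovered(259, 32, 8)` gives 256 of the 257 interior indices, while `indicesCovered(259, 33, 8)` gives all 257. In the question's main, the grid could then be written as `dim3 gridSize(setStartGridDim(Na, blockSize.x), setStartGridDim(Na, blockSize.y));`.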

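Because of the swapped arguments, you are actually trying to launch a grid of size blockSize made of blocks of size gridSize. The maximum size of a block on your GPU is evidently 1024 threads, which is the limit on devices of compute capability 2.0 and later. Blocks of 32 x 32 = 1024 threads therefore launch, but blocks of 33 x 33 = 1089 threads exceed the limit, so the launch fails.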
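The failure can be reproduced without a GPU by checking a launch configuration against the device limits. The sketch below is host-only C++. `Dim3` is a hypothetical stand-in for CUDA's `dim3`, and `launchFits` is a hypothetical helper. The limit values are hard-coded assumptions for compute capability 3.0 and later: at most 1024 threads per block, a block shape of at most 1024 x 1024 x 64, and a grid of at most 2^31-1 x 65535 x 65535. Real code should read these values from the `maxThreadsPerBlock`, `maxThreadsDim` and `maxGridSize` fields that `cudaGetDeviceProperties` fills in. Calling `cudaGetLastError()` right after the launch would also have reported the rejected configuration as `cudaErrorInvalidConfiguration`.

```cpp
// Hypothetical stand-in for CUDA's dim3 (unspecified components default to 1),
// so the check compiles as ordinary host C++.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Assumed limits for compute capability 3.0 and later. On a real device, read them
// from cudaDeviceProp (maxThreadsPerBlock, maxThreadsDim, maxGridSize).
constexpr unsigned long long kMaxThreadsPerBlock = 1024;
constexpr unsigned kMaxThreadsDim[3] = {1024, 1024, 64};
constexpr unsigned kMaxGridSize[3] = {2147483647u, 65535u, 65535u};

// A dimension triple is valid if every component lies in 1..limit.
static bool withinLimits(const Dim3& d, const unsigned (&limit)[3]) {
    return d.x >= 1 && d.y >= 1 && d.z >= 1 &&
           d.x <= limit[0] && d.y <= limit[1] && d.z <= limit[2];
}

// Arguments follow the order of <<<grid, block>>>: grid first, then block.
bool launchFits(const Dim3& grid, const Dim3& block) {
    if (!withinLimits(grid, kMaxGridSize) || !withinLimits(block, kMaxThreadsDim)) {
        return false;
    }
    // The per-block thread limit applies to the product of the block dimensions.
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    return threads <= kMaxThreadsPerBlock;
}
```

Passing the question's swapped values reproduces the observed behaviour. `launchFits(Dim3(8, 8), Dim3(32, 32))` is true (1024 threads per block), while `launchFits(Dim3(8, 8), Dim3(33, 33))` is false (1089 threads per block). In the intended order, `launchFits(Dim3(33, 33), Dim3(8, 8))` is true. This models only the static limits. A kernel that uses many registers per thread can still fail with a smaller block (`cudaErrorLaunchOutOfResources`).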
