在输入数据中使用各种偏移量时,CUDA 内核启动失败 [英] CUDA kernel launch fails when using various offsets into input data

查看:13
本文介绍了在输入数据中使用各种偏移量时,CUDA 内核启动失败的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的代码给出了一条错误消息,我正在尝试追查它的原因.为了更容易找到问题,我删除了显然与导致错误消息无关的代码.如果您能告诉我为什么以下简单代码会产生错误消息,那么我认为我应该能够修复我的原始代码:

My code is giving an error message and I am trying to track down the cause of it. To make it easier to find the problem, I have stripped away code that apparently is not relevant to causing the error message. If you can tell me why the following simple code produces an error message, then I think I should be able to fix my original code:

#include "cuComplex.h"
#include <cutil.h>

__device__ void compute_energy(void *data, int isample, int nsamples) {
  cuDoubleComplex * const nminusarray          = (cuDoubleComplex*)data;
  cuDoubleComplex * const f                    = (cuDoubleComplex*)(nminusarray+101);
  double          * const abs_est_errorrow_all = (double*)(f+3);
  double          * const rel_est_errorrow_all = (double*)(abs_est_errorrow_all+nsamples*51);
  int             * const iid_all              = (int*)(rel_est_errorrow_all+nsamples*51);
  int             * const iiu_all              = (int*)(iid_all+nsamples*21);
  int             * const piv_all              = (int*)(iiu_all+nsamples*21);
  cuDoubleComplex * const energyrow_all        = (cuDoubleComplex*)(piv_all+nsamples*12);
  cuDoubleComplex * const refinedenergyrow_all = (cuDoubleComplex*)(energyrow_all+nsamples*51);
  cuDoubleComplex * const btplus_all           = (cuDoubleComplex*)(refinedenergyrow_all+nsamples*51);

  cuDoubleComplex * const btplus           = btplus_all+isample*21021;

  btplus[0] = make_cuDoubleComplex(0.0, 0.0);
}

__global__ void computeLamHeight(void *data, int nlambda) {
  compute_energy(data, blockIdx.x, nlambda);
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 184465640));
  computeLamHeight<<<dim3(101, 1, 1), dim3(512, 1, 1), 45000>>>(device_data, 101);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

我使用的是 GeForce GTX 480,我正在编译代码:

I am using a GeForce GTX 480 and I am compiling the code like so:

nvcc -L /soft/cuda-sdk/4.0.17/C/lib -I /soft/cuda-sdk/4.0.17/C/common/inc -lcutil_x86_64 -arch sm_13 -O3 -Xopencc "-Wall" Main.cu

输出是:

Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 31 : unspecified launch failure.

我现在进一步简化了代码.以下更简单的代码仍然会产生错误消息:

I have now further simplified the code. The following simpler code still produces the error message:

#include <cutil.h>

__global__ void compute_energy(void *data) {
  *(double*)((int*)data+101) = 0.0;
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 101*sizeof(int)+sizeof(double)));
  compute_energy<<<dim3(1, 1, 1), dim3(1, 1, 1)>>>(device_data);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

现在很容易看出偏移量应该是有效的.我尝试运行 cuda-memcheck,它显示以下内容:

Now it is easy to see that the offset should be valid. I tried running cuda-memcheck and it says the following:

========= CUDA-MEMCHECK
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 13 : unspecified launch failure.
========= Invalid __global__ write of size 8
=========     at 0x00000020 in compute_energy
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x200200194 is misaligned
=========
========= ERROR SUMMARY: 1 error

我尝试搜索互联网以查找地址未对齐的含义,但我未能找到解释.什么交易?

I tried searching the internet to find what is meant by the address being misaligned, but I failed to find an explanation. What is the deal?

推荐答案

用所有这些神奇的常量解析你的原始代码非常困难,但是你更新的 repro 案例让问题立即变得明显.GPU 架构要求所有指针与字边界对齐.您的内核包含一个未正确字对齐的指针访问.双精度是 64 位类型,并且您的寻址未与偶数 64 位边界对齐.这个:

It was very hard to parse your original code with all of those magic constants, but your updated repro case makes the problem immediately obvious. The GPU architecture requires all pointers to be aligned to word boundaries. Your kernel contains a pointer access which is not correctly word aligned. Doubles are an 64 bit type, and your addressing is not aligned to an even 64 bit boundary. This:

*(double*)((int*)data+100) = 0.0; // 50th double

或者这个:

*(double*)((int*)data+102) = 0.0; // 51st double

都是合法的.这个:

*(double*)((int*)data+101) = 0.0; // not aligned to a 64 bit boundary

不是.

这篇关于在输入数据中使用各种偏移量时,CUDA 内核启动失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆