在输入数据中使用各种偏移量时,CUDA内核启动失败 [英] CUDA kernel launch fails when using various offsets into input data

查看:82
本文介绍了在输入数据中使用各种偏移量时,CUDA内核启动失败的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的代码给出了一条错误消息,并且我试图找出原因.为了更容易发现问题,我剥离了显然与导致错误消息无关的代码.如果您可以告诉我以下简单代码为什么会产生错误消息,那么我认为我应该能够修复我的原始代码:

My code is giving an error message and I am trying to track down the cause of it. To make it easier to find the problem, I have stripped away code that apparently is not relevant to causing the error message. If you can tell me why the following simple code produces an error message, then I think I should be able to fix my original code:

#include "cuComplex.h"
#include <cutil.h>

__device__ void compute_energy(void *data, int isample, int nsamples) {
  cuDoubleComplex * const nminusarray          = (cuDoubleComplex*)data;
  cuDoubleComplex * const f                    = (cuDoubleComplex*)(nminusarray+101);
  double          * const abs_est_errorrow_all = (double*)(f+3);
  double          * const rel_est_errorrow_all = (double*)(abs_est_errorrow_all+nsamples*51);
  int             * const iid_all              = (int*)(rel_est_errorrow_all+nsamples*51);
  int             * const iiu_all              = (int*)(iid_all+nsamples*21);
  int             * const piv_all              = (int*)(iiu_all+nsamples*21);
  cuDoubleComplex * const energyrow_all        = (cuDoubleComplex*)(piv_all+nsamples*12);
  cuDoubleComplex * const refinedenergyrow_all = (cuDoubleComplex*)(energyrow_all+nsamples*51);
  cuDoubleComplex * const btplus_all           = (cuDoubleComplex*)(refinedenergyrow_all+nsamples*51);

  cuDoubleComplex * const btplus           = btplus_all+isample*21021;

  btplus[0] = make_cuDoubleComplex(0.0, 0.0);
}

__global__ void computeLamHeight(void *data, int nlambda) {
  compute_energy(data, blockIdx.x, nlambda);
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 184465640));
  computeLamHeight<<<dim3(101, 1, 1), dim3(512, 1, 1), 45000>>>(device_data, 101);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

我正在使用GeForce GTX 480,并且正在像这样编译代码:

I am using a GeForce GTX 480 and I am compiling the code like so:

nvcc -L /soft/cuda-sdk/4.0.17/C/lib -I /soft/cuda-sdk/4.0.17/C/common/inc -lcutil_x86_64 -arch sm_13 -O3 -Xopencc "-Wall" Main.cu

输出为:

Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 31 : unspecified launch failure.

我现在进一步简化了代码.以下更简单的代码仍会生成错误消息:

I have now further simplified the code. The following simpler code still produces the error message:

#include <cutil.h>

__global__ void compute_energy(void *data) {
  *(double*)((int*)data+101) = 0.0;
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 101*sizeof(int)+sizeof(double)));
  compute_energy<<<dim3(1, 1, 1), dim3(1, 1, 1)>>>(device_data);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

现在很容易看出偏移量是有效的.我尝试运行cuda-memcheck,它显示以下内容:

Now it is easy to see that the offset should be valid. I tried running cuda-memcheck and it says the following:

========= CUDA-MEMCHECK
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 13 : unspecified launch failure.
========= Invalid __global__ write of size 8
=========     at 0x00000020 in compute_energy
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x200200194 is misaligned
=========
========= ERROR SUMMARY: 1 error

我尝试在Internet上查找未对齐地址的含义,但是找不到解释.怎么了?

I tried searching the internet to find what is meant by the address being misaligned, but I failed to find an explanation. What is the deal?

推荐答案

很难用所有这些魔术常量来解析原始代码,但是更新后的repro案例使问题立即显而易见. GPU体系结构要求所有指针都与单词边界对齐.您的内核包含未正确字对齐的指针访问.双打是64位类型,并且您的地址未与偶数64位边界对齐.这个:

It was very hard to parse your original code with all of those magic constants, but your updated repro case makes the problem immediately obvious. The GPU architecture requires all pointers to be aligned to word boundaries. Your kernel contains a pointer access which is not correctly word aligned. Doubles are an 64 bit type, and your addressing is not aligned to an even 64 bit boundary. This:

*(double*)((int*)data+100) = 0.0; // 50th double

或者这个:

*(double*)((int*)data+102) = 0.0; // 51st double

都是合法的.这个:

*(double*)((int*)data+101) = 0.0; // not aligned to a 64 bit boundary

不是.

这篇关于在输入数据中使用各种偏移量时,CUDA内核启动失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆