CUDA:“未指定的启动失败” [英] CUDA: "unspecified launch failure"

查看:163
本文介绍了CUDA:“未指定的启动失败”的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的代码正在给出错误消息,我正在尝试跟踪其原因。为了更容易找到问题,我已经剥离显然与导致错误消息不相关的代码。如果你能告诉我为什么下面的简单代码产生一个错误消息,那么我想我应该能够修复我的原始代码:

My code is giving an error message and I am trying to track down the cause of it. To make it easier to find the problem, I have stripped away code that apparently is not relevant to causing the error message. If you can tell me why the following simple code produces an error message, then I think I should be able to fix my original code:

#include "cuComplex.h"
#include <cutil.h>

__device__ void compute_energy(void *data, int isample, int nsamples) {
  cuDoubleComplex * const nminusarray          = (cuDoubleComplex*)data;
  cuDoubleComplex * const f                    = (cuDoubleComplex*)(nminusarray+101);
  double          * const abs_est_errorrow_all = (double*)(f+3);
  double          * const rel_est_errorrow_all = (double*)(abs_est_errorrow_all+nsamples*51);
  int             * const iid_all              = (int*)(rel_est_errorrow_all+nsamples*51);
  int             * const iiu_all              = (int*)(iid_all+nsamples*21);
  int             * const piv_all              = (int*)(iiu_all+nsamples*21);
  cuDoubleComplex * const energyrow_all        = (cuDoubleComplex*)(piv_all+nsamples*12);
  cuDoubleComplex * const refinedenergyrow_all = (cuDoubleComplex*)(energyrow_all+nsamples*51);
  cuDoubleComplex * const btplus_all           = (cuDoubleComplex*)(refinedenergyrow_all+nsamples*51);

  cuDoubleComplex * const btplus           = btplus_all+isample*21021;

  btplus[0] = make_cuDoubleComplex(0.0, 0.0);
}

__global__ void computeLamHeight(void *data, int nlambda) {
  compute_energy(data, blockIdx.x, nlambda);
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 184465640));
  computeLamHeight<<<dim3(101, 1, 1), dim3(512, 1, 1), 45000>>>(device_data, 101);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}



我使用的是GeForce GTX 480,我编译的代码如下:

I am using a GeForce GTX 480 and I am compiling the code like so:

nvcc -L /soft/cuda-sdk/4.0.17/C/lib -I /soft/cuda-sdk/4.0.17/C/common/inc -lcutil_x86_64 -arch sm_13 -O3 -Xopencc "-Wall" Main.cu


$ b b

输出为:

The output is:

Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 31 : unspecified launch failure.

编辑:我现在进一步简化了代码。以下简单代码仍会生成错误消息:

I have now further simplified the code. The following simpler code still produces the error message:

#include <cutil.h>

__global__ void compute_energy(void *data) {
  *(double*)((int*)data+101) = 0.0;
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 101*sizeof(int)+sizeof(double)));
  compute_energy<<<dim3(1, 1, 1), dim3(1, 1, 1)>>>(device_data);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

现在很容易看到偏移量应该有效。我试过运行cuda-memcheck,它说如下:

Now it is easy to see that the offset should be valid. I tried running cuda-memcheck and it says the following:

========= CUDA-MEMCHECK
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 13 : unspecified launch failure.
========= Invalid __global__ write of size 8
=========     at 0x00000020 in compute_energy
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x200200194 is misaligned
=========
========= ERROR SUMMARY: 1 error

我尝试在互联网上搜索找到地址意味着什么是错位,但我没有找到解释。什么是交易?

I tried searching the internet to find what is meant by the address being misaligned, but I failed to find an explanation. What is the deal?

推荐答案

用所有这些魔法常量解析原始代码非常困难,情况使问题立即显现。 GPU体系结构要求所有指针与字边界对齐。你的内核包含一个不正确的字对齐的指针访问。双字是64位类型,并且您的寻址未与偶数64位边界对齐。这:

It was very hard to parse your original code with all of those magic constants, but your updated repro case makes the problem immediately obvious. The GPU architecture requires all pointers to be aligned to word boundaries. Your kernel contains a pointer access which is not correctly word aligned. Doubles are an 64 bit type, and your addressing is not aligned to an even 64 bit boundary. This:

*(double*)((int*)data+100) = 0.0; // 50th double

或:

*(double*)((int*)data+102) = 0.0; // 51st double

都是合法的。这:

*(double*)((int*)data+101) = 0.0; // not aligned to a 64 bit boundary

不是。

这篇关于CUDA:“未指定的启动失败”的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆