CUDA_ERROR_INVALID_IMAGE during cuModuleLoad
Question
I've created a very simple kernel (can be found here), which I compile successfully using:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -cudart static -cubin temp.cu
I then load the resulting cubin with the following code:
CUresult err = cuInit(0);
CUdevice device;
err = cuDeviceGet(&device, 0);
CUcontext ctx;
err = cuCtxCreate(&ctx, 0, device);
CUmodule module;
string path = string(dir) + "\\temp.cubin";
err = cuModuleLoad(&module, path.c_str());
cuCtxDetach(ctx);
Unfortunately, cuModuleLoad returns CUDA_ERROR_INVALID_IMAGE. Can someone tell me why this could be happening? The kernel is valid and compiles without issues.
Recommended answer
cuModuleLoad should only return CUDA_ERROR_INVALID_IMAGE when the module file is invalid. If the file is missing or was built for a mismatched architecture, you should probably see CUDA_ERROR_FILE_NOT_FOUND or CUDA_ERROR_INVALID_SOURCE instead. You haven't given us enough details or code to say for certain what is happening, but in principle at least, the API code you have should work.
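The test program later in this answer prints CUresult values as raw numbers, so a small lookup helps when reading its output. The following is a minimal sketch and not part of the CUDA API: cuResultName is a hypothetical helper name, it takes a plain int so that it compiles without cuda.h, and it covers only the codes discussed here, using the numeric values listed for CUresult in cuda.h.

```cpp
#include <string>

// Hypothetical helper (not a CUDA API call): maps the numeric CUresult
// values discussed in this answer to their symbolic names.
// The values are the ones listed for CUresult in cuda.h.
std::string cuResultName(int code)
{
    switch (code) {
    case 0:   return "CUDA_SUCCESS";
    case 200: return "CUDA_ERROR_INVALID_IMAGE";
    case 300: return "CUDA_ERROR_INVALID_SOURCE";
    case 301: return "CUDA_ERROR_FILE_NOT_FOUND";
    default:  return "unlisted code " + std::to_string(code);
    }
}
```

Inside an error-checking routine, the numeric code could be passed through this function (after a cast to int) so that the log shows the name next to the number.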
To show how this should work, consider the following working example on Linux with CUDA 5.5:
First, your kernel:
#include <cmath>
using namespace std;
__device__ __inline__ float trim(unsigned char value)
{
return fminf((unsigned char)255, fmaxf(value, (unsigned char)0));
}
__constant__ char z = 1;
__global__ void kernel(unsigned char* img, const float* a)
{
int ix = blockIdx.x;
int iy = threadIdx.x;
int tid = iy*blockDim.x + ix;
float x = (float)ix / blockDim.x;
float y = (float)iy / gridDim.x;
//placeholder
img[tid*4+0] = trim((a[0]*z*z+a[1]*z+a[2]) * 255.0f);
img[tid*4+1] = trim((a[3]*z*z+a[4]*z+a[5]) * 255.0f);
img[tid*4+2] = trim((a[6]*z*z+a[7]*z+a[8]) * 255.0f);
img[tid*4+3] = 255;
}
Then a simple program to load the cubin into a context at runtime:
#include <cuda.h>
#include <string>
#include <iostream>
#define Errchk(ans) { DrvAssert((ans), __FILE__, __LINE__); }
inline void DrvAssert( CUresult code, const char *file, int line)
{
if (code != CUDA_SUCCESS) {
std::cout << "Error: " << code << " " << file << "@" << line << std::endl;
exit(code);
} else {
std::cout << "Success: " << file << "@" << line << std::endl;
}
}
int main(void)
{
Errchk( cuInit(0) );
CUdevice device;
Errchk( cuDeviceGet(&device, 0) );
CUcontext ctx;
Errchk( cuCtxCreate(&ctx, 0, device) );
CUmodule module;
std::string path = "qkernel.cubin";
Errchk( cuModuleLoad(&module, path.c_str()) );
cuCtxDetach(ctx);
return 0;
}
Build the cubin for the architecture of the device present in the host (a GTX670 in this case):
$ nvcc -arch=sm_30 -Xptxas="-v" --cubin qkernel.cu
ptxas info : 11 bytes gmem, 1 bytes cmem[3]
ptxas info : Compiling entry function '_Z6kernelPhPKf' for 'sm_30'
ptxas info : Function properties for _Z6kernelPhPKf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 10 registers, 336 bytes cmem[0]
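The -arch value above follows from the device's compute capability (3.0 for the GTX670, hence sm_30). As a hedged illustration of that mapping, the sketch below builds the flag from a major and minor version. archFlag is a hypothetical helper name, and the two numbers are assumed to come from a device query made elsewhere.

```cpp
#include <string>

// Hypothetical helper: builds the nvcc architecture flag for a given
// compute capability, e.g. major 3 and minor 0 give "-arch=sm_30".
// The major/minor values are assumed to come from a device query.
std::string archFlag(int major, int minor)
{
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}
```

A cubin built with a flag that does not match the device in the machine is what the incompatible-architecture case further down demonstrates.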
Then build the host program:
$ nvcc -o qexe qmain.cc -lcuda
Then run it:
$ ./qexe
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Success: qmain.cc@26
The module code loads. If I delete the cubin and run again, I see error 301 (CUDA_ERROR_FILE_NOT_FOUND):
$ rm qkernel.cubin
$ ./qexe
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 301 qmain.cc@26
If I compile for an incompatible architecture, I see error 300 (CUDA_ERROR_INVALID_SOURCE):
$ nvcc -arch=sm_10 -Xptxas="-v" --cubin qkernel.cu
ptxas info : 0 bytes gmem, 1 bytes cmem[0]
ptxas info : Compiling entry function '_Z6kernelPhPKf' for 'sm_10'
ptxas info : Used 5 registers, 32 bytes smem, 4 bytes cmem[1]
$ ./qexe
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 300 qmain.cc@26
If I compile to an object file rather than a cubin, I see error 200 (CUDA_ERROR_INVALID_IMAGE):
$ nvcc -arch=sm_30 -Xptxas="-v" -c -o qkernel.cubin qkernel.cu
ptxas info : 11 bytes gmem, 1 bytes cmem[3]
ptxas info : Compiling entry function '_Z6kernelPhPKf' for 'sm_30'
ptxas info : Function properties for _Z6kernelPhPKf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 10 registers, 336 bytes cmem[0]
$ ./qexe
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 200 qmain.cc@26
This is the only way I can get the code to emit a CUDA_ERROR_INVALID_IMAGE error. All I can suggest is that you try my code and recipe and see whether you can get it to work.
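Since the one reproducible cause here was a file that is not really a cubin, a quick sanity check on the file itself may help before calling cuModuleLoad. The sketch below is a heuristic under stated assumptions, not a documented validation procedure: it assumes a cubin is a little-endian ELF file whose e_machine field is EM_CUDA (190), and the function names looksLikeCubin and readHeader are made up for illustration. A host object file on x86-64 Linux would normally carry a different machine type, so it should fail this check. Passing the check does not guarantee that the module will load.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Assumption: cubin files are ELF images with e_machine == EM_CUDA (190).
const std::uint16_t kEmCuda = 190;

// Returns true if the buffer begins with an ELF header whose machine type
// is EM_CUDA. Only the first 20 bytes are inspected.
bool looksLikeCubin(const std::vector<unsigned char>& bytes)
{
    if (bytes.size() < 20) return false;
    // ELF magic number: 0x7F 'E' 'L' 'F'
    if (bytes[0] != 0x7F || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F')
        return false;
    // e_machine is a 16-bit field at offset 18 (little-endian assumed).
    const std::uint16_t machine =
        static_cast<std::uint16_t>(bytes[18] | (bytes[19] << 8));
    return machine == kEmCuda;
}

// Reads up to the first 20 bytes of a file; returns fewer bytes (possibly
// none) if the file is shorter or cannot be opened.
std::vector<unsigned char> readHeader(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::vector<unsigned char> bytes(20);
    in.read(reinterpret_cast<char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
    bytes.resize(static_cast<std::size_t>(in.gcount()));
    return bytes;
}
```

Calling looksLikeCubin(readHeader(path)) just before cuModuleLoad would flag a file that was produced with -c instead of --cubin, or one that was truncated or overwritten.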