CUDA_ERROR_INVALID_IMAGE during cuModuleLoad

Problem description

I've created a very simple kernel (can be found here), which compiles successfully with:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -cudart static -cubin temp.cu

I then use the following code to load the kernel:

CUresult err = cuInit(0);
CUdevice device;
err = cuDeviceGet(&device, 0);
CUcontext ctx;
err = cuCtxCreate(&ctx, 0, device);

CUmodule module;
string path = string(dir) + "\\temp.cubin";
err = cuModuleLoad(&module, path.c_str());

cuCtxDetach(ctx);
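The path in this snippet is built by concatenating a hand-written separator onto `dir`. In a C++ string literal, a single backslash followed by `t` is the tab escape, so this kind of concatenation is easy to get wrong. The sketch below, which assumes a C++17 compiler, joins the directory and file name with `std::filesystem::path` instead. `cubinPath` is a hypothetical helper, not part of the original code.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: join a directory and a file name using the
// platform's preferred separator, so no backslash has to be escaped by hand.
std::string cubinPath(const std::string& dir, const std::string& file) {
    return (std::filesystem::path(dir) / file).string();
}
```

It could then be used as `cuModuleLoad(&module, cubinPath(dir, "temp.cubin").c_str())`.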

Unfortunately, during cuModuleLoad I get a result of CUDA_ERROR_INVALID_IMAGE. Can someone tell me why this could be happening? The kernel's valid and compiles without issues.

Recommended answer

cuModuleLoad should only return CUDA_ERROR_INVALID_IMAGE when the module file itself is invalid. If the file is missing, you should see CUDA_ERROR_FILE_NOT_FOUND instead. If it was built for a mismatched architecture, you will probably see CUDA_ERROR_INVALID_SOURCE. You haven't given us enough details or code to say for certain what is happening, but in principle at least, your API code should work.
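The three error codes named here can be told apart programmatically. In this minimal sketch, the numeric values are copied from the `CUresult` enum in `cuda.h` so the example compiles without the CUDA headers. `describeModuleLoadError` is a hypothetical helper whose messages summarize the likely causes discussed in this answer.

```cpp
#include <string>

// CUresult values as defined in cuda.h, repeated here so the sketch
// compiles without the CUDA headers.
constexpr int kCudaSuccess            = 0;    // CUDA_SUCCESS
constexpr int kCudaErrorInvalidImage  = 200;  // CUDA_ERROR_INVALID_IMAGE
constexpr int kCudaErrorInvalidSource = 300;  // CUDA_ERROR_INVALID_SOURCE
constexpr int kCudaErrorFileNotFound  = 301;  // CUDA_ERROR_FILE_NOT_FOUND

// Hypothetical helper: map a cuModuleLoad result to its likely cause.
std::string describeModuleLoadError(int code) {
    switch (code) {
        case kCudaSuccess:            return "success";
        case kCudaErrorFileNotFound:  return "module file not found: check the path";
        case kCudaErrorInvalidSource: return "cubin built for an incompatible architecture";
        case kCudaErrorInvalidImage:  return "file is not a valid module image (e.g. a host object file)";
        default:                      return "other error " + std::to_string(code);
    }
}
```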

To show how this should work, consider the following working example on Linux with CUDA 5.5:

First, your kernel:

#include <cmath>
using namespace std;

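__device__ __inline__ float trim(unsigned char value)
{
    // Note: because value is declared unsigned char, the float argument is
    // converted before fminf/fmaxf run, so this clamp never sees values
    // outside 0..255 (and out-of-range conversions are undefined behavior).
    return fminf((unsigned char)255, fmaxf(value, (unsigned char)0));
}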

__constant__ char z = 1;

__global__ void kernel(unsigned char* img, const float* a)
{
    int ix = blockIdx.x;
    int iy = threadIdx.x;
    int tid = iy*blockDim.x + ix;

    float x = (float)ix / blockDim.x;
    float y = (float)iy / gridDim.x;

    //placeholder

    img[tid*4+0] = trim((a[0]*z*z+a[1]*z+a[2]) * 255.0f);
    img[tid*4+1] = trim((a[3]*z*z+a[4]*z+a[5]) * 255.0f);
    img[tid*4+2] = trim((a[6]*z*z+a[7]*z+a[8]) * 255.0f);
    img[tid*4+3] = 255;
}

Then a simple program to load the cubin into a context at runtime:

#include <cuda.h>
#include <string>
#include <iostream>
#include <cstdlib>  // exit()
#define Errchk(ans) { DrvAssert((ans), __FILE__, __LINE__); }
inline void DrvAssert( CUresult code, const char *file, int line)
{
    if (code != CUDA_SUCCESS) {
        std::cout << "Error: " << code << " " <<  file << "@" << line << std::endl;
        exit(code);
    } else {
        std::cout << "Success: " << file << "@" << line << std::endl;
    }
}

int main(void)
{
    Errchk( cuInit(0) );
    CUdevice device;
    Errchk( cuDeviceGet(&device, 0) );
    CUcontext ctx;
    Errchk( cuCtxCreate(&ctx, 0, device) );

    CUmodule module;
    std::string path = "qkernel.cubin";
    Errchk( cuModuleLoad(&module, path.c_str()) );

    cuModuleUnload(module);
    cuCtxDestroy(ctx);  // cuCtxDetach is deprecated; cuCtxDestroy pairs with cuCtxCreate
    return 0;
}

Build the cubin for the architecture of the device present in the host (a GTX670 in this case):

$ nvcc -arch=sm_30 -Xptxas="-v" --cubin qkernel.cu 
ptxas info    : 11 bytes gmem, 1 bytes cmem[3]
ptxas info    : Compiling entry function '_Z6kernelPhPKf' for 'sm_30'
ptxas info    : Function properties for _Z6kernelPhPKf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 10 registers, 336 bytes cmem[0]
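The `-arch` value follows directly from the device's compute capability: a GTX 670 has compute capability 3.0, hence `sm_30`. The sketch below shows that mapping. `smArchFlag` is a hypothetical helper. In a driver API program, the major and minor numbers could be queried with `cuDeviceGetAttribute` using `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR` and `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR`.

```cpp
#include <string>

// Hypothetical helper: turn a compute capability (major, minor) into the
// value passed to nvcc's -arch option, e.g. 3.0 -> "sm_30".
std::string smArchFlag(int major, int minor) {
    return "sm_" + std::to_string(major) + std::to_string(minor);
}
```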

And the host program:

$ nvcc -o qexe qmain.cc -lcuda

Then run it:

$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Success: qmain.cc@26

The module loads successfully. If I delete the cubin and run again, cuModuleLoad returns 301 (CUDA_ERROR_FILE_NOT_FOUND):

$ rm qkernel.cubin 
$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 301 qmain.cc@26
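Error 301 is `CUDA_ERROR_FILE_NOT_FOUND`. Running a pre-flight check before calling `cuModuleLoad` can make a missing or misnamed file obvious. This sketch assumes a C++17 compiler. `moduleFileReadable` is a hypothetical helper, not a CUDA API.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical pre-flight check: true only if `path` names a regular file
// that can be opened for reading.
bool moduleFileReadable(const std::string& path) {
    std::error_code ec;  // use the non-throwing overload
    if (!std::filesystem::is_regular_file(path, ec)) {
        return false;  // missing, a directory, or inaccessible
    }
    std::ifstream in(path, std::ios::binary);
    return in.good();
}
```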

If I compile for an incompatible architecture (sm_10), cuModuleLoad returns 300 (CUDA_ERROR_INVALID_SOURCE):

$ nvcc -arch=sm_10 -Xptxas="-v" --cubin qkernel.cu 
ptxas info    : 0 bytes gmem, 1 bytes cmem[0]
ptxas info    : Compiling entry function '_Z6kernelPhPKf' for 'sm_10'
ptxas info    : Used 5 registers, 32 bytes smem, 4 bytes cmem[1]
$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 300 qmain.cc@26
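Error 300 is `CUDA_ERROR_INVALID_SOURCE`. The failure follows from the binary-compatibility rule in the CUDA C Programming Guide. A cubin built for a given compute capability runs only on devices with the same major revision and an equal or higher minor revision. An sm_10 cubin therefore cannot run on a compute capability 3.0 GPU. The sketch below expresses that rule as a predicate. `cubinRunsOn` is a hypothetical name, and the check ignores embedded PTX, which the driver can JIT-compile for newer devices.

```cpp
// Hypothetical predicate for cubin (SASS) binary compatibility: code built
// for sm_XY runs on a device of compute capability X.Z only when the major
// versions match and Z >= Y.
bool cubinRunsOn(int cubinMajor, int cubinMinor, int devMajor, int devMinor) {
    return cubinMajor == devMajor && devMinor >= cubinMinor;
}
```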

If I compile to an object file instead of a cubin, cuModuleLoad returns 200 (CUDA_ERROR_INVALID_IMAGE):

$ nvcc -arch=sm_30 -Xptxas="-v" -c -o qkernel.cubin qkernel.cu 
ptxas info    : 11 bytes gmem, 1 bytes cmem[3]
ptxas info    : Compiling entry function '_Z6kernelPhPKf' for 'sm_30'
ptxas info    : Function properties for _Z6kernelPhPKf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 10 registers, 336 bytes cmem[0]
$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 200 qmain.cc@26
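Error 200 is `CUDA_ERROR_INVALID_IMAGE`. On Linux, both a cubin and a host object file are ELF files, but their ELF `e_machine` fields differ. A cubin is tagged `EM_CUDA` (190 in `<elf.h>`), while a host object built on x86-64 carries `EM_X86_64` (62). The minimal sketch below inspects that field before loading; `classifyImage` and `ImageKind` are hypothetical names. It reports anything that is not ELF as unrecognized, even though `cuModuleLoad` also accepts PTX text and fatbinary files, so a `NotElf` result is only a hint.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// ELF machine number for NVIDIA CUDA code, as defined in <elf.h>.
constexpr std::uint16_t kEmCuda = 190;

enum class ImageKind { NotElf, HostElf, CudaElf };

// Hypothetical check: classify a file by its ELF header. Unreadable or
// too-short files are reported as NotElf.
ImageKind classifyImage(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char hdr[20] = {};
    if (!in.read(reinterpret_cast<char*>(hdr), sizeof hdr)) {
        return ImageKind::NotElf;
    }
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') {
        return ImageKind::NotElf;
    }
    // e_machine is the 16-bit field at offset 18 in both ELF32 and ELF64;
    // byte 5 (EI_DATA) says whether it is big-endian (2) or little-endian (1).
    std::uint16_t machine = (hdr[5] == 2)
        ? static_cast<std::uint16_t>((hdr[18] << 8) | hdr[19])
        : static_cast<std::uint16_t>(hdr[18] | (hdr[19] << 8));
    return machine == kEmCuda ? ImageKind::CudaElf : ImageKind::HostElf;
}
```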

Compiling to an object file instead of a cubin is the only way I could get this code to return CUDA_ERROR_INVALID_IMAGE. All I can suggest is that you try my code and build recipe and see whether you can get them to work.