CUDA_ERROR_INVALID_IMAGE during cuModuleLoad


Question

I've created a very simple kernel (can be found here), which I compile successfully using

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -cudart static -cubin temp.cu

I then try to load the resulting cubin with the following code:

CUresult err = cuInit(0);
CUdevice device;
err = cuDeviceGet(&device, 0);
CUcontext ctx;
err = cuCtxCreate(&ctx, 0, device);

CUmodule module;
string path = string(dir) + "\\temp.cubin";
err = cuModuleLoad(&module, path.c_str());

cuCtxDetach(ctx);



Unfortunately, cuModuleLoad returns CUDA_ERROR_INVALID_IMAGE. Can someone tell me why this could be happening? The kernel is valid and compiles without issues.

Recommended answer

cuModuleLoad should return CUDA_ERROR_INVALID_IMAGE (200) only when the module file itself is invalid. If the file is missing or was built for an incompatible architecture, you should see CUDA_ERROR_FILE_NOT_FOUND (301) or CUDA_ERROR_INVALID_SOURCE (300) instead. You haven't given us enough details or code to say for certain what is happening, but in principle the API code you have should work.
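As a reading aid for the numeric codes printed in the runs further down, here is a minimal sketch of a lookup covering only the three CUresult values discussed in this answer. The function name drv_error_name is hypothetical and not part of the CUDA API; the numeric values are the ones defined in cuda.h. Newer toolkits also provide cuGetErrorName in the driver API, which would be the better choice where it is available.

```cpp
#include <string>

// Hypothetical helper (not part of the CUDA API): maps the CUresult values
// discussed in this answer to their names in cuda.h.
inline std::string drv_error_name(int code)
{
    switch (code) {
        case 0:   return "CUDA_SUCCESS";
        case 200: return "CUDA_ERROR_INVALID_IMAGE";   // module file is not a valid image
        case 300: return "CUDA_ERROR_INVALID_SOURCE";  // seen here for an architecture mismatch
        case 301: return "CUDA_ERROR_FILE_NOT_FOUND";  // module file is missing
        default:  return "unlisted CUresult " + std::to_string(code);
    }
}
```

It could be dropped into the error-checking helper of a host program so that the message shows a name next to the number.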

To show how this should work, consider the following working example on Linux with CUDA 5.5.

First, your kernel:

#include <cmath>
using namespace std;

__device__ __inline__ float trim(unsigned char value)
{
    return fminf((unsigned char)255, fmaxf(value, (unsigned char)0));
}

__constant__ char z = 1;

__global__ void kernel(unsigned char* img, const float* a)
{
    int ix = blockIdx.x;
    int iy = threadIdx.x;
    int tid = iy*blockDim.x + ix;

    float x = (float)ix / blockDim.x;
    float y = (float)iy / gridDim.x;

    //placeholder

    img[tid*4+0] = trim((a[0]*z*z+a[1]*z+a[2]) * 255.0f);
    img[tid*4+1] = trim((a[3]*z*z+a[4]*z+a[5]) * 255.0f);
    img[tid*4+2] = trim((a[6]*z*z+a[7]*z+a[8]) * 255.0f);
    img[tid*4+3] = 255;
}

Then a simple program to load the cubin into a context at runtime:

#include <cuda.h>
#include <string>
#include <iostream>

#define Errchk(ans) { DrvAssert((ans), __FILE__, __LINE__); }
inline void DrvAssert( CUresult code, const char *file, int line)
{
    if (code != CUDA_SUCCESS) {
        std::cout << "Error: " << code << " " <<  file << "@" << line << std::endl;
        exit(code);
    } else {
        std::cout << "Success: " << file << "@" << line << std::endl;
    }
}

int main(void)
{
    Errchk( cuInit(0) );
    CUdevice device;
    Errchk( cuDeviceGet(&device, 0) );
    CUcontext ctx;
    Errchk( cuCtxCreate(&ctx, 0, device) );

    CUmodule module;
    std::string path = "qkernel.cubin";
    Errchk( cuModuleLoad(&module, path.c_str()) );

    cuCtxDetach(ctx);
    return 0;
}

Build the cubin for the architecture of the device present in the host (a GTX 670 in this case):

$ nvcc -arch=sm_30 -Xptxas="-v" --cubin qkernel.cu 
ptxas info    : 11 bytes gmem, 1 bytes cmem[3]
ptxas info    : Compiling entry function '_Z6kernelPhPKf' for 'sm_30'
ptxas info    : Function properties for _Z6kernelPhPKf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 10 registers, 336 bytes cmem[0]

and the host program:

$ nvcc -o qexe qmain.cc -lcuda

then run:

$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Success: qmain.cc@26

The module code loads. If I delete the cubin and run again, I see this:

$ rm qkernel.cubin 
$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 301 qmain.cc@26

If I compile for an incompatible architecture, I see this:

$ nvcc -arch=sm_10 -Xptxas="-v" --cubin qkernel.cu 
ptxas info    : 0 bytes gmem, 1 bytes cmem[0]
ptxas info    : Compiling entry function '_Z6kernelPhPKf' for 'sm_10'
ptxas info    : Used 5 registers, 32 bytes smem, 4 bytes cmem[1]
$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 300 qmain.cc@26
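The 300 above is consistent with the general binary-compatibility rule for cubin files: a cubin built for compute capability X.y is expected to load only on devices of capability X.z with z >= y. A minimal sketch of that rule follows. The function name is hypothetical, the device's major and minor numbers are assumed to be obtained separately (for example through cuDeviceGetAttribute), and architecture-specific exceptions are not modeled.

```cpp
// Hypothetical helper: applies the general cubin binary-compatibility rule.
// A cubin for compute capability cubin_major.cubin_minor is expected to load
// on a device only if the major versions match and the device's minor
// version is at least the cubin's minor version.
inline bool cubin_matches_device(int cubin_major, int cubin_minor,
                                 int dev_major, int dev_minor)
{
    return cubin_major == dev_major && dev_minor >= cubin_minor;
}
```

For the runs in this answer, cubin_matches_device(1, 0, 3, 0) is false (the sm_10 build on the GTX 670), while cubin_matches_device(3, 0, 3, 0) is true (the sm_30 build).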

If I compile to an object file, not a cubin, I see this:

$ nvcc -arch=sm_30 -Xptxas="-v" -c -o qkernel.cubin qkernel.cu 
ptxas info    : 11 bytes gmem, 1 bytes cmem[3]
ptxas info    : Compiling entry function '_Z6kernelPhPKf' for 'sm_30'
ptxas info    : Function properties for _Z6kernelPhPKf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 10 registers, 336 bytes cmem[0]
$ ./qexe 
Success: qmain.cc@18
Success: qmain.cc@20
Success: qmain.cc@22
Error: 200 qmain.cc@26

This is the only way I can get the code to emit a CUDA_ERROR_INVALID_IMAGE error. All I can suggest is to try my code and recipe and see if you can get it to work.
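If it is unclear what kind of file the build step actually produced, one rough check is to look at the file header. The sketch below assumes that a cubin is an ELF file whose e_machine field is EM_CUDA (190), whereas a host object file is either an ELF for the host CPU or not an ELF at all (for example a COFF .obj on Windows). The names ImageKind, classify_header and classify_file are hypothetical, and passing this check does not guarantee that cuModuleLoad will accept the file.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

enum class ImageKind { Unreadable, NotElf, CudaElf, OtherElf };

// Classify a file from its first 20 bytes (the start of an ELF header).
inline ImageKind classify_header(const std::vector<unsigned char>& h)
{
    if (h.size() < 20) return ImageKind::Unreadable;
    if (h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
        return ImageKind::NotElf;
    // e_machine is a 16-bit field at byte offset 18. Byte 5 (EI_DATA) gives
    // the byte order: 2 means big endian, otherwise little endian is assumed.
    unsigned machine = (h[5] == 2) ? ((h[18] << 8) | h[19])
                                   : ((h[19] << 8) | h[18]);
    return machine == 190 ? ImageKind::CudaElf   // 190 = EM_CUDA
                          : ImageKind::OtherElf;
}

// Read the header of the file at 'path' and classify it. A missing file or
// one shorter than 20 bytes is reported as Unreadable.
inline ImageKind classify_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> h(20);
    if (!in.read(reinterpret_cast<char*>(h.data()),
                 static_cast<std::streamsize>(h.size())))
        return ImageKind::Unreadable;
    return classify_header(h);
}
```

Calling classify_file on the path passed to cuModuleLoad and getting anything other than ImageKind::CudaElf would suggest the file is not a standalone cubin.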
