Compiling CUDA program
Problem description
I am struggling with parallelizing a ray tracing program using CUDA. I have the sequential code, and I have written the parallel code (kernel).
When running the program, I encounter the following error (copied from VS2010):
Error 1 error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_21,code=\"sm_21,compute_21\" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" --keep-dir "Release" -maxrregcount=0 --machine 32 --compile -Xcompiler "/EHsc /nologo /Od /Zi /MD " -o "Release\CUDAraytracer.cu.obj" "c:\Users\mc.choice\Desktop\CUDAraytracer.cu"" exited with code -1. C:\Program Files\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 4.2.targets 361
I think I have all libs and headers included correctly.
Any ideas on how to compile and run it successfully, and what the cause of the error might be?
Thanks in advance.
In this particular case, the error initially described in the question originates from this particular set of command line switches being passed to nvcc:
-gencode=arch=compute_21,code=\"sm_21,compute_21\"
compute_21 is not a valid virtual architecture. (sm_21 is a valid real architecture, but the virtual architecture it pairs with is compute_20.)
Why exactly Visual Studio is generating that particular invalid switch is not clear. However, that particular issue can be worked around by changing the project settings to sm_20 in any place where sm_21 shows up. This should not have a significant effect on code generation, and has no effect on the supported capability of the code.
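When choosing which architectures to target in the project settings, it can help to check what the installed GPU actually reports. The following is a minimal sketch using the CUDA runtime API; it only prints each device's name and compute capability, and the output format is my own choice rather than anything from the OP's project.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Ask the runtime how many CUDA devices are visible.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the name and compute capability (major.minor) of each device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        printf("device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

A device reporting compute capability 2.1, for example, corresponds to the real architecture sm_21 and the virtual architecture compute_20.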
As discussed in the comments, OP seems to have other issues as well with the Visual Studio configuration.
EDIT: I ran the program you provided in your recent comment. It seems to run "correctly" for me. I ran it under Linux rather than Windows, because that was the machine I had handy for this type of testing. I didn't make any changes to your program except to change some of the include files to match the Linux pathnames, etc. The main issue I observed is that it takes about 17 seconds per frame to render. If your GPU is much slower, you may have to wait several minutes to see the first frame. Here's sample output:
So I would say that the main issue is to improve the rendering speed. I haven't spent a lot of time looking over your program yet, but any kernel called with a <<<1,1>>> configuration launches a single block containing a single thread, so it is not really making effective use of the GPU.
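As a rough illustration of a wider launch, here is a minimal sketch that assigns one thread per pixel instead of one thread in total. The kernel name `shadePixels`, the helper `divUp`, the 640x480 image size, the 16x16 block shape, and the gradient value written per pixel are all assumptions made for illustration; none of them come from the OP's ray tracer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n items, rounded up.
static int divUp(int n, int d) { return (n + d - 1) / d; }

// Hypothetical stand-in for per-pixel ray tracing work: writes a gradient.
__global__ void shadePixels(float *image, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard partial blocks at the edges
    image[y * width + x] = (float)(x + y) / (float)(width + height);
}

int main()
{
    const int width = 640, height = 480;  // assumed image size
    const int blockSide = 16;             // assumed block shape: 16x16 threads

    float *d_image = NULL;
    if (cudaMalloc(&d_image, width * height * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // One thread per pixel: a 2D grid of 2D blocks covering the image.
    dim3 block(blockSide, blockSide);
    dim3 grid(divUp(width, blockSide), divUp(height, blockSide));
    shadePixels<<<grid, block>>>(d_image, width, height);

    // Check for launch errors first, then for errors during execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("grid %ux%u, block %ux%u: %s\n",
           grid.x, grid.y, block.x, block.y, cudaGetErrorString(err));

    cudaFree(d_image);
    return err == cudaSuccess ? 0 : 1;
}
```

With these assumed sizes the launch covers all 307,200 pixels with one thread each, whereas a <<<1,1>>> launch would leave a single thread to loop over every pixel.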
The GPU I used for this is a Quadro 1000M, which may be significantly faster than your 9500GS.