Cuda Hello World printf not working even with -arch=sm_20
Problem description
I didn't think I was a complete newbie with Cuda, but apparently I am.
I recently upgraded my cuda device from compute capability 1.3 to 2.1 (GeForce GT 630). I decided to do a full upgrade to Cuda toolkit 5.0 as well.
I can compile general cuda kernels, but printf is not working even with -arch=sm_20 set.
Code:
#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>
__global__ void test(){
printf("Hi Cuda World");
}
int main( int argc, char** argv )
{
test<<<1,1>>>();
return 0;
}
Compiler:
Error 2 error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_20,compute_10\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -arch=sm_20 -g -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\main.cu.obj" "d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu"" exited with code 2. C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.0.targets 592 10 testCuda
Error 1 error : calling a __host__ function("printf") from a __global__ function("test") is not allowed d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu 9 1 testCuda
I'm about done with life because of this problem...done done done. Please talk me down from the rooftops with an answer.
In-kernel printf is only supported on compute capability 2.0 or higher hardware. Because your project is set to build for both compute capability 1.0 and compute capability 2.1, nvcc compiles the code multiple times and builds a multi-architecture fatbinary object. The error is generated during the compute capability 1.0 compilation pass, because the printf call is unsupported for that architecture.
If you remove the compute capability 1.0 build target from your project, the error will disappear.
Alternatively, you could write the kernel like this:
__global__ void test()
{
#if __CUDA_ARCH__ >= 200
printf("Hi Cuda World");
#endif
}
The __CUDA_ARCH__ symbol will only be >= 200 when building for compute capability 2.0 or higher targets, so the printf call is compiled out of the 1.x pass. This allows you to compile the code for compute capability 1.x devices without encountering a compilation error.
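One consequence of the guard is that a build for compute capability 1.x silently prints nothing. A minimal sketch of a host-side check that makes this visible is shown here. The helper name supportsDevicePrintf and the choice of device 0 are assumptions for illustration; cudaGetDeviceProperties and the major/minor fields are part of the CUDA runtime API.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: device-side printf needs compute capability 2.0 or higher.
static bool supportsDevicePrintf(int major)
{
    return major >= 2;
}

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    // Only compiled into device code built for compute capability 2.0+.
    printf("Hi Cuda World\n");
#endif
}

int main(int argc, char** argv)
{
    cudaDeviceProp prop;

    // Assumption: the GPU of interest is device 0.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "could not query device 0\n");
        return 1;
    }

    if (!supportsDevicePrintf(prop.major)) {
        // Warn instead of failing: the kernel still runs, it just prints nothing.
        fprintf(stderr, "device is compute capability %d.%d; kernel printf is compiled out\n",
                prop.major, prop.minor);
    }

    test<<<1,1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its output can be flushed
    return 0;
}
```

The check is only advisory. It does not change which code nvcc generates; that is still decided by the build targets in the project settings.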
If you are compiling for the correct architecture and still get no output, you also need to ensure that the kernel finishes and the driver flushes the output buffer. To do this, add a synchronizing call after the kernel launch in the host code, for example:
int main( int argc, char** argv )
{
test<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}
[disclaimer: all code written in browser, never compiled, use at own risk]
If you do both things, you should be able to compile, run and see output.
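Putting both suggestions together, a hedged sketch of the complete program might look like the following. The checkCuda helper is a hypothetical name introduced here so that launch and synchronization failures are reported rather than ignored; it is not part of the original code. Like the snippets above, this has not been compiled against CUDA 5.0.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: report a CUDA runtime error and return false on failure.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    // Skipped when compiling for compute capability 1.x targets.
    printf("Hi Cuda World\n");
#endif
}

int main(int argc, char** argv)
{
    test<<<1,1>>>();

    // Launch errors (for example, no suitable device code) show up here.
    if (!checkCuda(cudaGetLastError(), "kernel launch")) {
        return 1;
    }

    // Block until the kernel finishes so the printf buffer is flushed.
    if (!checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) {
        return 1;
    }

    return 0;
}
```

Checking the return value of cudaDeviceSynchronize is a design choice rather than a requirement. Without the check, a kernel that fails at runtime looks the same as a kernel whose printf was compiled out.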