CUDA: Why does compute_20 code fail on a compute_35 device?

Problem description

For a computer with Titan GPU (compute_35,sm_35), I compiled some code using this line in CMakeLists.txt:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_35,code=sm_35)

The code compiles and also runs fine.

I wanted to check what compilation problems this code would cause for a friend who uses a GTS 450 (compute_20,sm_21). So, I changed the above line to:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=sm_21)

The code compiles without any errors on my computer with the Titan. But when I run it (again on my Titan computer), it fails after a thrust::copy call with the following error:

$ ./foobar
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device function 
"foobar" terminated by signal SIGABRT (Abort)

Google says the above error is caused by a GPU architecture mismatch.

The strangest part is that with the above line (arch=compute_20,code=sm_21), the code compiles and runs without error on my friend's computer with GTS 450! Except for the GPU, her Ubuntu 12.04, gcc and CUDA SDK 5.5 versions are the same as mine.

Is this the real cause of the error? Why can't the Titan run compute_20 code? Isn't a CUDA GPU supposed to be backwards compatible with PTX or SASS code? Even if it isn't, why can't the driver JIT-compile the compute_20 PTX to SASS for sm_35?

Recommended answer

If you specify:

-gencode arch=compute_20,code=compute_20

your code should run (via JIT) on either GPU.

According to the nvcc manual, JIT is enabled when you specify a virtual architecture for the code switch. With code=sm_21 alone, only sm_21 SASS is embedded and no PTX, so the driver has nothing it can JIT-compile for the Titan. You can make multiple specifications in a single command:

-arch=compute_20 -code=compute_20,sm_21,sm_35

(note that this is in lieu of specifying -gencode ...)

which would allow JIT from compute_20 PTX, and non-JIT execution directly on cc2.1 or cc3.5 devices.
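To make the reasoning concrete, here is a rough Python sketch of the compatibility rules as I understand them. It is a simplified model, not the driver's actual selection algorithm. The function names `parse_target` and `select_image` are made up for illustration. The assumptions are that SASS for sm_XY runs only on devices with the same major version and a minor version of at least Y, and that PTX for compute_XY can be JIT-compiled on any device with compute capability X.Y or higher.

```python
from typing import List, Optional, Tuple


def parse_target(name: str) -> Tuple[str, Tuple[int, int]]:
    """Split a name such as 'sm_35' or 'compute_20' into (kind, (major, minor))."""
    kind, _, digits = name.partition("_")
    if kind not in ("sm", "compute") or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"unrecognized target: {name!r}")
    # Last digit is the minor version, everything before it is the major version.
    return kind, (int(digits[:-1]), int(digits[-1]))


def select_image(
    embedded: List[str], device_cc: Tuple[int, int]
) -> Optional[Tuple[str, str]]:
    """Return (target, 'native' | 'jit') for a device, or None if nothing fits.

    Hypothetical helper modelling the simplified rules described above.
    None corresponds to the 'invalid device function' case in the question.
    """
    native = []  # SASS images the device can execute directly
    jit = []     # PTX images the driver could JIT-compile for the device
    for name in embedded:
        kind, cc = parse_target(name)
        if kind == "sm" and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            native.append((cc, name))
        elif kind == "compute" and cc <= device_cc:
            jit.append((cc, name))
    if native:
        return max(native)[1], "native"
    if jit:
        return max(jit)[1], "jit"
    return None


if __name__ == "__main__":
    titan, gts450 = (3, 5), (2, 1)
    builds = (
        ["sm_21"],                         # code=sm_21 only
        ["compute_20"],                    # code=compute_20 only
        ["compute_20", "sm_21", "sm_35"],  # the combined -code list
    )
    for build in builds:
        print(build, "Titan:", select_image(build, titan),
              "GTS 450:", select_image(build, gts450))
```

Under this model, the sm_21-only build has nothing the Titan can use, which matches the failure in the question, while any build that embeds compute_20 PTX has a JIT path on both cards.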
