CUDA: How to check for the right compute capability?
Question
CUDA code compiled with a higher compute capability will execute perfectly for a long time on a device with lower compute capability, before silently failing one day in some kernel. I spent half a day chasing an elusive bug only to realize that the Build Rule had sm_21 while the device (Tesla C2050) was a 2.0.
Is there any CUDA API code I can add which can self-check if it is running on a device with compatible compute capability? I need to compile and work with devices of many compute capabilities. Is there any other action I can take to ensure such errors do not occur?
Answer
In the runtime API, cudaGetDeviceProperties returns two fields, major and minor, which give the compute capability of any given enumerated CUDA device. You can use that to check the compute capability of any GPU before establishing a context on it, to make sure it is the right architecture for what your code does. nvcc can generate an object file containing multiple architectures from a single invocation using the -gencode option, for example:
nvcc -c -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_13,code=sm_13 \
        source.cu
would produce an output object file with an embedded fatbinary object containing cubin files for GT200 and GF100 cards. The runtime API will automagically handle architecture detection and try loading suitable device code from the fatbinary object without any extra host code.
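The runtime self-check described above can be sketched as follows. This is a minimal example, not code from the original answer; the helper name checkComputeCapability is illustrative, and the required version (2.0 here) is just an assumption for the Tesla C2050 case:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Verify that the current device meets a minimum compute capability
// before any kernels are launched. Returns true if it does.
bool checkComputeCapability(int requiredMajor, int requiredMinor)
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess)
        return false;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;

    // Accept any capability greater than or equal to the requirement.
    if (prop.major > requiredMajor ||
        (prop.major == requiredMajor && prop.minor >= requiredMinor))
        return true;

    fprintf(stderr,
            "Device %d (%s) is compute %d.%d, but %d.%d is required\n",
            device, prop.name, prop.major, prop.minor,
            requiredMajor, requiredMinor);
    return false;
}

int main()
{
    // For a Tesla C2050 build, require at least compute 2.0.
    if (!checkComputeCapability(2, 0))
        return 1;
    // ... safe to launch kernels here ...
    return 0;
}
```

Calling such a check once at startup turns the silent mid-run failure described in the question into an immediate, explicit error.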