Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?


Problem description

I am a little bit confused about the 'code=sm_X' option within the '-gencode' statement.

An example: What does the NVCC compiler option

-gencode arch=compute_13,code=sm_13

embed in the library?

Only the machine code (cubin code) for GPUs with CC 1.3, or also the PTX code for GPUs with CC 1.3?

In the 'Maxwell compatibility guide', it is stated "Only the back-end target version(s) specified by the 'code=' clause will be retained in the resulting binary".

From that, I would infer that the given compiler option only embeds machine code for GPUs with CC 1.3 and no PTX code. This would mean that it would not be possible to run this library on, e.g., a Maxwell generation card, as there is no PTX code embedded within the library from which the machine code could be 'just-in-time' (JIT) compiled.

On the other hand, in the GTC 2013 presentation 'Introduction to the CUDA Toolkit as an Application Build Tool' by NVIDIA it is stated that '-gencode arch=compute_13,code=sm_13' is enough for all GPUs with CC >= 1.3, and that with this compiler option the machine code for GPUs with CC > 1.3 is JIT-compiled from the PTX code. So, the information given in the Maxwell compatibility guide and this GTC presentation is conflicting in my opinion.

Solution

nvcc has many formats by which the code generation options can be specified. A read of section 6 of the nvcc manual may be instructive.

When using this format:

nvcc -gencode arch=compute_13,code=sm_13 ...

only the SASS code for an sm_13 (cc 1.3) device will be retained. No PTX will be retained in the executable object, so the code can only run on a device capable of running cc 1.3 SASS.

Using the above command format, in order to embed a PTX version of the source code into the executable object, it's necessary to use a virtual architecture specification (compute_XX) for the option provided to code=.... Since this particular format (using -gencode) does not allow specification of multiple targets in a single switch, we must pass the -gencode switch multiple times to nvcc, once for each target we want embedded in the executable object.

So extending the above example, we could use the following:

nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13 ...

This would embed both cc 1.3 SASS (by the first gencode switch) and cc 1.3 PTX (by the second gencode switch) in the executable. Devices capable of running cc 1.3 SASS code directly will use that. For other devices (of compute capability greater than cc 1.3), the driver will perform a JIT-compile step to convert the cc 1.3 PTX code to SASS code for the architecture of the device in question.
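To make the mapping from switches to embedded images concrete, here is a small Python sketch. It is only a model of the rule described above, not anything nvcc exposes: the helper name `embedded_images` is hypothetical, and it assumes one code= value per -gencode clause, as in the commands shown here.

```python
import re

# Hypothetical model: which image each -gencode clause asks nvcc to retain.
# Assumes one code= value per clause, as in the commands above.
GENCODE = re.compile(r"arch=compute_(\d+),code=(sm|compute)_(\d+)")

def embedded_images(command):
    """Return a set of (kind, cc) pairs, e.g. ("sass", 13) or ("ptx", 13)."""
    images = set()
    for _virtual_arch, kind, cc in GENCODE.findall(command):
        # code=sm_XX keeps SASS (machine code); code=compute_XX keeps PTX.
        images.add(("sass" if kind == "sm" else "ptx", int(cc)))
    return images

sass_only = embedded_images("nvcc -gencode arch=compute_13,code=sm_13")
both = embedded_images(
    "nvcc -gencode arch=compute_13,code=sm_13 "
    "-gencode arch=compute_13,code=compute_13"
)
print(sorted(sass_only))  # [('sass', 13)]
print(sorted(both))       # [('ptx', 13), ('sass', 13)]
```

The point of the sketch is that the arch= part never adds anything to the output by itself; only what is named in code= is kept.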

I agree that the GTC 2013 presentation (e.g. slide 37) seems to suggest that

nvcc -gencode arch=compute_13,code=sm_13 ...

is sufficient for all devices of compute capability 1.3 or higher. It is not, and this is easy to demonstrate. If you compile code using the above format and attempt to run it on a cc 2.0 device, it will fail with an "invalid device function" error associated with any kernel or kernels you have in your code.
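The failure can be reasoned about with a simplified model of what happens at load time. The Python sketch below is an illustration only, under two assumptions: embedded SASS is usable when the device has the same major compute capability and an equal or higher minor revision, and embedded PTX can be JIT-compiled for any device of equal or higher compute capability. The function name `load_kernel` and the returned strings are made up for this example; the real driver logic has more cases.

```python
def load_kernel(images, device_cc):
    """Simplified model of how a kernel image is chosen at load time.

    images:    set of (kind, (major, minor)) with kind "sass" or "ptx"
    device_cc: (major, minor) compute capability of the device
    """
    # Prefer embedded SASS that is binary-compatible with the device
    # (assumed rule: same major version, device minor >= image minor).
    for kind, cc in images:
        if kind == "sass" and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            return "run embedded SASS"
    # Otherwise fall back to PTX, which the driver can JIT-compile for
    # any device of equal or higher compute capability.
    for kind, cc in images:
        if kind == "ptx" and cc <= device_cc:
            return "JIT-compile embedded PTX"
    return "invalid device function"

sass_only = {("sass", (1, 3))}
sass_and_ptx = {("sass", (1, 3)), ("ptx", (1, 3))}

print(load_kernel(sass_only, (1, 3)))     # run embedded SASS
print(load_kernel(sass_only, (2, 0)))     # invalid device function
print(load_kernel(sass_and_ptx, (2, 0)))  # JIT-compile embedded PTX
```

With only cc 1.3 SASS embedded, a cc 2.0 device has nothing it can execute and nothing it can JIT from, which matches the error described above.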

Again, nvcc has a variety of command formats and "shortcuts" for specifying code generation. Some relatively simple ones, such as:

nvcc -arch=sm_13 ...

will embed both a PTX and SASS version of the code in the executable object, resulting in the kind of forward-compatibility suggested.
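As a rough illustration of that shortcut, the Python sketch below rewrites an -arch=sm_XX flag into the pair of -gencode switches that embeds the same two images. The helper `expand_arch_shorthand` is hypothetical and only handles the plain -arch=sm_XX form discussed here; it is a way to read the shorthand, not a description of how nvcc is implemented.

```python
import re

def expand_arch_shorthand(flag):
    """Rewrite "-arch=sm_XX" as the equivalent pair of -gencode switches.

    Hypothetical helper for illustration; only the plain -arch=sm_XX
    form is handled.
    """
    match = re.fullmatch(r"-arch=sm_(\d+)", flag)
    if match is None:
        raise ValueError("expected a flag of the form -arch=sm_XX")
    cc = match.group(1)
    return (
        f"-gencode arch=compute_{cc},code=sm_{cc} "      # SASS for sm_XX
        f"-gencode arch=compute_{cc},code=compute_{cc}"  # PTX for compute_XX
    )

print(expand_arch_shorthand("-arch=sm_13"))
# -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13
```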
