Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?


Question

I am a little confused about the 'code=sm_X' option within the '-gencode' statement.

An example: what does the NVCC compiler option

-gencode arch=compute_13,code=sm_13

embed into a library?

Only the machine code (cubin code) for GPUs with CC 1.3, or also the PTX code for GPUs with CC 1.3?

In the 'Maxwell compatibility guide', it is stated: "Only the back-end target version(s) specified by the 'code=' clause will be retained in the resulting binary".

From that, I would infer that the given compiler option embeds only machine code for GPUs with CC 1.3 and no PTX code. This would mean that the library could not run on, for example, a Maxwell-generation card, as there is no PTX code embedded in the library from which the machine code could be 'just-in-time' (JIT) compiled.

On the other hand, the GTC 2013 presentation 'Introduction to the CUDA Toolkit as an Application Build Tool' by NVIDIA states that '-gencode arch=compute_13,code=sm_13' is enough for all GPUs with CC >= 1.3, and that with this compiler option the machine code for GPUs with CC > 1.3 is JIT-compiled from the PTX code. So, in my opinion, the information given in the Maxwell compatibility guide conflicts with this GTC presentation.

Recommended answer

nvcc has many formats by which the code generation options can be specified. A read of section 6 of the nvcc manual may be instructive.

When this format is used:

nvcc -gencode arch=compute_13,code=sm_13 ...

only the SASS code for an sm_13 (cc 1.3) device will be retained. No PTX will be retained in the executable object, so the code can only run on a device capable of running cc 1.3 SASS.

Using the above command format, in order to embed a PTX version of the source code into the executable object, it is necessary to use a virtual architecture specification for the option provided to code=.... Since this particular format (using -gencode) does not allow specification of multiple targets in a single switch, we must pass the -gencode switch to nvcc multiple times, once for each target we want embedded in the executable object.

So, extending the above example, we could use the following:

nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13 ...

This would embed both cc 1.3 SASS (via the first gencode switch) and cc 1.3 PTX (via the second gencode switch) in the executable. Devices capable of running cc 1.3 SASS code directly will use that. On other devices (of compute capability greater than cc 1.3), the driver will perform a JIT-compile step to convert the cc 1.3 PTX code to SASS code for the architecture of the device in question.
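The selection behaviour described above can be sketched as a small host-only C++ model. This is an illustration, not the driver's actual implementation: the names CC, Image, Load and select_image are hypothetical. It assumes the two compatibility rules documented in the CUDA programming guide: SASS built for sm_X.y runs only on devices of compute capability X.z with z >= y, and PTX built for compute_X.y can be JIT-compiled for any device of equal or greater compute capability. It ignores special cases such as architecture-specific targets and drivers that no longer support an old PTX version.

```cpp
#include <vector>

// Compute capability, e.g. {1, 3} for cc 1.3 (hypothetical helper type).
struct CC {
    int major_rev;
    int minor_rev;
};

// One image retained in the binary by a "code=" target:
// SASS for code=sm_XY (is_ptx == false) or PTX for code=compute_XY (is_ptx == true).
struct Image {
    bool is_ptx;
    CC cc;
};

enum class Load { NativeSass, JitFromPtx, NoKernelImage };

// True if capability a is greater than or equal to capability b.
bool at_least(CC a, CC b) {
    return a.major_rev > b.major_rev ||
           (a.major_rev == b.major_rev && a.minor_rev >= b.minor_rev);
}

// Model of how a kernel image is chosen for a given device.
Load select_image(const std::vector<Image>& embedded, CC device) {
    bool ptx_usable = false;
    for (const Image& img : embedded) {
        if (!img.is_ptx) {
            // SASS: same major revision, device minor revision not lower.
            if (img.cc.major_rev == device.major_rev &&
                device.minor_rev >= img.cc.minor_rev) {
                return Load::NativeSass;  // matching SASS is used directly
            }
        } else if (at_least(device, img.cc)) {
            // PTX: forward compatible with any equal or newer device.
            ptx_usable = true;
        }
    }
    // Without usable SASS or PTX the launch fails
    // (reported as "invalid device function" in the answer's example).
    return ptx_usable ? Load::JitFromPtx : Load::NoKernelImage;
}
```

Under this model, a binary holding only sm_13 SASS gives Load::NoKernelImage on a cc 2.0 device, while adding compute_13 PTX changes the result to Load::JitFromPtx. On a cc 1.3 device both builds give Load::NativeSass.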

I agree that the GTC 2013 presentation (e.g. slide 37) seems to suggest that

nvcc -gencode arch=compute_13,code=sm_13 ...

is sufficient for all devices of compute capability 1.3 or higher. It is not, and this is easy to demonstrate. If you compile code using the above format and attempt to run it on a cc 2.0 device, it will fail with an "invalid device function" error associated with whatever kernel or kernels you have in your code.

Again, nvcc has a variety of command formats and "shortcuts" for specifying code generation. Some relatively simple ones, such as:

nvcc -arch=sm_13 ...

will embed both a PTX and a SASS version of the code in the executable object, resulting in the kind of forward compatibility suggested. According to the nvcc documentation, this shorthand is equivalent to -arch=compute_13 -code=sm_13,compute_13.
