CUDA: How to use -arch and -code and SM vs COMPUTE
Question
I am still not sure how to properly specify the architectures for code generation when building with nvcc. I am aware that there is machine code as well as PTX code embedded in my binary and that this can be controlled via the compiler switches -code and -arch (or a combination of both using -gencode).
Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX), whereas the -code flag takes identifiers for both real and virtual architectures.
The documentation states that -arch specifies the virtual architectures for which the input files are compiled. However, this PTX code is not automatically compiled to machine code; rather, this is a "preprocessing step".
Now, -code is supposed to specify which architectures the PTX code is assembled and optimised for.
However, it is not clear which PTX or binary code will be embedded in the binary. If I specify, for example, -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX, from which machine code for feature level 5.2 will afterwards be created? And what will be embedded?
If I just specify -code=sm_52, what will happen then? Will only machine code for V5.2 be embedded, created out of V5.2 PTX code? And what would be the difference to -code=compute_52?
Answer
Some related questions/answers are here and here.
I am still not sure how to properly specify the architectures for code generation when building with nvcc.
A complete description is somewhat complicated, but there are intended to be relatively simple, easy-to-remember canonical usages. Compile for the architectures (both virtual and real) that represent the GPUs you wish to target. A fairly simple form is:
-gencode arch=compute_XX,code=sm_XX
where XX is the two-digit compute capability of the GPU you wish to target. If you wish to target multiple GPUs, simply repeat the entire sequence for each XX target. This is approximately the approach taken with the CUDA sample code projects. (If you'd like to include PTX in your executable, include an additional -gencode with the code option specifying the same PTX virtual architecture as the arch option.)
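As a rough illustration of that canonical form, here is a minimal Python sketch that only assembles the -gencode arguments for a list of target compute capabilities. The helper gencode_flags and its ptx_for_highest parameter are hypothetical and not part of nvcc. It repeats arch=compute_XX,code=sm_XX once per target. Optionally, it appends one extra -gencode whose code names the same virtual architecture as its arch, which is the part that embeds PTX.

```python
def gencode_flags(targets, ptx_for_highest=False):
    """Build nvcc -gencode arguments for compute capabilities like "35" or "52".

    ptx_for_highest: also add one -gencode that embeds PTX for the highest
    target (same virtual architecture in both arch and code).
    """
    ccs = sorted(set(targets), key=int)
    flags = []
    for cc in ccs:
        # SASS for the real architecture, assembled from PTX of the matching virtual one.
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    if ptx_for_highest and ccs:
        top = ccs[-1]
        # code names a virtual architecture, so the PTX itself is embedded.
        flags += ["-gencode", f"arch=compute_{top},code=compute_{top}"]
    return flags


if __name__ == "__main__":
    cmd = ["nvcc"] + gencode_flags(["35", "52"], ptx_for_highest=True) + ["-o", "app", "app.cu"]
    print(" ".join(cmd))
```

The file names app and app.cu in the printed command are placeholders.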
Another fairly simple form, when targeting only a single GPU, is just to use:
-arch=sm_XX
with the same description for XX. This form will include both SASS and PTX for the specified architecture.
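The nvcc documentation describes this shorthand. When no -code is given, -arch=sm_XX is treated as -arch=compute_XX -code=sm_XX,compute_XX, which is why both SASS and PTX end up in the binary. A standalone -arch=compute_XX defaults -code to compute_XX (PTX only).

The sketch below only spells out that expansion, and its function name is hypothetical. It assumes a virtual architecture with the same number exists, whereas nvcc actually picks the closest virtual architecture.

```python
def expand_arch_shorthand(arch):
    """Expand a standalone -arch value (no -code given) into explicit (arch, codes).

    -arch=sm_XX      -> -arch=compute_XX -code=sm_XX,compute_XX
    -arch=compute_XX -> -arch=compute_XX -code=compute_XX
    Assumes compute_XX exists for sm_XX; nvcc really uses the closest virtual arch.
    """
    if arch.startswith("sm_"):
        cc = arch[len("sm_"):]
        return f"compute_{cc}", [arch, f"compute_{cc}"]
    if arch.startswith("compute_"):
        return arch, [arch]
    raise ValueError(f"unrecognized architecture name: {arch}")


virtual, codes = expand_arch_shorthand("sm_52")
print(f"-arch={virtual} -code={','.join(codes)}")  # -arch=compute_52 -code=sm_52,compute_52
```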
Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX), whereas the -code flag takes identifiers for both real and virtual architectures.
That is basically correct when arch and code are used as sub-switches within the -gencode switch, or if both are used together, standalone, as you describe. But, for example, when -arch is used by itself (without -code), it represents another kind of "shorthand" notation, and in that case you can pass a real architecture, for example -arch=sm_52.
However, it is not clear which PTX or binary code will be embedded in the binary. If I specify, for example, -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX, from which machine code for feature level 5.2 will afterwards be created? And what will be embedded?
The exact definition of what gets embedded varies depending on the form of the usage. But for this example:
-gencode arch=compute_30,code=sm_52
or for the equivalent case you identify:
-arch=compute_30 -code=sm_52
then yes, that means:
- A temporary PTX code will be generated from your source code, and it will use cc3.0 PTX.
- From that PTX, the ptxas tool will generate cc5.2-compliant SASS code.
- The SASS code will be embedded in your executable.
- The PTX code will be discarded.
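To make the sequence above concrete, here is a simplified model that lists the steps for an -arch/-code pair. It is not nvcc's actual implementation, and compile_steps is a hypothetical name. It only handles a virtual -arch plus -code entries that are either real architectures or the same virtual architecture.

```python
def compile_steps(arch, codes):
    """Describe, in order, what a simplified model of nvcc does for -arch/-code.

    arch must be virtual (compute_XX); each code entry is either a real
    architecture (sm_YY) or the same virtual architecture as arch.
    """
    if not arch.startswith("compute_"):
        raise ValueError("when -code is given, -arch must name a virtual architecture")
    steps = [f"compile source to {arch} PTX"]
    keep_ptx = False
    for code in codes:
        if code.startswith("sm_"):
            steps.append(f"ptxas: {arch} PTX -> {code} SASS")
            steps.append(f"embed {code} SASS")
        elif code == arch:
            keep_ptx = True
        else:
            raise ValueError(f"not modeled here: -code={code}")
    # The intermediate PTX survives only if -code names its virtual architecture.
    steps.append(f"embed {arch} PTX" if keep_ptx else f"discard {arch} PTX")
    return steps


for step in compile_steps("compute_30", ["sm_52"]):
    print(step)
```

For -arch=compute_30 -code=sm_52 this yields the same four steps as the list above.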
(I'm not sure why you would actually specify such a combo, but it is legal.)
If I just specify -code=sm_52, what will happen then? Will only machine code for V5.2 be embedded, created out of V5.2 PTX code? And what would be the difference to -code=compute_52?
-code=sm_52 will generate cc5.2 SASS code out of an intermediate PTX code. The SASS code will be embedded and the PTX will be discarded. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)
-code=compute_52 will generate cc5.x PTX code (only) and embed that PTX in the executable/binary. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)
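The difference between the two -code forms can be sketched with a small model of what ends up in the binary. embedded_payloads is a hypothetical helper, and it assumes any virtual entry in -code equals -arch. Because neither form is legal on its own, the example pairs each with -arch=compute_52.

```python
def embedded_payloads(arch, codes):
    """Return (kind, architecture) pairs that a simplified model says get embedded.

    Real entries (sm_XX) become SASS assembled by ptxas from the arch PTX;
    a virtual entry embeds the PTX itself. Assumes any virtual entry equals arch.
    """
    payloads = []
    for code in codes:
        if code.startswith("sm_"):
            payloads.append(("SASS", code))
        elif code == arch:
            payloads.append(("PTX", code))
        else:
            raise ValueError(f"not modeled here: -arch={arch} -code={code}")
    return payloads


# The question's two cases, each completed with the -arch it needs.
print(embedded_payloads("compute_52", ["sm_52"]))       # [('SASS', 'sm_52')]
print(embedded_payloads("compute_52", ["compute_52"]))  # [('PTX', 'compute_52')]
```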
The cuobjdump tool can be used to identify exactly which components are in a given binary.
(1) When no -gencode switch is used and no -arch switch is used, nvcc assumes that a default -arch=sm_20 is appended to your compile command (this is for CUDA 7.5; the default -arch setting may vary by CUDA version). sm_20 is a real architecture, and it is not legal to specify a real architecture on the -arch option when a -code option is also supplied.
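The footnote's rule can be sketched as a small check. effective_arch is a hypothetical helper, and it ignores -gencode entirely. The sm_20 default is the CUDA 7.5 value quoted above and should be treated as an assumption for other toolkits.

```python
def effective_arch(arch=None, codes=None, default_arch="sm_20"):
    """Return the -arch value nvcc would use, or raise ValueError if illegal.

    Simplified model of the footnote; -gencode is not modeled.
    default_arch is the CUDA 7.5 default; other toolkits may use a different one.
    """
    if arch is None:
        # No -arch given: nvcc behaves as if -arch=<default_arch> were appended.
        arch = default_arch
    if codes and arch.startswith("sm_"):
        # A real architecture on -arch is only allowed in the standalone shorthand.
        raise ValueError(f"-arch={arch} is a real architecture; it cannot be combined with -code")
    return arch


print(effective_arch("compute_30", ["sm_52"]))  # compute_30
try:
    effective_arch(codes=["sm_52"])  # -code alone, as in the question
except ValueError as err:
    print("illegal:", err)
```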