CUDA: How to use -arch and -code and SM vs COMPUTE
Question
I am still not sure how to properly specify the architectures for code generation when building with nvcc. I am aware that both machine code and PTX code are embedded in my binary, and that this can be controlled via the compiler switches -code and -arch (or a combination of both using -gencode).
Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX), whereas the -code flag takes both, identifiers for real and for virtual architectures.
The documentation states that -arch specifies the virtual architectures for which the input files are compiled. However, this PTX code is not automatically compiled to machine code; this is rather a "preprocessing step". Now, -code is supposed to specify which architectures the PTX code is assembled and optimised for.
However, it is not clear which PTX or binary code will be embedded in the binary. If I specify for example -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX, from which machine code for feature level 5.2 is then created? And what will be embedded?
If I just specify -code=sm_52, what will happen then? Will only machine code for V5.2 be embedded, created from V5.2 PTX code? And what would be the difference to -code=compute_52?
Solution
Some related questions/answers are here and here.
I am still not sure how to properly specify the architectures for code generation when building with nvcc.
A complete description is somewhat complicated, but there are intended to be relatively simple, easy-to-remember canonical usages. Compile for the architectures (both virtual and real) that represent the GPUs you wish to target. A fairly simple form is:
-gencode arch=compute_XX,code=sm_XX
where XX is the two-digit compute capability of the GPU you wish to target. If you wish to target multiple GPUs, simply repeat the entire sequence for each XX target. This is approximately the approach taken with the CUDA sample code projects. (If you'd like to include PTX in your executable, include an additional -gencode with the code option specifying the same PTX virtual architecture as the arch option.)
Another fairly simple form, when targeting only a single GPU, is just to use:
-arch=sm_XX
with the same description for XX. This form will include both SASS and PTX for the specified architecture.
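As a sketch of the canonical multi-GPU pattern above, the following hypothetical Python helper (gencode_flags is not part of nvcc or the CUDA toolkit) builds one -gencode arch=compute_XX,code=sm_XX pair per two-digit compute capability. Embedding PTX for the highest listed capability is an assumption of this sketch; the answer only says that the extra -gencode should name the same virtual architecture in code as in arch.

```python
from typing import List


def gencode_flags(ccs: List[str], embed_ptx: bool = True) -> List[str]:
    """Build nvcc flags of the form -gencode arch=compute_XX,code=sm_XX.

    ccs: two-digit compute capabilities as strings, e.g. ["52", "61"].
    embed_ptx: if True, append one extra -gencode whose code= names the
    same virtual architecture as arch= (assumption: the highest listed
    capability), so PTX is embedded alongside the SASS.
    """
    flags: List[str] = []
    for cc in ccs:
        # one SASS target per GPU generation, compiled from matching PTX
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    if embed_ptx and ccs:
        top = max(ccs, key=int)
        # code names a virtual architecture, so the PTX itself is embedded
        flags += ["-gencode", f"arch=compute_{top},code=compute_{top}"]
    return flags


if __name__ == "__main__":
    # "kernel.cu" and "app" are placeholder names
    print(" ".join(["nvcc", *gencode_flags(["52", "61"]), "kernel.cu", "-o", "app"]))
```

For ["52", "61"] this prints nvcc -gencode arch=compute_52,code=sm_52 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 kernel.cu -o app.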
Now, according to this apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX) whereas the -code flag takes both, identifiers for real and for virtual architectures.
That is basically correct when arch and code are used as sub-switches within the -gencode switch, or if both are used together, standalone as you describe. But, for example, when -arch is used by itself (without -code), it represents another kind of "shorthand" notation, and in that case, you can pass a real architecture, for example -arch=sm_52.
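A small sketch of what that shorthand stands for, assuming the expansion described in the nvcc documentation: a standalone -arch=sm_XX behaves like -arch=compute_XX -code=sm_XX,compute_XX, which is why this form embeds both SASS and PTX. The function name expand_arch_shorthand is hypothetical, and it deliberately rejects anything other than a plain sm_XX name.

```python
from typing import Tuple


def expand_arch_shorthand(arch: str) -> Tuple[str, str]:
    """Expand a standalone -arch=sm_XX into its long -arch/-code form.

    Assumption (per the nvcc docs' shorthand rule): -arch=sm_XX without
    -code means -arch=compute_XX -code=sm_XX,compute_XX, i.e. SASS for
    sm_XX plus PTX for compute_XX are both embedded.
    """
    prefix, sep, cc = arch.partition("_")
    if prefix != "sm" or not sep or not cc.isdigit():
        raise ValueError(f"expected a real architecture like sm_52, got {arch!r}")
    return f"-arch=compute_{cc}", f"-code=sm_{cc},compute_{cc}"


if __name__ == "__main__":
    print(" ".join(expand_arch_shorthand("sm_52")))
```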
However, it is not clear which PTX or binary code will be embedded in the binary. If I specify for example -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX, from which machine code for feature level 5.2 is then created? And what will be embedded?
The exact definition of what gets embedded varies depending on the form of the usage. But for this example:
-gencode arch=compute_30,code=sm_52
or for the equivalent case you identify:
-arch=compute_30 -code=sm_52
then yes, it means that:
- A temporary PTX code will be generated from your source code, and it will use cc3.0 PTX.
- From that PTX, the ptxas tool will generate cc5.2-compliant SASS code.
- The SASS code will be embedded in your executable.
- The PTX code will be discarded.
(I'm not sure why you would actually specify such a combo, but it is legal.)
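To make the two stages concrete, here is a toy Python model of what one -arch/-code pair produces and embeds. The names compile_model and Embedded are hypothetical, not an nvcc interface. The sketch assumes the rule that ptxas can only build SASS for a real architecture at or above the PTX feature level, and for simplicity it only accepts a compute_XX entry in codes when it equals arch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Embedded:
    kind: str    # "SASS" (machine code) or "PTX"
    target: str  # architecture name, e.g. "sm_52" or "compute_30"


def cc_of(name: str) -> int:
    """'compute_30' -> 30, 'sm_52' -> 52."""
    return int(name.split("_", 1)[1])


def compile_model(arch: str, codes: List[str]) -> List[Embedded]:
    """Toy model of -arch=<arch> -code=<codes> (or one -gencode pair).

    Stage 1: temporary PTX is generated for the virtual arch (compute_XX).
    Stage 2: ptxas assembles that PTX into SASS for each real sm_YY.
    Only entries listed in codes are embedded; the temporary PTX is
    discarded unless codes also names the virtual architecture.
    """
    if not arch.startswith("compute_"):
        raise ValueError("arch must name a virtual architecture (compute_XX)")
    embedded: List[Embedded] = []
    for code in codes:
        if code.startswith("sm_"):
            # assumption: SASS cannot target a GPU older than the PTX level
            if cc_of(code) < cc_of(arch):
                raise ValueError(f"cannot build {code} SASS from {arch} PTX")
            embedded.append(Embedded("SASS", code))
        elif code == arch:
            embedded.append(Embedded("PTX", code))
        else:
            # simplification: other virtual targets are not modeled here
            raise ValueError(f"unsupported code entry in this sketch: {code}")
    return embedded


if __name__ == "__main__":
    # -arch=compute_30 -code=sm_52: SASS for 5.2 only, PTX discarded
    print(compile_model("compute_30", ["sm_52"]))
    # -arch=compute_52 -code=compute_52: PTX only, no SASS
    print(compile_model("compute_52", ["compute_52"]))
```

With compute_30 and ["sm_52"] the model returns a single SASS entry, matching the steps above; with compute_52 and ["compute_52"] it returns only PTX, which corresponds to the -code=compute_52 case.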
If I just specify -code=sm_52 what will happen then? Only machine code for V5.2 will be embedded that has been created out of V5.2 PTX code? And what would be the difference to -code=compute_52?
-code=sm_52 will generate cc5.2 SASS code out of an intermediate PTX code. The SASS code will be embedded, and the PTX will be discarded. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)
-code=compute_52 will generate cc5.x PTX code (only) and embed that PTX in the executable/binary. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)
The cuobjdump tool can be used to identify exactly which components are in a given binary.
(1) When no -gencode switch is used and no -arch switch is used, nvcc behaves as if -arch=sm_20 were appended to your compile command (this is for CUDA 7.5; the default -arch setting may vary by CUDA version). sm_20 is a real architecture, and it is not legal to specify a real architecture on the -arch option when a -code option is also supplied.
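The rule in footnote (1) can be sketched as a small check. This is a hypothetical Python model (effective_flags is not a real tool), it only covers builds that use no -gencode switch, and it assumes the CUDA 7.5 default of -arch=sm_20 quoted above, exposed as a parameter because the default differs between CUDA versions.

```python
from typing import Optional, Tuple

# Assumption: the CUDA 7.5 default from the footnote; other versions differ.
DEFAULT_ARCH = "sm_20"


def effective_flags(arch: Optional[str] = None,
                    code: Optional[str] = None,
                    default_arch: str = DEFAULT_ARCH) -> Tuple[str, Optional[str]]:
    """Model footnote (1) for builds that use no -gencode switch.

    If -arch is absent, nvcc behaves as if -arch=<default_arch> had been
    appended. A real architecture (sm_XX) on -arch is only legal when no
    -code option is supplied.
    """
    if arch is None:
        arch = default_arch
    if code is not None and arch.startswith("sm_"):
        raise ValueError(
            f"-arch={arch} is a real architecture; it cannot be combined with -code={code}")
    return arch, code


if __name__ == "__main__":
    print(effective_flags(arch="compute_30", code="sm_52"))
    try:
        effective_flags(code="sm_52")  # -code alone picks up the real default arch
    except ValueError as err:
        print("illegal:", err)
```

This is why -code=sm_52 or -code=compute_52 on its own is rejected: the implied default -arch is a real architecture.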