CUDA: How to use -arch and -code and SM vs COMPUTE


Problem description

I am still not sure how to properly specify the architectures for code generation when building with nvcc. I am aware that there is machine code as well as PTX code embedded in my binary and that this can be controlled via the compiler switches -code and -arch (or a combination of both using -gencode).

Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX), whereas the -code flag takes identifiers for both real and virtual architectures.

The documentation states that -arch specifies the virtual architectures for which the input files are compiled. However, this PTX code is not automatically compiled to machine code; it is rather a "preprocessing step".

Now, -code is supposed to specify which architectures the PTX code is assembled and optimised for.

However, it is not clear which PTX or binary code will be embedded in the binary. If I specify for example -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX from which afterwards machine code for feature level 5.2 will be created? And what will be embedded?

If I just specify -code=sm_52, what will happen then? Will only machine code for V5.2 be embedded, created out of V5.2 PTX code? And how would that differ from -code=compute_52?

Solution

Some related questions/answers are here and here.

I am still not sure how to properly specify the architectures for code generation when building with nvcc.

A complete description is somewhat complicated, but there are intended to be relatively simple, easy-to-remember canonical usages. Compile for the architecture (both virtual and real) that represents the GPUs you wish to target. A fairly simple form is:

-gencode arch=compute_XX,code=sm_XX

where XX is the two-digit compute capability for the GPU you wish to target. If you wish to target multiple GPUs, simply repeat the entire sequence for each XX target. This is approximately the approach taken with the CUDA sample code projects. (If you'd like to include PTX in your executable, include an additional -gencode whose code option specifies the same virtual architecture as its arch option, that is, code=compute_XX.)
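
As a rough sketch of the "repeat the sequence per target" idea, the shell helper below only assembles the flag string; it does not call nvcc. The name gencode_flags and the file names in the usage comment are made up for illustration, and embedding PTX for the last target listed is just one possible choice.

```shell
# gencode_flags: hypothetical helper, not part of nvcc.
# Prints one "-gencode arch=compute_XX,code=sm_XX" pair per compute
# capability given as an argument, plus one PTX entry for the last one.
gencode_flags() (
    flags=""
    last=""
    for cc in "$@"; do
        # SASS for this target: virtual arch compute_XX, real arch sm_XX.
        flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
        last="$cc"
    done
    if [ -n "$last" ]; then
        # Assumption: also embed PTX for the last target listed.
        flags="$flags -gencode arch=compute_${last},code=compute_${last}"
    fi
    # Strip the single leading space before printing.
    printf '%s\n' "${flags# }"
)

# Possible usage (app.cu and app are placeholder names):
#   nvcc $(gencode_flags 52 60) -o app app.cu
```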

Another fairly simple form, when targeting only a single GPU, is just to use:

-arch=sm_XX 

with the same description for XX. This form will include both SASS and PTX for the specified architecture.
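
To make the shorthand concrete: as far as I understand the nvcc documentation, -arch=sm_XX used on its own behaves like -arch=compute_XX -code=sm_XX,compute_XX, which is why both SASS and PTX end up in the binary. The helper below (expand_arch_shorthand is a made-up name) just prints that expanded form so the two spellings can be compared.

```shell
# expand_arch_shorthand: hypothetical helper that prints the long form
# that "-arch=sm_XX" (with no -code) is understood to stand for.
expand_arch_shorthand() (
    cc="$1"
    # code=sm_XX requests SASS; code=compute_XX requests embedded PTX.
    printf '%s\n' "-arch=compute_${cc} -code=sm_${cc},compute_${cc}"
)

# Possible usage (placeholder file names):
#   nvcc $(expand_arch_shorthand 52) -o app app.cu
# which should match the result of: nvcc -arch=sm_52 -o app app.cu
```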

Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX), whereas the -code flag takes identifiers for both real and virtual architectures.

That is basically correct when arch and code are used as sub-switches within the -gencode switch, or if both are used together, standalone as you describe. But, for example, when -arch is used by itself (without -code), it represents another kind of "shorthand" notation, and in that case you can pass a real architecture, for example -arch=sm_52.

However, it is not clear which PTX or binary code will be embedded in the binary. If I specify for example -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX from which afterwards machine code for feature level 5.2 will be created? And what will be embedded?

The exact definition of what gets embedded varies depending on the form of the usage. But for this example:

-gencode arch=compute_30,code=sm_52

or for the equivalent case you identify:

-arch=compute_30 -code=sm_52

then yes, it means that:

  1. A temporary PTX code will be generated from your source code, and it will use cc3.0 PTX.
  2. From that PTX, the ptxas tool will generate cc5.2-compliant SASS code.
  3. The SASS code will be embedded in your executable.
  4. The PTX code will be discarded.

(I'm not sure why you would actually specify such a combo, but it is legal.)
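
The mixed case above can be sketched as a small flag builder. It assumes plain numeric two-digit compute capabilities and the usual rule that the real architecture must be at least as new as the virtual one, and it refuses the reverse pairing. mixed_gencode is a hypothetical name, and whether a given toolkit still accepts compute_30 depends on the CUDA version.

```shell
# mixed_gencode: hypothetical helper. $1 = PTX (virtual) level,
# $2 = SASS (real) level. Prints a -gencode flag, or fails when the
# real architecture is older than the virtual one.
mixed_gencode() (
    ptx_cc="$1"
    sass_cc="$2"
    # Assumption: both values are plain integers such as 30 or 52.
    if [ "$sass_cc" -lt "$ptx_cc" ]; then
        echo "sm_${sass_cc} cannot implement compute_${ptx_cc}" >&2
        return 1
    fi
    # Temporary PTX at compute_<ptx_cc>; embedded SASS at sm_<sass_cc>.
    printf '%s\n' "-gencode arch=compute_${ptx_cc},code=sm_${sass_cc}"
)

# Possible usage (placeholder file names):
#   nvcc $(mixed_gencode 30 52) -o app app.cu
```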

If I just specify -code=sm_52, what will happen then? Will only machine code for V5.2 be embedded, created out of V5.2 PTX code? And how would that differ from -code=compute_52?

-code=sm_52 will generate cc5.2 SASS code out of an intermediate PTX code. The SASS code will be embedded, the PTX will be discarded. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)

-code=compute_52 will generate cc5.x PTX code (only) and embed that PTX in the executable/binary. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)

The cuobjdump tool can be used to identify what components exactly are in a given binary.
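
As a hedged illustration of that check, the filter below condenses a cuobjdump listing into "SASS sm_XX" and "PTX sm_XX" lines. It assumes listing lines shaped like "ELF file    1: app.sm_52.cubin" and "PTX file    1: app.sm_52.ptx"; the exact format may differ between CUDA versions, and summarize_listing is a made-up name.

```shell
# summarize_listing: hypothetical filter for cuobjdump listing output.
# Reads lines on stdin and reports which kind of code each entry holds.
summarize_listing() (
    while IFS= read -r line; do
        # Pull out the sm_XX token from the entry name, if there is one.
        arch=$(printf '%s\n' "$line" | sed -n 's/.*\(sm_[0-9][0-9]*\).*/\1/p')
        case "$line" in
            *.cubin) echo "SASS $arch" ;;   # embedded machine code
            *.ptx)   echo "PTX $arch" ;;    # embedded PTX
        esac
    done
)

# Possible usage (./app is a placeholder binary name):
#   { cuobjdump -lelf ./app; cuobjdump -lptx ./app; } | summarize_listing
```

A binary built with only code=sm_52 would be expected to show a SASS line and no PTX line, while one built with only code=compute_52 would show the opposite.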

(1) When no -gencode switch is used, and no -arch switch is used, nvcc assumes a default -arch=sm_20 is appended to your compile command (this is for CUDA 7.5, the default -arch setting may vary by CUDA version). sm_20 is a real architecture, and it is not legal to specify a real architecture on the -arch option when a -code option is also supplied.
