CUDA: How to use -arch and -code and SM vs COMPUTE

Question

I am still not sure how to properly specify the architectures for code generation when building with nvcc. I am aware that there is machine code as well as PTX code embedded in my binary and that this can be controlled via the compiler switches -code and -arch (or a combination of both using -gencode).

Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual architecture and sm_XX to a real one. The -arch flag only takes identifiers for virtual architectures (such as compute_XX), whereas the -code flag takes identifiers for both real and virtual architectures.

The documentation states that -arch specifies the virtual architectures for which the input files are compiled. However, this PTX code is not automatically compiled to machine code; it is rather a "preprocessing step".

Now, -code is supposed to specify which architectures the PTX code is assembled and optimised for.

However, it is not clear which PTX or binary code will be embedded in the binary. If I specify for example -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX from which afterwards machine code for feature level 5.2 will be created? And what will be embedded?

If I just specify -code=sm_52 what will happen then? Only machine code for V5.2 will be embedded that has been created out of V5.2 PTX code? And what would be the difference to -code=compute_52?

Recommended answer

Some related questions/answers are here and here.

> I am still not sure how to properly specify the architectures for code generation when building with nvcc.

A complete description is somewhat complicated, but there are relatively simple, easy-to-remember canonical usages. Compile for the architecture (both virtual and real) that represents the GPUs you wish to target. A fairly simple form is:

-gencode arch=compute_XX,code=sm_XX

where XX is the two-digit compute capability of the GPU you wish to target. If you wish to target multiple GPUs, simply repeat the entire sequence for each XX target. This is approximately the approach taken with the CUDA sample code projects. (If you'd like to include PTX in your executable, add a further -gencode whose code option names the same virtual architecture as its arch option, for example -gencode arch=compute_XX,code=compute_XX.)
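As a rough sketch of the repeated -gencode form, the shell function below emits one SASS target per compute capability and then one PTX target for the last capability listed. The function name gencode_flags, the capability values and the file names in the usage comment are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper (not part of the CUDA toolkit): build one -gencode
# clause per compute capability, plus a PTX clause for the last one given.
# Variables are not declared local, to stay within plain POSIX sh.
gencode_flags() {
    flags=""
    last=""
    for cc in "$@"; do
        # Real architecture target: SASS for sm_<cc> is embedded.
        flags="${flags}-gencode arch=compute_${cc},code=sm_${cc} "
        last="$cc"
    done
    if [ -n "$last" ]; then
        # Virtual architecture target: PTX for compute_<last> is embedded.
        flags="${flags}-gencode arch=compute_${last},code=compute_${last}"
    fi
    printf '%s\n' "$flags"
}

# Example use (assumes nvcc is on PATH and app.cu exists):
#   nvcc $(gencode_flags 52 60) -o app app.cu
```

Putting the PTX clause on the highest capability is a common choice, since that PTX can be JIT-compiled for newer GPUs that have no matching SASS in the binary.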

Another fairly simple form, when targeting only a single GPU, is just to use:

-arch=sm_XX 

with the same description for XX. This form will include both SASS and PTX for the specified architecture.
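To make the shorthand concrete: per the nvcc documentation, -arch=sm_XX used on its own behaves like -arch=compute_XX -code=sm_XX,compute_XX, which is why both SASS and PTX end up in the binary. The small function below only spells out that expansion as text; its name expand_arch_shorthand is hypothetical.

```shell
# Hypothetical helper: print the long form that a lone -arch=sm_XX stands for.
expand_arch_shorthand() {
    case "$1" in
        sm_*)
            cc="${1#sm_}"   # strip the sm_ prefix, keeping the capability digits
            printf '%s\n' "-arch=compute_${cc} -code=sm_${cc},compute_${cc}"
            ;;
        *)
            echo "expected sm_XX, got: $1" >&2
            return 1
            ;;
    esac
}

# Example: expand_arch_shorthand sm_52
```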

> Now, according to this apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual and sm_XX to a real architecture. The flag -arch only takes identifiers for virtual architectures (such as compute_XX) whereas the -code flag takes both, identifiers for real and for virtual architectures.

That is basically correct when arch and code are used as sub-switches within the -gencode switch, or when -arch and -code are used together as standalone switches, as you describe. But when -arch is used by itself (without -code), it is a kind of "shorthand" notation, and in that case you can pass a real architecture, for example -arch=sm_52.

> However, it is not clear which PTX or binary code will be embedded in the binary. If I specify for example -arch=compute_30 -code=sm_52, does that mean my code will first be compiled to feature level 3.0 PTX from which afterwards machine code for feature level 5.2 will be created? And what will be embedded?

The exact definition of what gets embedded varies depending on the form of the usage. But for this example:

-gencode arch=compute_30,code=sm_52

or for the equivalent case you identify:

-arch=compute_30 -code=sm_52

then yes, it means:

  1. A temporary PTX code will be generated from your source code, and it will use cc3.0 PTX.
  2. From that PTX, the ptxas tool will generate cc5.2-compliant SASS code.
  3. The SASS code will be embedded in your executable.
  4. The PTX code will be discarded.

(I'm not sure why you would actually specify such a combo, but it is legal.)

> If I just specify -code=sm_52 what will happen then? Only machine code for V5.2 will be embedded that has been created out of V5.2 PTX code? And what would be the difference to -code=compute_52?

-code=sm_52 will generate cc5.2 SASS code out of an intermediate PTX code. The SASS code will be embedded, the PTX will be discarded. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)

-code=compute_52 will generate cc5.x PTX code (only) and embed that PTX in the executable/binary. Note that specifying this option by itself in this form, with no -arch option, would be illegal. (1)

The cuobjdump tool can be used to identify what components exactly are in a given binary.
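A minimal sketch of such a check, assuming the CUDA toolkit is installed and cuobjdump is on PATH. The function name list_embedded_code and the binary name in the usage comment are hypothetical; -lelf and -lptx are the cuobjdump options that list the embedded ELF (SASS) and PTX images.

```shell
# Hypothetical helper: list the SASS (ELF) and PTX images inside a binary.
list_embedded_code() {
    binary="$1"
    if ! command -v cuobjdump >/dev/null 2>&1; then
        echo "cuobjdump not found on PATH" >&2
        return 127
    fi
    echo "SASS (ELF) images in ${binary}:"
    cuobjdump -lelf "$binary"
    echo "PTX images in ${binary}:"
    cuobjdump -lptx "$binary"
}

# Example use (assumes ./app was built with nvcc):
#   list_embedded_code ./app
```

For a binary built with -gencode arch=compute_30,code=sm_52, one would expect this to report an sm_52 ELF image and no PTX image, matching the steps described above.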

(1) When neither a -gencode switch nor an -arch switch is used, nvcc behaves as if a default -arch=sm_20 were appended to your compile command (this is for CUDA 7.5; the default -arch setting may vary by CUDA version). sm_20 is a real architecture, and it is not legal to specify a real architecture on the -arch option when a -code option is also supplied.
