What is the purpose of using multiple "arch" flags in Nvidia's NVCC compiler?


Question




I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures.

From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute architecture required by the programmer's application, and also the minimum device compute architecture that NVCC's JIT compiler will compile PTX code for.

I also understand that the "code" parameter of -gencode is the compute architecture which NVCC completely compiles the application for, such that no JIT compilation is necessary.

After inspection of various CUDA project Makefiles, I've noticed the following occur regularly:

-gencode arch=compute_20,code=sm_20
-gencode arch=compute_20,code=sm_21
-gencode arch=compute_21,code=sm_21

and after some reading, I found that multiple device architectures could be compiled for in a single binary file - in this case sm_20, sm_21.

My questions are: why are so many arch/code pairs necessary? Are all values of "arch" used in the above?

What is the difference between that and, say:

-arch compute_20
-code sm_20
-code sm_21

Is the earliest virtual architecture in the "arch" fields selected automatically, or is there some other obscure behaviour?

Is there any other compilation and runtime behaviour I should be aware of?

I've read the manual, http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-compilation, and I'm still not clear about what happens at compilation or at runtime.

Cheers,

James.

Answer

Roughly speaking, the code compilation flow goes like this:

CUDA C/C++ device code source --> PTX --> SASS

The virtual architecture (e.g. compute_20, whatever is specified by -arch compute...) determines what type of PTX code will be generated. The additional switches (e.g. -code sm_21) determine what type of SASS code will be generated. SASS is actually executable object code for a GPU (machine language). An executable can contain multiple versions of SASS and/or PTX, and there is a runtime loader mechanism that will pick appropriate versions based on the GPU actually being used.

As you point out, one of the handy features of GPU operation is JIT compilation. The GPU driver performs it (the CUDA toolkit does not need to be installed) whenever suitable PTX code is available but suitable SASS code is not.
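The selection between SASS and PTX can be modeled roughly as follows. This is only a simplified sketch, not the driver's actual algorithm: the `pick_image` function and the tuple representation of embedded images are hypothetical, and the compatibility rules are assumptions stated in the comments (SASS is usable only within the same major architecture on an equal or higher minor revision; PTX can be JIT-compiled for any device of equal or higher compute capability).

```python
from typing import List, Optional, Tuple

# An embedded image is (kind, major, minor), e.g. ("sass", 2, 1) for sm_21
# or ("ptx", 2, 0) for compute_20. This representation is hypothetical.
Image = Tuple[str, int, int]


def pick_image(images: List[Image],
               device: Tuple[int, int]) -> Optional[Tuple[Image, bool]]:
    """Return (chosen image, needs_jit), or None if nothing can run."""
    dev_major, dev_minor = device

    # Assumed rule: SASS runs only on the same major architecture,
    # on a device whose minor revision is equal or higher.
    sass = [im for im in images
            if im[0] == "sass" and im[1] == dev_major and im[2] <= dev_minor]
    if sass:
        return max(sass, key=lambda im: (im[1], im[2])), False

    # Assumed rule: PTX can be JIT-compiled for any device of equal or
    # higher compute capability.
    ptx = [im for im in images
           if im[0] == "ptx" and (im[1], im[2]) <= (dev_major, dev_minor)]
    if ptx:
        return max(ptx, key=lambda im: (im[1], im[2])), True

    return None
```

Under these assumptions, an executable holding sm_20 SASS, sm_21 SASS, and compute_20 PTX would use the sm_21 SASS directly on an sm_21 device, fall back to JIT-compiling the compute_20 PTX on an sm_30 device, and an executable holding only sm_21 SASS would have nothing usable on an sm_20 device.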

One advantage of including multiple virtual architectures (i.e. multiple versions of PTX), then, is that you have executable compatibility with a wider variety of target GPU devices (although some devices may trigger a JIT-compile to create the necessary SASS).

One advantage of including multiple "real GPU targets" (i.e. multiple SASS versions) is that you can avoid the JIT-compile step, when one of those target devices is present.

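If you specify a bad set of options, it's possible to create an executable that won't run (correctly) on a particular GPU. For example, an executable that contains only sm_21 SASS and no PTX has nothing that can be loaded on a GPU of a different major architecture.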

One possible disadvantage of specifying a lot of these options is code size bloat. Another possible disadvantage is compile time, which will generally be longer as you specify more options.

It's also possible to create executables that contain no PTX, which may be of interest to those trying to obscure their IP.

To create PTX suitable for JIT, specify a virtual architecture for the code switch (e.g. code=compute_20).
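As an illustration, a build script could assemble the -gencode options from a list of targets. The sketch below is not part of NVCC; `gencode_flags` and its parameters are hypothetical names, and it only produces option strings in the `arch=...,code=...` form shown in the question, adding one entry whose code value is a virtual architecture so that PTX is embedded.

```python
from typing import Iterable, List


def gencode_flags(virtual: int, sass_targets: Iterable[int],
                  embed_ptx: bool = True) -> List[str]:
    """Build nvcc -gencode options for one virtual architecture."""
    flags: List[str] = []
    for cc in sass_targets:
        # code=sm_XY requests SASS for that real GPU architecture.
        flags += ["-gencode", f"arch=compute_{virtual},code=sm_{cc}"]
    if embed_ptx:
        # code=compute_XY (a virtual architecture) requests embedded PTX,
        # which the driver can JIT-compile on GPUs without matching SASS.
        flags += ["-gencode", f"arch=compute_{virtual},code=compute_{virtual}"]
    return flags
```

Calling `gencode_flags(20, [20, 21])` yields the first two option pairs from the question plus a third pair, `arch=compute_20,code=compute_20`, that carries the PTX. Passing `embed_ptx=False` gives the PTX-free variant mentioned above.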
