Should I look into PTX to optimize my kernel? If so, how?


Question

Do you recommend reading your kernel's PTX code to find out how to optimize your kernels further?

One example: I read that one can find out from the PTX code whether the automatic loop unrolling worked. If this is not the case, one would have to unroll the loops manually in the kernel code.
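A minimal sketch of the kind of kernel this refers to (the kernel, its trip count of 16, and the unroll pragma are made up for illustration, not taken from the question):

    // Hypothetical kernel: the trip count is known at compile time, so the
    // compiler may unroll the loop on its own; #pragma unroll requests it explicitly.
    __global__ void scale16(float *out, const float *in, float factor)
    {
        int base = (blockIdx.x * blockDim.x + threadIdx.x) * 16;
    #pragma unroll 16
        for (int i = 0; i < 16; ++i)
            out[base + i] = in[base + i] * factor;
    }
    // In the generated PTX, a fully unrolled loop shows up as 16 repeated
    // ld/mul/st sequences rather than a loop label with a backward branch (bra).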


  • Are there other use cases for PTX code?

  • Do you look at the PTX code of your kernels?

  • Where can I find out how to read the PTX code CUDA generates for my kernels?

Answer

The first point to make about PTX is that it is only an intermediate representation of the code run on the GPU -- a virtual machine assembly language. PTX is assembled to target machine code either by ptxas at compile time, or by the driver at runtime. So when you are looking at PTX, you are looking at what the compiler emitted, but not at what the GPU will actually run. It is also possible to write your own PTX code, either from scratch (this is the only JIT compilation model supported in CUDA), or as part of inline-assembler sections in CUDA C code (the latter officially supported since CUDA 4.0, but "unofficially" supported for much longer than that). CUDA has always shipped a complete guide to the PTX language with the toolkit, and it is fully documented. The ocelot project has used this documentation to implement its own PTX cross compiler, which allows CUDA code to run natively on other hardware, initially x86 processors, but more recently AMD GPUs.
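As an illustration of the inline-assembler route mentioned above, a minimal sketch (the wrapper function is hypothetical; the asm statement follows the form shown in NVIDIA's inline PTX documentation):

    // Hypothetical device function embedding one PTX instruction in CUDA C.
    // mul.wide.u32 multiplies two 32-bit operands into a 64-bit result;
    // "=l" binds a 64-bit output register, "r" binds 32-bit input registers.
    __device__ unsigned long long mul_wide(unsigned int a, unsigned int b)
    {
        unsigned long long result;
        asm("mul.wide.u32 %0, %1, %2;" : "=l"(result) : "r"(a), "r"(b));
        return result;
    }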

If you want to see what the GPU is actually running (as opposed to what the compiler is emitting), NVIDIA now supplies a binary disassembler tool called cuobjdump, which can show the actual machine code segments in code compiled for Fermi GPUs. There was an older, unofficial tool called decuda which worked for G80 and G90 GPUs.
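For example, assuming a Fermi-era toolchain, something along these lines produces the machine code (SASS) disassembly (the file names and architecture flag are placeholders):

    # Build a cubin for a Fermi target, then disassemble it:
    nvcc -arch=sm_20 -cubin -o kernel.cubin kernel.cu
    cuobjdump -sass kernel.cubin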

Having said that, there is a lot to be learned from PTX output, particularly about how the compiler applies optimizations and what instructions it emits to implement certain C constructs. Every version of the NVIDIA CUDA toolkit comes with a guide to nvcc and documentation for the PTX language. Both documents contain plenty of information for learning how to compile CUDA C/C++ kernel code to PTX and for understanding what the PTX instructions will do.
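As a concrete starting point, either of these nvcc invocations will give you PTX to read (file names are illustrative):

    # Emit PTX only:
    nvcc -ptx -o kernel.ptx kernel.cu
    # Or keep all intermediate files from a normal build, including the .ptx:
    nvcc -keep kernel.cu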

