OpenCL online compilation: get assembly from cl::program or cl::kernel


Problem description

I'm running kernel benchmarks with OpenCL. I know that I can compile kernels offline with various tools from OpenCL vendors (e.g. ioc64 or poclcc). The problem is that I get performance results that I cannot explain with the assembly produced by these tools, with OpenCL runtime overhead, or with anything similar.

I would like to see the assembly of the online-compiled kernels, that is, the kernels my benchmark program compiles and executes at run time. Is there any way to do that?

My approach is to get this assembly from the cl::program or cl::kernel objects, but I haven't found any way to do that. I appreciate your advice or solutions.

Recommended answer

For Intel Graphics you can use clGetKernelInfo(...,CL_KERNEL_BINARY_PROGRAM_INTEL,...) to get the kernel ISA bits directly. To disassemble those bits, you can get the latest GEN ISA disassembler and build it as described here. Specifically, see the section on Building an Intel GPU ISA Disassembler. I haven't used it in a while, but the Intel OpenCL SDK used to do a better job (I'm not a GUI person). And this is a good article on how to use that tool to scrutinize the assembly.
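The clGetKernelInfo call above follows the usual OpenCL pattern of querying the size first and then the data. A minimal sketch of that pattern is below. The helper name fetch_blob is hypothetical, and the query is passed in as a callable so the sketch compiles without the OpenCL headers. The real call is shown only in the trailing comment and assumes an Intel GPU driver that exposes CL_KERNEL_BINARY_PROGRAM_INTEL.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: generic form of OpenCL's "ask for the size, then fetch
// the bytes" idiom. `query` must be callable as
//   query(std::size_t bytes, void* dst, std::size_t* bytes_needed)
// and return 0 (the value of CL_SUCCESS) on success, like the clGet*Info functions.
template <class Query>
std::vector<unsigned char> fetch_blob(Query query) {
    std::size_t size = 0;
    // First call: no destination buffer, only ask how many bytes are needed.
    if (query(0, nullptr, &size) != 0)
        throw std::runtime_error("size query failed");

    std::vector<unsigned char> blob(size);
    // Second call: fetch the actual bytes into the sized buffer.
    if (size != 0 && query(size, blob.data(), nullptr) != 0)
        throw std::runtime_error("data query failed");
    return blob;
}

// Intended use with a real cl_kernel (needs the OpenCL headers, including the
// Intel extension header that defines CL_KERNEL_BINARY_PROGRAM_INTEL):
//
//   auto isa = fetch_blob([&](std::size_t n, void* p, std::size_t* out) {
//       return clGetKernelInfo(kernel, CL_KERNEL_BINARY_PROGRAM_INTEL, n, p, out);
//   });
```

The returned bytes can then be written to a file and handed to the disassembler. If you use the C++ wrapper, the underlying cl_kernel handle is what the C call expects.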

For NVidia, the "binary" returned by clGetProgramInfo(...CL_PROGRAM_BINARIES...) is actually PTX. This might be enough, but if you want the exact shader assembly that is executed, you can feed the PTX into ptxas and then disassemble the result with cuobjdump using the --dump-sass option to get the lowest-level assembly. Note that we're reduced to guessing that the NVidia driver uses the same algorithm as ptxas, but it seems logical.
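Once the binaries have been retrieved, they need to land in files before ptxas or a disassembler can read them. The sketch below does that and labels PTX by extension. The helper names looks_like_ptx and save_binaries are hypothetical. The PTX check is a heuristic that only relies on every PTX module containing a .version directive. The input type assumes the binaries arrive as one byte vector per device, which is what newer versions of the OpenCL C++ bindings return for CL_PROGRAM_BINARIES as far as I know.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Heuristic (an assumption, not an OpenCL guarantee): PTX is plain text and
// contains a ".version" directive near the top of the module.
bool looks_like_ptx(const std::vector<unsigned char>& blob) {
    static const std::string marker = ".version";
    const auto limit = static_cast<std::ptrdiff_t>(
        std::min<std::size_t>(blob.size(), 4096));  // only scan the header
    const auto end = blob.begin() + limit;
    return std::search(blob.begin(), end, marker.begin(), marker.end()) != end;
}

// Writes each device binary to "<prefix>_<i>.ptx" or "<prefix>_<i>.bin"
// and returns the file names it created.
std::vector<std::string> save_binaries(
        const std::vector<std::vector<unsigned char>>& binaries,
        const std::string& prefix) {
    std::vector<std::string> names;
    for (std::size_t i = 0; i < binaries.size(); ++i) {
        const auto& blob = binaries[i];
        const std::string name = prefix + "_" + std::to_string(i) +
                                 (looks_like_ptx(blob) ? ".ptx" : ".bin");
        std::ofstream out(name, std::ios::binary);
        out.write(reinterpret_cast<const char*>(blob.data()),
                  static_cast<std::streamsize>(blob.size()));
        names.push_back(name);
    }
    return names;
}
```

In the benchmark program this would be called right after the program is built, for example with the result of program.getInfo<CL_PROGRAM_BINARIES>(), so the dumped files correspond to exactly the kernels that were executed.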

AMD likely has similar tools, but I am less familiar with them.
