NVRTC and __device__ functions


Question


I am trying to optimize my simulator by leveraging run-time compilation. My code is pretty long and complex, but I identified a specific __device__ function whose performance can be significantly improved by removing all global memory accesses.

Does CUDA allow the dynamic compilation and linking of a single __device__ function (not a __global__), in order to "override" an existing function?

Solution

I am pretty sure the really short answer is no.

Although CUDA has dynamic/JIT device linker support, it is important to remember that the linkage process itself is still static.
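To make the distinction concrete, here is a minimal host-side sketch of the NVRTC half of the workflow: compiling a single __device__ function to relocatable PTX at runtime. The source string, function name, and architecture flag are illustrative assumptions, not part of the original question.

```cuda
// Host-side sketch: compile a lone __device__ function to relocatable PTX.
// The source string and option values below are illustrative assumptions.
#include <nvrtc.h>
#include <vector>

const char *deviceFnSrc =
    "extern \"C\" __device__ float fast_term(float x) {\n"
    "    return x * x + 1.0f;   // replacement body, no global memory access\n"
    "}\n";

std::vector<char> compileToPtx() {
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, deviceFnSrc, "fast_term.cu", 0, nullptr, nullptr);
    // Relocatable device code is required so the result can be linked later.
    const char *opts[] = { "--relocatable-device-code=true",
                           "--gpu-architecture=compute_70" };
    nvrtcResult res = nvrtcCompileProgram(prog, 2, opts);
    if (res != NVRTC_SUCCESS) { /* inspect nvrtcGetProgramLog here */ }
    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);
    return ptx;
}
```

This produces a relocatable PTX image; nothing has been linked yet, which is exactly where the static-linkage constraint described above comes into play.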

So you can't delay load a particular function in an existing compiled GPU payload at runtime as you can in a conventional dynamic link loading environment. And the linker still requires that a single instance of all code objects and symbols be present at link time, whether that is a priori or at runtime. So you would be free to JIT link together precompiled objects with different versions of the same code, as long as a single instance of everything is present when the session is finalised and the code is loaded into the context. But that is as far as you can go.
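The "single instance of everything at link finalisation" constraint can be sketched with the driver API's JIT linker. The file name, symbol name, and PTX buffer below are hypothetical; the point is that the precompiled kernel and the runtime-compiled definition must both be added before cuLinkComplete, and no symbol may be defined twice.

```cuda
// Host-side sketch: JIT-link runtime-compiled PTX against a precompiled
// kernel object. File name, symbol name, and ptx buffer are assumptions.
#include <cuda.h>

CUmodule linkAndLoad(const char *ptx, size_t ptxSize) {
    CUlinkState state;
    void *cubin; size_t cubinSize;
    cuLinkCreate(0, nullptr, nullptr, &state);
    // Precompiled kernel (built with -rdc=true) that calls fast_term()
    // but does not define it.
    cuLinkAddFile(state, CU_JIT_INPUT_FATBINARY, "kernel.fatbin",
                  0, nullptr, nullptr);
    // The runtime-compiled definition of fast_term(): exactly one instance
    // of every symbol must be present when the session is finalised.
    cuLinkAddData(state, CU_JIT_INPUT_PTX, (void *)ptx, ptxSize,
                  "fast_term.ptx", 0, nullptr, nullptr);
    cuLinkComplete(state, &cubin, &cubinSize);
    CUmodule mod;
    cuModuleLoadData(&mod, cubin);   // load the linked image into the context
    cuLinkDestroy(state);
    return mod;
}
```

Swapping in a different fast_term() means repeating the whole link-and-load cycle into a fresh module; there is no way to patch the function into an already-loaded image.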

