__forceinline__ effect on CUDA C __device__ functions
Question
There is a lot of advice on when to use inline functions and when to avoid them in regular C coding. What is the effect of __forceinline__ on CUDA C __device__ functions? Where should it be used, and where should it be avoided?
Recommended answer
Normally the nvcc device code compiler will make its own decisions about when to inline a particular __device__ function. Generally speaking, you probably don't need to worry about overriding that with the __forceinline__ decorator/directive.
cc 1.x (compute capability 1.x) devices don't have all the same hardware capabilities as newer devices, so very often the compiler will automatically inline functions for those devices.
I think the reason to specify __forceinline__ is the same as what you may have learned about host C code. It is usually used for optimization when the compiler might not otherwise inline the function (e.g. on cc 2.x or newer devices). The benefit (i.e. avoiding function call overhead) might be negligible if you call the function once, but if you call it in a loop, for example, making sure it is inlined might give a noticeable improvement in execution time.
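To make the loop case concrete, here is a minimal sketch, not taken from the original answer. The names horner_step, eval_poly and eval_poly_on_gpu are hypothetical and chosen only for illustration. A small __device__ helper marked __forceinline__ is called once per loop iteration inside a kernel. Whether forcing the inline actually helps depends on the compiler version and target architecture, so treat this as something to measure rather than a guaranteed win.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: one Horner step. It is called once per loop
// iteration, so __forceinline__ asks nvcc to expand it at the call site.
__device__ __forceinline__ float horner_step(float acc, float x, float c)
{
    return acc * x + c;
}

// Each thread evaluates c[0] + c[1]*x + ... + c[m-1]*x^(m-1) at its own x[i].
__global__ void eval_poly(const float *x, float *y, int n,
                          const float *c, int m)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = m - 1; k >= 0; --k) {
        acc = horner_step(acc, x[i], c[k]);  // call inside the loop
    }
    y[i] = acc;
}

// Illustrative host wrapper with simplified error handling.
// Returns true on success and writes n results into hy.
bool eval_poly_on_gpu(const float *hx, float *hy, int n,
                      const float *hc, int m)
{
    assert(n > 0 && m > 0);
    const size_t xbytes = static_cast<size_t>(n) * sizeof(float);
    const size_t cbytes = static_cast<size_t>(m) * sizeof(float);

    float *dx = nullptr, *dy = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&dx, xbytes) == cudaSuccess
           && cudaMalloc(&dy, xbytes) == cudaSuccess
           && cudaMalloc(&dc, cbytes) == cudaSuccess;
    if (ok) {
        ok = cudaMemcpy(dx, hx, xbytes, cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemcpy(dc, hc, cbytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        eval_poly<<<blocks, threads>>>(dx, dy, n, dc, m);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hy, dy, xbytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dc);
    return ok;
}
```

To try it, call eval_poly_on_gpu from a main function with a few inputs and coefficients. Building once with and once without __forceinline__ and comparing timings is a reasonable way to see whether the qualifier changes anything for your compiler and device.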
As a counterexample, inlining and recursion generally don't mix. For a recursive function that calls itself, I don't think it's possible to handle arbitrary recursion and also strictly inline every call. So if you intend to use a function recursively (supported on cc 2.x and above), you probably wouldn't want to specify __forceinline__.
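As an illustration of the recursive case, here is a small sketch, not taken from the original answer. The names factorial, factorial_kernel and factorial_on_gpu are hypothetical. The recursive __device__ function carries no inlining qualifier, so the decision is left to nvcc. It assumes a device of cc 2.x or newer, where device-side recursion is supported.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Recursive device function. No __forceinline__ here: the recursion depth
// depends on a runtime value, so the compiler is left to decide.
__device__ unsigned int factorial(unsigned int n)
{
    return (n <= 1u) ? 1u : n * factorial(n - 1u);
}

// Each thread computes the factorial of its own input value.
__global__ void factorial_kernel(const unsigned int *in,
                                 unsigned int *out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        out[i] = factorial(in[i]);
    }
}

// Illustrative host wrapper with simplified error handling.
// Inputs above 12 overflow a 32-bit unsigned int.
bool factorial_on_gpu(const unsigned int *hin, unsigned int *hout, int count)
{
    assert(count > 0);
    const size_t bytes = static_cast<size_t>(count) * sizeof(unsigned int);

    unsigned int *din = nullptr, *dout = nullptr;
    bool ok = cudaMalloc(&din, bytes) == cudaSuccess
           && cudaMalloc(&dout, bytes) == cudaSuccess;
    if (ok) {
        ok = cudaMemcpy(din, hin, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        const int threads = 64;
        const int blocks = (count + threads - 1) / threads;
        factorial_kernel<<<blocks, threads>>>(din, dout, count);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(din);
    cudaFree(dout);
    return ok;
}
```

For deeper recursion than this toy case, the per-thread stack may need to be enlarged with cudaDeviceSetLimit(cudaLimitStackSize, bytes) before the kernel launch.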
In general, I think you should let the compiler manage this for you. It will intelligently decide whether to inline a function.