__forceinline__ effect at CUDA C __device__ functions

Question

There is a lot of advice on when to use inline functions and when to avoid them in regular C coding. What is the effect of __forceinline__ on CUDA C __device__ functions? Where should it be used, and where should it be avoided?

Recommended answer

Normally the nvcc device code compiler makes its own decisions about when to inline a particular __device__ function, and generally speaking, you probably don't need to worry about overriding that with the __forceinline__ decorator/directive.
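For illustration, here is a minimal sketch of the three ways a device function can be declared with respect to inlining: with no qualifier (nvcc decides), with __forceinline__, and with its counterpart __noinline__, both of which are CUDA function qualifiers. The names square, cube, quartic and powers_kernel are hypothetical. The #ifndef __CUDACC__ shim at the top is an assumption added only so the helpers also build as plain C++ (GCC/Clang) for host-side testing; it is not part of CUDA. Under nvcc, main launches the kernel using unified memory.

```cuda
#include <cassert>
#include <cstdio>

// Hypothetical shim (not part of CUDA): when built by a plain C++ compiler
// (GCC/Clang) instead of nvcc, map the CUDA qualifiers onto ordinary C++
// equivalents so the helpers can also be tested on the CPU.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __forceinline__ inline
#define __noinline__ __attribute__((noinline))
#endif

// No qualifier: nvcc decides for itself whether to inline calls to this.
__host__ __device__ float square(float x) { return x * x; }

// __forceinline__: ask nvcc to inline this function at every call site.
__host__ __device__ __forceinline__ float cube(float x) { return x * square(x); }

// __noinline__: ask nvcc to keep this as a real call instead of inlining it.
__host__ __device__ __noinline__ float quartic(float x) { return square(square(x)); }

#ifdef __CUDACC__
// Each thread applies the helpers to one input element.
__global__ void powers_kernel(const float* in, float* out3, float* out4, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out3[i] = cube(in[i]);
        out4[i] = quartic(in[i]);
    }
}

int main() {
    const int n = 4;
    float *in, *out3, *out4;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out3, n * sizeof(float));
    cudaMallocManaged(&out4, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i + 1);
    powers_kernel<<<1, n>>>(in, out3, out4, n);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) {
        // Small integer inputs, so host and device results match exactly.
        assert(out3[i] == cube(in[i]));
        assert(out4[i] == quartic(in[i]));
    }
    std::printf("2^3 = %g, 2^4 = %g\n", out3[1], out4[1]);
    cudaFree(in);
    cudaFree(out3);
    cudaFree(out4);
    return 0;
}
#endif
```

In the unqualified case the decision stays with the compiler, which is what the answer recommends by default.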

Devices of compute capability (cc) 1.x lack some of the hardware capabilities of newer devices (in particular, support for a true function-call stack), so the compiler will very often inline functions automatically for those devices.

I think the reason to specify __forceinline__ is the same as what you may have learned about host C code: it is usually used as an optimization when the compiler might not otherwise inline the function (e.g. on cc 2.x or newer devices). The saving (i.e. the function call overhead) might be negligible if you call the function only once, but if you call it in a loop, for example, making sure it is inlined might noticeably improve execution speed.
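A minimal sketch of the loop case, assuming a hypothetical Horner-rule polynomial evaluator: the tiny helper horner_step is called once per loop iteration, so it is marked __forceinline__ to make sure no call overhead is paid inside the loop. The names and the host-compiler shim are assumptions for illustration only. Whether forcing inlining actually helps here depends on the compiler (nvcc would very likely inline such a small function on its own) and is worth measuring rather than assuming.

```cuda
#include <cassert>
#include <cstdio>

// Hypothetical shim (not part of CUDA): under a plain C++ compiler, map the
// CUDA qualifiers to ordinary C++ so the helpers can be tested on the CPU.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __forceinline__ inline
#endif

// One step of Horner's rule. It is tiny and called on every loop iteration,
// which is the situation where forcing inlining is most likely to matter.
__host__ __device__ __forceinline__ float horner_step(float acc, float x, float c) {
    return acc * x + c;
}

// Evaluates c[0] + c[1]*x + ... + c[degree]*x^degree.
__host__ __device__ float poly_eval(const float* c, int degree, float x) {
    float acc = c[degree];
    for (int k = degree - 1; k >= 0; --k) {
        acc = horner_step(acc, x, c[k]);  // forced inline under nvcc: no call/return per iteration
    }
    return acc;
}

#ifdef __CUDACC__
// Each thread evaluates the same polynomial at x = its global index.
__global__ void poly_kernel(const float* c, int degree, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = poly_eval(c, degree, float(i));
}

int main() {
    const int degree = 2, n = 8;
    float *c, *out;
    cudaMallocManaged(&c, (degree + 1) * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    c[0] = 1.0f; c[1] = 2.0f; c[2] = 3.0f;  // p(x) = 1 + 2x + 3x^2
    poly_kernel<<<1, n>>>(c, degree, out, n);
    cudaDeviceSynchronize();
    // Small integer values are exact in float, so host and device agree.
    for (int i = 0; i < n; ++i) assert(out[i] == poly_eval(c, degree, float(i)));
    std::printf("p(7) = %g\n", out[n - 1]);
    cudaFree(c);
    cudaFree(out);
    return 0;
}
#endif
```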

As a counterexample, inlining and recursion generally don't mix. For a function that calls itself, I don't think it's possible to handle arbitrary recursion and also inline strictly: when the recursion depth is only known at run time, every inlined copy would still contain another call to inline. So if you intend to use a function recursively (supported on cc 2.x and above), you probably wouldn't want to specify __forceinline__.
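A sketch of the recursion point, using a hypothetical factorial: the recursive version is left without __forceinline__, because its depth depends on a run-time argument and the calls cannot all be expanded inline, while an iterative rewrite has no self-call and can be force-inlined without that conflict. The function names and the host-compiler shim are assumptions for illustration. When compiling the recursive kernel, nvcc/ptxas may also warn that the stack size cannot be determined statically.

```cuda
#include <cassert>
#include <cstdio>

// Hypothetical shim (not part of CUDA): under a plain C++ compiler, map the
// CUDA qualifiers to ordinary C++ so the helpers can be tested on the CPU.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __forceinline__ inline
#endif

// Recursive: no __forceinline__. The recursion depth depends on n at run time,
// so the calls cannot all be expanded inline; on the device this needs a real
// call stack (cc 2.x and above).
__host__ __device__ unsigned long long factorial_rec(unsigned int n) {
    return n <= 1 ? 1ULL : n * factorial_rec(n - 1);
}

// Iterative rewrite: no self-call, so forcing it inline is unproblematic.
__host__ __device__ __forceinline__ unsigned long long factorial_iter(unsigned int n) {
    unsigned long long acc = 1;
    for (unsigned int k = 2; k <= n; ++k) acc *= k;
    return acc;
}

#ifdef __CUDACC__
// Each thread computes i! both ways; 20! is the largest that fits in 64 bits.
__global__ void factorial_kernel(unsigned long long* rec, unsigned long long* iter, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        rec[i] = factorial_rec(i);
        iter[i] = factorial_iter(i);
    }
}

int main() {
    const int n = 21;  // 0! .. 20!
    unsigned long long *rec, *iter;
    cudaMallocManaged(&rec, n * sizeof(unsigned long long));
    cudaMallocManaged(&iter, n * sizeof(unsigned long long));
    factorial_kernel<<<1, n>>>(rec, iter, n);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) assert(rec[i] == iter[i]);
    std::printf("20! = %llu\n", rec[n - 1]);
    cudaFree(rec);
    cudaFree(iter);
    return 0;
}
#endif
```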

In general, I think you should let the compiler manage this for you. It will intelligently decide whether to inline a function.