CUDA kernel as member function of a class


Question

I am using CUDA 5.0 and a Compute Capability 2.1 card.

The question is quite straightforward: Can a kernel be part of a class? For example:

class Foo
{
private:
 //...
public:
 __global__ void kernel();
};

__global__ void Foo::kernel()
{
 //implementation here
}

If not then the solution is to make a wrapper function that is member of the class and calls the kernel internally?

And if yes, then will it have access to the private attributes as a normal private function?

(I'm not just trying it to see what happens, because my project has several other errors right now, and I also think this is a good reference question. It was difficult for me to find references on using CUDA with C++. Basic functionality examples can be found, but not strategies for structured code.)

Answer

Let me leave CUDA dynamic parallelism out of the discussion for the moment (i.e. assume compute capability 3.0 or prior).

Remember that __global__ is used for CUDA functions that will (only) be called from the host (but execute on the device). If you instantiate this object on the device, it won't work. Furthermore, to get device-accessible private data to be available to the member function, the object would have to be instantiated on the device.

So you could have a kernel invocation (i.e. mykernel<<<blocks,threads>>>(...);) embedded in a host object member function, but the kernel definition (i.e. the function definition with the __global__ decorator) would normally precede the object definition in your source code. And as stated already, such a methodology could not be used for an object instantiated on the device. It would also not have access to ordinary private data defined elsewhere in the object. (It may be possible to come up with a scheme for a host-only object that creates device data, using pointers into global memory, which would then be accessible on the device, but such a scheme looks quite convoluted to me at first glance.)
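
As a rough sketch of that wrapper pattern (the names Foo, scaleKernel and d_data below are made up for illustration, not taken from the question): the kernel is defined at namespace scope before the class, and an ordinary host member function launches it, forwarding the private members it needs as plain kernel arguments.

#include <cuda_runtime.h>

// Kernel at namespace scope, defined before the class that launches it.
// Private data is passed in explicitly as kernel arguments.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

class Foo
{
private:
    float *d_data;   // device pointer owned by the (host-side) object
    int    n;
    float  factor;
public:
    Foo(int n_, float factor_) : d_data(0), n(n_), factor(factor_)
    {
        cudaMalloc(&d_data, n * sizeof(float));
    }
    ~Foo() { cudaFree(d_data); }

    // Wrapper member function: it has normal access to the private
    // members and forwards them to the kernel as ordinary arguments.
    void scale()
    {
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(d_data, n, factor);
        cudaDeviceSynchronize();
    }
};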

Normally, device-usable member functions are preceded by the __device__ decorator. In this case, all the code in the device member function executes within the thread that called it.
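
For example, a minimal sketch of that case (the names Scaler and applyKernel are illustrative): the member function carries the __device__ decorator, and each thread that calls it executes it for itself, like any other device function.

// A class with a __device__ member function; each thread that calls
// apply() executes it within its own context.
class Scaler
{
private:
    float factor;
public:
    __host__ __device__ Scaler(float f) : factor(f) {}
    __device__ float apply(float x) const { return x * factor; }
};

__global__ void applyKernel(float *data, int n, Scaler s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = s.apply(data[i]);   // runs in the calling thread
}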

This question gives an example (in my edited answer) of a C++ object with a member function callable from both the host and the device, with appropriate data copying between host and device objects.
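
As a hedged sketch of that idea (not the code from the linked answer; Counter and bumpKernel are made-up names): a trivially copyable object with a __host__ __device__ member function can be copied between host and device with cudaMemcpy, and the same member can be called on both sides.

#include <cstdio>
#include <cuda_runtime.h>

// A trivially copyable object with a member function usable on both
// host and device; the object is moved between them with cudaMemcpy.
struct Counter
{
    int value;
    __host__ __device__ void increment() { ++value; }
};

__global__ void bumpKernel(Counter *c)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        c->increment();                       // device-side call
}

int main()
{
    Counter h_c = {41};
    h_c.increment();                          // host-side call -> 42

    Counter *d_c;
    cudaMalloc(&d_c, sizeof(Counter));
    cudaMemcpy(d_c, &h_c, sizeof(Counter), cudaMemcpyHostToDevice);

    bumpKernel<<<1, 1>>>(d_c);

    cudaMemcpy(&h_c, d_c, sizeof(Counter), cudaMemcpyDeviceToHost);
    cudaFree(d_c);

    printf("%d\n", h_c.value);                // prints 43
    return 0;
}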
