Cuda virtual class


Problem description

I would like to execute some virtual methods in a Cuda kernel, but instead of creating the object in the same kernel I would like to create it on the host and copy it to GPU memory.

I am successfully creating objects in a kernel and calling a virtual method. The problem arises when copying the object, which makes sense because the virtual function pointer is obviously bogus. What happens is simply "Cuda grid launch failed", at least that is what Nsight reports. But when having a look at the SASS it crashes on the dereference of the virtual function pointer, which also makes sense.
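For illustration, here is a minimal sketch of the failing pattern (the class and variable names are mine, not from the original question): the object is constructed on the host, its bytes are copied to the GPU, and the kernel then dereferences a vtable pointer that refers to host code.

#include <cstdio>
#include <cuda_runtime.h>

struct Base
{
    __host__ __device__ virtual int value() const { return 0; }
};

struct Derived : public Base
{
    __host__ __device__ virtual int value() const { return 42; }
};

__global__ void callVirtual(Base* obj, int* out)
{
    *out = obj->value();   // dereferences the host vtable pointer -> grid launch failure
}

int main()
{
    Derived hostObj;                      // constructed on the host, so its vtable pointer is a host address
    Derived* devObj;
    int* devOut;
    cudaMalloc((void**)&devObj, sizeof(Derived));
    cudaMalloc((void**)&devOut, sizeof(int));
    cudaMemcpy(devObj, &hostObj, sizeof(Derived), cudaMemcpyHostToDevice);  // copies the bogus vtable pointer along
    callVirtual<<<1, 1>>>(devObj, devOut);
    printf("%s\n", cudaGetErrorString(cudaDeviceSynchronize()));
    cudaFree(devObj);
    cudaFree(devOut);
    return 0;
}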

I am of course using Cuda 4.2 as well as compiling with "compute_30" on a fitting card.
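For reference, the corresponding compile line would look roughly like this (the file name is just a placeholder):

nvcc -gencode arch=compute_30,code=sm_30 -o app app.cu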

So what is the recommended way to go? Or is this feature simply not supported?

I had the idea to run a different kernel first which creates dummy objects, and to extract the virtual function pointer to "patch" my objects before copying them. Sadly this is not really working (I haven't figured it out yet), and it would be an ugly solution anyway.

P.S. This is actually a rerun of this question, which sadly was never fully answered.

EDIT:

So I found a way to do what I wanted. But just to be clear: this is not at all an answer or solution, the answer was already provided. This is only a hack, just for fun.

So first let's see what Cuda is doing when calling a virtual method; below is the debug SASS:

//R0 is the address of our object
LD.CG R0, [R0];        // load the vtable pointer stored at the start of the object
IADD R0, R0, 0x4;      // advance 4 bytes into the vtable
NOP;
MOV R0, R0;
LD.CG R0, [R0];        // load that vtable entry: an index into constant bank 2
...
IADD R0, RZ, R9;       // RZ is the zero register, so this just copies the index out of R9
MOV R0, R0;
LDC R0, c[0x2][R0];    // fetch the actual branch target from constant memory
...
BRX R0 - 0x5478        // indexed branch to the virtual method

So assuming that "c[0x2][INDEX]" is constant for all kernels, we can get the index for a class by just running a kernel and doing this, where obj is a newly created object of the class we are looking at:

unsigned int index = *(unsigned int*)(*(unsigned int*)obj + 4);
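For context, this line would run inside a kernel along the lines of the following sketch. Foo is a placeholder for one of your classes with virtual methods, and the pointer arithmetic assumes 32-bit device pointers, as in the snippet above.

struct Foo                                   // placeholder class with a virtual device method
{
    __device__ virtual int value() const { return 42; }
};

__global__ void harvestIndex(unsigned int* out)
{
    Foo* obj = new Foo();                                // dummy object created on the device, so its vtable is valid there
    *out = *(unsigned int*)(*(unsigned int*)obj + 4);    // read the entry at vtable offset 4, i.e. the index into c[0x2]
    delete obj;
}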

And then use a struct like this:

struct entry
{
    unsigned int vfptr;   // := &vfref, that's our value to store in an object
    int dummy;            // := 1234, great for debugging
    unsigned int vfref;   // := &dummy
    unsigned int index;
    char ClassName[256];  // use it as a key for a dict
};

Store this in host as well as device memory (the memory locations are device ones), and on the host you can use the ClassName as a lookup for an object to "patch".
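A rough sketch of how that host-side patch might look, under the same assumptions (32-bit device pointers, an entry filled in on the device and copied back to the host; the function and variable names are mine):

#include <cstring>
#include <cuda_runtime.h>

// Overwrite the object's vtable pointer with the fake device-side one, then upload it.
void patchAndUpload(void* hostObj, size_t size, const entry& e, void* devObj)
{
    std::memcpy(hostObj, &e.vfptr, sizeof(e.vfptr));              // first word of the object := e.vfptr (device address of e.vfref)
    cudaMemcpy(devObj, hostObj, size, cudaMemcpyHostToDevice);    // copy the patched object into device memory
}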

But again: I would not use this in anything serious, because performance-wise virtual functions are not great at all.

Recommended answer

What you are trying to do is not supported, currently, by the CUDA compiler and runtime (as of CUDA 5.0). Section D.2.6.3 of the CUDA C Programming Guide v5.0 reads:

D.2.6.3 Virtual Functions

When a function in a derived class overrides a virtual function in a base class, the execution space qualifiers (i.e., __host__, __device__) on the overridden and overriding functions must match.

It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions.

The virtual function table is placed in global or constant memory by the compiler.

What I recommend is that you encapsulate the data of your class separately from the functionality of the class. For example, store the data in a struct. If you plan to operate on arrays of these objects, store the data in a structure of arrays (for performance -- outside the scope of this question). Allocate the data structures on the host using cudaMalloc, and then pass the data to the kernel as arguments, rather than passing the class with virtual methods.

Then construct your objects with virtual methods on the device. The constructor of your class with virtual methods would take the device pointer kernel parameters as arguments. The virtual device methods could then operate on the device data.
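A minimal sketch of what this recommendation might look like in practice (all names are illustrative, and the data layout is a single array rather than a full structure of arrays): the kernel receives a plain data struct as its argument, and the object with virtual methods is constructed on the device.

#include <cuda_runtime.h>

struct ParticleData                 // plain data struct: no virtual functions, safe to pass to a kernel
{
    float* position;                // device pointer allocated on the host with cudaMalloc
    int    count;
};

class Integrator                    // base class with virtual device methods
{
public:
    __device__ Integrator(ParticleData d) : data(d) {}
    __device__ virtual ~Integrator() {}
    __device__ virtual void step(int i) = 0;
protected:
    ParticleData data;
};

class EulerIntegrator : public Integrator
{
public:
    __device__ EulerIntegrator(ParticleData d) : Integrator(d) {}
    __device__ virtual void step(int i) { data.position[i] += 1.0f; }
};

__global__ void integrate(ParticleData data)    // the data struct, not the class, is the kernel argument
{
    EulerIntegrator integrator(data);           // constructed on the device, so its vtable pointer is valid
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < data.count)
        integrator.step(i);
}

int main()
{
    ParticleData data;
    data.count = 256;
    cudaMalloc((void**)&data.position, data.count * sizeof(float));
    cudaMemset(data.position, 0, data.count * sizeof(float));
    integrate<<<1, 256>>>(data);
    cudaDeviceSynchronize();
    cudaFree(data.position);
    return 0;
}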

The same approach would work to enable allocating the data in one kernel on the device, and accessing it in another kernel on the device (since, again, classes with virtual functions can't be parameters to the kernels).
