如何实现设备端CUDA虚拟功能? [英] How to implement device side CUDA virtual functions?

查看:50
本文介绍了如何实现设备端CUDA虚拟功能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我看到CUDA不允许将带有虚函数的类传递到内核函数中.有没有解决此限制的方法?

I see that CUDA doesn't allow for classes with virtual functions to be passed into kernel functions. Are there any work-arounds to this limitation?

我真的很希望能够在内核函数中使用多态性.

I would really like to be able to use polymorphism within a kernel function.

谢谢!

推荐答案

罗伯特·克罗维拉(Robert Crovella)评论的最重要部分是:

The most important part of Robert Crovella's comment is:

只需在设备上创建对象即可.

The objects simply need to be created on the device.

因此请记住,我正在处理这样的情况:我有一个抽象的 class Function ,然后实现了一些封装了不同功能及其评估的实现.这是我的代码如何实现多态的简化版本,但我并不是说它不能做得更好...希望可以帮助您理解这一点:

So keeping that in mind, I was dealing with situation where I had an abstract class Function and then some implementations of it encapsulating different function and its evaluation. This is the simplified version of my code how I achieved polymorphism in my situation, but I am not saying it cannot be done better... It will hopefully help you to get the idea:

class Function
{
public:
    __device__ Function() {}
    __device__ virtual ~Function() {}
    __device__ virtual void Evaluate(const real* __restrict__ positions, real* fitnesses, const SIZE_TYPE particlesCount) const = 0;
};

class FunctionRsj : public Function
{
private:
    SIZE_TYPE m_DimensionsCount;
    SIZE_TYPE m_PointsCount;
    real* m_Y;
    real* m_X;
public:
    __device__ FunctionRsj(const SIZE_TYPE dimensionsCount, const SIZE_TYPE pointsCount, real* configFileData)
        : m_DimensionsCount(dimensionsCount),
            m_PointsCount(pointsCount),
            m_Y(configFileData),
            m_X(configFileData + pointsCount) {}

    __device__ ~FunctionRsj()
    {
        // m_Y points to the beginning of the config
        // file data, use it for destruction as this 
        // object took ownership of configFilDeata.
        delete[] m_Y;
    }

    __device__ void Evaluate(const real* __restrict__ positions, real* fitnesses, const SIZE_TYPE particlesCount) const
    {
        // Implement evaluation of FunctionRsj here.
    }
};

__global__ void evaluate_fitnesses(
    const real* __restrict__ positions,
    real* fitnesses,
    Function const* const* __restrict__ function,
    const SIZE_TYPE particlesCount)
{
    // This whole kernel is just a proxy as kernels
    // cannot be member functions.
    (*function)->Evaluate(positions, fitnesses, particlesCount);
}

__global__ void create_function(
    Function** function,
    SIZE_TYPE dimensionsCount,
    SIZE_TYPE pointsCount,
    real* configFileData)
{
    // It is necessary to create object representing a function
    // directly in global memory of the GPU device for virtual
    // functions to work correctly, i.e. virtual function table
    // HAS to be on GPU as well.
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        (*function) = new FunctionRsj(dimensionsCount, pointsCount, configFileData);
    }
}

__global__ void delete_function(Function** function)
{
    delete *function;
}

int main()
{
    // Lets just assume d_FunctionConfigData, d_Positions,
    // d_Fitnesses are arrays allocated on GPU already ...

    // Create function.
    Function** d_Function;
    cudaMalloc(&d_Function, sizeof(Function**));
    create_function<<<1, 1>>>(d_Function, 10, 10, d_FunctionConfigData);

    // Evaluate using proxy kernel.
    evaluate_fitnesses<<<
        m_Configuration.GetEvaluationGridSize(),
        m_Configuration.GetEvaluationBlockSize(),
        m_Configuration.GetEvaluationSharedMemorySize()>>>(
        d_Positions,
        d_Fitnesses,
        d_Function,
        m_Configuration.GetParticlesCount());

    // Delete function object on GPU.
    delete_function<<<1, 1>>>(d_Function);
}

这篇关于如何实现设备端CUDA虚拟功能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆