CUDA: Wrapping device memory allocation in C++


Question

I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C but had the language been based on C++ instead, several aspects would have been a lot simpler, e.g. device memory allocation (via cudaMalloc).

My plan was to do this myself, using overloaded operator new with placement new and RAII (two alternatives). I'm wondering if there are any caveats that I haven't noticed so far. The code seems to work but I'm still wondering about potential memory leaks.

With the RAII approach, usage looks like this:

CudaArray<float> device_data(SIZE);
// Use `device_data` as if it were a raw pointer.
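To make the point about cudaMemcpy concrete, here is a hedged sketch of how the wrapper might be used for a round trip (assuming the CudaArray class defined further down; its implicit conversion to T* lets it be passed straight to cudaMemcpy):

```cpp
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: the wrapper only manages the device allocation's
// lifetime; data transfers still go through plain cudaMemcpy.
void round_trip(std::vector<float>& host) {
    CudaArray<float> device_data(host.size());
    cudaMemcpy(device_data, host.data(), host.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    // ... launch kernels operating on device_data here ...
    cudaMemcpy(host.data(), device_data, host.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
}   // device memory is freed automatically when device_data goes out of scope
```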

Perhaps a class is overkill in this context (especially since you'd still have to use cudaMemcpy, the class only encapsulating RAII) so the other approach would be placement new:

float* device_data = new (cudaDevice) float[SIZE];
// Use `device_data` …
operator delete [](device_data, cudaDevice);

Here, cudaDevice merely acts as a tag to trigger the overload. However, since in normal placement new this would indicate the placement, I find the syntax oddly consistent and perhaps even preferable to using a class.

I'd appreciate criticism of every kind. Does somebody perhaps know whether something along these lines is planned for the next version of CUDA (which, as I've heard, will improve its C++ support, whatever that means)?

So, my question is actually threefold:

  1. Is my placement new overload semantically correct? Does it leak memory?
  2. Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
  3. How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?

For reference, here is the code:
#include <cstddef>          // std::size_t
#include <new>              // std::bad_alloc
#include <cuda_runtime.h>   // cudaMalloc, cudaFree

// Singleton tag for CUDA device memory placement.
struct CudaDevice {
    static CudaDevice const& get() { return instance; }
private:
    static CudaDevice const instance;
    CudaDevice() { }
    CudaDevice(CudaDevice const&);             // non-copyable
    CudaDevice& operator =(CudaDevice const&); // non-assignable
} const& cudaDevice = CudaDevice::get();

CudaDevice const CudaDevice::instance;

// Tagged placement allocation: `new (cudaDevice) T[n]` allocates device memory.
inline void* operator new [](std::size_t nbytes, CudaDevice const&) {
    void* ret;
    if (cudaMalloc(&ret, nbytes) != cudaSuccess)
        throw std::bad_alloc();
    return ret;
}

// Matching placement delete. It must be invoked explicitly -- a plain
// `delete[] p` would (incorrectly) go through the global deallocator.
inline void operator delete [](void* p, CudaDevice const&) throw() {
    cudaFree(p);
}

template <typename T>
class CudaArray {
public:
    explicit
    CudaArray(std::size_t size) : size(size), data(new (cudaDevice) T[size]) { }

    operator T* () { return data; }

    ~CudaArray() {
        operator delete [](data, cudaDevice);
    }

private:
    std::size_t const size;
    T* const data;

    CudaArray(CudaArray const&);             // non-copyable
    CudaArray& operator =(CudaArray const&);
};

About the singleton employed here: yes, I'm aware of its drawbacks. However, they aren't relevant in this context. All I needed was a small, non-copyable tag type. Everything else (multithreading considerations, time of initialization) doesn't apply.

Answer

I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter into std::vector<> and std::map<> and so forth.

Beware, I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way. And you do not need to re-invent your containers and algorithms.
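A minimal sketch of what such an allocator might look like (not from the original post; the member set follows the classic std::allocator interface, and the big caveat stands: STL containers dereference the pointers they obtain, which is invalid for device memory on the host side, so this is mostly useful as an exercise):

```cpp
#include <cstddef>
#include <new>
#include <cuda_runtime.h>

// Hypothetical allocator conforming to the std::allocator<> interface,
// backed by CUDA device memory.
template <typename T>
struct cuda_allocator {
    typedef T           value_type;
    typedef T*          pointer;
    typedef std::size_t size_type;

    template <typename U> struct rebind { typedef cuda_allocator<U> other; };

    pointer allocate(size_type n) {
        void* p;
        if (cudaMalloc(&p, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
        return static_cast<pointer>(p);
    }

    void deallocate(pointer p, size_type) {
        cudaFree(p);
    }
};
```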
