CUDA: Wrapping device memory allocation in C++


Problem description

I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C but had the language been based on C++ instead, several aspects would have been a lot simpler, e.g. device memory allocation (via cudaMalloc).

My plan was to do this myself, using overloaded operator new with placement new and RAII (two alternatives). I'm wondering if there are any caveats that I haven't noticed so far. The code seems to work but I'm still wondering about potential memory leaks.

Usage with the RAII approach looks like this:

CudaArray<float> device_data(SIZE);
// Use `device_data` as if it were a raw pointer.

Perhaps a class is overkill in this context (especially since you'd still have to use cudaMemcpy, the class only encapsulating RAII) so the other approach would be placement new:

float* device_data = new (cudaDevice) float[SIZE];
// Use `device_data` …
operator delete [](device_data, cudaDevice);

Here, cudaDevice merely acts as a tag to trigger the overload. However, since in normal placement new this would indicate the placement, I find the syntax oddly consistent and perhaps even preferable to using a class.

I'd appreciate criticism of every kind. Does somebody perhaps know if something in this direction is planned for the next version of CUDA (which, as I've heard, will improve its C++ support, whatever they mean by that).

So, my question is actually threefold:


  1. 我的展示位置 new 重载语义正确?

  2. 任何人都有关于未来的CUDA发展的信息(让我们面对它:C ++ s * ck中的C接口)?

  3. 我如何以一致的方式进一步(还有其他API要考虑,例如,不仅有设备内存,而且一个恒定的内存存储和纹理内存)?

  1. Is my placement new overload semantically correct? Does it leak memory?
  2. Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
  3. How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?


#include <cstddef>
#include <cuda_runtime.h>

// Singleton tag for CUDA device memory placement.
struct CudaDevice {
    static CudaDevice const& get() { return instance; }
private:
    static CudaDevice const instance;
    CudaDevice() { }
    CudaDevice(CudaDevice const&);
    CudaDevice& operator =(CudaDevice const&);
} const& cudaDevice = CudaDevice::get();

CudaDevice const CudaDevice::instance;

inline void* operator new [](std::size_t nbytes, CudaDevice const&) {
    void* ret;
    // Note: the cudaError_t returned by cudaMalloc is ignored here.
    cudaMalloc(&ret, nbytes);
    return ret;
}

inline void operator delete [](void* p, CudaDevice const&) throw() {
    cudaFree(p);
}

template <typename T>
class CudaArray {
public:
    explicit
    CudaArray(std::size_t size) : size(size), data(new (cudaDevice) T[size]) { }

    operator T* () { return data; }

    ~CudaArray() {
        operator delete [](data, cudaDevice);
    }

private:
    std::size_t const size;
    T* const data;

    CudaArray(CudaArray const&);
    CudaArray& operator =(CudaArray const&);
};

About the singleton employed here: Yes, I'm aware of its drawbacks. However, these aren't relevant in this context. All I needed here was a small type tag that wasn't copyable. Everything else (i.e. multithreading considerations, time of initialization) don't apply.

Answer

I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter into std::vector<> and std::map<> and so forth.

Beware, I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way. And you do not need to re-invent your containers and algorithms.

