Mix custom memory management and Thrust in CUDA


Problem description

In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has "warmed up". Moreover, I use custom kernels for basic array filling, arithmetic operations between arrays, etc. and would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is created and accessed through raw pointers (for now) and I'd like to use device_vector and Thrust's methods on these objects, but I find myself converting between raw pointers and device_ptr<> all the time, somewhat cluttering up my code.
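The back-and-forth conversions in question look roughly like this (a sketch for context; `custom_pool_alloc`, `my_kernel`, `grid`, and `block` are placeholders for the OP's own allocator and kernel, not real APIs):

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>

void example(std::size_t n)
{
  // raw device pointer obtained from the custom allocator (placeholder interface)
  float* raw = static_cast<float*>(custom_pool_alloc(n * sizeof(float)));

  // wrap it to use Thrust algorithms...
  thrust::device_ptr<float> dev = thrust::device_pointer_cast(raw);
  thrust::fill(dev, dev + n, 1.0f);

  // ...and unwrap it again to pass it to a custom kernel
  my_kernel<<<grid, block>>>(thrust::raw_pointer_cast(dev), n);
}
```

The answer below removes this clutter by pushing the custom allocation behind a `device_vector` allocator, so Thrust owns the wrapping.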

My rather vague question: How would/do you organize the usage of custom memory management, Thrust's array methods and calls to custom kernels in the most readable way?

Recommended answer

Like all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.

Occasionally, it is desirable to customize the way device_vector allocates memory, such as in the OP's case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This can avoid overhead which may be incurred by many individual calls to the underlying allocation scheme, in this case, cudaMalloc.

A simple way to provide device_vector a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with an inheritance approach, only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.

This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:

#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the name of the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // get access to some of the base class's typedefs

  // note that because we inherited from device_malloc_allocator,
  // pointer is actually thrust::device_ptr<T>
  typedef typename super_t::pointer   pointer;

  typedef typename super_t::size_type size_type;

  // customize allocate
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

    // defer to the base class to allocate storage for n elements of type T
    // in practice, you'd do something more interesting here
    return super_t::allocate(n);
  }

  // customize deallocate
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

    // defer to the base class to deallocate n elements of type T at address p
    // in practice, you'd do something more interesting here
    super_t::deallocate(p,n);
  }
};

int main()
{
  // create a device_vector which uses my_allocator
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints
  vec.resize(10, 13);

  return 0;
}

Here is the output:

$ nvcc my_allocator_test.cu -arch=sm_20 -run
my_allocator::allocate(): Hello, world!
my_allocator::deallocate(): Hello, world!

In this example, note that we hear from my_allocator::allocate() once upon vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope as it destroys its elements.

