Mix custom memory management and Thrust in CUDA


Problem Description


In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has "warmed up". Moreover, I use custom kernels for basic array filling, arithmetic operations between arrays, etc. and would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is created and accessed through raw pointers (for now) and I'd like to use device_vector and Thrust's methods on these objects, but I find myself converting between raw pointers and device_ptr<> all the time, somewhat cluttering up my code.

My rather vague question: How would/do you organize the usage of custom memory management, Thrust's array methods and calls to custom kernels in the most readable way?
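As an aside on the raw-pointer clutter mentioned in the question, Thrust provides two casts, thrust::device_pointer_cast and thrust::raw_pointer_cast, for moving between the two pointer worlds. The sketch below shows the typical pattern; my_kernel and example are illustrative names, not part of any library:

```cuda
// Sketch: wrapping an existing raw device pointer for Thrust algorithms,
// then unwrapping it again for a hand-written kernel.
#include <thrust/device_ptr.h>
#include <thrust/fill.h>

// Hypothetical custom kernel standing in for the ones the question mentions.
__global__ void my_kernel(float* data, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

void example(float* raw, int n)
{
  // raw pointer -> device_ptr: lets Thrust algorithms use custom-allocated storage
  thrust::device_ptr<float> d = thrust::device_pointer_cast(raw);
  thrust::fill(d, d + n, 1.0f);

  // device_ptr -> raw pointer: hand the same storage back to a custom kernel
  my_kernel<<<(n + 255) / 256, 256>>>(thrust::raw_pointer_cast(d), n);
}
```

Keeping these casts at the boundary between Thrust calls and kernel launches, rather than scattered through the code, is one way to contain the clutter.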

Solution

Like all standard C++ containers, thrust::device_vector lets you customize how it allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.

Occasionally, it is desirable to customize the way device_vector allocates memory, such as in the OP's case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This can avoid overhead which may be incurred by many individual calls to the underlying allocation scheme, in this case, cudaMalloc.
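A minimal, untested sketch of that sub-allocation idea, assuming a single large cudaMalloc performed up front: pool and pool_allocator below are illustrative names, not part of Thrust, and a real implementation would also need alignment handling, thread safety, and reuse of freed blocks rather than a bump pointer.

```cuda
#include <thrust/device_malloc_allocator.h>
#include <thrust/device_ptr.h>
#include <cstddef>
#include <new>

// A toy bump-pointer pool layered on one large device allocation.
struct pool
{
  char*       base;     // start of the single cudaMalloc'd block
  std::size_t capacity; // total bytes reserved at program initialization
  std::size_t offset;   // bytes handed out so far

  void* take(std::size_t bytes)
  {
    // Round each sub-allocation up to 256 bytes for alignment.
    bytes = (bytes + 255) & ~std::size_t(255);
    if (offset + bytes > capacity) throw std::bad_alloc();
    void* p = base + offset;
    offset += bytes;   // this toy pool never reclaims space individually
    return p;
  }
};

template<typename T>
struct pool_allocator : thrust::device_malloc_allocator<T>
{
  typedef thrust::device_malloc_allocator<T> super_t;
  typedef typename super_t::pointer   pointer;
  typedef typename super_t::size_type size_type;

  pool* p;  // non-owning; the pool must outlive every vector that uses it

  pointer allocate(size_type n)
  {
    // Sub-allocate from the pool instead of calling cudaMalloc.
    return pointer(static_cast<T*>(p->take(n * sizeof(T))));
  }

  void deallocate(pointer, size_type)
  {
    // Bump pools free everything at once when the pool itself is destroyed.
  }
};
```

Because this allocator carries state (the pool pointer), the vector needs to be given an allocator instance rather than relying on default construction; recent Thrust releases provide device_vector constructors that accept one.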

A simple way to provide device_vector a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with an inheritance approach, only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.

This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:

#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
  struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the name of the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // get access to some of the base class's typedefs

  // note that because we inherited from device_malloc_allocator,
  // pointer is actually thrust::device_ptr<T>
  typedef typename super_t::pointer   pointer;

  typedef typename super_t::size_type size_type;

  // customize allocate
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

    // defer to the base class to allocate storage for n elements of type T
    // in practice, you'd do something more interesting here
    return super_t::allocate(n);
  }

  // customize deallocate
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

    // defer to the base class to deallocate n elements of type T at address p
    // in practice, you'd do something more interesting here
    super_t::deallocate(p,n);
  }
};

int main()
{
  // create a device_vector which uses my_allocator
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints
  vec.resize(10, 13);

  return 0;
}

Here's the output:

$ nvcc my_allocator_test.cu -arch=sm_20 -run
my_allocator::allocate(): Hello, world!
my_allocator::deallocate(): Hello, world!

In this example, note that we hear from my_allocator::allocate() once, upon the call to vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope and destroys its elements.

