Tensorflow new Op CUDA kernel memory management
Question
I have implemented a rather complex new Op in Tensorflow with a GPU CUDA kernel. This Op requires a lot of dynamic memory allocation of variables which are not tensors and are deallocated after the op is done; more specifically, it involves using a hash table.
Right now I am using cudaMalloc() and cudaFree(), but I have noticed that Tensorflow has its own type called Eigen::GPUDevice, which has the ability to allocate and deallocate memory on the GPU.
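For context, Eigen's GPU device does expose raw allocation hooks. A minimal sketch of calling them directly, assuming the allocate()/deallocate() members from Eigen's TensorDeviceGpu header (whether they are the intended entry point for custom ops is exactly what is being asked here):

#define EIGEN_USE_GPU
#include "unsupported/Eigen/CXX11/Tensor"

// Sketch only: grabs scratch space from the device's allocator and frees it.
// device.stream() returns the CUDA stream the op is running on.
void UseScratch(const Eigen::GpuDevice& device) {
  void* scratch = device.allocate(1024 * sizeof(float));
  // ... launch kernels on device.stream() that use scratch ...
  device.deallocate(scratch);
}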
My questions:

- Is it best practice to use Eigen::GPUDevice to manage GPU memory?
- By using Eigen::GPUDevice instead of the CUDA API, am I "automatically" enabling multi-GPU support, since different GPUDevices can be passed to the Op?
- Should I extend this idea to the CPU kernel and see if there is a CPUDevice type which also manages the memory, instead of using plain C++ (i.e. auto var = new int[100]; delete[] var)?
Answer
There is no direct public guideline for this issue. I usually just let TensorFlow allocate this memory via:
template <typename Device, typename Dtype>
class MyOp : public OpKernel {
 public:
  explicit MyOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
    // ...
  }

  void Compute(OpKernelContext* ctx) override {
    Tensor tmp_var;            // allocate_temp fills a Tensor held by value
    Tensor* output = nullptr;  // allocate_output hands back a borrowed pointer
    TensorShape some_shape, some_shape2;

    // temporarily use this space during Compute
    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, some_shape, &tmp_var));
    // allocate memory for the output tensor
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, some_shape2, &output));
    // ...
  }
};
- Whatever memory is needed should be allocated by the TensorFlow context, not by custom cudaMalloc or new type[num] calls.
- The context provides the information for the allocator.
- See below.
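If the hash table genuinely needs raw, non-tensor device memory, one hedged alternative to cudaMalloc is to go through the device's TensorFlow allocator, so the memory is still tracked by TensorFlow. A sketch, assuming TF's internal Allocator/AllocatorAttributes interfaces (not a documented, stable API):

void Compute(OpKernelContext* ctx) override {
  // Ask the device that runs this op for its allocator.
  Allocator* allocator = ctx->device()->GetAllocator(AllocatorAttributes());
  const size_t num_bytes = 1 << 20;  // hypothetical hash-table footprint
  void* buckets =
      allocator->AllocateRaw(Allocator::kAllocatorAlignment, num_bytes);
  OP_REQUIRES(ctx, buckets != nullptr,
              errors::ResourceExhausted("hash-table allocation failed"));
  // ... build and query the device-side hash table ...
  allocator->DeallocateRaw(buckets);
}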
For the sake of simplicity, consider just adding two matrices (full example). TensorFlow operations usually contain the following structure:
Op description having REGISTER_OP, which is responsible for shape-checking and setting the output shape (example)
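A minimal registration for the matrix-add case might look like the following sketch (the op name MyAdd and the dtype list are illustrative, not from the original answer):

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"

// Hypothetical registration: two inputs, one output, same dtype, and a shape
// function that forwards the first input's shape to the output.
REGISTER_OP("MyAdd")
    .Input("a: T")
    .Input("b: T")
    .Output("sum: T")
    .Attr("T: {float, double}")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));
      return ::tensorflow::Status::OK();
    });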
OpKernel responsible for allocating memory, getting pointers to the inputs and setup stuff (see above or this)
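The templated kernel is then registered once per device; a sketch following the hypothetical MyAdd naming from above:

// Registers the same templated OpKernel for CPU and GPU execution;
// the Device template argument selects the matching functor specialization.
REGISTER_KERNEL_BUILDER(
    Name("MyAdd").Device(DEVICE_CPU).TypeConstraint<float>("T"),
    MyOp<CPUDevice, float>);
REGISTER_KERNEL_BUILDER(
    Name("MyAdd").Device(DEVICE_GPU).TypeConstraint<float>("T"),
    MyOp<GPUDevice, float>);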
Functor for the implementation itself, e.g.:
Tensor* output = nullptr;
Tensor tmp_var;
OP_REQUIRES_OK(ctx, ctx->allocate_output(0, output_shape, &output));
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, some_shape, &tmp_var));
// the functor does not need to care about memory allocation,
// as everything is already set up at this point
::tensorflow::functor::MyFunctor<Device, Dtype>()(ctx, inputA, inputB, tmp_var, output);
You are then left with just implementing:
// gpu version
template <typename Dtype>
struct MyFunctor<GPUDevice, Dtype> {
  void operator()(::tensorflow::OpKernelContext* ctx, ...);
};

// cpu version
template <typename Dtype>
struct MyFunctor<CPUDevice, Dtype> {
  void operator()(::tensorflow::OpKernelContext* ctx, ...);
};
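As an illustration, the CPU specialization's body for the add example could look like this minimal sketch (the raw-pointer signature and argument names are assumptions; the real signature is whatever you choose above):

// Hypothetical CPU implementation of the add functor: a plain loop over
// the flattened inputs, writing into the pre-allocated output buffer.
template <typename Dtype>
struct MyFunctor<CPUDevice, Dtype> {
  void operator()(::tensorflow::OpKernelContext* ctx, const Dtype* a,
                  const Dtype* b, Dtype* out, int n) {
    for (int i = 0; i < n; ++i) {
      out[i] = a[i] + b[i];
    }
  }
};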
Edit
- allocate_persistent: use this if you need your data between Op invocations, e.g. one-time index structures. [example]
- allocate_temp: just temporary memory, which will not be retained past the end of the Compute method's lifetime. [example]
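A sketch of the persistent pattern, assuming a one-time index structure that is built lazily on first use (PersistentTensor and this allocate_persistent signature come from the TF 1.x OpKernel API; member names are illustrative):

class MyStatefulOp : public OpKernel {
 public:
  explicit MyStatefulOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}

  void Compute(OpKernelContext* ctx) override {
    if (!index_initialized_) {
      Tensor* index = nullptr;
      // Memory survives beyond this Compute() call, unlike allocate_temp.
      OP_REQUIRES_OK(ctx, ctx->allocate_persistent(
                              DT_INT32, TensorShape({1024}), &index_, &index));
      // ... fill the one-time index structure ...
      index_initialized_ = true;
    }
    Tensor* index = index_.AccessTensor(ctx);
    // ... use the index ...
  }

 private:
  PersistentTensor index_;
  bool index_initialized_ = false;
};

Note that a real kernel would guard the lazy initialization with a mutex, since Compute may be invoked concurrently.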
But I highly recommend reading the comments in the source code here, and then deciding depending on your use case.