How to estimate GPU memory requirements for a Thrust-based implementation?


Question

I have 3 different Thrust-based implementations that perform certain calculations: the first is the slowest and requires the least GPU memory, the second is the fastest and requires the most GPU memory, and the third one is in between. For each of those I know the size and data type of each device vector used, so I am using vector.size()*sizeof(type) to roughly estimate the memory needed for storage.

So for a given input, based on its size, I would like to decide which implementation to use. In other words, I want to determine the fastest implementation that will still fit in the available GPU memory.
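A minimal sketch of that selection logic, assuming three hypothetical per-implementation estimates (the byte counts below are placeholders; in practice you would sum vector.size()*sizeof(type) over every device vector each version allocates) and querying the free device memory with cudaMemGetInfo:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical storage estimates: sum vector.size() * sizeof(type)
// over every device vector each implementation needs. The factors
// here are placeholders, not measurements.
std::size_t estimate_fast(std::size_t n) { return 6 * n * sizeof(float); }
std::size_t estimate_mid (std::size_t n) { return 4 * n * sizeof(float); }
std::size_t estimate_slow(std::size_t n) { return 2 * n * sizeof(float); }

// Pick the fastest implementation whose estimated footprint fits
// into the memory that is currently free on the device.
int pick_implementation(std::size_t n)
{
    std::size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);

    if (estimate_fast(n) < free_bytes) return 0; // fastest, most memory
    if (estimate_mid(n)  < free_bytes) return 1; // in-between
    return 2;                                    // slowest, least memory
}
```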

I think that for the very long vectors I am dealing with, the size of the vector.data() storage that I am calculating is a fairly good estimate, and the rest of the overhead (if any) can be disregarded.

But how would I estimate the memory usage overhead (if any) associated with the Thrust algorithm implementations? Specifically, I am looking for such estimates for transform, copy, reduce, reduce_by_key, and gather. I do not really care about overhead that is static and not a function of the sizes of the algorithm's input and output parameters, unless it is very significant.

I understand the implications of GPU memory fragmentation, etc., but let's leave that aside for a moment.

Thank you very much for taking the time to look into this.

Answer

Thrust is intended to be used like a black box, and there is no documentation of the memory overheads of the various algorithms that I am aware of. But it doesn't sound like a very difficult problem to deduce empirically by running a few numerical experiments. You might expect the memory consumption of a particular algorithm to be approximated as:

total number of words of memory consumed = a + (1 + b)*N

for a problem with N input words. Here, a will be the fixed overhead of the algorithm, and 1 + b the slope of the best-fit line of memory consumed versus N. b is then the algorithm's overhead per input word.

So the question then becomes how to monitor the memory usage of a given algorithm. Thrust allocates internal scratch memory through an internal helper function, get_temporary_buffer. The best idea would be to write your own implementation of get_temporary_buffer that emits the size it has been called with, and (perhaps) calls cudaMemGetInfo to get context memory statistics at the time the function is invoked. You can see some concrete examples of how to intercept get_temporary_buffer calls here.
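In more recent Thrust versions the same interception can be done without touching get_temporary_buffer, by passing a custom allocator through an execution policy; this follows the pattern in Thrust's custom_temporary_allocation example. A minimal sketch (logging_allocator and its total_bytes counter are names invented here):

```cpp
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>
#include <thrust/system/cuda/memory.h>
#include <cstdio>

// Allocator that forwards to the default CUDA temporary allocation
// while recording the size of every buffer Thrust asks for.
struct logging_allocator
{
    typedef char value_type;

    char *allocate(std::ptrdiff_t num_bytes)
    {
        std::printf("temporary allocation: %td bytes\n", num_bytes);
        total_bytes += num_bytes;
        return thrust::cuda::malloc<char>(num_bytes).get();
    }

    void deallocate(char *ptr, size_t)
    {
        thrust::cuda::free(thrust::cuda::pointer<char>(ptr));
    }

    std::ptrdiff_t total_bytes = 0;
};

int main()
{
    thrust::device_vector<int> d(1 << 20, 1);

    logging_allocator alloc;
    // Routing the call through the instrumented policy makes every
    // internal temporary allocation of reduce visible.
    int sum = thrust::reduce(thrust::cuda::par(alloc), d.begin(), d.end());

    std::printf("sum = %d, total temporary bytes = %td\n",
                sum, alloc.total_bytes);
    return 0;
}
```

Running this at several different problem sizes yields the (N, memory consumed) pairs needed for the fit described below.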

With a suitably instrumented allocator and some runs at a few different problem sizes, you should be able to fit the model above and estimate the a and b values for a given algorithm. The model can then be used in your code to determine the safe maximum problem size for a given amount of memory.
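A minimal sketch of that fitting step, using an ordinary least-squares line through measured (N, total words) pairs (the sample numbers below are made up for illustration):

```cpp
#include <cstdio>
#include <cstddef>

int main()
{
    // Hypothetical measurements: total words of memory consumed at
    // each input size N, as reported by the instrumented allocator.
    const std::size_t n = 4;
    double N[n]     = {1e6,   2e6,   4e6,   8e6};
    double words[n] = {1.6e6, 3.1e6, 6.1e6, 12.1e6};

    // Ordinary least-squares fit of: words = a + (1 + b) * N.
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx  += N[i];        sy  += words[i];
        sxx += N[i] * N[i]; sxy += N[i] * words[i];
    }
    double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx); // = 1 + b
    double a     = (sy - slope * sx) / n;                     // fixed overhead

    std::printf("a = %.0f words, b = %.3f words of overhead per input word\n",
                a, slope - 1.0);
    return 0;
}
```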

I hope this is what you were asking about...
