Efficiency of the malloc function in CUDA

Views: 164 Published: 2017/3/4 12:21:06 Tags: cuda, malloc


Question

I am trying to port some CPU code to CUDA. My CUDA card is based on the Fermi architecture, so I can use the malloc() function on the device to allocate memory dynamically, which means I don't need to change the original code much. (malloc() is called many times in my code.) My question is whether this malloc() function is efficient enough, or whether we should avoid using it if possible. I don't get much speedup running my code on CUDA, and I suspect this is caused by the use of malloc().

Please let me know if you have any suggestions or comments. I appreciate your help.

Recommended answer

The current device malloc implementation is very slow (papers have been published on efficient CUDA dynamic memory allocation, but as far as I know that work has not yet appeared in a release toolkit). The memory it allocates comes from a heap stored in global memory, which is itself slow to access. Unless you have a very compelling reason to do so, I would recommend avoiding in-kernel dynamic memory allocation, since it will have a negative effect on overall performance. Whether it actually has much effect on your code is a completely separate question.
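To make the recommendation concrete, here is a minimal sketch that contrasts the two approaches. It is not the asker's code: the kernel names, the `run_variant` helper, and the per-thread scratch workload are all hypothetical, chosen only to illustrate the pattern. Variant A calls malloc() inside the kernel; variant B replaces that with one cudaMalloc() on the host and gives each thread its own slice of the pool. Error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Variant A: every thread allocates scratch space from the device heap.
__global__ void scratch_with_malloc(int *out, int n_threads, int per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;

    int *scratch = static_cast<int *>(malloc(per_thread * sizeof(int)));
    if (scratch == nullptr) {   // device heap exhausted
        out[tid] = -1;
        return;
    }

    int sum = 0;
    for (int i = 0; i < per_thread; ++i) {
        scratch[i] = tid + i;
        sum += scratch[i];
    }
    out[tid] = sum;
    free(scratch);
}

// Variant B: scratch space is a slice of one pool allocated by the host.
__global__ void scratch_preallocated(int *out, int *pool,
                                     int n_threads, int per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;

    int *scratch = pool + static_cast<size_t>(tid) * per_thread;

    int sum = 0;
    for (int i = 0; i < per_thread; ++i) {
        scratch[i] = tid + i;
        sum += scratch[i];
    }
    out[tid] = sum;
}

// Hypothetical host helper: runs one variant and returns the per-thread sums.
std::vector<int> run_variant(bool use_device_malloc, int n_threads, int per_thread)
{
    int *d_out = nullptr;
    int *d_pool = nullptr;
    cudaMalloc(&d_out, n_threads * sizeof(int));

    const int block = 128;
    const int grid = (n_threads + block - 1) / block;

    if (use_device_malloc) {
        // The device heap defaults to 8 MB. A larger heap has to be requested
        // before the code that uses malloc() is loaded; if this call comes too
        // late it fails and the existing heap size stays in effect.
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);
        scratch_with_malloc<<<grid, block>>>(d_out, n_threads, per_thread);
    } else {
        cudaMalloc(&d_pool, static_cast<size_t>(n_threads) * per_thread * sizeof(int));
        scratch_preallocated<<<grid, block>>>(d_out, d_pool, n_threads, per_thread);
    }

    std::vector<int> h_out(n_threads);
    cudaMemcpy(h_out.data(), d_out, n_threads * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_pool);   // freeing a null pointer is a no-op
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_variant(true, 1024, 16)` and `run_variant(false, 1024, 16)` should produce identical results, so the two kernels can be timed against each other (for example with cudaEvent timers) to see how much the in-kernel allocation costs for a given workload. The preallocated version only works when an upper bound on each thread's memory need is known before launch; whether that holds for the ported code is something the question does not say.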
