Efficiency of the malloc function in CUDA
Question
I am trying to port some CPU code to CUDA. My CUDA card is based on the Fermi architecture, so I can use the malloc() function in device code to dynamically allocate memory, which means I don't need to change the original code much. (The malloc() function is called many times in my code.) My question is whether this malloc function is efficient enough, or whether we should avoid using it where possible. I don't get much speedup running my code on CUDA, and I suspect this is caused by the use of malloc().
Please let me know if you have any suggestions or comments. I appreciate your help.
Answer
The current device malloc implementation is very slow (there have been papers published about efficient CUDA dynamic memory allocation, but that work has not yet appeared in a released toolkit, AFAIK). The memory it allocates comes from the device heap, which is stored in global memory, so access to it is also slow. Unless you have a very compelling reason to do so, I would recommend avoiding in-kernel dynamic memory allocation. It will have a negative effect on overall performance. Whether it actually has much effect on your code is a completely separate question.
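As an illustration of the alternative, the usual pattern is to allocate one large buffer with cudaMalloc() on the host before launching the kernel, and let each thread index into its own slice, instead of calling malloc() per thread on the device. The kernel and variable names below are hypothetical; this is a minimal sketch, assuming each thread needs a fixed-size scratch array:

```cuda
#include <cuda_runtime.h>

// Slow pattern: every thread allocates from the device heap.
__global__ void with_device_malloc(float *out, int n_per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Device-heap allocation; serializes badly under contention and
    // can fail if the heap (see cudaLimitMallocHeapSize) runs out.
    float *scratch = (float *)malloc(n_per_thread * sizeof(float));
    if (scratch == NULL) return;
    for (int i = 0; i < n_per_thread; ++i)
        scratch[i] = (float)(tid + i);
    out[tid] = scratch[n_per_thread - 1];
    free(scratch);
}

// Faster pattern: one host-side cudaMalloc, each thread uses its own slice.
__global__ void with_preallocated(float *scratch_all, float *out, int n_per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float *scratch = scratch_all + (size_t)tid * n_per_thread; // this thread's slice
    for (int i = 0; i < n_per_thread; ++i)
        scratch[i] = (float)(tid + i);
    out[tid] = scratch[n_per_thread - 1];
}

int main()
{
    const int threads = 256, blocks = 64, n_per_thread = 32;
    const int total = threads * blocks;

    float *scratch_all, *out;
    cudaMalloc(&scratch_all, (size_t)total * n_per_thread * sizeof(float));
    cudaMalloc(&out, total * sizeof(float));

    with_preallocated<<<blocks, threads>>>(scratch_all, out, n_per_thread);
    cudaDeviceSynchronize();

    cudaFree(scratch_all);
    cudaFree(out);
    return 0;
}
```

This trades some memory (every thread gets a full slice whether it needs one or not) for removing all allocator traffic from the kernel, which is usually the right trade when the per-thread size is bounded and known up front.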