Efficiency of the malloc() function in CUDA


Question

I am trying to port some CPU code to CUDA. My CUDA card is based on the Fermi architecture, so I can use the malloc() function on the device to allocate memory dynamically without changing the original code much. (malloc() is called many times in my code.) My question is whether this malloc() function is efficient enough, or whether we should avoid using it where possible. I don't get much speedup running my code on CUDA, and I suspect this is caused by the use of malloc().

Please let me know if you have any suggestions or comments. I appreciate your help.

Recommended answer

The current device malloc implementation is very slow (there have been papers published on efficient CUDA dynamic memory allocation, but that work has not yet appeared in a release toolkit, AFAIK). The memory it allocates comes from the heap, which is stored in global memory, and it is also very slow. Unless you have a very compelling reason to do so, I would recommend avoiding in-kernel dynamic memory allocation. It will have a negative effect on overall performance. Whether it actually has much effect on your code is a completely separate question.
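To make the recommendation concrete, here is a minimal sketch of the two approaches side by side. It is not taken from the asker's code: the kernel names, the helper run_variant, the per-thread scratch size k, and the toy computation are all assumptions made for illustration. The first kernel calls malloc() and free() in every thread; the second gives each thread a slice of one buffer that the host allocates once with cudaMalloc before the launch.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Variant A (hypothetical): every thread allocates its scratch space
// from the device heap with in-kernel malloc() and releases it with free().
__global__ void scratch_via_malloc(int *out, int n, int k)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int *scratch = static_cast<int *>(malloc(k * sizeof(int)));
    if (scratch == NULL) { out[tid] = -1; return; }  // device heap exhausted
    int sum = 0;
    for (int i = 0; i < k; ++i) scratch[i] = tid + i;
    for (int i = 0; i < k; ++i) sum += scratch[i];
    out[tid] = sum;
    free(scratch);
}

// Variant B (hypothetical): the same work, but each thread uses its own
// slice of a single buffer that the host allocated up front.
__global__ void scratch_via_pool(int *out, int *pool, int n, int k)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int *scratch = pool + static_cast<size_t>(tid) * k;
    int sum = 0;
    for (int i = 0; i < k; ++i) scratch[i] = tid + i;
    for (int i = 0; i < k; ++i) sum += scratch[i];
    out[tid] = sum;
}

// Host helper (hypothetical): runs one variant and returns the sum of all
// per-thread results so the two variants can be checked against each other.
long long run_variant(bool use_device_malloc, int n, int k)
{
    int *d_out = NULL;
    int *d_pool = NULL;
    cudaMalloc(&d_out, n * sizeof(int));
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    if (use_device_malloc) {
        scratch_via_malloc<<<blocks, threads>>>(d_out, n, k);
    } else {
        cudaMalloc(&d_pool, static_cast<size_t>(n) * k * sizeof(int));
        scratch_via_pool<<<blocks, threads>>>(d_out, d_pool, n, k);
    }
    std::vector<int> h_out(n);
    // cudaMemcpy waits for the preceding kernel on the default stream.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_pool);  // freeing a null pointer is a no-op
    cudaFree(d_out);
    long long total = 0;
    for (int v : h_out) total += v;
    return total;
}
```

Calling run_variant(true, n, k) and run_variant(false, n, k) with the same arguments should give the same total, so timing the two calls (for example with cudaEvent timers) is one way to see how much the in-kernel allocations cost on a particular card. The pre-allocated version only works when an upper bound on the per-thread size is known before the launch, which is the usual price of avoiding device-side malloc().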
