Dynamically allocating memory inside a __device__/__global__ CUDA kernel

Question

According to the CUDA Programming Guide (http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf), page 122, it is possible to dynamically allocate memory inside a __device__ or __global__ function, as long as we are targeting compute architecture 2.x.

My problem is that when I attempt this, the build output shows the following command line:

The command "some command" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" etc...

This is followed by an error saying that a host function (malloc) cannot be called from a __device__/__global__ function.

The message above shows that the build is also compiling for compute 1.x. I am using VS2010 and have "Code Generation" set to "compute_20,sm_20" in the "CUDA C/C++" property page, so I am not sure why it still tries to compile for compute 1.x. I am definitely using a card that supports 2.x. Any ideas?

Recommended answer

You should be able to see the nvcc command line in the build output. In fact, I think the text you pasted, with all the -gencode options in it, is your command line. If so, it is also proof that you are compiling the code for both sm_10 and sm_20, which is why you get the error when you call malloc: device-side malloc does not exist for the sm_10 target, so that compilation pass sees a call to the host malloc.

You can confirm this by wrapping the calls to malloc in #if __CUDA_ARCH__ >= 200 ... #endif and checking whether the error goes away.
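As a minimal sketch of that check, the kernel below guards the device-side malloc with __CUDA_ARCH__. The kernel name scratchKernel, the buffer sizes, and the -1 sentinel are hypothetical and chosen only for illustration. The host code uses cudaDeviceSynchronize, which assumes a toolkit newer than the 3.2 release linked in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small scratch buffer
// on the device heap, fills it, and reports its last element.
__global__ void scratchKernel(int *out, int n)
{
#if __CUDA_ARCH__ >= 200
    // Device-side malloc/free are only available when compiling
    // for compute capability 2.0 or later.
    int *scratch = static_cast<int *>(malloc(n * sizeof(int)));
    if (scratch == NULL) {
        // The device heap is limited, so allocation can fail.
        out[threadIdx.x] = -1;
        return;
    }
    for (int i = 0; i < n; ++i)
        scratch[i] = i;
    out[threadIdx.x] = scratch[n - 1];
    free(scratch);
#else
    // Compiled for an older target such as sm_10: no device-side malloc,
    // so write a sentinel value instead.
    out[threadIdx.x] = -1;
#endif
}

int main()
{
    const int threads = 32;
    const int n = 16;
    int h_out[threads];
    int *d_out = NULL;

    if (cudaMalloc((void **)&d_out, threads * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    scratchKernel<<<1, threads>>>(d_out, n);

    // Wait for the kernel and surface any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("out[0] = %d\n", h_out[0]);

    cudaFree(d_out);
    return 0;
}
```

If the build error disappears with the guard in place, the sm_10 pass was the one rejecting malloc. The guard only hides the symptom for that target, so the lasting fix is still to stop generating sm_10 code for the file.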

I'm guessing that you set the project-wide default properties for .cu files to compile for sm_20, but only after you had added the .cu file to the project. When the file was added, its own settings were probably initialized to sm_10 and sm_20 (the default in the .rules file), and per-file settings override the project defaults. If you right-click the file itself and open its properties, you might see that sm_20 is checked. Just a hunch.
