thrust functor: "too many resources requested for launch"
Problem description
I'm trying to implement something like this in CUDA:
For each element:

$$
p = \begin{cases} p, & \text{if } p \ge \text{floor} \\ z, & \text{if } p < \text{floor} \end{cases}
$$
where `floor` and `z` are constants configured at the start of the test.
I have attempted to implement it as follows, but I get the error "too many resources requested for launch".
The functor:
struct floor_functor : thrust::unary_function<float, float>
{
    // Threshold ("floor") and replacement value ("z").
    const float floorLevel, floorVal;

    floor_functor(float _floorLevel, float _floorVal)
        : floorLevel(_floorLevel), floorVal(_floorVal) {}

    // Keep x if it is at or above the threshold, otherwise replace it.
    __host__ __device__
    float operator()(float& x) const
    {
        if (x >= floorLevel)
            return x;
        else
            return floorVal;
    }
};
And the transform that uses it:
thrust::transform(input->begin(), input->end(), output.begin(), floor_functor(floorLevel, floorVal));
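A minimal host-side sketch, not the asker's code and not a diagnosis of the launch error: the same rule written as a functor that takes its argument by value, checked against `std::transform` on the CPU. The name `clamp_below_functor`, the helper `apply_floor_cpu`, and the `#ifndef __CUDACC__` fallback that defines `__host__`/`__device__` away for a plain C++ compiler are all assumptions made for illustration. Under nvcc, the same functor object could be passed to `thrust::transform` with `device_vector` iterators, as in the call above.

```cpp
// Host-side sketch of the question's floor rule.
// Assumption: when nvcc is not the compiler, __host__/__device__ are defined
// away so the same functor also builds with an ordinary C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

#include <algorithm>
#include <vector>

// Hypothetical name; same two members as the question's floor_functor.
struct clamp_below_functor
{
    float floorLevel;  // threshold ("floor" in the question)
    float floorVal;    // replacement value ("z" in the question)

    clamp_below_functor(float level, float val)
        : floorLevel(level), floorVal(val) {}

    // Takes x by value, so it accepts both lvalues and temporaries.
    __host__ __device__
    float operator()(float x) const
    {
        return (x >= floorLevel) ? x : floorVal;
    }
};

// Hypothetical CPU reference: std::transform with the same functor object
// that would be handed to thrust::transform on device_vector iterators.
std::vector<float> apply_floor_cpu(const std::vector<float>& input,
                                   float floorLevel, float floorVal)
{
    std::vector<float> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   clamp_below_functor(floorLevel, floorVal));
    return output;
}
```

For example, `apply_floor_cpu({-2.0f, 0.5f, 1.0f, 3.0f}, 1.0f, 0.0f)` yields `{0.0f, 0.0f, 1.0f, 3.0f}`. Comparing a CPU result like this with the GPU output is one way to check the functor logic separately from the launch configuration.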
If I remove one of the members of my functor, say `floorVal`, and use a functor with only one member variable, it works fine.
Does anyone know why this might be, and how I could fix it?
Additional information:
My array is 786432 elements long.
My GPU is a GeForce GTX 590.
I am building with the command:
`nvcc -c -g -arch sm_11 -Xcompiler -fPIC -Xcompiler -Wall -DTHRUST_DEBUG -I <my_include_dir> -o <my_output> <my_source>`
My CUDA version is 4.0:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221
And my maximum number of threads per block is 1024 (reported by deviceQuery):
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Update:
I have stumbled upon a fix for my problem, but do not understand it. If I rename my functor from "floor_functor" to basically anything else, it works! I have no idea why this is the case, and would be interested to hear anyone's ideas about this.
Recommended answer
For an easier CUDA implementation, you could do this with ArrayFire in one line of code:
p(p < floor) = z;
Just declare your variables as `af::array`s.
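For comparison, here is a sketch of the same masked assignment without a hand-written transform functor, using a replace-if algorithm. It runs on the host via `std::replace_if` so it can be checked without a GPU. Thrust's `thrust::replace_if(first, last, pred, new_value)` has the same shape and would be the device-side counterpart on a `thrust::device_vector`, assuming the predicate is marked `__host__ __device__`. The names `below_floor` and `floor_in_place` are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Predicate for the mask "p < floor" (hypothetical name).
struct below_floor
{
    float floorLevel;
    explicit below_floor(float level) : floorLevel(level) {}
    bool operator()(float x) const { return x < floorLevel; }
};

// Host-side, in-place equivalent of ArrayFire's p(p < floor) = z.
// On the GPU, thrust::replace_if(p.begin(), p.end(), below_floor(floorLevel), z)
// has the same form (the predicate would then need __host__ __device__).
void floor_in_place(std::vector<float>& p, float floorLevel, float z)
{
    std::replace_if(p.begin(), p.end(), below_floor(floorLevel), z);
}
```

Because the replacement happens in place, this form also avoids allocating a separate output vector.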
Good luck!
Disclaimer: I work on all sorts of CUDA projects, including ArrayFire.