cudaMemcpyToSymbol does not copy data
Problem description
I want to use __constant__ memory that is accessed by all threads across all of my kernels.
The declaration looks like this:
extern __constant__ float smooth[8 * 1024];
I copy data to this variable using
cudaMemcpyToSymbol("smooth", smooth_local, smooth_size, 0, cudaMemcpyHostToDevice);
where smooth_size = 7K bytes.
This gave me incorrect output. When I ran it in -deviceemu mode and printed the contents of both variables inside the kernel, smooth was all zeroes while smooth_local was correct.
I also tried printing the values right after cudaMemcpyToSymbol, but still got zeroes.
Can anyone shed light on my problem?
Solution
To declare CUDA constant memory, it would look like this:
__constant__ float smooth[8 * 1024];
Note that CUDA constant memory is local to its translation unit (i.e. it is implicitly declared static), at least when the program is built without relocatable device code; separate compilation with -rdc=true, available since CUDA 5.0, lifts this restriction. This is one of the annoying limitations of CUDA: if you need to share these values between separate .cpp/.cu files, you have to redeclare the memory in each .cpp/.cu file that needs it, and call cudaMemcpyToSymbol again in each of them. Finally, you are limited to a total of 64 KB of constant memory across your entire CUDA program.
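As a rough illustration of the advice above, here is a minimal sketch that keeps the __constant__ definition, the copy, and the kernel in a single .cu file, and checks every return code so a failed copy is not silently ignored. The CHECK macro, the read_smooth kernel, and the fill values are hypothetical names and data chosen for this example, not part of the original question. It also assumes a toolkit of CUDA 5.0 or later, where the symbol itself is passed to cudaMemcpyToSymbol rather than its name as a string.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Constant memory defined (not just declared extern) in the same
// translation unit as the kernel and the copy that use it.
__constant__ float smooth[8 * 1024];   // 32 KB, within the 64 KB limit

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical kernel: each thread copies one constant value to global
// memory so the host can verify what the device actually sees.
__global__ void read_smooth(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = smooth[i];
}

int main()
{
    const size_t smooth_size = 7 * 1024;                  // 7 KB, as in the question
    const int n = (int)(smooth_size / sizeof(float));     // number of floats copied

    static float smooth_local[7 * 1024 / sizeof(float)];
    static float readback[7 * 1024 / sizeof(float)];
    for (int i = 0; i < n; ++i) smooth_local[i] = 0.5f * i;   // arbitrary test data

    // Pass the symbol itself, not the string "smooth".
    CHECK(cudaMemcpyToSymbol(smooth, smooth_local, smooth_size, 0,
                             cudaMemcpyHostToDevice));

    float *d_out = NULL;
    CHECK(cudaMalloc((void **)&d_out, smooth_size));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    read_smooth<<<blocks, threads>>>(d_out, n);
    CHECK(cudaGetLastError());                            // launch errors
    CHECK(cudaMemcpy(readback, d_out, smooth_size,
                     cudaMemcpyDeviceToHost));            // also syncs with the kernel

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (readback[i] != smooth_local[i]) ++mismatches;
    printf("mismatches: %d of %d\n", mismatches, n);

    CHECK(cudaFree(d_out));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Building this with something like nvcc example.cu and running it on a machine with a CUDA device should report whether the kernel reads back the same values the host copied. If the definition of smooth lived in a different file from the copy or the kernel, the same check would be a quick way to see whether the two files are actually referring to the same constant memory.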