Usage of global vs. constant memory in CUDA

Question

Hey there, I have the following piece of code:

#if USE_CONST == 1
    __constant__ double PNT[ SIZE ];    
#else
    __device__ double *PNT;
#endif

Later on I have:

#if USE_CONST == 0
    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif

where point is defined somewhere earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it does not. I access the array in my kernel function via

PNT[index]

What is the difference between the two variants that causes this? Thanks!

Recommended answer

The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:

cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE);

or alternatively:

double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

which might be a bit faster if you plan to access the symbol from the host API more than once, because the symbol lookup is done only once.
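The string-name form shown above was removed from the runtime API in CUDA 5.0, so on newer toolkits the symbol itself is passed instead. A minimal sketch of both variants in that style follows. The value of SIZE, the kernel read_pnt, and the helpers check_roundtrip and run_const_demo are assumptions made for illustration and do not come from the original post.

```cuda
#include <cuda_runtime.h>

#define SIZE 8  // assumed array length, for illustration only

__constant__ double PNT[SIZE];

// Hypothetical kernel: each thread copies one element out of constant memory.
__global__ void read_pnt(double *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < SIZE) out[index] = PNT[index];
}

// Hypothetical helper: reads PNT back through the kernel and returns 0
// if every element equals the expected host data, nonzero otherwise.
static int check_roundtrip(const double *expected)
{
    double result[SIZE];
    double *d_out = nullptr;
    if (cudaMalloc((void **)&d_out, sizeof(double) * SIZE) != cudaSuccess) return 1;

    read_pnt<<<1, SIZE>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)  // this cudaMemcpy also waits for the kernel to finish
        err = cudaMemcpy(result, d_out, sizeof(double) * SIZE, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < SIZE; ++i)
        if (result[i] != expected[i]) return 1;
    return 0;
}

// Hypothetical driver: returns 0 on success, a nonzero step code on failure.
int run_const_demo()
{
    double a[SIZE], b[SIZE];
    for (int i = 0; i < SIZE; ++i) { a[i] = 1.5 * i; b[i] = -2.0 * i; }

    // Variant 1: copy straight to the symbol (no string name).
    if (cudaMemcpyToSymbol(PNT, a, sizeof(double) * SIZE) != cudaSuccess) return 1;
    if (check_roundtrip(a) != 0) return 2;

    // Variant 2: look the address up once, then use plain cudaMemcpy.
    double *cpnt = nullptr;
    if (cudaGetSymbolAddress((void **)&cpnt, PNT) != cudaSuccess) return 3;
    if (cudaMemcpy(cpnt, b, sizeof(double) * SIZE, cudaMemcpyHostToDevice) != cudaSuccess) return 4;
    if (check_roundtrip(b) != 0) return 5;

    return 0;
}
```

Calling run_const_demo() from main and checking for a zero return value exercises both upload paths; the nonzero codes only indicate which step failed.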

Edit: I misunderstood the question. For the global memory version, do something similar to the second version for constant memory:

double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
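One caveat worth keeping in mind: with the declaration __device__ double *PNT; from the question, the symbol stores only a pointer, not SIZE doubles, so copying the whole array to the symbol's address would fit only if PNT were declared as an array. A common pattern for the pointer declaration, sketched here under assumptions rather than taken from the original answer, is to allocate the buffer through an ordinary host-side pointer and then copy that pointer value into the device symbol. The value of SIZE and the names read_pnt, d_buf, d_out and run_global_demo are hypothetical, and the symbol is passed directly as in CUDA 5.0 and later.

```cuda
#include <cuda_runtime.h>

#define SIZE 8  // assumed array length, for illustration only

__device__ double *PNT;  // device-side pointer, as in the question

// Hypothetical kernel: each thread reads one element through PNT.
__global__ void read_pnt(double *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < SIZE) out[index] = PNT[index];
}

// Hypothetical driver: returns 0 on success, nonzero on failure.
int run_global_demo()
{
    const size_t bytes = sizeof(double) * SIZE;
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = 0.25 * i;

    // Allocate through host-side pointer variables, not through &PNT.
    double *d_buf = nullptr, *d_out = nullptr;
    if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) { cudaFree(d_buf); return 1; }

    // Fill the buffer, then store its address in the device symbol PNT.
    cudaError_t err = cudaMemcpy(d_buf, point, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpyToSymbol(PNT, &d_buf, sizeof(d_buf));

    if (err == cudaSuccess) {
        read_pnt<<<1, SIZE>>>(d_out);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)  // this cudaMemcpy also waits for the kernel to finish
        err = cudaMemcpy(result, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_buf);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < SIZE; ++i)
        if (result[i] != point[i]) return 1;
    return 0;
}
```

The design point is that only sizeof(double *) bytes are written to the symbol; the array data itself lives in the separately allocated buffer that the pointer refers to.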
