How to use CUDA constant memory in a programmer-pleasant way?


Question


I'm working on a number crunching app using the CUDA framework. I have some static data that should be accessible to all threads, so I've put it in constant memory like this:

__device__ __constant__ CaseParams deviceCaseParams;


I use the call cudaMemcpyToSymbol to transfer these params from the host to the device:

void copyMetaData(CaseParams* caseParams)
{
    cudaMemcpyToSymbol("deviceCaseParams", caseParams, sizeof(CaseParams));
}


Anyways, it seems (by trial and error, and also from reading posts on the net) that for some sick reason, the declaration of deviceCaseParams and the copy operation on it (the call to cudaMemcpyToSymbol) must be in the same file. At the moment I have these two in a .cu file, but I really want to have the parameter struct in a .cuh file so that any implementation could see it if it wants to. That means that I would also have to have the copyMetaData function in a header file, but this messes up linking (symbol already defined) since both .cpp and .cu files include this header (and thus both the MS C++ compiler and nvcc compile it).
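The duplicate-symbol part of this can usually be avoided with the standard header/source split: only a declaration of the copy function goes in the header, while the __constant__ variable and the function definition live in a single .cu file. A minimal sketch (field names are hypothetical, not from the question):

```cuda
// CaseParams.cuh — safe to include from both .cpp and .cu files
#pragma once

struct CaseParams { int n; float scale; };   // hypothetical fields

// Declaration only: no symbol is defined in includers of this header.
void copyMetaData(const CaseParams* caseParams);

// CaseParams.cu — the single translation unit compiled by nvcc
// #include "CaseParams.cuh"

__device__ __constant__ CaseParams deviceCaseParams;

void copyMetaData(const CaseParams* caseParams)
{
    // Symbol and copy are in the same translation unit, so the
    // address-based overload works on any CUDA version.
    cudaMemcpyToSymbol(deviceCaseParams, caseParams, sizeof(CaseParams));
}
```

Because the header contains only the struct and a function declaration, including it from multiple .cpp and .cu files no longer produces "symbol already defined" at link time.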


Does anyone have any advice on design here?

Update: see the comments.

Answer


With an up-to-date CUDA (e.g. 3.2) you should be able to do the memcpy from within a different translation unit if you're looking up the symbol at runtime (i.e. by passing a string as the first arg to cudaMemcpyToSymbol as you are in your example).
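A sketch of that cross-translation-unit pattern, as it would have looked on the CUDA versions the answer targets (file names are hypothetical; note the string overload of cudaMemcpyToSymbol was later deprecated and then removed, so current toolkits instead need the variable to be visible, e.g. via extern __constant__ with relocatable device code):

```cuda
// symbols.cu — the translation unit that defines the symbol
struct CaseParams { int n; float scale; };   // hypothetical fields
__device__ __constant__ CaseParams deviceCaseParams;

// copy.cu — a *different* translation unit
// The string triggers a runtime symbol lookup, so no declaration of
// deviceCaseParams is needed here (works on CUDA ~3.x/4.x only).
void copyMetaData(const CaseParams* caseParams)
{
    cudaMemcpyToSymbol("deviceCaseParams", caseParams, sizeof(CaseParams));
}
```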


Also, with Fermi-class devices you can just malloc the memory (cudaMalloc), copy to the device memory, and then pass the argument as a const pointer. The compiler will recognise if you are accessing the data uniformly across the warps and if so will use the constant cache. See the CUDA Programming Guide for more info. Note: you would need to compile with -arch=sm_20.
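A sketch of that cudaMalloc-plus-const-pointer approach (kernel and struct contents are illustrative, not from the question):

```cuda
struct CaseParams { int n; float scale; };   // hypothetical fields

__global__ void crunch(const CaseParams* params, float* out)
{
    // Every thread dereferences the same address, so on sm_20+ the
    // compiler can service these loads through the constant cache.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < params->n)
        out[i] = params->scale * i;
}

void launch(const CaseParams* hostParams, float* devOut, int blocks, int threads)
{
    CaseParams* devParams;
    cudaMalloc(&devParams, sizeof(CaseParams));
    cudaMemcpy(devParams, hostParams, sizeof(CaseParams), cudaMemcpyHostToDevice);
    crunch<<<blocks, threads>>>(devParams, devOut);
    cudaFree(devParams);
}
// Build with: nvcc -arch=sm_20 ...
```

Compared with __constant__ memory, this avoids the symbol-scope problem entirely: the pointer is an ordinary kernel argument, so any translation unit can set up and launch the kernel.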

