CUDA 5.0 namespaces for constant memory variable usage
Problem description
In my program I want to use a structure containing constant variables and keep it on the device for as long as the program runs.
I have several header files containing the declarations of __global__ functions, and corresponding .cu files containing their definitions. I kept this scheme because it helps me keep similar code in one place: for example, all the device functions required by KERNEL_1 are kept separate from the device functions required by KERNEL_2, along with the kernel definitions.
I had no problems with this scheme during compilation and linking until I started using constant variables. I want to use the same constant variable in all kernels and device functions, but it doesn't seem to work.
##########################################################################
CODE EXAMPLE
###########################################################################
filename: 'common.h'
--------------------------------------------------------------------------
typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;
__constant__ __CONSTANTS d_const;
---------------------------------------------------------------------------
filename: main.cu
---------------------------------------------------------------------------
#include "common.h"
#include "gpukernels.h"
int main(int argc, char **argv) {
    __CONSTANTS T;
    T.height = 1.79;
    T.weight = 73.2;
    T.age = 26;
    cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    test_kernel <<< 1, 16 >>>();
    cudaDeviceSynchronize();
}
---------------------------------------------------------------------------
filename: gpukernels.h
---------------------------------------------------------------------------
__global__ void test_kernel();
---------------------------------------------------------------------------
filename: gpukernels.cu
---------------------------------------------------------------------------
#include <stdio.h>
#include "gpukernels.h"
#include "common.h"
__global__ void test_kernel() {
    printf("Id: %d, height: %f, weight: %f\n", threadIdx.x, d_const.height, d_const.weight);
}
When I execute this code, the kernel runs and prints the thread IDs, but the constant values are printed as zeros. How can I fix this?
filename: gpukernels.h
----------------------------------------------------------------------
__global__ void test_kernel();
----------------------------------------------------------------------
filename: gpukernels.cu
----------------------------------------------------------------------
#include <stdio.h>
#include "common.h"
#include "gpukernels.h"
extern "C" __constant__ __CONSTANTS d_const;
__global__ void test_kernel() {
    printf("Id: %d, Height: %f, Weight: %f\n", threadIdx.x, d_const.height, d_const.weight);
}
----------------------------------------------------------------------
filename: common.h
----------------------------------------------------------------------
typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;
----------------------------------------------------------------------
filename: main.cu
----------------------------------------------------------------------
#include "common.h"
#include "gpukernels.h"
__constant__ __CONSTANTS d_const;
int main(int argc, char **argv) {
    __CONSTANTS T;
    T.height = 1.79;
    T.weight = 73.2;
    T.age = 26;
    cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    test_kernel <<< 1, 16 >>> ();
    cudaDeviceSynchronize();
    return 0;
}
Following the suggestion, I tried the version above, which declares d_const as extern in gpukernels.cu and defines it in main.cu, but it still doesn't work. Did I miss something here?
Recommended answer
Below is the solution that works for me. Remember that you are using separate compilation, so do not forget to enable Generate Relocatable Device Code (the -rdc=true option).
FILE main.cu
#include <cuda.h>
#include <cuda_runtime.h>
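typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;
// The single definition of the constant-memory variable.
__constant__ __CONSTANTS d_const;
// The kernel is defined in kernel.cu.
__global__ void test_kernel();
#include <conio.h>
int main(int argc, char **argv) {
    __CONSTANTS T;
    T.height = 1.79;
    T.weight = 73.2;
    T.age = 26;
    // Copy the host struct into constant memory on the device.
    cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    test_kernel <<< 1, 16 >>>();
    cudaDeviceSynchronize();
    // getch() comes from <conio.h> (Windows-only) and keeps the console open.
    getch();
    return 0;
}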
FILE kernel.cu
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;
// Declaration only: the variable itself is defined in main.cu.
extern __constant__ __CONSTANTS d_const;
__global__ void test_kernel() {
    printf("Id: %d, height: %f, weight: %f\n", threadIdx.x, d_const.height, d_const.weight);
}
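The same idea can be folded back into the header-based layout from the question. The sketch below is one possible arrangement, not the only one: common.h carries only an extern declaration, main.cu holds the single definition, and the return codes of the runtime calls are checked so that a failed copy does not go unnoticed. The include guard name, the error-checking style, and the output name "app" in the build comment are assumptions added for illustration; the three files are shown in one listing and need to be split before compiling.

```cuda
// ---- FILE common.h (included by every .cu file) ----
#ifndef COMMON_H
#define COMMON_H

typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;

// Declaration only: no storage is created here, so every translation
// unit that includes this header refers to the same device symbol.
extern __constant__ __CONSTANTS d_const;

__global__ void test_kernel();

#endif

// ---- FILE gpukernels.cu ----
#include <stdio.h>
#include "common.h"

__global__ void test_kernel() {
    printf("Id: %d, height: %f, weight: %f\n",
           threadIdx.x, d_const.height, d_const.weight);
}

// ---- FILE main.cu ----
#include <stdio.h>
#include <cuda_runtime.h>
#include "common.h"

// The single definition of the constant-memory variable.
__constant__ __CONSTANTS d_const;

int main(void) {
    __CONSTANTS T;
    T.height = 1.79;
    T.weight = 73.2;
    T.age = 26;

    // Copy the host struct to constant memory and check the result.
    cudaError_t err = cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    test_kernel<<<1, 16>>>();

    // Wait for the kernel and surface any launch or execution error.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

// Build with relocatable device code (assumed command line):
//   nvcc -rdc=true main.cu gpukernels.cu -o app
```

Without -rdc=true, each .cu file is compiled as its own self-contained device program, which is why the original header (with the definition in common.h) gave the kernel a separate, zero-initialized copy of d_const. Older toolkits may also need an explicit -arch flag for device-side printf and separate compilation.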