CUDA 5.0 namespaces for constant memory variable usage


Problem description

In my program I want to use a structure containing constant variables and keep it on the device for as long as the program runs.

I have several header files containing the declarations of __global__ functions, with their definitions in the respective .cu files. I kept this scheme because it keeps related code in one place: for example, all the __device__ functions required by KERNEL_1, along with that kernel's definition, are separated from the __device__ functions required by KERNEL_2.

I had no problems with this scheme during compilation and linking until I introduced constant variables. I want to use the same constant variable in all kernels and device functions, but it does not seem to work.

##########################################################################
                                CODE EXAMPLE
###########################################################################
filename: 'common.h'
--------------------------------------------------------------------------
typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;

__constant__ __CONSTANTS d_const;

---------------------------------------------------------------------------
filename: main.cu
---------------------------------------------------------------------------
#include "common.h"
#include "gpukernels.h"
int main(int argc, char **argv) {

    __CONSTANTS T;
    T.height   = 1.79;
    T.weight   = 73.2;
    T.age      = 26;

    cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    test_kernel <<< 1, 16 >>>();
    cudaDeviceSynchronize();
}

---------------------------------------------------------------------------
filename: gpukernels.h
---------------------------------------------------------------------------
__global__ void test_kernel();

---------------------------------------------------------------------------
filename: gpukernels.cu
---------------------------------------------------------------------------
#include <stdio.h>
#include "gpukernels.h"
#include "common.h"

__global__ void test_kernel() {
    printf("Id: %d, height: %f, weight: %f\n", threadIdx.x, d_const.height, d_const.weight);
}



When I execute this code, the kernel runs and displays the thread IDs, but the constant values are displayed as zeros. How can I fix this?

filename: gpukernels.h
----------------------------------------------------------------------

__global__ void test_kernel();

----------------------------------------------------------------------
filename: gpukernels.cu
----------------------------------------------------------------------

#include <stdio.h>
#include "common.h"
#include "gpukernels.h"

extern "C" __constant__ __CONSTANTS d_const;

__global__ void test_kernel() {
    printf("Id: %d, Height: %f, Weight: %f\n", threadIdx.x, d_const.height, d_const.weight);
}

----------------------------------------------------------------------
filename: common.h
----------------------------------------------------------------------

typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;

----------------------------------------------------------------------
filename: main.cu
----------------------------------------------------------------------
#include "common.h"
#include "gpukernels.h"

__constant__ __CONSTANTS d_const;

int main(int argc, char **argv) {

    __CONSTANTS T;
    T.height = 1.79;
    T.weight = 73.2;
    T.age    = 26;

    cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    test_kernel <<< 1, 16 >>> ();
    cudaDeviceSynchronize();

    return 0;
}



As suggested, I tried the code above, but it still does not work. Did I miss something here?

Recommended answer

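Below is the solution that works for me. Remember that you are using separate compilation, so do not forget to enable Generate Relocatable Device Code (the -rdc=true option). Without it, each .cu file is compiled as a self-contained unit with its own copy of d_const, so the values copied in main.cu never reach the kernel defined in the other file. With it, d_const is defined once in main.cu and declared extern in kernel.cu.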

FILE main.cu

#include <cuda.h>
#include <cuda_runtime.h>

typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;

__constant__ __CONSTANTS d_const;

__global__ void test_kernel();

#include <conio.h>   // Windows-only header; provides getch()
int main(int argc, char **argv) {

    __CONSTANTS T;
    T.height   = 1.79;
    T.weight   = 73.2;
    T.age      = 26;

    cudaMemcpyToSymbol(d_const, &T, sizeof(__CONSTANTS));
    test_kernel <<< 1, 16 >>>();
    cudaDeviceSynchronize();

    getch();   // keep the console window open until a key is pressed (Windows only)
    return 0;
}

FILE kernel.cu

#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>

typedef struct {
    double height;
    double weight;
    int age;
} __CONSTANTS;

extern __constant__ __CONSTANTS d_const;

__global__ void test_kernel() {
    printf("Id: %d, height: %f, weight: %f\n", threadIdx.x, d_const.height, d_const.weight);
}
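
When the kernel prints zeros, it can help to confirm separately that the host-to-device copy itself succeeded. The single-file sketch below is one possible way to do that, not part of the original answer: it checks every runtime call and reads the symbol back with cudaMemcpyFromSymbol before launching the kernel. The names Constants and CUDA_CHECK are assumptions introduced for illustration (the original code uses __CONSTANTS and no error checking).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical name for illustration; the original code calls this __CONSTANTS.
struct Constants {
    double height;
    double weight;
    int age;
};

// The single definition of the constant-memory symbol.
// In a multi-file build with -rdc=true, the other .cu files would only declare it:
//     extern __constant__ Constants d_const;
__constant__ Constants d_const;

// Hypothetical helper macro: print the failing call and exit on any runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void test_kernel() {
    printf("Id: %d, height: %f, weight: %f, age: %d\n",
           threadIdx.x, d_const.height, d_const.weight, d_const.age);
}

int main() {
    Constants h_in = {1.79, 73.2, 26};

    // Copy the host structure into constant memory.
    CUDA_CHECK(cudaMemcpyToSymbol(d_const, &h_in, sizeof(Constants)));

    // Read the symbol back to confirm that the copy reached the device.
    Constants h_out = {};
    CUDA_CHECK(cudaMemcpyFromSymbol(&h_out, d_const, sizeof(Constants)));
    if (h_out.height != h_in.height || h_out.weight != h_in.weight ||
        h_out.age != h_in.age) {
        fprintf(stderr, "read-back does not match the values written\n");
        return EXIT_FAILURE;
    }

    test_kernel<<<1, 16>>>();
    CUDA_CHECK(cudaGetLastError());        // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised during execution
    return 0;
}
```

If the read-back matches but a kernel in another .cu file still prints zeros, that suggests the kernel is reading a different copy of the symbol, which is the situation the -rdc=true option and the extern declaration are meant to address.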
