CUDA constant memory value not correct

Question

I have been reading through many of the SO questions related to constant memory, and I still don't understand why my program is not working. Overall, it looks like this:

Common.cuh

__constant__ int numElements;

__global__
void kernelFunction();

Common.cu

#include "Common.cuh"
#include <stdio.h>

__global__
void kernelFunction()
{
   printf("NumElements = %d", numElements);
}

Test.cu

#include "Common.cuh"

int main()
{
   int N = 100;
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}

It compiles with no errors, but when I print the value of numElements I just get a random value. Can someone point me in the right direction to understand this?

Recommended answer

This line:

__constant__ int numElements;

has compilation unit scope. That means that if you compile it into one module and also into another module, the two modules will have different instantiations of numElements in __constant__ memory. Here, cudaMemcpyToSymbol in Test.cu writes to Test.cu's copy, while the kernel in Common.cu reads its own copy, which was never set.

The solution is to use separate compilation and linking to device-link the two modules together, at which point the device linker resolves the symbol between the two modules:

nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu

For example:

$ cat common.cuh
#ifndef COMMON_CU
extern __constant__ int numElements;
#endif
__global__
void kernelFunction();
$ cat common.cu
#define COMMON_CU
#include "common.cuh"
#include <stdio.h>

__constant__ int numElements;
__global__
void kernelFunction()
{
   printf("NumElements = %d\n", numElements);
}
$ cat test.cu
#define TEST_CU
#include "common.cuh"

int main()
{
   int N = 100;
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}

$ nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu
$ ./test
NumElements = 100
$
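
If relocatable device code is not an option, one possible alternative (not part of the answer above) is to keep the __constant__ definition, the kernel, and every host call that touches the symbol in a single compilation unit, and expose only ordinary host functions to the rest of the program. The sketch below is a minimal single-file illustration of that idea. The helper names setNumElements and runKernel are hypothetical, and main is included only to make the file runnable on its own; in a multi-file project it would live in another file that sees just the two host function declarations.

```cuda
// single.cu: hypothetical single-compilation-unit variant (a sketch, not the answer's method)
#include <cstdio>
#include <cuda_runtime.h>

// Defined exactly once; no other file declares or defines this symbol.
__constant__ int numElements;

__global__ void kernelFunction()
{
    printf("NumElements = %d\n", numElements);
}

// Hypothetical host-side setter. Because it is compiled in the same unit as
// the kernel, it writes to the same numElements instance the kernel reads.
cudaError_t setNumElements(int n)
{
    return cudaMemcpyToSymbol(numElements, &n, sizeof(int));
}

// Hypothetical host-side launcher, so callers never launch the kernel directly.
cudaError_t runKernel()
{
    kernelFunction<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // reports launch failures
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // waits for the kernel and flushes device printf
}

int main()
{
    cudaError_t err = setNumElements(100);
    if (err != cudaSuccess) {
        fprintf(stderr, "setNumElements failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = runKernel();
    if (err != cudaSuccess) {
        fprintf(stderr, "runKernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With this layout no -rdc=true flag should be needed, since the symbol never crosses a compilation unit boundary. The trade-off is that every kernel reading numElements has to live in the same .cu file as its definition.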
