CUDA constant memory value not correct


Question


I have been reading through many of the SO questions related to constant memory, and I still don't understand why my program is not working. Overall it looks like the following:

Common.cuh

__constant__ int numElements;

__global__
void kernelFunction();

Common.cu

#include "Common.cuh"
#include <stdio.h>

__global__
void kernelFunction()
{
   printf("NumElements = %d", numElements);
}

Test.cu

#include "Common.cuh"

int main()
{
   int N = 100;
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}


It compiles without errors, but when I print the value of numElements I just get a random value. Can someone point me in the right direction to understand this?

Answer

This line:

__constant__ int numElements;


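has compilation-unit scope. That means if you compile it into one module, and also into another module, the two modules will each have their own, separate instance of numElements in __constant__ memory. In this program, the cudaMemcpyToSymbol call in Test.cu writes to Test.cu's copy, while kernelFunction, compiled in Common.cu, reads Common.cu's copy, which was never written.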
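One way to confirm that the two separate instances are the problem (rather than constant memory itself) is to put everything in a single translation unit. The following is a minimal sketch under that assumption; the names readNumElements and roundTripNumElements are hypothetical and not from the question. The __constant__ definition, the cudaMemcpyToSymbol call, and the kernel that reads the value all live in one .cu file, so no -rdc is needed, and the kernel copies the constant into global memory so the host can check it instead of relying on printf.

```cuda
// Single-file sketch: definition, host-side write and device-side read of
// numElements are all in one translation unit, so they refer to one instance.
#include <cuda_runtime.h>

__constant__ int numElements;

// Copies the constant into global memory so the host can inspect it.
__global__ void readNumElements(int* out)
{
    *out = numElements;
}

// Hypothetical helper: writes n to constant memory, reads it back through
// the kernel, and returns what the device saw (or -1 on any CUDA error).
int roundTripNumElements(int n)
{
    if (cudaMemcpyToSymbol(numElements, &n, sizeof(int)) != cudaSuccess)
        return -1;

    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess)
        return -1;

    readNumElements<<<1, 1>>>(d_out);
    if (cudaGetLastError() != cudaSuccess) {  // launch/configuration errors
        cudaFree(d_out);
        return -1;
    }

    int h_out = -1;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}
```

Calling roundTripNumElements(100) from a main() in the same file should return 100 on a working CUDA setup, because the write and the read target the same symbol.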


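The solution is to use separate compilation and linking (relocatable device code, -rdc=true) to device-link the two modules together, at which point the device linker resolves numElements to a single instance shared by both modules. For this to work, the header must only declare the variable as extern, and exactly one .cu file must define it. (-arch=sm_20 reflects the toolkits of the time; recent CUDA versions no longer support sm_20, so substitute an architecture that your GPU and nvcc support.)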

nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu

Example:

$ cat common.cuh
#ifndef COMMON_CU
extern __constant__ int numElements;
#endif
__global__
void kernelFunction();
$ cat common.cu
#define COMMON_CU
#include "common.cuh"
#include <stdio.h>

__constant__ int numElements;
__global__
void kernelFunction()
{
   printf("NumElements = %d\n", numElements);
}
$ cat test.cu
#define TEST_CU
#include "common.cuh"

int main()
{
   int N = 100;
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}

$ nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu
$ ./test
NumElements = 100
$
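If you would rather not build with -rdc=true, another common pattern (not part of the original answer) is to keep the __constant__ definition private to common.cu and give other files a host function that writes it. A sketch of such a common.cu, assuming a hypothetical helper named setNumElements:

```cuda
// Alternative common.cu: numElements is defined only here, and the only code
// that touches it (the kernel and the host setter) is in this same file.
#include <cstdio>
#include <cuda_runtime.h>

__constant__ int numElements;

__global__ void kernelFunction()
{
    printf("NumElements = %d\n", numElements);
}

// Hypothetical host-side setter; other translation units call this instead
// of calling cudaMemcpyToSymbol on their own (separate) copy of the symbol.
cudaError_t setNumElements(int n)
{
    return cudaMemcpyToSymbol(numElements, &n, sizeof(int));
}
```

With this layout, common.cuh would declare only kernelFunction and cudaError_t setNumElements(int n); (no __constant__ line), test.cu would call setNumElements(N) in place of cudaMemcpyToSymbol, and a plain nvcc -o test common.cu test.cu should be sufficient, because every access to numElements happens in the file that defines it.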
