Global variable in CUDA


Question

How can I create global variables in CUDA? Could you please give me an example?

How can I create arrays inside a CUDA function? For example:

__global__ void test()
{
  int *a = new int[10];
}

Or how can I create a global array and access it from this function? For example:

__device__ int *a;
__global__ void test()
{
  a[0] = 2;
}

Or how can I use something like the following?

__global__ void ProcessData(int img)
{
   int *neighbourhood = new int[8];
   getNeighbourhood(img, neighbourhood);
}

---

I still have a problem with this. I found that, compared to

__device__

defining

"__device__ __constant__" (read only)

will improve the memory access. But my problem is that I have an array in host memory, say:

 float *arr = new float[sizeOfTheArray]; 

I want to make it a device array, modify it in device memory, and then copy it back to the host. How can I do that?

Answer

The C++ new operator is supported on compute capability 2.0 and 2.1 (i.e. Fermi) devices with CUDA 4.0, so you could use new to allocate global memory onto a device symbol, although neither of your first two code snippets is how it would be done in practice.
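As a sketch of what in-kernel allocation can look like under those constraints (the kernel name test comes from the question; the heap-size call and sizes are illustrative assumptions, since in-kernel new draws from a separate device heap that defaults to 8 MB):

```cuda
#include <cstdio>

__global__ void test()
{
    // Allocated per thread from the device heap; may return NULL on exhaustion
    int *a = new int[10];
    if (a != NULL) {
        a[0] = 2;
        delete [] a;   // must also be freed in device code
    }
}

int main()
{
    // Assumption: enlarge the device heap before launch if allocations are large
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);
    test<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Memory obtained this way lives in global memory but is only reachable through pointers held in device code, which is why it is not a substitute for a host-visible symbol.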

On older hardware, and/or with pre-CUDA 4.0 toolkits, the standard approach is to use the cudaMemcpyToSymbol API in host code:

__device__ float *a;

int main()
{
    const size_t sz = 10 * sizeof(float);

    float *ah;
    cudaMalloc((void **)&ah, sz);
    // Pass the symbol itself; string symbol names were removed in CUDA 5.0
    cudaMemcpyToSymbol(a, &ah, sizeof(float *));
}

This copies a dynamically allocated device pointer onto a symbol which can then be used directly in device code.
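To round-trip data the way the follow-up asks (host array in, modified on the device, copied back out), the same symbol pattern extends naturally. A minimal sketch, where the kernel name scale and the scaling factor are made up for illustration:

```cuda
#include <cstdio>

__device__ float *a;

__global__ void scale(float factor)
{
    a[threadIdx.x] *= factor;   // device code reads the symbol directly
}

int main()
{
    const int n = 10;
    const size_t sz = n * sizeof(float);

    float host[10];
    for (int i = 0; i < n; i++) host[i] = (float)i;

    float *ah;
    cudaMalloc((void **)&ah, sz);
    cudaMemcpy(ah, host, sz, cudaMemcpyHostToDevice);  // host -> device
    cudaMemcpyToSymbol(a, &ah, sizeof(float *));       // publish the pointer

    scale<<<1, n>>>(2.0f);

    cudaMemcpy(host, ah, sz, cudaMemcpyDeviceToHost);  // device -> host
    for (int i = 0; i < n; i++) printf("%f\n", host[i]);

    cudaFree(ah);
    return 0;
}
```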

Answering this question is a bit like hitting a moving target. For the constant memory case you now seem interested in, here is a complete working example:

#include <cstdio>

#define nn (10)

__constant__ float a[nn];

__global__ void kernel(float *out)
{
    if (threadIdx.x < nn)
        out[threadIdx.x] = a[threadIdx.x];

}

int main()
{
    const size_t sz = size_t(nn) * sizeof(float);
    const float avals[nn]={ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10. };
    float ah[nn];

    cudaMemcpyToSymbol(a, &avals[0], sz);   // the symbol itself, not the string "a"

    float *ad;
    cudaMalloc((void **)&ad, sz);

    kernel<<<dim3(1),dim3(16)>>>(ad);

    cudaMemcpy(&ah[0],ad,sz,cudaMemcpyDeviceToHost);

    for (int i = 0; i < nn; i++) {
        printf("%d %f\n", i, ah[i]);
    }
}

This shows copying data onto a constant memory symbol and using that data inside a kernel.

