Global variable in CUDA


Question

How can I create global variables in CUDA? Could you please give me an example?

How can I create arrays inside a CUDA function? For example:

__global__ void test()
{
  int *a = new int[10];
}

Or how can I create a global array and access it from this function? For example:

__device__ int *a;
__global__ void test()
{
  a[0] = 2;
}

Or how can I use it like the following:

__global__ void ProcessData(int img)
{
   int *neighbourhood = new int[8];
   getNeighbourhood(img, neighbourhood);
}

Still, I have some problem with this. I found that, compared to __device__, defining "__device__ __constant__" (read only) will improve memory access. But my problem is that I have an array in host memory, say

 float *arr = new float[sizeOfTheArray]; 

I want to make it a device array, modify it in device memory, and then copy it back to the host. How can I do it?

Answer

The C++ new operator is supported on compute capability 2.0 and 2.1 (i.e. Fermi) with CUDA 4.0, so you could use new to allocate global memory onto a device symbol, although neither of your first two code snippets is how it would be done in practice.
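As a rough sketch of what in-kernel allocation looks like (this example is mine, not part of the original answer, and assumes a compute capability 2.0+ device with a CUDA 4.0+ toolkit; the kernel name and size are illustrative):

__global__ void scratch_kernel(int n)
{
    // Each thread allocates its own scratch array from the device heap.
    int *tmp = new int[n];

    if (tmp != 0)
    {
        for (int i = 0; i < n; i++)
            tmp[i] = threadIdx.x + i;

        // Memory allocated in device code must also be freed in device code.
        delete [] tmp;
    }
}

Note that the device heap these allocations come from is limited in size by default; it can be enlarged with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before launching the kernel.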

On older hardware, and/or with pre-CUDA 4.0 toolkits, the standard approach is to use the cudaMemcpyToSymbol API in host code:

__device__ float *a;

int main()
{
    const size_t sz = 10 * sizeof(float);

    // Allocate device memory, then copy the resulting device pointer onto
    // the __device__ symbol a so that kernels can dereference it directly.
    float *ah;
    cudaMalloc((void **)&ah, sz);
    cudaMemcpyToSymbol("a", &ah, sizeof(float *), size_t(0), cudaMemcpyHostToDevice);
}

which copies a dynamically allocated device pointer onto a symbol that can be used directly in device code.
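To make the round trip concrete, here is a sketch (mine, not from the original answer; it keeps the string symbol name used above, which later toolkits dropped in favour of passing the symbol itself) in which a kernel writes through the symbol and the host copies the result back using the same device pointer:

__device__ float *a;

__global__ void fill(int n)
{
    // a holds a device pointer, so device code can dereference it directly.
    if (threadIdx.x < n)
        a[threadIdx.x] = 2.0f * threadIdx.x;
}

int main()
{
    const int n = 10;
    const size_t sz = size_t(n) * sizeof(float);

    // Allocate device memory and publish the pointer via the symbol.
    float *ah;
    cudaMalloc((void **)&ah, sz);
    cudaMemcpyToSymbol("a", &ah, sizeof(float *), size_t(0), cudaMemcpyHostToDevice);

    fill<<<1, n>>>(n);

    // The host still holds the raw device pointer, so an ordinary
    // cudaMemcpy brings the modified data back.
    float results[10];
    cudaMemcpy(&results[0], ah, sz, cudaMemcpyDeviceToHost);
}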

EDIT: Answering this question is a bit like hitting a moving target. For the constant memory case you now seem interested in, here is a complete working example:

#include <cstdio>

#define nn (10)

// Statically sized array in constant memory.
__constant__ float a[nn];

__global__ void kernel(float *out)
{
    // Each thread reads one element of the constant array.
    if (threadIdx.x < nn)
        out[threadIdx.x] = a[threadIdx.x];
}

int main()
{
    const size_t sz = size_t(nn) * sizeof(float);
    const float avals[nn] = { 1., 2., 3., 4., 5., 6., 7., 8., 9., 10. };
    float ah[nn];

    // Copy the host data onto the constant memory symbol.
    cudaMemcpyToSymbol("a", &avals[0], sz, size_t(0), cudaMemcpyHostToDevice);

    float *ad;
    cudaMalloc((void **)&ad, sz);

    kernel<<<dim3(1), dim3(16)>>>(ad);

    // Copy the kernel output back to the host and print it.
    cudaMemcpy(&ah[0], ad, sz, cudaMemcpyDeviceToHost);

    for(int i=0; i<nn; i++) {
        printf("%d %f\n", i, ah[i]);
    }
}

This shows copying data onto a constant memory symbol, and using that data inside a kernel.
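For the read/write host array in the question (float *arr = new float[sizeOfTheArray]), constant memory is not suitable because kernels cannot write to it; the usual pattern is plain global memory: copy the array to the device, modify it in a kernel, and copy it back. Roughly, as a sketch only (the scale kernel and size are illustrative, not from the original answer):

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;            // modify the array in device memory
}

int main()
{
    const int sizeOfTheArray = 1024;
    const size_t sz = size_t(sizeOfTheArray) * sizeof(float);

    float *arr = new float[sizeOfTheArray];
    for (int i = 0; i < sizeOfTheArray; i++)
        arr[i] = float(i);

    // Host -> device, modify on the device, then device -> host.
    float *darr;
    cudaMalloc((void **)&darr, sz);
    cudaMemcpy(darr, arr, sz, cudaMemcpyHostToDevice);

    scale<<<(sizeOfTheArray + 255) / 256, 256>>>(darr, sizeOfTheArray);

    cudaMemcpy(arr, darr, sz, cudaMemcpyDeviceToHost);

    cudaFree(darr);
    delete [] arr;
}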

On another note, the interweb is overflowing with well answered questions, tutorials, lecture notes, videos, ebooks, sample code and documentation on the basics of CUDA programming. Five minutes with the search engine of your choice would get you answers to every one of these questions you have been asking over the last few days. Perhaps it is time to do exactly that.
