CUDA: How to copy a 3D array from host to device?
Question
I want to learn how to copy a three-dimensional array from host memory to device memory. Say I have a 3D array that contains data, for example int host_data[256][256][256]; I want to copy that data to dev_data (a device array) so that host_data[x][y][z] == dev_data[x][y][z]. How can I do it, and how am I supposed to access the dev_data array on the device? A simple example would be very helpful.
Recommended answer
The common way is to flatten the array (make it one-dimensional). You then need a small calculation that maps an (x,y,z) triple to a single number: a position in the flattened one-dimensional array.
2D example:
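int data[256][256];
int *flattened = &data[0][0];  // a 2D array is contiguous, so point at its first element
// data[x][y] and flattened[x * 256 + y] refer to the same element
data[x][y] == flattened[x * 256 + y];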
3D example:
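int data[256][256][256];
int *flattened = &data[0][0][0];  // point at the first element of the contiguous 3D array
// data[x][y][z] and flattened[x * 256 * 256 + y * 256 + z] refer to the same element
data[x][y][z] == flattened[x * 256 * 256 + y * 256 + z];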
Or use a wrapper:
// Maps (x, y, z) to an offset in the flattened 256 x 256 x 256 array.
__host__ __device__ inline int index(const int x, const int y, const int z) {
    return x * 256 * 256 + y * 256 + z;
}
Knowing that, you can allocate a linear array with cudaMalloc as usual, copy the flattened host data into it, and then use the index function to access the corresponding element in device code.
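The steps above can be put together in a minimal sketch. It is an illustration under stated assumptions, not the original poster's code: the dimensions NX, NY, NZ are shrunk to 8 to keep the example small, and the kernel add_one and the helper copy_and_check_3d are hypothetical names introduced here. Only cudaMalloc, cudaMemcpy, cudaGetLastError and cudaFree come from the CUDA runtime API.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Assumed dimensions for illustration (the question uses 256 x 256 x 256).
constexpr int NX = 8, NY = 8, NZ = 8;

// Maps (x, y, z) to an offset in the flattened array; z varies fastest,
// which matches the memory layout of int data[NX][NY][NZ].
__host__ __device__ inline int index(int x, int y, int z) {
    return (x * NY + y) * NZ + z;
}

// Hypothetical kernel: each thread handles one (x, y, z) element and adds 1 to it.
__global__ void add_one(int *dev_data) {
    int x = blockIdx.x;
    int y = blockIdx.y;
    int z = threadIdx.x;
    if (x < NX && y < NY && z < NZ) {
        dev_data[index(x, y, z)] += 1;
    }
}

// Copies a flattened 3D array to the device, runs the kernel, copies the
// result back, and returns true if every element came back incremented.
bool copy_and_check_3d() {
    const size_t count = static_cast<size_t>(NX) * NY * NZ;
    const size_t bytes = count * sizeof(int);

    // Host data stored flat; element (x, y, z) holds its own offset.
    std::vector<int> host_data(count);
    for (int x = 0; x < NX; ++x)
        for (int y = 0; y < NY; ++y)
            for (int z = 0; z < NZ; ++z)
                host_data[index(x, y, z)] = index(x, y, z);

    int *dev_data = nullptr;
    if (cudaMalloc(&dev_data, bytes) != cudaSuccess) return false;

    // One linear copy moves the whole flattened array to the device.
    if (cudaMemcpy(dev_data, host_data.data(), bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev_data);
        return false;
    }

    dim3 grid(NX, NY);                 // one block per (x, y) pair
    add_one<<<grid, NZ>>>(dev_data);   // one thread per z
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev_data);
        return false;
    }

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host_data.data(), dev_data, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < count; ++i)
        if (host_data[i] != static_cast<int>(i) + 1) return false;
    return true;
}
```

Calling copy_and_check_3d() from main and checking its return value exercises the whole round trip. A true built-in array such as int host_data[256][256][256] is also contiguous, so &host_data[0][0][0] could be passed to cudaMemcpy in place of host_data.data(). At 64 MiB it would normally need static or heap storage rather than the stack, and a launch configuration covering 256 elements per axis.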
Update: The author of this question claims to have found a better solution (at least for 2D); you might want to have a look.