CUDA - Extract Layer from 3D array

Problem Description


I have a 3D matrix where the x-y plane(s) represent an image and the z-plane represents image layers.
The issue is that when I try to extract the first (or any other) layer using idz, I do not get the expected results. It looks like the array, once in CUDA, is indexed differently in x, y, or z than I expect (as in pycuda). I see this in the result array below.

The following is a step-by-step walkthrough of this mini example (I used generic integer values to represent my images, to avoid uploading the images and the entire code)!
Here I import libraries and define image size and layers...

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
from pycuda.gpuarray import to_gpu

row = 10
column = 10
depth = 5

Then I define my input 3D array and my output 2D array...

#--==== Input 3D Array ====---
arrayA = numpy.full((row, column, depth), 0)

#populate each layer with fixed values
for i in range(depth):
    arrayA[:,:,i] = i + 1

arrayA = arrayA.astype(numpy.uint16)
arrayA_gpu = cuda.mem_alloc(arrayA.nbytes)
cuda.memcpy_htod(arrayA_gpu, arrayA)
arrayA_Answer = numpy.empty_like(arrayA)

#--==== Output 2D array container ====---
arrayB = numpy.zeros([row, column], dtype = numpy.uint16)
arrayB_gpu = cuda.mem_alloc(arrayB.nbytes)
cuda.memcpy_htod(arrayB_gpu, arrayB)
arrayB_Answer = numpy.empty_like(arrayB)

Next I define the CUDA kernel and function in pycuda

mod = SourceModule("""
    __global__ void getLayer(int *arrayA, int *arrayB)
    {
        int idx = threadIdx.x + (blockIdx.x * blockDim.x); // x coordinate (numpy axis 2) 
        int idy = threadIdx.y + (blockIdx.y * blockDim.y); // y coordinate (numpy axis 1)
        int idz = 0; // the first layer; this can be set in the range 0-4
        int x_width = (blockDim.x * gridDim.x); 
        int y_width = (blockDim.y * gridDim.y); 

        arrayB[idx + (x_width * idy)] = arrayA[idx + (x_width * idy) + (x_width * y_width) * idz];
    }
    """)

func = mod.get_function("getLayer")
func(arrayA_gpu, arrayB_gpu, block=(row, column, 1), grid=(1,1))

Using standard pycuda commands, I extract the results (not what I expected)
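(The device-to-host copies are not shown above; presumably they are the standard memcpy_dtoh calls, sketched here:)

#--==== Copy results back to host (assumed; not shown in the original) ====---
cuda.memcpy_dtoh(arrayA_Answer, arrayA_gpu)
cuda.memcpy_dtoh(arrayB_Answer, arrayB_gpu)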
arrayA[:,:,0] = 10x10 matrix populated with 1's (good)

print(arrayA_Answer[:,:,0])
[[1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]]

arrayB[:,:] = 10x10 matrix populated with the following (bad), expected to be equal to arrayA[:,:,0]...

print(arrayB_Answer)
[[1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]
 [1 2 3 4 5 1 2 3 4 5]]

Recommended Answer

As discussed here, the numpy 3D storage order pattern is that the "z" (i.e. "3rd") index is the rapidly varying index, as you progress linearly through memory. Your code assumes that the first index ("x") is the rapidly varying one.
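A quick host-side check makes this concrete (a minimal sketch, rebuilding the question's 10x10x5 arrayA): flattening it in numpy's default C order shows the depth axis varying fastest, which is exactly the repeating 1-5 pattern the kernel read into arrayB.

import numpy

arrayA = numpy.full((10, 10, 5), 0)
for i in range(5):
    arrayA[:,:,i] = i + 1

# C order: the last (depth) axis varies fastest in linear memory,
# so the first 10 linear elements already cycle through all 5 layers
print(arrayA.ravel(order='C')[:10])   # [1 2 3 4 5 1 2 3 4 5]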

Since your kernel is already organized for efficient ("coalesced") load/store behavior, you could address this by reordering the storage of your images/layers/slices in numpy. Here is a worked example:

$ cat t10.py
from __future__ import print_function
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
from pycuda.gpuarray import to_gpu

row = 5
column = 10
depth = 10

#--==== Input 3D Array ====---
arrayA = numpy.full((row, column, depth), 0)
my_slice=numpy.int32(3)  # choose the layer
#populate each layer with fixed values
for i in range(row):
    arrayA[i,:,:] = i + 1

arrayA = arrayA.astype(numpy.int32)
arrayA_gpu = cuda.mem_alloc(arrayA.nbytes)
cuda.memcpy_htod(arrayA_gpu, arrayA)
arrayA_Answer = numpy.empty_like(arrayA)

#--==== Output 2D array container ====---
arrayB = numpy.zeros([column, depth], dtype = numpy.int32)
arrayB_gpu = cuda.mem_alloc(arrayB.nbytes)
cuda.memcpy_htod(arrayB_gpu, arrayB)
arrayB_Answer = numpy.empty_like(arrayB)

mod = SourceModule("""
    __global__ void getLayer(int *arrayA, int *arrayB, int slice)
    {
        int idx = threadIdx.x + (blockIdx.x * blockDim.x); // x coordinate (numpy axis 2)
        int idy = threadIdx.y + (blockIdx.y * blockDim.y); // y coordinate (numpy axis 1)
        int idz = slice; //The "layer"
        int x_width = (blockDim.x * gridDim.x);
        int y_width = (blockDim.y * gridDim.y);

        arrayB[idx + (x_width * idy)] = arrayA[idx + (x_width * idy) + (x_width * y_width) * idz];
    }
    """)

func = mod.get_function("getLayer")
func(arrayA_gpu, arrayB_gpu, my_slice, block=(depth, column, 1), grid=(1,1))
cuda.memcpy_dtoh(arrayB_Answer,arrayB_gpu)

print(arrayA[my_slice,:,:])

print(arrayB_Answer[:,:])
$ python t10.py
[[4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]]
[[4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]
 [4 4 4 4 4 4 4 4 4 4]]
$

Note that I have also changed your use of uint16 to int32, to match the kernel type int.
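As an alternative (my sketch, not part of the original answer): if you would rather keep the question's (row, column, depth) numpy layout, the kernel can instead treat the depth axis as the fastest-varying one. The getLayerStrided name and the extra depth parameter are mine, and the trade-off is that consecutive threads then read addresses depth elements apart, so the loads are no longer coalesced:

mod2 = SourceModule("""
    __global__ void getLayerStrided(int *arrayA, int *arrayB, int slice, int depth)
    {
        int idx = threadIdx.x + (blockIdx.x * blockDim.x); // numpy axis 1 (column)
        int idy = threadIdx.y + (blockIdx.y * blockDim.y); // numpy axis 0 (row)
        int x_width = (blockDim.x * gridDim.x);

        // depth is numpy's fastest-varying axis, so the layer index is a
        // small offset and adjacent threads stride by 'depth' elements
        arrayB[idx + (x_width * idy)] = arrayA[(idx + (x_width * idy)) * depth + slice];
    }
    """)

func2 = mod2.get_function("getLayerStrided")
# assumes the question's original 10x10x5 arrays, re-typed to int32
func2(arrayA_gpu, arrayB_gpu, numpy.int32(0), numpy.int32(5), block=(10, 10, 1), grid=(1, 1))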
