Iterating through a 2D array in PyCUDA


Problem description


I am trying to iterate through a 2D array in PyCUDA, but I end up with repeated array values. Initially I pass in a small random integer array and that works as expected, but when I pass in an image, I see the same values over and over again.

Here is my code:

img = np.random.randint(20, size = (4,5))
print("Input array")
print(img)
img_size = img.shape
print(img_size)

# nbytes gives the number of bytes in the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
# Copy the array from host (CPU) memory to device (GPU) memory
cuda.memcpy_htod(img_gpu, img)


mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(int *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    printf(" %d",a[j + i*col]);
}
}
""")

col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

Now, when I replace the random integer array with an image converted to a NumPy array, I end up with this:

img = cv2.imread('Chest.jpg',0)
img_size=img.shape
print(img_size)

# nbytes gives the number of bytes in the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
# Copy the array from host (CPU) memory to device (GPU) memory
cuda.memcpy_htod(img_gpu, img)

mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(int *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    printf(" %d",a[j + i*col]);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

Solution

The problem here is that the image you are loading doesn't have pixel values stored as signed integers. This modification of your example works more as expected:

import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as np
import cv2 

import pycuda.autoinit

img = cv2.imread('Chest.jpg',0)
img_size=img.shape
print(img_size)
print(img.dtype)

# nbytes gives the number of bytes in the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
# Copy the array from host (CPU) memory to device (GPU) memory
cuda.memcpy_htod(img_gpu, img)

mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(unsigned char *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    int val = int(a[j + i*col]);
    printf(" %d", val);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

When run, the code emits this:

$ python image.py 
(681, 1024)
uint8
Output array  244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 245 245 245 246 246 246 246 246 246 246 246 246 246 246 244 244 244 244 244 244 244 244 245 245 245 245 245 245 245 245 244 244 245 245 245 246 246 246 

[Output clipped for brevity]

Note the dtype of the image: uint8. Your code is attempting to treat the stream of unsigned 8-bit values as 32-bit integers. Technically, it should generate a runtime error on a full image, because the kernel reads 4 bytes per pixel instead of 1 and would therefore read beyond the end of the image buffer. However, you don't see this because you only run a single block, and your input image is presumably at least four times larger than the 32 x 32 block you run.
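To make the byte-width mismatch concrete, here is a small NumPy-only sketch (no GPU needed). The eight pixel values are a made-up stand-in for the start of an image row, and `last_byte_read` is a hypothetical helper, not part of PyCUDA. It shows how viewing uint8 data as 32-bit integers packs four pixels into each value, and why a single 32 x 32 block still stays inside the 681 x 1024 buffer.

```python
import numpy as np

# Hypothetical stand-in for the start of an image row: eight uint8 pixels.
pixels = np.array([244, 244, 244, 244, 245, 245, 246, 246], dtype=np.uint8)

# Reinterpret the same bytes as 32-bit integers, which is what a kernel
# declared with `int *a` does to a buffer that was filled with uint8 data.
as_int32 = pixels.view(np.int32)

print(pixels.size, as_int32.size)  # four pixels are packed into each int
print(as_int32[0])                 # four 0xF4 bytes read as one signed int


def last_byte_read(block, cols, itemsize):
    """Bytes the buffer must hold so that the last thread of a single block
    can read a[j + i * cols] when each element is `itemsize` bytes wide."""
    max_i, max_j = block[0] - 1, block[1] - 1
    return (max_j + max_i * cols + 1) * itemsize


rows, cols = 681, 1024         # image shape printed in the output above
buffer_bytes = rows * cols     # one byte per uint8 pixel
needed = last_byte_read((32, 32), cols, itemsize=4)
print(needed, buffer_bytes, needed <= buffer_bytes)
```

Under these assumptions the single block reads well short of the end of the allocation, which is consistent with no error being reported even though every value printed is built from four neighbouring pixels.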

Incidentally, PyCUDA is extremely good at managing and enforcing type safety for CUDA calls, but your code neatly defeats every mechanism by which PyCUDA could detect a type mismatch in the kernel call. PyCUDA includes an excellent GPUArray class (pycuda.gpuarray.GPUArray). You should familiarise yourself with it. If you had used a GPUArray instance here, you would have gotten type mismatch runtime errors, which would have alerted you to the exact source of the problem the first time you tried to run it.
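As a rough illustration of the kind of check that a raw `mem_alloc` pointer bypasses, here is a hand-rolled dtype guard. This is not PyCUDA's own mechanism: `C_TO_NUMPY` and `check_dtype` are hypothetical names invented for this sketch. It relies only on the `.dtype` attribute, which both NumPy arrays and GPUArray instances expose, so it can be tried without a GPU.

```python
import numpy as np

# Hypothetical mapping from the C element type in a kernel signature
# to the NumPy dtype the array passed to that kernel must have.
C_TO_NUMPY = {
    "unsigned char": np.uint8,
    "int": np.int32,
    "float": np.float32,
}


def check_dtype(array, c_type):
    """Raise TypeError if `array` does not match the kernel's element type.

    Only `.dtype` is inspected, so a NumPy array or a GPUArray both work.
    """
    expected = np.dtype(C_TO_NUMPY[c_type])
    if array.dtype != expected:
        raise TypeError(
            "kernel expects %s (%s) but array is %s"
            % (c_type, expected, array.dtype)
        )


img = np.zeros((4, 5), dtype=np.uint8)  # stand-in for cv2.imread(..., 0)
check_dtype(img, "unsigned char")       # matches: passes silently

try:
    check_dtype(img, "int")             # the mismatch from the question
except TypeError as exc:
    print(exc)
```

Calling such a guard just before the kernel launch would have flagged the uint8 image against the `int *a` signature immediately, instead of leaving the mismatch to show up as odd printed values.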
