How to pass a list of strings to an OpenCL kernel using pyopencl?
Problem description
What is the right way to pass a list of strings to an OpenCL kernel?
I tried using buffers (see the following code), but it failed.
OpenCL (struct.cl):
typedef struct{
    uchar uc[40];
} my_struct9;
inline void try_this7_now(__global const uchar * IN_DATA ,
                          const uint IN_len_DATA ,
                          __global uchar * OUT_DATA){
    for (unsigned int i=0; i<IN_len_DATA ; i++) OUT_DATA[i] = IN_DATA[i];
}
__kernel void try_this7(__global const my_struct9 * pS_IN_DATA ,
                        const uint IN_len ,
                        __global my_struct9 * pS_OUT){
    uint idx = get_global_id(0);
    for (unsigned int i=0; i<idx; i++) try_this7_now(pS_IN_DATA[i].uc, IN_len, pS_OUT[i].uc);
}
Python (opencl_struct.py):
# -*- coding: utf-8 -*-
import pyopencl as cl
import pyopencl.array as cl_array
import numpy
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
# --------------------------------------------------------
LIMIT = 40
mf = cl.mem_flags
import ctypes,sys,struct
"""
typedef struct{
uchar uc[40];
} my_struct9;
"""
INlist = []
INlist.append("That is VERY cool!")
INlist.append("It is a list!")
INlist.append("A big one!")
#INlist.append("But it failes to output. :-(") # PLAY WITH THOSE
INlist.append("WTF is THAT?") # PLAY WITH THOSE
print "INlist : "+str(INlist)
print "largest string "+str( max( len(INlist[iL]) for iL in range(len(INlist)) ) )
strLIMIT=str(LIMIT)
s7 = struct.Struct( (str(strLIMIT+'s') *len(INlist)) )
IN_host_buffer = ctypes.create_string_buffer(s7.size)
s7.pack_into(IN_host_buffer, 0, *INlist)
IN_dev_buffer = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=IN_host_buffer)
OUT_host_buffer = ctypes.create_string_buffer(s7.size)
OUT_dev_buffer = cl.Buffer(ctx, mf.WRITE_ONLY, len(OUT_host_buffer))
print "> len(OUT_host_buffer) "+str(len(OUT_host_buffer))
# ========================================================================================
f = open("struct.cl", 'r')
fstr = "".join(f.readlines())
prg = cl.Program(ctx, fstr).build()
#cl.enqueue_copy(queue, IN_dev_buffer, IN_host_buffer, is_blocking=True) # copy data to device
cl.enqueue_write_buffer(queue, IN_dev_buffer, IN_host_buffer).wait()
prg.try_this7(queue, (1,), None, IN_dev_buffer, numpy.uint32(LIMIT), OUT_dev_buffer)
# ========================================================================================
cl.enqueue_copy(queue, OUT_host_buffer, OUT_dev_buffer).wait()
SSS = s7.unpack_from(OUT_host_buffer,0)
# unpack here OUT_host_buffer
print "(GPU) output : "+str( SSS )+" "
for s in range(len(SSS)):
    print ">>> (GPU) output : "+str( SSS[s] )
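A note on the packing step in this script: the format string passed to `struct.Struct` repeats `40s` once per list element. As a result, the strings sit back to back in one flat buffer, each null-padded (or truncated) to exactly 40 bytes. That padding is where the trailing `\x00` characters in the program's output come from. The following sketch reproduces the layout using only the standard `struct` module. It targets Python 3, where `s` fields require `bytes`, and `pack_strings`/`unpack_strings` are hypothetical helpers that are not part of the original script.

```python
import struct

LIMIT = 40  # fixed width of each string slot, as in the original script

def pack_strings(strings, width=LIMIT):
    """Pack strings into one flat buffer of len(strings) * width bytes.

    Hypothetical helper mirroring struct.Struct('40s' * n) from the question.
    """
    fmt = struct.Struct(("%ds" % width) * len(strings))
    # In Python 3, 's' fields take bytes; shorter values are null-padded,
    # longer values are silently truncated to `width` bytes.
    data = fmt.pack(*(s.encode("ascii") for s in strings))
    return fmt, data

def unpack_strings(fmt, data):
    """Unpack the fixed-width slots and drop the trailing null padding."""
    return [field.rstrip(b"\x00").decode("ascii") for field in fmt.unpack(data)]

in_list = ["That is VERY cool!", "It is a list!", "A big one!", "WTF is THAT?"]
fmt, data = pack_strings(in_list)
print(fmt.size)                   # 160: 4 slots * 40 bytes
print(data[:LIMIT])               # first slot, padding included
print(unpack_strings(fmt, data))  # padding stripped again
```

Because every slot has the same width, a kernel can locate string `k` at byte offset `k * 40`. The cost is that anything longer than 40 bytes is cut off without warning.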
I first ran the program with "But it failes to output. :-(" as the 4th list element. Then I played around by adding and removing list elements. Finally, this problem appeared. The output of the program is supposed to be (short version):
(GPU) output : That is VERY cool!
(GPU) output : It is a list!
(GPU) output : A big one!
(GPU) output : WTF is THAT?
But the actual output is:
python opencl_struct.py
INlist : ['That is VERY cool!', 'It is a list!', 'A big one!', 'WTF is THAT?']
largest string 18
> len(OUT_host_buffer) 160
(GPU) output : ('That is VERY cool!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'It is a list!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'A big one!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'But it failes to output. :-(\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
(GPU) output : That is VERY cool!
(GPU) output : It is a list!
(GPU) output : A big one!
(GPU) output : But it failes to output. :-(
As you can see, the 4th list element differs.
So, maybe my approach is wrong, or there is a bug in pyopencl or somewhere else.
I am using an NVIDIA 9400 GPU.
Rambo
Recommended answer
Your code seems very complicated to me, and some parts are not very clear. For instance, I don't see why you create only one work item:
prg.try_this7(queue, (1,), None,...)
This forces you to loop through your strings (in the kernel) instead of using the available parallelism. Anyhow, if I understand correctly, you want to send some strings to the GPU, copy them into another buffer, get them back on the host side and display them.
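With a single work item, the question's kernel does not just run slowly. It copies nothing. For a global size of `(1,)`, `get_global_id(0)` returns 0, so the loop `for (i=0; i<idx; i++)` in `try_this7` runs zero times and `pS_OUT` is never written. The output buffer was created with `mf.WRITE_ONLY` and no host data, so its contents are undefined, and reading it back returns whatever was already in that device memory. Data left over from the earlier run with "But it failes to output. :-(" would plausibly explain the stale fourth string. The sketch below emulates the launch in plain Python (no GPU) to compare the original loop bound with a one-work-item-per-string mapping. `launch`, `try_this7` and `copy_one_string` are hypothetical stand-ins, not pyopencl calls.

```python
# Plain-Python emulation of an OpenCL NDRange launch (no GPU needed).
# `launch`, `try_this7` and `copy_one_string` are hypothetical stand-ins.

def launch(kernel, global_size, *args):
    """Call `kernel` once per global id, as get_global_id(0) would see it."""
    for gid in range(global_size):
        kernel(gid, *args)

def try_this7(gid, src, dst):
    # Same bound as the question's kernel: for (i = 0; i < idx; i++)
    for i in range(gid):
        dst[i] = src[i]

def copy_one_string(gid, src, dst):
    # One work item per string: the global id is the string index.
    if gid < len(src):  # guard in case the global size is rounded up
        dst[gid] = src[gid]

src = ["That is VERY cool!", "It is a list!", "A big one!", "WTF is THAT?"]

# None stands in for never-written (undefined) device memory.
out_original = [None] * len(src)
launch(try_this7, 1, src, out_original)      # global size (1,), as in the question
print(out_original)                          # no slot was written

out_wider = [None] * len(src)
launch(try_this7, len(src), src, out_wider)  # more work items, same bound
print(out_wider)                             # id N-1 copies only slots 0..N-2

out_fixed = [None] * len(src)
launch(copy_one_string, len(src), src, out_fixed)
print(out_fixed)
```

Launching more work items alone does not repair the original bound, because the highest id never reaches the last string. Using `idx` directly as the string index does repair it, with each work item copying its own 40-byte slot (as `try_this7_now` already does). The global size then becomes `len(INlist)` instead of 1.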
If that is indeed your goal, here is a version using only numpy and, of course, pyopencl:
import numpy as np
import pyopencl as cl
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
#The kernel uses one work item per transferred char
prog_str = """kernel void foo(global char *in, global char *out, int size){
    int idx = get_global_id(0);
    if (idx < size){
        out[idx] = in[idx];
    }
}"""
prog = cl.Program(ctx, prog_str).build()
#Note that the type of the array of strings is '|S40' because the third
#(longest) element has length 40; the shape is 3 and nbytes is 120 (3 * 40)
original_str = np.array(('this is an average string',
'and another one',
"let's push even more with a third string"))
mf = cl.mem_flags
in_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=original_str)
#total number of bytes to copy (3 strings * 40 bytes = 120)
str_size = original_str.nbytes
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=str_size)
copied_str = np.zeros_like(original_str)
#here we launch the kernel with str_size work items, in this case 120;
#this means that some of the work items won't process any meaningful char
#(not all strings have a length of 40), but it's no biggie
prog.foo(queue, (str_size,), None, in_buf, out_buf, np.int32(str_size))
cl.enqueue_copy(queue, copied_str, out_buf).wait()
print copied_str
Displayed result:
['this is an average string' 'and another one'
"let's push even more with a third string"]
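The answer's kernel maps one work item to one byte of the flattened `|S40` array. Its `if (idx < size)` guard makes it safe to launch more work items than there are bytes, for example when the global size is rounded up to a multiple of a work-group size. The plain-Python emulation below illustrates this under those assumptions. The work-group size of 64 and the `round_up` helper are illustrative choices that the answer does not specify. When the bytes come back to the host they still carry their null padding; numpy hides it when displaying an `S` array, whereas raw bytes have to be stripped explicitly.

```python
WIDTH = 40  # bytes per string slot, like the '|S40' array in the answer

def foo(idx, src, dst, size):
    # Mirrors the kernel body: if (idx < size) { out[idx] = in[idx]; }
    if idx < size:
        dst[idx] = src[idx]

def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n (hypothetical helper)."""
    return ((n + multiple - 1) // multiple) * multiple

strings = [b"this is an average string",
           b"and another one",
           b"let's push even more with a third string"]
# Flatten into fixed-width, null-padded slots, as numpy stores dtype 'S40'.
src = b"".join(s.ljust(WIDTH, b"\x00") for s in strings)
size = len(src)                    # 120 here: 3 slots * 40 bytes

dst = bytearray(size)              # stands in for out_buf
global_size = round_up(size, 64)   # 128 work items; the last 8 do nothing
for idx in range(global_size):     # emulate one work item per byte
    foo(idx, src, dst, size)       # without the guard, dst[120] would fail

copied = [bytes(dst[i:i + WIDTH]).rstrip(b"\x00").decode("ascii")
          for i in range(0, size, WIDTH)]
print(copied)
```

Removing the `if idx < size` check from `foo` makes the emulation raise `IndexError` at `idx == 120`. A real device has no such check, and the same mistake there is an out-of-bounds write.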