Passing struct with pointer members to OpenCL kernel using PyOpenCL


Question

Let's suppose I have a kernel to compute the element-wise sum of two arrays. Rather than passing a, b, and c as three parameters, I make them structure members as follows:

typedef struct
{
    __global uint *a;
    __global uint *b;
    __global uint *c;
} SumParameters;

__kernel void compute_sum(__global SumParameters *params)
{
    uint id = get_global_id(0);
    params->c[id] = params->a[id] + params->b[id];
    return;
}

The PyOpenCL documentation has information on structures [1], and others have addressed this question too [2] [3] [4]. But none of the OpenCL struct examples I've been able to find have pointers as members.

Specifically, I'm worried about whether host/device address spaces match, and whether host/device pointer sizes match. Does anyone know the answer?

[1] http://documen.tician.de/pyopencl/howto.html#how-to-use-struct-types-with-pyopencl

[2] Struct alignment with PyOpenCL

[3] http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/

[4] http://acooke.org/cute/Somesimple0.html

Recommended answer

No, there is no guarantee that address spaces match. For the basic types (float, int, …) there are alignment requirements (section 6.1.5 of the standard), and you have to use the cl_type names of the OpenCL implementation. That applies when programming in C; I'd say pyopencl does the job under the hood.
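To make the alignment point concrete, here is a minimal NumPy-only sketch. The struct in it (a `uchar` followed by a `uint`) is hypothetical and not taken from the question. NumPy's `align=True` only mimics typical C compiler padding; it does not query an OpenCL device. For matching a real device, PyOpenCL offers `pyopencl.tools.match_dtype_to_c_struct`, which is not used here.

```python
import numpy as np

# Hypothetical struct, for illustration only:
#   typedef struct { uchar flag; uint value; } Example;
fields = [("flag", np.uint8), ("value", np.uint32)]

packed = np.dtype(fields)               # no padding between fields
aligned = np.dtype(fields, align=True)  # C-like padding inserted

# dtype.fields maps each name to a (dtype, byte offset) pair
packed_offsets = [packed.fields[name][1] for name in packed.names]
aligned_offsets = [aligned.fields[name][1] for name in aligned.names]

print("packed :", packed.itemsize, packed_offsets)
print("aligned:", aligned.itemsize, aligned_offsets)
```

If the host array is built with the packed layout while the device compiler pads the struct, every member after the first padding gap is read from the wrong offset, which is why the layout has to be matched explicitly.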

For pointers it's even simpler because of this mismatch. The very beginning of section 6.9 of the standard v1.2 (section 6.8 in version 1.1) states:

Arguments to kernel functions declared in a program that are pointers must be declared with the __global, __constant or __local qualifier.

And then point p:

Arguments to kernel functions that are declared to be a struct or union do not allow OpenCL objects to be passed as elements of the struct or union.

Note also point d:

Variable length arrays and structures with flexible (or unsized) arrays are not supported.

So there is no way to make your kernel run as described in your question, which is why you haven't been able to find any examples of OpenCL structs with pointers as members.
I can still propose a workaround that takes advantage of the fact that the kernel is compiled just in time. It still requires that you pack your data properly, that you pay attention to alignment, and that the size doesn't change during the execution of the program. Honestly, I would go for a kernel taking 3 buffers as arguments, but anyhow, here it is.

The idea is to use the preprocessor option -D, as in the following Python example:

Kernel:

typedef struct {
    uint a[SIZE];
    uint b[SIZE];
    uint c[SIZE];
} SumParameters;

kernel void foo(global SumParameters *params){
    int idx = get_global_id(0);
    params->c[idx] = params->a[idx] + params->b[idx];
}
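The struct above is three `SIZE`-element `uint` arrays laid out back to back. As a rough sketch of the "pack your data properly" requirement, the same layout can be described on the host with a NumPy structured dtype. The helper name `make_sum_parameters` is hypothetical, and the final addition only emulates the kernel on the host for checking; no OpenCL call is made here.

```python
import numpy as np

def make_sum_parameters(a, b):
    """Pack a and b (plus a zeroed c) into one record laid out like SumParameters."""
    a = np.asarray(a, dtype=np.uint32)
    b = np.asarray(b, dtype=np.uint32)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of equal length")
    size = a.shape[0]
    # Three uint32[size] fields back to back: the same layout as the kernel
    # struct, with no padding because all members share 4-byte alignment.
    dtype = np.dtype([("a", np.uint32, (size,)),
                      ("b", np.uint32, (size,)),
                      ("c", np.uint32, (size,))])
    params = np.zeros(1, dtype=dtype)
    params["a"][0] = a
    params["b"][0] = b
    # SIZE must be fixed when the kernel is built, hence the -D option.
    build_options = "-D SIZE={0}".format(size)
    return params, build_options

params, build_options = make_sum_parameters([1, 2, 3], [4, 5, 6])
# Host-side emulation of what the kernel computes, for checking only.
expected_c = params["a"][0] + params["b"][0]
print(build_options, params.nbytes, expected_c)
```

Because the array length is baked into both the dtype and the build options, changing it means rebuilding the program, which is the "size doesn't change during execution" restriction mentioned above.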

Host code:

import numpy as np
import pyopencl as cl

def bar():
    mf = cl.mem_flags
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    # Read the kernel source shown above
    with open('kernels.cl', 'r') as prog_f:
        source = prog_f.read()
    # One record per struct member: a = (1, 2, 3), b = (4, 5, 6), c = zeros
    ary = np.array([(1, 2, 3), (4, 5, 6), (0, 0, 0)], dtype='uint32, uint32, uint32')
    cl_ary = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=ary)
    # The size should be computed here; it is hardcoded for the example
    size = 3
    # The important part: pass SIZE to the kernel with the -D option
    prog = cl.Program(ctx, source).build(options="-D SIZE={0}".format(size))
    prog.foo(queue, (size,), None, cl_ary)
    result = np.zeros_like(ary)
    cl.enqueue_copy(queue, result, cl_ary).wait()
    print(result)

Result:

[(1L, 2L, 3L) (4L, 5L, 6L) (5L, 7L, 9L)]
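For comparison, the alternative the answer says it would prefer, a kernel taking three buffers as arguments, could look roughly like the sketch below. This is an illustration rather than code from the original answer: the Python function name `compute_sum` and the variable names are assumptions, and running it requires PyOpenCL plus a working OpenCL driver.

```python
import numpy as np

# Each array is its own kernel argument, so no pointer is ever stored in a struct.
KERNEL_SRC = """
__kernel void compute_sum(__global const uint *a,
                          __global const uint *b,
                          __global uint *c)
{
    size_t id = get_global_id(0);
    c[id] = a[id] + b[id];
}
"""

def compute_sum(a, b):
    """Element-wise sum on an OpenCL device (needs PyOpenCL and a driver)."""
    import pyopencl as cl  # imported here so the module loads without PyOpenCL

    a = np.ascontiguousarray(a, dtype=np.uint32)
    b = np.ascontiguousarray(b, dtype=np.uint32)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of equal length")
    mf = cl.mem_flags
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=a.nbytes)
    prog = cl.Program(ctx, KERNEL_SRC).build()
    # One work-item per element
    prog.compute_sum(queue, a.shape, None, a_buf, b_buf, c_buf)
    c = np.empty_like(a)
    cl.enqueue_copy(queue, c, c_buf)  # device-to-host copy, blocking by default
    return c
```

With this layout the array length can vary from call to call without rebuilding the program, and calling `compute_sum([1, 2, 3], [4, 5, 6])` should give the same sums as the struct version.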
