How to pass C++ vectors to an OpenCL kernel and access them?


Question

I'm new to C, C++ and OpenCL and doing my best to learn them at the moment. Here's a preexisting C++ function that I'm trying to figure out how to port to OpenCL using either the C or C++ bindings.

#include <vector>

using namespace std;

class Test {

private:

    double a;
    vector<double> b;
    vector<long> c;
    vector<vector<double> > d;

public:

    double foo(long x, double y) {
        // mathematical operations
        // using x, y, a, b, c, d
        // and also b.size()
        // to calculate return value
        return 0.0;
    }

};

Broadly, my question is how to pass all the class members that this function accesses into the binding and the kernel. I understand how to pass scalar values, but I'm not sure about the vector values. Is there perhaps a way to pass in pointers to each of the above members, or to memory-map them so that OpenCL's view of them stays in sync with host memory? Broken down, my questions are as follows.

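  1. Given that members b and c are variable-sized, how do I pass them to the binding and the kernel?
  2. Given that member d is two-dimensional, how do I pass it?
  3. How do I access these members from within the kernel, and what types do I declare them as in the kernel arguments? Will simply using array index notation, i.e. b[0], work for access?
  4. How would I call an operation equivalent to b.size() within the kernel function, or do I not, and instead pass the size in from the binding as an extra argument to the kernel? What happens if it changes?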

I would really appreciate example source code in answers, for either the C or C++ bindings and for the kernel.

Many thanks.

Recommended answer

  1. You have to allocate an OpenCL buffer and copy your CPU data into it. An OpenCL buffer has a fixed size, so you either have to recreate it if your data size changes or you make it "big enough" and use only a subsection of it if less memory is needed. For example, to create a buffer for b and at the same time copy all of its data to the device:

cl_int errorcode; // Receives CL_SUCCESS or an error code
cl_mem b_buffer = clCreateBuffer(
    context, // OpenCL context
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, // Only read access from kernel,
                                             // copy data from host
    sizeof(cl_double) * b.size(), // Buffer size in bytes (must be nonzero)
    b.data(), // Pointer to data to copy
    &errorcode); // Return code

It is also possible to directly map host memory (CL_MEM_USE_HOST_PTR), but this imposes some restrictions on the alignment and the access to the host memory after creating the buffer. Basically, the host memory can contain garbage when you are not currently mapping it.

  2. It depends. Are the sizes of the vectors in the second dimension consistently equal? Then just flatten them when uploading them to the OpenCL device. Otherwise it gets more complicated.

  3. You declare buffer arguments as __global pointers in your kernel. For example, __global double *b would be appropriate for the buffer created in 1. You can simply use array notation in the kernel to access the individual elements in the buffer.

  4. You cannot query the buffer size from within the kernel, so you have to pass it manually. This can also happen implicitly, e.g. if the number of work items matches the size of b.
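
The host-side preparation that points 2 to 4 call for can be sketched in plain C++, before any OpenCL call is made. This is only a sketch under stated assumptions: the names Flat2D, Ragged2D, flatten, flatten_ragged and to_int64 are hypothetical helpers, not part of OpenCL or of the answer above. The offsets layout for rows of unequal length is one possible way to handle the "more complicated" case, not necessarily what the answer had in mind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: a rectangular 2D array stored row-major,
// so element (i, j) lives at data[i * cols + j].
struct Flat2D {
    std::vector<double> data;
    std::size_t rows = 0;
    std::size_t cols = 0;
};

// Flattens d when every row has the same length; throws otherwise.
Flat2D flatten(const std::vector<std::vector<double>>& d) {
    Flat2D out;
    out.rows = d.size();
    out.cols = d.empty() ? 0 : d.front().size();
    out.data.reserve(out.rows * out.cols);
    for (const auto& row : d) {
        if (row.size() != out.cols) {
            throw std::invalid_argument("rows differ in length");
        }
        out.data.insert(out.data.end(), row.begin(), row.end());
    }
    assert(out.data.size() == out.rows * out.cols);
    return out;
}

// Hypothetical helper type for rows of unequal length: row i occupies
// data[offsets[i]] up to, but not including, data[offsets[i + 1]].
struct Ragged2D {
    std::vector<double> data;
    std::vector<std::uint64_t> offsets;  // rows + 1 entries
};

// Concatenates all rows and records where each row starts.
Ragged2D flatten_ragged(const std::vector<std::vector<double>>& d) {
    Ragged2D out;
    out.offsets.reserve(d.size() + 1);
    out.offsets.push_back(0);
    for (const auto& row : d) {
        out.data.insert(out.data.end(), row.begin(), row.end());
        out.offsets.push_back(out.data.size());
    }
    return out;
}

// Copies host long values into 64-bit integers. OpenCL C long is always
// 64 bits, while the host long can be 32 bits on some platforms.
std::vector<std::int64_t> to_int64(const std::vector<long>& c) {
    return std::vector<std::int64_t>(c.begin(), c.end());
}
```

Each resulting contiguous vector can then be uploaded with its own clCreateBuffer call in the same way as b, and rows, cols, or the offsets array would be passed to the kernel as extra arguments, since the kernel cannot query any of these sizes itself.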

A kernel which can access all of the data for the computation could look like this:

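// double requires fp64 support on the device; older OpenCL versions need this pragma
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

// Note: long in OpenCL C is always 64 bits wide
__kernel void foo(long x, double y, double a, __global double* b, int b_size,
                  __global long* c, __global double* d,
                  __global double* result) {
  // Here be dragons
  *result = 0.0;
}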

Note that you also have to allocate memory for the result. It might be necessary to pass additional size arguments should you need them. You would call the kernel as follows:

// Create/fill buffers
// ...

// Set arguments
cl_long x_arg = x; // Host long may be narrower than the 64-bit OpenCL long
clSetKernelArg(kernel, 0, sizeof(cl_long), &x_arg);
clSetKernelArg(kernel, 1, sizeof(cl_double), &y);
clSetKernelArg(kernel, 2, sizeof(cl_double), &a);
clSetKernelArg(kernel, 3, sizeof(cl_mem), &b_buffer);
cl_int b_size = static_cast<cl_int>(b.size()); // Kernel expects a 32-bit int
clSetKernelArg(kernel, 4, sizeof(cl_int), &b_size);
clSetKernelArg(kernel, 5, sizeof(cl_mem), &c_buffer);
clSetKernelArg(kernel, 6, sizeof(cl_mem), &d_buffer);
clSetKernelArg(kernel, 7, sizeof(cl_mem), &result_buffer);
// Enqueue kernel
clEnqueueNDRangeKernel(queue, kernel, /* ... depends on your domain */);

// Read back result
cl_double result;
clEnqueueReadBuffer(queue, result_buffer, CL_TRUE, 0, sizeof(cl_double), &result,
                    0, NULL, NULL);
