How to launch a custom OpenCL kernel in OpenCV (3.0.0) OCL?

Question

I'm probably misusing OpenCV by using it as a wrapper around the official OpenCL C++ bindings so that I can launch my own kernels.

However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that suggest I can launch my own (even non-image-based) kernels with OpenCV. I am having trouble finding documentation for these classes, let alone examples. So here is my attempt so far:

#include <fstream>
#include <iostream>

#include "opencv2/opencv.hpp"
#include "opencv2/core/ocl.hpp"

#define ARRAY_SIZE 128

using namespace std;
using namespace cv;

int main()
{
    std::ifstream file("kernels.cl");
    std::string kcode(std::istreambuf_iterator<char>(file),
        (std::istreambuf_iterator<char>()));

    cv::ocl::ProgramSource * programSource;
    programSource = new cv::ocl::ProgramSource(kcode.c_str());

    cv::String errorMessage;
    cv::ocl::Program * program;
    program = new cv::ocl::Program(*programSource, NULL, errorMessage);

    cv::ocl::Kernel * kernel;
    kernel = new cv::ocl::Kernel("simple_add", *program);
    /* I'm stuck here at the args. */

    size_t globalSize[2] = { ARRAY_SIZE, 1 };
    size_t localSize[2] = { ARRAY_SIZE, 1 };    
    kernel->run(ARRAY_SIZE, globalSize, localSize, true);

    return 0;
}

Note that I haven't set up my host variables yet. I'm stuck at kernel->args(...). There are 15 overloads, and none of them indicates where or in what order I should specify the following for each argument:

  1. The parameter index, so that I can manually match each argument to the order given in the kernel.
  2. The host variable itself.
  3. The host variable's array size. I typically write something like sizeof(int) * ARRAY_SIZE, although in plain OpenCL I used to specify that in the clEnqueueWriteBuffer call.
  4. The device buffer memory access, for example CL_MEM_READ_ONLY.

It doesn't look like I call enqueueWriteBuffer(...), enqueueNDRangeKernel(...), or enqueueReadBuffer(...), because (I guess) kernel->run() does all of that for me under the hood. I assume that kernel->run() will write the new values to my output parameter.

I didn't specify a command queue, device, or context. I think there is only one command queue, one context, and the default device, all created under the hood and accessible from these classes.

So again, how do I use the args function of the kernel?

Recommended answer

Although I am not 100% sure, I figured out a way to do this. This example shows how to pass data to, and retrieve data from, a custom kernel using cv::UMat, basic types (e.g. int/float/uchar), and Image2D.

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

using namespace std;

int main()
{
    if (!cv::ocl::haveOpenCL())
    {
        cout << "OpenCL is not available..." << endl;
        return 1;
    }
    cv::ocl::Context context;
    if (!context.create(cv::ocl::Device::TYPE_GPU))
    {
        cout << "Failed creating the context..." << endl;
        return 1;
    }

    // In OpenCV 3.0.0 beta, only a single device is detected.
    cout << context.ndevices() << " GPU devices are detected." << endl;
    for (size_t i = 0; i < context.ndevices(); i++)
    {
        cv::ocl::Device device = context.device(i);
        cout << "name                 : " << device.name() << endl;
        cout << "available            : " << device.available() << endl;
        cout << "imageSupport         : " << device.imageSupport() << endl;
        cout << "OpenCL_C_Version     : " << device.OpenCL_C_Version() << endl;
        cout << endl;
    }

    // Select the first device (note: this only constructs a temporary Device object)
    cv::ocl::Device(context.device(0));

    // Transfer Mat data to the device
    cv::Mat mat_src = cv::imread("Lena.png", cv::IMREAD_GRAYSCALE);
    if (mat_src.empty()) return 1;
    mat_src.convertTo(mat_src, CV_32F, 1.0 / 255);
    cv::UMat umat_src = mat_src.getUMat(cv::ACCESS_READ, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
    // The third constructor argument is the usage flag; passing an ACCESS_* flag
    // here would select the Scalar (fill value) overload instead.
    cv::UMat umat_dst(mat_src.size(), CV_32F, cv::USAGE_ALLOCATE_DEVICE_MEMORY);

    // Read the kernel source from file
    std::ifstream ifs("shift.cl");
    if (ifs.fail()) return 1;
    std::string kernelSource((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
    cv::ocl::ProgramSource programSource(kernelSource);

    // Compile the kernel code
    cv::String errmsg;
    cv::String buildopt = cv::format("-D dstT=%s", cv::ocl::typeToStr(umat_dst.depth())); // "-D dstT=float"
    cv::ocl::Program program = context.getProg(programSource, buildopt, errmsg);

    // Pass the arguments in the same order as the kernel's parameter list
    cv::ocl::Image2D image(umat_src);
    float shift_x = 100.5;
    float shift_y = -50.0;
    cv::ocl::Kernel kernel("shift", program);
    kernel.args(image, shift_x, shift_y, cv::ocl::KernelArg::ReadWrite(umat_dst));

    // One work-item per destination pixel; the explicit casts avoid a narrowing error
    size_t globalThreads[3] = { static_cast<size_t>(mat_src.cols), static_cast<size_t>(mat_src.rows), 1 };
    //size_t localThreads[3] = { 16, 16, 1 };
    bool success = kernel.run(3, globalThreads, NULL, true);
    if (!success){
        cout << "Failed running the kernel..." << endl;
        return 1;
    }

    // Download the dst data from the device (?)
    cv::Mat mat_dst = umat_dst.getMat(cv::ACCESS_READ);

    cv::imshow("src", mat_src);
    cv::imshow("dst", mat_dst);
    cv::waitKey();
    return 0;
}

Below is the "shift.cl" file.

__constant sampler_t samplerLN = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;
__kernel void shift(
   __read_only image2d_t src,  // image arguments take an access qualifier, not __global
   float shift_x,
   float shift_y,
   __global uchar* dst,
   int dst_step, int dst_offset, int dst_rows, int dst_cols)
{
   int x = get_global_id(0);
   int y = get_global_id(1);
   // Skip work-items that fall outside the destination matrix
   if (x >= dst_cols || y >= dst_rows) return;
   // Byte index of element (x, y): y * dst_step + x * sizeof(dstT) + dst_offset
   int dst_index = mad24(y, dst_step, mad24(x, (int)sizeof(dstT), dst_offset));
   __global dstT *dstf = (__global dstT *)(dst + dst_index);
   // Sample at the pixel center (+0.5), displaced by the requested shift
   float2 coord = (float2)((float)x+0.5f+shift_x, (float)y+0.5f+shift_y);
   dstf[0] = (dstT)read_imagef(src, samplerLN, coord).x;
}

The key point is how a UMat is passed. With KernelArg::ReadOnly(umat), the kernel receives 5 parameters (*data_ptr, int step, int offset, int rows, int cols); with KernelArg::ReadOnlyNoSize(umat), it receives 3 (*data_ptr, int step, int offset); and with KernelArg::PtrReadOnly(umat), it receives only 1 (*data_ptr). The same rule applies to the WriteOnly and ReadWrite variants.

The step and offset are required when accessing the data array, because a UMat may not be a dense matrix: rows can be padded for memory-address alignment.
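To make the role of step and offset concrete, here is a minimal host-side sketch in plain C++ (no OpenCV) of the same address arithmetic that the kernel performs with mad24. The names byteIndex and readFloat are hypothetical helpers introduced only for illustration, and the layout (a byte offset followed by rows of step bytes each) is an assumption that mirrors the dst_step and dst_offset kernel parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Byte position of element (x, y) in a row-padded buffer (hypothetical helper).
//   step     : bytes per row, including any alignment padding
//   offset   : bytes that precede the first element
//   elemSize : size of one element in bytes
inline std::size_t byteIndex(int x, int y, std::size_t step, std::size_t offset, std::size_t elemSize)
{
    return offset + static_cast<std::size_t>(y) * step + static_cast<std::size_t>(x) * elemSize;
}

// Read one float from a padded byte buffer (hypothetical helper).
inline float readFloat(const std::vector<unsigned char>& buf, int x, int y,
                       std::size_t step, std::size_t offset)
{
    float value;
    std::memcpy(&value, buf.data() + byteIndex(x, y, step, offset, sizeof(float)), sizeof(float));
    return value;
}
```

For example, with 3 floats per row padded to a 16-byte step and a 4-byte offset, element (2, 1) starts at byte 4 + 16 + 8 = 28, not at the byte 20 that a dense layout would give.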

cv::ocl::Image2D can be constructed from a UMat instance and passed directly to kernel.args(). With image2d_t and sampler_t, we can use the GPU's hardware texture units for linear-interpolation sampling (with real-valued pixel coordinates).
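As a rough illustration of what the sampler does, the following plain C++ sketch emulates on the CPU a lookup with unnormalized coordinates, clamp-to-edge addressing, and linear filtering. The function names sampleLinear and clampToEdge are hypothetical, and the densely packed row-major input is an assumption. Hardware texture units may compute the interpolation weights with reduced precision, so results on a real device can differ slightly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Clamp an integer pixel coordinate to [0, n - 1], as clamp-to-edge addressing does.
inline int clampToEdge(int i, int n) { return std::min(std::max(i, 0), n - 1); }

// CPU emulation of bilinear sampling at unnormalized coordinates (u, v).
// img is a densely packed, row-major, single-channel image of size w x h.
// Pixel centers sit at (x + 0.5, y + 0.5), which is why the kernel adds 0.5f.
inline float sampleLinear(const std::vector<float>& img, int w, int h, float u, float v)
{
    const float fu = u - 0.5f;
    const float fv = v - 0.5f;
    const int i0 = static_cast<int>(std::floor(fu));
    const int j0 = static_cast<int>(std::floor(fv));
    const float a = fu - std::floor(fu);  // horizontal weight of the right neighbor
    const float b = fv - std::floor(fv);  // vertical weight of the lower neighbor

    const int x0 = clampToEdge(i0, w), x1 = clampToEdge(i0 + 1, w);
    const int y0 = clampToEdge(j0, h), y1 = clampToEdge(j0 + 1, h);

    const float top    = (1.0f - a) * img[y0 * w + x0] + a * img[y0 * w + x1];
    const float bottom = (1.0f - a) * img[y1 * w + x0] + a * img[y1 * w + x1];
    return (1.0f - b) * top + b * bottom;
}
```

Sampling exactly at a pixel center returns that pixel's value, while a fractional shift such as the 100.5 used for shift_x lands halfway between two centers and blends the two neighboring pixels equally.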

Note that the "-D xxx=yyy" build option defines a preprocessor macro, so every occurrence of xxx in the kernel code is textually replaced with yyy.
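The effect of such a definition can be approximated on the host with a small plain C++ sketch. Both makeDefine and expandMacro are hypothetical helpers, not OpenCV or OpenCL functions, and the regex-based whole-word replacement only imitates what the OpenCL preprocessor does with real tokens. It assumes the macro name is a plain identifier.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Build a single "-D name=value" build option (hypothetical helper).
inline std::string makeDefine(const std::string& name, const std::string& value)
{
    return "-D " + name + "=" + value;
}

// Approximate the preprocessor: replace whole-word occurrences of `name`
// in the kernel source with `value` (hypothetical helper).
inline std::string expandMacro(const std::string& source, const std::string& name,
                               const std::string& value)
{
    const std::regex wholeWord("\\b" + name + "\\b");
    return std::regex_replace(source, wholeWord, value);
}
```

Applied to the line `__global dstT *dstf = (__global dstT *)(dst + dst_index);` with dstT defined as float, the sketch yields `__global float *dstf = (__global float *)(dst + dst_index);`, leaving dst and dst_index untouched because only whole words match.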

You can find more code in my post: http://qiita.com/tackson5/items/8dac6b083071d31baf00
