How to compile an OpenCL project with kernels

Problem description

I am a complete beginner with OpenCL. I searched around the internet and found some "hello world" demos for OpenCL projects. Usually, in this sort of minimal project, there is a *.cl file that contains some OpenCL kernels and a *.c file that contains the main function. My question is how to compile this kind of project from the command line. I know I should use something like the -lOpenCL flag on Linux and -framework OpenCL on Mac, but I have no idea how to link the *.cl kernel to my main source file. Thank you for any comments or useful links.

Recommended answer

In OpenCL, the .cl files that contain device kernel code are usually compiled and built at run time. This means that somewhere in your host OpenCL program, you have to compile and build your device program before you can use it. This design enables maximum portability, because the kernel is built for whichever device is present when the program runs.

Let's consider an example I collected from two books. Below is a very simple OpenCL kernel that adds corresponding elements of two global arrays and stores the results in a third global array. I save this code in a file named vector_add_kernel.cl.

kernel void vecadd( global int* A, global int* B, global int* C ) {
    const int idx = get_global_id(0);
    C[idx] = A[idx] + B[idx];
}
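
To make the kernel's per-work-item behavior concrete, here is a host-only sketch in plain C++ that computes the same result sequentially. It is only an illustration, not part of the original answer: the function name vecadd_reference is hypothetical, and each loop iteration stands in for one work-item, with idx playing the role of get_global_id(0).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only reference for the vecadd kernel.
// One loop iteration corresponds to one work-item; idx stands in for
// the value that get_global_id(0) would return on the device.
std::vector<int> vecadd_reference(const std::vector<int>& A,
                                  const std::vector<int>& B) {
    // Only process indices that exist in both inputs.
    const std::size_t n = A.size() < B.size() ? A.size() : B.size();
    std::vector<int> C(n);
    for (std::size_t idx = 0; idx < n; ++idx) {
        C[idx] = A[idx] + B[idx];
    }
    return C;
}
```

A function like this can also serve as the expected value when checking what the device wrote back.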

Below is the host code, written in C++ using the OpenCL C++ API. I save it in a file named ocl_vector_addition.cpp in the same directory as my .cl file.

#include <iostream>
#include <fstream>
#include <iterator>   // std::istreambuf_iterator
#include <string>
#include <memory>
#include <utility>    // std::make_pair
#include <vector>
#include <stdlib.h>

#define __CL_ENABLE_EXCEPTIONS
#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.hpp>
#else
#include <CL/cl.hpp>
#endif

int main( int argc, char** argv ) {

    const int N_ELEMENTS=1024*1024;
    unsigned int platform_id=0, device_id=0;

    try{
        std::unique_ptr<int[]> A(new int[N_ELEMENTS]); // Or you can use simple dynamic arrays like: int* A = new int[N_ELEMENTS];
        std::unique_ptr<int[]> B(new int[N_ELEMENTS]);
        std::unique_ptr<int[]> C(new int[N_ELEMENTS]);

        for( int i = 0; i < N_ELEMENTS; ++i ) {
            A[i] = i;
            B[i] = i;
        }

        // Query for platforms
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);

        // Get a list of devices on this platform
        std::vector<cl::Device> devices;
        platforms[platform_id].getDevices(CL_DEVICE_TYPE_GPU|CL_DEVICE_TYPE_CPU, &devices); // Select the platform.

        // Create a context
        cl::Context context(devices);

        // Create a command queue
        cl::CommandQueue queue = cl::CommandQueue( context, devices[device_id] );   // Select the device.

        // Create the memory buffers
        cl::Buffer bufferA=cl::Buffer(context, CL_MEM_READ_ONLY, N_ELEMENTS * sizeof(int));
        cl::Buffer bufferB=cl::Buffer(context, CL_MEM_READ_ONLY, N_ELEMENTS * sizeof(int));
        cl::Buffer bufferC=cl::Buffer(context, CL_MEM_WRITE_ONLY, N_ELEMENTS * sizeof(int));

        // Copy the input data to the input buffers using the command queue.
        queue.enqueueWriteBuffer( bufferA, CL_FALSE, 0, N_ELEMENTS * sizeof(int), A.get() );
        queue.enqueueWriteBuffer( bufferB, CL_FALSE, 0, N_ELEMENTS * sizeof(int), B.get() );

        // Read the program source
        std::ifstream sourceFile("vector_add_kernel.cl");
        std::string sourceCode( std::istreambuf_iterator<char>(sourceFile), (std::istreambuf_iterator<char>()));
        cl::Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()));

        // Make program from the source code
        cl::Program program=cl::Program(context, source);

        // Build the program for the devices
        program.build(devices);

        // Make kernel
        cl::Kernel vecadd_kernel(program, "vecadd");

        // Set the kernel arguments
        vecadd_kernel.setArg( 0, bufferA );
        vecadd_kernel.setArg( 1, bufferB );
        vecadd_kernel.setArg( 2, bufferC );

        // Execute the kernel
        cl::NDRange global( N_ELEMENTS );
        cl::NDRange local( 256 );
        queue.enqueueNDRangeKernel( vecadd_kernel, cl::NullRange, global, local );

        // Copy the output data back to the host
        queue.enqueueReadBuffer( bufferC, CL_TRUE, 0, N_ELEMENTS * sizeof(int), C.get() );

        // Verify the result
        bool result=true;
        for (int i=0; i<N_ELEMENTS; i ++)
            if (C[i] !=A[i]+B[i]) {
                result=false;
                break;
            }
        if (result)
            std::cout<< "Success!\n";
        else
            std::cout<< "Failed!\n";

    }
    catch(const cl::Error& err) {   // Catch by const reference to avoid copying the exception.
        std::cout << "Error: " << err.what() << "(" << err.err() << ")" << std::endl;
        return( EXIT_FAILURE );
    }

    std::cout << "Done.\n";
    return( EXIT_SUCCESS );
}

I compile this code on a machine with Ubuntu 12.04 like this:

g++ ocl_vector_addition.cpp -lOpenCL -std=c++11 -o ocl_vector_addition.o

It produces a file named ocl_vector_addition.o (despite the .o suffix, this is a linked executable, not an object file), which shows the success output when I run it. If you look at the compilation command, you will see that we have not passed anything about our .cl file. We only used the -lOpenCL flag to link our program against the OpenCL library. Also, don't get distracted by the -std=c++11 flag: because I used std::unique_ptr in the host code, I had to pass it for the compile to succeed.
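
Because the host compiler only ever sees the kernel as text, one possible variation (not something the original answer does) is to embed that text in the host source with a C++11 raw string literal, so that no .cl file has to be found at run time. The sketch below assumes the same kernel as vector_add_kernel.cl; the names kVecAddSource and embedded_kernel_source are hypothetical.

```cpp
#include <string>

// Hypothetical constant holding the same text as vector_add_kernel.cl.
// The raw string literal keeps the kernel readable and avoids escaping.
static const std::string kVecAddSource = R"CLC(
kernel void vecadd( global int* A, global int* B, global int* C ) {
    const int idx = get_global_id(0);
    C[idx] = A[idx] + B[idx];
}
)CLC";

// Hypothetical accessor; the returned string would take the place of
// the string read from the file in the host code.
const std::string& embedded_kernel_source() {
    return kVecAddSource;
}
```

The kernel is still compiled at run time in this variation; only the way the source text reaches the host program changes. The trade-off is that editing the kernel now requires recompiling the host program.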

So where is this .cl file being used? If you look at the host code, you will find four steps, which I repeat below with numbers:

        //1. Read the program source
        std::ifstream sourceFile("vector_add_kernel.cl");
        std::string sourceCode( std::istreambuf_iterator<char>(sourceFile), (std::istreambuf_iterator<char>()));
        cl::Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()));

        //2. Make program from the source code
        cl::Program program=cl::Program(context, source);

        //3. Build the program for the devices
        program.build(devices);

        //4. Make kernel
        cl::Kernel vecadd_kernel(program, "vecadd");

In the 1st step, we read the contents of the file that holds our device code into a std::string named sourceCode, then make a pair of the string and its length and store it in source, which has the type cl::Program::Sources. In the 2nd step, we create a cl::Program object named program for the context and load the source code into it. The 3rd step is where the OpenCL code gets compiled (and linked) for the devices. Once the device code is built, the 4th step creates a cl::Kernel object named vecadd_kernel that refers to the kernel function vecadd inside the program. That is pretty much the whole set of steps involved in compiling a .cl file from within a program.
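
One thing to watch in the 1st step: if vector_add_kernel.cl cannot be opened (for example, because the program is run from a different working directory), the std::ifstream fails silently and sourceCode ends up empty, so the problem tends to surface only later as a less obvious OpenCL error. A minimal sketch of a more defensive read, using only the standard library, might look like this. The helper name load_kernel_source is hypothetical and not part of the OpenCL API.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole kernel file into a string and
// fail loudly if the file is missing or empty.
std::string load_kernel_source(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("Cannot open kernel file: " + path);
    }
    // Extra parentheses around the first argument avoid the "most vexing parse".
    std::string source((std::istreambuf_iterator<char>(file)),
                       std::istreambuf_iterator<char>());
    if (source.empty()) {
        throw std::runtime_error("Kernel file is empty: " + path);
    }
    return source;
}
```

The returned string could then be passed to std::make_pair exactly as sourceCode is in the 1st step.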

The program I showed and explained creates the device program from the kernel source code. Another option is to use binaries instead. Using a binary program improves application loading time and allows binary distribution of the program, but it limits portability, since binaries that work fine on one device may not work on another. Creating a program from source code and from binaries is also called online and offline compilation, respectively (more information here). I skip it here since the answer is already too long.
