How to use C++ templates in OpenCL kernels?

Question

I'm a novice in OpenCL.

I have an algorithm that uses templates. It worked well with OpenMP parallelization, but now the amount of data has grown and the only way to process it is to rewrite it to use OpenCL. I could easily use MPI to build it for a cluster, but a Tesla-like GPU is much cheaper than a cluster :)

Is there any way to use C++ templates in an OpenCL kernel?

Is it possible to expand the templates with a C++ compiler or some other tool, and then use the resulting kernel function?

EDIT. The idea of a workaround is to somehow generate C99-compatible code from the templated C++ code.

I found the following about Comeau:

Comeau C++ 4.3.3 is a full and true compiler that performs full syntax checking, full semantic checking, full error checking and all other compiler duties. Input C++ code is translated into internal compiler trees and symbol tables looking nothing like C++ or C. As well, it generates an internal proprietary intermediate form. But instead of using a proprietary back end code generator, Comeau C++ 4.3.3 generates C code as its output. Besides the technical advantages of C++, the C generating aspects of products like Comeau C++ 4.3.3 have been touted as a reason for C++'s success since it was able to be brought to a large number of platforms due to the common availability of C compilers.

The C compiler is used merely and only for the sake of obtaining native code generation. This means that Comeau C++ is tailored for use with specific C compilers on each respective platform. Please note that it is a requirement that tailoring must be done by Comeau. Otherwise, the generated C code is meaningless as it is tied to a specific platform (where platform includes at least the CPU, OS, and C compiler) and furthermore, the generated C code is not standalone. Therefore, it cannot be used by itself (note that this is both a technical and legal requirement when using Comeau C++), and this is why there is not normally an option to see the generated C code: it's almost always unhelpful and the compile process, including its generation, should be considered as internal phases of translation.

Recommended answer

There is an old way to emulate templates in pure C. It is based on including a single file several times (without an include guard), with different macro definitions in effect each time. Since OpenCL has a fully functional preprocessor and allows including files, this trick can be used.

Here is a good explanation: http://arnold.uthar.net/index.php?n=Work.TemplatesC

It is still much messier than C++ templates: the code has to be split into several parts, and you have to explicitly instantiate each instance of the template. Also, it seems that you cannot do some useful things, such as implementing factorial as a recursive template.

Let's apply the idea to OpenCL. Suppose that we want to calculate the inverse square root by Newton-Raphson iteration (generally not a good idea). However, the floating-point type and the number of iterations may vary.
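For reference, the update step used in the template function can be derived as follows. This is a short sketch, assuming the iteration is Newton's method applied to f(x) = 1/x^2 - a, whose positive root is 1/sqrt(a); the symbols x_n and f are introduced here for illustration and do not appear in the original answer.

```latex
f(x) = \frac{1}{x^{2}} - a, \qquad f'(x) = -\frac{2}{x^{3}}

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}
        = x_n + \frac{x_n^{3}}{2}\left(\frac{1}{x_n^{2}} - a\right)
        = x_n\left(\frac{3}{2} - \frac{a}{2}\,x_n^{2}\right)
```

The last form matches the loop body `x *= (1.5 - (0.5*a)*x*x)`, where `x` is the current estimate and `a` is the number whose inverse square root is wanted.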

First of all, we need a helper header ("templates.h"):

#ifndef TEMPLATES_H_
#define TEMPLATES_H_

#define CAT(X,Y,Z) X##_##Y##_##Z   //concatenate words
#define TEMPLATE(X,Y,Z) CAT(X,Y,Z)

#endif

Then we write the template function in "NewtonRaphsonRsqrt.cl":

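#include "templates.h"

// No include guard here on purpose: this file is included once per instantiation.
// `real` and `iters` must be defined as macros by the including file.
real TEMPLATE(NewtonRaphsonRsqrt, real, iters) (real x, real a) {
    int i;
    for (i = 0; i < iters; i++) {
        // Cast both constants to `real` so the float version never uses double literals.
        x *= ((real)1.5 - ((real)0.5 * a) * x * x);
    }
    return x;
}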

In your main .cl file, instantiate this template as follows:

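#define real float
#define iters 2
#include "NewtonRaphsonRsqrt.cl"  //defining NewtonRaphsonRsqrt_float_2

// Undefine before redefining: redefining a macro with a different body is not allowed.
#undef real
#undef iters
// Note: double requires fp64 support on the device (cl_khr_fp64 on OpenCL 1.0/1.1).
#define real double
#define iters 3
#include "NewtonRaphsonRsqrt.cl"  //defining NewtonRaphsonRsqrt_double_3

#undef iters
#define iters 4
#include "NewtonRaphsonRsqrt.cl"  //defining NewtonRaphsonRsqrt_double_4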

Use it like this:

double prec = TEMPLATE(NewtonRaphsonRsqrt, double, 4) (1.5, 0.5);
float approx = TEMPLATE(NewtonRaphsonRsqrt, float, 2) (1.5f, 0.5f);
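To try the name-pasting idea quickly on the host, without splitting the code into several files, the following is a minimal sketch in plain C. Note that it swaps the repeated-inclusion trick for a single function-defining macro, which is a different (though related) technique; the macro name DEFINE_NEWTON_RAPHSON_RSQRT is hypothetical and not part of the original answer.

```c
/* Same helper macros as in templates.h. */
#define CAT(X, Y, Z) X##_##Y##_##Z
#define TEMPLATE(X, Y, Z) CAT(X, Y, Z)

/* Hypothetical helper: defines NewtonRaphsonRsqrt_<real>_<iters>.
 * This uses one macro per instantiation instead of re-including a file. */
#define DEFINE_NEWTON_RAPHSON_RSQRT(real, iters)                    \
    real TEMPLATE(NewtonRaphsonRsqrt, real, iters)(real x, real a)  \
    {                                                               \
        int i;                                                      \
        for (i = 0; i < (iters); i++) {                             \
            /* One Newton-Raphson step towards 1/sqrt(a). */        \
            x *= ((real)1.5 - ((real)0.5 * a) * x * x);             \
        }                                                           \
        return x;                                                   \
    }

/* Explicit instantiations, one per (type, iteration count) pair. */
DEFINE_NEWTON_RAPHSON_RSQRT(float, 2)
DEFINE_NEWTON_RAPHSON_RSQRT(double, 4)
```

With the starting guess 1.5 and a = 0.5, both generated functions should move towards 1/sqrt(0.5), which is about 1.41421; the four-iteration double version gets closer than the two-iteration float version. The drawback compared with the include-based approach is that the whole function body has to live inside a multi-line macro, which is harder to read and debug.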
