OpenCL kernel not vectorized
Question
I am trying to build a kernel to do parallel string search. To this end I intend to use a finite state machine. The transition table of the FSM is passed in the kernel argument states. The code:
__kernel void Find ( __constant char *text,
                     const int offset,
                     const int tlenght,
                     __constant char *characters,
                     const int clength,
                     const int maxlength,
                     __constant int *states,
                     const int statesdim){
    private char c;
    private int state;
    private const int id = get_global_id(0);
    if (id<(tlenght-maxlength)) {
        private int cIndex,sd,s,k;
        for (int i=0; i<maxlength; i++) {
            c = text[i+offset];
            cIndex = -1;
            for (int j=0; j<clength; j++) {
                if (characters[j]==c) {
                    cIndex = j;
                }
            }
            if (cIndex==-1) {
                state = 0;
                break;
            } else {
                s = states[state+cIndex*statesdim];
            }
            if (state<=0) break;
        }
    }
}
If I compile this kernel using iocgui, I get the result:
Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU T4400 @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was successfully vectorized
Done.
Build succeeded!
When I change the line where the new state is determined to:
state = states[state+cIndex*statesdim];
the result is:
Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU T4400 @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was not vectorized
Done.
Build succeeded!
Recommended answer
The statement
X = states[state+cIndex*statesdim];
cannot be vectorized, since the index does not necessarily evaluate to accesses to consecutive memory locations across threads.
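To make the point about non-consecutive accesses concrete, here is a minimal sketch in plain C (not OpenCL) that emulates the table index each work item would compute on its first step. It assumes, hypothetically, that work item id reads text[id] and starts in state 0; neither is stated in the question, and all helper names are invented for illustration.

```c
#include <assert.h>

/* Position of c in the character set, or -1 if absent.
   Mirrors the kernel's scan: no early exit, so the last match wins. */
int char_index(const char *characters, int clength, char c)
{
    int cIndex = -1;
    for (int j = 0; j < clength; j++) {
        if (characters[j] == c) {
            cIndex = j;
        }
    }
    return cIndex;
}

/* Index into the transition table, computed as in the kernel. */
int table_index(int state, int cIndex, int statesdim)
{
    assert(statesdim > 0);
    return state + cIndex * statesdim;
}

/* Fills idx[id] with the table index that work item `id` would use on
   its first step. ASSUMPTION: work item `id` reads text[id] and starts
   in state 0. Entries for characters outside the set are set to -1. */
void first_step_indices(const char *text, int n,
                        const char *characters, int clength,
                        int statesdim, int *idx)
{
    for (int id = 0; id < n; id++) {
        int cIndex = char_index(characters, clength, text[id]);
        idx[id] = (cIndex < 0) ? -1 : table_index(0, cIndex, statesdim);
    }
}

/* Returns 1 if neighbouring work items read neighbouring table entries
   (stride +1), the pattern that maps onto a single wide load. */
int is_contiguous(const int *idx, int n)
{
    for (int i = 1; i < n; i++) {
        if (idx[i] != idx[i - 1] + 1) {
            return 0;
        }
    }
    return 1;
}
```

For example, with text "abca", characters "abc" and statesdim 4, the four emulated work items compute the indices 0, 4, 8 and 0. The indices follow the data rather than the work-item id, so adjacent work items do not read adjacent table entries.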
Notice that in your first kernel the destination variable s is never read again and never written back to global memory. The compiler may therefore optimize the code and remove the s = states[state+cIndex*statesdim]; statement entirely. So it looks as if your statement has been vectorized, but it has not.
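The difference between a lookup whose result is discarded and one whose result is observable can be sketched in plain C as follows. This is only an illustration of the idea, not the kernel itself: the out pointer is a hypothetical stand-in for a __global output buffer, which the kernel in the question does not have, and both function names are invented.

```c
/* Mirrors the first kernel: the looked-up value lands in a local that
   is never read afterwards, so an optimizing compiler is free to drop
   the load altogether. */
void lookup_discarded(const int *states, int state, int cIndex,
                      int statesdim)
{
    int s = states[state + cIndex * statesdim];
    (void)s; /* silences the unused-variable warning; no observable effect */
}

/* HYPOTHETICAL variant: the value is stored through `out`, so the load
   has an observable effect and cannot simply be removed. */
void lookup_stored(const int *states, int state, int cIndex,
                   int statesdim, int *out)
{
    *out = states[state + cIndex * statesdim];
}
```

Calling lookup_stored with a small table writes the selected entry to *out, whereas lookup_discarded leaves nothing behind that a caller could inspect. A build report for code like the first function therefore says little about how the lookup itself would be compiled.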