How to accumulate vectors in OpenCL?
Problem description
I have a set of operations running in a loop.
for(int i = 0; i < row; i++)
{
sum += arr1[0] - arr2[0];
sum += arr1[1] - arr2[1];
sum += arr1[2] - arr2[2];
sum += arr1[3] - arr2[3];
arr1 += offset1;
arr2 += offset2;
}
Now I'm trying to vectorize the operations like this:
for(int i = 0; i < row; i++)
{
int4 diff = convert_int4(vload4(0, arr1) - vload4(0, arr2)); // four differences per iteration
arr1 += offset1;
arr2 += offset2;
}
But how do I accumulate the resulting vector into the scalar sum without using a loop?
I'm using OpenCL 2.0.
Recommended answer
I found a solution that comes closest to what I was looking for.
uint sum = 0;
uint4 S = (uint4)(0); // one running sum per lane; must start at zero
for(int i = 0; i < row; i++)
{
S += convert_uint4(vload4(0, arr1) - vload4(0, arr2));
arr1 += offset1;
arr2 += offset2;
}
S.s01 = S.s01 + S.s23; // fold 4 lanes into 2: (s0+s2, s1+s3)
sum = S.s0 + S.s1;     // fold 2 lanes into the scalar result
OpenCL C lets you select parts of a vector with component swizzles such as .s01 and .s23 (available since OpenCL 1.0, not only in 2.0), so the lanes of an accumulator can be summed by repeatedly adding the upper half of the vector onto the lower half, as shown above. Built-in vector types have at most 16 components, so wider data has to be split across several vectors. For example, to sum the absolute differences between two vectors of size 32, we can use two 16-wide accumulators:
uint sum = 0;
uint16 S0 = (uint16)(0), S1 = (uint16)(0); // two 16-lane accumulators cover 32 elements
for(int i = 0; i < row; i++)
{
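// abs_diff returns |x - y| as an unsigned vector without wrap-around,
// which abs(x - y) cannot guarantee for 8- or 16-bit element types.
S0 += convert_uint16(abs_diff(vload16(0, arr1), vload16(0, arr2))); // elements 0..15
S1 += convert_uint16(abs_diff(vload16(1, arr1), vload16(1, arr2))); // elements 16..31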
arr1 += offset1;
arr2 += offset2;
}
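S0 = S0 + S1;                               // combine the two accumulators
S0.s01234567 = S0.s01234567 + S0.s89abcdef; // 16 -> 8 lanes
S0.s0123 = S0.s0123 + S0.s4567;             // 8 -> 4 lanes
S0.s01 = S0.s01 + S0.s23;                   // 4 -> 2 lanes
sum = S0.s0 + S0.s1;                        // 2 lanes -> scalar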