Is it possible to run the sum computation in parallel in OpenCL?
I am new to OpenCL, but I understand the basics of C/C++ and OOP. My question is as follows: is it possible to run a sum computation in parallel? Is it even theoretically possible? Below I describe what I have tried to do:
The task is, for example:
double* values = new double[1000]; //let's pretend it has some random values inside
double sum = 0.0;
for(int i = 0; i < 1000; i++) {
sum += values[i];
}
What I tried to do in an OpenCL kernel (I feel it is wrong, because it probably accesses the same "sum" variable from different threads/tasks at the same time):
__kernel void calculate2dim(__global float* vectors1dim,
__global float output,
const unsigned int count) {
int i = get_global_id(0);
output += vectors1dim[i];
}
This code is wrong. I would greatly appreciate it if anyone could tell me whether it is theoretically possible to run such tasks in parallel and, if it is, how.
The kernel given at the end of this answer should do the job.
Suppose you have N elements and the size of your workgroup is WS = 64. I assume that N is a multiple of 2*WS (this is important: one workgroup calculates the sum of 2*WS elements). Then you need to run the kernel specifying:
globalSizeX = 2*WS*(N/(2*WS));
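As a result, the sum array will hold partial sums of 2*WS elements each (e.g. sum[1] will contain the sum of the elements whose indices run from 2*WS to 4*WS-1). Note that in the kernel below each work-item loads two elements, so globalSizeX is best read as the number of input elements covered: with a local size of WS, the range actually enqueued has N/2 work-items, i.e. N/(2*WS) workgroups.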
If your globalSizeX is 2*WS or less (which means that you have only one workgroup), then you are done: just use sum[0] as the result. If not, you need to repeat the procedure, this time using the sum array as the input array and writing the output to another array (create two arrays and ping-pong between them), and so on until you have only one workgroup.
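The repeated-pass procedure can be traced on the CPU before writing any host code. The following is only a sketch in plain C, not OpenCL host code: `reduce_pass` stands in for one kernel launch (each loop iteration over `g` plays the role of one workgroup), and `reduce_all` ping-pongs between two buffers. The function names are hypothetical, and zero-padding the intermediate results to a multiple of 2*WS is an assumption added here, since the answer only covers the case where N is already such a multiple.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WS 64  /* workgroup size used in the text */

/* One pass: "workgroup" g sums 2*WS consecutive inputs into out[g].
   n must be a multiple of 2*WS. Returns the number of partial sums written. */
static size_t reduce_pass(const unsigned int *in, size_t n, unsigned int *out)
{
    assert(n % (2 * WS) == 0);
    size_t groups = n / (2 * WS);
    for (size_t g = 0; g < groups; ++g) {
        unsigned int s = 0;
        for (size_t k = 0; k < 2 * WS; ++k)
            s += in[g * 2 * WS + k];
        out[g] = s;
    }
    return groups;
}

/* Repeats reduce_pass, swapping the two buffers, until one group is left.
   Assumption (not in the original answer): inputs and intermediate
   results are zero-padded up to a multiple of 2*WS. */
unsigned int reduce_all(const unsigned int *values, size_t n)
{
    assert(n > 0);
    size_t padded = (n + 2 * WS - 1) / (2 * WS) * (2 * WS);
    unsigned int *a = calloc(padded, sizeof *a);
    unsigned int *b = calloc(padded, sizeof *b);
    assert(a && b);
    memcpy(a, values, n * sizeof *a);

    unsigned int *src = a, *dst = b;
    size_t len = padded;
    for (;;) {
        size_t groups = reduce_pass(src, len, dst);
        if (groups == 1)
            break;                      /* dst[0] now holds the total */
        /* Zero-pad the partial sums so the next pass sees a multiple of 2*WS. */
        size_t next = (groups + 2 * WS - 1) / (2 * WS) * (2 * WS);
        memset(dst + groups, 0, (next - groups) * sizeof *dst);
        unsigned int *tmp = src; src = dst; dst = tmp;   /* ping-pong */
        len = next;
    }
    unsigned int result = dst[0];
    free(a);
    free(b);
    return result;
}
```

With WS = 64, an input of 1000 elements is padded to 1024, the first pass leaves 8 partial sums, and the second pass reduces them to the final value. On a device, each call to `reduce_pass` would correspond to one kernel enqueue.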
Also search for the Hillis-Steele and Blelloch parallel scan algorithms. This article could be useful as well
Here is the actual example:
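// Workgroup size (WS in the text). Assumed here to be a compile-time constant,
// e.g. set with -DBLK_SIZ=64, equal to the local size used when enqueuing the kernel.
#ifndef BLK_SIZ
#define BLK_SIZ 64
#endif

__kernel void par_sum(__global unsigned int* input, __global unsigned int* sum)
{
    int li = get_local_id(0);
    int groupId = get_group_id(0);
    // Local arrays need a compile-time size, so BLK_SIZ is used instead of a runtime query.
    __local unsigned int our_h[2*BLK_SIZ];
    // Each work-item loads two elements of this group's slice of 2*BLK_SIZ inputs.
    our_h[2*li + 0] = input[2*BLK_SIZ*groupId + 2*li + 0];
    our_h[2*li + 1] = input[2*BLK_SIZ*groupId + 2*li + 1];
    // sweep up
    int width = 2;
    int num_el = 2*BLK_SIZ/width;
    int wby2 = width>>1;
    for(int i = 2*BLK_SIZ>>1; i>0; i>>=1)
    {
        barrier(CLK_LOCAL_MEM_FENCE);
        if(li < num_el)
        {
            int idx = width*(li+1) - 1;
            our_h[idx] = our_h[idx] + our_h[(idx - wby2)];
        }
        width<<=1;
        wby2 = width>>1;
        num_el>>=1;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // no down-sweep is needed for a plain sum: the last element holds the group total
    if(0 == li)
        sum[groupId] = our_h[2*BLK_SIZ-1]; // save sum
}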
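To see what the up-sweep loop does to the local array, it can help to run the same index arithmetic sequentially. The sketch below is plain C rather than OpenCL, and `up_sweep_sum` is a hypothetical name chosen for illustration. The inner loop over `li` stands in for the work-items that run in parallel on the device; running them one after another gives the same result because, within one level, no work-item reads an element that another one writes.

```c
#include <assert.h>
#include <stddef.h>

/* In-place up-sweep over h[0..n-1]; n must be a power of two (n >= 2).
   Returns the total, which ends up in h[n-1]. */
unsigned int up_sweep_sum(unsigned int *h, size_t n)
{
    assert(n >= 2 && (n & (n - 1)) == 0);
    size_t num_el = n / 2;                 /* active work-items at this level */
    for (size_t width = 2; width <= n; width <<= 1, num_el >>= 1) {
        size_t wby2 = width >> 1;
        for (size_t li = 0; li < num_el; ++li) {   /* parallel on the device */
            size_t idx = width * (li + 1) - 1;
            h[idx] += h[idx - wby2];       /* same update as in the kernel */
        }
    }
    return h[n - 1];
}
```

For the eight values 1 to 8, the first level produces pair sums at indices 1, 3, 5 and 7, the second level combines them at indices 3 and 7, and the last level leaves the total 36 in the final element.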