Is it possible to run the sum computation in parallel in OpenCL?


Question

I am new to OpenCL, but I understand the basics of C/C++ and OOP. My question is as follows: is it possible to run a sum computation in parallel, even in theory? Below I describe what I have tried:

The task is, for example:

double* values = new double[1000]; //let's pretend it has some random values inside
double sum = 0.0;

for(int i = 0; i < 1000; i++) {
    sum += values[i];
}

What I tried to do in an OpenCL kernel (I feel it is wrong, because it probably accesses the same "sum" variable from different threads/tasks at the same time):

__kernel void calculate2dim(__global float* vectors1dim,
                            __global float output,
                            const unsigned int count) {
    int i = get_global_id(0);
    output += vectors1dim[i];
}

This code is wrong. I would highly appreciate it if anyone could tell me whether it is theoretically possible to run such tasks in parallel and, if so, how.

Solution

The piece of code I've provided for reference should do the job.

For example, say you have N elements and the size of your workgroup is WS = 64. I assume that N is a multiple of 2*WS (this is important: each work-item loads two elements, so one workgroup calculates the sum of 2*WS elements). Then you need to run the kernel with a local size of WS and a global size of:

globalSizeX = WS*(N/(2*WS));

As a result, the sum array will hold N/(2*WS) partial sums, each over 2*WS elements (e.g. sum[1] will contain the sum of the elements whose indices are from 2*WS to 4*WS-1).

If your globalSizeX is WS or less (which means that you have only one workgroup), then you are done. Just use sum[0] as the result. If not, you need to repeat the procedure, this time using the sum array as the input array and writing the output to another array (create 2 arrays and ping-pong between them; pad with zeros if the number of partial sums is not a multiple of 2*WS). And so on until you have only one workgroup.
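The repeat-until-one-workgroup procedure can be illustrated on the CPU. The following is only a sketch, not OpenCL host code: `reduce_pass` and `multi_pass_sum` are hypothetical names, each call to `reduce_pass` stands in for one kernel launch, and a short final chunk is simply summed as-is (which has the same effect as padding with zeros).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for one kernel launch (hypothetical helper): "work-group" g sums
// the chunk [g*2*ws, (g+1)*2*ws) of `in` and stores the result in out[g].
void reduce_pass(const std::vector<unsigned>& in, std::vector<unsigned>& out,
                 std::size_t ws) {
    assert(ws > 0);
    const std::size_t chunk = 2 * ws;
    const std::size_t groups = (in.size() + chunk - 1) / chunk;  // round up
    out.assign(groups, 0u);
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t first = g * chunk;
        const std::size_t last = std::min(first + chunk, in.size());
        for (std::size_t i = first; i < last; ++i) {
            out[g] += in[i];
        }
    }
}

// Ping-pong between two buffers until a single partial sum is left.
unsigned multi_pass_sum(std::vector<unsigned> ping, std::size_t ws) {
    if (ping.empty()) {
        return 0u;
    }
    std::vector<unsigned> pong;
    while (ping.size() > 1) {
        reduce_pass(ping, pong, ws);  // input: ping, output: pong
        std::swap(ping, pong);        // the output becomes the next input
    }
    return ping[0];
}
```

With WS = 64, an input of 16384 elements would need two passes: the first leaves 128 partial sums, the second leaves one.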

Search also for the Hillis-Steele and Blelloch parallel scan algorithms. This article could be useful as well.
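As a rough illustration of the first of those algorithms, here is a CPU sketch of a Hillis-Steele style inclusive scan (prefix sum). The function name `hillis_steele_scan` is hypothetical; on a GPU the inner loop would be executed by parallel work-items, with a barrier between steps.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Inclusive prefix sum in about log2(n) steps. In step d, element i adds the
// value that sits d positions to its left. Two buffers are used so that every
// read in a step sees the values from the previous step.
std::vector<unsigned> hillis_steele_scan(std::vector<unsigned> in) {
    std::vector<unsigned> out(in.size());
    for (std::size_t d = 1; d < in.size(); d <<= 1) {
        for (std::size_t i = 0; i < in.size(); ++i) {  // parallel on a GPU
            out[i] = (i >= d) ? in[i] + in[i - d] : in[i];
        }
        std::swap(in, out);
    }
    return in;  // the last element is the total sum
}
```

The last element of the scan is the total, so a scan also yields the sum, although a plain reduction does less work when only the total is needed.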

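Here is the actual example (BLK_SIZ is assumed to be a power of two equal to the workgroup size WS, defined at build time, e.g. with -DBLK_SIZ=64):

__kernel void par_sum(__global const unsigned int* input, __global unsigned int* sum)
{
    int li = get_local_id(0);
    int groupId = get_group_id(0);

    // Local arrays need a compile-time size, so BLK_SIZ is used here
    // rather than a run-time query of the workgroup size.
    __local unsigned int our_h[2 * BLK_SIZ];

    // Each work-item loads two consecutive elements of its group's chunk.
    our_h[2*li + 0] = input[2*BLK_SIZ*groupId + 2*li + 0];
    our_h[2*li + 1] = input[2*BLK_SIZ*groupId + 2*li + 1];

    // sweep up
    int width = 2;
    int num_el = 2*BLK_SIZ/width; // number of active work-items at this level
    int wby2 = width>>1;

    for(int i = 2*BLK_SIZ>>1; i>0; i>>=1)
    {
        // Make the previous level's writes visible to the whole group.
        barrier(CLK_LOCAL_MEM_FENCE);

        if(li < num_el)
        {
            int idx = width*(li+1) - 1;
            our_h[idx] = our_h[idx] + our_h[(idx - wby2)];
        }

        width<<=1;
        wby2 = width>>1;
        num_el>>=1;
    }

    barrier(CLK_LOCAL_MEM_FENCE);

    // The last element now holds the sum of this group's 2*BLK_SIZ elements.
    if(0 == li)
        sum[groupId] = our_h[2*BLK_SIZ-1]; // save sum
}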
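To see why the last element of the local array ends up holding the group's sum, the sweep-up loop can be replayed on the CPU. This is a sketch under assumptions: `up_sweep_sum` is a hypothetical name, `ws` plays the role of the workgroup size and must be a power of two, and the inner loop over `li` stands in for the work-items that the device runs concurrently between barriers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU replay of the kernel's sweep-up over one group's local array of
// 2*ws elements. The index arithmetic mirrors the kernel loop.
unsigned up_sweep_sum(std::vector<unsigned> local, std::size_t ws) {
    assert(ws > 0 && (ws & (ws - 1)) == 0);  // ws must be a power of two
    assert(local.size() == 2 * ws);

    std::size_t width = 2;
    std::size_t num_el = 2 * ws / width;  // active work-items at this level
    for (std::size_t i = ws; i > 0; i >>= 1) {
        const std::size_t wby2 = width >> 1;
        for (std::size_t li = 0; li < num_el; ++li) {
            const std::size_t idx = width * (li + 1) - 1;
            local[idx] += local[idx - wby2];  // right node absorbs left node
        }
        width <<= 1;
        num_el >>= 1;
    }
    return local.back();  // corresponds to our_h[2*ws - 1] in the kernel
}
```

For ws = 4 and the values 1 to 8, the levels produce the pair sums 3, 7, 11, 15, then 10 and 26, and finally 36 in the last slot. Within one level no work-item reads a slot that another one writes, which is why a single barrier per level is enough.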
