Reduction for the sum of a vector when the size is not a power of 2?

Problem description

The classical reduction algorithm on the GPU works perfectly if the size of the vector is a power of 2. What if that is not the case? At some point we will have to find the sum of an odd number of elements. What is the best way to deal with that?

Solution

You can compute the sum of a vector whose size is not a power of two. Look at the example:

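#define N 1022 // total number of elements

// Launch with blockDim.x * sizeof(int) bytes of dynamic shared memory.
__global__ void sum(const int *A, int *C)
{
        // blockDim.x is not a compile-time constant, so the array is sized at launch.
        extern __shared__ int temp[];
        int idx = threadIdx.x + blockDim.x * blockIdx.x;
        int local_idx = threadIdx.x;
        // Pad with 0 past the end of the input so the last block never reads out of bounds.
        temp[local_idx] = (idx < N) ? A[idx] : 0;
        __syncthreads();
        // n is the number of live partial sums; it does not have to be a power of two.
        int n = blockDim.x;
        while (n > 1)
        {
                 int half = (n + 1) / 2;   // integer ceil(n / 2)
                 // When n is odd, the middle element has no partner and is carried over unchanged.
                 if (local_idx + half < n)
                          temp[local_idx] += temp[local_idx + half];
                 n = half;
                 __syncthreads();
        }
        // Thread 0 writes one partial sum per block.
        if (local_idx == 0)
            C[blockIdx.x] = temp[0];
}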
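
The kernel only produces one partial sum per block, so the host still has to launch it and combine those partial sums. The following is a minimal, self-contained sketch of one way that could look, not the original author's code. The names `block_sum` and `sum_on_gpu`, the `n_total` parameter, and the default block size of 250 (deliberately not a power of two) are assumptions made for illustration. Error checking of the CUDA runtime calls is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block reduces its slice of A into C[blockIdx.x].
__global__ void block_sum(const int *A, int *C, int n_total)
{
    extern __shared__ int temp[];                    // blockDim.x ints, sized at launch
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int local_idx = threadIdx.x;
    temp[local_idx] = (idx < n_total) ? A[idx] : 0;  // zero-pad past the end of the input
    __syncthreads();

    // n counts the live partial sums; it is the same for every thread in the block,
    // so all threads reach __syncthreads() the same number of times.
    for (int n = blockDim.x; n > 1; ) {
        int half = (n + 1) / 2;                      // integer ceil(n / 2)
        if (local_idx + half < n)                    // odd n: middle element is carried over
            temp[local_idx] += temp[local_idx + half];
        n = half;
        __syncthreads();
    }
    if (local_idx == 0)
        C[blockIdx.x] = temp[0];
}

// Hypothetical host helper: sums h_in[0..n) using the kernel above.
// Assumes a CUDA device is available; runtime errors are not checked here.
int sum_on_gpu(const int *h_in, int n, int threads = 250)
{
    if (n <= 0) return 0;
    int blocks = (n + threads - 1) / threads;        // round up so the tail is covered

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    block_sum<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);

    // This copy runs after the kernel on the default stream, so it also waits for it.
    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;                                   // finish the small remainder on the host
    for (int p : partial) total += p;
    return total;
}
```

With 1022 elements and 250 threads per block this launches 5 blocks, the last of which holds only 22 real elements and 228 zeros. Finishing the handful of per-block sums on the host is a simplification; for very large inputs a second kernel launch over the partial sums may be preferable.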
