Reduction for the sum of a vector when the size is not a power of 2?
Problem description
The classical reduction algorithm on the GPU works perfectly when the size of the vector is a power of 2. What if that is not the case? At some point we will have to sum an odd number of elements. What is the best way to deal with that?
Solution
You can compute the sum of a vector whose size is not a power of two. Look at this example:
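#define N 1022          // total size (not a power of 2)
#define BLOCK_SIZE 256  // assumed threads per block; must be a power of 2 and match the launch

// Each block writes the partial sum of its slice of A to C[blockIdx.x].
__global__ void sum(int *A, int *C)
{
    // A static shared array needs a compile-time size, so blockDim.x cannot be used here.
    __shared__ int temp[BLOCK_SIZE];
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int local_idx = threadIdx.x;
    // Threads past the end of the vector load 0 instead of reading out of bounds.
    temp[local_idx] = (idx < N) ? A[idx] : 0;
    int i = blockDim.x / 2;
    __syncthreads();
    while (i != 0)
    {
        // idx + i < N skips partners that lie beyond the end of the vector.
        if (idx + i < N && local_idx < i)
            temp[local_idx] += temp[local_idx + i];
        i /= 2;
        __syncthreads();
    }
    if (local_idx == 0)
        C[blockIdx.x] = temp[0];
}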
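The kernel above only leaves one partial sum per block in C, and its halving loop still relies on the block size being a power of 2. The following is a minimal sketch, not the answer's own code, of one way to lift that restriction: the half-width is rounded up at every step, so an odd middle element is carried over instead of being dropped. The names block_sum and sum_on_gpu, the extra parameter n, and the choice of 100 threads per block are assumptions made for illustration. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel: works for any block size, power of 2 or not.
__global__ void block_sum(const int *A, int *C, int n)
{
    extern __shared__ int temp[];              // sized at launch: blockDim.x ints
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int t = threadIdx.x;
    temp[t] = (idx < n) ? A[idx] : 0;          // pad the tail with 0
    __syncthreads();

    // 'active' is the number of live elements in temp; it may be odd.
    for (int active = blockDim.x; active > 1; ) {
        int half = (active + 1) / 2;           // round up to keep an odd middle element
        if (t + half < active)
            temp[t] += temp[t + half];
        __syncthreads();
        active = half;
    }
    if (t == 0)
        C[blockIdx.x] = temp[0];
}

// Hypothetical host helper: launches the kernel and adds up the per-block sums on the CPU.
int sum_on_gpu(const int *h_A, int n, int threads)
{
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
    int *d_A = NULL, *d_C = NULL;
    cudaMalloc((void **)&d_A, n * sizeof(int));
    cudaMalloc((void **)&d_C, blocks * sizeof(int));
    cudaMemcpy(d_A, h_A, n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    block_sum<<<blocks, threads, threads * sizeof(int)>>>(d_A, d_C, n);

    int *h_C = (int *)malloc(blocks * sizeof(int));
    cudaMemcpy(h_C, d_C, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    int total = 0;
    for (int b = 0; b < blocks; ++b)
        total += h_C[b];

    free(h_C);
    cudaFree(d_A);
    cudaFree(d_C);
    return total;
}

int main()
{
    const int n = 1022;                        // same size as N in the answer
    int h_A[n];
    for (int i = 0; i < n; ++i)
        h_A[i] = 1;                            // all ones, so the sum should equal n
    printf("sum = %d\n", sum_on_gpu(h_A, n, 100));
    return 0;
}
```

Because the input is all ones, the result is easy to check against n. For a large number of blocks, the final loop on the host could be replaced by a second kernel launch over C, but for a vector of this size the CPU loop is likely sufficient.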