Reduce multiple blocks of equal length that are arranged in a big vector using CUDA

Question

I am looking for a fast way to reduce multiple blocks of equal length that are arranged as one big vector. I have N subarrays of contiguous elements arranged in one big array. Each subarray has a fixed size K, so the whole array has N*K elements.

What I'm doing now is calling the kernel N times, where each call computes the reduction of one subarray. I iterate over all the subarrays contained in the big vector like this:

    for (int i = 0; i < N; i++) {
        // subarray is assumed to point at the start of the i-th block of K floats
        thrust::device_vector<float> Vec(subarray, subarray + K);
        float sum = thrust::reduce(Vec.begin(), Vec.end(), 0.0f, thrust::plus<float>());
        printf("sum %f\n", sum);
    }

In pure CUDA I would do it like this (pseudocode):

    for (i = 0; i < N; i++) {
        reduction_kernel(subarray)
    }

Is there another solution that performs the reduction of all the contiguous subarrays at once, using pure CUDA or Thrust?

Recommended answer

What you're asking for is a segmented reduction. This can be done in Thrust using thrust::reduce_by_key. In addition to your data vector of length N*K, we will need a "key" vector that defines each segment. The segments don't have to be the same size, as long as the key vector differentiates segments like so:

data:  1 3 2 3 1 4 2 3 2 1 4 2 ...
keys:  0 0 0 1 1 1 0 0 0 3 3 3 ...
seg:   0 0 0 1 1 1 2 2 2 3 3 3 ...

The keys delineate a new segment any time the key sequence changes. (Note that the example above has two separate segments delineated by the same key, 0. Thrust doesn't group such segments together but treats them separately, because one or more different key values sit between them.) You don't actually have this key data, but since your segments are of equal length, we can produce the necessary key sequence "on the fly", for speed and efficiency, using a combination of Thrust fancy iterators.

The fancy iterators will combine to:

  1. produce a linear sequence 0 1 2 3 ... (via counting_iterator)
  2. divide each member of the linear sequence by K, the segment length, using integer division (via transform_iterator). I'm using Thrust placeholder methodology here so I don't have to write a functor for the transform iterator.

This will produce the necessary segment-key sequence.

Here is a worked example:

$ cat t1282.cu
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/copy.h>
#include <thrust/execution_policy.h>
#include <iostream>
#include <iterator>   // std::ostream_iterator

const int N = 1000;  // sequences
const int K = 100;   // length of sequence
typedef int mytype;

using namespace thrust::placeholders;

int main(){

  thrust::device_vector<mytype> data(N*K, 1);
  thrust::device_vector<mytype> sums(N);
  // keys are generated on the fly as i/K; the output keys are discarded
  thrust::reduce_by_key(thrust::device,
                        thrust::make_transform_iterator(thrust::counting_iterator<int>(0), _1/K),
                        thrust::make_transform_iterator(thrust::counting_iterator<int>(N*K), _1/K),
                        data.begin(),
                        thrust::discard_iterator<int>(),
                        sums.begin());
  // just display the first 10 results
  thrust::copy_n(sums.begin(), 10, std::ostream_iterator<mytype>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -arch=sm_35 -o t1282 t1282.cu
$ ./t1282
100,100,100,100,100,100,100,100,100,100,
$
