Replicate a vector multiple times using CUDA Thrust


Question

I am trying to solve a problem using CUDA Thrust.

I have a host array with 3 elements. Is it possible, using Thrust, to create a device array of 384 elements in which the 3 elements of my host array are repeated 128 times (128 x 3 = 384)?

Generally speaking, starting from an array of 3 elements, how can I use Thrust to generate a device array of size X, where X = Y x 3, i.e. Y is the number of repetitions?

Recommended answer

One possible approach:

  1. create a device vector of appropriate size
  2. create 3 strided ranges, one for each of the element positions {1, 2, 3} in the final output (device) vector
  3. use thrust::fill to fill each of the 3 strided ranges with the appropriate (host vector) element {1, 2, 3}

The code below is a trivial modification of Thrust's strided range example and demonstrates this approach. Change the REPS define to 128 to see the full expansion to 384 output elements:

#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/functional.h>

#include <thrust/fill.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// for printing
#include <thrust/copy.h>
#include <iostream>   // std::cout, std::endl
#include <iterator>   // std::ostream_iterator


#define STRIDE 3
#define REPS  15  // change to 128 if you like
#define DSIZE (STRIDE*REPS)

// this example illustrates how to make strided access to a range of values
// examples:
//   strided_range([0, 1, 2, 3, 4, 5, 6], 1) -> [0, 1, 2, 3, 4, 5, 6]
//   strided_range([0, 1, 2, 3, 4, 5, 6], 2) -> [0, 2, 4, 6]
//   strided_range([0, 1, 2, 3, 4, 5, 6], 3) -> [0, 3, 6]
//   ...

template <typename Iterator>
class strided_range
{
    public:

    typedef typename thrust::iterator_difference<Iterator>::type difference_type;

    // plain functor: recent Thrust releases deprecate thrust::unary_function,
    // so no base class is used here
    struct stride_functor
    {
        difference_type stride;

        stride_functor(difference_type stride)
            : stride(stride) {}

        __host__ __device__
        difference_type operator()(const difference_type& i) const
        {
            return stride * i;
        }
    };

    typedef typename thrust::counting_iterator<difference_type>                   CountingIterator;
    typedef typename thrust::transform_iterator<stride_functor, CountingIterator> TransformIterator;
    typedef typename thrust::permutation_iterator<Iterator,TransformIterator>     PermutationIterator;

    // type of the strided_range iterator
    typedef PermutationIterator iterator;

    // construct strided_range for the range [first,last)
    strided_range(Iterator first, Iterator last, difference_type stride)
        : first(first), last(last), stride(stride) {}

    iterator begin(void) const
    {
        return PermutationIterator(first, TransformIterator(CountingIterator(0), stride_functor(stride)));
    }

    iterator end(void) const
    {
        return begin() + ((last - first) + (stride - 1)) / stride;
    }

    protected:
    Iterator first;
    Iterator last;
    difference_type stride;
};

int main(void)
{
    thrust::host_vector<int> h_data(STRIDE);
    h_data[0] = 1;
    h_data[1] = 2;
    h_data[2] = 3;

    thrust::device_vector<int> data(DSIZE);

    typedef thrust::device_vector<int>::iterator Iterator;
    strided_range<Iterator> pos1(data.begin(), data.end(), STRIDE);
    strided_range<Iterator> pos2(data.begin()+1, data.end(), STRIDE);
    strided_range<Iterator> pos3(data.begin()+2, data.end(), STRIDE);

    thrust::fill(pos1.begin(), pos1.end(), h_data[0]);
    thrust::fill(pos2.begin(), pos2.end(), h_data[1]);
    thrust::fill(pos3.begin(), pos3.end(), h_data[2]);


    // print the generated data (copies device elements to the host stream)
    std::cout << "data: ";
    thrust::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    return 0;
}
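
As a cross-check on the expected output, the same pattern can be built on the host in plain C++: output element i is a copy of source element i % n, where n is the source length. This is only a sketch for verifying results, not part of the original answer, and the function name `replicate_on_host` is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer): builds the expected
// result on the host so the device output can be compared against it.
// Output element i is a copy of source element i % src.size().
std::vector<int> replicate_on_host(const std::vector<int>& src, std::size_t reps)
{
    std::vector<int> out;
    if (src.empty()) return out;          // nothing to replicate

    const std::size_t total = src.size() * reps;
    out.reserve(total);
    for (std::size_t i = 0; i < total; ++i)
        out.push_back(src[i % src.size()]);
    return out;
}
```

Calling `replicate_on_host({1, 2, 3}, 128)` gives 384 elements in the order 1 2 3 1 2 3 and so on. Copying the device vector `data` into a `thrust::host_vector<int>` and comparing it element by element against this reference is one way to confirm that the strided fills landed in the right positions.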
