Determining the least element and its position in each matrix column with CUDA Thrust

Problem description

    I have a fairly simple problem but I cannot figure out an elegant solution to it.

    I have Thrust code that produces c vectors of the same size, each containing values. Let's say each of these c vectors has an index. For each vector position, I would like to get the index of the c vector whose value is the lowest:

    Example:

    C0 =     (0,10,20,3,40)
    C1 =     (1,2 ,3 ,5,10)
    

    As the result, I would get a vector containing, for each position, the index of the C vector that has the lowest value:

    result = (0,1 ,1 ,0,1)
    

    I have thought about doing it with Thrust zip iterators, but have come across issues: I could zip all the c vectors and implement an arbitrary transformation that takes a tuple and returns the index of its lowest value, but:

    1. How do I iterate over the contents of a tuple?
    2. As I understand it, tuples can only store up to 10 elements, and there can be many more than 10 c vectors.

    I then thought about doing it this way: instead of having c separate vectors, append them all into a single vector C, then generate keys referencing the positions and perform a stable sort by key, which will regroup the vector entries from the same position together. In the example, that would give:

    C =      (0,10,20,3,40,1,2,3,5,10)
    keys =   (0,1 ,2 ,3,4 ,0,1,2,3,4 )
    after stable sort by key:
    output = (0,1,10,2,20,3,3,5,40,10)
    keys =   (0,0,1 ,1,2 ,2,3,3,4 ,4 )
    

    Then generate keys with the positions in the vector, zip the output with the indices of the c vectors, and perform a reduce by key with a custom functor that, for each reduction, outputs the index with the lowest value. In the example:

    input =  (0,1,10,2,20,3,3,5,40,10)
    indexes= (0,1,0 ,1,0 ,1,0,1,0 ,1)
    keys =   (0,0,1 ,1,2 ,2,3,3,4 ,4)
    after reduce by keys on zipped input and indexes:
    output = (0,1,1,0,1)
    

    However, how do I write such a functor for the reduce by key operation?

    Solution

    Since your vectors all have the same length, it is better to concatenate them and treat them as a matrix C.

    Then your problem becomes finding the index of the minimum element of each column in a row-major matrix. It can be solved as follows.

    1. change the layout from row-major to column-major;
    2. find the index of the minimum element in each column.

    In step 1, you proposed using stable_sort_by_key to rearrange the element order, which is not an efficient method, since the rearrangement can be calculated directly from the number of rows and columns of the matrix. In Thrust, it can be done with a permutation iterator:

    thrust::make_permutation_iterator(
        c.begin(),
        thrust::make_transform_iterator(
            thrust::make_counting_iterator((int) 0),
            (_1 % row) * col + _1 / row)
    )
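
To see what this index mapping does, here is a minimal host-side sketch in plain C++ (no GPU needed). The helper name `to_col_major` is hypothetical and not part of Thrust; it only reuses the expression `(i % row) * col + i / row` from the permutation iterator above and materializes the result, which the iterator itself never does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Thrust API): returns the elements of a
// row-major `row` x `col` matrix in column-major order, using the same
// index mapping as the permutation iterator.
std::vector<float> to_col_major(const std::vector<float>& c, int row, int col)
{
    assert(row > 0 && col > 0);
    assert(c.size() == static_cast<std::size_t>(row) * col);

    std::vector<float> out(c.size());
    for (int i = 0; i < row * col; ++i) {
        // For the i-th output element, i % row is its row index and
        // i / row is its column index in the original matrix.
        out[i] = c[(i % row) * col + i / row];
    }
    return out;
}
```

Applied to the example matrix with `row = 2` and `col = 5`, this yields the same sequence as the stable sort by key in the question, (0,1,10,2,20,3,3,5,40,10), without any sorting.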
    

    In step 2, reduce_by_key can do exactly what you want. In your case the binary reduction functor is easy, since comparison of tuples (the elements of your zipped vector) is already defined: it is lexicographic, so the 1st element (the value) is compared first and the 2nd (the index) only breaks ties. Thrust supports it as

    thrust::minimum< thrust::tuple<float, int> >()
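
If a hand-written functor is preferred, as the question asks, it could look roughly like the sketch below. This is host-only C++ using `std::tuple` so that it can be tried without a GPU; the names `MinByValue` and `reduce_segment` are hypothetical. To use the same idea with `reduce_by_key`, the functor would presumably need to take `thrust::tuple<float, int>` and mark `operator()` as `__host__ __device__`.

```cpp
#include <cassert>
#include <numeric>
#include <tuple>
#include <vector>

// Hypothetical functor: of two (value, index) pairs, keep the one with the
// smaller value. On equal values the left-hand argument is kept.
struct MinByValue {
    std::tuple<float, int> operator()(const std::tuple<float, int>& a,
                                      const std::tuple<float, int>& b) const
    {
        return std::get<0>(b) < std::get<0>(a) ? b : a;
    }
};

// Hypothetical helper: folds one non-empty segment of (value, index) pairs
// with MinByValue, which is what a reduce by key does for each key.
std::tuple<float, int> reduce_segment(const std::vector<std::tuple<float, int> >& seg)
{
    assert(!seg.empty());
    return std::accumulate(seg.begin() + 1, seg.end(), seg.front(), MinByValue());
}
```

The only difference from the lexicographic comparison is tie handling: this sketch keeps whichever pair comes first in the segment, while a lexicographic minimum picks the pair with the smaller index.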
    

    The whole program is shown below. Thrust 1.6.0+ is required since I use placeholders in fancy iterators. For the example data it prints 0 1 1 0 1.

    #include <iostream>
    #include <iterator>
    #include <algorithm>
    
    #include <thrust/device_vector.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/iterator/transform_iterator.h>
    #include <thrust/iterator/permutation_iterator.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/iterator/discard_iterator.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>
    
    using namespace thrust::placeholders;
    
    int main()
    {
    
        const int row = 2;
        const int col = 5;
        float initc[] =
                { 0, 10, 20, 3, 40, 1, 2, 3, 5, 10 };
        thrust::device_vector<float> c(initc, initc + row * col);
    
        thrust::device_vector<float> minval(col);
        thrust::device_vector<int> minidx(col);
    
        thrust::reduce_by_key(
                thrust::make_transform_iterator(
                        thrust::make_counting_iterator((int) 0),
                        _1 / row),
                thrust::make_transform_iterator(
                        thrust::make_counting_iterator((int) 0),
                        _1 / row) + row * col,
                thrust::make_zip_iterator(
                        thrust::make_tuple(
                                thrust::make_permutation_iterator(
                                        c.begin(),
                                        thrust::make_transform_iterator(
                                                thrust::make_counting_iterator((int) 0), (_1 % row) * col + _1 / row)),
                                thrust::make_transform_iterator(
                                        thrust::make_counting_iterator((int) 0), _1 % row))),
                thrust::make_discard_iterator(),
                thrust::make_zip_iterator(
                        thrust::make_tuple(
                                minval.begin(),
                                minidx.begin())),
                thrust::equal_to<int>(),
                thrust::minimum<thrust::tuple<float, int> >()
        );
    
        std::copy(minidx.begin(), minidx.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
        return 0;
    }
    

    Two remaining issues may affect performance.

    1. the min values have to be written out, even though they are not required;
    2. reduce_by_key is designed for segments of variable length, so it may not be the fastest algorithm for reducing segments of equal length.

    Writing your own kernel could be the best solution for the highest performance.
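
As a rough idea of what such a kernel would compute, here is a CPU reference sketch in plain C++, not an actual CUDA kernel. The function name `column_argmin` is hypothetical. It scans each column of the row-major matrix directly, so it needs neither the layout change nor the output of min values. In a kernel, each iteration of the outer loop could plausibly be handled by one thread; neighbouring threads would then read neighbouring addresses of the row-major data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: for each column of a row-major `row` x `col`
// matrix, returns the row index of its smallest element (the first one on
// ties).
std::vector<int> column_argmin(const std::vector<float>& c, int row, int col)
{
    assert(row > 0 && col > 0);
    assert(c.size() == static_cast<std::size_t>(row) * col);

    std::vector<int> minidx(col, 0);
    // One outer iteration per column; a kernel could map this to one thread.
    for (int j = 0; j < col; ++j) {
        float best = c[j];                 // element (0, j)
        for (int i = 1; i < row; ++i) {
            const float v = c[i * col + j]; // element (i, j)
            if (v < best) {
                best = v;
                minidx[j] = i;
            }
        }
    }
    return minidx;
}
```

For the example data this gives (0,1,1,0,1), which can serve as a check against the Thrust version. Whether a kernel built this way actually outperforms reduce_by_key would have to be measured.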