快速CUDA推力定制比较运算符 [英] fast CUDA thrust custom comparison operator

查看：223 发布时间：2017/3/4 14:49:49 cuda thrust

本文介绍了快速CUDA推力定制比较运算符的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在评估CUDA，目前使用Thrust库对数字进行排序。

我想为thrust :: sort创建自己的比较器，下来drammically！
我只需从 functional.h 中复制代码即可创建自己的较少实现。

默认比较器：thrust :: less（） - 94 ms

我自己的比较器：less（） - 906

我使用Visual Studio 2010.我应该怎么做才能获得与选项1相同的性能？

完成代码：
```
  #include< stdio.h> 
 
 #include< cuda.h> 
 
 #include< thrust / host_vector.h> 
 #include< thrust / device_vector.h> 
 #include< thrust / generate.h> 
 #include< thrust / sort.h> 
 
 int myRand（）
 {
 static int counter = 0; 
 if（counter ++％10000 == 0）
 srand（time（NULL）+ counter）; 
 return（rand（）<< 16）| rand（）; 
} 
 
 template< typename T> 
 struct less：public thrust :: binary_function< T，T，bool> 
 {
 __host__ __device__ bool operator（）（const T& lhs，const T& rhs）const {
 return lhs& rhs; 
} 
}; 
 
 int main（）
 {
 thrust :: host_vector< int> h_vec（10 * 1000 * 1000）; 
 thrust :: generate（h_vec.begin（），h_vec.end（），myRand）; 
 
 thrust :: device_vector< int> d_vec = h_vec; 
 
 int clc = clock（）; 
 thrust :: sort（d_vec.begin（），d_vec.end（），less< int>（））; 
 printf（％dms\\\
，（clock（） -  clc）* 1000 / CLOCKS_PER_SEC）; 
 
 return 0; 
} 
  
```
解决方案
性能上的差异是因为Thrust根据提供给 thrust :: sort 的参数实现不同算法的排序。

在情况1中，Thrust可以证明该排序可以在具有基数排序的线性时间中实现。这是因为要排序的数据类型是内置的数字类型（ int ），比较函数是内置的小于操作 - Thrust识别 thrust :: less< int> 将产生与 x <
在情况2中，Thrust不了解您的用户提供的 less< int> ，并且必须使用基于具有不同渐近复杂度的比较排序的更保守的算法，即使事实上更少< int> code> thrust :: less< int> 。

一般来说，用户定义的比较运算符不能与更多限制性，更快的排序，其操纵数据的二进制表示，例如基数排序。在这些情况下，Thrust可以退回更一般的，但排序较慢。

I'm evaluating CUDA and currently using Thrust library to sort numbers.

I'd like to create my own comparer for thrust::sort, but it slows down drammatically! I created my own less implemetation by just copying code from functional.h. However it seems to be compiled in some other way and works very slowly.
1. default comparer: thrust::less() - 94ms
2. my own comparer: less() - 906ms
I'm using Visual Studio 2010. What should I do to get the same performance as at option 1?

Complete code:
```
#include <stdio.h>

#include <cuda.h>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>

int myRand()
{
        static int counter = 0;
        if ( counter++ % 10000 == 0 )
                srand(time(NULL)+counter);
        return (rand()<<16) | rand();
}

template<typename T>
struct less : public thrust::binary_function<T,T,bool>
{
  __host__ __device__ bool operator()(const T &lhs, const T &rhs) const {
     return lhs < rhs;
  }
}; 

int main()
{
    thrust::host_vector<int> h_vec(10 * 1000 * 1000);
    thrust::generate(h_vec.begin(), h_vec.end(), myRand);

    thrust::device_vector<int> d_vec = h_vec;

    int clc = clock();
    thrust::sort(d_vec.begin(), d_vec.end(), less<int>());
    printf("%dms\n", (clock()-clc) * 1000 / CLOCKS_PER_SEC);

    return 0;
}
```
解决方案
The reason you are observing a difference in performance is because Thrust is implementing the sort with different algorithms depending on the arguments provided to thrust::sort.

In case 1., Thrust can prove that the sort can be implemented in linear time with a radix sort. This is because the type of the data to sort is a built-in numeric type (int), and the comparison function is the built-in less than operation -- Thrust recognizes that thrust::less<int> will produce the equivalent result as x < y.

In case 2., Thrust knows nothing about your user-provided less<int>, and has to use a more conservative algorithm based on a comparison sort which has different asymptotic complexity, even though in truth your less<int> is equivalent to thrust::less<int>.

In general, user-defined comparison operators can't be used with more restrictive, faster sorts which manipulate the binary representation of data such as radix sort. In these cases, Thrust falls back on a more general, but slower sort.

这篇关于快速CUDA推力定制比较运算符的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

快速CUDA推力定制比较运算符 [英] fast CUDA thrust custom comparison operator

问题描述

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录关闭

快速CUDA推力定制比较运算符 [英] fast CUDA thrust custom comparison operator

问题描述

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录 关闭

登录关闭