如何快速推力::排序和什么是最快的基数排序实现 [英] how fast is thrust::sort and what is the fastest radix sort implementation

查看：390 发布时间：2017/3/4 15:00:45 sorting cuda thrust

本文介绍了如何快速推力::排序和什么是最快的基数排序实现的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我是GPU编程的新手。最近，我试图实现基于教程的gpu bvh构建算法： http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/ 。在该算法的第一步中，计算和排序每个基元的密码（unsigned int）。本教程给出了12K对象的morton代码计算和排序的参考时间成本：

I'm a newbie to GPU programming. Recently, I'm trying to implement the gpu bvh construction algorithm based on an tutorial: http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/. In the first step of this algorithm, the morton code(unsigned int) of every primitive is computed and sorted. The tutorial gives a reference time cost of the morton code computing and sorting for 12K objects:

0.02 ms，每个对象一个线程：计算边界

0.18 ms，平行基数排序：根据对象的Morton代码对对象进行排序。
...

在我的实现中，第一步的成本为0.1ms，排序步骤的成本为1.8ms。我使用推力做排序。那么什么是GPU上基数排序的最快实现？

In my implementation, the first step cost 0.1ms and the sorting step costs 1.8ms. I'm using thrust to do the sorting. So what is the fastest implementation of radix sort on GPU?

我使用的Geforce Titan GPU应该比本教程作者使用的Geforce GTX690更快。
这是我的测试代码排序，即使大小为10，它花费大约1.5ms。

I'm using a Geforce Titan GPU which should faster than the Geforce GTX690 used by the author of the tutorial. Here is my test code for sorting, it costs about 1.5ms even when the size is 10.

void testSort()
{
    int sz = 10;
    thrust::host_vector<unsigned int> h_keys(sz);
    for(int i=0; i<sz; i++)
        h_keys[i] = rand();
    thrust::device_ptr<unsigned int> keys = thrust::device_malloc<unsigned int>(sz);
    thrust::copy(h_keys.begin(),h_keys.end(),keys);
    cudaEvent_t estart, estop;
    cudaEventCreate( &estart );
    cudaEventCreate( &estop );
    cudaEventRecord( estart, 0 );
    thrust::stable_sort(keys,keys+sz);
    cudaEventRecord( estop, 0 ) ;
    cudaEventSynchronize( estop );
    float elapsedTime;
    cudaEventElapsedTime( &elapsedTime,
        estart, estop ) ;
    printf( "Time to sort: %3.1f ms\n", elapsedTime );
    cudaEventDestroy( estart ) ;
    cudaEventDestroy( estop ) ;
}

如何快速推力::排序和什么是最快的基数排序实现 [英] how fast is thrust::sort and what is the fastest radix sort implementation

问题描述

推荐答案

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录关闭

如何快速推力::排序和什么是最快的基数排序实现 [英] how fast is thrust::sort and what is the fastest radix sort implementation

问题描述

推荐答案

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录 关闭

登录关闭