什么是CUDA的好的排序算法? [英] What is a good sorting algorithm on CUDA?

查看:833
本文介绍了什么是CUDA的好的排序算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个struct数组,我需要根据struct(N)的属性对这个数组排序。对象如下所示:

  struct OBJ 
{
int N; //根据N
对OBJ的数组进行排序OB * c; // OB是另一个结构
}

数组大小很小,但是每个元素的大小是巨大的,因此我不能将数组复制到共享内存。



最简单和好的方式来排序这个数组?我不需要一个复杂的算法,需要很多时间来实现(因为数组中的元素数量很少)我只需要一个简单的算法。



注意:我已经阅读了一些关于使用GPU的排序算法的论文,但是这些论文的速度增益只有当数组的大小非常大时才会出现。因此,我没有尝试实现他们的算法,因为我的数组的大小很小。我只需要一个简单的方法来并行排序我的数组。

解决方案

什么意思是大和小?



Bybig我假设你的意思是> 1M的元素,而小---足够小,以适合共享内存(可能< 1K元素)。如果我对小的理解与您的理解一致,我会尝试以下:




  • 只使用一个块来排序数组

  • 比特排序是可用于并行算法的好app之一。



有些网页在bitonic排序:




  • 比特排序(很好的解释,小程序可视化和java源,不占用太多空间) li>
  • 维基百科(对我的品味有点太简短,但是更多源代码 - 一些抽象语言和Java)

  • NVIDIA代码示例(CUDA中的示例源代码。我认为这是有点过度关注于杀死银行冲突。我相信更简单的代码可能实际上执行速度更快)



我也曾实现一个泡沫排序32个元素的数组。由于它的简单,它没有执行那个坏实际上。尽管如此,一个调谐良好的比特排序仍然会更快。


I have an array of struct and I need to sort this array according to a property of the struct (N). The object looks like this:

 struct OBJ
 { 
   int N; //sort array of OBJ with respect to N
   OB *c; //OB is another struct
 } 

The array size is small, about 512 elements, but the size of every element is big therefore I cannot copy the array to shared memory.

What is the simplest and "good" way to sort this array? I do not need a complex algorithm that require a lot of time to implement (since the number of elements in the array is small) I just need a simple algorithm.

Note: I have read some papers about sorting algorithms using GPUs, but the speed gain from these papers only show up when the size of the array is very big. Therefore I did not try to implement their algorithms because the size of my array is small. I only need a simple way to parallel sort my array. Thanks.

解决方案

What means "big" and "small" ?

By "big" I assume you mean something of >1M elements, while small --- small enough to actually fit in shared memory (probably <1K elements). If my understanding of "small" matches yours, I would try the following:

  • Use only a single block to sort the array (it can be then a part of some bigger CUDA kernel)
  • Bitonic sort is one of good appraches which can be adopted for parallel algorithm.

Some pages on bitonic sort:

  • Bitonic sort (nice explanation, applet to visualise and java source which does not take too much space)
  • Wikipedia (a bit too short explanation for my taste, but more source codes - some abstract language and Java)
  • NVIDIA code Samples (A sample source in CUDA. I think it is a bit ovefocused on killing bank conflicts. I believe the simpler code may actually perform faster)

I once also implemented a bubble sort (lol!) for a single warp to sort arrays of 32 elements. Thanks to its simplicity it did not perform that bad actually. A well tuned bitonic sort will still perform faster though.

这篇关于什么是CUDA的好的排序算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆