什么是CUDA的好的排序算法? [英] What is a good sorting algorithm on CUDA?
问题描述
我有一个struct数组,我需要根据struct(N)的属性对这个数组排序。对象如下所示:
struct OBJ
{
int N; //根据N
对OBJ的数组进行排序OB * c; // OB是另一个结构
}
数组大小很小,但是每个元素的大小是巨大的,因此我不能将数组复制到共享内存。
最简单和好的方式来排序这个数组?我不需要一个复杂的算法,需要很多时间来实现(因为数组中的元素数量很少)我只需要一个简单的算法。
注意:我已经阅读了一些关于使用GPU的排序算法的论文,但是这些论文的速度增益只有当数组的大小非常大时才会出现。因此,我没有尝试实现他们的算法,因为我的数组的大小很小。我只需要一个简单的方法来并行排序我的数组。
什么意思是大和小?
Bybig我假设你的意思是> 1M的元素,而小---足够小,以适合共享内存(可能< 1K元素)。如果我对小的理解与您的理解一致,我会尝试以下:
- 只使用一个块来排序数组
- 比特排序是可用于并行算法的好app之一。
有些网页在bitonic排序:
- 比特排序(很好的解释,小程序可视化和java源,不占用太多空间) li>
- 维基百科(对我的品味有点太简短,但是更多源代码 - 一些抽象语言和Java)
- NVIDIA代码示例(CUDA中的示例源代码。我认为这是有点过度关注于杀死银行冲突。我相信更简单的代码可能实际上执行速度更快)
我也曾实现一个泡沫排序32个元素的数组。由于它的简单,它没有执行那个坏实际上。尽管如此,一个调谐良好的比特排序仍然会更快。
I have an array of struct and I need to sort this array according to a property of the struct (N). The object looks like this:
struct OBJ
{
int N; //sort array of OBJ with respect to N
OB *c; //OB is another struct
}
The array size is small, about 512 elements, but the size of every element is big therefore I cannot copy the array to shared memory.
What is the simplest and "good" way to sort this array? I do not need a complex algorithm that require a lot of time to implement (since the number of elements in the array is small) I just need a simple algorithm.
Note: I have read some papers about sorting algorithms using GPUs, but the speed gain from these papers only show up when the size of the array is very big. Therefore I did not try to implement their algorithms because the size of my array is small. I only need a simple way to parallel sort my array. Thanks.
What means "big" and "small" ?
By "big" I assume you mean something of >1M elements, while small --- small enough to actually fit in shared memory (probably <1K elements). If my understanding of "small" matches yours, I would try the following:
- Use only a single block to sort the array (it can be then a part of some bigger CUDA kernel)
- Bitonic sort is one of good appraches which can be adopted for parallel algorithm.
Some pages on bitonic sort:
- Bitonic sort (nice explanation, applet to visualise and java source which does not take too much space)
- Wikipedia (a bit too short explanation for my taste, but more source codes - some abstract language and Java)
- NVIDIA code Samples (A sample source in CUDA. I think it is a bit ovefocused on killing bank conflicts. I believe the simpler code may actually perform faster)
I once also implemented a bubble sort (lol!) for a single warp to sort arrays of 32 elements. Thanks to its simplicity it did not perform that bad actually. A well tuned bitonic sort will still perform faster though.
这篇关于什么是CUDA的好的排序算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!