其中并行排序算法具有最好的平均情况下的性能? [英] Which parallel sorting algorithm has the best average case performance?

查看:232
本文介绍了其中并行排序算法具有最好的平均情况下的性能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

排序需要O(N日志N)在串行情况。如果我们有O(n)的处理器,我们希望有一个线性加速。 O(log n)的并行算法存在,但他们有一个非常高的常数。他们也并不适用于商品硬件,没有接近O(n)的处理器的任何地方。随着p个处理器,合理的算法应该采取O(N / P log n)的时间。

Sorting takes O(n log n) in the serial case. If we have O(n) processors we would hope for a linear speedup. O(log n) parallel algorithms exist but they have a very high constant. They also aren't applicable on commodity hardware which doesn't have anywhere near O(n) processors. With p processors, reasonable algorithms should take O(n/p log n) time.

在串行情况下,快速排序平均具有最佳运行的复杂性。并行快速排序算法容易实现(见这里和<一href="http://stackoverflow.com/questions/1784028/which-sorting-method-is-most-suitable-for-parallel-processing">here).然而,它不能很好地执行,因为第一个步骤是在单核分割整个集合。我发现很多并行排序算法的信息,但到目前为止我还没有看到任何东西指向一个明显的赢家。

In the serial case, quick sort has the best runtime complexity on average. A parallel quick sort algorithm is easy to implement (see here and here). However it doesn't perform well since the very first step is to partition the whole collection on a single core. I have found information on many parallel sort algorithms but so far I have not seen anything pointing to a clear winner.

我期待在一个JVM语言上8到32个内核上运行排序为1万到1亿的元素列表。

I'm looking to sort lists of 1 million to 100 million elements in a JVM language running on 8 to 32 cores.

推荐答案

下面的文章(PDF下载)是对各种架构的并行排序算法进行比较研究:

The following article (PDF download) is a comparative study of parallel sorting algorithms on various architectures:

对各种架构的并行排序算法

根据这篇文章,样品排序似乎是最好的许多并行体系结构类型。

According to the article, sample sort seems to be best on many parallel architecture types.

更新,以解决年龄马克的担忧:

Update to address Mark's concern of age:

下面是最新的文章介绍一些更新颖的(从2007年,其中,顺便说一句,仍然可以与样品相比排序):

Here are more recent articles introducing something more novel (from 2007, which, btw, still get compared with sample sort):

样品改进排序
AA-排序

最前沿(约公元前2010年,有的只有几个月大):

The bleeding edge (circa 2010, some only a couple months old):

并行排序模式
<一href="http://lap.epfl.ch/webdav/site/lap/shared/publications/YeApr10_HighPerformanceComparisonBasedSortingAlgorithmOnManyCoreGpus_IPDPS10.pdf">Many-core基于GPU的并行排序
混合型CPU / GPU并行排序
随机并行排序算法与实验研究
高度可扩展并行排序
排序的N-元素使用自然顺序:一种新的自适应排序方法

Parallel sorting pattern
Many-core GPU based parallel sorting
Hybrid CPU/GPU parallel sort
Randomized Parallel Sorting Algorithm with an Experimental Study
Highly scalable parallel sorting
Sorting N-Elements Using Natural Order: A New Adaptive Sorting Approach

更新为2013: 这里是最前沿大约一月,2013年(注:有几个链接对论文的Citeseer,并要求登记是免费的):

Update for 2013: Here is the bleeding edge circa January, 2013. (Note: A few of the links are to papers at Citeseer and require registration which is free):

大学演讲:
并行分区的选择和排序
并行排序算法讲座
并行排序算法讲座2
并行排序算法讲座3

其它来源和论文:
许多核心架构基于自适应双调排序
一种新的排序算法 高度可扩展并行排序2
并行合并
并行合并2
并行自分拣系统为对象
序贯快速排序和并行快速排序算法
性能比较 共享内存,消息传递,和混合合并排序进行单机和集群的SMP
各种并行算法(排序等),包括实现

GPU和CPU / GPU混合来源和论文:
并行排序算法的OpenCL的方法GPU架构
数据排序利用图形处理单元
高效的算法排序在GPU上
设计高效的排序算法的众核GPU的
确定性样品排序对于GPU的
快速就地基于双调排序与CUDA排序
快速并行GPU排序使用混合算法
快速并行排序算法在GPU上
在CPU和GPU 快速排序:带宽忘却SIMD排序
情况 GPU示例排序
GPU-ABiSort:在流架构
最优并行排序 GPUTeraSort:高性能图形协处理器排序为大型数据库管理
在众核GPU的高性能基于比较的排序算法
为支持CUDA的GPU的负载平衡和低传输开销
并行外部排序 排序在GPU上进行大规模的数据集:一个全面的比较

University lectures:
Parallel Partitioning for Selection and Sorting
Parallel Sorting Algorithms Lecture
Parallel Sorting Algorithms Lecture 2
Parallel Sorting Algorithms Lecture 3

Other sources and papers:
A novel sorting algorithm for many-core architectures based on adaptive bitonic sort
Highly Scalable Parallel Sorting 2
Parallel Merging
Parallel Merging 2
Parallel Self-Sorting System for Objects
Performance Comparison of Sequential Quick Sort and Parallel Quick Sort Algorithms
Shared Memory, Message Passing, and Hybrid Merge Sorts for Standalone and Clustered SMPs
Various parallel algorithms (sorting et al) including implementations

GPU and CPU/GPU hybrid sources and papers:
An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture
Data Sorting Using Graphics Processing Units
Efficient Algorithms for Sorting on GPUs
Designing efficient sorting algorithms for manycore GPUs
Deterministic Sample Sort For GPUs
Fast in-place sorting with CUDA based on bitonic sort
Fast parallel GPU-sorting using a hybrid algorithm
Fast Parallel Sorting Algorithms on GPUs
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
GPU sample sort
GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures
GPUTeraSort: high performance graphics co-processor sorting for large database management
High performance comparison-based sorting algorithm on many-core GPUs
Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead
Sorting on GPUs for large scale datasets: A thorough comparison

这篇关于其中并行排序算法具有最好的平均情况下的性能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆