Why is merge sort preferred over quick sort for sorting linked lists?


Problem description

I read the following in a forum:

Merge sort is very efficient for immutable data structures like linked lists.

Quick sort is typically faster than merge sort when the data is stored in memory. However, when the data set is huge and is stored on external devices such as a hard drive, merge sort is the clear winner in terms of speed. It minimizes the expensive reads of the external drive.

When operating on linked lists, merge sort only requires a small constant amount of auxiliary storage.

Can someone help me understand the above argument? Why is merge sort preferred for sorting huge linked lists? And how does it minimize expensive reads to an external drive? Basically, I want to understand why one would choose merge sort for sorting a big linked list.

Recommended answer

Quick sort works well for sorting in place. In particular, most of the operations can be defined in terms of swapping pairs of elements in an array. To do that, however, you normally "walk" through the array with two pointers (or indexes, etc.). One starts at the beginning of the array and the other at the end. Both then work their way toward the middle (and you're done with a particular partition step when they meet). That's expensive with files, because files are oriented primarily toward reading in one direction, from beginning to end. Starting from the end and seeking backwards is usually relatively expensive.
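To make the two-pointer access pattern concrete, here is a minimal sketch of a Hoare-style partition and quicksort (my illustration, not code from the original answer). Note how `i` walks forward from the start while `j` walks backward from the end; it is exactly that backward walk that files and singly linked lists make expensive.

```python
def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] around a pivot with two pointers that walk
    toward each other from opposite ends -- cheap on an in-memory
    array, expensive on a file or singly linked list."""
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        i += 1                      # walk forward from the start
        while a[i] < pivot:
            i += 1
        j -= 1                      # walk backward from the end
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j                # pointers met: partition done
        a[i], a[j] = a[j], a[i]     # in-place swap of a pair

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort(a, lo, p)
        quicksort(a, p + 1, hi)
```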

At least in its simplest incarnation, merge sort is pretty much the opposite. The easiest way to implement it only requires looking through the data in one direction, but it involves breaking the data into two separate pieces, sorting the pieces, then merging them back together.

With a linked list, it's easy to take (for example) alternating elements in one linked list and manipulate the links to create two linked lists from those same elements instead. With an array, rearranging elements so that alternating elements go into separate arrays is easy if you're willing to create a copy as big as the original data, but otherwise rather less trivial.
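As a sketch of that link manipulation (the `Node` class and function names are my own, for illustration), splitting a list into alternating elements only re-points `next` links; no element is copied:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def split_alternating(head):
    """Split one linked list into two lists of alternating elements
    by re-pointing links; the nodes themselves are reused, not copied."""
    heads = [None, None]
    tails = [None, None]
    which = 0
    node = head
    while node:
        nxt = node.next
        node.next = None
        if tails[which] is None:     # first node of this output list
            heads[which] = node
        else:                        # append to this output list
            tails[which].next = node
        tails[which] = node
        which = 1 - which            # alternate between the two lists
        node = nxt
    return heads[0], heads[1]

def to_list(head):
    """Collect node values for inspection."""
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out
```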

Likewise, merging arrays is easy if you merge elements from the source arrays into a new array in order -- but doing it in place without creating a whole new copy of the data is a different story entirely. With a linked list, merging elements from two source lists into a single target list is trivial: again, you just manipulate links without copying elements.

As for using quicksort to produce the sorted runs for an external merge sort: it does work, but it's (decidedly) sub-optimal as a rule. To optimize a merge sort, you normally want to maximize the length of each sorted "run" as you produce it. If you simply read in as much data as will fit in memory, quicksort it, and write it out, each run will be restricted to (a little less than) the size of the available memory.

You can usually do quite a bit better than that, though. You start by reading in a block of data, but instead of running a quicksort on it, you build a heap. Then, as you write each item out from the heap into the sorted "run" file, you read another item in from your input file. If it's larger than the item you just wrote to disk, you insert it into the existing heap and repeat.

Items that are smaller (i.e., that belong before items which have already been written) you keep separate and build into a second heap. When (and only when) the first heap is empty and the second heap has taken over all the memory, you stop writing items to the existing "run" file and start a new one.
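This technique is known as replacement selection. A minimal in-memory sketch (my own, with a list standing in for the input file and each yielded list standing in for one run file, under the assumption that `memory_size` items fit in memory at once):

```python
import heapq

def replacement_selection(items, memory_size):
    """Produce sorted runs via replacement selection. Items that can
    still extend the current run go back into the active heap; items
    smaller than the last value written wait in a second heap for the
    next run. Average run length is about 2 * memory_size."""
    it = iter(items)
    heap = []
    for x in it:                     # fill memory with the first block
        heap.append(x)
        if len(heap) == memory_size:
            break
    heapq.heapify(heap)

    run, next_heap = [], []
    for x in it:
        smallest = heapq.heappop(heap)
        run.append(smallest)         # "write" smallest to the run file
        if x >= smallest:
            heapq.heappush(heap, x)  # still fits in the current run
        else:
            next_heap.append(x)      # must wait for the next run
        if not heap:                 # current run exhausted: start anew
            yield run
            run = []
            heap = next_heap
            heapq.heapify(heap)
            next_heap = []

    while heap:                      # input exhausted: drain the heaps
        run.append(heapq.heappop(heap))
    if run:
        yield run
    if next_heap:
        next_heap.sort()
        yield next_heap
```

With random input this tends to produce runs roughly twice as long as memory; with already-sorted input it produces a single run, matching the best case described below.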

Exactly how effective this is depends on the initial order of the data. In the worst case (input sorted in reverse order) it does no good at all. In the best case (input already sorted) it lets you "sort" the data in a single run through the input. In the average case (input in random order) it lets you roughly double the length of each sorted run, which typically improves speed by around 20-25% (though the percentage varies depending on how much larger your data is than the available memory).
