Why does Haskell use mergesort instead of quicksort?


Question


The Haskell Wikibook makes the following claim:

Data.List offers a sort function for sorting lists. It does not use quicksort; rather, it uses an efficient implementation of an algorithm called mergesort.
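For reference, here is a short usage sketch of the library functions in question (sort, sortBy and sortOn from Data.List). The Data.List documentation describes sort as stable, and the last example shows what that means in practice: elements with equal keys keep their input order.

```haskell
import Data.List (sort, sortBy, sortOn)
import Data.Ord (Down (..), comparing)

main :: IO ()
main = do
  -- Plain ascending sort.
  print (sort [3, 1, 2 :: Int])
  -- Descending order via a comparison function.
  print (sortBy (comparing Down) [3, 1, 2 :: Int])
  -- Sort pairs by their second component only. Because the sort is stable,
  -- pairs with equal keys keep their original relative order.
  print (sortOn snd [(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a') :: (Int, Char)])
```

The three lines print [1,2,3], [3,2,1] and [(2,'a'),(4,'a'),(1,'b'),(3,'b')].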

What is the underlying reason Haskell uses mergesort rather than quicksort? Quicksort usually has better practical performance, but perhaps not in this case. I gather that quicksort's in-place benefits are hard (impossible?) to obtain with Haskell lists.

There was a related question on softwareengineering.SE, but it wasn't really about why mergesort is used.

I implemented both sorts myself and profiled them. Mergesort was faster (roughly twice as fast on a list of 2^20 elements), but I'm not sure my quicksort implementation was optimal.

Edit: Here are my implementations of mergesort and quicksort:

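-- Top-down mergesort: split the list in half, sort each half, merge.
mergesort :: Ord a => [a] -> [a]
mergesort [] = []
mergesort [x] = [x]
mergesort l = merge (mergesort left) (mergesort right)
    where size = div (length l) 2
          (left, right) = splitAt size l

-- Merge two sorted lists. Taking from the left list on ties (<=) keeps
-- equal elements in their original order, which makes the sort stable.
merge :: Ord a => [a] -> [a] -> [a]
merge ls [] = ls
merge [] vs = vs
merge first@(l:ls) second@(v:vs)
    | l <= v = l : merge ls second
    | otherwise = v : merge first vs

-- Quicksort on lists: use the middle element as the pivot and split the
-- remaining elements into those smaller than it and the rest.
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort [x] = [x]
quicksort l = quicksort less ++ pivot:(quicksort greater)
    where pivotIndex = div (length l) 2
          pivot = l !! pivotIndex
          [less, greater] = foldl addElem [[], []] $ enumerate l
          -- Skip the pivot itself (by position); route every other element.
          addElem [less, greater] (index, elem)
            | index == pivotIndex = [less, greater]
            | elem < pivot = [elem:less, greater]
            | otherwise = [less, elem:greater]

-- Pair each element with its index.
enumerate :: [a] -> [(Int, a)]
enumerate = zip [0..]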

Edits 2 and 3: I was asked to provide timings for my implementations versus the sort in Data.List. Following @Will Ness's suggestions, I compiled this gist with the -O2 flag, swapped which sort main calls on each run, and executed the program with +RTS -s. The input was a cheaply created pseudorandom [Int] list with 2^20 elements. The results were as follows:

  • Data.List.sort: 0.171s
  • mergesort: 1.092s (~6x slower than Data.List.sort)
  • quicksort: 1.152s (~7x slower than Data.List.sort)
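For anyone reproducing numbers like these, here is a minimal in-process timing sketch. The gist itself is not reproduced here, so lcgList (a linear congruential generator) is a hypothetical stand-in for the cheaply created input, and timeSort is likewise a hypothetical helper. The sorted result is forced with a strict sum, since lazy evaluation would otherwise defer the sorting work. This measures CPU time with getCPUTime (which reports picoseconds) rather than relying on +RTS -s.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl', sort)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Hypothetical input generator: a simple linear congruential generator
-- producing n pseudorandom non-negative Ints from a seed.
lcgList :: Int -> Int -> [Int]
lcgList n seed = take n (tail (iterate step seed))
  where step x = (1103515245 * x + 12345) `mod` 2147483648

-- Hypothetical helper: time one sorting function on one input, forcing the
-- entire result with a strict sum so the work cannot stay unevaluated.
timeSort :: String -> ([Int] -> [Int]) -> [Int] -> IO ()
timeSort name f xs = do
  start <- getCPUTime
  s <- evaluate (foldl' (+) 0 (f xs))
  end <- getCPUTime
  let secs = fromIntegral (end - start) / 1e12 :: Double
  printf "%s: %.3fs (checksum %d)\n" name secs s

main :: IO ()
main = do
  let xs = lcgList (2 ^ (20 :: Int)) 42
  _ <- evaluate (foldl' (+) 0 xs)  -- build the input before timing
  timeSort "Data.List.sort" sort xs
```

To compare the implementations from the question, paste their definitions into the same module and add lines such as timeSort "mergesort" mergesort xs. Compile with -O2 as in the original measurements.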

Solution

In imperative languages, Quicksort is performed in place by mutating an array. As your code sample shows, you can adapt Quicksort to a pure functional language like Haskell by building singly linked lists instead, but the result is not as fast.
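For contrast, here is a minimal sketch of an in-place quicksort in Haskell. It leaves lists behind and sorts a mutable unboxed array inside the ST monad, using Data.Array.ST from the array package that ships with GHC. The partition scheme (Lomuto, with the last element as pivot) and the names quicksortST, qsort, lomutoPartition and swap are choices made for this illustration, not part of the question or answer. Copying the list into and out of the array is extra work, and a last-element pivot degrades to O(n^2) on already-sorted input, so treat this as an illustration rather than a tuned implementation.

```haskell
import Control.Monad (when)
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (elems)

-- Copy the list into a mutable array, quicksort it in place, read it back.
quicksortST :: [Int] -> [Int]
quicksortST xs = elems (runSTUArray build)
  where
    n = length xs
    build :: ST s (STUArray s Int Int)
    build = do
      arr <- newListArray (0, n - 1) xs
      qsort arr 0 (n - 1)
      return arr

-- Sort the slice arr[lo..hi] in place.
qsort :: STUArray s Int Int -> Int -> Int -> ST s ()
qsort arr lo hi = when (lo < hi) $ do
  p <- lomutoPartition arr lo hi
  qsort arr lo (p - 1)
  qsort arr (p + 1) hi

-- Lomuto partition: take arr[hi] as the pivot, move smaller elements to the
-- front of the slice, and return the pivot's final index.
lomutoPartition :: STUArray s Int Int -> Int -> Int -> ST s Int
lomutoPartition arr lo hi = do
  pivot <- readArray arr hi
  let loop i j
        | j >= hi = return i
        | otherwise = do
            x <- readArray arr j
            if x < pivot
              then swap arr i j >> loop (i + 1) (j + 1)
              else loop i (j + 1)
  i <- loop lo lo
  swap arr i hi
  return i

-- Exchange two array elements.
swap :: STUArray s Int Int -> Int -> Int -> ST s ()
swap arr i j = do
  a <- readArray arr i
  b <- readArray arr j
  writeArray arr i b
  writeArray arr j a

main :: IO ()
main = print (quicksortST [5, 3, 8, 1, 9, 2])
```

The mutation stays local: runSTUArray freezes the array when build finishes, so quicksortST is still a pure function from the caller's point of view.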

Mergesort, on the other hand, is not an in-place algorithm: a straightforward imperative implementation copies the merged data into a separate allocation. That makes it a better fit for Haskell, which by its nature must copy the data anyway, since immutable lists cannot be rearranged in place.
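Merging reads both inputs front to back and emits a fresh list, which is exactly the access pattern singly linked lists handle well. Below is a minimal bottom-up sketch in the spirit of GHC's Data.List.sort: it breaks the input into ascending runs and merges neighbouring runs pairwise, so it never needs length or splitAt. GHC's actual implementation also detects descending runs and differs in detail; bottomUpSort, runs, mergePairs and mergeAll here are illustrative names, not the library's code.

```haskell
-- Stable merge: on ties take from the left list, preserving input order.
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xs@(x:xs') ys@(y:ys')
  | x <= y    = x : merge xs' ys
  | otherwise = y : merge xs ys'

-- Split the input into maximal non-decreasing runs, so already-sorted
-- stretches are kept together instead of being split apart.
runs :: Ord a => [a] -> [[a]]
runs [] = []
runs (x:xs) = go x [x] xs
  where
    go lastX acc (y:ys) | lastX <= y = go y (y : acc) ys
    go _ acc rest = reverse acc : runs rest

-- Merge adjacent pairs of sorted lists: [a, b, c] -> [merge a b, c].
mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (a:b:rest) = merge a b : mergePairs rest
mergePairs rest = rest

-- Keep merging pairs until a single sorted list remains.
mergeAll :: Ord a => [[a]] -> [a]
mergeAll [] = []
mergeAll [xs] = xs
mergeAll xss = mergeAll (mergePairs xss)

bottomUpSort :: Ord a => [a] -> [a]
bottomUpSort = mergeAll . runs

main :: IO ()
main = print (bottomUpSort [5, 3, 8, 1, 9, 2 :: Int])
```

On already-sorted input, runs returns a single run and no merging happens at all, which is one plausible reason a run-aware sort can beat a naive top-down mergesort by a wide margin.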

Let's step back a bit: Quicksort's performance edge is "lore", a reputation built up decades ago on machines very different from the ones we use today. Even within the same language, this kind of lore needs rechecking from time to time, because the facts on the ground change. The most recent benchmarking paper I read on this topic still put Quicksort on top, but its lead over Mergesort was slim, even in C/C++.

Mergesort has other advantages: it needs no tweaking to avoid Quicksort's O(n^2) worst case, and it is naturally stable. So if other factors erase Quicksort's narrow performance lead, Mergesort is the obvious choice.
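To make the worst case concrete, here is a small sketch of a naive list quicksort that takes the first element as its pivot and counts comparisons against pivots (qsortCount is a hypothetical helper, not from the question). On already-sorted input every partition is maximally unbalanced, so it performs (n-1) + (n-2) + ... + 0 = n(n-1)/2 comparisons. The question's code picks the middle element, which avoids this particular input, but any fixed pivot rule still has inputs that trigger quadratic behaviour.

```haskell
import Data.List (partition)

-- Hypothetical helper: naive quicksort with the head as pivot, returning the
-- sorted list together with the number of comparisons against pivots.
qsortCount :: Ord a => [a] -> ([a], Int)
qsortCount [] = ([], 0)
qsortCount (p:xs) = (ls ++ p : gs, length xs + cl + cg)
  where
    -- partition compares each remaining element with the pivot exactly once
    (smaller, larger) = partition (< p) xs
    (ls, cl) = qsortCount smaller
    (gs, cg) = qsortCount larger

main :: IO ()
main = do
  let n = 1000 :: Int
  print (snd (qsortCount [1 .. n]))  -- comparisons on already-sorted input
  print (n * (n - 1) `div` 2)        -- the closed form n(n-1)/2
```

Both lines print 499500.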
