快速功能合并排序 [英] Fast functional merge sort

查看:181
本文介绍了快速功能合并排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面是我的实现在Scala中合并排序的:

 对象FuncSort {
  DEF合并(L:流[INT],R:流[INT]):流[INT] = {
    (L,R)的比赛{
      案例(H#:: T,空)=>升
      案例(空,H#:: T)=> ř
      情况下(X#:: XS,Y#:: YS)=>如果(X< Y)x#::合并(XS,R),否则Y#::合并(L,YS)
    }
  }

  高清排序(XS:流[INT]):流[INT] = {
    如果(xs.length == 1)XS
    其他 {
      VAL M = xs.length / 2
      VAL(L,R)= xs.splitAt(米)
      合并(排序(L),排序(R))
    }
  }
}
 

它工作正常,似乎渐近这是很好很好,但它的方式从这里的 http://algs4.cs.princeton.edu/22mergesort/Merge.java.html ,并使用了大量的内存。有没有更快的实现合并排序是的功能的?显然,这是可能的端口Java版本一行行,但是这不是我要找的。

UPD:我已经改变了列表#:: :: 和排序例程变得越来越快,只有三到四倍比Java版本慢。但我不明白为什么没有与堆栈溢出崩溃? 合并不是尾递归,所有参数都严格评估......这怎么可能?

解决方案

您已经提出了多个问题。我试着回答这些问题的逻辑顺序:

的流版本没有堆栈溢出

您真的不问这个,但它导致了一些有趣的发现。

在流版本使用的是#::合并(...)合并函数内。通常,这将是一个递归调用,并可能导致对足够大的输入数据堆栈溢出。但不是在这种情况下。操作#::( A,B)类ConsWrapper [A] (有一个隐含的转换来实现),是一个同义词 cons.apply [A](高清:A,TL:⇒流[A]):缺点[A] 。正如你所看到的,第二个参数是按名称调用,这意味着它是懒洋洋地评估。

这意味着合并返回类型的新创建的对象缺点这最终将调用再次合并。换句话说:递归不会发生在栈上,但在堆上。通常你有足够的堆。

使用堆递归是一个很好的方法来处理非常深的递归。但它比使用堆栈慢得多。所以,你交易的速度递归深度。这是最主要的原因,所以使用是如此之慢。

第二个原因是,对于获得的长度,Scala有兑现的整个 。但在排序反正它会兑现每一个元素,所以这不会伤害很大。

在列表中的版本号堆栈溢出

当你改变流列表中,您确实使用的堆栈递归。现在,一个堆栈溢出可能发生。但随着排序,你平时有日志(大小)一个递归深度,基地通常对数 2 。因此,到4十亿输入项目进行排序,你需要一个约32堆栈帧。至少为320K一个默认的堆栈大小(在Windows,其他系统有较大的默认值),这个地方留下了很多递归的,因此对于大量输入数据进行排序。

更快职能部门的实施

这取决于: - )

您应该使用栈,而不是堆的递归。而且你应该决定你的策略取决于输入数据:

  1. 对于小数据块,对它们进行排序的地方用一些简单的算法。该算法的复杂性不会咬你,你可以不必在缓存中的所有数据获得了很多的性能。当然,欧仍然可以手工code 排序网络,在给定的尺寸。
  2. 如果您有数字输入数据时,您可以使用基数排序和处理工作的向量单位对你的处理器或你的GPU(更复杂的算法可以在 GPU精粹中找到
  3. 对于中等大小的数据块,您可以使用分而治之的策略,以将数据拆分到多个线程(如果你有多个内核!)
  4. 对于庞大的数据块,使用合并排序,分裂的适合的内存块吧。如果你愿意,你可以在网络上分发这些块和排序在内存中。

不要使用交换和使用您的缓存。使用可变数据结构,如果你能和排序到位。我认为,功能性和快速排序不能正常工作很好地结合在一起。为了使分选的真快,你将不得不使用状态操作(如就地归并的可变数组)。

我通常会尝试这对我所有的计划:用纯函数的风格尽可能而是使用状态操作的小零件可行时(例如,​​因为它具有更好的性能或code只是要处理大量的状态并成为更好的可读性,当我使用 VAR 的S代替 VAL S)。

Here's my implementation of merge sort in Scala:

object FuncSort {
  def merge(l: Stream[Int], r: Stream[Int]) : Stream[Int] = {
    (l, r) match {
      case (h #:: t, Empty) => l
      case (Empty, h #:: t) => r
      case (x #:: xs, y #:: ys) => if(x < y ) x #:: merge(xs, r) else y #:: merge(l, ys)
    }
  }

  def sort(xs: Stream[Int]) : Stream[Int] = {
    if(xs.length == 1) xs
    else {
      val m = xs.length / 2
      val (l, r) = xs.splitAt(m)
      merge(sort(l), sort(r))
    }
  }
}

It works correctly and it seems that asymptotically it is fine as well but it is way slower (approx 10 times) than Java implementation from here http://algs4.cs.princeton.edu/22mergesort/Merge.java.html and uses a lot of memory. Is there a faster implementation of merge sort which is functional? Obviously, it's possible to port Java version line by line but that's not what I'm looking for.

UPD: I've changed Stream to List and #:: to :: and the sorting routine became faster, only three to four times slower than Java version. But I don't understand why doesn't it crashes with stack overflow? merge isn't tail-recursive, all arguments are strictly evaluated...how is it possible?

解决方案

You have raised multiple questions. I try to answer them in a logical order:

No stack overflow in the Stream version

You did not really ask this one, but it leads to some interesting observations.

In the Stream version you are using #:: merge(...) inside the merge function. Usually this would be a recursive call and might lead to a stack overflow for big enough input data. But not in this case. The operator #::(a,b) is implemented in class ConsWrapper[A] (there is an implicit conversion) and is a synonym for cons.apply[A](hd: A, tl: ⇒ Stream[A]): Cons[A]. As you can see, the second argument is call by name, meaning it is evaluated lazily.

That means merge returns a newly created object of type cons which will eventually call merge again. In other words: The recursion does not happen on the stack, but on the heap. And usually you have plenty of heap.

Using the heap for recursion is a nice technique to handle very deep recursions. But it is much slower than using the stack. So you traded speed for recursion depth. This is the main reason, why using Stream is so slow.

The second reason is, that for getting the length of the Stream, Scala has to materialize the whole Stream. But during sorting the Stream it would have to materialize each element anyway, so this does not hurt very much.

No stack overflow in List version

When you are changing Stream for List, you are indeed using the stack for recursion. Now a Stack overflow could happen. But with sorting, you usually have a recursion depth of log(size), usually the logarithm of base 2. So to sort 4 billion input items, you would need a about 32 stack frames. With a default stack size of at least 320k (on Windows, other systems have larger defaults), this leaves place for a lot of recursions and hence for lots of input data to be sorted.

Faster functional implementation

It depends :-)

You should use the stack, and not the heap for recursion. And you should decide your strategy depending on the input data:

  1. For small data blocks, sort them in place with some straight forward algorithm. The algorithmic complexity won't bite you, and you can gain a lot of performance from having all data in cache. Of course, ou could still hand code sorting networks for the given sizes.
  2. If you have numeric input data, you can use radix sort and handle the work to the vector units on you processor or your GPU (more sophisticated algorithms can be found in GPU Gems).
  3. For medium sized data blocks, you can use a divide-and-conquer strategy to split the data to multiple threads (only if you have multiple cores!)
  4. For huge data blocks use merge sort and split it of in blocks that fit in memory. If you want, you can distribute these blocks on the network and sort in memory.

Don't use swap and use your caches. Use mutable data structures if you can and sort in place. I think that functional and fast sorting does not work very well together. To make sorting really fast, you will have to use stateful operations (e.g. in-place mergesort on mutable arrays).

I usually try this on all my programs: Use pure functional style as far as possible but use stateful operations for small parts when feasible (e.g. because it has better performance or the code just has to deal with lots of states and becomes much better readable when I use vars instead of vals).

这篇关于快速功能合并排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆