为什么F#的Seq.sortBy比LINQ的IEnumerable< OrderBy扩展方法要慢得多? [英] Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable<T>.OrderBy extension method?

查看:127
本文介绍了为什么F#的Seq.sortBy比LINQ的IEnumerable< OrderBy扩展方法要慢得多?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近编写了一段代码,以从文件中读取一些数据,将其存储在元组中,并按元组的第一个元素对所有收集的数据进行排序.经过一些测试,我注意到使用Seq.sortBy(和Array.sortBy)比使用IEnumerable.OrderBy极其慢. 以下是两个代码段,应显示我在谈论的行为:

I've recently written a piece of code to read some data from a file, store it in a tuple and sort all the collected data by the first element of the tuple. After some tests I've noticed that using Seq.sortBy (and Array.sortBy) is extremely slower than using IEnumerable.OrderBy. Below are two snippets of code which should show the behaviour I'm talking about:


(filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) 
                                |> Array.map(double) 
                                |> Array.sort in arr.[0], arr.[1])
).OrderBy(new Func(fun (a,b) -> a))


filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) |> Array.map(double) |> Array.sort in arr.[0], arr.[1])
|> Seq.sortBy(fun (a,_) -> a)

在包含由两个双打构成的100000行的文件中,在我的计算机上,后一个版本的占用时间是第一个双倍的两倍(如果使用Array.sortBy,则无法获得任何改进). 想法?

On a file containing 100000 lines made of two doubles, on my computer the latter version takes over twice as long as the first one (no improvements are obtained if using Array.sortBy). Ideas?

推荐答案

f#实现使用结果键的结构比较.

the f# implementation uses a structural comparison of the resulting key.

let sortBy keyf seq =
    let comparer = ComparisonIdentity.Structural
    mkDelayedSeq (fun () -> 
        (seq 
        |> to_list 
        |> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y)) 
        |> to_array) :> seq<_>)

(也可以排序)

let sort seq =
    mkDelayedSeq (fun () -> 
        (seq 
        |> to_list 
        |> List.sortWith Operators.compare 
        |> to_array) :> seq<_>)

Operators.compare和CompareIdentity.Structural.Compare都(最终)成为

both Operators.compare and the ComparisonIdentity.Structural.Compare become (eventually)

let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int = 
    GenericComparisonIntrinsic x y
        // lots of other types elided
        when 'T : float = if (# "clt" x y : bool #) 
                          then (-1) 
                          else (# "cgt" x y : int #)

,但是操作员的路由完全是内联的,因此JIT编译器最终将插入直接的双重比较指令,而没有(无论如何,在两种情况下都需要)委托调用,而没有额外的方法调用开销.

but the route to this for the Operator is entirely inline, thus the JIT compiler will end up inserting a direct double comparison instruction with no additional method invocation overhead except for the (required in both cases anyway) delegate invocation.

sortBy使用比较器,因此将进行附加的虚拟方法调用,但基本上是相同的.

The sortBy uses a comparer so will go through an additional virtual method call but is basically about the same.

相比之下,OrderBy函数还必须经过虚拟方法调用以确保相等性(使用EqualityComparer<T>.Default),但是最大的区别在于它排序到位并使用为此创建的缓冲区.相比之下,如果您查看sortBy,您会发现它对列表进行了排序(不在适当的位置,它使用了看起来像是合并排序的StableSortImplementation),然后创建了它的副本作为新数组.尽管根据不同的排序实现 也可能会产生影响,但这种额外的副本(根据输入数据的大小)可能是导致速度变慢的主要原因.

In comparison the OrderBy function also must go through virtual method calls for the equality (Using EqualityComparer<T>.Default) but the significant difference is that it sorts in place and uses the buffer created for this as the result. In comparison if you take a look at the sortBy you will see that it sorts the list (not in place, it uses the StableSortImplementation which appears to be merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principle cause of the slow down though the differing sort implementations may also have an effect.

这就是所有猜测.如果从性能方面来说,这是您所关心的问题,那么您只需个人资料以找出花费时间的时间.

That said this is all guessing. If this area is a concern for you in performance terms then you should simply profile to find out what is taking the time.

如果您希望查看排序/复制更改的效果,请尝试以下替代方法:

If you wish to see what effect the sorting/copying change would have try this alternate:

// these are taken from the f# source so as to be consistent
// beware doing this, the compiler may know about such methods
open System.Collections.Generic
let mkSeq f = 
    { new IEnumerable<'b> with 
        member x.GetEnumerator() = f()
      interface System.Collections.IEnumerable with 
        member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }

let mkDelayedSeq (f: unit -> IEnumerable<'T>) = 
    mkSeq (fun () -> f().GetEnumerator())

// the function
let sortByFaster keyf seq =
    let comparer = ComparisonIdentity.Structural
    mkDelayedSeq (fun () -> 
        let buffer = Seq.to_array seq
        Array.sortInPlaceBy (fun x y -> comparer.Compare(keyf x,keyf y)) buffer
        buffer :> seq<_>)

在具有很大(>百万)输入序列的repl中,我得到了一些合理的加速百分比,但是却没有一个数量级.与往常一样,您的里程可能会有所不同.

I get some reasonable percentage speedups within the repl with very large (> million) input sequences but nothing like an order of magnitude. Your mileage, as always, may vary.

这篇关于为什么F#的Seq.sortBy比LINQ的IEnumerable&lt; OrderBy扩展方法要慢得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆