为什么我的Trie查找比标准F#Map的慢? [英] Why is my Trie lookup slower than that of the standard F# Map's?

查看:124
本文介绍了为什么我的Trie查找比标准F#Map的慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以,我刚从OCaml移植了Trie。不幸的是,它比tryFind的标准Map运行速度慢。我不明白这个 - 这个特技似乎应该更快。 F#的代码库是以一些特殊的方式构建的,使它们比用户通常部署的代码更快吗?

So, I just ported the Trie from OCaml. Unfortunately, it runs slower than the standard Map in terms of tryFind. I don't understand this - the trie seems like it should be faster. Is F#'s code libraries built in some special way as to make them faster than the code that the user typically deploys?

这里是代码 -

[<RequireQualifiedAccess>]
module Trie

type Node<'k, 'v when 'k : comparison> =
    { TrieMap : Map<'k, Node<'k, 'v>>
      TrieKvp : ('k list * 'v) option }
    member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty

let inline make map kvp =
    { TrieMap = map
      TrieKvp = kvp }

let inline makeEmpty () : Node<'k, 'v> = make Map.empty None

let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty

let rec tryFind (key : 'k list) node =
    if key.IsEmpty then
        match node.TrieKvp with
        | Some (_, value) -> Some value
        | None -> None
    else
        let keyHead = key.Head
        let keyTail = key.Tail
        let optSubNode = Map.tryFind keyHead node.TrieMap
        match optSubNode with
        | Some subNode -> tryFind keyTail subNode
        | None -> None

let inline containsKey key node =
    (tryFind key node).IsSome

let rec addInternal (key : 'k list) value node =
    if key.IsEmpty then make node.TrieMap (Some (key, value))
    else
        let keyHead = key.Head
        let keyTail = key.Tail
        let newTrie =
            match Map.tryFind keyHead node.TrieMap with
            | Some subTrie -> subTrie
            | None -> makeEmpty ()
        let newTrie2 = addInternal keyTail value newTrie
        make (Map.add keyHead newTrie2 node.TrieMap) node.TrieKvp

let inline add key value node =
    addInternal key value node

let rec addMany kvps node =
    if Seq.isEmpty kvps then node
    else
        let kvpHead = Seq.head kvps
        let kvpTail = Seq.skip 1 kvps
        let newTrie = add (fst kvpHead) (snd kvpHead) node
        addMany kvpTail newTrie

let inline ofList kvps =
    addMany kvps (makeEmpty ())

let inline ofListBy by kvps =
    let pairs = List.map by kvps
    ofList pairs

let rec foldInternal folder rev node state =
    match node.TrieKvp with
    | Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
    | None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap

let inline fold folder state node =
    foldInternal folder [] node state

let rec map (mapper : 'k list -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> =
    match node.TrieKvp with
    | Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
    | None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None

let inline toValueList node =
    fold (fun state _ value -> value :: state) [] node

let inline singleton (key, value) =
    add key value (makeEmpty ())

这是Jon Harrop提供的性能测试,我觉得足以衡量改进 -

let xs = Array.init 1000000 (fun i -> [i])
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Trie.makeEmpty()
for i=0 to xs.Length-1 do
    t <- Trie.add xs.[i] xs.[i] t
printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
    for i=0 to xs.Length-1 do
        ignore(Trie.tryFind xs.[i])
printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds

let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Map.empty
for i=0 to xs.Length-1 do
    t <- Map.add xs.[i] xs.[i] t
printfn "Map took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
    for i=0 to xs.Length-1 do
        ignore(Map.tryFind xs.[i])
printfn "Map took %fs to search" timer.Elapsed.TotalSeconds



<注意:如果考虑到更快的查找数据结构,请注意我需要持久的数据结构。

推荐答案


不幸的是,它比tryFind的标准地图运行速度慢。我不明白这一点 - 这个特技似乎应该更快。

Unfortunately, it runs slower than the standard Map in terms of tryFind. I don't understand this - the trie seems like it should be faster.

这里的一个快速基准表明你的特技已经更快了比地图至少简单的情况:

A quick benchmark here suggests that your trie is already faster than Map for at least simple case:

do
    let n = 0
    let xs = Array.init 1000000 (fun i -> [i])
    let timer = System.Diagnostics.Stopwatch.StartNew()
    let mutable t = Trie.makeEmpty()
    for i=0 to xs.Length-1 do
        t <- Trie.add xs.[i] xs.[i] t
    printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds
    timer.Restart()
    for _ in 1..100 do
        for i=0 to xs.Length-1 do
            ignore(Trie.tryFind xs.[i])
    printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds

    let timer = System.Diagnostics.Stopwatch.StartNew()
    let mutable t = Map.empty
    for i=0 to xs.Length-1 do
        t <- Map.add xs.[i] xs.[i] t
    printfn "Map took %fs to build" timer.Elapsed.TotalSeconds
    timer.Restart()
    for _ in 1..100 do
        for i=0 to xs.Length-1 do
            ignore(Map.tryFind xs.[i])
    printfn "Map took %fs to search" timer.Elapsed.TotalSeconds

我得到4s来建立你的Trie,8.7s建立一个地图和大约 0.7 来搜索在这两种情况下。

I get 4s to build your Trie, 8.7s to build a Map and about 0.7 to search in both cases.

但是,实现方面还有很大的改进空间。我最近写了一篇关于F#中优化的通用持久性哈希特技实现的文章,发布了这里

However, there is a lot of room for improvement in your implementation. I recently wrote an article about an optimized generic persistent hash trie implementation in F# that was published here.

您稍后的评论意味着您只想使用它来映射字符串。如果是这样,专门为您提供字符串键的线索会更有效。

Your later comments imply that you only want to use this to map over strings. If so, it would be vastly more efficient to specialize your trie for string keys.

编辑

KVB建议我详细说明改进的余地,所以这里有一些反馈意见:

KVB suggested that I elaborate on the "room for improvement" so here's some feedback:


  • 使用 inline 作为优化,只有在引人注目的绩效测量的基础上。

  • 使 a值而不是一个函数。

  • 避免 List.head List.tail 尽可能的使用模式匹配。

  • 尽可能避免一般性(例如在这种情况下用于字符串键的硬编码)。

  • Use inline sparingly as an optimization and only on the basis of compelling performance measurements.
  • Make empty a value rather than a function.
  • Avoid List.head and List.tail whenever possible. Use pattern matching instead.
  • Avoid genericity when possible (e.g. hard-code for string keys in this case).

这篇关于为什么我的Trie查找比标准F#Map的慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆