Fastest way to merge lists that have a common field?


Problem description

I am learning F# and I'm doing an odds comparison service (ala www.bestbetting.com) to put theory into practice. So far I have the following structures of data:

type price = {
    Bookie : string;
    Odds : float32;
    }

type selection = {
    Prices : list<price>;
    Name : string;
    }

type event = {
    Name : string;
    Hour : DateTime;
    Sport : string;
    Selections : list<selection>;
    }

So, I have several of these "Events" coming from several sources. And I would need a really fast way of merging events with the same Name and Hour, and once that is done merge the prices of its different selections that have the same Name.

I've thought about getting the first list and then doing a one-by-one search on the other lists, returning a new list containing both lists merged whenever the specified field matches.

I'd like to know if there's a faster way of doing this, as performance would be important. I have already seen Merge multiple lists of data together by common ID in F# ... and although that was helpful, I am asking for the best solution performance-wise. Maybe using some structure other than a list, or another way of merging them... so any advice would be greatly appreciated.

Thanks!

Recommended answer

As Daniel mentioned in the comment, the key question is how much better the performance needs to be compared to a solution based on the standard Seq.groupBy function. If you have a lot of data to process, then it may actually be easier to use a database for this purpose.
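For reference, a sequential baseline along the lines the answer describes might look like this (a sketch only; mergeEventsSeq is a name of my choosing, and the type definitions are repeated from the question so the snippet is self-contained):

```fsharp
open System

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

// Sequential merge: group events by (Name, Hour), then within each group
// merge the prices of selections that share the same Name.
let mergeEventsSeq (events : seq<event>) =
  events
  |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
  |> Seq.map (fun ((name, hour), evts) ->
      let selections =
        evts
        |> Seq.collect (fun evt -> evt.Selections)
        |> Seq.groupBy (fun sel -> sel.Name)
        |> Seq.map (fun (selName, sels) ->
            { Name = selName
              Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq })
        |> List.ofSeq
      // Sport is taken from the first event in the group, since the
      // grouping key is only (Name, Hour)
      { Name = name
        Hour = hour
        Sport = (Seq.head evts).Sport
        Selections = selections })
  |> List.ofSeq
```

This is the version the parallel rewrite below should be benchmarked against.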

If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), then you can try replacing Seq.groupBy with a parallel version based on Parallel LINQ, which is available in the F# PowerPack. Using PSeq.groupBy (and other PSeq functions), you can write something like this:

#r "FSharp.PowerPack.Parallel.Seq.dll"
open Microsoft.FSharp.Collections

// Takes a collection of events and merges prices of events with the same name/hour
let mergeEvents (events:seq<event>) = 
  events 
  |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
  |> PSeq.map (fun ((name, hour), events) ->
      // Merge prices of all events in the group with the same Selections.Name
      let selections = 
        events 
        |> PSeq.collect (fun evt -> evt.Selections)
        |> PSeq.groupBy (fun sel -> sel.Name)
        |> PSeq.map (fun (name, sels) ->
            { Name = name
              Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq } )
        |> PSeq.toList
      // Build new Event as the result - since we're grouping just using 
      // name & hour, I'm using the first available 'Sport' value 
      // (which may not make sense)
      { Name = name
        Hour = hour
        Sport = (Seq.head events).Sport
        Selections = selections })   
  |> PSeq.toList

I didn't test the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly - you can just copy the source for the few relevant functions from the PowerPack source code. Last time I checked, performance was better when the functions were marked as inline, which is not the case in the current source code, so you may want to check that too.
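If grouping-based approaches still aren't fast enough, another option worth measuring is a single-pass imperative merge with Dictionary lookups. This is my own sketch, not part of the original answer (mergeEventsDict is a hypothetical name, and the types are repeated from the question); it trades the functional style for one pass over the input:

```fsharp
open System
open System.Collections.Generic

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

// Single pass: events are bucketed by (Name, Hour) and, within each bucket,
// prices are accumulated per selection name.
let mergeEventsDict (events : seq<event>) =
  let buckets = Dictionary<string * DateTime, string * Dictionary<string, price list>>()
  for evt in events do
    let key = evt.Name, evt.Hour
    let _, sels =
      match buckets.TryGetValue key with
      | true, bucket -> bucket
      | false, _ ->
          // First event seen for this key supplies the Sport value
          let bucket = evt.Sport, Dictionary<string, price list>()
          buckets.[key] <- bucket
          bucket
    for sel in evt.Selections do
      match sels.TryGetValue sel.Name with
      | true, prices -> sels.[sel.Name] <- prices @ sel.Prices
      | false, _ -> sels.[sel.Name] <- sel.Prices
  [ for KeyValue((name, hour), (sport, sels)) in buckets ->
      { Name = name
        Hour = hour
        Sport = sport
        Selections = [ for KeyValue(selName, prices) in sels ->
                         { Name = selName; Prices = prices } ] } ]
```

Whether this beats the PSeq version depends on data volume and core count, so benchmark both on realistic inputs before committing to either.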
