Fastest way to merge lists that have a common field?
Question
I am learning F# and I'm building an odds comparison service (à la www.bestbetting.com) to put theory into practice. So far I have the following data structures:
```fsharp
open System   // needed for DateTime

type price = { Bookie : string; Odds : float32 }

type selection =
    { Prices : list<price>
      Name : string }

type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }
```
So, I have several of these "Events" coming from several sources, and I need a really fast way of merging events with the same Name and Hour and, once that is done, merging the prices of their selections that have the same Name.
I've thought about taking the first list and doing a one-by-one search through the other lists; whenever the specified fields match, I would return a new list containing both lists merged.
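For reference, a rough sketch of that one-by-one approach for two sources might look like the following. It assumes each list holds at most one event per Name/Hour pair, and the helper names `mergeSelections` and `mergeNaive` are hypothetical. Every lookup scans the other list, so the cost grows with the product of the two list lengths.

```fsharp
open System

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

// Hypothetical helper: selections with the same Name get their price lists
// concatenated; selections that exist only in 'ys' are appended.
let mergeSelections (xs: selection list) (ys: selection list) : selection list =
    let merged =
        xs |> List.map (fun x ->
            match ys |> List.tryFind (fun y -> y.Name = x.Name) with   // linear scan
            | Some y -> { x with Prices = x.Prices @ y.Prices }
            | None -> x)
    let onlyInYs =
        ys |> List.filter (fun y -> not (xs |> List.exists (fun x -> x.Name = y.Name)))
    merged @ onlyInYs

// Hypothetical one-by-one merge: for every event in 'first', scan 'second'
// for an event with the same Name and Hour (cost ~ |first| * |second|).
let mergeNaive (first: event list) (second: event list) : event list =
    let merged =
        first |> List.map (fun e ->
            match second |> List.tryFind (fun o -> o.Name = e.Name && o.Hour = e.Hour) with
            | Some o -> { e with Selections = mergeSelections e.Selections o.Selections }
            | None -> e)
    let unmatched =
        second |> List.filter (fun o ->
            not (first |> List.exists (fun e -> e.Name = o.Name && e.Hour = o.Hour)))
    merged @ unmatched
```

Merging more than two sources this way means folding `mergeNaive` over the lists, which repeats the scans for every additional source.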
I'd like to know if there's a faster way of doing this, as performance is important. I have already seen the question "Merge multiple lists of data together by common ID in F#", and although it was helpful, I am looking for the best solution performance-wise. Maybe that means using a structure other than a list, or a different way of merging them, so any advice would be greatly appreciated.
Thanks!
Recommended answer
As Daniel mentioned in the comment, the key question is how much faster this needs to be compared to a solution based on the standard `Seq.groupBy` function. If you have a lot of data to process, it may actually be easier to use a database for this purpose.
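For comparison, a sequential baseline built on the standard `Seq.groupBy` might look like the sketch below. This is an assumption about what such a baseline looks like: it mirrors the `PSeq` version further down with `Seq` in place of `PSeq`, and repeats the question's types so it compiles on its own. Because `Seq.groupBy` collects its groups by hashing the keys, the merge is roughly linear in the number of events and selections rather than quadratic.

```fsharp
open System

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

// Sequential merge: group events by (Name, Hour), then within each group
// group the selections by Name and concatenate their prices.
let mergeEvents (events: seq<event>) : event list =
    events
    |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> Seq.map (fun ((name, hour), group) ->
        let selections =
            group
            |> Seq.collect (fun evt -> evt.Selections)
            |> Seq.groupBy (fun sel -> sel.Name)
            |> Seq.map (fun (selName, sels) ->
                { Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq
                  Name = selName })
            |> List.ofSeq
        { Name = name
          Hour = hour
          Sport = (Seq.head group).Sport   // first Sport value in the group
          Selections = selections })
    |> List.ofSeq
```

Timing this against the parallel version on realistic data is the simplest way to answer Daniel's question.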
If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), you can try replacing `Seq.groupBy` with the parallel version based on Parallel LINQ that is available in the F# PowerPack. Using `PSeq.groupBy` (and other `PSeq` functions), you can write something like this:
```fsharp
#r "FSharp.PowerPack.Parallel.Seq.dll"
open Microsoft.FSharp.Collections

// Takes a collection of events and merges prices of events with the same name/hour
let mergeEvents (events:seq<event>) =
    events
    |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> PSeq.map (fun ((name, hour), group) ->
        // Merge prices of all events in the group with the same Selections.Name
        let selections =
            group
            |> PSeq.collect (fun evt -> evt.Selections)
            |> PSeq.groupBy (fun sel -> sel.Name)
            |> PSeq.map (fun (selName, sels) ->
                // 'Prices' first so the record type is inferred as 'selection'
                { Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq
                  Name = selName })
            |> PSeq.toList
        // Build new Event as the result - since we're grouping just using
        // name & hour, I'm using the first available 'Sport' value
        // (which may not make sense)
        { Name = name
          Hour = hour
          Sport = (Seq.head group).Sport
          Selections = selections })
    |> PSeq.toList
```
I didn't test the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly: you can just copy the source of the few relevant functions from the PowerPack source code. Last time I checked, the performance was better when the functions were marked as `inline`, which is not the case in the current source code, so you may want to check that too.
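As an illustration of copying out only what you need, the sketch below defines a single `inline` grouping function directly on top of PLINQ (`System.Linq.ParallelEnumerable`). It is not the PowerPack source: the module name `PSeqLite` is hypothetical, it covers only `groupBy`, and PLINQ does not guarantee the order of the resulting groups.

```fsharp
open System
open System.Linq

// Hypothetical stand-in for a single PSeq function, written against PLINQ
// and marked 'inline'. Grouping runs in parallel; the groups are then
// exposed as an ordinary seq of (key, items) pairs.
module PSeqLite =
    let inline groupBy (projection: 'T -> 'Key) (source: seq<'T>) : seq<'Key * seq<'T>> =
        ParallelEnumerable.GroupBy(ParallelEnumerable.AsParallel(source), Func<'T, 'Key>(projection))
        |> Seq.map (fun g -> g.Key, (g :> seq<'T>))
```

With this in scope, the outer `PSeq.groupBy` in the answer's code could be swapped for `PSeqLite.groupBy`; the remaining `PSeq` calls would need similar stand-ins (or their `Seq` equivalents). Whether `inline` actually helps is worth measuring rather than assuming.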