创建3个IEnumerable的并集时,实现O(n)性能的最简单方法是什么? [英] What is the simplest way to achieve O(n) performance when creating the union of 3 IEnumerables?

查看:36
本文介绍了创建3个IEnumerable的并集时,实现O(n)性能的最简单方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说a,b,c都是List<t>,我想创建它们的未分类联合.尽管性能并不是至关重要的,但它们可能每个都有10,000个条目,因此我渴望避免使用O(n ^ 2)解决方案.

Say a, b, c are all List<t> and I want to create an unsorted union of them. Although performance isn't super-critical, they might have 10,000 entries in each so I'm keen to avoid O(n^2) solutions.

就不同类型而言,AFAICT的MSDN文档没有说明联合的性能特征.

AFAICT the MSDN documentation doesn't say anything about the performance characteristics of union as far as the different types are concerned.

我的直觉说,如果我只是做a.Union(b).Union(c),这将花费O(n ^ 2)时间,但是new Hashset<t>(a).Union(b).Union(c)将是O(n).

My gut instinct says that if I just do a.Union(b).Union(c), this will take O(n^2) time, but new Hashset<t>(a).Union(b).Union(c) would be O(n).

有人有任何文件或度量标准来确认或否认这一假设吗?

Does anyone have any documentation or metrics to confirm or deny this assumption?

推荐答案

您应该使用Enumerable.Union,因为它与HashSet方法一样有效.复杂度为O(n + m),因为:

You should use Enumerable.Union because it is as efficient as the HashSet approach. Complexity is O(n+m) because:

Enumerable.Union

枚举此方法返回的对象时,Union<TSource> e 按该顺序依次编号第一和第二,并得出每个 尚未产生.

When the object returned by this method is enumerated, Union<TSource> enumerates first and second in that order and yields each element that has not already been yielded.

源代码此处.

Ivan 是正确的,如果您将Enumerable.Union与多个集合一起使用,则会产生开销,因为必须使用新集合为每个链接的呼叫创建.因此,如果使用以下方法之一,则可能会更高效(就内存消耗而言):

Ivan is right, there is an overhead if you use Enumerable.Union with multiple collections since a new set must be created for every chained call. So it might be more efficient(in terms of memory consumption) if you use one of these approaches:

  1. Concat + Distinct:

a.Concat(b).Concat(c)...Concat(x).Distinct()

  • Union + Concat

  • Union + Concat

    a.Union(b.Concat(c)...Concat(x))
    

  • HashSet<T>构造函数需要IEnumerable<T>(带有int的fe):

  • HashSet<T> constructor that takes IEnumerable<T>(f.e. with int):

    new HashSet<int>(a.Concat(b).Concat(c)...Concat(x))
    

  • 前两者之间的差异可以忽略不计.第三种方法不使用延迟执行,而是在内存中创建HashSet<>.这是一种好而有效的方法:1.如果需要此集合类型,或者2.如果这是查询的最终操作.但是,如果您需要对该链式查询进行进一步的操作,则应该选择Concat + DistinctUnion + Concat.

    The difference between the first two might be negligible. The third approach is not using deferred execution, it creates a HashSet<> in memory. It's a good and efficient way 1. if you need this collection type or 2. if this is the final operation on the query. But if you need to to further operations on this chained query you should prefer either Concat + Distinct or Union + Concat.

    这篇关于创建3个IEnumerable的并集时,实现O(n)性能的最简单方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆