Efficient algorithm to find additions and removals from 2 collections


Question

Hi, I would like to implement an efficient algorithm to handle the following case:

Let's assume we have two lists with the following elements:

Source: [a,b,c,d,e] New: [d,e,f,g]

Now I have to update Source with the new information. The algorithm should be able to find that 'f' and 'g' are new entries, that 'a', 'b' and 'c' have been removed, and that 'd' and 'e' have not been modified.

The operations involved are set operations (intersection and difference) between Source and New, and vice versa. I am looking for an efficient algorithm to implement in C# for arbitrary unsorted enumerations.

Thanks in advance,

Solution

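```csharp
// Requires using System.Linq; Source and New are IEnumerable<T>.
var added = New.Except(Source);           // in New but not in Source
var removed = Source.Except(New);         // in Source but not in New
var notModified = Source.Intersect(New);  // in both
```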
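As a quick check of the three LINQ calls, here is a minimal sketch that runs them on the question's sample data. It assumes a C# 9+ top-level program. The variable newItems stands in for New only because lower-case "new" is a C# keyword. The expected-output comments reflect how LINQ enumerates the first sequence in current .NET implementations.

```csharp
using System;
using System.Linq;

// The question's sample data.
var source = new[] { "a", "b", "c", "d", "e" };
var newItems = new[] { "d", "e", "f", "g" };

// Except and Intersect are set operations, so their results contain no duplicates.
var added = newItems.Except(source).ToList();
var removed = source.Except(newItems).ToList();
var notModified = source.Intersect(newItems).ToList();

Console.WriteLine("added: " + string.Join(",", added));             // f,g
Console.WriteLine("removed: " + string.Join(",", removed));         // a,b,c
Console.WriteLine("notModified: " + string.Join(",", notModified)); // d,e
```

Note that each call enumerates its inputs again. If Source or New is expensive to enumerate (for example, a database query), materialise them first.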

If you want an approach where you "show your workings", I'd suggest putting each collection into a HashSet<T>, as that gives a fast Contains check (O(1) on average) compared with other enumerations.
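A minimal hand-written version of that idea might look like the sketch below. It assumes duplicate items can be ignored, as they are by Except and Intersect. Diff is a hypothetical helper name and does not come from the answer. Because the sketch enumerates HashSet<T> instances, the order of items in each result is unspecified.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: diffs two unsorted sequences by hand.
// Building each HashSet<T> is O(n) on average and each Contains is O(1) on
// average, so the whole diff is roughly linear in the combined input size.
static (List<T> Added, List<T> Removed, List<T> NotModified) Diff<T>(
    IEnumerable<T> source, IEnumerable<T> newItems)
{
    var sourceSet = new HashSet<T>(source);
    var newSet = new HashSet<T>(newItems);

    var added = new List<T>();
    var notModified = new List<T>();
    foreach (T item in newSet)
    {
        if (sourceSet.Contains(item))
            notModified.Add(item);   // present in both
        else
            added.Add(item);         // only in newItems
    }

    var removed = new List<T>();
    foreach (T item in sourceSet)
    {
        if (!newSet.Contains(item))
            removed.Add(item);       // only in source
    }

    return (added, removed, notModified);
}

var result = Diff(new[] { "a", "b", "c", "d", "e" }, new[] { "d", "e", "f", "g" });
Console.WriteLine("added: " + string.Join(",", result.Added));
Console.WriteLine("removed: " + string.Join(",", result.Removed));
Console.WriteLine("notModified: " + string.Join(",", result.NotModified));
```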

Edit:

Okay, if we're going for total speed at the cost of efficiency of expression, then with the following assumptions:

  1. We have a reasonably hashable type of item (if not, but the items can be totally ordered, then a SortedList might beat a hash set).
  2. We cannot predict whether Source or New will be larger (in the example, there's a slight advantage to doing this the other way around from how I have it, but I'm assuming that is just chance in the data and that we must expect either case with equal likelihood).

Then I would suggest:

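```csharp
// Source and New are IEnumerable<T>; requires using System.Collections.Generic.
HashSet<T> removed = Source as HashSet<T> ?? new HashSet<T>(Source);
LinkedList<T> added = new LinkedList<T>();
LinkedList<T> notModified = new LinkedList<T>();
foreach (T item in New)
{
    // Remove returns true if item was in Source, i.e. it is in both collections.
    if (removed.Remove(item))
        notModified.AddLast(item);
    else
        added.AddLast(item);
}
// Whatever is still in removed was in Source but not in New.
```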

When setting up removed, I test whether Source is already a HashSet<T> to avoid wastefully building a new one (I assume the input is typed as IEnumerable<T>). Of course, this is destructive, because it removes items from the caller's set, so we may wish to avoid it anyway.
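If the caller's set must not be mutated, one option is to always copy Source. The sketch below wraps the single-pass loop in a hypothetical generic method, Diff, that does this. It assumes New contains no duplicates: a repeated item would first be counted as not modified and then, on its second occurrence, as added.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper around the single-pass approach. It always copies
// `source` into a fresh HashSet<T>, so a caller's HashSet is never mutated.
static (LinkedList<T> Added, HashSet<T> Removed, LinkedList<T> NotModified) Diff<T>(
    IEnumerable<T> source, IEnumerable<T> newItems)
{
    var removed = new HashSet<T>(source);   // starts as "everything in source"
    var added = new LinkedList<T>();
    var notModified = new LinkedList<T>();
    foreach (T item in newItems)
    {
        // Remove returns true if the item was present, i.e. it is in both inputs.
        if (removed.Remove(item))
            notModified.AddLast(item);
        else
            added.AddLast(item);
    }
    // Whatever is left in `removed` was in source but not in newItems.
    return (added, removed, notModified);
}

var original = new HashSet<string> { "a", "b", "c", "d", "e" };
var diff = Diff(original, new[] { "d", "e", "f", "g" });
Console.WriteLine("added: " + string.Join(",", diff.Added));             // f,g
Console.WriteLine("removed: " + string.Join(",", diff.Removed));
Console.WriteLine("notModified: " + string.Join(",", diff.NotModified)); // d,e
Console.WriteLine("original count: " + original.Count);                  // 5
```

The trade-off is one extra O(n) copy of Source, even when it is already a HashSet<T>.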

Note also that I modify the hash set while enumerating. This is allowed by HashSet<T>, but it falls outside the guarantees given by the enumerators, so it is implementation-dependent. Still, with the current framework implementation, it's more efficient to do so than to test membership and add to a separate removed collection.

I went for linked lists for the other two collections, as they tend to come out well in terms of insertion cost (not just O(1), but a fast O(1) compared with using another set).

Now, if you want to go further still, there are probably micro-optimisations available if you roll your own hash set implementation.
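Assumption 1 above notes that when items cannot be hashed well but can be totally ordered, a sorted approach might win. The answer doesn't show that variant. The sketch below uses a swapped-in sort-and-merge technique rather than the SortedList the answer mentions. It assumes a caller-supplied IComparer<T> and no duplicates within either input. SortedDiff is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: sorting costs O(n log n), then one O(n) merge pass
// walks both sorted lists in step and classifies every item.
static (List<T> Added, List<T> Removed, List<T> NotModified) SortedDiff<T>(
    IEnumerable<T> source, IEnumerable<T> newItems, IComparer<T> comparer)
{
    var s = new List<T>(source);
    var n = new List<T>(newItems);
    s.Sort(comparer);
    n.Sort(comparer);

    var added = new List<T>();
    var removed = new List<T>();
    var notModified = new List<T>();
    int i = 0, j = 0;
    while (i < s.Count && j < n.Count)
    {
        int cmp = comparer.Compare(s[i], n[j]);
        if (cmp < 0) removed.Add(s[i++]);          // only in source
        else if (cmp > 0) added.Add(n[j++]);       // only in newItems
        else { notModified.Add(s[i]); i++; j++; }  // in both
    }
    // Anything left on either side has no counterpart on the other.
    while (i < s.Count) removed.Add(s[i++]);
    while (j < n.Count) added.Add(n[j++]);
    return (added, removed, notModified);
}

var result = SortedDiff(
    new[] { "e", "a", "d", "c", "b" },
    new[] { "g", "d", "f", "e" },
    StringComparer.Ordinal);
Console.WriteLine("added: " + string.Join(",", result.Added));             // f,g
Console.WriteLine("removed: " + string.Join(",", result.Removed));         // a,b,c
Console.WriteLine("notModified: " + string.Join(",", result.NotModified)); // d,e
```

Unlike the hash-based versions, each result comes out in sorted order. That can be useful if the caller needs a deterministic order anyway.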
