Fast way to check if IEnumerable<T> contains no duplicates (= is distinct)


Question


Is there a fast built-in way to check if an IEnumerable<string> contains only distinct strings?

In the beginning I started with:

    // items is the IEnumerable<string> being checked
    var enumAsArray = items.ToArray();
    if (enumAsArray.Length != enumAsArray.Distinct().Count())
        throw ...

However, this looks like it is O(2n). Is it? Or might ToArray() be O(1)?

This looks faster:

    var set = new HashSet<string>();
    foreach (var str in items)
    {
        if (!set.Add(str))
            throw ...
    }

This should be O(n), however, is there a built-in way too?

Edit: Maybe Distinct() uses this internally?


Solution: After considering all the comments and the answer, I wrote an extension method for my second solution, as this seems to be the fastest version and the most readable too:

    public static bool ContainsDuplicates<T>(this IEnumerable<T> e)
    {
        var set = new HashSet<T>();
        // ReSharper disable LoopCanBeConvertedToQuery
        foreach (var item in e)
        // ReSharper restore LoopCanBeConvertedToQuery
        {
            if (!set.Add(item))
                return true;
        }
        return false;
    }
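
A minimal usage sketch for the extension method above. The method body is repeated (without the ReSharper comments) only so that the example compiles on its own. The class names, the sample arrays and the comparer overload are assumptions added for illustration and are not part of the original post.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Same logic as the extension method above: stop at the first repeated item.
    public static bool ContainsDuplicates<T>(this IEnumerable<T> e)
    {
        var set = new HashSet<T>();
        foreach (var item in e)
        {
            if (!set.Add(item))
                return true;
        }
        return false;
    }

    // Hypothetical overload (not in the original post): lets the caller decide
    // what counts as "equal", for example case-insensitive string comparison.
    public static bool ContainsDuplicates<T>(this IEnumerable<T> e, IEqualityComparer<T> comparer)
    {
        var set = new HashSet<T>(comparer);
        foreach (var item in e)
        {
            if (!set.Add(item))
                return true;
        }
        return false;
    }
}

public static class UsageExample
{
    public static void Main()
    {
        var unique = new[] { "a", "b", "c" };
        var repeated = new[] { "a", "b", "a" };
        var mixedCase = new[] { "a", "A" };

        Console.WriteLine(unique.ContainsDuplicates());    // False
        Console.WriteLine(repeated.ContainsDuplicates());  // True
        Console.WriteLine(mixedCase.ContainsDuplicates()); // False (default comparer is case-sensitive)
        Console.WriteLine(mixedCase.ContainsDuplicates(StringComparer.OrdinalIgnoreCase)); // True
    }
}
```

Because the loop returns as soon as `HashSet<T>.Add` reports an existing item, the source is enumerated at most once and only as far as the first duplicate.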

Answer

Your second code sample is short, simple, clearly effective, and if not the completely perfect ideal solution, is clearly rather close to it. It seems like a perfectly acceptable solution to your particular problems.

Unless your use of that particular solution is shown to cause performance problems after you've noticed issues and done performance testing, I'd leave it as is. Given how little room I can see for improvement in general, that doesn't seem likely. It's not a sufficiently lengthy or complex solution that trying to find something "shorter" or more concise is going to be worth your time and effort.

In short, there are almost certainly better places in your code to spend your time; what you have already is fine.

To answer your specific questions:

  1. However, this looks like it is O(2n) - is it?

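    Yes, it is: ToArray() makes one full pass over the sequence and Distinct().Count() makes a second. Constant factors are dropped in big-O notation, so O(2n) is still O(n); it just does roughly twice the work.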

  2. ToArray() might be O(1)?

    No, it's not. ToArray() has to copy every element into a new array, so it is O(n).

  3. Maybe Distinct() uses this internally?

    It does use a HashSet, and it looks pretty similar, but it simply ignores duplicate items; it doesn't provide any indication to the caller that it has just passed a duplicate item. As a result, you need to iterate the whole sequence twice to see if it removed anything, rather than stopping when the first duplicate is encountered. This is the difference between something that always iterates the full sequence twice and something that might iterate the full sequence once, but can short circuit and stop as soon as it has ensured an answer.

  4. Is there a built-in way too?

    Well, you showed one; it's just not as efficient. I can think of no entirely LINQ-based solution that is as efficient as what you showed. An expression like data.Except(data).Any() would not help, because Except is a set difference and data.Except(data) is always empty. A LINQ-only check that does work is data.GroupBy(x => x).Any(g => g.Skip(1).Any()). It is a bit better than comparing the Distinct() count with the original count, in that the second step (scanning the groups) can short-circuit. The first step cannot: GroupBy has to consume the whole sequence before it yields any group. That is still worse than your non-LINQ solution, so it's still not worth using.
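
The difference between "always two full passes" and "one pass that can stop early" can be made visible by counting how many elements each approach actually reads. The following is only a sketch: the `Counted` helper, the class and method names and the sample data are made up for illustration, and the GroupBy check is included purely as an example of a LINQ-only variant whose first step cannot short-circuit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCheckDemo
{
    // Hypothetical helper: passes items through unchanged and reports each one read.
    public static IEnumerable<T> Counted<T>(IEnumerable<T> source, Action onRead)
    {
        foreach (var item in source)
        {
            onRead();
            yield return item;
        }
    }

    // ToArray() followed by Distinct().Count(): two full passes, no early exit.
    public static (bool HasDuplicates, int Reads) ViaDistinctCount(IEnumerable<string> data)
    {
        int reads = 0;
        var array = Counted(data, () => reads++).ToArray();
        bool dup = array.Length != Counted(array, () => reads++).Distinct().Count();
        return (dup, reads);
    }

    // HashSet loop: a single pass that stops at the first duplicate.
    public static (bool HasDuplicates, int Reads) ViaHashSet(IEnumerable<string> data)
    {
        int reads = 0;
        var set = new HashSet<string>();
        foreach (var s in Counted(data, () => reads++))
        {
            if (!set.Add(s))
                return (true, reads);
        }
        return (false, reads);
    }

    // GroupBy check: the source is consumed completely before any group is
    // yielded; only the scan over the groups can stop early.
    public static (bool HasDuplicates, int Reads) ViaGroupBy(IEnumerable<string> data)
    {
        int reads = 0;
        bool dup = Counted(data, () => reads++)
            .GroupBy(x => x)
            .Any(g => g.Skip(1).Any());
        return (dup, reads);
    }

    // Sample data: a duplicate right at the start, followed by 1000 unique strings.
    public static List<string> SampleData() =>
        new[] { "a", "a" }
            .Concat(Enumerable.Range(0, 1000).Select(i => i.ToString()))
            .ToList();

    public static void Main()
    {
        var data = SampleData(); // 1002 elements in total

        Console.WriteLine(ViaDistinctCount(data)); // (True, 2004)
        Console.WriteLine(ViaHashSet(data));       // (True, 2)
        Console.WriteLine(ViaGroupBy(data));       // (True, 1002)
    }
}
```

With the duplicate at the front, the HashSet loop reads two elements, while the other two approaches read the whole sequence at least once. When the sequence has no duplicates at all, every approach has to read everything, and the HashSet loop still needs only one pass.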
