Check array for duplicates, return only items which appear more than once


Problem description


I have a text document of emails such as:

Google12@gmail.com,
MyUSERNAME@me.com,
ME@you.com,
ratonabat@co.co,
iamcool@asd.com,
ratonabat@co.co,

I need to check said document for duplicates and create a unique array from it (so if "ratonabat@co.co" appears 500 times, it will appear only once in the new array).

Edit: For example:

username1@hotmail.com
username2@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com

This is my "data" (either in an array or text document, I can handle that)

I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be

username1@hotmail.com

Solution

You can simply use LINQ's Distinct extension method:

var input = new string[] { ... };
var output = input.Distinct().ToArray();

You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
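As a rough sketch of the HashSet<string> suggestion, assuming the emails arrive as one comma-separated string like the sample in the question (the EmailSets class, the ParseUnique name, and the trimming rules are illustrative assumptions, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;

public static class EmailSets
{
    // Splits comma-separated text and collects the trimmed entries into a set.
    // HashSet<T>.Add returns false and leaves the set unchanged when the
    // item is already present, so repeats are dropped without extra checks.
    public static HashSet<string> ParseUnique(string text)
    {
        var unique = new HashSet<string>();
        foreach (var part in text.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var email = part.Trim(); // also strips the line breaks between entries
            if (email.Length > 0)
                unique.Add(email);
        }
        return unique;
    }
}
```

Calling EmailSets.ParseUnique("ratonabat@co.co,\nME@you.com,\nratonabat@co.co,") gives a set with two entries. The default comparison is case-sensitive; if addresses that differ only in case should count as the same, one option is to pass StringComparer.OrdinalIgnoreCase to the HashSet<string> constructor.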


To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little LINQ:

var output = input.GroupBy(x => x)
                  .Where(g => g.Skip(1).Any())
                  .Select(g => g.Key)
                  .ToArray();

Explanation:

  • .GroupBy groups identical strings together.
  • .Where filters the groups by the following criterion:
    • .Skip(1).Any() returns true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops counting after it finds a second item.
  • .Select returns only the key of each group (a single string) rather than the group itself.
  • .ToArray converts the result to an array.
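Applied to the sample data from the question, the query above could be wrapped in a small helper like this (the DuplicateQueries class and FindDuplicates name are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateQueries
{
    // Returns each value that occurs at least twice, once per value.
    public static string[] FindDuplicates(IEnumerable<string> input)
    {
        return input.GroupBy(x => x)
                    .Where(g => g.Skip(1).Any()) // true when the group has a second element
                    .Select(g => g.Key)          // keep the string, drop the group
                    .ToArray();
    }
}
```

For the six-line sample in the question (username1@hotmail.com five times, username2@hotmail.com once), FindDuplicates returns an array containing only username1@hotmail.com.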

Here's another solution using a custom extension method:

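using System.Collections.Generic;

public static class MyExtensions
{
    // Yields each value that occurs more than once, exactly once.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        var a = new HashSet<T>(); // every value seen so far
        var b = new HashSet<T>(); // values already reported as duplicates
        foreach(var x in input)
        {
            // a.Add fails on a repeat; b.Add succeeds only the first time it is reported.
            if (!a.Add(x) && b.Add(x))
                yield return x;
        }
    }
}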

And then you can call this method like this:

var output = input.Duplicates().ToArray();

I haven't benchmarked this, but it should be more efficient than the previous method.
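One observable difference between the two approaches is the order of the results. The sketch below puts them side by side; DuplicateOrderDemo and its method names are hypothetical, and DuplicatesOf repeats the logic of the extension method as a plain static method so the block compiles on its own:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateOrderDemo
{
    // Same two-set logic as the Duplicates<T> extension method.
    public static IEnumerable<T> DuplicatesOf<T>(IEnumerable<T> input)
    {
        var seen = new HashSet<T>();
        var reported = new HashSet<T>();
        foreach (var x in input)
        {
            if (!seen.Add(x) && reported.Add(x))
                yield return x;
        }
    }

    // The GroupBy query: groups come out in order of each key's first occurrence.
    public static string[] ViaGroupBy(IEnumerable<string> input)
    {
        return input.GroupBy(x => x)
                    .Where(g => g.Skip(1).Any())
                    .Select(g => g.Key)
                    .ToArray();
    }

    // The streaming version: a value is yielded when its second occurrence is reached.
    public static string[] ViaStreaming(IEnumerable<string> input)
    {
        return DuplicatesOf(input).ToArray();
    }
}
```

For the input { "a", "b", "b", "a" }, ViaGroupBy returns { "a", "b" } while ViaStreaming returns { "b", "a" }. If the order of the duplicates matters, that may be a reason to prefer one over the other.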
