Check array for duplicates, return only items which appear more than once
I have a text document of emails such as:
Google12@gmail.com,
MyUSERNAME@me.com,
ME@you.com,
ratonabat@co.co,
iamcool@asd.com,
ratonabat@co.co,
I need to check said document for duplicates and create a unique array from it (so if "ratonabat@co.co" appears 500 times, it will appear only once in the new array).
Edit: For example:
username1@hotmail.com
username2@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com
This is my "data" (either in an array or text document, I can handle that)
I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be
username1@hotmail.com
You can simply use LINQ's Distinct extension method:
var input = new string[] { ... };
var output = input.Distinct().ToArray();
You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
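As a rough sketch of the HashSet<string> idea: assuming the file holds one address per line with a trailing comma, as in the question, each line can be trimmed and added to a set, which silently ignores repeats. The EmailSets.ParseUnique helper and the case-insensitive comparer are assumptions made for illustration; neither appears in the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class EmailSets
{
    // Hypothetical helper: splits raw text into lines, strips whitespace and
    // the trailing comma, and collects the addresses in a set.
    public static HashSet<string> ParseUnique(string text)
    {
        // Assumption: addresses that differ only by letter case count as equal.
        var unique = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in text.Split('\n'))
        {
            var email = line.Trim().TrimEnd(',').Trim();
            if (email.Length > 0)
                unique.Add(email); // Add returns false and changes nothing for repeats
        }
        return unique;
    }
}
```

If the data lives in a file, something like EmailSets.ParseUnique(File.ReadAllText(path)) should work, where path is whatever location your document has.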
To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little LINQ:
var output = input.GroupBy(x => x)
                  .Where(g => g.Skip(1).Any())
                  .Select(g => g.Key)
                  .ToArray();
Explanation:
- .GroupBy groups identical strings together.
- .Where filters the groups by the following criterion.
- .Skip(1).Any() returns true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops counting after it finds a second item.
- .Select returns a set consisting only of a single string (rather than the group).
- .ToArray converts the result set to an array.
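To see the query on the sample data from the question's edit, here is a small sketch that wraps it in a method. The GroupByDemo class and FindDuplicates name are made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByDemo
{
    // Returns each value that occurs at least twice, once per value,
    // in the order the values first appear in the input.
    public static string[] FindDuplicates(IEnumerable<string> input)
    {
        return input.GroupBy(x => x)
                    .Where(g => g.Skip(1).Any()) // keep groups with 2+ items
                    .Select(g => g.Key)          // one string per group
                    .ToArray();
    }
}
```

Called with the six addresses from the edit (username1@hotmail.com five times and username2@hotmail.com once), FindDuplicates returns an array holding only username1@hotmail.com.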
Here's another solution using a custom extension method:
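public static class MyExtensions
{
    // Requires: using System.Collections.Generic;
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        var a = new HashSet<T>(); // every item seen so far
        var b = new HashSet<T>(); // duplicates already returned
        foreach (var x in input)
        {
            // Add returns false when the item is already in the set, so this
            // is true only the first time an item is seen again.
            if (!a.Add(x) && b.Add(x))
                yield return x;
        }
    }
}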
And then you can call this method like this:
var output = input.Duplicates().ToArray();
I haven't benchmarked this, but it should be more efficient than the previous method.
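If addresses that differ only by case (such as "MyUSERNAME@me.com" and "myusername@me.com") should count as the same item, one possible variation is an overload that passes a comparer to both sets. This is only a sketch: the ComparerExtensions class and the extra comparer parameter are additions of mine, not something the original answer proposed.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerExtensions
{
    // Hypothetical overload: same single-pass logic, but equality is
    // decided by the supplied comparer instead of the default one.
    public static IEnumerable<T> Duplicates<T>(
        this IEnumerable<T> input, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);     // every item seen so far
        var reported = new HashSet<T>(comparer); // duplicates already returned
        foreach (var x in input)
        {
            if (!seen.Add(x) && reported.Add(x))
                yield return x;
        }
    }
}
```

A call might look like input.Duplicates(StringComparer.OrdinalIgnoreCase).ToArray(). Note that the value returned for each duplicate is its second occurrence in the input, whichever casing that happens to use.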